Machine Learning Performance on Predicting Banking Term Deposit

Nguyen Minh Tuan

Department of Mathematics, Faculty of Applied Science,

King Mongkut’s University of Technology North Bangkok, Thailand

Keywords:

Machine, Prediction, Deep Machine Learning, Banking, Term Deposit.

Abstract:

With the expansion of epidemic diseases and after the crises of the economy in the world, choosing ﬁnan-

cial deposits for many purposes is very helpful. To identify a customer whether deposit or not, based on the

information given to analyze and predict, it is becoming increasingly difﬁcult for banks to identify whether

customer that is right for them. Many banks will be reconﬁgured beyond recognition to attract customers,

while others are facing a shortage drawing customers to maintain the business as a corollary of advances in

particular. To serve customers with the information needed to select a suitable deposit in such a rapidly evolv-

ing and competitive arena requires more than merely following one’s passion. We assert such information may

be derived by analyzing some descriptions using deep neural network models, a novel approach to identifying

the descriptions about age, job, marital, education, default, balance, housing, loan, contact, day, month, du-

ration, campaigns, pdays, previous, outcome, deposit (y) in choosing an appropriate deposit customer. There

have been some researchers written about this prediction but they just focused on algorithms models instead

of concentrating on deep machine learning. In this paper, we will muster up algorithms using the models on

deep machine learning with Long-Short Term Memory (LSTM), Gated Recurrent Unit (GRU), Bidirectional

Long-Short Term Memory (BiLSTM), Bidirectional Gated Recurrent Unit (BiGRU), and Simple Recurrent

Neuron Network (SimpleRNN). The result will suggest suitable customers based on the information given.

The results showed that Gated Recurrent Unit (GRU) reaches the best accuracy with 90.08% at epoch 50

and the following is the Bidirectional Long-Short Term Memory (BiLSTM) model with 90.05% at epoch 50

The results will be helpful for the banks to conﬁrm whether the customers could deposit or not.

1 INTRODUCTION

The crisis of the economic and epidemic disease

could lead people to bankrupt and connect to the bank

for ﬁnance. Given such variety, the challenge for

banks is how to choose a method corresponding to

their situational ﬁnance. Basing on the phone calls,

the banks want to be sure to conﬁrm that a customer

whether making a term deposit at that time or not.

This is a ﬁnancial campaign of the bank in main-

taining the development and existing policy for cus-

tomers. The banks usually call the client over one

time to conﬁrm that they have subscribed to the bank-

ing deposit or not. This prediction will help the bank

to establish a new strategy for the patrons. In bank-

ing, the prediction needs to get the best accuracy to

propose a new strategy in business. Conﬁrmation in

banking before proposing a package to customers is

very important to the bank. They could cost much

https://orcid.org/0000-0002-4035-1759

time and much money for customer investigation. Af-

ter that, they will classify the customers and estimate

the deposit of the customers. The suitable ﬁndings

will reduce the time and money for the bank in call-

ing and attracting customers.

In paper (Hung et al., 2019), they fully compared

the Spark MLlib and ML packages and they showed

that ML packages got the best accuracy in predict-

ing term deposit. To compare Spark MLlib and ML

packages they applied Random Forest and Gradient

Boosting and the result for Gradient Boosting accu-

racy approach to 86%. They gave a good result for

the banking prediction but they just got a compari-

son in PySpark Apache. They never tried to use deep

machine learning to make a comparison in predicting

customers.

In paper (Kurapati et al., 2018), they used ma-

chine learning to predict the defaulters based on the

customer’s information. In that paper, they used

the algorithms in Scikit-Learn for prediction and

compared the accuracy of algorithms before feature

Tuan, N.

Machine Learning Performance on Predicting Banking Term Deposit.

DOI: 10.5220/0011096600003179

In Proceedings of the 24th International Conference on Enterprise Information Systems (ICEIS 2022) - Volume 1, pages 267-272

ISBN: 978-989-758-569-2; ISSN: 2184-4992

267

Table 1: Some example rows for data.

age job marital education default balance housing y

58 management married tertiary no 2143 yes no

44 technician single secondary no 29 yes no

33 entrepreneur married secondary no 2 yes no

47 blue-collar married unknown no 1506 yes no

33 unknown single unknown no 1 no no

35 management married tertiary no 231 yes no

28 management single tertiary no 447 yes no

42 entrepreneur divorced tertiary yes 2 yes no

58 retired married primary no 121 yes no

43 technician single secondary no 593 yes no

selection and after feature selection. Getting the

best algorithm is Random Forest to predict credit

defaulters compared to Decision Tree, Gradient

Boosting, and Extra Tree Classiﬁer. They compared

the accuracy of prediction in defaulting payment

using machine learning techniques. In paper (Zhu

et al., 2019), they got a study for predicting loan de-

fault base on machine learning algorithms. The result

proving the performance of Random Forest showed

the best answer compared to the other algorithms

such as Decision Tree, SVM, and Logistic Regres-

sion. In the paper (Gupta et al., 2020), they applied

machine learning for predicting bank loan systems

by using Logistic Regression and Random Forest

and they did not compare to the other algorithms.

In the paper (Rahman and Kumar, 2020), they also

used KNN, SVM, Decision Tree, and Random Forest

to predict in machine learning based on customer

churn prediction. However, all of the papers have not

applied deep machine learning to predicting or got a

comparison for the models in deep machine learning.

In this paper, we got data (as shown in Fig 1) and

applied deep machine learning to take an accuracy

comparison using the LSTM model, GRU model,

and other algorithm models. We did try many

different algorithms and chose the best performance

for predicting a customer depositing in a bank. After

trying all the algorithms, we chose for performance

ﬁve best algorithms like Long-Short Term Memory

(LSTM), Gated Recurrent Unit (GRU), Bidirectional

Long-Short Term Memory (BiLSTM), Bidirectional

Gated Recurrent Unit (BiGRU) and Simple Re-

current Neuron Network (SimpleRNN). The main

purpose is to ﬁnd a new comparison to generate

the customers’ information especially using deep

machine learning. Nowadays, banking is a useful

channel for relation in ﬁnance. They have the trend to

solve the ﬁnancial work through the bank especially

in the economic crisis. Through Coivid-19, many

things have changed, the working is limited and they

tend to live at home and solve the problem online

where deposit banking is examined by customers.

In unprecedented circumstances, the banks have to

make phone calls to conﬁrm the customers and have

some instructions to invite them to make a banking

term deposit. With machine learning, the banks could

solve the problems faster and decide the matters

rightly. This paper is aimed to conﬁrm a customer

whether they could make a deposit or not using deep

machine learning. It is also convenient for the bank

to solve with an enormously large of customers in a

short time.

In this paper, we got input data from

the following website: https://www.kagg

le.com/vinicius150987/bank-full-machine-

learning/data. We also used the data set with

45,211 records and a total of 16 attributes. 1 shows

some examples extracted from data. The data consist

of numerical columns such as age, balance, day,

duration, campaign, pdays, and previous and other

columns are strings such as job, marital, default,

housing, loan, month, and deposit (y) which will be

turned into numeric by the StringIndexer. The term

deposit (y) is choosing for a label in predicting.

2 LITERATURE REVIEWS

Machine learning day by day has become a useful ap-

plication in most traditional stochastic methods espe-

cially in ﬁnancial market forecasting as well as sup-

port vector machines (Ryll and Seidens, 2019; Ghani

et al., 2019). In this paper, we covered map all the

coulumns to convert the input numeric to string. Af-

ter that, we add all the attributes to the features. In the

next step, we apply train test split to slit the data. We

also try all the algorithms and choose some good algo-

rithms which are suitable to predict such as Gradient-

Boosted Trees, Random Forest, Decision Tree, Logis-

tic Regression, and deep learning machine is LSTM

ICEIS 2022 - 24th International Conference on Enterprise Information Systems

268

model, GRU model.

Long-Short Term Memory (LSTM) and Bidirectional

Long-Short Term Memory (BiLSTM) model is a kind

of recurrent neural network, is designed to memorize

information through using time (Junyu, 2020; Tuan

and Meesad, 2021). The structure of LSTM in every

section has repeated form, usually tanh gate, and has

been rearranged in effective layers. The target point

of the LSTM network is built input data by a sigmoid

function, called forget gate. LSTM has a function that

will omit or add the information to the system state by

the gate and sigmoid function (Fig 2).

Similar to LSTM, Gated Recurrent Unit (GRU) and

Bidirectional Gated Recurrent Unit (BiGRU) is a sup-

ply formed data transformed from input nodes to out-

put nodes (Verster et al., 2021; Tuan et al., 2021).

GRU consists of reset gate and updates gate. The

reset gate is to update the hidden state after the ﬁrst

investigation and control how much the previous in-

formation we want to keep in. The update gate allows

control of how much the new state and information

copied from the old state. GRU (Kumar et al., 2020)

is a very important model for predicting the order in

the artiﬁcial neural network (Fig 3). GRU has reduced

the break-gradient information to LSTM.

3 RESEARCH METHODOLOGY

In this paper, we approach the way to explore the

role of text preprocessing and feature representation

by using the available tools in Scikit-Learn library.

We use Python as a data analytics tool to implement

experiments. We extract randomly a dataset into

two parts in the ratio: 7:3 of total of 45,211 records.

All of them consist of four stages ( see Fig 1): Turn

all the string attributes into numerical attributes,

feature selection (adding all the numerical attributes

and categorical to features), predict banking term

deposit by using models, and using transformation

for the testing part with model and evaluation. In

the preprocessing, we clean the data with empty

records and apply ML packages to evaluate with four

differently detail preprocessing techniques:

+ Firstly, we divide the data into two kinds of

attributes. We put the string columns: age, balance,

day, duration, campaign, pdays, and previous into

the numerical columns. After that we apply map for

all the columns: job, material, education, default,

housing, loan, contact, month, poutcome, and put all

to the categorical column.

+ Secondly, we combine the numerical column and

categorical column to the new column “Features”

through the for the column deposit “y” to get the

label column. We transform the old data to new data

with selected columns.

+ Thirdly, we split data randomly into two parts:

training part (70%) and testing part (30%). We train

the model and execute the training in the pipeline and

predict the outcome.

+ Lastly, we evaluate the algorithms and get the best

algorithm with the greatest accuracy. After that, I

show the term deposit prediction.

To evaluate the model, we also use the accuracy, is

deﬁned as the numbers of the ratio of numbers of

samples correctly classify by algorithms to the total

numbers of samples for a given data set, as shown as

the equation 1:

Accuracy =

T P

BT P

+ T N

BT P

T P

BT P

+ T N

BT P

+ FP

BT P

+ FN

BT P

(1)

Where T P

BT P

is true positive samples of banking term

deposit prediction, FP

BT P

is mistake positive sam-

ples of banking term deposit prediction, TN

BT P

is true

negative samples of banking term deposit prediction,

BT P

is mistake negative samples of banking term

deposit prediction. With two deep learning machine

models: LSTM and GRU models, we combine all the

columns to features columns except the target y (de-

posit or not). We turn the target to 1 and 0 for predic-

tion. In the preprocessing, we tokenize and ﬁt the text

to the features column.

Figure 1: Steps for preprocessing dataset.

Machine Learning Performance on Predicting Banking Term Deposit

269

4 EXPRIMENTAL RESULTS

After processing the data, we apply machine learning

models and compare them. We choose the best algo-

rithm to apply prediction for the outcome. In Table 2,

we got predictions for accuracy using the testing part.

In the LSTM model, we consider the data as string

and split the data into two parts with the ratio of 7:3.

We built 4 layers consisting of Embedding, Spacial-

Dropout1D, LSTM, and Dense, the shape for input

and output is (64;5) with 49,669 parameters. In the

GRU model, we applied the same structure to LSTM

and built 4 layers consisting of Embedding, Spacial-

Dropout1D, GRU, and Dense, the shape for input and

output is (64;5) with 41,605 parameters.

In the BiLSTM model, we apply the same method and

the same ratio in PySpark. We built 4 layers consist-

ing of Embedding, SpacialDropout1D, LSTM, and

Dense, the shape for input and output is (64;5) with

83,013 parameters. In the BiGRU model, we built 4

layers consisting of Embedding, SpacialDropout1D,

GRU, and Dense, the shape for input and output is

(64;5) with 66,885 parameters. In the BiGRU model,

we built 4 layers consisting of Embedding, Spacial-

Dropout1D, GRU, and Dense, the shape for input and

output is (64;5) with 24,901 parameters. In this exper-

iment, the GRU model showed the best performance,

reaching the accuracy of 0.908 with 48s at the ﬁrst

50th epoch, and could get better when repeating more

times (Shown in Fig 3). The BiLSTM model comes

alongside 0.905 with 52s at the 50th epoch (Fig 4).

The LSTM get close to 0.903 (Fig 2) and the follow-

ing is BiGRU (Fig 5) and SimpleRNN (Fig 6) respec-

tively reaching 0.901 and 0.892.

Table 3 shows the summary of predictive number

Table 2: Five best results in deep machine learning using

Scikit-Learn library.

Models Accuracy

BiLSTM 0.905

BiGRU 0.901

SimpleRNN 0.892

LSTM 0.903

GRU 0.908

values, the ﬁrst label is 1 and prediction 1 are cor-

responding with true prediction numbers; the second

label 0 and prediction 1 are corresponding with false

prediction numbers; The third label 1 and prediction 0

are corresponding with false prediction numbers; the

fourth label 0 and prediction 0 are corresponding with

true prediction numbers.

Compared to the algorithms’ training in ScikitLearn

with the same data, we tried the algorithms and

Figure 2: LSTM prediction.

Figure 3: GRU prediction.

choose the ﬁve best models with accuracy as shown

in Table 2. We could see that with the same method,

the same models and algorithms but we get the best

answers in PySpark library than in Scikit-learn library

(usually online in Kaggle.com).

5 CONCLUSIONS AND FUTURE

WORK

The development of technology has led to many

things to solve in life especially in ﬁnance. They

usually have a tendency to get beneﬁts at home, so

ICEIS 2022 - 24th International Conference on Enterprise Information Systems

270

Table 3: Counting prediction values summary.

label predict BiLSTM BiGRU SimpleRNN LSTM GRU

1 1 390 347 586 346 368

0 1 209 260 381 418 446

1 0 1156 1199 960 1179 1151

0 0 11740 11689 11568 11621 15824

Figure 4: BiLSTM prediction.

Figure 5: BiGRU prediction.

investment by making a banking term deposit is one

of the channels to do this (Viswanathana et al., 2020;

Prasetyo et al., 2021). So, the banks need to have a

plan for calling the potential customers. To do this,

they must have a method for predicting whether cus-

tomers could have tended to make a proﬁt by banking

Figure 6: RNN prediction.

term deposit. In this paper, the comparison is a use-

ful method to know more clearly about the nature of

the problems. Machine Learning algorithms are ap-

plied to all the matters of the jobs especially in some-

thing changeable every time and every day (N.dikum,

2020; Zhong and Enke, 2019; Ren et al., 2021). In

this paper, we have summarized the ease of using al-

gorithms in machine learning in machine learning and

Scikit-learn library and proposed the best models in

deep machine learning such as BiLSTM model and

GRU model. The results showed the best prediction in

banking term deposit customers which is GRU mod-

els with an accuracy of 90.8%. The result is one of

the methods for banks to conﬁrm the target customers

and for expanded research in the future.

REFERENCES

Ghani, M. U., Awais, M., and Muzammul, M. (2019). Stock

market prediction using machine learning (ml) algo-

rithms. In Advances in Distributed Computing and

Artiﬁcial Intelligence Journal.

Gupta, A., Pant, V., Kumar, S., and Bansal, P. K. (2020).

Bank loan prediction system using machine learning.

In International Conference on System Modeling and

Advancement in Research Trends.

Machine Learning Performance on Predicting Banking Term Deposit

271

Hung, P. D., Hanh, T. D., and Tung, T. D. (2019). Term

deposit subscription prediction using spark mllib and

ml packages. In 5th International Conference on E-

Business and Applications.

Junyu, H. (2020). Prediction of ﬁnancial crisis based on

machine learning. In The 4th International Confer-

ence on Business and Information Management.

Kumar, V. S., Bhardwajb, B., Guptac, V., Kumarc, R., and

Mathurc, A. (2020). Gated recurrent unit (gru) based

short term forecasting for wind energy estimation. In

International Conference on Power, Energy, Control

and Transmission Systems.

Kurapati, N., Kumari, P., and Bhansali (2018). Predicting

the defaulters using machine learning techniques. In

Research Paper.

N.dikum, P. (2020). Machine learning algo-

rithms for ﬁnancial asset price forcasting. In

https://doi.org/10.1093/rfs/hhaa009.

Prasetyo, I., Utar, W., Susanto, H., Rusdiyanto, and Hi-

dayat, W. (2021). Comparative analysis of banking

ﬁnancial performance: Evidence from indonesia. In

Ilkogretim Online - Elementary Education Online.

Rahman, M. and Kumar, V. (2020). Machine learning based

customer churn prediction in banking. In Fourth Inter-

national Conference on Electronics, Communication

and Aerospace Technology.

Ren, L., Dong, J., Wang, X., Meng, Z., Zhao, L., and Deen,

M. J. (2021). A data-driven auto-cnn-lstm prediction

model for lithium-ion battery remaining useful life. In

IEEE Transactions Industrial Informatics.

Ryll, L. and Seidens, S. (2019). Evaluating the perfor-

mance of machine learning algorithms in ﬁnancial

market forecasting: A comprehensive survey. In

http://arxiv.org.

Tuan, N. M. and Meesad, P. (2021). A study of pre-

dicting the sincerity of a question asked using ma-

chine learning. In International Conference on Nat-

ural Language Processing and Information Retrieval,

https://doi.org/10.1145/3508230.3508258.

Tuan, N. M., P., M., and H.C, N. H. (2021). English-

vietnamese machine translation using deep learning.

In Recent Advances in Information and Communica-

tion Technology IC2IT.

Verster, T., Harcharan, S., Bezuidenhout, L., and Bae-

sens, B. (2021). Predicting take-up of home

loan offers using tree based ensemble models: A

south african case study. In Research Article at

https://doi.org/10.17159/sajs.2021/7607.

Viswanathana, P. K., Srinivasan, S., and Hariharan, N.

(2020). Predicting ﬁnancial health of banks for in-

vestor guidance using machine learning algorithms. In

Research Article.

Zhong, X. and Enke, D. (2019). Predicting the daily re-

turn direction of the stock market using hybrid ma-

chine learning agorithms. In Finacial Innovation.

Zhu, L., Qiu, D., Ergu, D., and Liu, C. Y. K. (2019). A

study on predicting loan default base on random forest

algorithm. In International conference on Information

Technology and Quantitative Management.

ICEIS 2022 - 24th International Conference on Enterprise Information Systems

272