The input of the model is a 130-dimensional
vector, which consists of two parts:
(1) Financial indicator data: 30 financial indicators, each of which is normalized;
(2) Financial text: the “business discussion and analysis” section of the annual report.
The model joins the financial indicator data and the text data with an inner join (the intersection of keys), using the pd.merge function of the pandas library, and feeds the result to the convolutional neural network (CNN). The parameters of the CNN include the number of convolution kernels, the size of the convolution kernels, the size of the pooling layer, and so on. To select the parameters that best fit the model in this paper, we define a candidate value range for each parameter.
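The joining step described above can be sketched as follows. This is a minimal illustration, not the implementation used in this paper: the join keys (`stock_code`, `year`), the column names, and the toy values are all assumptions introduced for the example. Only the use of `pd.merge` with an inner join comes from the text.

```python
import pandas as pd

# Hypothetical toy data; "stock_code" and "year" are assumed join keys.
indicators = pd.DataFrame({
    "stock_code": ["000001", "000002", "000004"],
    "year": [2019, 2019, 2019],
    "current_ratio": [0.42, -1.10, 0.73],  # stands in for the normalized indicators
})
text_features = pd.DataFrame({
    "stock_code": ["000001", "000004", "000005"],
    "year": [2019, 2019, 2019],
    "text_feat_0": [0.12, 0.55, 0.31],  # stands in for the text representation
})

# how="inner" keeps only the company-years present in BOTH tables.
merged = pd.merge(indicators, text_features, on=["stock_code", "year"], how="inner")

# Drop the keys to obtain the numeric feature matrix passed to the network.
features = merged.drop(columns=["stock_code", "year"]).to_numpy()
print(merged)
```

With an inner join, a company-year that lacks either the indicators or the text is dropped, so every remaining sample has a complete input vector.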
The candidate values include: the CNN convolution kernel size d∈{2,3,4,5}, the number of CNN convolution kernels h∈{64,100,128,256}, the pooling layer size c∈{5,6,7,8}, the learning rate λ∈{0.01, 0.001, 0.0001}, the number of epochs∈{5, 10, 15}, and the weight of the cross-entropy loss function f∈{0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 2, 3, 4}.
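To give a sense of the size of this search space, the sketch below enumerates every combination of the candidate values. It assumes an exhaustive grid search, which the text does not confirm, and the dictionary keys are illustrative names rather than identifiers from the paper. Under that assumption, the ranges give 4 × 4 × 4 × 3 × 3 × 13 = 7,488 configurations.

```python
from itertools import product

# Candidate values as listed in the text; the key names are illustrative.
search_space = {
    "kernel_size": [2, 3, 4, 5],                # d
    "num_kernels": [64, 100, 128, 256],         # h
    "pool_size": [5, 6, 7, 8],                  # c
    "learning_rate": [0.01, 0.001, 0.0001],     # lambda
    "epochs": [5, 10, 15],
    "loss_weight": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 2, 3, 4],  # f
}

def iter_configs(space):
    """Yield one dict per combination of candidate values (exhaustive grid)."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))

configs = list(iter_configs(search_space))
print(len(configs))
```

Each yielded dictionary would be used to train and validate one model; the configuration with the best validation score is then kept.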
3 RESULTS & DISCUSSION
We compare the model proposed in this paper with the following models, all of which use financial data only:
S-CNN: feature vectors are constructed from financial data, and a CNN model is then used to extract features and perform classification.
S-SVM: an SVM classifier trained on financial data.
S-XGB: an XGBoost classifier trained on financial data.
The evaluation results of each model are shown
in Table 2:
Table 2: Experiment summary table.

Model      Accuracy   True Positive Rate   True Negative Rate   F1 Value
S-CNN      78.00%     89.02%               54.57%               0.676623
T&S-CNN    85.00%     93.38%               77.67%               0.848035
S-SVM      70.83%     75.60%               63.38%               0.689527
S-XGB      77.12%     89.02%               54.57%               0.676623
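For reference, the four metrics reported in Table 2 can be computed from a binary confusion matrix as sketched below. The sketch assumes the standard binary definitions, with the positive class taken as the class of interest; the exact F1 variant behind the table may differ. The counts in the usage line are hypothetical and are not taken from the experiments.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, TPR, TNR and F1 from confusion-matrix counts.

    Assumes every denominator is non-zero, i.e. both classes occur and
    at least one positive prediction is made.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    tpr = tp / (tp + fn)        # true positive rate (recall, sensitivity)
    tnr = tn / (tn + fp)        # true negative rate (specificity)
    precision = tp / (tp + fp)
    f1 = 2 * precision * tpr / (precision + tpr)
    return {"accuracy": accuracy, "tpr": tpr, "tnr": tnr, "f1": f1}

# Hypothetical counts for illustration only.
m = binary_metrics(tp=80, fp=10, tn=90, fn=20)
print(m)
```

Reporting the true negative rate alongside the true positive rate matters here because a model can reach a high true positive rate while misclassifying many negative samples.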
Table 2 shows that the CNN deep learning model based on financial data alone (S-CNN) does not predict significantly better than the traditional machine learning models based on financial data (S-SVM and S-XGB). After financial data and financial text are combined (T&S-CNN), the CNN model achieves higher accuracy, true positive rate, true negative rate, and F1 value than the other models. There may be two main reasons:
(1) The convolutional neural network model attends broadly to all of the input information, which leads to insufficient attention to the important information. After the financial text features are added, there is still a lot of information, but the text features help to highlight the important features.
(2) From the perspective of the financial text, the more information that is combined with the data, the better. Once the important information is combined with the data and screened by the multi-layer neural network, the more important information can be selected.
4 CONCLUSIONS
As more and more financial documents appear in the
stock market, investors, regulators, and researchers
need more deep learning models to process and
analyze the information disclosures of listed
companies. Taking all A-share listed companies
over the past ten years as samples, this paper builds a
financial risk prediction model based on financial
text and financial data. The experimental results
show that compared with using only financial data,
the F1 value of the financial risk prediction model
based on the combination of text and financial data
is significantly improved, indicating that the latest