2 OVERVIEW OF ALBERT MODEL AND CRNN MODEL
2.1 ALBERT Model
In recent years, thanks to the maturity and widespread use of the Transformer structure, pre-trained models built on rich corpora with large numbers of parameters have quickly become a common approach (Wang, Xu 2019). In practical applications, however, the BERT model usually needs distillation, compression or other optimization techniques to reduce its computation and storage pressure. The ALBERT model was designed with this in mind: through several parameter-reduction techniques it "slims down" the BERT model and yields a model with a much smaller memory footprint.
Compared with the BERT model, the ALBERT model offers two main improvements.
First, the ALBERT model reduces the number of parameters in the BERT model through embedding-layer parameter factorization and cross-layer parameter sharing, which greatly lowers the memory cost during training and effectively improves the training speed of the model.
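A minimal PyTorch sketch of these two ideas follows; the vocabulary size, hidden size and embedding size are illustrative values, not figures reported in this paper.

import torch.nn as nn

# Illustrative sizes (not values from this paper): vocabulary V = 30,000,
# hidden size H = 768, factorized embedding size E = 128.
V, H, E = 30000, 768, 128

# BERT-style embedding: a single V x H matrix.
bert_embedding = nn.Embedding(V, H)

# ALBERT-style factorized embedding: a V x E lookup followed by an E x H projection.
albert_embedding = nn.Sequential(nn.Embedding(V, E), nn.Linear(E, H, bias=False))

def n_params(m):
    return sum(p.numel() for p in m.parameters())

print(n_params(bert_embedding))    # 23,040,000 parameters
print(n_params(albert_embedding))  #  3,938,304 parameters, roughly 6x fewer

# Cross-layer parameter sharing: one encoder layer reused at every depth,
# so the encoder's parameter count no longer grows with the number of layers.
shared_layer = nn.TransformerEncoderLayer(d_model=H, nhead=12, batch_first=True)

def shared_encoder(x, n_layers=12):
    for _ in range(n_layers):
        x = shared_layer(x)  # the same weights are applied at every layer
    return x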
Second, to make up for the shortcomings of the Next Sentence Prediction (NSP) task in the BERT model, the ALBERT model replaces NSP with a Sentence Order Prediction (SOP) task, improving the effect of downstream tasks with multi-sentence input (Chen, Ren, Wang, et al. 2019).
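The difference between the two tasks can be illustrated with a small sketch: in SOP the negative example is the same pair of consecutive segments in reversed order, rather than a segment drawn from another document as in NSP. The helper below is hypothetical and only shows how such training pairs can be constructed.

import random

def make_sop_example(segment_a, segment_b):
    """Build one Sentence Order Prediction (SOP) pair from two consecutive
    segments of the same document (hypothetical helper for illustration)."""
    if random.random() < 0.5:
        return (segment_a, segment_b), 1   # original order -> positive
    return (segment_b, segment_a), 0       # swapped order  -> negative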
2.2 CRNN Model
The CRNN model is currently a widely used image text recognition model that can recognize longer text sequences. It uses Bi-directional Long Short-Term Memory (BLSTM) and Connectionist Temporal Classification (CTC) components to learn the contextual relationships in character images, which effectively improves the accuracy of text recognition and makes the model more robust (Deng, Cheng 2020). CRNN is a convolutional recurrent neural network structure used to solve image-based sequence recognition problems, especially scene text recognition. The entire CRNN network structure consists of three parts, from bottom to top:
1) Convolutional layer (CNN), using deep CNN
to extract features from the input image to obtain a
feature map;
2) Recurrent layer (RNN), using bidirectional
RNN (BLSTM) to predict the feature sequence, learn
each feature vector in the sequence, and output the
predicted label (true value) distribution;
3) CTC loss (transcription layer), using CTC loss to convert the series of label distributions obtained from the recurrent layer into the final label sequence (a minimal sketch of this three-part structure follows).
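The following PyTorch sketch illustrates the three parts; the layer counts, channel sizes and character-set size are assumptions made for the example, not the exact configuration used in the cited work.

import torch
import torch.nn as nn

class CRNN(nn.Module):
    """Minimal CRNN sketch: CNN feature extractor -> bidirectional LSTM ->
    per-step class scores, trained with CTC loss. Sizes are illustrative."""

    def __init__(self, n_classes, img_height=32):
        super().__init__()
        # 1) Convolutional layers: extract a feature map from the input image.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
        )
        feat_h = img_height // 4
        # 2) Recurrent layer: BLSTM over the width (time) axis of the feature map.
        self.rnn = nn.LSTM(256 * feat_h, 256, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * 256, n_classes)

    def forward(self, x):                        # x: (batch, 1, H, W)
        f = self.cnn(x)                          # (batch, C, H', W')
        f = f.permute(0, 3, 1, 2).flatten(2)     # (batch, W', C*H'): one vector per column
        seq, _ = self.rnn(f)                     # (batch, W', 512)
        return self.fc(seq)                      # per-step label distribution

# 3) Transcription: CTC loss maps the per-step distributions to a label sequence.
model = CRNN(n_classes=37)                       # e.g. 36 characters + 1 CTC blank
logits = model(torch.randn(4, 1, 32, 128))       # (4, 32, 37): 32 time steps
log_probs = logits.log_softmax(2).permute(1, 0, 2)   # CTCLoss expects (T, N, C)
targets = torch.randint(1, 37, (4, 5))
loss = nn.CTCLoss(blank=0)(log_probs, targets,
                           input_lengths=torch.full((4,), 32, dtype=torch.long),
                           target_lengths=torch.full((4,), 5, dtype=torch.long))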
3 CONSTRUCTION AND EVALUATION OF ALBERT-CRNN'S BARRAGE TEXT SENTIMENT ANALYSIS MODEL
3.1 Construction of ALBERT-CRNN's Barrage Text Sentiment Analysis Model
The ALBERT-CRNN barrage text sentiment analysis
method proposed in this paper has four main steps.
(1) Clean and preprocess the collected barrage text to obtain text data with emotional polarity, and mark it;
(2) Use the ALBERT model to obtain dynamic feature representations of the preprocessed barrage text (a brief sketch of steps (1)-(2) follows this list);
(3) Process the text features with the CRNN model to obtain the deep semantic features of each barrage text;
(4) Use the Softmax function to classify the deep semantic features of the text, and finally obtain the emotional polarity of each barrage text.
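As a rough illustration of steps (1) and (2), the snippet below cleans one barrage text and uses a pre-trained ALBERT model to produce its dynamic token features. The cleaning rules, checkpoint name ("albert-base-v2") and label value are assumptions made for the example, not details given in this paper.

import re
import torch
from transformers import AlbertTokenizer, AlbertModel

tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")   # assumed checkpoint
albert = AlbertModel.from_pretrained("albert-base-v2")

def clean(text):
    # Step (1): strip user mentions, URLs and repeated punctuation from the raw barrage.
    text = re.sub(r"@\S+|https?://\S+", "", text)
    return re.sub(r"([!?~。！？])\1+", r"\1", text).strip()

barrage = clean("This episode is amazing!!!!!")
label = 1                                        # manually marked polarity (1 = positive)

# Step (2): dynamic (context-dependent) token features from ALBERT.
inputs = tokenizer(barrage, return_tensors="pt", truncation=True, max_length=64)
with torch.no_grad():
    features = albert(**inputs).last_hidden_state    # (1, seq_len, 768)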
The ALBERT-CRNN model structure is shown in Figure 1. It is mainly composed of the following six parts: input layer, ALBERT layer, CRNN layer (including a CNN layer and a Bi-GRU layer), fully connected layer, Softmax layer and output layer.
[Figure 1, from input to output: barrage text input → ALBERT → three Conv3-128 layers → max pooling → forward and backward GRU layers → Softmax → barrage text emotional polarity.]
Figure 1: ALBERT-CRNN model structure.
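The following PyTorch sketch mirrors the structure in Figure 1 (three Conv3-128 layers, max pooling, a bidirectional GRU, a fully connected layer and Softmax). The ALBERT checkpoint name and the hyperparameters not stated in the text (GRU size, pooling width, number of classes) are assumptions.

import torch
import torch.nn as nn
from transformers import AlbertModel

class AlbertCRNNClassifier(nn.Module):
    """Sketch of the ALBERT-CRNN structure in Figure 1: ALBERT token features
    -> three Conv3-128 layers -> max pooling -> bidirectional GRU -> fully
    connected layer -> Softmax. Sizes not given in the text are assumptions."""

    def __init__(self, n_classes=2, hidden=768):
        super().__init__()
        self.albert = AlbertModel.from_pretrained("albert-base-v2")         # assumed checkpoint
        self.convs = nn.Sequential(                                         # CNN layer: Conv3-128 x 3
            nn.Conv1d(hidden, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(128, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(128, 128, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.pool = nn.MaxPool1d(kernel_size=2)                             # max-pooling layer
        self.gru = nn.GRU(128, 128, bidirectional=True, batch_first=True)   # Bi-GRU layer
        self.fc = nn.Linear(2 * 128, n_classes)                             # fully connected layer

    def forward(self, input_ids, attention_mask):
        # Dynamic token features from ALBERT: (batch, seq_len, hidden).
        h = self.albert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        h = self.convs(h.transpose(1, 2))              # (batch, 128, seq_len)
        h = self.pool(h).transpose(1, 2)               # (batch, seq_len // 2, 128)
        _, state = self.gru(h)                         # final forward/backward hidden states
        h = torch.cat([state[0], state[1]], dim=-1)    # (batch, 256)
        return self.fc(h).softmax(dim=-1)              # emotional polarity probabilities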