ing more domain-specific information to the model.
In addition, other models can be added to the base-
line by using different Deep Learning architectures or
different word embedding models.
Finally, we plan to apply MuDEC for classifica-
tion of other provisions of legal opinions related to
consumer complaints, such as the value of the com-
pensation for material damage or the legal fees due
by the losing party. This would require a more robust
annotation process that considers all relevant
features and expands the number of instances in the
current annotated dataset. Going one step further, we
also plan to test the model in legal domains other than
consumer complaints.
ACKNOWLEDGEMENTS
This work was partly funded by FAPERJ under
grant E-26/200.832/2021, by CAPES under grants
88881.310592-2018/01, 88887.626833/2021-00, and
by CNPq under grant 302303/2017-0. The au-
thors wish to thank the Tecgraf Institute, PUC-Rio
and the Court of Justice of the State of Rio de
Janeiro (TJERJ) for supporting this research, including
the following: LABLEXRIO (Núcleo de
Inovação do Poder Judiciário), NUPEMASC (Núcleo
de Pesquisa em Métodos Alternativos de Solução
de Conflitos), CI/TJRJ (Centro de Inteligência do
TJERJ) and DGTEC (Diretoria-Geral de Tecnologia
da Informação e Comunicação de Dados do TJERJ).
REFERENCES
Abdi, H. and Williams, L. J. (2010). Principal component
analysis. Wiley interdisciplinary reviews: computa-
tional statistics, 2(4):433–459.
Arora, S., Liang, Y., and Ma, T. (2017). A simple but tough-
to-beat baseline for sentence embeddings. In Interna-
tional conference on learning representations.
Chawla, N. V., Bowyer, K. W., Hall, L. O., and Kegelmeyer,
W. P. (2002). SMOTE: synthetic minority over-sampling
technique. Journal of artificial intelligence
research, 16:321–357.
de Araujo, P. H. L., de Campos, T. E., Braz, F. A., and
da Silva, N. C. (2020). VICTOR: a dataset for Brazilian
legal documents classification. In Proceedings of
The 12th Language Resources and Evaluation Conference,
pages 1449–1458.
Fernandes, W. P. D., Frajhof, I. Z., Rodrigues, A. M. B.,
Barbosa, S. D. J., Konder, C. N., Nasser, R. B., de Carvalho,
G. R., Lopes, H. C. V., et al. (2022). Extracting
value from Brazilian court decisions. Information Systems,
106:101965.
Fernández, A., Garcia, S., Herrera, F., and Chawla, N. V.
(2018). SMOTE for learning from imbalanced data:
progress and challenges, marking the 15-year anniversary.
Journal of artificial intelligence research,
61:863–905.
Hartmann, N., Fonseca, E., Shulby, C., Treviso, M., Ro-
drigues, J., and Aluisio, S. (2017). Portuguese word
embeddings: Evaluating on word analogies and natu-
ral language tasks. arXiv preprint arXiv:1708.06025.
Kim, J.-Y. and Cho, S.-B. (2019). Evolutionary optimiza-
tion of hyperparameters in deep learning models. In
2019 IEEE Congress on Evolutionary Computation
(CEC), pages 831–837. IEEE.
Kowsari, K., Jafari Meimandi, K., Heidarysafa, M., Mendu,
S., Barnes, L., and Brown, D. (2019). Text classifica-
tion algorithms: A survey. Information, 10(4):150.
Le, Q. and Mikolov, T. (2014). Distributed representations
of sentences and documents. In International confer-
ence on machine learning, pages 1188–1196. PMLR.
Luz de Araujo, P. H., de Campos, T. E., de Oliveira, R.
R. R., Stauffer, M., Couto, S., and Bermejo, P. (2018).
LeNER-Br: a dataset for named entity recognition in
Brazilian legal text. In International Conference on
the Computational Processing of Portuguese (PROPOR),
Lecture Notes in Computer Science (LNCS),
pages 313–323, Canela, RS, Brazil. Springer.
Menardi, G. and Torelli, N. (2014). Training and assessing
classification rules with imbalanced data. Data mining
and knowledge discovery, 28(1):92–122.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and
Dean, J. (2013). Distributed representations of words
and phrases and their compositionality. In Advances in
neural information processing systems, pages 3111–
3119.
Minaee, S., Kalchbrenner, N., Cambria, E., Nikzad, N.,
Chenaghlu, M., and Gao, J. (2021). Deep learning–
based text classification: A comprehensive review.
ACM Computing Surveys (CSUR), 54(3):1–40.
Pennington, J., Socher, R., and Manning, C. D. (2014).
GloVe: Global vectors for word representation. In
Proceedings of the 2014 conference on empirical
methods in natural language processing (EMNLP),
pages 1532–1543.
Salton, G. and Buckley, C. (1988). Term-weighting ap-
proaches in automatic text retrieval. Information pro-
cessing & management, 24(5):513–523.
Snoek, J., Larochelle, H., and Adams, R. P. (2012). Practical
Bayesian optimization of machine learning algorithms.
Advances in neural information processing
systems, 25.
Vapnik, V. (2006). Estimation of dependences based on em-
pirical data. Springer Science & Business Media.
Wei, F., Qin, H., Ye, S., and Zhao, H. (2018). Empirical
study of deep learning for text classification in legal
document review. In 2018 IEEE International Con-
ference on Big Data (Big Data), pages 3317–3320.
IEEE.
Zhou, C., Sun, C., Liu, Z., and Lau, F. (2015). A C-LSTM
neural network for text classification. arXiv preprint
arXiv:1511.08630.