Comparing Deep Learning Models for Multi-label Classification of Arabic Abusive Texts in Social Media

Salma Azzi, Chiraz Zribi

2022

Abstract

Tackling abusive texts on social networks is gradually becoming a mainstream NLP research topic. However, work on detecting the specific forms of abuse remains scarce. Most automatic solutions cast the problem as two-class or three-class classification and do not take its variety of aspects into account. Arabic is one of the most widely spoken languages, and abusive texts on Arabic social media are written in a mix of different dialects, which further complicates detection. The goal of this research is to detect eight specific subtasks of abusive language on Arabic social platforms, namely Racism, Sexism, Xenophobia, Violence, Hate, Pornography, Religious hatred, and LGBTQ Hate. To conduct our experiments, we evaluated the performance of CNN, BiLSTM, and BiGRU deep neural networks with pre-trained Arabic word embeddings (AraVec). We also investigated the recent Bidirectional Encoder Representations from Transformers (BERT) model with its special tokenizer. Results show that the DNN classifiers achieved nearly the same performance, with an overall average precision of 85%. Moreover, although all the deep learning models obtained very close results, BERT slightly outperformed the others with a precision of 90% and a micro-averaged F1 score of 79%.
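To make the reported metric concrete, the following is a minimal sketch of how micro-averaged precision, recall, and F1 can be computed in a multi-label setting with the eight categories named in the abstract. It is not the authors' evaluation code: the function name `micro_scores` and the toy label matrices are assumptions made purely for illustration, and the numbers they produce are unrelated to the results reported in the paper.

```python
from typing import List, Tuple

# The eight categories listed in the abstract; the order is arbitrary.
LABELS = [
    "Racism", "Sexism", "Xenophobia", "Violence",
    "Hate", "Pornography", "Religious hatred", "LGBTQ Hate",
]


def micro_scores(
    y_true: List[List[int]], y_pred: List[List[int]]
) -> Tuple[float, float, float]:
    """Return micro-averaged (precision, recall, F1).

    Micro-averaging pools true positives, false positives, and false
    negatives over every (text, label) pair before computing the ratios,
    so frequent labels weigh more than rare ones.
    """
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if p == 1 and t == 1:
                tp += 1
            elif p == 1 and t == 0:
                fp += 1
            elif p == 0 and t == 1:
                fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1


# Hypothetical toy data: three texts, each with a binary vector over LABELS.
# A text may carry several labels at once, which is what makes the task
# multi-label rather than multi-class.
y_true = [
    [1, 0, 0, 0, 1, 0, 0, 0],  # Racism + Hate
    [0, 1, 0, 1, 0, 0, 0, 0],  # Sexism + Violence
    [0, 0, 0, 0, 0, 0, 1, 0],  # Religious hatred
]
y_pred = [
    [1, 0, 0, 0, 0, 0, 0, 0],  # misses Hate (one false negative)
    [0, 1, 0, 1, 0, 0, 0, 0],  # both labels correct
    [0, 0, 0, 0, 1, 0, 1, 0],  # adds Hate (one false positive)
]

precision, recall, f1 = micro_scores(y_true, y_pred)
print(f"micro precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Because the counts are pooled across labels, a model can show a high precision alongside a lower micro-averaged F1 when it misses many true labels, which is one way a gap like the one between the two BERT figures above can arise.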


Paper Citation


in Harvard Style

Azzi S. and Zribi C. (2022). Comparing Deep Learning Models for Multi-label Classification of Arabic Abusive Texts in Social Media. In Proceedings of the 17th International Conference on Software Technologies - Volume 1: ICSOFT, ISBN 978-989-758-588-3, pages 374-381. DOI: 10.5220/0011141700003266


in Bibtex Style

@conference{icsoft22,
author={Salma Azzi and Chiraz Zribi},
title={Comparing Deep Learning Models for Multi-label Classification of Arabic Abusive Texts in Social Media},
booktitle={Proceedings of the 17th International Conference on Software Technologies - Volume 1: ICSOFT},
year={2022},
pages={374-381},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011141700003266},
isbn={978-989-758-588-3},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 17th International Conference on Software Technologies - Volume 1: ICSOFT
TI - Comparing Deep Learning Models for Multi-label Classification of Arabic Abusive Texts in Social Media
SN - 978-989-758-588-3
AU - Azzi S.
AU - Zribi C.
PY - 2022
SP - 374
EP - 381
DO - 10.5220/0011141700003266