because it employs a wide range of vocabulary from 
several languages, to make LD more understandable. 
First, LD text is translated into MSA so that it is
understandable to Arabic speakers. Second, translation
software such as Google Translate can be used to
translate the resulting MSA text into another national
language. Many resources, such as a bilingual
dictionary, can be used to translate dialects into
their original languages. Building such
dictionaries plays a crucial role in Natural Language 
Processing (NLP) applications not  only in machine 
translation but also in named entity recognition and 
cross-lingual information retrieval. The ultimate aim
of this work is to describe a general,
language-independent method for creating a bilingual
dictionary for translating dialects into their original
languages. The method also takes into account the
availability of preexisting monolingual dictionaries
for the original languages. A method for creating a
bilingual dictionary for LD-MSA translation is
presented as a case study.
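To illustrate how a bilingual dictionary can drive
dialect-to-standard translation, the following is a
minimal sketch of token-by-token lookup. The function
name, the dictionary, and every token in it are
hypothetical placeholders rather than real LD or MSA
words, and the sketch ignores morphology, word order,
and multi-word expressions.

```python
# Toy dialect-to-standard dictionary. Keys and values are placeholder
# tokens chosen for illustration; they are not real LD or MSA words.
TOY_DICTIONARY = {
    "d_what": "s_what",
    "d_want": "s_want",
    "d_now": "s_now",
}


def translate_tokens(sentence, dictionary):
    """Replace each whitespace-separated token with its dictionary entry.

    Tokens that are missing from the dictionary are kept unchanged.
    """
    return " ".join(dictionary.get(tok, tok) for tok in sentence.split())


if __name__ == "__main__":
    print(translate_tokens("d_what d_want name d_now", TOY_DICTIONARY))
```

Keeping unknown tokens unchanged is one possible design
choice; it suits words that are shared between a
dialect and its original language.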
2  RELATED WORK 
Many researchers have recently become interested in
translating AD into MSA. They have employed various
approaches to build the parallel corpora and the
bilingual and monolingual dictionaries that are crucial
for building machine translation systems. This section
discusses some of the most significant studies on
creating dictionaries or corpora for translation
between AD and MSA. Kchaou et al. (2020) published a
TD-MSA parallel corpus collected from a variety of
resources. The first resource is the Parallel Arabic
DIalectal Corpus (PADIC), a parallel corpus that
combines Maghreb dialects (Algerian, Tunisian, and
Moroccan), Levant dialects (Palestinian and Syrian),
and MSA (Meftouh et al., 2015). The second resource is
Multi Arabic Dialect Applications and Resources
(MADAR), a TD-MSA parallel corpus (Bouamor et al.,
2018). They then gathered text from the Tunisian
corpus CONSTitution (TD-CONST), which contains the
Tunisian constitution written in MSA and translated
into the Tunisian dialect. Another resource is the
Tunisian social media corpus COMments (TD-COM), which
includes 900 Facebook comments that were translated
into MSA by a native speaker. Finally, they created a
TD-MSA bilingual dictionary by aligning the collected
parallel corpora. Sghaier et al. (2020) generated the
necessary resources from scratch, starting with two
monolingual morphological dictionaries for TD and MSA.
They then built a bilingual lexicon to map TD words to
their MSA equivalents.
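Deriving a bilingual dictionary by aligning parallel
corpora can be illustrated with a simple co-occurrence
heuristic. The sketch below scores each source-target
word pair with the Dice coefficient over sentence-level
co-occurrence and keeps the best-scoring target for
each source word. This is an assumed stand-in technique
for illustration, not the alignment procedure used in
the cited works; the function name, the threshold, and
the placeholder tokens are all hypothetical.

```python
from collections import Counter
from itertools import product


def extract_dictionary(pairs, min_score=0.5):
    """Build a source-to-target dictionary from sentence-aligned pairs.

    Each word pair is scored with the Dice coefficient:
        2 * cooccurrences / (source_count + target_count)
    where counts are numbers of sentence pairs containing the word.
    """
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        joint.update(product(src_words, tgt_words))

    best = {}  # source word -> (target word, score)
    for (s, t), c in joint.items():
        score = 2 * c / (src_count[s] + tgt_count[t])
        if score >= min_score and score > best.get(s, ("", 0.0))[1]:
            best[s] = (t, score)
    return {s: t for s, (t, _) in best.items()}


if __name__ == "__main__":
    # Placeholder tokens, not real dialect or MSA words.
    toy_pairs = [
        ("d_a d_b", "s_a s_b"),
        ("d_a d_c", "s_a s_c"),
        ("d_b d_c", "s_b s_c"),
    ]
    print(extract_dictionary(toy_pairs))
```

On real corpora, statistical word aligners are usually
preferred over this heuristic, which is sensitive to
rare words and small corpus sizes.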
Salloum et al. (2012) presented Elissa, a Rule-Based
Machine Translation (RBMT) system that translates a set
of Arabic dialects into MSA using AD-MSA dictionaries
such as the Tharwa dictionary and other dictionaries
they built. Diab et al. (2013) presented Tharwa, a
three-way, large-scale lexicon that covers Egyptian
Arabic, Modern Standard Arabic, and English. Tharwa is
the first three-way electronic resource for dialectal
Arabic that includes rich and deep linguistic
information for each entry. Egyptian Arabic is the
resource's pilot dialect, with plans to expand to other
Arabic dialects. Tharwa was compiled from a variety of
sources, both manually and automatically.
Tachicart et al. (2014) introduced a machine
translation system that combines rule-based and
statistical approaches, using tools designed for
Standard Arabic and adapting them to the Moroccan
dialect. To build their bilingual dictionary, they used
scripts from some television productions and some MSA
dictionaries. They then extended the bilingual
dictionary with additional online resources to maximize
coverage of the vocabulary of the Moroccan dialect.
Mubarak et al. (2018) presented a parallel corpus
called Dial2MSA, which contains dialectal Arabic tweets
in four main Arabic dialects (Egyptian, Maghrebi,
Levantine, and Gulf) and their corresponding MSA
translations. The tweets were collected from Twitter
and then filtered using a set of distinctive words for
each dialect. The CrowdFlower crowdsourcing platform
was then used to recruit native speakers of each
dialect to translate each tweet into MSA. The final
corpus contains 16,000 Egyptian-MSA pairs, 8,000
Maghrebi-MSA pairs, and 18,000 Gulf-MSA
and Levantine-MSA pairs. In 2022, Torjmen and Haddar
created a bilingual dictionary from various TD-MSA
corpora. The TD-MSA bilingual dictionary has 4,417
entries and generates approximately 174,000 forms
using derivational and inflectional grammars (Mubarak
et al., 2019).
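The filtering by distinctive dialect words mentioned
above for Dial2MSA can be sketched as a simple keyword
filter. This is only an assumed illustration of the
idea, not the procedure used by the cited authors; the
function name, the `min_hits` parameter, and the
placeholder tokens are hypothetical.

```python
def filter_by_distinctive_words(texts, distinctive_words, min_hits=1):
    """Keep texts containing at least `min_hits` distinctive tokens.

    `distinctive_words` is a set of tokens assumed to be characteristic
    of one dialect; matching is exact on whitespace-separated tokens.
    """
    kept = []
    for text in texts:
        hits = sum(1 for tok in text.split() if tok in distinctive_words)
        if hits >= min_hits:
            kept.append(text)
    return kept


if __name__ == "__main__":
    # Placeholder tokens, not real dialect words.
    markers = {"d_marker1", "d_marker2"}
    tweets = [
        "hello d_marker1 world",
        "plain standard text",
        "d_marker2 d_marker1 again",
    ]
    print(filter_by_distinctive_words(tweets, markers))
```

Raising `min_hits` trades recall for precision, since
short texts rarely contain several distinctive words.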