using different types of style markers for classical
Arabic. Our aim was to compare the effectiveness of
using style markers that do not rely primarily on the
lexical or structural dimensions of language. We
used three types of style markers based mostly on
the syntactical information contained in the structure
of the text: part-of-speech-based features, function
word features and character-based features. To
evaluate the effectiveness of these markers, we
conducted an experiment on a diachronic classical
Arabic corpus comprising more than 700 books. Our
results show that these markers can be very effective
stylistic features, achieving high authorship
attribution performance.
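As a rough illustration of two of the marker types summarised above, the following minimal sketch computes function word frequencies and character n-gram frequencies for a text. It is not the feature extraction pipeline used in our experiment: the function names, the whitespace tokenisation, the choice of n = 3 and the five-item function word list are all assumptions made for this example. Part-of-speech features are omitted because they require a tagger.

```python
from collections import Counter

# Hypothetical, very short function word list, for illustration only.
# A real study would rely on a curated list for classical Arabic.
FUNCTION_WORDS = ["من", "في", "على", "إلى", "عن"]


def function_word_features(text, function_words=FUNCTION_WORDS):
    """Return the relative frequency of each function word.

    Tokens are obtained by plain whitespace splitting (an assumption);
    the result is one value per entry of `function_words`, in order.
    """
    tokens = text.split()
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in function_words]


def char_ngram_features(text, n=3):
    """Return a dict mapping each character n-gram to its relative frequency."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(grams)
    total = len(grams) or 1  # avoid division by zero on short input
    return {gram: c / total for gram, c in counts.items()}
```

Vectors of this kind, computed per text, could then be passed to any standard classifier. Which classifier and which feature weighting to use is left open here.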
NLPinAI 2022 - Special Session on Natural Language Processing in Artificial Intelligence