Furthermore, some of the boilerplate code required for the training loop could be removed by adopting a training framework such as PyTorch Lightning, a lightweight PyTorch wrapper that reduces the engineering effort required to train models. In particular, it takes over the code needed to train on multiple GPUs, on different hardware, and with different floating-point precision.
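The division of labour behind such a framework can be illustrated with a small sketch. It is written in plain Python and is not PyTorch Lightning's API: the names LinearModule, ToyTrainer and training_step are hypothetical and chosen only for illustration. The model class holds only the model-specific computation, while a reusable trainer owns the epoch and batch loop. In PyTorch Lightning these roles are played by the LightningModule and Trainer classes.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

Batch = List[Tuple[float, float]]  # (input, target) pairs


class LinearModule:
    """User code (hypothetical interface): only the model-specific parts."""

    def __init__(self) -> None:
        self.weight = 0.0

    def training_step(self, batch: Batch) -> Tuple[float, float]:
        # Mean squared error of y = weight * x and its gradient w.r.t. weight.
        n = len(batch)
        loss = sum((self.weight * x - y) ** 2 for x, y in batch) / n
        grad = sum(2 * (self.weight * x - y) * x for x, y in batch) / n
        return loss, grad


@dataclass
class ToyTrainer:
    """Framework code (toy stand-in): owns the loop so user code need not."""

    max_epochs: int = 50
    learning_rate: float = 0.1

    def fit(self, module: LinearModule, batches: Iterable[Batch]) -> List[float]:
        batches = list(batches)
        epoch_losses = []
        for _ in range(self.max_epochs):
            total = 0.0
            for batch in batches:
                loss, grad = module.training_step(batch)
                # Plain gradient-descent update, applied by the trainer.
                module.weight -= self.learning_rate * grad
                total += loss
            epoch_losses.append(total / len(batches))
        return epoch_losses


# Toy data generated from y = 2 * x, split into two batches.
data = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (0.5, 1.0)]]
model = LinearModule()
losses = ToyTrainer(max_epochs=50, learning_rate=0.1).fit(model, data)
```

Because the loop lives in one place, concerns such as device placement or numeric precision could be changed in the trainer without touching the model code, which is the kind of saving the paragraph above refers to.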
ACKNOWLEDGEMENTS
This project was funded by the Language Technology Programme for Icelandic 2019–2023. The programme, which is managed and coordinated by Almannarómur⁴, is funded by the Icelandic Ministry of Education, Science and Culture.
We would like to thank Jón Friðrik Daðason
at Reykjavik University for supplying us with the
ELECTRA models used in this research.
⁴ https://almannaromur.is/