All data is openly provided by the International Conference
on Document Analysis and Recognition (ICDAR) 2017
(Chiron et al., 2017) and 2019 (Rigaud et al., 2019)
competitions on Post-OCR text correction.
Our code base is publicly available and described
at https://doi.org/10.5281/zenodo.5799211 (Todorov
and Colavizza, 2021).
An Assessment of the Impact of OCR Noise on Language Models