On Learning-free Detection and Representation of Textline Texture in Digitized Documents

Dominik Hauser, Christoffer Kassens, H. Siegfried Stiehl

2022

Abstract

Textline detection and extraction is an integral part of any document analysis and recognition (DAR) system. It bridges the signal2symbol gap by relating a raw digital document of whatever sort to computational analysis, up to the understanding of its semantic content. Key is the computational recovery of a rich representation of the salient visual structure, which we conceive of as texture composed of periodic and differently scaled textlines in blocks with varying local spatial frequency and orientation. Our novel learning-free approach capitalizes on i) a texture model based upon linear system theory and ii) the complex Gabor transform, utilizing both real even and imaginary odd kernels for the purpose of imposing a quadrilinear representation of textline characteristics as in typography. The resulting representation of textlines, be they linear, curvilinear or even circular, then serves as input to subsequent computational processes. Via an experimental methodology allowing for controlled experiments with a broad range of digital data of increasing complexity (e.g. from synthetic 1D data to historical newspapers up to medieval manuscripts), we demonstrate the validity of our approach, discuss success and failure, and propose ensuing research.
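To make the role of the real even and imaginary odd kernels concrete, the following is a minimal 1D sketch and not the authors' implementation. It assumes a synthetic intensity profile with one textline every 20 pixels, and the names `gabor_kernel_1d`, `gabor_response`, `period` and `candidates` are hypothetical. It builds a complex Gabor kernel (Gaussian envelope times complex exponential) and picks, from a few candidate spatial frequencies, the one with the strongest mean response magnitude.

```python
import numpy as np

def gabor_kernel_1d(frequency, sigma):
    """Complex 1D Gabor kernel: Gaussian envelope times a complex exponential.

    The real part (cosine) is even and the imaginary part (sine) is odd.
    """
    half = int(np.ceil(3 * sigma))            # truncate the envelope at 3 sigma
    x = np.arange(-half, half + 1)
    envelope = np.exp(-x**2 / (2 * sigma**2))
    carrier = np.exp(2j * np.pi * frequency * x)
    return envelope * carrier / envelope.sum()

def gabor_response(signal, frequency, sigma):
    """Convolve the mean-free signal with the complex Gabor kernel."""
    kernel = gabor_kernel_1d(frequency, sigma)
    return np.convolve(signal - signal.mean(), kernel, mode="same")

# Assumed synthetic 1D profile: periodic "textlines" every 20 pixels.
period = 20
y = np.arange(400)
profile = 0.5 * (1 + np.cos(2 * np.pi * y / period))

# Hypothetical candidate spatial frequencies in cycles per pixel.
candidates = [1 / 10, 1 / 20, 1 / 40]
energies = [np.abs(gabor_response(profile, f, sigma=period)).mean()
            for f in candidates]
best_frequency = candidates[int(np.argmax(energies))]
print(best_frequency)
```

In this sketch the magnitude of the complex response indicates how strongly a given line spacing is present, while its phase (`np.angle` of the response) varies periodically along the profile and could be used to locate positions within each line. How the paper derives its quadrilinear representation from these quantities is not specified in the abstract.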


Paper Citation


in Harvard Style

Hauser D., Kassens C. and Stiehl H. (2022). On Learning-free Detection and Representation of Textline Texture in Digitized Documents. In Proceedings of the 11th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-549-4, pages 203-212. DOI: 10.5220/0010801300003122


in Bibtex Style

@conference{icpram22,
author={Dominik Hauser and Christoffer Kassens and H. Siegfried Stiehl},
title={On Learning-free Detection and Representation of Textline Texture in Digitized Documents},
booktitle={Proceedings of the 11th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2022},
pages={203--212},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010801300003122},
isbn={978-989-758-549-4},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 11th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - On Learning-free Detection and Representation of Textline Texture in Digitized Documents
SN - 978-989-758-549-4
AU - Hauser D.
AU - Kassens C.
AU - Stiehl H.
PY - 2022
SP - 203
EP - 212
DO - 10.5220/0010801300003122