A Well-founded Ontology to Support the Preparation of Training and Test Datasets

Lucimar de A. Lial Moura, Marcus Albert A. da Silva, Kelli de Faria Cordeiro, Maria Cláudia Cavalcanti



In the knowledge discovery process, a set of activities guides the data preprocessing phase; one of them is the transformation of raw data into training and test data. This complex, multidisciplinary phase involves concepts and knowledge that the literature and specialized tools structure in distinct and particular ways, which demands data scientists with suitable expertise. In this work, we present PPO-O, a reference ontology of data preprocessing operators, which identifies and represents the semantics of the concepts related to the data preprocessing phase. Moreover, the ontology highlights the data preprocessing operators used to prepare training and test datasets. Based on PPO-O, the Assistant-PP tool was developed; it captures retrospective data provenance during the execution of data preprocessing operators, facilitating the reproducibility and explainability of the datasets created. This approach might be helpful to users who are not experts in data preprocessing.
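To make the idea of retrospective provenance more concrete, the following is a minimal sketch of how executions of preprocessing operators could be logged while a training and test dataset is prepared. It is an illustration only: it does not use the PPO-O vocabulary and is not the Assistant-PP implementation. The names `PROVENANCE_LOG`, `fingerprint`, `traced`, `drop_missing`, and `split_train_test`, as well as the record fields, are hypothetical and chosen for this example.

```python
import hashlib
import json
import random
from datetime import datetime, timezone

# Hypothetical in-memory provenance store (an assumption of this sketch,
# not the storage format used by Assistant-PP).
PROVENANCE_LOG = []


def fingerprint(rows):
    """Return a short, stable hash of a dataset given as a list of dict rows."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def traced(operator):
    """Wrap a preprocessing operator so that every execution is recorded."""
    def wrapper(rows, **params):
        result = operator(rows, **params)
        # An operator may return one dataset or a tuple of datasets.
        outputs = result if isinstance(result, tuple) else (result,)
        PROVENANCE_LOG.append({
            "operator": operator.__name__,
            "parameters": params,  # only explicitly passed keyword arguments
            "input": fingerprint(rows),
            "outputs": [fingerprint(out) for out in outputs],
            "executed_at": datetime.now(timezone.utc).isoformat(),
        })
        return result
    return wrapper


@traced
def drop_missing(rows):
    """Cleaning operator: discard rows that contain a None value."""
    return [row for row in rows if None not in row.values()]


@traced
def split_train_test(rows, test_ratio=0.25, seed=0):
    """Splitting operator: shuffle with a fixed seed, then cut off the test part."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


# Usage: eight complete rows plus one row with a missing value.
raw = [{"x": i, "y": i % 2} for i in range(8)] + [{"x": None, "y": 1}]
clean = drop_missing(raw)
train, test = split_train_test(clean, test_ratio=0.25, seed=42)

for entry in PROVENANCE_LOG:
    print(entry["operator"], entry["parameters"], entry["input"], entry["outputs"])
```

Because each record stores the fingerprint of its input and outputs, the output of one operator can be matched to the input of the next, which is what allows the chain of steps behind a given training or test dataset to be traced afterwards. Recording the seed and ratio as parameters is what makes the split repeatable.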


Paper Citation

in Harvard Style

Moura L., da Silva M., Cordeiro K. and Cavalcanti M. (2021). A Well-founded Ontology to Support the Preparation of Training and Test Datasets. In Proceedings of the 23rd International Conference on Enterprise Information Systems - Volume 2: ICEIS, ISBN 978-989-758-509-8, pages 99-110. DOI: 10.5220/0010460000990110

in Bibtex Style

author={Lucimar Moura and Marcus da Silva and Kelli Cordeiro and Maria Cavalcanti},
title={A Well-founded Ontology to Support the Preparation of Training and Test Datasets},
booktitle={Proceedings of the 23rd International Conference on Enterprise Information Systems - Volume 2: ICEIS},

in EndNote Style


JO - Proceedings of the 23rd International Conference on Enterprise Information Systems - Volume 2: ICEIS
TI - A Well-founded Ontology to Support the Preparation of Training and Test Datasets
SN - 978-989-758-509-8
AU - Moura L.
AU - da Silva M.
AU - Cordeiro K.
AU - Cavalcanti M.
PY - 2021
SP - 99
EP - 110
DO - 10.5220/0010460000990110