A LOGIC-BASED APPROACH TO SEMANTIC INFORMATION EXTRACTION

Massimo Ruffolo, Marco Manna

2006

Abstract

Recognizing and extracting meaningful information from unstructured documents, taking into account their semantics, is an important problem in the field of information and knowledge management. In this paper we describe a novel logic-based approach to semantic information extraction from both HTML pages and flat text documents, implemented in the HıLεX system. The approach is founded on a new two-dimensional representation of documents and heavily exploits DLP+, an extension of disjunctive logic programming for ontology representation and reasoning that has recently been implemented on top of the DLV system. Ontologies, representing the semantics of the information to be extracted, are encoded in DLP+, while the extraction patterns are expressed using regular expressions and an ad hoc two-dimensional grammar. Executing the DLP+ reasoning modules that encode the HıLεX grammar expressions yields the actual extraction of information from the input document. Unlike previous systems, which are merely syntactic, HıLεX combines semantic and syntactic knowledge to achieve powerful information extraction.


Paper Citation


in Harvard Style

Ruffolo M. and Manna M. (2006). A LOGIC-BASED APPROACH TO SEMANTIC INFORMATION EXTRACTION. In Proceedings of the Eighth International Conference on Enterprise Information Systems - Volume 2: ICEIS, ISBN 978-972-8865-42-9, pages 115-123. DOI: 10.5220/0002458601150123

in Bibtex Style

@conference{iceis06,
author={Massimo Ruffolo and Marco Manna},
title={A LOGIC-BASED APPROACH TO SEMANTIC INFORMATION EXTRACTION},
booktitle={Proceedings of the Eighth International Conference on Enterprise Information Systems - Volume 2: ICEIS},
year={2006},
pages={115-123},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002458601150123},
isbn={978-972-8865-42-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Eighth International Conference on Enterprise Information Systems - Volume 2: ICEIS
TI - A LOGIC-BASED APPROACH TO SEMANTIC INFORMATION EXTRACTION
SN - 978-972-8865-42-9
AU - Ruffolo M.
AU - Manna M.
PY - 2006
SP - 115
EP - 123
DO - 10.5220/0002458601150123