ONTOLOGY BASED EXTRACTION AND INTEGRATION OF INFORMATION FROM UNSTRUCTURED DOCUMENTS

Naychi Lai Lai Thein, Khin Haymar Saw Hla, Ni Lar Thein

2005

Abstract

The Semantic Web is an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation. One of the basic problems in the development of the Semantic Web is information integration. The web is composed of a variety of information sources, and integrating information from such sources requires their semantic integration and reconciliation. Moreover, web pages are formatted in HTML, which is only a human-readable format, so software agents cannot understand their meaning. In this paper, we present an approach that extracts information from unstructured documents (e.g. HTML) and converts it to a standard format (XML) by using a source ontology. We then translate the XML output to a local ontology. This paper also describes a key technology for mapping between ontologies, which computes similarity measures to express complex relationships among concepts. To address this problem, we apply a machine learning approach for semantic interoperability in the real commercial and governmental world.
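The extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `SOURCE_ONTOLOGY` dictionary, the `label: value` page layout, and the `record` element name are all assumptions made for illustration. The sketch collects the text of an HTML page, matches labels against the concepts of a toy source ontology, and emits the matched values as XML.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical source ontology: concept name -> labels that may denote it on a page.
SOURCE_ONTOLOGY = {
    "title": {"title", "name"},
    "price": {"price", "cost"},
}


class TextCollector(HTMLParser):
    """Collect the non-empty text chunks of an HTML page in document order."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def html_to_xml(html, ontology=SOURCE_ONTOLOGY):
    """Map 'label: value' text chunks onto ontology concepts and return XML."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()

    record = ET.Element("record")  # assumed root element name
    for chunk in collector.chunks:
        label, sep, value = chunk.partition(":")
        if not sep:
            continue  # chunk is not a label/value pair
        for concept, labels in ontology.items():
            if label.strip().lower() in labels:
                ET.SubElement(record, concept).text = value.strip()
    return ET.tostring(record, encoding="unicode")


page = "<html><body><p>Name: Widget</p><p>Cost: 12.50</p><p>Hello</p></body></html>"
xml_output = html_to_xml(page)
```

Here the labels "Name" and "Cost" are resolved to the concepts `title` and `price`, while text that matches no concept is dropped. A real system would need a far richer ontology and a more robust way of locating values than splitting on a colon.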

Paper Citation


in Harvard Style

Lai Lai Thein N., Haymar Saw Hla K. and Lar Thein N. (2005). ONTOLOGY BASED EXTRACTION AND INTEGRATION OF INFORMATION FROM UNSTRUCTURED DOCUMENTS. In Proceedings of the Seventh International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 972-8865-19-8, pages 457-460. DOI: 10.5220/0002555504570460

in Bibtex Style

@conference{iceis05,
author={Naychi Lai Lai Thein and Khin Haymar Saw Hla and Ni Lar Thein},
title={ONTOLOGY BASED EXTRACTION AND INTEGRATION OF INFORMATION FROM UNSTRUCTURED DOCUMENTS},
booktitle={Proceedings of the Seventh International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2005},
pages={457-460},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002555504570460},
isbn={972-8865-19-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Seventh International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - ONTOLOGY BASED EXTRACTION AND INTEGRATION OF INFORMATION FROM UNSTRUCTURED DOCUMENTS
SN - 972-8865-19-8
AU - Lai Lai Thein N.
AU - Haymar Saw Hla K.
AU - Lar Thein N.
PY - 2005
SP - 457
EP - 460
DO - 10.5220/0002555504570460