tified three areas for further enhancement of ESTEST:
developing its schema matching component by using
IE on the textual schema metadata; exploring identi-
fier disambiguation techniques to assist with the co-
referencing task in IE; and improving the support
for enhancing the global schema with new structure
found in the text.
To investigate how generally applicable our ap-
proach is, we will then evaluate the enhanced ES-
TEST system in the crime investigation and seman-
tic web application domains. In this evaluation, we
will consider the results obtained by ESTEST users
who are experts in the given application domain.
These results will be compared both with any existing
approaches (such as manual inspection or keyword
search) employed by these expert users and with results
obtained by them using a stand-alone IE system.
Finally, in order to support the end user, we will
analyse the requirements of a workbench for access-
ing the ESTEST components and functionality via a
graphical user interface.