ENHANCING HIGH PRECISION BY COMBINING OKAPI BM25 WITH STRUCTURAL SIMILARITY IN AN INFORMATION RETRIEVAL SYSTEM

Yaël Champclaux, Taoufiq Dkaki, Josiane Mothe

2009

Abstract

In this paper, we present a new similarity measure for Information Retrieval (IR). The main objective of an IR system is to select, from a collection of documents, those that are relevant to a user's information need. Traditional approaches to document/query comparison rely on surface similarity, i.e. the comparison engine uses surface attributes (indexing terms). We propose a new method that combines surface and structural similarities with the aim of enhancing the precision of the top retrieved documents. In previous work, we showed that combining structural similarity with cosine improves on a bare cosine ranking. In this paper, we compare our method to the Okapi BM25 ranking on the Cranfield collection. We show that structural similarities improve both average precision and precision at the top 10 retrieved documents by about 50%. The experiments also address the influence of term weighting on system performance.

In this paper, we present a graph-based model that belongs to the vector space family. A vector space model represents each document as a vector in the term space, where each coordinate is a value representing the importance of an indexing term in a document or in a query. The vector space is defined by the set of terms that the system collects during the indexing phase. Many similarity measures, such as Cosine, Jaccard or Dice, are used to determine how well a document corresponds to a query. Such measures compute local similarities between a document and a query on the basis of the terms they have in common. Our goal is to exploit another type of similarity, called structural similarity, which identifies resemblances between elements on the basis of the relationships they have. The structural relationship that we use originates from the fact that documents contain words and that words are contained in documents.
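The surface measures named above can be illustrated with a minimal sketch, assuming documents and queries are stored as sparse dictionaries that map each indexing term to its weight. Jaccard and Dice are given in their weighted (vector) forms, which is one common generalization of the set-based definitions. The function names and toy vectors are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict

# Hypothetical sparse representation: indexing term -> weight.
Vector = Dict[str, float]


def dot(x: Vector, y: Vector) -> float:
    """Inner product; only terms shared by both vectors contribute."""
    return sum(w * y[t] for t, w in x.items() if t in y)


def norm_sq(x: Vector) -> float:
    """Squared Euclidean norm of a sparse vector."""
    return sum(w * w for w in x.values())


def cosine(x: Vector, y: Vector) -> float:
    denom = math.sqrt(norm_sq(x) * norm_sq(y))
    return dot(x, y) / denom if denom else 0.0


def jaccard(x: Vector, y: Vector) -> float:
    # Weighted (Tanimoto) form: x.y / (|x|^2 + |y|^2 - x.y)
    xy = dot(x, y)
    denom = norm_sq(x) + norm_sq(y) - xy
    return xy / denom if denom else 0.0


def dice(x: Vector, y: Vector) -> float:
    # Weighted form: 2 x.y / (|x|^2 + |y|^2)
    denom = norm_sq(x) + norm_sq(y)
    return 2 * dot(x, y) / denom if denom else 0.0


# Toy example: the document and the query share only the term "wing".
doc = {"wing": 2.0, "flow": 1.0}
query = {"wing": 1.0, "pressure": 1.0}
print(cosine(doc, query), jaccard(doc, query), dice(doc, query))
```

Here the three scores come out at about 0.632, 0.4 and 0.571. All of them depend only on the terms the document and the query have in common ("wing"), which is the local, surface behaviour that structural similarity is meant to complement.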
The idea is to compare documents through the similarities between the words they contain, while the similarities between words themselves depend on the similarities between the documents that contain them. In a previous paper, we showed that structural similarities alone were not sufficient to improve the performance of an IR system (IRS). In this paper, we present a different method that combines structural and surface similarities with the aim of enhancing high precision. Surface similarity is computed with the Okapi measure. The documents selected in this first stage are then stored in a graph and re-ranked using a SimRank-based score. We call this two-stage method OkaSim. We performed experiments with different term weightings on the Cranfield corpus and show that structural similarities can improve an Okapi ranking: they improve its average precision by more than 50% and its precision at the top 10 retrieved documents by about 50%. These tests and experiments also address the influence of term weighting on system performance.
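The first (surface) stage can be sketched with the textbook Okapi BM25 formula, where each query term t contributes IDF(t) · f(t,D)(k1 + 1) / (f(t,D) + k1(1 - b + b|D|/avgdl)). This is a minimal illustration, not the authors' configuration: their parameters are not given here, so k1 = 1.2 and b = 0.75 are assumed common defaults, and the IDF uses the non-negative log(1 + (N - n + 0.5)/(n + 0.5)) variant. The names `bm25_scores` and `top_k` and the toy documents are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def bm25_scores(query: List[str], docs: Dict[str, List[str]],
                k1: float = 1.2, b: float = 0.75) -> Dict[str, float]:
    """Score every tokenized document against the query with Okapi BM25."""
    n_docs = len(docs)
    avgdl = sum(len(toks) for toks in docs.values()) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter()
    for toks in docs.values():
        df.update(set(toks))
    scores = {}
    for doc_id, toks in docs.items():
        tf = Counter(toks)
        # Length normalization relative to the average document length.
        norm = k1 * (1 - b + b * len(toks) / avgdl)
        score = 0.0
        for term in set(query):  # each distinct query term counted once
            f = tf[term]
            if f == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * f * (k1 + 1) / (f + norm)
        scores[doc_id] = score
    return scores


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Highest-scoring document ids; ties are broken by id for determinism."""
    return sorted(scores, key=lambda d: (-scores[d], d))[:k]


docs = {
    "d1": "wing flow pressure wing".split(),
    "d2": "heat transfer flow".split(),
    "d3": "boundary layer heat".split(),
}
scores = bm25_scores("wing pressure".split(), docs)
print(top_k(scores, 2))
```

In a two-stage scheme such as OkaSim, only the documents kept by this stage would be passed on to the structural re-ranking; the cut-off used in the paper is not stated in this section.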
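The structural stage rests on the mutually recursive definition above: documents are similar if they contain similar words, and words are similar if they occur in similar documents. One minimal way to express that recursion is SimRank on the document-term bipartite graph (Jeh and Widom's bipartite formulation), sketched below. Treating the query as an extra document node and ranking candidates by their SimRank similarity to it is our assumption, as are the decay factor c = 0.8, the fixed number of iterations and all function names; the paper's exact SimRank-based score may differ.

```python
from itertools import product
from typing import Dict, List, Set, Tuple


def bipartite_simrank(doc_terms: Dict[str, Set[str]], c: float = 0.8,
                      iterations: int = 5):
    """SimRank on a document-term bipartite graph.

    Returns (doc_sim, term_sim), dictionaries keyed by node pairs.
    """
    docs = sorted(doc_terms)
    terms = sorted(set().union(*doc_terms.values()))
    term_docs = {t: {d for d in docs if t in doc_terms[d]} for t in terms}
    # Start from the identity: each node is fully similar only to itself.
    doc_sim = {(a, b): float(a == b) for a, b in product(docs, docs)}
    term_sim = {(a, b): float(a == b) for a, b in product(terms, terms)}
    for _ in range(iterations):
        new_doc = {}
        for a, b in product(docs, docs):
            if a == b:
                new_doc[a, b] = 1.0
            elif doc_terms[a] and doc_terms[b]:
                # Documents are similar if the words they contain are similar.
                total = sum(term_sim[i, j]
                            for i in doc_terms[a] for j in doc_terms[b])
                new_doc[a, b] = c * total / (len(doc_terms[a]) * len(doc_terms[b]))
            else:
                new_doc[a, b] = 0.0
        new_term = {}
        for a, b in product(terms, terms):
            if a == b:
                new_term[a, b] = 1.0
            else:
                # Words are similar if the documents containing them are similar.
                total = sum(doc_sim[i, j]
                            for i in term_docs[a] for j in term_docs[b])
                new_term[a, b] = c * total / (len(term_docs[a]) * len(term_docs[b]))
        # Both sides are updated from the previous iteration's values.
        doc_sim, term_sim = new_doc, new_term
    return doc_sim, term_sim


def rerank(query_terms: Set[str], candidates: Dict[str, Set[str]],
           c: float = 0.8, iterations: int = 5) -> List[Tuple[str, float]]:
    """Re-rank candidate documents by structural similarity to the query.

    Assumption: the query joins the graph as an extra document node.
    """
    graph = dict(candidates)
    graph["__query__"] = set(query_terms)  # hypothetical sentinel id
    doc_sim, _ = bipartite_simrank(graph, c, iterations)
    ranked = [(d, doc_sim["__query__", d]) for d in candidates]
    return sorted(ranked, key=lambda p: (-p[1], p[0]))


candidates = {
    "d1": {"wing", "pressure", "flow"},
    "d2": {"flow", "heat"},
    "d3": {"heat", "transfer"},
}
ranking = rerank({"wing", "flow"}, candidates)
print(ranking)
```

With these toy inputs, d1 (two terms shared with the query) ranks first and d2 second, while d3 receives a small positive score even though it shares no term with the query: it is reached structurally through "heat", which co-occurs with "flow" in d2. The cost grows quadratically with the number of graph nodes, which is one reason to build the graph only over the documents selected by the first stage.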


Paper Citation


in Harvard Style

Champclaux Y., Dkaki T. and Mothe J. (2009). ENHANCING HIGH PRECISION BY COMBINING OKAPI BM25 WITH STRUCTURAL SIMILARITY IN AN INFORMATION RETRIEVAL SYSTEM. In Proceedings of the 11th International Conference on Enterprise Information Systems - Volume 3: ICEIS, ISBN 978-989-8111-86-9, pages 279-285. DOI: 10.5220/0002017202790285

in Bibtex Style

@conference{iceis09,
author={Yaël Champclaux and Taoufiq Dkaki and Josiane Mothe},
title={ENHANCING HIGH PRECISION BY COMBINING OKAPI BM25 WITH STRUCTURAL SIMILARITY IN AN INFORMATION RETRIEVAL SYSTEM},
booktitle={Proceedings of the 11th International Conference on Enterprise Information Systems - Volume 3: ICEIS},
year={2009},
pages={279-285},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002017202790285},
isbn={978-989-8111-86-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 11th International Conference on Enterprise Information Systems - Volume 3: ICEIS
TI - ENHANCING HIGH PRECISION BY COMBINING OKAPI BM25 WITH STRUCTURAL SIMILARITY IN AN INFORMATION RETRIEVAL SYSTEM
SN - 978-989-8111-86-9
AU - Champclaux Y.
AU - Dkaki T.
AU - Mothe J.
PY - 2009
SP - 279
EP - 285
DO - 10.5220/0002017202790285