After that:
$$d(\mathrm{docF}(l_1)_m, \mathrm{docF}(l_2)_n) \le d(\mathrm{docF}(l_3)_m, \mathrm{docF}(l_4)_n), \quad m, n = 1, 2, 3, \quad m \ne n, \quad l_1, l_2, l_3, l_4 = e, g$$
5 CONCLUSIONS AND FUTURE WORK
FIN techniques seem promising for IR and related
applications; the prospect of CLIR without
dictionaries is particularly intriguing. Nevertheless,
several issues still need careful consideration.
These techniques, for both monolingual and
cross-language IR, work well with long documents and
queries, which can give “good” FINs. Unfortunately,
this is not the case with the queries submitted to
Internet search engines; these queries very often have
just a couple of terms (Kobayashi and Takeda, 2000).
The FIN techniques may be more successful in document
classification problems, where documents with many
terms (and hence “better” FINs) must be handled.
The quality of a FIN does not depend only on the
number of terms; it must be considered together with
the mass function used in the distance calculation. In
the previous paragraph, a “soft” reduction of the
contribution of terms with ctf = 1 to the distance was
attempted through the mass function. A better idea
would probably be to ignore these terms completely
in the FIN computation. In FIN construction, the term
document frequency (df) must be taken into account
as well.
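The idea of dropping low-frequency terms before FIN construction can be sketched as follows. This is a minimal illustration, not the authors' implementation: it reads ctf as collection term frequency, and the function names, the `min_ctf` and `min_df` thresholds and the toy documents are all hypothetical.

```python
from collections import Counter


def collection_term_frequency(docs):
    """Count how often each term occurs across the whole collection (ctf)."""
    ctf = Counter()
    for terms in docs:
        ctf.update(terms)
    return ctf


def document_frequency(docs):
    """Count in how many documents each term occurs (df)."""
    df = Counter()
    for terms in docs:
        df.update(set(terms))
    return df


def filter_terms(docs, min_ctf=2, min_df=1):
    """Drop terms below the ctf/df thresholds before any FIN is computed.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    ctf = collection_term_frequency(docs)
    df = document_frequency(docs)
    return [
        [t for t in terms if ctf[t] >= min_ctf and df[t] >= min_df]
        for terms in docs
    ]


# Toy collection: "lattice" occurs once in the collection (ctf = 1).
docs = [
    ["fuzzy", "interval", "number", "retrieval"],
    ["fuzzy", "retrieval", "lattice"],
    ["interval", "number", "fuzzy"],
]
filtered = filter_terms(docs)
```

With the default thresholds only the term with ctf = 1 is removed; raising `min_df` would additionally discard terms that appear in too few documents.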
The bell-shaped mass function seems to be a
reasonable choice, but other forms should be considered
in conjunction with FIN computation.
Last but not least is the renormalization scheme: in
any multilingual collection the numbers of terms in the
different languages are random, so a solid and
flexible re-balancing scheme is needed, and it cannot be
designed independently of the FIN construction method
and the distance calculation (mass function).
The optimal determination of the parameters A, α,
β and k_r is part of the system training process using
parallel corpora.
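One simple way to picture such a training step is an exhaustive grid search over candidate parameter values. The sketch below is only an assumption about how tuning could be organised; the paper does not specify the search procedure. `toy_score` is a placeholder objective, not a retrieval measure: in practice it would be replaced by a function that runs retrieval on the parallel corpora and returns an effectiveness score. The candidate values are likewise invented for illustration.

```python
from itertools import product


def grid_search(score, grid):
    """Return the best-scoring parameter setting and its score.

    score: callable accepting keyword arguments named as the keys of `grid`.
    grid:  dict mapping a parameter name to a list of candidate values.
    """
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        s = score(**params)
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score


def toy_score(A, alpha, beta, k_r):
    """Placeholder objective (hypothetical), peaked at an arbitrary point."""
    return -((A - 1.0) ** 2 + (alpha - 0.5) ** 2
             + (beta - 2.0) ** 2 + (k_r - 1.5) ** 2)


# Candidate values are illustrative assumptions only.
grid = {
    "A": [0.5, 1.0, 2.0],
    "alpha": [0.1, 0.5, 0.9],
    "beta": [1.0, 2.0, 3.0],
    "k_r": [1.0, 1.5, 2.0],
}
best_params, best_score = grid_search(toy_score, grid)
```

The cost grows with the product of the grid sizes, so a coarse grid followed by local refinement may be preferable when each evaluation requires a full retrieval run.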
At present, experiments are being conducted along
these lines, mainly with two of the standard
monolingual collections, namely CACM and WSJ.
These collections are of interest because they have
relatively long queries.
In addition, a trilingual (Greek / English / French)
test collection is being built for CLIR
experimentation. The parallel corpora are created by
manual translation (with some machine assistance).
Experiments are conducted and performance records
are kept at various stages of the parallel corpora
creation.
ACKNOWLEDGMENTS
This work was co-funded 75% by the E.U. and
25% by the Greek Government under the
framework of the Education and Initial Vocational
Training Program – Archimedes.
REFERENCES
Ballesteros, L. and W. B. Croft, “Phrasal Translation and
Query Expansion Techniques for Cross-Language
Information Retrieval” in the Proceedings of the 20th
International ACM SIGIR Conference on Research
and Development in Information Retrieval (SIGIR-97),
pp. 84-91, 1997.
Ballesteros, L. and W. B. Croft, “Resolving Ambiguity for
Cross-language Retrieval” in the Proceedings of the
21st International ACM SIGIR Conference on
Research and Development in Information Retrieval
(SIGIR-98), pp. 64-71, 1998.
Berry, M. and P. Young, “Using latent semantic indexing
for multi-language information retrieval” Computers
and the Humanities, vol. 29, no 6, pp. 413-429, 1995.
Davis, M., “New experiments in cross-language text
retrieval at NMSU’s Computing Research Lab” in D.
K. Harman, ed., The Fifth Text Retrieval Conference
(TREC-5), NIST, 1996.
Dumais, S. T., T.K. Landauer, M.L. Littman, “Automatic
cross-linguistic information retrieval using latent
semantic indexing” in G. Grefenstette, ed., Working
Notes of the Workshop on Cross-Linguistic
Information Retrieval. ACM SIGIR.
Kaburlasos, V.G., “Fuzzy Interval Numbers (FINs):
Lattice Theoretic Tools for Improving Prediction of
Sugar Production from Populations of Measurements,”
IEEE Trans. on Systems, Man and Cybernetics – Part
B, vol. 34, no 2, pp. 1017-1030, 2004.
Kobayashi, M. and K. Takeda, “Information Retrieval on
the Web,” ACM Computing Surveys, vol. 32, no 2, pp.
144-173, 2000.
Kraft, D.H. and D.A. Buell, “Fuzzy Set and Generalized
Boolean Retrieval Systems” in Readings in Fuzzy Sets
for Intelligent Systems, D. Dubois, H. Prade, R.R.
Yager (eds), 1993.
Oard, D.W., “Alternative Approaches for Cross-Language
Text Retrieval” in Cross-Language Text and Speech
Retrieval, AAAI Technical Report SS-97-05.
Available at
http://www.clis.umd.edu/dlrg/filter/sss/papers/