Contact Deduplication in Mobile Devices using Textual Similarity and Machine Learning

Eduardo N. Borges, Rafael F. Pinheiro, Graçaliz P. Dimuro



This paper presents a method that identifies duplicate contacts, i.e., records representing the same person or organization, automatically collected from multiple data sources. Contacts are compared using similarity functions, whose scores are combined by a classification model based on decision trees, avoiding the need for an expert to manually configure similarity thresholds. The experiments show that the proposed method correctly identified up to 92% of duplicate contacts.
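
The idea of combining similarity scores with a learned tree instead of hand-set thresholds can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the contact fields (`name`, `phone`), the two similarity functions (a `difflib` character ratio and a token Jaccard overlap), the toy labeled pairs, and the use of a single-split decision stump in place of a full decision tree.

```python
from difflib import SequenceMatcher


def char_similarity(a, b):
    """Character-level similarity in [0, 1], case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_jaccard(a, b):
    """Jaccard overlap of whitespace-separated tokens, in [0, 1]."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def features(c1, c2):
    """Similarity scores for a pair of contacts (hypothetical fields)."""
    return [
        char_similarity(c1["name"], c2["name"]),
        token_jaccard(c1["name"], c2["name"]),
        1.0 if c1["phone"] == c2["phone"] else 0.0,
    ]


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def majority(labels):
    """Most common 0/1 label (ties go to 1)."""
    return int(2 * sum(labels) >= len(labels))


def fit_stump(X, y):
    """Pick the (feature, threshold) split with the lowest weighted Gini."""
    best = None
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        # Candidate thresholds are midpoints between consecutive distinct scores.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    if best is None:
        raise ValueError("no feature varies across the training pairs")
    return best[1:]


def predict(stump, x):
    """Return 1 (duplicate) or 0 (distinct) for a score vector x."""
    f, t, left_label, right_label = stump
    return left_label if x[f] <= t else right_label


# Invented toy data: label 1 means the two records describe the same contact.
pairs = [
    ({"name": "Ana Souza", "phone": "5551000"},
     {"name": "Ana M. Souza", "phone": "5551000"}),
    ({"name": "Joao Pereira", "phone": "5552000"},
     {"name": "J. Pereira", "phone": "5559999"}),
    ({"name": "Ana Souza", "phone": "5551000"},
     {"name": "Carlos Lima", "phone": "5554000"}),
    ({"name": "Maria Silva", "phone": "5555000"},
     {"name": "Mario Silveira", "phone": "5556000"}),
]
labels = [1, 1, 0, 0]

X = [features(a, b) for a, b in pairs]
stump = fit_stump(X, labels)
predictions = [predict(stump, x) for x in X]
print("learned (feature, threshold, left, right):", stump)
print("predictions on the training pairs:", predictions)
```

The threshold here is chosen from the labeled pairs rather than set by hand, which is the point the abstract makes. A realistic setup would presumably use more similarity functions, more contact fields, a full decision tree learner, and a held-out test set.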


Paper Citation

in Harvard Style

Borges E., Pinheiro R. and Dimuro G. (2017). Contact Deduplication in Mobile Devices using Textual Similarity and Machine Learning. In Proceedings of the 19th International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-758-247-9, pages 64-72. DOI: 10.5220/0006275100640072

in Bibtex Style

author={Eduardo N. Borges and Rafael F. Pinheiro and Graçaliz P. Dimuro},
title={Contact Deduplication in Mobile Devices using Textual Similarity and Machine Learning},
booktitle={Proceedings of the 19th International Conference on Enterprise Information Systems - Volume 1: ICEIS},

in EndNote Style

JO - Proceedings of the 19th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - Contact Deduplication in Mobile Devices using Textual Similarity and Machine Learning
SN - 978-989-758-247-9
AU - Borges E.
AU - Pinheiro R.
AU - Dimuro G.
PY - 2017
SP - 64
EP - 72
DO - 10.5220/0006275100640072