fgssjoin: A GPU-based Algorithm for Set Similarity Joins
Rafael D. Quirino, Sidney R. Junior, Leonardo A. Ribeiro, Wellington S. Martins
2017
Abstract
Set similarity join is a core operation for text data integration, cleaning, and mining. Most state-of-the-art solutions rely on inherently sequential, CPU-based algorithms. In this paper we propose a parallel algorithm for the set similarity join problem that harnesses the power of GPU systems through filtering techniques and divide-and-conquer strategies and scales well with data size. Experiments show substantial speedups over the fastest algorithms in the literature.
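To make the problem statement concrete, the following is a minimal CPU-side sketch of a set similarity self-join. It is not the fgssjoin algorithm and does not run on a GPU. It assumes Jaccard similarity as the measure and applies one well-known filter (the length filter) to skip pairs that cannot reach the threshold. The function names, the input layout, and the threshold value are illustrative assumptions, not taken from the paper.

```python
def jaccard(r, s):
    """Jaccard similarity |r & s| / |r | s| of two sets."""
    if not r and not s:
        return 1.0
    return len(r & s) / len(r | s)


def similarity_self_join(records, threshold):
    """Return sorted (i, j, sim) for all pairs i < j with Jaccard >= threshold.

    Hypothetical helper for illustration only: a sequential nested-loop join
    with a length filter, not the parallel algorithm proposed in the paper.
    """
    sets = [set(rec) for rec in records]
    # Visit sets in increasing size so the length filter can end the inner loop.
    order = sorted(range(len(sets)), key=lambda i: len(sets[i]))
    results = []
    for a in range(len(order)):
        i = order[a]
        for b in range(a + 1, len(order)):
            j = order[b]
            small, large = len(sets[i]), len(sets[j])
            # Length filter: Jaccard can never exceed small / large,
            # and every later set is at least as large as this one.
            if large > 0 and small / large < threshold:
                break
            sim = jaccard(sets[i], sets[j])
            if sim >= threshold:
                results.append((min(i, j), max(i, j), sim))
    return sorted(results)


if __name__ == "__main__":
    docs = [["a", "b", "c"], ["a", "b", "c", "d"], ["x", "y"], ["a", "b"]]
    for i, j, sim in similarity_self_join(docs, 0.6):
        print(i, j, round(sim, 3))
```

Sorting by set size is a design choice that lets the length filter terminate the inner loop early instead of testing every pair; the candidate pairs that survive are still verified with an exact similarity computation.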
Paper Citation
in Harvard Style
Quirino R., Junior S., Ribeiro L. and Martins W. (2017). fgssjoin: A GPU-based Algorithm for Set Similarity Joins. In Proceedings of the 19th International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-758-247-9, pages 152-161. DOI: 10.5220/0006339001520161
in Bibtex Style
@conference{iceis17,
author={Rafael D. Quirino and Sidney R. Junior and Leonardo A. Ribeiro and Wellington S. Martins},
title={fgssjoin: A GPU-based Algorithm for Set Similarity Joins},
booktitle={Proceedings of the 19th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2017},
pages={152-161},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006339001520161},
isbn={978-989-758-247-9},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 19th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - fgssjoin: A GPU-based Algorithm for Set Similarity Joins
SN - 978-989-758-247-9
AU - Quirino R.
AU - Junior S.
AU - Ribeiro L.
AU - Martins W.
PY - 2017
SP - 152
EP - 161
DO - 10.5220/0006339001520161