fgssjoin: A GPU-based Algorithm for Set Similarity Joins

Rafael D. Quirino, Sidney R. Junior, Leonardo A. Ribeiro, Wellington S. Martins



Set similarity join is a core operation for text data integration, cleaning, and mining. Most state-of-the-art solutions rely on inherently sequential, CPU-based algorithms. In this paper, we propose a parallel algorithm for the set similarity join problem that harnesses the power of GPU systems through filtering techniques and divide-and-conquer strategies and scales well with data size. Experiments show substantial speedups over the fastest algorithms in the literature.
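
For readers unfamiliar with the problem, the following minimal sketch shows what a set similarity join computes and how a simple filter can discard candidate pairs before they are verified. It is not the fgssjoin algorithm: it is a sequential, CPU-only illustration that assumes Jaccard similarity and a basic length filter, neither of which is specified in the abstract. The names `jaccard`, `naive_similarity_join`, and `records` are hypothetical.

```python
from itertools import combinations


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity |a & b| / |a | b|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def naive_similarity_join(sets, threshold):
    """Return index pairs (i, j), i < j, with Jaccard similarity >= threshold.

    Illustrative assumption: Jaccard similarity with a length filter.
    This is a quadratic, sequential baseline, not the paper's GPU method.
    """
    result = []
    for i, j in combinations(range(len(sets)), 2):
        a, b = sets[i], sets[j]
        small, large = sorted((len(a), len(b)))
        # Length filter: Jaccard can never exceed small / large, so pairs
        # whose sizes differ too much are skipped without verification.
        if large and small < threshold * large:
            continue
        if jaccard(a, b) >= threshold:
            result.append((i, j))
    return result


# Hypothetical token sets standing in for tokenized text records.
records = [
    frozenset("gpu set similarity join".split()),
    frozenset("set similarity join gpu algorithm".split()),
    frozenset("text data cleaning".split()),
]
pairs = naive_similarity_join(records, 0.75)
```

With these sample records, only the first two sets share enough tokens to pass a 0.75 threshold; the pair formed by the second and third records is rejected by the length filter alone, which is the kind of pruning that filtering techniques aim for at scale.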


Paper Citation

in Harvard Style

Quirino R., Junior S., Ribeiro L. and Martins W. (2017). fgssjoin: A GPU-based Algorithm for Set Similarity Joins. In Proceedings of the 19th International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-758-247-9, pages 152-161. DOI: 10.5220/0006339001520161

in Bibtex Style

author={Rafael D. Quirino and Sidney R. Junior and Leonardo A. Ribeiro and Wellington S. Martins},
title={fgssjoin: A GPU-based Algorithm for Set Similarity Joins},
booktitle={Proceedings of the 19th International Conference on Enterprise Information Systems - Volume 1: ICEIS},

in EndNote Style

JO - Proceedings of the 19th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - fgssjoin: A GPU-based Algorithm for Set Similarity Joins
SN - 978-989-758-247-9
AU - Quirino R.
AU - Junior S.
AU - Ribeiro L.
AU - Martins W.
PY - 2017
SP - 152
EP - 161
DO - 10.5220/0006339001520161