CBK-Modes: A Correlation-based Algorithm for Categorical Data Clustering

Joel Luis Carbonera, Mara Abel

2015

Abstract

Categorical datasets are often high-dimensional. To handle this high dimensionality during clustering, some works exploit the fact that clusters usually occur in a subspace of the attributes. Soft subspace clustering approaches assign a different weight to each attribute in each cluster, measuring that attribute's contribution to the formation of the cluster. In this paper, we adopt an approach that uses the correlation among categorical attributes to measure their relevance in clustering tasks. We use this approach to develop CBK-Modes (Correlation-based K-modes), a soft subspace clustering algorithm that extends the basic k-modes algorithm by using the correlation-based approach to measure the relevance of the attributes. We conducted experiments on five real-world datasets, comparing our algorithm with five state-of-the-art algorithms on three well-known evaluation metrics: accuracy, F-measure and adjusted Rand index. The results show that CBK-Modes outperforms the algorithms considered in the evaluation on these metrics.
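
To make the soft subspace idea concrete, here is a minimal Python sketch of one k-modes iteration in which every cluster has its own attribute weights. It is an illustration, not the authors' implementation. The abstract does not specify how the correlation-based weights are computed, so the sketch assumes the weights are simply given as input. The function names (`weighted_mismatch`, `assign`, `update_modes`) and the weighted simple-matching dissimilarity are assumptions made for this example.

```python
from collections import Counter


def weighted_mismatch(obj, mode, weights):
    """Sum the weights of the attributes on which obj and mode disagree.

    Assumed form: a weighted variant of the simple-matching dissimilarity
    used by basic k-modes.
    """
    return sum(w for x, m, w in zip(obj, mode, weights) if x != m)


def assign(data, modes, weights):
    """Assign each object to the cluster with the nearest mode.

    weights[k] holds the per-attribute weights of cluster k (assumed given;
    the correlation-based computation is not reproduced here).
    """
    return [
        min(range(len(modes)),
            key=lambda k: weighted_mismatch(obj, modes[k], weights[k]))
        for obj in data
    ]


def update_modes(data, labels, modes):
    """Recompute each mode as the most frequent value of each attribute."""
    new_modes = []
    for k, old_mode in enumerate(modes):
        members = [obj for obj, lab in zip(data, labels) if lab == k]
        if not members:
            # Keep the previous mode when a cluster receives no objects.
            new_modes.append(old_mode)
            continue
        new_modes.append(
            tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))
        )
    return new_modes


# Toy data: the third attribute is noise, so both clusters give it weight 0.
data = [("a", "x", "p"), ("a", "x", "q"), ("b", "y", "p"), ("b", "y", "q")]
modes = [("a", "x", "p"), ("b", "y", "q")]
weights = [(0.5, 0.5, 0.0), (0.5, 0.5, 0.0)]

labels = assign(data, modes, weights)
modes = update_modes(data, labels, modes)
```

In this toy run the zero weight on the third attribute means it has no influence on the assignment, which is the effect per-cluster attribute weighting is meant to achieve for irrelevant attributes.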

Paper Citation


in Harvard Style

Carbonera J. and Abel M. (2015). CBK-Modes: A Correlation-based Algorithm for Categorical Data Clustering. In Proceedings of the 17th International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-758-096-3, pages 603-608. DOI: 10.5220/0005367106030608

in Bibtex Style

@conference{iceis15,
author={Joel Luis Carbonera and Mara Abel},
title={CBK-Modes: A Correlation-based Algorithm for Categorical Data Clustering},
booktitle={Proceedings of the 17th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2015},
pages={603-608},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005367106030608},
isbn={978-989-758-096-3},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 17th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - CBK-Modes: A Correlation-based Algorithm for Categorical Data Clustering
SN - 978-989-758-096-3
AU - Carbonera J.
AU - Abel M.
PY - 2015
SP - 603
EP - 608
DO - 10.5220/0005367106030608