A k-Means Algorithm for Clustering with Soft Must-link and Cannot-link Constraints

Philipp Baumann, Dorit Hochbaum

2022

Abstract

The k-means algorithm is one of the most widely-used algorithms in clustering. It is known to be effective when the clusters are homogeneous and well separated in the feature space. When this is not the case, incorporating pairwise must-link and cannot-link constraints can improve the quality of the resulting clusters. Various extensions of the k-means algorithm have been proposed that incorporate the must-link and cannot-link constraints using heuristics. We introduce a different approach that uses a new mixed-integer programming formulation. In our approach, the pairwise constraints are incorporated as soft-constraints that can be violated subject to a penalty. In a computational study based on 25 data sets, we compare the proposed algorithm to a state-of-the-art algorithm that was previously shown to dominate the other algorithms in this area. The results demonstrate that the proposed algorithm provides better clusterings and requires considerably less running time than the state-of-the-art algorithm. Moreover, we found that the ability to vary the penalty is beneficial in situations where the pairwise constraints are noisy due to corrupt ground truth.

Download


Paper Citation


in Harvard Style

Baumann P. and Hochbaum D. (2022). A k-Means Algorithm for Clustering with Soft Must-link and Cannot-link Constraints. In Proceedings of the 11th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-549-4, pages 195-202. DOI: 10.5220/0010800000003122


in Bibtex Style

@conference{icpram22,
author={Philipp Baumann and Dorit Hochbaum},
title={A k-Means Algorithm for Clustering with Soft Must-link and Cannot-link Constraints},
booktitle={Proceedings of the 11th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM,},
year={2022},
pages={195-202},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010800000003122},
isbn={978-989-758-549-4},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 11th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM,
TI - A k-Means Algorithm for Clustering with Soft Must-link and Cannot-link Constraints
SN - 978-989-758-549-4
AU - Baumann P.
AU - Hochbaum D.
PY - 2022
SP - 195
EP - 202
DO - 10.5220/0010800000003122