to the other clusters on the same level in the hierarchy.
This score showed that the clusters, although not perfect, are close to the groups provided with the ICD-9 standard.
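For readers who want to see how a Silhouette score is obtained, the following is a minimal pure-Python sketch of the standard definition s(i) = (b - a) / max(a, b), where a is the mean distance from a point to the other members of its own cluster and b is the smallest mean distance to any other cluster. The function name, the toy points, and the labels are illustrative assumptions; this is not Blanket Clusterer's code, and the per-level evaluation described above would apply the same computation to the clusters of one hierarchy level at a time.

```python
from math import dist
from statistics import mean


def silhouette_samples(points, labels):
    """Return s(i) = (b - a) / max(a, b) for every point.

    Assumes at least two distinct labels; hypothetical helper for illustration.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the point itself contributes 0 to the sum, hence len(own) - 1)
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            mean(dist(p, q) for q in members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


# Toy data: two tight, well-separated groups (made up for this example).
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
labels = [0, 0, 1, 1]
overall = mean(silhouette_samples(points, labels))
print(f"overall Silhouette score: {overall:.4f}")
```

Values close to 1 indicate compact, well-separated clusters, values near 0 indicate overlapping clusters, and negative values suggest that points sit closer to a neighbouring cluster than to their own.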
Table 5: Overall Silhouette score by algorithm.

Algorithm        Silhouette score
K-Means          0.7151
Birch            0.7059
Agglomerative    0.7257

The number of clusters generated by each algorithm is shown in Table 6, and the average Silhouette score for each algorithm is given in Table 5.
Table 6: Total number of clusters.

Algorithm        Total number of clusters
K-Means          2200
Birch            1895
Agglomerative    2160

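A comparison of this kind can be sketched with scikit-learn as follows. This is only an illustration under stated assumptions: the synthetic blobs stand in for the numeric representations of the ICD-9 descriptions, `n_clusters = 4` is a hypothetical value chosen for the toy data, and the code does not reproduce Blanket Clusterer's hierarchical procedure or the figures in Tables 5 and 6.

```python
from sklearn.cluster import AgglomerativeClustering, Birch, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the embedding vectors (assumption: the real input
# would be numeric representations of the ICD-9 descriptions).
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=0)

n_clusters = 4  # hypothetical value, suitable only for this toy data

models = {
    "K-Means": KMeans(n_clusters=n_clusters, n_init=10, random_state=0),
    "Birch": Birch(n_clusters=n_clusters),
    "Agglomerative": AgglomerativeClustering(n_clusters=n_clusters),
}

# Fit each algorithm on the same data and record its overall Silhouette score.
scores = {}
for name, model in models.items():
    labels = model.fit_predict(X)
    scores[name] = silhouette_score(X, labels)

for name, score in scores.items():
    print(f"{name:<14} {score:.4f}")
```

Running all algorithms on identical input with the same number of clusters keeps the scores comparable; when the cluster counts differ, as in Table 6, the scores should be read together with those counts.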
As shown by our tests, Agglomerative clustering outperforms the other algorithms: its overall Silhouette score (Table 5) is about 0.01 higher than that of K-Means and about 0.02 higher than that of Birch, i.e., roughly 1 and 2 percentage points, respectively. As previously stated, even though DBSCAN can be used in the algorithm, the hierarchical aspect could not be used with it, due to its implementation. Therefore, this clustering algorithm is not part of the research.
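The margins can be checked directly from the values in Table 5. The short sketch below computes both the absolute difference in score and the relative improvement; the variable names are ours, and only the three reported scores come from the paper.

```python
# Overall Silhouette scores as reported in Table 5.
scores = {"K-Means": 0.7151, "Birch": 0.7059, "Agglomerative": 0.7257}

best = max(scores, key=scores.get)

margins = {}
for name, score in scores.items():
    if name == best:
        continue
    absolute = scores[best] - score    # difference in Silhouette score
    relative = 100 * absolute / score  # percent improvement over `name`
    margins[name] = (round(absolute, 4), round(relative, 2))
    print(f"{best} vs {name}: +{absolute:.4f} absolute, +{relative:.2f}% relative")
```

The absolute gaps are roughly 0.01 and 0.02, which matches the rounded 1% and 2% figures when they are read as percentage points; in relative terms the improvements are somewhat larger.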
5 CONCLUSION
This paper presents a novel tool named Blanket Clusterer, which unifies the most widely used clustering techniques in Machine Learning and facilitates their application to various numeric representations of texts, sounds, and videos. We validated Blanket Clusterer on a dataset comprising ICD-9 descriptions. The tool applied different clustering methods to the dataset efficiently and produced a detailed report. Furthermore, Blanket Clusterer provides a valuable interpretation of the best clustering results through a three-dimensional visualization plot. In our specific use case, Agglomerative clustering offers the best results, with a Silhouette score of 0.7257, higher than those of K-Means (0.7151) and Birch (0.7059). This research shows that Blanket Clusterer is a valuable tool for measuring the efficiency of clustering algorithms on a specific task. Lastly, the code and its interfaces are publicly available and open-sourced, which encourages researchers to further enhance and expand its functionalities.
Blanket Clusterer: A Tool for Automating the Clustering in Unsupervised Learning