to the other clusters on the same level in the hierarchy.
This score showed that the clusters, although not perfect,
are close to the groups provided with the ICD-9

Table 5: Overall Silhouette score by algorithm.
Algorithm        Silhouette score
K-Means          0.7151
Birch            0.7059
Agglomerative    0.7257

The number of clusters generated by each algorithm is
shown in Table 6, and the average Silhouette score for
each algorithm is given in Table 5.
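
For readers unfamiliar with the metric, the following is a minimal sketch of how an overall Silhouette score can be computed. It is not the Blanket Clusterer implementation: the function name, the Euclidean distance, and the toy points are assumptions made for illustration, and the data is not the ICD-9 dataset. For each point, a is the mean distance to the other members of its own cluster and b is the mean distance to the nearest other cluster; the point's coefficient is (b - a) / max(a, b), and the overall score is the mean over all points.

```python
from math import dist
from statistics import mean


def silhouette_score(points, labels):
    """Mean Silhouette coefficient over all points (Euclidean distance)."""
    # Group the points by their cluster label.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("Silhouette score needs at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention: singleton clusters score 0
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the distance from p to itself is 0, so it does not affect the sum).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            mean(dist(p, q) for q in members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


# Toy data (hypothetical): two tight, well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])  # labels follow the groups
bad = silhouette_score(points, [0, 1, 0, 1])   # labels cut across the groups
print(f"good labelling: {good:.4f}, bad labelling: {bad:.4f}")
```

Scores close to 1 indicate compact, well-separated clusters, while negative scores indicate points that sit closer to another cluster than to their own.
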

Table 6: Total number of clusters.
Algorithm        Total number of clusters
K-Means          2200
Birch            1895
Agglomerative    2160

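
The margins between the algorithms can be checked directly from the scores reported in Table 5. The short sketch below (the helper name `margins` is ours, not part of the tool) computes both the absolute difference in Silhouette score and the relative difference. The absolute leads are about 0.01 and 0.02, that is, roughly 1 and 2 percentage points, whereas the relative leads are somewhat larger.

```python
# Overall Silhouette scores as reported in Table 5.
scores = {"K-Means": 0.7151, "Birch": 0.7059, "Agglomerative": 0.7257}

# Algorithm with the highest overall score.
best = max(scores, key=scores.get)


def margins(scores, best):
    """Absolute and relative lead of `best` over every other algorithm."""
    return {
        name: (scores[best] - s, (scores[best] - s) / s)
        for name, s in scores.items()
        if name != best
    }


for name, (absolute, relative) in margins(scores, best).items():
    print(f"{best} vs {name}: +{absolute:.4f} absolute, +{relative:.1%} relative")
# Agglomerative vs K-Means: +0.0106 absolute, +1.5% relative
# Agglomerative vs Birch: +0.0198 absolute, +2.8% relative
```
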
As shown by our tests, Agglomerative clustering
outperforms the other algorithms, leading K-Means by
about 0.01 and Birch by about 0.02 in Silhouette score
(roughly 1 and 2 percentage points, respectively). As
previously stated, even though DBSCAN can be used in
the algorithm, the hierarchical aspect could not be
used with it because of its implementation. Ergo, this
clustering algorithm is not part of the
This paper presents a novel tool named Blanket
Clusterer, which unifies the most widely used clustering
techniques in Machine Learning and facilitates their
application to various numeric representations of
texts, sounds, and videos. We validated Blanket
Clusterer on a dataset composed of ICD-9 descriptions.
The tool proved efficient in applying different
clustering methods to the dataset and producing a
detailed report. Furthermore, Blanket Clusterer helps
interpret the best clustering results through a
three-dimensional visualization plot. In our specific
use case, Agglomerative clustering gives the best
results, with the highest Silhouette score of the three
algorithms (0.7257). This research shows that Blanket
Clusterer is a valuable tool for measuring the
efficiency of clustering algorithms on a specific task.
Lastly, the code and its interfaces are open source and
publicly available, which encourages researchers to
further enhance and expand its functionality.
