Blanket Clusterer: A Tool for Automating the Clustering in

Unsupervised Learning

Konstantin Bogdanoski

, Kostadin Mishev

and Dimitar Trajanov

Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University,

Rugjer Boshkovikj 16, Skopje, North Macedonia

Keywords:

Unsupervised Learning, Clustering, Hierarchical Clustering, Data Visualization, Machine Learning,

Algorithm Optimisation, Machine Learning Tools, Blanket Clusterer, Silhouette Score.

Abstract:

We propose a generic hierarchical clustering algorithm - named Blanket Clusterer, which allows researchers

to examine their data and verify the results gained from other machine learning techniques. We also integrate

a three-dimensional visualization plugin that provides better understanding of the clustering results. We verify

the tool on a speciﬁc use-case, i.e., measuring the clustering techniques performances on a textual dataset

based solely on ICD-9 descriptions encoded using the Word2Vec distributed representations. The veriﬁcation

shows that Blanket Clusterer provides an efﬁcient pipeline for evaluating and interpreting the most frequently

used clustering methods in unsupervised learning.

1 INTRODUCTION

Clustering is one of the most popular machine learn-

ing techniques regarding unsupervised learning meth-

ods. The idea behind them is to use mathematical

equations to successfully divide the dataset into sub-

sets. These algorithms allow the user to automatically

divide the set into subsets, perchance gaining previ-

ously unknown facts about it(Bhardwaj et al., 2019).

The effort to execute the clustering algorithms and de-

termine the best one is often a time-consuming and

exhaustive task that depends on the dataset format and

characteristics. This paper presents a tool that encom-

passes multiple clustering techniques, named Blan-

ket Clusterer. Blanket Clusterer, in terms of prob-

lems revolving around clustering data, automates the

complex process of dataset preparation, execution,

and evaluation of different families of clustering al-

gorithms based on dataset metadata and algorithm re-

quirements. Furthermore, it implements advanced 3D

visualization plots to better interpret the algorithms’

results and aid the algorithms assessment and veriﬁ-

cation process. Blanket Clusterer currently incorpo-

rates multiple clustering algorithms which we discuss

in this paper. They vary from bottom-up to top-down

https://orcid.org/0000-0003-4879-9870

https://orcid.org/0000-0003-3982-3330

https://orcid.org/0000-0002-3105-6010

approaches, from hierarchical to non-hierarchical al-

gorithms. Furthermore, regarding the hierarchy seg-

ment of our algorithm, due to the fact that not all clus-

tering algorithms are hierarchical algorithms and our

use-case has a dataset which needs hierarchy, we im-

plement a custom hierarchical algorithm, which di-

vides the set into subsets, thus implementing this fea-

ture in clustering algorithms which did not have it.

In this study, we perform veriﬁcation of the clus-

tering tool on a speciﬁc use-case scenario. All claims

submitted by physicians to the Medical Services Plan

(MSP) must include a diagnostic code. This informa-

tion allows MSP to verify claims and generate statis-

tics about causes of illness and death. The Interna-

tional Classiﬁcation of Disease (ICD) coding system

divides diseases and health conditions into similar

categories based on body systems and health condi-

tions. More speciﬁcally, ICD-9 diagnostic codes are

based on the ninth revision of the coding standard.

The ICD-9 codes are organized into 19 chapters, and

a speciﬁc code range identiﬁes each chapter. Each di-

agnosis is presented by ICD-9 code and description.

First, we leverage these textual descriptions to build a

dataset that we are using to verify the tool we propose.

Next, we use Word2Vec to encode these descriptions

into a distributed vector representation. The results,

the sequence of real-valued vectors, are given as in-

put to our Blanket Clusterer tool to perform clustering

using the implemented clustering algorithms. Finally,

Bogdanoski, K., Mishev, K. and Trajanov, D.

Blanket Clusterer: A Tool for Automating the Clustering in Unsupervised Learning.

DOI: 10.5220/0011276000003277

In Proceedings of the 3rd International Conference on Deep Learning Theory and Applications (DeLTA 2022), pages 125-131

ISBN: 978-989-758-584-5; ISSN: 2184-9277

125

we assess the category label for each diagnosis ob-

tained from the clustering algorithms’ output by com-

paring it to the ICD-9 chapter, thus measuring their

effectiveness to categorize the texts using only the di-

agnosis descriptions.

Lastly, we develop a three-dimensional visualiza-

tion tool, which works with the results in visually rep-

resenting them for ease of use. The main tool which

was used for the hierarchical representation is Carrot-

Search’s FoamTree

, as described in Section 3 Blan-

ket Clusterer.

As a short summary of our work, we:

• propose a generic hierarchical clustering module

that integrates the most common used clustering

algorithms;

• provide three-dimensional representation of the

obtained clusters, with various color-coded val-

ues, for easier interpretation of the results;

• provide statistical information about performance

(time, number of clusters) related to the differ-

ent types of covered clustering algorithms, while

also offering a foundation for future implementa-

tions of new clustering algorithms - either by im-

plementing an already existing one, or building a

brand new one from scratch, as long as it handles

the speciﬁc object type which we already cover in

the algorithm.

• validate clustering results with silhouette scores

for each algorithm used

The Blunket Clusterer tool and code is available on:

• Docker, though their DockerHub library, on

docker.hub/kbogdanoski/blanket-clusterer

• GitHub, available at git.com/Blanket-Clusterer

2 RELATED WORK

We previously stated that Blanket Clusterer is an al-

gorithm which covers different clustering algorithms,

hence the word ”blanket” in the name. K-Means is

one of the algorithms which is being broadly used

in various Machine learning studies(Govender and

Sivakumar, 2020). K-Means is an algorithm which

exists for at least 60 years and although there have

been a couple of algorithms which have had a proof

of concept after it, k-means has remained as one

of the most widely used algorithms due to it’s sim-

plicity, ease of implementation and efﬁciency(Jian,

2009)(Govender and Sivakumar, 2020). Aside from

FoamTree is available at https://www.carrotsearch.

com/foamtree/

work with datasets extracted from textual informa-

tion, through a natural language processing technique,

k-means has shown that it does not make any differ-

ence when it comes to using datasets when they come

from different industries, such as using data in studies

related to spatial-temporal characteristics of air pol-

lution, pollutant behavior in terms of space and so

on(Govender and Sivakumar, 2020). De facto, the au-

thors of the paper(Govender and Sivakumar, 2020),

focus on the problem with applying k-means and hier-

archical clustering techniques for analysis of air pol-

lution, since clustering has been widely applied to at-

mospheric science data, climate and meteorological

data.

Clustering algorithms work with mathematical

equations, thus they need numerical values for the cal-

culations. Although words, sentences, can be trans-

formed into numeric vectors, while at the same time

containing much valuable information, the same can

be said with processing images, as they, on the bottom

side, are an array of numerical values representing

the pixels and position of them. Re-identiﬁcation is

a sub-task in the ﬁeld of image processing problems,

as the main goal is to match pictures of the same per-

son that appears in different cameras, commonly used

as a tool in the identiﬁcation of pedestrian crossings

around the world in the feat to identify pedestrian in-

formation(Zeng et al., 2020), allowing the ofﬁcials to

maintain control without being present at the said in-

tersection.

The authors in (Glielmo et al., 2021) propose an

overview of all algorithms currently used to extract

simpliﬁed models from molecular simulations to un-

derstand the simulated systems on a physical level.

More precisely, they discuss feature representation of

molecular systems and present state-of-the-art algo-

rithms, divided in ﬁve sections: dimensional reduc-

tion, density estimation and clustering, and kinetic

models. Figure 1 shows possible steps that can be per-

formed to analyze data from a molecular simulation

with an indication of the particular section in which

they are used - Density estimation and Clustering.

According to the research done with the paper

”Combining hierarchical clustering algorithms using

the PCA method”(Jafarzadegan et al., 2019), it was

conﬁrmed that the performance of the various unsu-

pervised learning techniques, speciﬁcally clustering

and classiﬁcation designs, depends on the problem at

hand, as well as the pertinent method used to solve

said problem. Additionally, using the same, or simi-

lar, method for contrasting problems, might and most

likely will, produce unreliable results. For this rea-

son, Blanket Clusterer allows the researchers to try

out different types of clustering algorithms, in an ef-

DeLTA 2022 - 3rd International Conference on Deep Learning Theory and Applications

126

Figure 1: Unsupervised learning methods for molecular

simulation.

fort to ﬁnd the best one, while testing out different

possibilities at the same time.

Even though a clustering algorithm is selected

for the task at hand, researchers still might not be

pleased with the results. Further optimization is pos-

sible through the use of Principal Component Anal-

ysis - PCA, as shown in the research paper: ”Opti-

mization of fuzzy c-means clustering algorithm with

combination of minkowski and chebyshev distance

using principal component analysis”(Surono and Pu-

tri, 2021), where it was shown that reducing the num-

ber of dimensions for each entry in the dataset, while

using PCA as a technique, the clustering accuracy ob-

tained had an accuracy score of 1.6468(Surono and

Putri, 2021). PCA, or principal component analysis,

is an analysis method which aims to transform a high-

dimensional dataset into a low-dimensional space,

while not losing any of the features which are impor-

tant to the knowledge of the model. PCA uses alge-

braic equations to reduce dimensions that are inter-

connected with other dimensions within the dataset,

into new data with unrelated dimensions named prin-

cipal components(Farjo et al., 2013).

Aside from using PCA as a tool to optimize cus-

tom clustering algorithms, it can also be used in

already-known algorithms. As shown by the re-

search of Abdulhafedh(Abdulhafedh, 2021), using

PCA alongside the K-Means clustering algorithm in

the aim to divide a dataset containing customers into

segments, so plausible marketing strategies may be

used for the customers with whom the possibility of

them accepting the offer, is greater than before.

We mention PCA as it is a valuable tool in unsu-

pervised learning techniques, as it is a feature which

will be developed to work alongside Blanket Clus-

terer, to further provide assistance and recommenda-

tions to the researchers who do use this tool.

In recent years, data scientists have become very

hard to ﬁnd, due to the high demand. AutoML, sim-

ilar to Blanket Clusterer, is a tool which aims to re-

move the need of having a data scientist to anal-

yse, preprocess, process, extract knowledge from a

dataset, with providing a tool to allow other users,

such as domain experts, to automatically create ma-

chine learning applications, without the prerequisite

of having statistical and machine learning knowl-

edge.(He et al., 2021). As shown in Figure 2, Au-

toML extracts features from the dataset, it generates

a model for said dataset, and estimates the model,

prior to providing it to the user. In contrast to Au-

toML which provides tools for supervised learning,

using the beneﬁts of techniques such as Convolu-

tional Neuron Networks (CNNs), and Recurrent Neu-

ral Networks (RNNs), Blanket Clusterer is a tool

which handles tasks related to unsupervised learning,

mainly clustering algorithms for unlabeled and/or la-

beled data, using the beneﬁts of the simple to un-

derstand techniques, their efﬁciency and efﬁcacy (as

shown in Figure 3). There are various implementa-

tions of AutoML, including Google’s Cloud AutoML.

Figure 2: AutoML execution pipeline.

Figure 3: Blanket Clusterer execution pipeline.

EfﬁcientDet is a tool in the AutoML segment(Tan

et al., 2020), which aims to increase the efﬁciency

and scalability, while also providing optimizations

for further improvement. As an important note Ef-

ﬁcientDet’s algorithm operates in supervised learn-

ing manor, especially object detection in provided

datasets. Blanket Clusterer on the other hand, pro-

vides results for whatever task, as long as the dataset

conforms to the requirements of the tool in terms of

the format. GIT(Gao et al., 2021) is another cluster-

ing algorithm, based on graph clustering which aims

to increase the performance related to graph cluster-

ing. This contrasts our tool which can handle datasets

from different sources, as previously stated.

HypHC(Chami et al., 2020) is a hierarchical clus-

tering algorithm, with a goal to compare different al-

gorithms, measure their quality and explain their suc-

cess or failure, similar to our tool. The main differ-

ence between these two algorithms, is that HypHC

utilizes the discrete cost function over space proposed

by Dasgrupta(Dasgupta, 2016), while Blanket Clus-

terer does not, in its current state.

Blanket Clusterer: A Tool for Automating the Clustering in Unsupervised Learning

127

3 BLANKET CLUSTERER

3.1 Goal

Blanket Clusterer is a Python module which aims to

provide a straightforward generic hierarchical cluster-

ing algorithm, alongside a clean 3D

visualization, to

better understand the received results.

The tool integrates the most commonly used clus-

tering algorithms in one place:

• K-Means

• Agglomerative

• Birch

• DBSCAN

As an important side note, DBSCAN due to its im-

plementation, does not provide the preferred results,

compared to the other algorithms, thus this algorithm

was not part of our testing and validation segment.

Additionaly, Blanket Clusterer adds another level

of functionality in allowing to hierarchically clus-

ter data with said algorithms, as well as adding

a visualization functionality, using CarrotSearch’s

FoamTree

library, and measuring their perfor-

mance through Silhouette scores.

3.2 Implementation

As previously stated, Blanket Clusterer works with

other types of clustering algorithms. Aside from us-

ing these algorithms, it also eases the process in using

them, with automating the whole process. Addition-

ally, Blanket Clusterer provides a three-dimensional

visualization of the results and provides a Silhouette

metric of the clustering results for each algorithm.

Figure 4: Blanket Clusterer classes.

The pipeline (shown in Figure 5) implemented

in the Blanket Clusterer, consists of multiple steps.

All of the classes are built upon a predeﬁned class

(Generic Clusterer, as shown in Figure 4), in which

we have deﬁned those methods that are needed for

each algorithm.

3D - Three dimensional representation

3.3 Pipeline

Figure 5: Execution pipeline.

As previously stated, the execution pipeline of the al-

gorithm is shown in Figure 5. The pipeline starts

with the constructor, or the init () method. This

method sets up the object which will be used to clus-

ter the data. It ﬁlls the necessary attributes with the

needed data.

As we initialize the Object, when creating a new

variable, the constructor is called. After creating the

object, we need only to call the method clusterize()

which will inherently call other methods in the class.

The ﬁrst method in the pipeline after clusterize()

is the extract vectors() which will extract the vectors

into the required format. The vectors are stored in a

dictionary in a key-value format.

After extracting the vectors, we extract the keys

so we can have them in a dictionary, with the ex-

tract names() method. This method is needed for

evaluation purposes to compare the cluster predic-

tions to the ground-truth clusters.

Next, we need to extract the group names, if pro-

vided, using the extract

group names(). It is quite

similar to the name extraction process, but we will

talk about it more in the next section.

The embeddings, names and group-names are not

the only attributes necessary for our algorithm. We

also need to extract the number of times a key is seen

in the cluster and in which group it belongs to. Using

the extract nums count() does that for us, and this is

DeLTA 2022 - 3rd International Conference on Deep Learning Theory and Applications

128

needed for a better presentation of the results.

Alongside the number of times a code occurs in a

cluster, are the coverage values, which are extracted

with extract coverages(). Those coverage values

provide the color for our entities in the cluster, and the

whole cluster overall. This is also not necessary, just

like the group-names and the number of occurrences.

The color is produced with a simple mathematical

equation, because the color format is HSL

, we can

divide the number 360 by the number of groups there

are. With this simple math equation, we give a cer-

tain group, a certain color, which in return provides

us with a better visualization in the end.

Until now, we have extracted the vectors in

a dictionary, and the code they represent in the

key position of the dictionary, but the algorithms

need numerical arrays to operate. The method ex-

tract vectors as array() does that for us, and stores

the numerical values as arrays, making them ready for

clustering.

After all these methods, comes the run() method,

which checks if everything was okay up until this

point, and starts the process of hierarchically cluster-

ing the data. If everything passes okay to this point

and all the data entered was valid, the process should

provide results without any setbacks.

3.3.1 Hierarchical Algorithm

The crucial part of our tool is having hierarchical re-

sult of the datasets. Our hierarchical algorithm is cus-

tom made, displayed with the pseudo code shown in

Figure 6.

The pseudo code is straight-forward. Initially we

give the tool a huge cluster of data. The tool then

clusters the bigger cluster into smaller sub-clusters

(the number of sub-clusters - N, is provided by the

user). Afterwards, for each sub-cluster, there is a sim-

ple check: If the total number of elements in the clus-

ter (X), does not exceed the maximum allowed num-

ber of elements in a cluster (M), clustering for the cur-

rent level is completed. Otherwise if X is greater than

M, another check is needed to see how big the differ-

ence is. If X is greater than M squared (M

), then the

cluster is divided into sub-clusters with N elements

in each one, otherwise the cluster is divided into sub-

clusters with

elements in the cluster.

HSL - H-hue, S-saturation, L-luminance, color format

Figure 6: Hierarchical algorithm pseudo-code.

4 USE-CASE EVALUATION

SCENARIO

4.1 Dataset

To validate Blanket Clusterer, a specially designed

dataset comprised with ICD-9 descriptions of the di-

agnoses is used. The Word2Vec word vectors are used

for encoding these descriptions. We took the word

embeddings from Word2Vec and applied clustering

techniques with our tool.

The ﬁnal input for the Blanket Clusterer is com-

posed of a dictionary, whose key is the ICD-9 code,

and value is the diagnosis description represented as

a list of word vectors, as shown in Figure 7. We use

Blanket Clusterer to cluster the vectors (shown in Ta-

ble 1), hierarchically and validate the results after-

ward.

Figure 7: Example of ICD-9 to vector transformation.

Table 1: Key - Value of word embeddings in Word2Vec

model.

Key Value

001 [0.11, 0.22, ..., 0.99]

002 [0.22, 0.33, ..., 0.88]

... ...

E999 [0.99, 0.99, ..., 0.99]

To better understand the clustering results, aside

from using the model’s embeddings, we also pro-

vide a dataset consisting of human readable names of

the ICD-9 codes, so Blanket Clusterer can hierarchi-

cally name the clusters accordingly. This allows us to

further validate our tool, with comparing the perfor-

mance. As shown in Table 2, each ICD-9 code corre-

Blanket Clusterer: A Tool for Automating the Clustering in Unsupervised Learning

129

sponds to a speciﬁc human readable name.

Table 2: Key - Value of names.

Key Value

001 Cholera

004 Shigellosis

... ...

E999 Late effect of injury due to

war operations and terrorism

Similar to the previous dataset, the second dataset,

aside from our Word2Vec model and key-names

dataset, consists of speciﬁc names depicting groups

of ICD-9 codes, as shown in Table 3. This dataset

contains information about sets of ICD-9 codes, cor-

responding to a certain group in the ICD-9 naming

scheme.

Table 3: Key - Value of group-names.

Key Value

001-009 Intestinal Infectious Diseases

010-018 Tuberculosis

... ...

E990-E999 Injury Resulting From

Operations Of War

After providing the algorithm with the datasets,

we review the results through the 3D representation

tool which we implement in Blanket Clusterer, and

through the Silhouette score.

4.2 Results

All tests were done with the following parameters

(shown in Table 4):

Table 4: Parameters and values used in test run.

Parameter Value

Number of diagnoses 16,728

Number of clusters in a hierarchy level 19

Number of entities in a cluster 19

Maximum depth of hierarchy 6

4.2.1 Visualization

As shown in Figure 8 and Figure 9, there is a clear

distinction between the results, regarding the color-

coding of clusters. As previously stated, in the re-

sults shown in Fig. 9, with the test run we provided

a dataset containing the names of groups with codes,

hence each color represents a certain group of ICD-9

codes. Each color represents one speciﬁc chapter of

the ICD-9 taxonomy, thus proving that the algorithm

has successfully clustered the diagnoses to a particu-

lar chapter as declared in the ICD-9 taxonomy.

Aside from the color, the names of the main clus-

ters also vary between the ﬁgures. Because we have a

custom naming scheme for them. It is easier to under-

stand and validate the results when all features, which

Blanket Clusterer provides, are used.

Figure 8: Visualization without group-names.

Figure 9: Visualization with group-names.

4.2.2 Cross-algorithm Performance

We measured the cross-algorithm performance

through the Silhouette score(Shahapure and Nicholas,

2020), which represents the overall quality of the

clusters in terms of the similarity between the vectors

in each cluster, including the hierarchical groups of

clusters. The score ranges from 0 to 1, where:

• 0 means the elements between the clusters are

similar to one another

• 1 means that the elements between the clusters are

very well separated

Having a higher score, means the clustering algo-

rithm successfully separates the elements into clus-

ters, ﬁlled with similar values, while the opposite is a

fact with lower scores.

The Silhouette algorithm, since we have a hi-

erarchical clustering tool, was calculated hierarchi-

cally and comparatively between clusters, meaning

elements belonging to a certain cluster on certain level

in the hierarchy are compared to elements belonging

DeLTA 2022 - 3rd International Conference on Deep Learning Theory and Applications

130

to the other clusters on the same level in the hierarchy.

This score showed that the clusters, although not per-

fect, are close to the groups provided with the ICD-9

standard.

Table 5: Overall Silhouette score by algorithm.

Algorithm Silhouette score

K-Means 0.7151

Birch 0.7059

Agglomerative 0.7257

The amount of clusters generated is shown in Ta-

ble 6, and for each algorithm, the average Silhouette

score is depicted by Table 5.

Table 6: Total number of clusters.

Algorithm Total number of clusters

K-Means 2200

Birch 1895

Agglomerative 2160

As shown by our tests, Agglomerative clustering

outperforms the other algorithms, K-Means by 1%,

while Birch by 2%. As previously stated, even though

DBSCAN can be used in the algorithm, the hierarchi-

cal aspect could not be used, due to it’s implementa-

tion. Ergo, this clustering algorithm is not part of the

research.

5 CONCLUSION

This paper presents a novel tool named Blanket Clus-

terer, which uniﬁes the most widely used cluster-

ing techniques in Machine Learning and facilitates

their application to various numeric representations of

texts, sounds, and videos. We successfully validated

Blanket Clusterer by a dataset comprised of ICD-9 de-

scriptions. The tool proved its efﬁciency in applying

different clustering methods to the dataset and provid-

ing a detailed report. Furthermore, Blanket Clusterer

provides a valuable interpretation of the best cluster-

ing results through a three-dimensional visualization

plot. In our speciﬁc use-case, the Agglomerative clus-

tering offers the best results, compared to the other

algorithms, with a higher Silhouette score than the

rest, with a value of 0.7257. This research proves that

Blanket Clusterer is a valuable tool for measuring the

efﬁciency of clustering algorithms on a speciﬁc task.

Lastly, the code and its interfaces are publicly avail-

able and open-sourced, thus incentivizing researchers

to further enhance and expand its functionalities.

REFERENCES

Abdulhafedh, A. (2021). Incorporating k-means, hierarchi-

cal clustering and pca in customer segmentation. Jour-

nal of City and Development, 3(1):12–30.

Bhardwaj, K. K., Banyal, S., and Sharma, D. K. (2019).

Chapter 7 - artiﬁcial intelligence based diagnostics,

therapeutics and applications in biomedical engineer-

ing and bioinformatics. In Balas, V. E., Son, L. H.,

Jha, S., Khari, M., and Kumar, R., editors, Internet

of Things in Biomedical Engineering, pages 161–187.

Academic Press.

Chami, I., Gu, A., Chatziafratis, V., and R

e, C. (2020).

From trees to continuous embeddings and back: Hy-

perbolic hierarchical clustering. 33:15065–15076.

Dasgupta, S. (2016). A cost function for similarity-based

hierarchical clustering. pages 118–127.

Farjo, J., Abou Assi, R., Masri, W., and Zaraket, F. (2013).

Does principal component analysis improve cluster-

based analysis? pages 400–403. IEEE.

Gao, Z., Lin, H., Tan, C., Wu, L., Li, S., et al. (2021). Git:

Clustering based on graph of intensity topology. arXiv

preprint arXiv:2110.01274.

Glielmo, A., Husic, B. E., Rodriguez, A., Clementi, C.,

e, F., and Laio, A. (2021). Unsupervised learning

methods for molecular simulation data. Chemical Re-

views, 121(16):9722–9758.

Govender, P. and Sivakumar, V. (2020). Application of k-

means and hierarchical clustering techniques for anal-

ysis of air pollution: A review (1980–2019). Atmo-

spheric Pollution Research, 11(1):40–56.

He, X., Zhao, K., and Chu, X. (2021). Automl: A sur-

vey of the state-of-the-art. Knowledge-Based Systems,

212:106622.

Jafarzadegan, M., Saﬁ-Esfahani, F., and Beheshti, Z.

(2019). Combining hierarchical clustering approaches

using the pca method. Expert Systems with Applica-

tions, 137:1–10.

Jian, A. K. (2009). Data clustering: 50 years beyond k-

means, pattern recognition letters. Corrected Proof.

Shahapure, K. R. and Nicholas, C. (2020). Cluster quality

analysis using silhouette score. pages 747–748. IEEE.

Surono, S. and Putri, R. D. A. (2021). Optimization of

fuzzy c-means clustering algorithm with combination

of minkowski and chebyshev distance using principal

component analysis. International Journal of Fuzzy

Systems, 23(1):139–144.

Tan, M., Pang, R., and Le, Q. V. (2020). Efﬁcientdet:

Scalable and efﬁcient object detection. pages 10781–

10790.

Zeng, K., Ning, M., Wang, Y., and Guo, Y. (2020). Hier-

archical clustering with hard-batch triplet loss for per-

son re-identiﬁcation. pages 13657–13665.

Blanket Clusterer: A Tool for Automating the Clustering in Unsupervised Learning

131