Efﬁcient IoT Device Fingerprinting Approach using Machine Learning

Richmond Osei, Habib Louafi, Malek Mouhoub and Zhongwen Zhu

Department of Computer Science, University of Regina, Regina, SK, Canada

Department of Computer Science, New York Institute of Technology (Vancouver Campus), Canada

GAIA Montreal, Ericsson Canada, Montreal, Canada


Keywords: Internet of Things (IoT), Device Fingerprinting, Feature Extraction, Dimensionality Reduction, Machine Learning.

Abstract:

The Internet of Things (IoT) is steadily becoming a way of life. IoT devices can be found in smart homes, factories, farming, etc. However, the rapid growth in the number of IoT devices comes with many security concerns, owing to their small and resource-constrained design. For instance, a compromised IoT device in a network presents a vulnerability that can be exploited to attack the entire network. Since IoT devices are usually scattered over vast areas, Mobile Network Operators resort to analyzing the traffic generated by these devices to detect their identity (fingerprint) and nature (legitimate, faulty, or malicious). We propose an efficient solution to fingerprint IoT devices using known classifiers alongside dimensionality reduction techniques, such as PCA and Autoencoder. The latter techniques extract the most relevant features required for accurate fingerprinting while reducing the amount of IoT data to process. To assess the performance of our proposed approach, we conducted several experiments on a real-world dataset from an IoT network. The results show that the Autoencoder for dimensionality reduction, combined with a Decision Tree algorithm, reduces the number of features from 14 to 5 while keeping the prediction score of the IoT device fingerprints very high (97%).

1 INTRODUCTION

The Internet of Things (IoT), also known as the Internet of Objects, is gradually becoming a way of life. IoT combines a network of physical components (sensors, cars, and other items) that interact with each other and with other computing components to achieve specific goals. New IoT devices are continuously introduced to cyberspace and can be found in many places, including transportation, healthcare, smart homes, and even industrial environments. It was predicted in (AB, 2015) that by 2022 about 29 billion devices (including 18 billion IoT devices) would be connected to cyberspace, which raises serious security concerns. Because IoT devices have limited resources and are deployed in complex infrastructures, attackers are particularly interested in compromising IoT device networks. Attacks targeting IoT devices tend to make them stop functioning as expected, creating an anomaly in the entire network. Devices showing abnormal behavior can be detected by analyzing the traffic they generate.

https://orcid.org/0000-0001-7381-1064

After the Mirai botnet attack (Antonakakis et al., 2017), which shut down the servers of several big companies, protecting IoT devices became a prominent concern. This attack exploited the weaknesses of millions of IoT devices and used them to perform Distributed Denial of Service (DDoS) attacks on the DNS servers of several big companies, such as Twitter. Because of the massive number and diversity of IoT device models, relying on the most typical approaches to device detection is becoming increasingly difficult. In this regard, device fingerprinting offers a more efficient and effective way to collect data on devices, which aids in their detection and helps to identify security vulnerabilities reliably (Antonakakis et al., 2017).

Device fingerprinting is the process of determining the identity of an IoT device from its traffic. The fingerprints and their behavior baselines greatly help in detecting potential attacks on IoT devices (Bratus et al., 2008).

In this context, we present a new approach, based on supervised machine learning (ML), capable of accurately fingerprinting IoT devices connected to a network. The proposed approach first uses ML dimensionality reduction techniques (such as PCA and Autoencoder) to extract the most relevant features from the original dataset. A classifier is then trained on the extracted features to carry out the prediction phase. To assess the performance of our proposed approach, we conducted several experiments on a real-world dataset from an IoT network. Several known classifiers and dimensionality reduction techniques are considered in this regard. The results obtained and reported in this paper show that the Autoencoder for dimensionality reduction, combined with a Decision Tree algorithm, reduces the number of features from 14 to only 5 while keeping the prediction score of the IoT device fingerprints very high (97%).

Osei, R., Louafi, H., Mouhoub, M. and Zhu, Z.
Efficient IoT Device Fingerprinting Approach using Machine Learning.
DOI: 10.5220/0011260500003283
In Proceedings of the 19th International Conference on Security and Cryptography (SECRYPT 2022), pages 525-533
ISBN: 978-989-758-590-6; ISSN: 2184-7711

2 RELATED WORK

This section reviews the most important approaches,

solutions, and frameworks proposed to ﬁngerprint IoT

devices using ML algorithms. The literature review

will be organized into two main subsections: solu-

tions related to ﬁngerprinting IoT devices and dimen-

sionality reduction techniques applied in ML.

2.1 IoT Device Fingerprinting

(Thangavelu et al., 2018) proposed DEFT, a Distributed IoT Fingerprinting Technique that fingerprints IoT devices by considering traffic sessions. The authors ignore packet-level features, especially those of the application protocol layer (packet size, sender's IP, and port number), because of their cost, even though these are considered the best features. Instead, the authors considered features from protocols such as DNS, mDNS, TLS, and HTTP. Principal Component Analysis (PCA) is then used to analyze the features along two components (the related 2-D planes), without reducing the number of feature dimensions. ML algorithms such as Random Forest, K-NN, and Naive Bayes were used, and the highest accuracy was obtained with Random Forest. Precision, recall, and F1-score were used to evaluate the accuracy of the obtained fingerprints of the IoT devices.

In (Acar et al., 2020), the authors ﬁrst proposed

a new multi-stage privacy attack on IoT devices to

leak sensitive user information. The authors evaluated

the proposed attack using the known commercial IoT

home devices dataset. Finally, they presented a solu-

tion based on trafﬁc spooﬁng to effectively address

the proposed attack. Using features such as mean

packet length, mean inter-arrival time, and standard

deviation in packet lengths, the authors adopted machine learning algorithms including KNN, XGBoost, Decision Tree, AdaBoost, Random Forest, and Naïve Bayes. The proposed solution achieves an accuracy of 94% with Random Forest.

(Zhang et al., 2021) propose an approach to ﬁn-

gerprint IoT devices using an unsupervised learn-

ing framework. Unsupervised learning was chosen

over regular supervised learning due to labeling costs.

These costs correspond to the time needed to label

the dataset and human error. The authors selected

14 temporal and spatial main features to form a 72-

dimensional vector reﬂecting the physical attributes

of various IoT devices at the network level. Using

Variational Autoencoder to build a clustering frame-

work and K-means algorithm, the proposed method

achieved an accuracy of 86% for a given dataset.

In (Msadek et al., 2019), Msadek et al. used features from the application, transport, network, and data link layers to develop a model to fingerprint IoT devices. The authors offered a method to fingerprint IoT devices by performing various evaluation and exploration measurements on the dataset using machine learning algorithms. The proposed methodology is reported to improve accuracy by 18.5% and to run 18.39 times faster than the usual baseline approaches.

2.2 Dimensionality Reduction

In (Sakurada and Yairi, 2014), Sakurada et al. pro-

posed a method of detecting anomalies in telemetry

data of a spacecraft, using dimensionality reduction

techniques. The authors used Autoencoder to per-

form dimensionality reduction, as most of the vari-

ables in the telemetry data are not correlated. They

applied Autoencoders on synthetic and real data, and

compared the performance of Autoencoders to PCA.

It was reported that the Autoencoder detects anoma-

lies where linear PCA failed. Overall, the Autoen-

coders increase accuracy through denoising (i.e., by

randomly changing some corrupted values of the in-

put values to zero).

In (Abdulhammed et al., 2019), the authors pro-

posed a network intrusion detection system based

on ML algorithms. They used the dataset from CI-

CIDS2017 (Yulianto et al., 2019). In building the

proposed system, Autoencoder and PCA were used

for dimensionality reduction to compress and trans-

form features into a lower dimension of all the fused

data. Using these dimensionality reduction tech-

niques, the authors reduced the dataset features from

81 to 10. They maintained an overall accuracy of

99.6 % in the training process using Random For-

est, Bayesian Network, Linear Discriminant Analysis,

and Quadratic Discriminant Analysis.

Table 1 summarizes the reviewed solutions re-

ported in this section, along with their advantages and

drawbacks.

SECRYPT 2022 - 19th International Conference on Security and Cryptography


Table 1: Related Work Summary.

DEFT (Thangavelu et al., 2018)
Features:
• DNS, mDNS, session statistics, TLS, HTTP, SSDP, QUIC, MQTT, STUN, NTP, BOOTP
Strengths:
• Use of protocol features, which are considered less expensive than application layer features.
• The DEFT approach can be used to distinguish anomalous behaviours, since it can characterize the normal process of an IoT device by its vendor.
Weaknesses:
• Use of protocol features instead of the best features (application layer features).
• It uses many features, which requires more time and resources in the machine learning process.
• Its scope is restricted to local IoT networks. Thus, it does not present an Internet-wide view and does not apply to one-way scan flows incoming at network telescopes.

Peek-a-Boo (Acar et al., 2020)
Features:
• Mean packet length
• Standard deviation in packet lengths
• Mean inter-arrival time
Strengths:
• Ability to fingerprint IoT devices in real-world applications.
• The system includes an effective mitigation mechanism to hide the actual activities of the users from the outside world.
Weaknesses:
• This model raises critical privacy concerns for any IoT devices, especially in personal homes, residences, and offices of corporations or government agencies.

Unsupervised IoT Fingerprinting Method via Variational Auto-encoder and K-means (Zhang et al., 2021)
Features:
• Temporal features (periodicity and burstiness)
• Spatial features (volume statistics features, two-tuple bag features, protocol features, payload features)
Strengths:
• Uses an unsupervised learning approach rather than a supervised one, which saves the cost of labelling and avoids human error in labelling.
Weaknesses:
• The accuracy of this model is low compared with other IoT fingerprinting models.
• Because this algorithm uses an unsupervised learning process, it takes a lot of time to calculate and analyze data.

IoT Device Fingerprinting: Machine Learning based Encrypted Traffic Analysis (Msadek et al., 2019)
Features:
• Application layer (HTTP, HTTPS, DHCP, SSDP, DNS, MDNS, NTP, SMB, AFP, SSH)
• Transport layer (TCP, UDP)
• Network layer (IP, ICMP, ICMPv6, IGMP)
• Data link layer (ARP, LLC)
Strengths:
• The proposed methodology is reported to improve accuracy by 18.5% and to run 18.39 times faster than the usual baseline approaches.
Weaknesses:
• The model depends on handcrafted parameter tuning and does not segment traffic autonomously as we do in this work.
• The proposed solution works only for massive training datasets containing no noise.
• The solution is not a stand-alone system but instead relies on other approaches to fingerprint and provide proper security.


3 PROBLEM STATEMENT

We assume an IoT network to which a set of IoT devices is attached. Our objective is to analyze the traffic generated by these devices and determine (fingerprint) their identities (e.g., the device's name). Because the network traffic of IoT devices is voluminous, we are particularly interested in identifying the minimal set of features needed to fingerprint the IoT devices with the highest possible accuracy. To that end, we experiment with various dimensionality reduction techniques, such as PCA and Autoencoders, to find the optimal set of features capable of fingerprinting IoT devices. The problem at hand can be formulated as follows.

Let $R$ be a set of known dimensionality reduction methods, and $F$ the set of all the features that are initially extracted (in the sense of ML) from a given dataset $D$. Let us denote each combination of features by $f_i$, with $f_i \in P(F)$, where $P(F)$ represents the power set of $F$ ($1 \le i \le 2^{|F|}$). As stated earlier, our methodology combines a given dimensionality reduction method, $r_j$, with a classifier, $m_k$. Our ultimate goal is to find the optimal combination $(r^*, m^*)$ maximizing the prediction score, measured using one known prediction metric (accuracy, precision, recall, or F1-Score). This problem can be formulated using Equation 1.

$$(r^*, m^*) = \operatorname*{arg\,max}_{r_j \in R,\ m_k \in M} \mathrm{score}(r_j, m_k) \qquad (1)$$

where $R$ and $M$ are, respectively, the set of dimensionality reduction techniques and the set of classifiers, and $\mathrm{score}(r_j, m_k)$ is the value of the chosen prediction metric for that combination. $r^*$ returns the optimal set of extracted features $f^*$.
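The search in Equation 1 can be illustrated with a minimal sketch that picks the best pair from a table of already measured scores. The values below are a small subset of the precision scores reported in Table 6; the dictionary layout and the helper name `best_combinations` are illustrative assumptions, not part of the original method.

```python
# Precision scores copied from Table 6 for two classifiers and three
# reduction methods. Keys are (reduction method, classifier) pairs.
scores = {
    ("No Red.", "Decision Tree"): 0.93,
    ("PCA", "Decision Tree"): 0.85,
    ("Autoencoder", "Decision Tree"): 0.97,
    ("No Red.", "Random Forest"): 0.92,
    ("PCA", "Random Forest"): 0.90,
    ("Autoencoder", "Random Forest"): 0.88,
}


def best_combinations(score_table):
    """Return every (reducer, classifier) pair that attains the maximum score."""
    best = max(score_table.values())
    return [pair for pair, value in score_table.items() if value == best]


optimal = best_combinations(scores)
print(optimal)
```

Returning a list rather than a single pair keeps ties visible, which matters here because several combinations can share the same score.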

4 METHODOLOGY

To solve Equation (1) and find the optimal subset of features that can be used to fingerprint the IoT devices while keeping the prediction score high, we propose a system composed of several modules, as shown in Figure 1. These modules are described in the following.

4.1 Data Collection

The data preparation phase includes acquiring the captured IoT device data and converting it into an ML-readable format for data preprocessing. The IoT device captures are recorded in pcap files, which we processed using Zeek (formerly Bro) to obtain different log files. The logs produced by Zeek include conn.log, dhcp.log, dns.log, ntp.log, ssh.log, ssl.log, and x509.log. Here we are more interested in the connection logs, as they contain the most important information about the traffic (Gustavsson, 2019). The connection logs are then converted into CSV files, to which we add the labels, which are the names of the devices.

Figure 1: Proposed methodology flowchart.
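The conversion from a Zeek connection log to a labeled CSV file can be sketched as follows. This is only an illustration under stated assumptions: it relies on Zeek's default tab-separated log layout (comment lines starting with `#`, column names on the `#fields` line), the helper names `zeek_log_to_rows` and `rows_to_csv` are hypothetical, and the sample record is hand-written rather than taken from the dataset.

```python
import csv
import io


def zeek_log_to_rows(lines, device_label):
    """Parse a Zeek TSV log and attach a device label to every record."""
    fields = []
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            # The '#fields' header lists the column names, tab-separated.
            fields = line.split("\t")[1:]
        elif line and not line.startswith("#"):
            record = dict(zip(fields, line.split("\t")))
            record["label"] = device_label
            rows.append(record)
    return fields, rows


def rows_to_csv(fields, rows):
    """Serialize the labeled records as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields + ["label"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Tiny hand-written sample in Zeek's TSV layout (not taken from the dataset).
sample = [
    "#separator \\x09",
    "#fields\tts\tuid\tid.orig_h\tid.orig_p\tproto\tduration",
    "#types\ttime\tstring\taddr\tport\tenum\tinterval",
    "1623715200.0\tCabc123\t192.168.0.10\t49152\tudp\t0.05",
]
fields, rows = zeek_log_to_rows(sample, device_label="Aria")
csv_text = rows_to_csv(fields, rows)
print(csv_text)
```

In practice the label would come from whichever device produced the capture, for example the name of the directory holding its pcap files.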

4.2 Data Preprocessing

Data preprocessing comprises data cleaning, label encoding, data shuffling, and data normalization. This process is crucial for the ML algorithm to process datasets efficiently, which in turn increases the accuracy of the ML model. It eliminates defective elements (e.g., missing data, duplicate instances) and outliers (García et al., 2015).

In real-world settings, it is impractical to obtain a flawless dataset due to errors in data acquisition/collection, device limitations, or human errors (Van den Broeck et al., 2005). Therefore, in the data cleaning process, we remove data that do not conform to the expected pattern or are considered unnecessary. The following steps are conducted to cleanse the dataset under consideration:

• Deletion of rows with missing or duplicate values.
• Removal of inconsistent values according to the feature data type, i.e., instances that do not fit the features.
• Removal of columns that have low variance.
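The three cleaning steps above can be sketched with pandas as follows. The column names, the toy records, and the variance threshold of zero are assumptions made for illustration; the paper does not state the threshold it used.

```python
import pandas as pd


def clean_dataset(df, numeric_columns, variance_threshold=0.0):
    """Apply the three cleaning steps to a connection-log DataFrame."""
    df = df.copy()
    # Values that do not fit the expected numeric type become NaN ...
    for col in numeric_columns:
        df[col] = pd.to_numeric(df[col], errors="coerce")
    # ... so that they are dropped together with the missing values.
    df = df.dropna().drop_duplicates()
    # Drop numeric columns whose variance does not exceed the threshold.
    variances = df[numeric_columns].var()
    low_variance = variances[variances <= variance_threshold].index.tolist()
    return df.drop(columns=low_variance).reset_index(drop=True)


# Toy records: one duplicate row, one badly typed value, one missing value,
# and one constant column.
raw = pd.DataFrame({
    "duration": ["0.5", "0.5", "bad", "1.2", None],
    "orig_bytes": [100, 100, 50, 300, 20],
    "constant": [1, 1, 1, 1, 1],
    "label": ["Aria", "Aria", "iKettle", "iKettle", "Aria"],
})
cleaned = clean_dataset(raw, numeric_columns=["duration", "orig_bytes", "constant"])
print(cleaned)
```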

The MinMaxScaler is used to normalize the dataset by shifting and scaling the values of each feature into the range [0, 1]:

$$x_{\mathrm{scaled}} = \frac{x - \min(x)}{\max(x) - \min(x)} \qquad (2)$$

where $\min(x)$ and $\max(x)$ are, respectively, the minimum and maximum values of the feature $x$.
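As a quick check of Equation 2, the sketch below applies the formula by hand and compares it with scikit-learn's `MinMaxScaler`, whose default output range is (0, 1). The small feature matrix is made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy feature matrix: rows are connections, columns are two numeric features.
X = np.array([[10.0, 200.0],
              [20.0, 400.0],
              [40.0, 800.0]])

# Equation 2 applied column by column.
manual = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# scikit-learn's MinMaxScaler with its default feature range.
scaled = MinMaxScaler().fit_transform(X)

print(scaled)
```

Note that the scaler should be fitted on the training portion only when a train/test split is used, so that the test data do not influence the minimum and maximum.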

4.3 Dimensionality Reduction

First, feature selection techniques are applied to discard the non-relevant features and keep only those that are really needed. Then, feature extraction techniques are performed to extract the most relevant components. Feature extraction improves prediction accuracy and computational efficiency by reducing data redundancy (instances having a linear correlation with each other) (Wang et al., 2016). We consider the following linear and non-linear dimensionality reduction techniques.

• Linear Dimensionality Reduction Techniques:
  – Principal Component Analysis (PCA): PCA is selected in this research because it is a linear transformation technique, which allows us to check for linearity between the various features in the dataset (Anowar et al., 2021).
  – Independent Component Analysis (ICA): ICA was also selected because it is a linear transformation technique. Unlike PCA, ICA searches for linear directions in non-Gaussian data such that the resulting components are statistically independent (Comon, 1994).
  – Linear Discriminant Analysis (LDA): LDA was selected because it is both linear and a supervised dimensionality reduction technique (Anowar et al., 2021). It is similar to PCA but, apart from maximizing the data variance, it also maximizes the separation between multiple classes (Reddy et al., 2020). A minimum of 5 components resulted in the highest accuracy.
  – Exploratory Factor Analysis (EFA, abbreviated FA in the tables): Factor Analysis was also chosen because it is a linear transformation technique. EFA only considers common variance, while PCA considers both specific and common variance (Schreiber, 2021). A minimum of 4 components resulted in the highest accuracy.
  – Non-Negative Matrix Factorization (NMF): NMF was also chosen because it is a linear transformation technique. In contrast to many other linear representations, such as PCA and ICA, NMF imposes non-negativity constraints, making the representation purely additive (allowing no subtractions) (Cai et al., 2008).
  – Karhunen-Loeve Transform (KLT): KLT was selected because it is a linear dimensionality reduction technique. It is closely related to PCA, but unlike PCA it looks for a reversible transformation that eliminates redundancy by removing the correlation in a dataset (Rao and Yip, 2018). A minimum of 5 components resulted in the highest accuracy.

• Non-Linear Dimensionality Reduction Techniques:
  – Sparse PCA: Sparse PCA was chosen as a non-linear dimensionality reduction technique to check for non-linearity between some features in the dataset. Compared to PCA, in Sparse PCA the principal components are selected as components with fewer non-zero values in their coefficient vectors. Sparse PCA addresses not only the data interpretation issues associated with PCA but also the inconsistency of the estimated component weights in the high-dimensional setting (Guerra-Urzola et al., 2021).
  – Autoencoder: The Autoencoder was chosen because it is a local non-linear reduction technique, which produces better performance on manifolds where the “local geometry is close to Euclidean” but the “global geometry is probably not” (Silva and Tenenbaum, 2002). In this research, the Autoencoder neural network was implemented in Python using the TensorFlow library.
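To make the reduction step concrete, here is a minimal sketch of the linear case using scikit-learn's `PCA`. The data are random stand-ins for the 14 normalized features, and the number of components (6) is the value reported for PCA in Table 5; the other techniques in the list follow the same fit-then-transform pattern with their own estimators.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for the normalized dataset: 500 instances, 14 features.
rng = np.random.default_rng(seed=0)
X = rng.random((500, 14))

# Keep 6 principal components, the value reported for PCA in Table 5.
pca = PCA(n_components=6)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                      # (500, 6)
print(pca.explained_variance_ratio_.sum())  # share of variance retained
```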

4.4 Data Training and Prediction Analysis

Different supervised ML algorithms are trained on the dimensionality-reduced dataset. Hyper-parameter tuning was adopted for the algorithms used, for example, choosing the best K value for KNN, the maximum depth for the Decision Tree algorithm, the number of trees for the Random Forest classifier, and the number of layers used in generating the bottleneck of the Autoencoder. Using the SKLearn library in Python, the components extracted by the dimensionality reduction algorithms were partitioned 80/20 percent for training and testing.

To prevent the effect of class imbalance among the labels, a stratified train/test split (available in the SKLearn library) is adopted. Each score (precision, recall, and F1-score) was averaged over 10 runs for each machine learning algorithm. These algorithms are as follows: Random Forest (RF), Decision Tree (DT), K-Nearest Neighbors (KNN), Naive Bayes (NBC), and Support Vector Machine (SVM).
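The training procedure described above can be sketched for one classifier as follows. Everything specific here is an assumption made for illustration: the data are synthetic, the tree depth is arbitrary, the ten runs differ only in their random seed, and macro averaging is used because the paper does not state the averaging mode.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the reduced dataset: 5 components, 4 device classes.
X, y = make_classification(n_samples=600, n_features=5, n_informative=4,
                           n_redundant=0, n_classes=4,
                           n_clusters_per_class=1, random_state=0)

run_scores = []
for seed in range(10):
    # 80/20 split; stratify=y keeps the class proportions in both parts.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)
    clf = DecisionTreeClassifier(max_depth=8, random_state=seed)
    clf.fit(X_train, y_train)
    # Macro averaging is an assumption; the paper does not state the mode.
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, clf.predict(X_test), average="macro", zero_division=0)
    run_scores.append((precision, recall, f1))

mean_precision, mean_recall, mean_f1 = np.mean(run_scores, axis=0)
print(mean_precision, mean_recall, mean_f1)
```

In scikit-learn, stratification has to be requested explicitly through the `stratify` argument of `train_test_split`; without it the split is purely random.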

4.5 Prediction and Analysis

The trained models obtained for each ML algorithm and feature extraction method listed earlier are compared to identify the optimal ones that solve Equation (1). That is, the feature extraction method and ML algorithm that simultaneously yield the smallest number of features and the highest accuracy are selected as the optimal ones. The performance metrics used are Precision, Recall, and F1-Score.

5 EXPERIMENTATION

5.1 Experimentation Environment and Dataset

We used a Dell computer with an 11th Gen Intel(R) Core(TM) i9-11900K @ 3.50GHz processor, 64GB of Random Access Memory (RAM), and a 6GB Graphics Processing Unit (GPU) (Nvidia GTX 1660 Ti). We used Zeek, a passive, open-source, Unix-based network traffic analyzer, to process the traffic from the dataset (Miettinen et al., 2017), and MS Excel to clean and label the dataset. The process is run using Jupyter Notebook 6.3.0 on Anaconda 2.1.0 (Perez and Granger, 2015), with libraries including Pandas, NumPy, SKLearn, Matplotlib, and TensorFlow (Pedregosa et al., 2011).

The dataset used in the ML process is real-world data captured from a real IoT network (Miettinen et al., 2017). The latter comprises 31 IoT devices of 27 different types (meaning that four types are represented by two devices each). The different device types are presented in Table 2, and the set of features is listed in Table 3. From this list of 17 features, only 14 are kept after performing feature selection (Time, UID, and History are removed). The schema of the dataset, including the number of instances, is shown in Table 4. We consider all the classification methods listed in Section 4.4. In addition to the eight reduction techniques listed in Section 4.3, we also use a baseline method, called “No Reduction (No Red.)”, for comparison purposes. This method simply takes all 14 features as they are (without any reduction).

Table 2: IoT device types (Miettinen et al., 2017).

No.  Device               No.  Device
1    Aria                 15   HomeMaticPlug
2    D-LinkCam            16   HueBridge
3    D-LinkDayCam         17   HueSwitch
4    D-LinkHomeHub        18   iKettle
5    D-LinkSensor         19   Lightify
6    D-LinkSiren          20   MAXGateway
7    D-LinkSwitch         21   SmarterCoffee
8    D-LinkWaterSensor    22   TP-LinkPlugHS100
9    EdimaxCam            23   TP-LinkPlugHS110
10   EdimaxPlug1101W      24   WeMoInsightSwitch
11   EdimaxPlug2101W      25   WeMoLink
12   EdnetCam             26   WeMoSwitch
13   EdnetGateway         27   Withings
14   D-LinkDoorSensor

Table 3: Features of the dataset used.

No.  Feature              Description
1    Time                 Timestamp
2    UID                  Unique identifier
3    Sender's IP          Sender's IP address
4    Sender's Port        Sender's port number
5    Response IP          Receiver's address
6    Response Port        Receiver's port number
7    Protocol             Transport protocol type (UDP or TCP)
8    Service              Network protocol type (HTTP, DNS, DHCP, SSL, NTP)
9    Duration             How long the connection lasted (3 or 4-way)
10   Sender's Bytes       Number of payload bytes the sender sent
11   Response Bytes       Number of payload bytes the receiver sent
12   Connection State     Possible connection state values
13   History              State history of the connection
14   Sender's Packets     Number of packets the sender sent
15   Sender's IP Bytes    Number of IP-level bytes the sender sent
16   Response Packets     Number of packets the receiver sent
17   Response IP Bytes    Number of IP-level bytes the receiver sent

Table 4: Dataset Schema (Miettinen et al., 2017).

Attribute             Value
Data format           pcap
Number of Devices     27
Number of Features    17
Number of Instances   16,561
Data Size             13.9 MB
Start Date            June 15, 2021
End Date              September 07, 2021

Table 5: The best number of components obtained with the different dimensionality reduction algorithms.

Algorithm              PCA   ICA   LDA   FA    KLT   Sparse PCA  NMF   Autoencoder
Number of Components   6     6     5     4     5     5           6     4
Average Time (sec.)    0.27  0.29  0.32  0.64  0.35  0.32        0.37  0.22

5.2 Feature Extraction

The first row of Table 5 lists the best number of components obtained with each dimensionality reduction technique after tuning, according to the metric used. We observe that the Autoencoder yields the minimum number of components (four components extracted from five features). The next best algorithm is LDA, with five components (extracted from seven features). Sparse PCA comes next with five components (extracted from eight features). For all the remaining reduction techniques, the components are obtained from the full set of features (14). The second row of Table 5 lists the average dimensionality reduction time over three runs. Overall, the processing times are comparable; however, the best one is obtained using the Autoencoder algorithm (0.22 s), and the worst one is obtained by the Factor Analysis algorithm (0.64 s).
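For readers who want to see how an Autoencoder yields a fixed number of components, the following is a minimal Keras sketch. It is not the architecture used in the experiments: the layer sizes, activations, epochs, and the random input data are all illustrative assumptions, and only the bottleneck width (4) is taken from Table 5.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in for the normalized dataset (values in [0, 1]).
rng = np.random.default_rng(seed=0)
X = rng.random((256, 14)).astype("float32")

n_features = X.shape[1]
bottleneck = 4  # number of components reported for the Autoencoder in Table 5

# Encoder and decoder layer sizes are illustrative assumptions.
inputs = tf.keras.Input(shape=(n_features,))
hidden = tf.keras.layers.Dense(8, activation="relu")(inputs)
code = tf.keras.layers.Dense(bottleneck, activation="relu")(hidden)
hidden_out = tf.keras.layers.Dense(8, activation="relu")(code)
outputs = tf.keras.layers.Dense(n_features, activation="sigmoid")(hidden_out)

autoencoder = tf.keras.Model(inputs, outputs)
encoder = tf.keras.Model(inputs, code)

# The network is trained to reconstruct its own input.
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X, X, epochs=5, batch_size=32, verbose=0)

# The bottleneck activations are the extracted components fed to a classifier.
X_encoded = encoder.predict(X, verbose=0)
print(X_encoded.shape)  # (256, 4)
```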

5.3 Performance Results

The results obtained using the Precision metric (percentage of results that are relevant) are summarized in Table 6. We observe that the optimal results (97%) are obtained with the Autoencoder using the DT and XGBoost ML algorithms. Note that the Autoencoder is the reduction technique with the lowest number of components (as shown in Table 5). The second-best results (92%) are obtained with LDA, Sparse PCA, and NMF when Random Forest is used. The third-best result (91%) is obtained with LDA using the KNN ML algorithm.

The results obtained using the Recall metric (percentage of total relevant results), summarized in Table 7, are similar to those for the precision metric. This confirms once again the superiority of the Autoencoder as a dimensionality reduction technique when tested with the DT and XGBoost classifiers. Note that in some cases precision is essentially identical to recall. This means that the number of devices wrongly classified as other devices (false negatives, FN) and the number of devices incorrectly classified as a specific device (false positives, FP) are identical. The results obtained using the F1-Score metric are summarized in Table 8. Unsurprisingly, we reach the same conclusion favoring the Autoencoder when combined with DT or XGBoost. That said, when combined with Naive Bayes, the Autoencoder shows some of the poorest results for the three metrics listed in Tables 6, 7, and 8. This is due to the Naive Bayes algorithm's systematic assumption that features are independent of one another, which does not hold when they are in fact dependent (Rennie et al., 2003; Said et al., 2021).

From Tables 5, 6, 7, and 8, we can deduce that some dimensionality reduction algorithms perform better depending on the classifier used. Indeed, a classifier can perform well depending on the statistical and mathematical notions used in the related dimensionality reduction technique. Also, Random Forest, DT, and XGBoost have competitive results without any feature reduction. It remains to be seen, however, whether the prediction time is affected when using the original set of features (instead of the extracted ones); this is discussed next. Tables 6, 7, and 8 show that the baseline method (classifiers without any reduction, denoted by “No Red.”) often has results that are competitive with those obtained when reduction techniques are used. However, these situations come with expensive prediction time costs, as listed in Table 9. Indeed, thanks to the reduction methods, the prediction time is reduced by up to half while keeping similar metric scores.
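The time savings can be checked directly from Table 9. The short sketch below computes, for each classifier, the fraction of the baseline ("No Red.") running time that remains when the Autoencoder is used; the figures are copied from Table 9, and the choice of the Autoencoder column is only for illustration.

```python
# Running times in seconds copied from Table 9 (baseline vs. Autoencoder).
no_reduction = {
    "Random Forest": 84.643,
    "KNN": 0.812,
    "Naive Bayes": 0.455,
    "Decision Tree": 5.925,
    "XG Boost": 46.913,
}
autoencoder = {
    "Random Forest": 35.937,
    "KNN": 0.470,
    "Naive Bayes": 0.312,
    "Decision Tree": 2.641,
    "XG Boost": 25.908,
}

# Fraction of the baseline running time that remains after reduction.
remaining = {name: autoencoder[name] / no_reduction[name] for name in no_reduction}

for name, ratio in remaining.items():
    print(f"{name}: {ratio:.2f} of the baseline time")
```

For Random Forest and Decision Tree the remaining fraction falls below one half, while the gain is smaller for the classifiers that were already fast.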

6 CONCLUSION

We proposed an efficient solution to fingerprint IoT devices using a set of classifiers alongside dimensionality reduction methods. The latter reduce the number of features to the most relevant ones, improving accuracy while reducing prediction time. The results of the experiments we conducted show that the Autoencoder, along with XGBoost or Decision Tree, is the best combination in terms of performance metrics and prediction time. We plan to investigate other dimensionality reduction techniques (Hosseinabadi et al., 2022) and classifiers such as Transformers and Generative Adversarial Networks.


Table 6: Summary of prediction results, as measured using the Precision metric. The shaded cells show the best measures of each algorithm, while the boldface shaded cells show the optimal measures overall.

Algorithm No Red. PCA ICA LDA FA KLT Sparse PCA NMF Autoencoder

Random Forest 0.92 0.90 0.89 0.92 0.91 0.90 0.92 0.92 0.88

KNN 0.76 0.84 0.86 0.91 0.80 0.84 0.79 0.86 0.86

Naive Bayes 0.35 0.34 0.36 0.42 0.24 0.32 0.30 0.35 0.37

Decision Tree 0.93 0.85 0.87 0.89 0.89 0.88 0.91 0.93 0.97

XG Boost 0.90 0.89 0.90 0.90 0.91 0.90 0.91 0.93 0.97

Table 7: Summary of prediction results, as measured using the Recall metric.

Algorithm No Red. PCA ICA LDA FA KLT Sparse PCA NMF Autoencoder

Random Forest 0.91 0.90 0.89 0.92 0.91 0.90 0.92 0.92 0.88

KNN 0.76 0.83 0.86 0.91 0.80 0.84 0.79 0.86 0.86

Naive Bayes 0.37 0.40 0.42 0.46 0.24 0.40 0.39 0.43 0.38

Decision Tree 0.93 0.88 0.87 0.89 0.89 0.87 0.91 0.92 0.97

XG Boost 0.91 0.88 0.90 0.90 0.91 0.89 0.91 0.92 0.97

Table 8: Summary of prediction results, as measured using F1-Score metric.

Algorithm No Red. PCA ICA LDA FA KLT Sparse PCA NMF Autoencoder

Random Forest 0.91 0.90 0.89 0.92 0.91 0.90 0.92 0.92 0.88

KNN 0.76 0.83 0.86 0.91 0.80 0.84 0.79 0.86 0.86

Naive Bayes 0.35 0.34 0.41 0.20 0.34 0.33 0.39 0.37 0.32

Decision Tree 0.93 0.88 0.87 0.89 0.89 0.87 0.91 0.92 0.97

XG Boost 0.91 0.88 0.90 0.90 0.91 0.89 0.91 0.92 0.97

Table 9: Summary of the running time in seconds corresponding to each predicted result listed in the tables.

Algorithm No Red. PCA ICA LDA FA KLT Sparse PCA NMF Autoencoder

Random Forest 84.643 43.310 38.193 63.128 50.030 32.320 33.321 58.067 35.937

KNN 0.812 0.597 0.409 0.518 0.694 0.412 0.402 0.581 0.470

Naive Bayes 0.455 0.378 0.308 0.347 0.413 0.372 0.351 0.367 0.312

Decision Tree 5.925 3.847 2.989 3.520 4.122 3.005 2.546 3.653 2.641

XG Boost 46.913 29.593 24.934 25.982 38.184 28.973 22.795 30.739 25.908

ACKNOWLEDGEMENT

This research is supported by Mitacs and Ericsson

GAIA Canada Hub.

REFERENCES

AB, E. (2015). Ericsson mobility report, june 2015. Tech-

nical report, Ericsson AB, Tech. Rep. June, 2016.[On-

line]. Available: http://www. ericsson.

Abdulhammed, R., Musafer, H., Alessa, A., Faezipour, M.,

and Abuzneid, A. (2019). Features dimensionality re-

duction approaches for machine learning based net-

work intrusion detection. Electronics, 8(3):322.

Acar, A., Fereidooni, H., Abera, T., Sikder, A. K., Miet-

tinen, M., Aksu, H., Conti, M., Sadeghi, A.-R., and

Uluagac, S. (2020). Peek-a-boo: I see your smart

home activities, even encrypted! In Proceedings of

the 13th ACM Conference on Security and Privacy in

Wireless and Mobile Networks, pages 207–218.

Anowar, F., Sadaoui, S., and Selim, B. (2021). Conceptual and empirical comparison of dimensionality reduction algorithms (PCA, KPCA, LDA, MDS, SVD, LLE, ISOMAP, LE, ICA, t-SNE). Computer Science Review, 40:100378.

Antonakakis, M., April, T., Bailey, M., Bernhard, M., Bursztein, E., Cochran, J., Durumeric, Z., Halderman, J. A., Invernizzi, L., Kallitsis, M., et al. (2017). Understanding the Mirai botnet. In 26th USENIX Security Symposium (USENIX Security 17), pages 1093–1110.

Bratus, S., Cornelius, C., Kotz, D., and Peebles, D. (2008). Active behavioral fingerprinting of wireless devices. In Proceedings of the First ACM Conference on Wireless Network Security, pages 56–61.

Cai, D., He, X., Wu, X., and Han, J. (2008). Non-negative matrix factorization on manifold. In 2008 Eighth IEEE International Conference on Data Mining, pages 63–72. IEEE.

Comon, P. (1994). Independent component analysis, a new concept? Signal Processing, 36(3):287–314.

García, S., Luengo, J., and Herrera, F. (2015). Data preprocessing in data mining, volume 72. Springer.

Guerra-Urzola, R., Van Deun, K., Vera, J. C., and Sijtsma, K. (2021). A guide for sparse PCA: Model comparison and applications. Psychometrika, 86(4):893–919.

Gustavsson, V. (2019). Machine learning for a network-based intrusion detection system: An application using Zeek and the CICIDS2017 dataset.

Hosseinabadi, A. A. R., Sadeghilalimi, M., Shareh, M. B., Mouhoub, M., and Sadaoui, S. (2022). Whale optimization-based prediction for medical diagnostic. In Proceedings of the 14th International Conference on Agents and Artificial Intelligence, ICAART 2022, Volume 3, Online Streaming, February 3-5, 2022, pages 211–217. SCITEPRESS.

Miettinen, M., Marchal, S., Hafeez, I., Asokan, N., Sadeghi, A.-R., and Tarkoma, S. (2017). IoT Sentinel: Automated device-type identification for security enforcement in IoT. In 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), pages 2177–2184. IEEE.

Msadek, N., Soua, R., and Engel, T. (2019). IoT device fingerprinting: Machine learning based encrypted traffic analysis. In 2019 IEEE Wireless Communications and Networking Conference (WCNC), pages 1–8. IEEE.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., et al. (2011). Scikit-learn: Machine learning in Python. The Journal of Machine Learning Research, 12:2825–2830.

Perez, F. and Granger, B. E. (2015). Project Jupyter: Computational narratives as the engine of collaborative data science. Retrieved September, 11(207):108.

Rao, K. R. and Yip, P. C. (2018). The transform and data compression handbook. CRC Press.

Reddy, G. T., Reddy, M. P. K., Lakshmanna, K., Kaluri, R., Rajput, D. S., Srivastava, G., and Baker, T. (2020). Analysis of dimensionality reduction techniques on big data. IEEE Access, 8:54776–54788.

Rennie, J. D., Shih, L., Teevan, J., and Karger, D. R. (2003). Tackling the poor assumptions of naive Bayes text classifiers. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), pages 616–623.

Said, A. B., Mohammed, E. A., and Mouhoub, M. (2021). An implicit learning approach for solving the nurse scheduling problem. In Mantoro, T., Lee, M., Ayu, M. A., Wong, K. W., and Hidayanto, A. N., editors, Neural Information Processing - 28th International Conference, ICONIP 2021, Sanur, Bali, Indonesia, December 8-12, 2021, Proceedings, Part II, volume 13109 of Lecture Notes in Computer Science, pages 145–157. Springer.

Sakurada, M. and Yairi, T. (2014). Anomaly detection using autoencoders with nonlinear dimensionality reduction. In Proceedings of the 2nd Workshop on Machine Learning for Sensory Data Analysis, pages 4–11.

Schreiber, J. B. (2021). Issues and recommendations for exploratory factor analysis and principal component analysis. Research in Social and Administrative Pharmacy, 17(5):1004–1011.

Silva, V. and Tenenbaum, J. (2002). Global versus local methods in nonlinear dimensionality reduction. Advances in Neural Information Processing Systems, 15.

Thangavelu, V., Divakaran, D. M., Sairam, R., Bhunia, S. S., and Gurusamy, M. (2018). DEFT: A distributed IoT fingerprinting technique. IEEE Internet of Things Journal, 6(1):940–952.

Van den Broeck, J., Argeseanu Cunningham, S., Eeckels, R., and Herbst, K. (2005). Data cleaning: detecting, diagnosing, and editing data abnormalities. PLoS Medicine, 2(10):e267.

Wang, Y., Yao, H., and Zhao, S. (2016). Auto-encoder based dimensionality reduction. Neurocomputing, 184:232–242.

Yulianto, A., Sukarno, P., and Suwastika, N. A. (2019). Improving AdaBoost-based intrusion detection system (IDS) performance on CIC IDS 2017 dataset. In Journal of Physics: Conference Series, volume 1192, page 012018. IOP Publishing.

Zhang, S., Wang, Z., Yang, J., Bai, D., Li, F., Li, Z., Wu, J., and Liu, X. (2021). Unsupervised IoT fingerprinting method via variational auto-encoder and k-means. In ICC 2021-IEEE International Conference on Communications, pages 1–6. IEEE.
