EVO 2 Pro.
The contribution of this paper is to provide an alternative approach to building an audio-based payload classifier for drones using feature extraction. We also built a small drone audio database that will be made publicly available. We used the same ML model structures across different feature extraction settings, and the results showed that combinations of features perform better than individual ones. The rest of this paper is organized as follows. Section 2 reviews current sound detection methods for drone classification and for payload classification. Section 3 presents the proposed methodology for the feature extraction methods and three different ML models. Section 4 describes our experiments and results. Lastly, Section 5 presents the conclusion and future work.
2 LITERATURE REVIEW
2.1 Sound Recognition Solution
A variety of methods have been proposed to detect drones using sound (Mezei et al., 2015; Jeon et al., 2017; Fagiani, 2021; Kim et al., 2017). The rotation of a drone's rotor blades creates an audible signature that can be sensed and recorded, even within the range of human hearing, but the question is whether and how these signatures can be distinguished from other sounds. In one study, two different methods for drone sound detection were examined: mathematical correlation and audio fingerprinting (Mezei et al., 2015).
In the first case, the researchers employed a method similar to the way global positioning systems (GPS) work. To apply this methodology to the sound of drones, the researchers created a library of sounds by taking audio recordings of a lawnmower, a hair dryer, music, a model airplane, and two drones. The sounds were decomposed to isolate their individual components, and the samples were then compared using two correlation techniques: Pearson's correlation coefficient and normalized maximum correlation. The researchers demonstrated that the correlation techniques worked, with the system correctly distinguishing drone sounds from other sounds at 65.6% and 77.9% accuracy, respectively. For reference, the drone sounds were recorded at a distance of approximately 3 meters or less in a relatively soundproof room.
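Mezei et al. (2015) do not publish their implementation; the following Python sketch merely illustrates the two correlation measures named above applied to raw waveforms, with hypothetical function names and a hypothetical template library.

import numpy as np

def pearson_score(sample, template):
    # Pearson's correlation coefficient between two equal-length signals.
    n = min(len(sample), len(template))
    return float(np.corrcoef(sample[:n], template[:n])[0, 1])

def normalized_max_correlation(sample, template):
    # Peak of the full cross-correlation, normalized by the signal
    # energies so the score is shift-invariant and bounded by 1.
    xcorr = np.correlate(sample, template, mode="full")
    norm = np.sqrt(np.sum(sample ** 2) * np.sum(template ** 2))
    return float(np.max(xcorr) / norm)

def classify(sample, library):
    # Label the sample with the library entry it correlates with most,
    # e.g. library = {"drone": wav1, "lawnmower": wav2, ...}.
    return max(library, key=lambda k: normalized_max_correlation(sample, library[k]))

A decision threshold on the winning score would then separate "drone" from "not drone" rather than forcing a label onto every input.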
In the second case, the researchers employed a technique called audio fingerprinting, essentially the algorithm behind the popular mobile application Shazam. Shazam lets the user record a short audio sample of a song playing nearby using the mobile device's built-in microphone and matches it against a database of known tracks. To simulate this capability, the researchers used an open-source tool called MusicG from GitHub. They then recorded samples of drone sounds, this time within a distance of 1 meter and again in a sound-controlled room. Overall, the researchers found that both methods achieved acceptable results, and their future work aims to overcome limitations regarding equipment quality and the distance between subject and microphone.
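MusicG is a Java library whose internals the paper does not describe; as a language-neutral illustration only, the sketch below shows the core idea behind Shazam-style fingerprinting: reduce each clip to its strongest spectral peaks and score the overlap. This simplified version assumes time-aligned clips; production fingerprinting systems instead hash pairs of peaks with their time offsets to gain shift invariance.

import numpy as np
from scipy.signal import spectrogram

def fingerprint(audio, fs, peaks_per_frame=3, nperseg=1024):
    # Reduce a clip to a set of (frequency-bin, frame) spectral peaks.
    # Matching clips share many of the same peaks even under moderate noise.
    _, _, sxx = spectrogram(audio, fs=fs, nperseg=nperseg)
    peaks = set()
    for t in range(sxx.shape[1]):
        loudest = np.argsort(sxx[:, t])[-peaks_per_frame:]  # strongest bins in frame t
        peaks.update((int(f), t) for f in loudest)
    return peaks

def similarity(fp_a, fp_b):
    # Jaccard overlap of two fingerprints; higher means a likelier match.
    return len(fp_a & fp_b) / max(1, len(fp_a | fp_b))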
Another promising drone sound detection study was conducted by Jeon et al. (2017) using Mel-frequency cepstral coefficients (MFCC) with a Gaussian mixture model (GMM) and two types of deep neural networks (DNN): a convolutional neural network (CNN) and a recurrent neural network (RNN). The unique aspect of this research was its emphasis on polyphonic sound data from real-life environments; in other words, the focus was on identifying and classifying drone sounds against a diverse background of competing environmental noises. One significant challenge the research team faced was the paucity of publicly available drone sound data. To remedy this problem, the team implemented a novel technique of synthesizing tracks of drone sounds with tracks of background noise to create coherent audio clips. The sample drone sounds were generated from a DJI Phantom 3 and a Phantom 4, with background noise of people talking, car traffic, and airplanes. The drone sounds were recorded at distances of 30 m, 70 m, and 150 m while both hovering and approaching. Overall, the RNN achieved the best performance on 240 ms of audio input, with F-scores of 0.8809 (RNN), 0.6451 (CNN), and 0.5232 (GMM). Precision and recall were also highest for the RNN, at 0.7953 and 0.8066, respectively.
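The exact pipeline of Jeon et al. is not reproduced here; the Python sketch below (using librosa, with hypothetical function names and parameter values) only illustrates the two ingredients described above: overlaying a drone clip on background noise at a chosen signal-to-noise ratio, and extracting an MFCC matrix from the resulting mix.

import numpy as np
import librosa

def mix_at_snr(drone, noise, snr_db):
    # Overlay a drone clip on background noise, scaling the noise so the
    # mix has the requested signal-to-noise ratio in decibels.
    noise = noise[:len(drone)]
    p_drone = np.mean(drone ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_drone / (p_noise * 10 ** (snr_db / 10.0)))
    return drone + scale * noise

def mfcc_features(clip, sr, n_mfcc=20):
    # MFCC matrix of shape (n_mfcc, frames); frame stacks of such
    # features are what a GMM, CNN, or RNN classifier would consume.
    return librosa.feature.mfcc(y=clip, sr=sr, n_mfcc=n_mfcc)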
In (Kim et al., 2017), the researchers sought to develop a real-time drone detection and analysis system using sound data from DJI Phantom 1 and 2 drones and environmental noise from a European football stadium. Two different machine learning algorithms were employed: the plotted image machine learning (PIL) technique achieved 83% accuracy, and K-nearest neighbor (KNN) achieved 61% accuracy. These self-learning techniques also improved detection efficiency. The downsides of PIL are that it requires large data sets and tends to introduce bias into the results; KNN, although a fast and simple approach, has difficulty distinguishing between similar but distinct drone targets. The study's intent to produce a general UAV detection system