EVO 2 Pro.
The contribution of this paper is to provide an alternative approach to building an audio-based payload classifier for drones using feature extraction. We also built a small drone audio database that will be made publicly available. We used the same ML model structures across different feature extraction settings, and the results showed that combinations of features perform better than individual ones. The rest of this paper is organized as follows. Section 2 reviews current sound detection methods for drone classification and for payload classification. Section 3 presents the proposed methodology for the feature extraction methods and three different ML models. Section 4 describes our experiments and results. Lastly, Section 5 presents the conclusion and future work.
2 LITERATURE REVIEW
2.1 Sound Recognition Solution
A variety of methods have been proposed to detect drones using sound (Mezei et al., 2015; Jeon et al., 2017; Fagiani, 2021; Kim et al., 2017). The rotation of a drone's rotor blades creates an audible signature that can be sensed and recorded, even within the range of human hearing, but the question is whether and how these signatures can be distinguished from other sounds. In one study, two different methods for drone sound detection were examined: mathematical correlation and audio fingerprinting (Mezei et al., 2015).
In the first case, the researchers employed a method similar to the way global positioning systems (GPS) work. To apply this methodology to the sound of drones, the researchers created a library of sounds by taking audio recordings of a lawnmower, a hair dryer, music, a model airplane, and two drones. The sounds were decomposed to isolate their individual components, and the samples were then compared using two correlation techniques: Pearson's correlation coefficient and normalized maximum correlation. The researchers demonstrated that the correlation techniques worked, with the system correctly distinguishing drone sounds from other sounds at 65.6% and 77.9% accuracy, respectively. For reference, the drone sounds were recorded at a distance of approximately 3 meters or less in a relatively soundproof room.
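Mezei et al. (2015) do not publish their implementation; the following Python sketch merely illustrates the two correlation measures named above applied to raw waveforms, with hypothetical function names and a hypothetical template library.

import numpy as np

def pearson_score(sample, template):
    # Pearson's correlation coefficient between two equal-length signals.
    n = min(len(sample), len(template))
    return float(np.corrcoef(sample[:n], template[:n])[0, 1])

def normalized_max_correlation(sample, template):
    # Peak of the full cross-correlation, normalized by the signal
    # energies so the score is shift-invariant and bounded by 1.
    xcorr = np.correlate(sample, template, mode="full")
    norm = np.sqrt(np.sum(sample ** 2) * np.sum(template ** 2))
    return float(np.max(xcorr) / norm)

def classify(sample, library):
    # Label the sample with the library entry it correlates with most,
    # e.g. library = {"drone": wav1, "lawnmower": wav2, ...}.
    return max(library, key=lambda k: normalized_max_correlation(sample, library[k]))

A decision threshold on the winning score would then separate "drone" from "not drone" rather than forcing a label onto every input.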
In the second case, the researchers employed a technique called audio fingerprinting, essentially the algorithm behind the popular mobile application Shazam. Shazam lets the user record a short audio sample of a song playing nearby using the mobile device's built-in microphone and matches it against a database of known tracks. To simulate this capability, the researchers used an open-source tool called MusicG from GitHub. They then recorded samples of drone sounds, this time within a distance of 1 meter and again in a sound-controlled room. Overall, the researchers found that both methods achieved acceptable results, and their future work aims to overcome limitations regarding equipment quality and the distance between subject and microphone.
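MusicG is a Java library whose internals the paper does not describe; as a language-neutral illustration only, the sketch below shows the core idea behind Shazam-style fingerprinting: reduce each clip to its strongest spectral peaks and score the overlap. This simplified version assumes time-aligned clips; production fingerprinting systems instead hash pairs of peaks with their time offsets to gain shift invariance.

import numpy as np
from scipy.signal import spectrogram

def fingerprint(audio, fs, peaks_per_frame=3, nperseg=1024):
    # Reduce a clip to a set of (frequency-bin, frame) spectral peaks.
    # Matching clips share many of the same peaks even under moderate noise.
    _, _, sxx = spectrogram(audio, fs=fs, nperseg=nperseg)
    peaks = set()
    for t in range(sxx.shape[1]):
        loudest = np.argsort(sxx[:, t])[-peaks_per_frame:]  # strongest bins in frame t
        peaks.update((int(f), t) for f in loudest)
    return peaks

def similarity(fp_a, fp_b):
    # Jaccard overlap of two fingerprints; higher means a likelier match.
    return len(fp_a & fp_b) / max(1, len(fp_a | fp_b))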
Another promising drone sound detection study was conducted by Jeon et al. (2017) using Mel-frequency cepstral coefficients (MFCC) with a Gaussian mixture model (GMM) and two types of deep neural networks (DNN): a convolutional neural network (CNN) and a recurrent neural network (RNN). The unique aspect of this research was its emphasis on polyphonic sound data from real-life environments; in other words, the focus was on identifying and classifying drone sounds against a diverse background of competing environmental noises. One significant challenge the research team faced was the paucity of publicly available drone sound data. To remedy this problem, the team implemented a novel technique of synthesizing tracks of drone sounds with tracks of background noise to create coherent audio clips. The sample drone sounds were generated from a DJI Phantom 3 and a Phantom 4, with background noise of people talking, car traffic, and airplanes. The drone sounds were recorded at distances of 30 m, 70 m, and 150 m while both hovering and approaching. Overall, the RNN achieved the best performance on 240 ms of audio input, with F-scores of 0.8809 (RNN), 0.6451 (CNN), and 0.5232 (GMM). Precision and recall were also highest for the RNN, at 0.7953 and 0.8066, respectively.
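The exact pipeline of Jeon et al. is not reproduced here; the Python sketch below (using librosa, with hypothetical function names and parameter values) only illustrates the two ingredients described above: overlaying a drone clip on background noise at a chosen signal-to-noise ratio, and extracting an MFCC matrix from the resulting mix.

import numpy as np
import librosa

def mix_at_snr(drone, noise, snr_db):
    # Overlay a drone clip on background noise, scaling the noise so the
    # mix has the requested signal-to-noise ratio in decibels.
    noise = noise[:len(drone)]
    p_drone = np.mean(drone ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_drone / (p_noise * 10 ** (snr_db / 10.0)))
    return drone + scale * noise

def mfcc_features(clip, sr, n_mfcc=20):
    # MFCC matrix of shape (n_mfcc, frames); frame stacks of such
    # features are what a GMM, CNN, or RNN classifier would consume.
    return librosa.feature.mfcc(y=clip, sr=sr, n_mfcc=n_mfcc)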
In (Kim et al., 2017), the researchers sought to develop a real-time drone detection and analysis system using sound data from DJI Phantom 1 and 2 drones and environmental noise from a European football stadium. Two different machine learning algorithms were employed: the plotted image machine learning (PIL) technique achieved 83% accuracy, and K-nearest neighbor (KNN) achieved 61% accuracy. These self-learning techniques also improved detection efficiency. The downsides of PIL are that it requires large data sets and tends to introduce bias into the results; KNN, although a fast and simple approach, has difficulty distinguishing between similar but distinct drone targets. The study's intent to produce a general UAV detection system