Aerial Fire Image Synthesis and Detection
Sandro Campos and Daniel Castro Silva
Faculty of Engineering of the University of Porto, Artificial Intelligence and Computer Science Laboratory,
Rua Dr. Roberto Frias s/n, 4200-465 Porto, Portugal
Keywords:
Fire Detection, Unmanned Aerial Vehicle, Convolutional Neural Network, Data Imbalance, Data
Augmentation, Generative Adversarial Network, Multi-agent System.
Abstract:
Unmanned Aerial Vehicles appear as efficient platforms for fire detection and monitoring due to their low
cost and flexibility features. Detecting flames and smoke from above is performed visually or by employ-
ing onboard temperature and gas concentration sensors. However, approaches based on computer vision and
machine learning techniques have identified a pertinent problem of class imbalance in the fire image domain,
which hinders detection performance. To represent fires visually and in an automated fashion, a residual neu-
ral network generator based on CycleGAN is implemented to perform unpaired image-to-image translation
of non-fire images obtained from Bing Maps to the fire domain. Additionally, the adaptation of ERNet, a
lightweight disaster classification network trained on the real fire domain, enables simulated aircraft to carry
out fire detection along their trajectories. We do so under an environment comprised of a multi-agent dis-
tributed platform for aircraft and environmental disturbances, which helps tackle the previous inconvenience
by accelerating artificial aerial fire imagery acquisition. The generator was tested quantitatively, using the Fréchet Inception Distance metric, and qualitatively, by resorting to the opinion of 122 subjects. The images were considered di-
verse and of good quality, particularly for the forest and urban scenarios, and their anomalies were highlighted
to identify further improvements. The detector performance was evaluated in interaction with the simulation
platform. It was proven to be compatible with real-time requirements, processing detection requests at around
100 ms, reaching an accuracy of 90.2% and a false positive rate of 4.5%.
1 INTRODUCTION
The extreme environmental conditions increasingly
promoted by climate change make it particularly
likely for natural disaster phenomena to occur each
year. The especially vulnerable sub-tropical climate
of the Mediterranean basin, for example, is starting to show a trend of abnormally long and powerful fire seasons (Turco et al., 2019). Consequently,
Southern European countries such as Portugal, Spain,
Italy and Greece have been frequently ravaged by un-
controlled and disproportionate fires that leave trails of destruction behind (PORDATA, 2020). Beyond fires, storms, droughts, and floods are amongst the many disasters that unfortunately take place; in fact, most deaths and damage are related to the latter three (WMO, 2021). According to the United Nations, weather-related disasters have surged five-fold in just 50 years, hitting poorer countries the hardest. All of these are reminders of worldwide
concerns that require closer coordination of means
and agile mechanisms for both controlling and pre-
venting them. This work aims to stimulate the use of
fire imaging techniques to improve aerial fire detec-
tion, while also considering a possible expansion to other scenarios.
The pertinence of studying these natural disasters
has driven scientists to develop simulation tools ca-
pable of managing vehicles under coordinated mis-
sions. The Platform is an example of such a tool, and
recent developments have allowed aerial vehicles to
assess fire propagation by means of sensor readings
(Almeida, 2017; Damasceno, 2020); however, the
potential of performing disaster control using aerial
imagery is still unexplored.
Our work aims to fill this gap by fo-
cusing on the development of an external module to
enable the creation of a pipeline for synthetic fire gen-
eration and detection using aerial imagery. It primar-
ily tackles the following tasks:
1. Generation of synthetic flames and smoke on
aerial images captured by the simulated aircraft;
2. Adaptation of a lightweight model to detect fire in
the generated images, in a real-time scenario.
This work follows a recent and growing trend of
producing synthetic data to train highly complex ma-
chine learning models (Tripathi et al., 2019), with
applications to domains such as autonomous driving
(Hollosi and Ballagi, 2019), product identification in
warehouses (Wong et al., 2019), and even fire detec-
tion (Park et al., 2020). This line of thought advocates the generation of large and diverse datasets, which typically mix real samples with synthetic ones to help mitigate the data imbalance faced by
most prediction problems. The models constructed
by this technique are then of use in the real domain
with improved results. In this work, we address the
problem of fire detection using imagery. The fire
images generated by our model are assessed accord-
ing to their degree of realism, both quantitatively and
qualitatively, and shown to be of value when measured against the standards of demonstrably good real-fire detectors.
The remainder of this document is structured as
follows. Section 2 provides a literature review of im-
age generation and classification techniques, partic-
ularly adapted to the fire domain. Sections 3 and 4
present more detailed information about the proposed
solution and its implementation, respectively. Section
5 describes the mechanisms used to validate the qual-
ity of the generated images and the performance of
the fire detector. Finally, Section 6 gathers relevant
conclusions and future work topics.
2 STATE OF THE ART
Three main strategies are primarily considered when
one intends to automate the synthetic image genera-
tion necessary for the simulation of an onboard cam-
era. The first one resorts to image rendering based
on CAD (Computer-Aided Design) models, the sec-
ond to compositing techniques, and the third to state-
of-the-art Generative Adversarial Networks (GANs).
More recently, deep-learning-based image inpainting
has also proved to produce realistic features in im-
agery, especially when the context of their surround-
ing environment is considered.
Real-time optical fire detection approaches also
leverage the power of deep learning models. Pre-
trained with extensive and diverse sets of aerial im-
ages, these models have become competitive even when deployed to devices with low computational resources.
2.1 Computer-Aided Design
Computer Graphics Software (CGS) has recently grown in popularity with the spread of computational power, as more capable Graphics Processing Units (GPUs) emerge. Software tools such as Autodesk 3ds Max¹, Blender² and Unity 3D³ enable the manipulation of CAD models,
three-dimensional polygonal meshes representative of
objects, and provide a suitable environment for creat-
ing virtual scenes. These applications often include
rendering engines responsible for encoding the world
information into a synthetic image, and scripts can be developed to perform batch generation. Figure 1 illustrates a fire simulation attempt using Corona Renderer⁴ for Autodesk 3ds Max.
Figure 1: Three-dimensional fire rendering using Corona
for Autodesk 3ds Max (MographPlus, 2018).
Approaches based on rendering of CAD models,
despite being proven to produce realistic high-quality
images, require heavy computational resources and
are very dependent on human intervention for image
customization and rendering (Arcidiacono, 2018).
2.2 Image Compositing
Another technique, known as image compositing,
consists of extracting foreground objects from images
and pasting them on new backgrounds (Rother et al.,
2004). In comparison to image rendering, it is less demanding in terms of ensuring global image quality and local consistency.
Driven by the lack of annotated images, Dwibedi
et al. presented Cut, Paste and Learn, in 2017,
a method to generate synthetic images by applying
this concept while focusing on patch-level realism
(Dwibedi et al., 2017). This advancement was significant, considering that placing features over image back-
grounds may create pixel artifacts at a local level,
which, when propagated into the neural classifier,
may induce it to ignore the introduced features, fail-
ing their detection.
1 Available at: https://www.autodesk.com/products/3ds-max/overview
2 Available at: https://www.blender.org/
3 Available at: https://unity.com/
4 Available at: https://corona-renderer.com/
The literature does not provide much insight into
fire image synthesis using image compositing tech-
niques. This can be explained by the difficulty of
segmenting flames and smoke and obtaining viable
masks for accurate overlapping.
2.3 Generative Neural Networks
First proposed by Ian Goodfellow, generative adver-
sarial networks are deep generative models consist-
ing of two deep neural networks, a generator G and
a discriminator D, opposing each other in a min-max
zero-sum game (Goodfellow et al., 2014). The gen-
erator is responsible for creating synthetic data from a latent variable z sampled from a prior distribution p_z(z), while the discriminator evaluates whether data is real or fake, in comparison with real samples from the same domain. The two
networks are connected, considering the output of the
generator is, along with the real dataset, provided as
input to the discriminator. GANs are trained until convergence is reached, with the generator's data creation skills improving and the discriminator becoming increasingly unable to detect the forged imagery. Figure 2
portrays a schematic representation of a GAN.
Figure 2: GANs are comprised of generator and discrim-
inator networks. The generator produces fake data for a
target domain. The discriminator provides feedback on its
outcome by comparing it to real training data (Silva, 2017).
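As an illustration of this adversarial training scheme, the minimal PyTorch sketch below shows one optimisation step for a toy generator and discriminator; the layer sizes and hyperparameters are illustrative assumptions, not the configuration used in this work.

```python
import torch
import torch.nn as nn

# Toy generator and discriminator; layer sizes are illustrative only.
G = nn.Sequential(nn.Linear(100, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1), nn.Sigmoid())

bce = nn.BCELoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))

def train_step(real):
    """One adversarial update; real is a (batch, 784) tensor of real samples."""
    batch = real.size(0)
    z = torch.randn(batch, 100)                      # latent sample z ~ p_z(z)
    fake = G(z)

    # Discriminator step: push D(real) towards 1 and D(fake) towards 0.
    opt_d.zero_grad()
    loss_d = bce(D(real), torch.ones(batch, 1)) + bce(D(fake.detach()), torch.zeros(batch, 1))
    loss_d.backward()
    opt_d.step()

    # Generator step: fool the discriminator, i.e. push D(G(z)) towards 1.
    opt_g.zero_grad()
    loss_g = bce(D(fake), torch.ones(batch, 1))
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()
```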
Unpaired Image-to-Image Translation using
Cycle-Consistent Adversarial Networks (Cycle-
GANs), presented in 2017, is a type of generative
neural network which enables the construction of
two bijective mappings, inverses of one another,
between two image domains (Zhu et al., 2017). Data
augmentation techniques using CycleGANs have
appeared in domains where imagery is difficult or
expensive to acquire, as is the case of fire detection
(Park et al., 2020).
Park et al. identified the problem of class imbal-
ance in the wildfire detection domain and presented
a solution based on synthetic fire image generation
(Park et al., 2020), employing CycleGANs (Zhu et al.,
2017) and DenseNets (Densely Connected Convolu-
tional Network) (Huang et al., 2017). CycleGANs
enable the creation of fire images from previously collected non-fire images, by allowing domain conversion and the respective introduction of fire visual features. Cycle consistency and identity mapping losses are considered to prevent the model from performing unintended changes of shape and color to the original image backgrounds, thus preserving them. This procedure allowed for a better balance between image classes to be fed into the neural network: the share of wildfire images increased from 43% to over 49%, and the total number of images in the dataset almost doubled. Figure 3 presents a sample of
wildfire images generated by this approach.
Figure 3: The images of mountains in the top row are suc-
cessfully translated to the fire domain using a CycleGAN,
with the respective results portrayed in the bottom row (Park
et al., 2020).
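For reference, the objective optimised in (Zhu et al., 2017) combines the adversarial losses of both mappings G: X→Y and F: Y→X with the cycle consistency term, to which the identity term can be added; λ and λ_id are weighting hyperparameters:

```latex
\mathcal{L}_{GAN}(G, D_Y) = \mathbb{E}_{y \sim p_{data}(y)}[\log D_Y(y)]
                          + \mathbb{E}_{x \sim p_{data}(x)}[\log(1 - D_Y(G(x)))]

\mathcal{L}_{cyc}(G, F)   = \mathbb{E}_{x}[\lVert F(G(x)) - x \rVert_1]
                          + \mathbb{E}_{y}[\lVert G(F(y)) - y \rVert_1]

\mathcal{L}_{identity}(G, F) = \mathbb{E}_{y}[\lVert G(y) - y \rVert_1]
                             + \mathbb{E}_{x}[\lVert F(x) - x \rVert_1]

\mathcal{L}(G, F, D_X, D_Y) = \mathcal{L}_{GAN}(G, D_Y) + \mathcal{L}_{GAN}(F, D_X)
                            + \lambda\,\mathcal{L}_{cyc}(G, F)
                            + \lambda_{id}\,\mathcal{L}_{identity}(G, F)
```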
2.4 Image Inpainting
Neural networks have also been applied to image in-
painting, the process which focuses on restoring de-
teriorated images. It can include filling missing parts,
repairing casual damage and removing unintended ar-
tifacts such as noise, scratches and other distortions
(El Harrouss et al., 2020). These techniques aim to
leave no trace of reconstruction to increase image re-
alism and make tampering as undetectable as possi-
ble. As a consequence, they are also considered for
introducing new features into imagery.
Liu et al. proposed a novel approach with a gen-
eration phase subdivided into rough and refinement
sub-networks combined with a feature patch discrim-
inator, as seen in Fig. 4.
Figure 4: Architecture of the Coherent Semantic Attention
network. The first sub-network creates rough pixel predic-
tions while the second refines them to obtain better correla-
tion of pixels between patches (Liu et al., 2019).
The rough network predicts initial rough features
for unknown patches based on known neighbour-
ing regions, promoting global semantic consistency.
These are afterwards refined, in the sub-network
where an auxiliary coherent semantic attention (CSA)
layer is included. It allows generated patches to have
a better correlation with neighbouring patches of the
same unknown region, largely increasing coherency
between pixels at a local level. This layer is located at
resolution 32x32, as it appears to optimize model per-
formance and needed computing requirements. Mov-
ing the layer to shallower positions may cause loss of
information and increase the computational overhead
due to the operations being performed at higher reso-
lutions, while shifting it to deeper positions improves execution times at the expense of image quality. A pretrained VGG-16 network (Simonyan and Zisserman, 2015) is also used to extract features from the original images, introducing them as input to down-sampling layers of the refinement network to speed up and optimize feature generation.
2.5 Optical Fire Detection
Many fire detection approaches using UAVs still rely on communication with ground stations for data
processing. These stations are usually equipped with
high-end computing hardware capable of executing
the heaviest of prediction models. However, in re-
ality, UAVs performing disaster control missions are subject to very limited visibility and con-
nectivity. As a consequence, scientists are encour-
aged to pursue the development of self-contained,
fully autonomous embedded systems for fire detec-
tion based on lightweight implementations of state-
of-the-art deep learning methods.
Kyrkou and Theocharides developed a custom
CNN (Convolutional Neural Network) architecture
named ERNet, for emergency response and disaster
management, highlighted in this work (Kyrkou and
Theocharides, 2019). Their approach opposes that of many techniques which adapt pre-trained networks, such as ResNet-50 (Residual Neural Network) (He et al., 2016) and VGG-16, in a process of transfer learning for image classification that resorts to high-performance GPUs. They limited the number of filters applied in order to speed up computations and reduced the parameter size to fit the scarce memory available onboard such vehicles. Residual
connections on the computational blocks were also
useful to improve model accuracy, while not hinder-
ing performance significantly.
This network was trained to classify disasters ac-
cording to 4 different incident types, one of which is fire. AIDER (Aerial Image Dataset for Emer-
gency Response) is the augmented dataset created for
this purpose. The detector achieved a mean accuracy
of 90% at 53 FPS (Frames Per Second) and it con-
sumed no more than 300 KB of memory, allowing for
onboard real-time detection and on-chip storage.
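A minimal PyTorch sketch of the design principles described above (few filters, residual connections, global pooling) is given below; it illustrates the idea only and is not the published ERNet architecture, so all layer sizes and the class count are assumptions.

```python
import torch
import torch.nn as nn

class SlimResidualBlock(nn.Module):
    """Residual block with a deliberately small filter count (illustrative only)."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        # Skip connection helps accuracy while keeping the filter count low.
        return self.act(x + self.body(x))

class TinyDisasterNet(nn.Module):
    """Few filters plus global pooling keep the parameter count small."""
    def __init__(self, num_classes: int = 4):   # number of incident classes; illustrative
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.blocks = nn.Sequential(SlimResidualBlock(16), nn.MaxPool2d(2), SlimResidualBlock(16))
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, num_classes))

    def forward(self, x):
        return self.head(self.blocks(self.stem(x)))
```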
3 PROPOSED SOLUTION
The proposed solution comprises the development of
an external service, designated Fire Module, capable
of interacting with vehicles to perform fire generation
and detection in images captured from an aerial per-
spective. In a real environment, where the module is embedded in the firmware of actual aircraft equipped with onboard cameras for image acquisition on assessment missions, it is most helpful for performing detection. In a
simulated environment, however, the solution consid-
ers the existence of a service that simulates said cam-
eras by generating aerial imagery of a specific type of
disturbance. Consequently, one may require an addi-
tional module to simulate the actual behaviour of the
disturbance over the terrain.
We integrate this architecture within The Plat-
form, a multi-agent distributed system that provides
a simulation environment based on Microsoft Flight
Simulator X (FSX) for fleets of autonomous, hetero-
geneous vehicles. These vehicles intervene in the pro-
cess of assessing disturbances in missions that range
from pollution source identification to fire detection in
outdoor environments (Silva, 2011). Figure 5 depicts
the relevant components of The Platform’s architec-
ture for this work and their respective relationships.
Figure 5: Relevant entities of The Platform and their respec-
tive interactions. The Disturbances Manager generates dis-
turbances that affect the simulation environment where ve-
hicles perform missions. The External Fire Simulator helps
to reproduce the realistic behaviour of fire. (adapted from
(Damasceno, 2020)).
The Vehicle Agent is responsible for the simula-
tion of an aircraft in FSX, enabling, for instance, nav-
igation control (Silva, 2011). The Disturbances Man-
ager (DM) creates and manages all disturbances in the
simulation environment. ForeFire intervenes as an ex-
ternal disturbance simulator that more accurately and
realistically simulates the fire spread behaviour over
the terrain (Filippi et al., 2014).
The solution therefore congregates five interact-
ing entities: the previously existing Vehicle Agents,
DM and ForeFire, and the new Fire Module, com-
prised of a Camera Simulator and a Fire Detector, and
the Maps API (Application Programming Interface),
an external aerial tiles provider. The container dia-
gram of Fig. 6 depicts the integration of the micro-
service within this simulation platform, the most rel-
evant components and their relationships.
Figure 6: Container diagram depicting the Fire Module
within the system and its respective dependencies.
The Fire Module has been designed as a micro-
service based on a RESTful architecture, as it shall
provide independent, loosely coupled features to sev-
eral vehicle instances cooperating simultaneously.
This approach aims to increase modularity, increase
resilience to faults, and ease the deployment process
for devices on the edge. This decision promotes better isolation of concerns and the integration of solutions using diverse technologies, valuing flexibility and, most importantly, the system's scalability. The module is synchronised with the DM and therefore portrays the disturbances consistently for all vehicles participating in the same mission, promoting better management of resources.
When a Vehicle Agent is performing a fire assess-
ment mission, it connects to the Camera Simulator
and requests the aerial image of its own point of view.
This image, originally collected from the aerial tiles
provider (ATP), at the vehicle’s position, may or may
not contain flames and smoke, depending on both the
distance of the vehicle to the fire area and the orien-
tation of the camera in relation to it. For that, the
fires’ location is always provided to the Camera Sim-
ulator by the DM. In case the simulated camera does
not capture the fire, the image returned to the vehi-
cle corresponds to the tile just as it was obtained from
its provider, meaning it does not undergo any change.
Otherwise, a residual neural network trained using CycleGAN performs non-fire to fire image domain translation on demand and returns the synthetic image with the fire features in place. The
Vehicle Agent then provides the received image to the
lightweight neural network of the module’s Fire De-
tector which performs binary classification on the re-
ceived aerial tile and evaluates the presence of fire.
The aircraft requests images of the simulated camera
for every position of its trajectory and repeats this pro-
cess without ever knowing the ground truth of the fire
detection problem.
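A sketch of this request loop, seen from the Vehicle Agent's side, could look as follows; the endpoint parameters, the deployment address and the response field name are assumptions made for illustration, since the actual interface is defined by the Fire Module.

```python
import requests

FIRE_MODULE = "http://localhost:8000"   # hypothetical deployment address of the Fire Module

def assess_position(lat: float, lon: float, heading: float) -> bool:
    """One iteration of the loop: fetch the simulated camera tile, then query the detector."""
    # Hypothetical query parameters describing the vehicle's pose.
    cam = requests.get(f"{FIRE_MODULE}/camera",
                       params={"lat": lat, "lon": lon, "heading": heading}, timeout=10)
    cam.raise_for_status()               # JPEG bytes of the (possibly fire-augmented) tile

    det = requests.post(f"{FIRE_MODULE}/detector",
                        files={"image": ("tile.jpg", cam.content, "image/jpeg")}, timeout=10)
    det.raise_for_status()
    return det.json()["fire"]            # assumed field name: fire present in the tile
```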
The generation of fire by the model is highly de-
pendent on the scenario its training has targeted. The
focus of this work lies on the synthesis of fire for images of forests, even though the model can also be of use for urban environments.
4 IMPLEMENTATION DETAILS
The Fire Module provides vehicles with camera simu-
lation and detection services. It was developed using
FastAPI, a high-performance tool for building APIs in Python, and its communication with The Platform is performed over HTTP. The generation service is reached using a GET request to "/camera", and its response includes the synthetic aerial image, in JPEG format, for the aircraft's position. On the other end, the detection service is reached using a POST request to "/detector", whose body shall carry an image in
JPEG format as well. The API takes care of feeding
the image into the ERNet classification network and
returns a boolean regarding the presence of fire in it.
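A minimal FastAPI sketch of these two endpoints is shown below; fetch_tile, synthesize_fire and run_ernet are hypothetical placeholders standing in for the tile acquisition, the CycleGAN generator and the ERNet classifier, and the query parameters are illustrative.

```python
from fastapi import FastAPI, File, Response, UploadFile

app = FastAPI()

# Placeholder helpers: in the actual module these would wrap the aerial tiles
# provider, the CycleGAN generator and the ERNet classifier.
def fetch_tile(lat: float, lon: float, heading: float) -> bytes: ...
def synthesize_fire(tile: bytes, lat: float, lon: float) -> bytes: ...
def run_ernet(jpeg: bytes) -> bool: ...

@app.get("/camera")
def camera(lat: float, lon: float, heading: float = 0.0):
    tile = fetch_tile(lat, lon, heading)
    tile = synthesize_fire(tile, lat, lon)       # returned unchanged when no fire is visible
    return Response(content=tile, media_type="image/jpeg")

@app.post("/detector")
async def detector(image: UploadFile = File(...)):
    jpeg = await image.read()
    return {"fire": bool(run_ernet(jpeg))}       # boolean fire/no-fire decision
```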
The camera simulator must therefore be capa-
ble of acquiring the aerial tiles corresponding to the
positions of the aircraft from a pre-established maps API. Bing Maps REST Services⁵ is developed by Microsoft and provides free licensing plans offering 125 thousand API requests a year and up to 50 thousand within any 24-hour period, for educational purposes (Microsoft, 2021). Apart from the traditional top-down satellite imaging, 45° angle aerial views resembling captures taken by UAVs are also available in this API and exerted much influence on its selection. Figure 7 portrays sample tiles of the two perspectives.
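A possible implementation of a tile-fetching helper, such as the fetch_tile placeholder above, is sketched below against the Bing Maps Imagery REST endpoint; the URL pattern and parameters reflect our reading of the public documentation and should be treated as assumptions to verify with the provider.

```python
import requests

# Assumed endpoint pattern of the Bing Maps Imagery REST API (to be verified).
IMAGERY_URL = "https://dev.virtualearth.net/REST/v1/Imagery/Map/{imagery}/{lat},{lon}/{zoom}"

def fetch_tile(lat: float, lon: float, key: str, birds_eye: bool = False, zoom: int = 18) -> bytes:
    """Fetch a 256x256 aerial tile centred on the given position (sketch, parameters assumed)."""
    imagery = "BirdsEye" if birds_eye else "Aerial"
    url = IMAGERY_URL.format(imagery=imagery, lat=lat, lon=lon, zoom=zoom)
    resp = requests.get(url, params={"mapSize": "256,256", "key": key}, timeout=10)
    resp.raise_for_status()
    return resp.content
```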
The CycleGAN network was trained for 110
epochs, following a learning rate of 0.0002 up to
epoch 100, from which it linearly decreased, and us-
5 More information at: https://docs.microsoft.com/en-us/bingmaps/rest-services/
Figure 7: Samples of satellite and bird’s eye perspectives,
as taken from Microsoft Bing Maps.
ing the Adam solver (β₁ = 0.5) as optimizer, as speci-
fied by default (Zhu et al., 2017). Also, taking advan-
tage of the fact that the Bing Maps service uses tiles
of size 256x256 pixels for rendering, the network’s
input size matches this value. The batch size is set
to 1 to enable very frequent parameter updates and it
is used with Instance Normalization layers, which are
recommended for style transfer tasks (Huang and
Belongie, 2017).
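The learning-rate schedule described above (constant for 100 epochs, then a linear decay to zero over the final 10) can be reproduced with a standard LambdaLR scheduler, as in the hedged sketch below; the placeholder module merely stands in for the ResNet generator.

```python
import torch

model = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)   # placeholder for the ResNet generator

optimizer = torch.optim.Adam(model.parameters(), lr=2e-4, betas=(0.5, 0.999))

def linear_decay(epoch: int, constant_epochs: int = 100, decay_epochs: int = 10) -> float:
    """Multiplicative LR factor: 1.0 up to epoch 100, then a linear ramp down to 0 by epoch 110."""
    return 1.0 - max(0, epoch - constant_epochs) / float(decay_epochs)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=linear_decay)

for epoch in range(110):
    # ... one pass over the unpaired training set with batch size 1 would go here ...
    scheduler.step()
```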
The insertion of fires in the images is spatially
restricted, according to the evolution of the burning
area, and the procedure takes this factor into consider-
ation. Initially, the camera simulator requests the aerial image for the desired location and calculates the planar coordinates of the fire polygon
(as provided by the DM) on the image. For that, an
internal service of the Bing Maps API is used to draw
the polygon in the appropriate location, in an easily
identifiable color such as fuchsia and, as observed in
Fig. 8, it is then extracted using HSV segmentation.
Figure 8: Polygon extraction using an edge detection tech-
nique. The generation of a binary mask allows extracting the
polygonal coordinates of the designated fire region.
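A hedged OpenCV sketch of this extraction step is given below; the HSV bounds for the fuchsia overlay are illustrative and would need tuning to the exact drawing colour.

```python
import cv2
import numpy as np

def extract_polygon_mask(tile_bgr: np.ndarray) -> np.ndarray:
    """Binary mask of the fuchsia fire polygon drawn on the tile (255 inside, 0 outside)."""
    hsv = cv2.cvtColor(tile_bgr, cv2.COLOR_BGR2HSV)
    # Fuchsia/magenta sits around hue ~150 in OpenCV's 0-179 hue range; bounds are illustrative.
    lower = np.array([140, 100, 100], dtype=np.uint8)
    upper = np.array([170, 255, 255], dtype=np.uint8)
    mask = cv2.inRange(hsv, lower, upper)
    # Close small gaps so the polygon comes out as a single filled region.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    return cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
```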
After translating the image collected from the ATP
to the fire domain, and having the binary mask of the
fire polygon, the Cut-and-Paste technique is applied
and the desired result is obtained. Observe the exam-
ple in Fig. 9. Note that the resulting image may or
may not contain fire features, even if one is occurring,
depending on the distance of the aircraft to the fire’s
location and the orientation of the camera in relation
to it, in the case of bird’s eye perspective.
In order to disguise discontinuities created by this
technique, the Poisson blending method (Pérez et al., 2003) was implemented. The results generated were
more realistic, yet much more subtle.
Figure 9: Example of applying the Cut-and-Paste technique for a fire polygon. The binary mask allows selecting the pixels from the first image that belong to the desired fire region; they are then superimposed on the original image.
Figure 10 portrays an image sample with a simple superimpo-
sition of the fire polygon and the respective image
when mixed seamless cloning is applied. This solu-
tion combines the gradients of the original image with
those of the fire polygon to form the blended region
of interest.
Figure 10: Sample image with a simple fire polygon overlay
and a sample image subject to mixed seamless cloning.
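The blending step can be reproduced with OpenCV's seamlessClone in mixed mode, as in the sketch below, assuming the generated fire tile, the original tile and the polygon mask all share the same 256x256 size.

```python
import cv2
import numpy as np

def blend_fire(original_bgr: np.ndarray, fire_bgr: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Mixed seamless cloning of the generated fire pixels into the original tile.

    mask is a uint8 binary mask (255 inside the fire polygon, 0 elsewhere).
    """
    ys, xs = np.nonzero(mask)
    center = (int(xs.mean()), int(ys.mean()))        # clone around the polygon centroid
    return cv2.seamlessClone(fire_bgr, original_bgr, mask, center, cv2.MIXED_CLONE)
```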
At the same time, the fact that no smoke emerges
from the fire polygon is odd. Looking for a solution
to recreate this behaviour, we noticed that fires in con-
fined indoor spaces are better documented, as the number of variables to assess is smaller than for fires in the open. In that type of closed environment, the energy released by fires is characterised by four stages: Incipient, Growth, Fully Developed and Decay, as seen in Fig. 11 (Hartin, 2008).
Figure 11: Stages of fire development in a compartment
(Hartin, 2008).
For lack of better judgement, the method we
implemented for calculating the size of the smoke
columns follows a naïve approach where the energy
release, in each iteration of the fire front, is directly
proportional to the size of its smoke columns. We de-
cided that they would initially grow at a constant rate
for two iterations, stay in the fully-developed phase
for one iteration, and decay over the last four iterations of the simulation, after which the fire is extinguished. Using an exclusively manual procedure,
smoke vectors collected by web scraping were re-
sized and blended into the image with the fire fea-
tures. These were displayed in varying shades of grey
and assuming the direction of the wind obtained di-
rectly from the simulator. Figure 12 presents a se-
quence of tiles representing a terrain with fire and the
respective smoke progression.
Figure 12: Example of smoke progression during a simu-
lated fire. The direction of the smoke columns is that of
the wind provided by the simulator, while their sizes vary
according to the respective fire stage.
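This staged scheme can be captured by a small helper that maps the iteration index of the fire front to a relative smoke-column size; the concrete growth and decay rates below are a hypothetical reading of that scheme.

```python
def smoke_scale(iteration: int) -> float:
    """Relative smoke-column size for a fire-front iteration (naive staged model).

    Two growth iterations, one fully developed iteration, then a linear decay
    over the last four iterations, after which the fire is extinguished.
    """
    if iteration < 2:                              # incipient and growth stages
        return 0.5 * (iteration + 1)               # 0.5, then 1.0
    if iteration == 2:                             # fully developed stage
        return 1.0
    if iteration <= 6:                             # decay over four iterations
        return max(0.0, 1.0 - 0.25 * (iteration - 2))
    return 0.0                                     # extinguished
```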
Each tile represents an iteration of the generation
pipeline, meaning that the speed of smoke simulation
is directly proportional to the cadence of requests
made to the API. For a certain tile the smoke columns
are also considered to be subject to the same wind
intensity and direction. This is an approximation to
what happens in reality because, as is well known,
fires can tamper with local environmental conditions,
and it is hardly viable to take into account the actual
wind behaviour for this specific scenario.
4.1 Experiments using Coherent
Semantic Attention
The image inpainting strategy suggests, although there is not yet scientific evidence, that it should be possible to fill the unknown regions of a non-fire image with flame and smoke features, as a human painter would. This assumption led to experiments
that produced detailed textures and whose insertion
generated little to no discontinuity. It was also ob-
served that the application of red filters on the images
directly influences the amount of flames produced,
which allows increasing their variability. Figure 13
reveals some of the results. The top row depicts the
original aerial images layered with a red filter and the
respective fire polygons, in gray, while the bottom
row depicts the same polygons filled with the flames
and smoke features.
Figure 13: Samples of fire generation using CSA for images
with overlaid red filters and random polygonal masks. The
respective results are portrayed in the bottom row.
Although appealing, this method was disregarded
because the generation for large masks is unfeasible.
More specifically, in this scenario, it is hardly possi-
ble to recreate the key features of the original images,
which end up being stripped of their own context.
5 VALIDATION AND RESULTS
The generated fire images were evaluated according
to their degree of realism, quantitatively using qual-
ity and image similarity metrics, and qualitatively by
subjective and manual analysis. The adaptation of a model for fire detection that is proven to perform well in the real domain also allows assessing the quality of the synthetic generation model.
5.1 Synthetic Image Quality
The Fréchet Inception Distance (FID) allows assess-
ing the degree of quality and similarity between the
images created by the generation model and the real
fire images (Heusel et al., 2017). It is based on the
activations of the penultimate layer of the pre-trained
InceptionV3 network and it evaluates the distance be-
tween the Gaussian distributions of the two sets of
images. The FID scores, shown in Fig. 14 for the last five training iterations, reach a minimum value of 42.0 as part of a decreasing trend. Since the
lower the FID score, the better the image quality, we
may conclude that the generator is producing increas-
ingly more realistic images and with fewer artefacts
(noise, blur and distortions).
Figure 14: The Fréchet Inception Distance (FID) scores for
the last five training iterations of the fire generator depict a
declining trend.
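For completeness, the score compares the mean and covariance of the Inception activations of real images (μ_r, Σ_r) and generated images (μ_g, Σ_g), as defined in (Heusel et al., 2017):

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
             + \operatorname{Tr}\!\left(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\right)
```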
Nevertheless, the most used metric to evaluate
the results of the generative networks still relies on
the subjective opinion of individuals, comparing real
samples with fictitious ones, in Preference Judgement
Surveys (Borji, 2019). For that purpose, we developed a survey with a set of 40 generated fire images, carefully collected to hold 10 samples for each combination of forest and urban scenarios with top-down and bird's eye
perspectives. We afterwards asked the respondents
to indicate their preference in relation to the scenario
and image perspective, and requested the identifica-
tion of generation anomalies. Considering the rela-
tively small population size, and for results to be ro-
bust and more representative of reality, all image sam-
ples were chosen at random for each of the previous
questions. The exception to this lies in the final question, in which we decided to test the users' perception of reality. It consisted of identifying generated samples when these were presented next to a real
one, in an environment of similar configuration. The
images we specifically selected for this case study are
depicted in Fig. 15.
The survey was disseminated through the community and 122 responses were obtained. On a scale of 1 to
5, the subjects considered the images to have a me-
dian value (Mdn) of 4 when it comes to their degree
of realism, with ratings presenting a mean value (M)
of 3.44 and a standard deviation (SD) of 1.16. The
generated image of a forest fire approximated its real counterpart well but was identified without much effort, managing to mislead just over 12.3% of the population. The corresponding question targeting urban images deceived people similarly, with an 11.5% failure rate, but falls a bit short in its approximation to
the real counterpart, exhibiting a lower M and higher
SD. The population diverged more while providing
their opinion on images of urban scenarios, which is
Figure 15: Two pairs of generated and real fire image sam-
ples, respectively. The top row depicts a forest scenario,
while the bottom row depicts an urban scenario.
explained by a greater instability of the network for
that same type of environment. Table 1 presents a
summary of the statistics previously enumerated.
Table 1: Degree of realism of the generated fire images,
evaluated on a scale of 1 to 5. Mdn, M and SD represent the
median, the mean and the standard deviation values.
Mdn M SD
Degree of realism (Overall) 4 3.44 1.159
Approximation (Forest) 4 3.52 0.893
Approximation (Urban) 3 2.98 1.064
We conclude that the images are of good quality
and there is a particular preference for the ones gen-
erated for forest scenarios, which would be expected.
On the other hand, the preference for bird's eye perspective over top-down perspective is not noticeable. Interestingly, the respondents were only 3.6% more prone to select forest imagery, of both top-down and bird's eye perspectives, than to select urban imagery.
At the same time, they were also just 3% more con-
fident that the images of top-down perspective were
more realistic than the ones of bird’s eye perspec-
tive. The difficulty respondents have had in making
up their mind leads one to believe that, contrary to
what one might have thought, the images from dif-
ferent scenarios and perspectives present a similar de-
gree of quality and realism.
The generation anomalies identified concern
mainly the lack of texture detail and distortions
caused by exacerbated saturation levels or by the pres-
ence of artefacts such as noise. Also, some subjects
expected a higher diversity of flames and smoke fea-
tures and a higher image resolution, which they con-
sidered to have negatively impacted their assessment.
5.2 Fire Detection Performance
It was observed that the classification model main-
tains its original performance when facing the syn-
thetic fire images. To test it, 876 aerial fire images
were generated, of both urban and forest environments, in the greater metropolitan areas of 4 cities from Europe and from California, in the United States.
The model reported an accuracy of 90.2% and a
false positive rate of only 4.5%. Precision and recall
tend to be inversely proportional to one another, that
is, the increase of one usually implies the decrease
of the other. This phenomenon occurred in this case, where for fire images the precision (94.9%) is higher than the recall (84.9%), a trend that is reversed for non-fire images, where the precision (86.4%) is smaller than the recall (95.4%). The F1-score is the harmonic mean of precision and recall and is useful to evaluate models when there is some imbalance in the class distribution. In the present case, the mean F1-score is also high, at 91.5%.
Another metric is the Area Under Curve (AUC),
which measures the ability of the model to distinguish
between the positive and the negative class based on
the Receiver Operator Characteristic (ROC) curve,
plotted using the true positive and false positive rates
at various thresholds. The higher the AUC value, which ranges between 0 and 1, the better the classifier is
able to distinguish between class samples and the bet-
ter its predictive power. The current classifier is close
to perfect at identifying the synthetic fire class, with
an AUC of 94.8%.
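The per-class metrics reported here follow the usual definitions in terms of true/false positives and negatives, and the fire-class F1-score can be checked directly from the reported precision and recall:

```latex
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
    \;\Rightarrow\;
F_1^{\text{fire}} = \frac{2 \cdot 0.949 \cdot 0.849}{0.949 + 0.849} \approx 0.896
```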
Table 2 summarizes the results for the previously
mentioned metrics.
Table 2: Classification metrics of the fire detector.
Class    Accuracy  Precision  Recall  F1-score  AUC
NoFire   0.902     0.864      0.954   0.907     0.948
Fire     0.902     0.949      0.849   0.896     0.948
Mean     0.902     0.916      0.915   0.915     0.948
The confusion matrix of Fig. 16 shows, however,
some discrepancy between the number of false nega-
tives and false positives produced by the model. The former represent more than 15% of predictions, while the latter account for only 4.5%.
After carefully analysing the constitution of each
set, one concludes that the false positives contain only images of forests, most of them of lower quality as acquired from the external tiles provider, and of bird's eye perspective. That
may denote some overfitting of the model. Observe
two false positive examples in Fig. 17.
Figure 16: Confusion matrix of the fire detector on images of the validation dataset. False negatives and false positives account for 15% and 4.5% of predictions, respectively.
Figure 17: False positive fire samples.
On the other hand, false negative samples are mostly comprised of images of forestry with undergrowth and scattered vegetation, in which the generator tends to create undesired noise, blurring artifacts and vestigial columns of smoke, but fails to generate flames. Some fire samples of urban scenarios suffer-
ing from color distortions are also wrongly classified
by the model. Figure 18 displays one image sample
representative of each case.
Figure 18: False negative fire samples.
The fact that the perspective of the training images was variable introduces some noise that affects the ability of both the image generation and classification models. Fire
detection in urban scenarios tends to portray worse re-
sults since the training of the generator was primarily
focused on forest environments.
5.3 Performance Assessment
The processing time of the generation and detection
pipeline is, at this stage, inherently dependent on the
generation process which, in turn, has a strong con-
nection to the Bing Maps API. The servers’ response
time strongly affects the rate at which tiles can be pro-
vided to vehicles on a mission and, as a consequence,
it has not been possible to simulate a camera as in
a real-time scenario of 25 FPS. This would require
a constant low-latency connection to the Bing Maps
servers and less restrictive measures to enable a larger
number of requests for a given time interval.
We registered pipeline iterations over 10 vehicle simulations, using a camera in bird's eye perspective. With this configuration, the Camera Simulator
requests two tiles to the Bing Maps API, for each ve-
hicle’s position. One is the original aerial tile, the
other is similar but includes the fire polygon drawn
on its appropriate location, should it exist on that im-
age. The respective requests are summarized in the
plots of Fig. 19. Note that the API was running with-
out GPU in order to better approximate the behaviour
of a machine with low computational resources.
Figure 19: The processing times for the generation and detection requests reveal that the real-time bottleneck lies in the generation procedure.
In these plots, we observe that the API is able to return an aerial tile within one second for 50% of the collected request samples. The majority of these tiles
do not contain features of fire, either because the vehi-
cle’s camera is not close enough to the burning area or
because its orientation does not allow it to capture that
region. This case, where the camera does not generate
synthetic fire, constitutes the fastest response scenario
for the vehicle and it still comprises about a second of
tile fetching, rendering inadequate the realistic cam-
era simulation and limiting the camera to a maximum
of 1 FPS right from the start. In order to prevent over-
flowing the external Fire Module API with pending
requests, especially when it comes to generating a se-
quence of fire tiles, the time interval between requests
has been carefully set to 3 seconds.
The detection performance complies, on the other
hand, with a scenario closer to real-time, averaging
100 ms per Vehicle Agent request at the same ex-
periments, with as little as 30 ms of SD. Reducing
the generation bottleneck would, according to these
metrics, make it possible for the pipeline to run at
around 10 FPS in the simulation environment, which
is more acceptable. In reality, because a real aircraft
would not need to simulate its own camera, the de-
tector would be achieving over 50 FPS and consum-
ing no more than 300 KB of memory. Therefore, it meets all the conditions necessary to run autonomously
aboard an embedded system of low memory and stor-
age resources (Kyrkou and Theocharides, 2019).
6 CONCLUSIONS AND FUTURE
WORK
The models that are currently the reference for fire detection highlight a widespread problem that has been affecting their performance: the imbalance of classes in the training data, since there is a very small number of fire images, especially from an aerial perspective.
The generation of features of flames and smoke
in images of aerial perspective is performed for the
complete image, using a ResNet generator trained on
image-to-image translation using the CycleGAN ar-
chitecture. These are afterwards blended into the orig-
inal image using the Cut-and-Paste technique in order
to match them to the location of the burning region.
The similarity between generated and real images
was assessed using the Fréchet Inception Distance.
The declining trend of this metric, during training, de-
noted a gradual improvement of the generator, which
produces images with increasingly less noise, blur
and distortion. In addition, a group of 122 respon-
dents to the conducted survey willingly provided their
subjective opinion to evaluate the generated images
qualitatively. These were considered of good quality,
with a median realism of 4 out of 5, and proved to ap-
proximate images of forests better than those of urban
environments, as initially intended and expected.
On the other hand, ERNet is a lightweight model
designed to perform disaster detections in real-time,
with good accuracy and low false positive rates on
UAVs and similar CPU-based machines. It was
adapted to perform binary classification on the exis-
tence of fire on the aerial images provided by the ve-
hicles of The Platform. Not only did the detector manage to process requests in around 100 ms, but it also reached a high accuracy of 90.2% and confirmed the very low false positive rates reported by the original imple-
mentation, this time using generated images of fire.
The false negatives accounted for 15.1% of cases and
corresponded mainly to images of sparsely vegetated
forest and urban scenarios with color distortions. The
false positives accounted for just 4.5% of predictions
and contained only images of forest, most of them of
low quality, which may evidence some overfitting of
the detection model. Yet, given its AUC of 94.8%,
we conclude that the model is able to identify the generated synthetic fire images very well, further reinforcing the quality of the generator.
Integrated into the simulation platform, this mod-
ule raises a number of questions, in particular con-
cerning the generation procedure, because it is com-
putationally more expensive than detection. An equi-
librium was found to ensure its usability, but more can
be done to improve it.
The implemented Fire Module interoperates with
the external Bing Maps REST Services by means
of HTTP requests. This communication may suffer
from overheads, mostly because of the variable la-
tency with the respective servers, which may also be
overloaded and thus subject to longer response times.
To tackle this problem it is essential to reduce the
number of requests issued by creating caches to hold
tiles of frequently used routes or by acquiring tiles
of larger resolutions. The latter cover a larger surface
area which can be segmented in order to match the on-
board camera’s field of view at the aircraft’s position.
Prefetching, a mechanism where tiles are retrieved in
advance according to the predefined trajectory, could
also prove beneficial.
The insertion of fire into bird’s eye type of frames
is currently subject to an internal functionality of the
Bing Maps API which allows to perform the drawing
of polygons on demand to be thereafter manually ex-
tracted. This implies that every drawing on the bird’s
eye perspective corresponds to an additional request
to the external tiles provider, which is unfeasible. This
issue should be resolved and considered for all other
solutions that are subsequently integrated.
Since image generation proved to perform differ-
ently according to the environment, further develop-
ments could also separate the classification task into
two specific models, training one of them on forest scenarios and the other on urban scenarios.
The incorporated lightweight detector based on
ERNet portrays very promising results and opens up
the opportunity to generalise the pipeline concept to
other types of disturbances. It should help to identify,
for example, building collapses, floods or traffic in-
cidents already targeted by the detector. This would
enable the comparison of different multi-vehicular ap-
proaches and help acquire a deeper understanding of which works best for each case. One could there-
fore invest in studying the catastrophic scenarios from
the air in order to define a sequence of priority actions
to be carried out by the formation of aircraft.
REFERENCES
Almeida, J. (2017). Simulation and Management of Envi-
ronmental Disturbances in Flight Simulator X. Mas-
ter’s thesis, University of Porto, Faculty of Engineer-
ing, Porto, Portugal.
Arcidiacono, C. (2018). An empirical study on syn-
thetic image generation techniques for object detec-
tors. Master’s thesis, KTH, School of Electrical En-
gineering and Computer Science (EECS), Stockholm,
Sweden.
Borji, A. (2019). Pros and cons of GAN evaluation mea-
sures. Computer Vision and Image Understanding,
179:41–65. DOI:10.1016/j.cviu.2018.10.009.
Damasceno, R. (2020). Co-Simulation Architecture for En-
vironmental Disturbances. Master’s thesis, University
of Porto, Faculty of Engineering, Porto, Portugal.
Dwibedi, D., Misra, I., and Hebert, M. (2017). Cut, Paste
and Learn: Surprisingly Easy Synthesis for Instance
Detection. In Proceedings of 2017 IEEE Interna-
tional Conference on Computer Vision (ICCV), pages
1310–1319, Venice, Italy. IEEE Computer Society.
DOI:10.1109/ICCV.2017.146.
El Harrouss, O., Almaadeed, N., Al-ma’adeed, S., and
Akbari, Y. (2020). Image Inpainting: A Re-
view. Neural Processing Letters, 51:2007–2028.
DOI:10.1007/s11063-019-10163-0.
Filippi, J. B., Bosseur, F., and Grandi, D. (2014). Fore-
Fire: open-source code for wildland fire spread mod-
els, pages 275–282. Advances in Forest Fire Research.
Imprensa da Universidade de Coimbra, Coimbra, Por-
tugal. DOI:10.14195/978-989-26-0884-6_29.
Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B.,
Warde-Farley, D., Ozair, S., Courville, A., and Ben-
gio, Y. (2014). Generative Adversarial Nets. In
Proceedings of The 27th International Conference on
Neural Information Processing Systems - Volume 2,
NIPS’14, page 2672–2680, Montreal, Canada. MIT
Press. DOI:10.1145/3422622.
Hartin, E. (2008). Fire Development and Fire Behavior In-
dicators. Technical report, Compartment Fire Behav-
ior Training.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep
Residual Learning for Image Recognition. In Pro-
ceedings of 2016 IEEE Conference on Computer Vi-
sion and Pattern Recognition (CVPR), pages 770–
778, Las Vegas, Nevada, USA. IEEE Computer So-
ciety. DOI:10.1109/CVPR.2016.90.
Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., and
Hochreiter, S. (2017). GANs Trained by a Two Time-
Scale Update Rule Converge to a Local Nash Equilib-
rium. In Proceedings of the 31st International Con-
ference on Neural Information Processing Systems,
NIPS’17, page 6629–6640, Long Beach, California,
USA. Curran Associates Inc.
Hollosi, J. and Ballagi, A. (2019). Training Neu-
ral Networks with Computer Generated Images.
In Proceedings of 2019 IEEE 15th International
Scientific Conference on Informatics, pages 155–
160, Poprad, Slovakia. IEEE Computer Society.
DOI:10.1109/Informatics47936.2019.9119273.
Huang, G., Liu, Z., Van Der Maaten, L., and Weinberger,
K. Q. (2017). Densely Connected Convolutional Net-
works. In Proceedings of 2017 IEEE Conference on
Computer Vision and Pattern Recognition (CVPR),
pages 2261–2269, Honolulu, Hawaii, USA. IEEE
Computer Society. DOI:10.1109/CVPR.2017.243.
Huang, X. and Belongie, S. (2017). Arbitrary Style
Transfer in Real-time with Adaptive Instance Nor-
malization. In Proceedings of 2017 IEEE Interna-
tional Conference on Computer Vision (ICCV), pages
1510–1519, Venice, Italy. IEEE Computer Society.
DOI:10.1109/ICCV.2017.167.
Kyrkou, C. and Theocharides, T. (2019). Deep-Learning-
Based Aerial Image Classification for Emergency
Response Applications Using Unmanned Aerial Ve-
hicles. In Proceedings of 2019 IEEE/CVF Con-
ference on Computer Vision and Pattern Recogni-
tion Workshops (CVPRW), pages 517–525, Long
Beach, California, USA. IEEE Computer Society.
DOI:10.1109/CVPRW.2019.00077.
Liu, H., Jiang, B., Xiao, Y., and Yang, C. (2019).
Coherent Semantic Attention for Image Inpaint-
ing. In Proceedings of 2019 IEEE/CVF Interna-
tional Conference on Computer Vision (ICCV), pages
4169–4178, Seoul, Korea. IEEE Computer Society.
DOI:10.1109/ICCV.2019.00427.
Microsoft (2021). Bing Maps Licensing. Online: https://
www.microsoft.com/en-us/maps/licensing. (accessed
2021-02-11).
MographPlus (2018). Corona for 3ds Max Ren-
dering Smoke, Fire and Explosions Tutorial
#113. Online: https://www.youtube.com/watch?v=
DYTDNGqvPUw. (accessed 2021-11-23).
Park, M., Tran, D. Q., Jung, D., and Park, S.
(2020). Wildfire-Detection Method Using DenseNet
and CycleGAN Data Augmentation-Based Remote
Camera Imagery. Remote Sensing, 12(22):3715.
DOI:10.3390/rs12223715.
Pérez, P., Gangnet, M., and Blake, A. (2003). Poisson Im-
age Editing. ACM Transactions on Graphics - TOG,
22(3):313–318. DOI:10.1145/882262.882269.
PORDATA (2020). Forest fires and burn area.
Online: https://www.pordata.pt/Europa/Incêndios+florestais+e+área+ardida-1374. (accessed 2021-01-
27).
Rother, C., Kolmogorov, V., and Blake, A. (2004).
”GrabCut”: Interactive Foreground Extrac-
tion Using Iterated Graph Cuts. ACM Trans-
actions on Graphics - TOG, 23(3):309–314.
DOI:10.1145/1015706.1015720.
Silva, D. C. (2011). Cooperative multi-robot missions : de-
velopment of a platform and a specification language.
PhD thesis, University of Porto, Faculty of Engineer-
ing, Porto, Portugal.
Silva, T. (2017). A Short Introduction to Generative Ad-
versarial Networks. Online: https://sthalles.github.io/
intro-to-gans/. (accessed 2020-12-16).
Simonyan, K. and Zisserman, A. (2015). Very Deep Con-
volutional Networks for Large-Scale Image Recogni-
tion. In Proceedings of The 3rd International Confer-
ence on Learning Representations (ICLR), San Diego,
California, USA. arXiv: 1409.1556 [cs.CV].
Tripathi, S., Chandra, S., Agrawal, A., Tyagi, A., Rehg,
J. M., and Chari, V. (2019). Learning to Gener-
ate Synthetic Data via Compositing. In Proceed-
ings of 2019 IEEE/CVF Conference on Computer Vi-
sion and Pattern Recognition (CVPR), pages 461–
470, Long Beach, California, USA. IEEE Computer
Society. DOI:10.1109/CVPR.2019.00055.
Turco, M., Jerez, S., Augusto, S., Tarín-Carrasco, P., Ratola, N., Jiménez-Guerrero, P., and Trigo, R. M.
(2019). Climate drivers of the 2017 devastating
fires in Portugal. Scientific Reports, 9:13886. DOI:
10.1038/s41598-019-50281-2.
WMO (2021). WMO Atlas of Mortality and Economic
Losses from Weather, Climate and Water Extremes
(1970–2019). Technical Report WMO- No. 1267,
World Meteorological Organization. ISBN: 978-92-
63-11267-5.
Wong, M. Z., Kunii, K., Baylis, M., Ong, W. H., Kroupa,
P., and Koller, S. (2019). Synthetic dataset gen-
eration for object-to-model deep learning in indus-
trial applications. PeerJ Computer Science, 5:e222.
DOI:10.7717/peerj-cs.222.
Zhu, J., Park, T., Isola, P., and Efros, A. A. (2017). Unpaired
Image-to-Image Translation Using Cycle-Consistent
Adversarial Networks. In Proceedings of 2017 IEEE
International Conference on Computer Vision (ICCV),
pages 2242–2251, Venice, Italy. IEEE Computer So-
ciety. DOI:10.1109/ICCV.2017.244.