waterways navigation zones with high risk of colli-
sion and with the presence of several maritime marks
required for mooring assistance and priority
indication. To address this issue, we conducted
measurement campaigns to collect our custom SSAVE
dataset, tailored to training on the relevant classes
that we need to detect and track from aboard a vessel.
Our main contribution is a new, diverse training
dataset for the waterway maritime environment,
captured by cameras mounted on a drone and on a moving
barge in realistic weather and traffic conditions. We
also investigate the training of the YOLOv5 model for
the detection of seven different classes (ship, barge,
cutter, yellow mark, red mark, line mark and other
obstacle) for situational awareness purposes. This paper
is organized as follows: Sect. 2 describes the collected
SSAVE dataset and the classes that we considered for
the semantic annotation; in Sect. 3, distortion
correction of a part of the dataset is presented. In
Sect. 4, the results of YOLOv5 detector training are
presented and evaluated. Finally, Sect. 5 presents
inference results obtained by combining the detector
with a pre-trained Deep Sort tracker, gives conclusions
and discusses possible further research perspectives.
2 MARITIME WATERWAY DATASET: SSAVE
Maritime image datasets are essential for the training
of neural networks performing object detection and
tracking. They should cover a wide variety of
weather and lighting conditions, and the images
should be taken from different viewing angles.
2.1 SSAVE Dataset
The SSAVE dataset was built in collaboration with
our industrial partners in the project (Deme and
Tresco) and with the Belgian naval base at Zeebrugge.
We collected thousands of high-definition images
(1080x1920 pixels) in realistic traffic conditions
and in different weather conditions, such as sunny,
cloudy and rainy days. We used AXIS Q3515-LV
Network IP cameras mounted on a navigating barge, in
addition to GoPro Hero8 cameras placed on a drone
and also attached to a navigating barge. Recording
images from a drone was essential to detect the nav-
igating barge itself and also to have a better view of
the line marks. We hand-picked 827 representative
images from the gathered footage to construct our
dataset. The images were selected to cover a variety
of lighting conditions and of distances between the
barges and the obstacles, and to include the objects
of interest in similar proportions. The dataset is
composed of 78 images taken from a GoPro Hero8
camera mounted on a drone, 175 images taken from
a similar camera mounted on a navigating barge and
574 images taken from an AXIS Q3515-LV Network
IP camera attached to a navigating barge. The
dataset is tailored to the scenarios addressed by the
study, namely the navigation of a barge in a waterway
and its mooring to a fixed platform called a cutter.
We manually annotated each image in the dataset
using the CVAT image annotation tool. Seven classes
were considered in the annotation (ship, barge, cutter,
yellow mark, red mark, line mark and other obstacle).
Figure 1 presents an example of annotated images.
Specifically, we need to detect and track all ships
surrounding the navigating barge, as well as all types
of obstacles that can pose a collision threat.
Navigation marks need to be detected and classified in
order to respect priorities and interdiction areas while
navigating and mooring; the marks themselves can also
be subject to collisions. Finally, the cutter should be
detected and tracked in order to assist the barge in
mooring properly. We did not apply augmentation
techniques to increase the dataset size.
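
As an illustration of how the seven annotated classes map to detector
labels, the sketch below converts a pixel-space bounding box into the
text label format read by YOLOv5 (class index followed by the box
center, width and height, all normalized by the image size). It is a
minimal sketch, not the annotation pipeline used for SSAVE: the class
order in CLASSES, the function name to_yolo_line and the default image
size of 1920 pixels wide by 1080 pixels high are assumptions made for
illustration.

```python
# Class order is an assumption: the paper lists the seven classes but
# does not state which index each class receives.
CLASSES = [
    "ship", "barge", "cutter", "yellow mark",
    "red mark", "line mark", "other obstacle",
]


def to_yolo_line(class_name, x_min, y_min, x_max, y_max,
                 img_w=1920, img_h=1080):
    """Convert a pixel-space box to a YOLO label line.

    The default image size (1920 wide, 1080 high) is an assumed
    orientation of the high-definition frames.
    """
    inside = 0 <= x_min < x_max <= img_w and 0 <= y_min < y_max <= img_h
    if not inside:
        raise ValueError("box must lie inside the image with positive size")
    class_id = CLASSES.index(class_name)
    # Box center and size, normalized to the [0, 1] range.
    x_c = (x_min + x_max) / 2 / img_w
    y_c = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"


# Hypothetical box around a cutter in the middle of the frame.
print(to_yolo_line("cutter", 480, 270, 1440, 810))
```

One such line is written per annotated object, so an image containing
several marks and ships yields a label file with several lines.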
2.2 Preprocessing of Distorted Images
Some of the collected images (about 100) exhibited
radial distortion. To correct it, we first calibrated
the camera to estimate its parameters using the
MATLAB camera calibration tool and a checkerboard
pattern. Then, we used the MATLAB undistortImage
function to remove the lens distortion and thus
convert the distorted images into undistorted ones.
Figure 2 presents an example of an image before and
after distortion correction.
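
To make the correction step more concrete, the sketch below applies a
common two-coefficient radial distortion model to point coordinates and
inverts it by fixed-point iteration. It is not the MATLAB procedure
described above, which resamples the whole image; it only maps
individual pixel coordinates. The function names and the parameter
values (fx, fy, cx, cy, k1, k2) are hypothetical; in practice they
would come from the checkerboard calibration.

```python
def distort(xu, yu, k1, k2):
    """Apply radial distortion to normalized, undistorted coordinates."""
    r2 = xu * xu + yu * yu
    scale = 1.0 + k1 * r2 + k2 * r2 * r2
    return xu * scale, yu * scale


def undistort_point(xd, yd, k1, k2, iterations=20):
    """Invert the radial model by fixed-point iteration.

    Starts from the distorted point and repeatedly divides by the
    distortion factor evaluated at the current estimate. This converges
    for moderate distortion; strong distortion may need another solver.
    """
    xu, yu = xd, yd
    for _ in range(iterations):
        r2 = xu * xu + yu * yu
        scale = 1.0 + k1 * r2 + k2 * r2 * r2
        xu, yu = xd / scale, yd / scale
    return xu, yu


def undistort_pixel(u, v, fx, fy, cx, cy, k1, k2):
    """Map a distorted pixel to its undistorted pixel position."""
    xd = (u - cx) / fx          # pixel -> normalized coordinates
    yd = (v - cy) / fy
    xu, yu = undistort_point(xd, yd, k1, k2)
    return fx * xu + cx, fy * yu + cy


# Hypothetical intrinsics and distortion coefficients for illustration.
print(undistort_pixel(1800.0, 100.0, 1000.0, 1000.0, 960.0, 540.0,
                      -0.2, 0.05))
```

With a negative k1 (barrel distortion), points far from the principal
point are moved outward by the correction, while the principal point
itself is left unchanged.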
3 OBJECT DETECTION AND TRACKING METHODS
This section presents a brief description of the deep
learning models used for the detection and tracking
tasks. The evaluation metrics used for performance
measurement are also defined.
3.1 YOLOv5 Detection
Object detection is one of the fundamental computer
vision problems and enables semantic video analysis
and image understanding. It consists in precisely
identifying, in an image, the presence and the
VISAPP 2022 - 17th International Conference on Computer Vision Theory and Applications