2 LITERATURE REVIEW
Soccer analytics companies have only recently started
to analyse so-called big data (e.g., high-resolution
video, player-tracking data, and possession
information). At the same time, only recently have
major advances been made in machine learning,
producing techniques that can handle these new
high-dimensional data sets. The amount of data
available in soccer has increased as techniques for
collecting it at scale, such as sensors, GPS, and
computer vision algorithms, have become available.
This supports the use of machine learning across
several areas of soccer: recruiting players and
analysing their performance, selling tickets and
bringing fans closer to their club, and supporting
decisions that affect an entire area of a club.
2.1 Machine Learning in Soccer
Machine learning is the field of study that focuses on
how computers learn to perform a task without being
explicitly programmed to do it. It can be defined
as a set of methods that can automatically detect pat-
terns in data to predict future data or to perform other
types of decision making (Murphy, 2012). Machine
learning is beginning to play an essential role within
the following branches of computing: data mining,
hard-to-program applications, and custom software
applications (Mitchell, 1997). Machine learning
algorithms generally fall into two paradigms:
supervised learning and unsupervised learning
(Stimpson and Cummings, 2014). In supervised learning a
“teacher” is assumed to be present and provides the
correct answer for each situation. Supervised
learning techniques build predictive models that learn
from a large number of training examples, where each
training example carries a label indicating its
ground-truth output (Zhou, 2017). Each example is thus
a pair consisting of an input object and an output
label, which is either a class or a continuous value.
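The idea of learning from labelled pairs can be illustrated with a minimal sketch. It is not taken from any of the cited studies: the features (pass length in metres, number of nearby defenders), the labels, and the choice of a 1-nearest-neighbour rule are all assumptions made purely for illustration.

```python
from math import dist

# Hypothetical labelled training examples: (input features, output label).
# Features are invented for illustration: (pass length in metres,
# number of defenders near the receiver).
TRAINING = [
    ((5.0, 0.0), "successful"),
    ((8.0, 1.0), "successful"),
    ((30.0, 3.0), "unsuccessful"),
    ((35.0, 2.0), "unsuccessful"),
]


def predict(features, training=TRAINING):
    """Return the label of the closest training example (1-nearest neighbour)."""
    _, label = min(training, key=lambda pair: dist(pair[0], features))
    return label


if __name__ == "__main__":
    print(predict((6.0, 0.0)))
    print(predict((32.0, 3.0)))
```

Here the labels are classes, so the task is classification. If each label were a continuous value instead, the same pairing of inputs and outputs would describe a regression task.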
Machine learning, and AI in general, are increasingly
used in soccer, not only in performance and tactical
analysis but also in medical and marketing
departments.
One example outside the tactical field is injury
prevention. Rommers et al. (Rommers et al., 2020)
tried, over one season, to predict the injuries of 734
players aged between 10 and 15 from seven Belgian
academies. At the beginning of the season, a battery
of tests was performed to evaluate motor coordination,
physical fitness, and physical characteristics (e.g.,
height, weight, strength, and flexibility). Based on
these characteristics, the machine learning algorithm
was able to predict injuries and distinguish between
serious and light injuries with high accuracy.
Algorithms of this type can also help coaches make
decisions during a game, for example by indicating a
player’s physical condition and whether the player
should be substituted.
Another example of the application of machine
learning in soccer is in analysing player performance.
Jamil et al. (Jamil et al., 2021) applied several ma-
chine learning algorithms (Logistic Regression, Gra-
dient Boosting, and Random Forest) to classify the
performance of professional goalkeepers, aiming to
distinguish elite goalkeepers from sub-elite
goalkeepers. The study concluded that elite
goalkeepers shared common characteristics: short
distribution, successfully passing and receiving the
ball, and not conceding goals. It suggested that skill
with the feet is what distinguishes elite goalkeepers
from sub-elite ones.
Another example in the area of performance anal-
ysis is the work of Pappalardo et al. (Pappalardo et al.,
2019) through a simulator recommendation. The
work implemented PlayeRank, a data-driven frame-
work that offers a principled multi-dimensional and
role-aware evaluation of the performance of soccer
players.
2.2 Match Data and Annotation
Annotations in soccer are an important tool to obtain
data from a match. The analysis of soccer matches
relies on the annotation of individual players’
actions (e.g., passes and shots), athletic performance,
and team events (e.g., substitutions). Consequently,
annotating soccer events at a fine-grained level is an
expensive and error-prone task (Barra et al., 2021).
On the other hand, positional data is usually ob-
tained using automated or semi-automated tools that
rely on devices such as GPS receivers, cameras and
computer vision. One of the more interesting oppor-
tunities provided by the availability of position track-
ing data in soccer is the analysis of tactical behaviour.
Tactical behaviour is an important determinant of per-
formance in team sports like soccer, and refers to how
a team manages its spatial positioning over time to
achieve a shared goal.
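As a rough illustration of how spatial positioning can be quantified from tracking data, the sketch below computes two simple per-frame descriptors: the team centroid and the mean player distance to that centroid. The choice of these two measures, the function names, and the coordinates are assumptions made for illustration only and are not taken from the data set or the cited works.

```python
from math import dist
from statistics import mean


def centroid(positions):
    """Mean (x, y) position of the players in one frame."""
    xs, ys = zip(*positions)
    return (mean(xs), mean(ys))


def mean_distance_to_centroid(positions):
    """Average player distance to the team centroid (a simple spread measure)."""
    c = centroid(positions)
    return mean(dist(p, c) for p in positions)


if __name__ == "__main__":
    # Hypothetical (x, y) positions in metres for four players in two frames.
    spread_out = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
    compact = [(4.0, 4.0), (6.0, 4.0), (4.0, 6.0), (6.0, 6.0)]
    for frame in (spread_out, compact):
        print(centroid(frame), round(mean_distance_to_centroid(frame), 2))
```

Tracking such values frame by frame gives a time series of how far a team is spread and where it sits on the pitch, which is one possible starting point for describing tactical behaviour.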
3 MATERIALS
The material used in this paper is a database cor-
responding to annotation and positional data of 13
The Elusive Features of Success in Soccer Passes: A Machine Learning Perspective