ability of a neural network to extract the important features and process them to estimate the true values of a predefined set of body measurements on the output; (2) to compensate for the insufficient amount of publicly available data annotated with ground-truth body measurements, we generated a large-scale synthetic dataset of various body shapes in a standard body pose using a parametric human body model, along with the corresponding point clouds, gray-scale and silhouette images, skeleton data, and 16 annotated body measurements; (3) to obtain the ground truth for the 16 measurements on the body models, we established a skeleton-guided annotation pipeline, which can easily be extended to compute more complex and task-specific body dimensions; and finally, (4) we present a method for accurate, automatic, end-to-end human body measurement estimation from a single input frame.
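To make the contents of one synthetic sample concrete, the following Python sketch groups the modalities listed above into a single record and checks their shapes. The class name `BodySample`, its field names, the (N, 3) and (J, 3) array layouts, and the measurement keys are hypothetical choices for illustration, not the format of the released dataset; only the count of 16 measurements comes from the text.

```python
from dataclasses import dataclass

import numpy as np

NUM_MEASUREMENTS = 16  # number of annotated body measurements stated in the text


@dataclass
class BodySample:
    """Hypothetical record for one synthetic sample (field names are illustrative)."""

    point_cloud: np.ndarray  # (N, 3) points sampled from the body surface
    grayscale: np.ndarray    # (H, W) rendered gray-scale image
    silhouette: np.ndarray   # (H, W) boolean body mask
    skeleton: np.ndarray     # (J, 3) joint positions (3D layout is an assumption)
    measurements: dict       # measurement name -> ground-truth value

    def validate(self) -> None:
        """Raise ValueError if the modalities do not have the expected shapes."""
        if self.point_cloud.ndim != 2 or self.point_cloud.shape[1] != 3:
            raise ValueError("point_cloud must have shape (N, 3)")
        if self.grayscale.ndim != 2:
            raise ValueError("grayscale must be a single-channel (H, W) image")
        if self.silhouette.shape != self.grayscale.shape:
            raise ValueError("silhouette and grayscale images must share (H, W)")
        if self.silhouette.dtype != np.bool_:
            raise ValueError("silhouette must be a boolean mask")
        if self.skeleton.ndim != 2 or self.skeleton.shape[1] != 3:
            raise ValueError("skeleton must have shape (J, 3)")
        if len(self.measurements) != NUM_MEASUREMENTS:
            raise ValueError(
                f"expected {NUM_MEASUREMENTS} measurements, got {len(self.measurements)}"
            )
```

Keeping all modalities of a body in one record makes it easy to feed the same subject to models that consume different input types, such as point clouds or silhouettes.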
2 RELATED WORK
Anthropometric body measurement estimation is an emerging problem in the context of various applications, such as garment manufacturing, ergonomics, or surveillance. Automatically estimating accurate body measurements would remove the need to manually tape-measure human bodies. An automated pipeline would also bring consistency to body measuring, which is often hard to maintain when tape-measuring different human subjects. Aside from natural human error and the inaccuracies caused by tape measuring, there is ambiguity across the various body measuring standards.
Numerous algorithmic strategies have been presented over the years to tackle the task of human body measurement estimation (Guilló et al., 2020; Anisuzzaman et al., 2019; Ashmawi et al., 2019; Song et al., 2017; Dao et al., 2014; Tsoli et al., 2014; Li et al., 2013). However, they often failed to reach satisfactory estimation accuracy or to meet the desired efficiency and computational complexity requirements. One of the main problems when processing data representing the human body is the irregularity and complex structure of its surface. Unlike standard 3D objects with corners and edges, the human body surface has no predefined vertices to guide the processing. Considering these theoretical and practical issues with the algorithmic approaches, in many settings they have been replaced with machine learning techniques, such as random forests (Xiaohui et al., 2018) or neural networks (Yan and Kämäräinen, 2021; Wang et al., 2019).
Sufficiently training a machine learning model requires a large amount of human body data annotated with ground-truth body measurements. At the moment, however, no such large-scale benchmark datasets are publicly available for research purposes, mainly because manually tape-measuring real human bodies is a laborious process. Therefore, most researchers have used synthetic data instead of real human data. Gonzalez-Tejeda and Mayer (2019) focused on annotating three basic body measurements (chest, waist, and pelvis circumference) on a 3D human body model. The annotation method presented in this paper is inspired by their approach; we optimized and adjusted the conditions for computing the individual measurements and extended the set by thirteen additional body measurements.
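A common way to compute a circumference on a body model is to cut the mesh with a horizontal plane and take the perimeter of the convex hull of the cross-section, which approximates a tape stretched around the body. The sketch below assumes vertices given as an (N, 3) array with y as the vertical axis and uses a skeleton joint to choose the cutting height. The joint name, the band width, and the function names are hypothetical, and the sketch is a generic illustration rather than the exact procedure used in this paper or by Gonzalez-Tejeda and Mayer (2019).

```python
import math

import numpy as np


def convex_hull(points_2d):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(map(tuple, np.asarray(points_2d).tolist())))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_perimeter(vertices_2d):
    """Length of the closed polygon through the given vertices."""
    n = len(vertices_2d)
    if n < 2:
        return 0.0
    return sum(math.dist(vertices_2d[i], vertices_2d[(i + 1) % n]) for i in range(n))


def circumference_at_height(vertices, height, band=0.01):
    """Convex-hull perimeter of the vertices lying within +/- band of height (y-up)."""
    vertices = np.asarray(vertices, dtype=float)
    in_band = np.abs(vertices[:, 1] - height) <= band
    cross_section = vertices[in_band][:, [0, 2]]  # project onto the horizontal x-z plane
    return polygon_perimeter(convex_hull(cross_section))


def circumference_at_joint(vertices, joints, joint_name, band=0.01):
    """Skeleton-guided variant: the cutting height comes from a (hypothetical) joint."""
    return circumference_at_height(vertices, float(joints[joint_name][1]), band)
```

At chest height, a full-body slice would also intersect the arms, so a practical implementation would first restrict the vertices to the torso region; this sketch omits that step.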
2.1 1D Statistical Input Data
Regarding human body measurement estimation, several existing approaches formulate the task as estimating an extended list of advanced body measurements from a set of predefined basic body measurements given on the input, thus working with 1D statistical input data (Wang et al., 2019; Liu et al., 2017). Usually, the estimation is performed by a neural network trained end-to-end to map the easy-to-measure input body dimensions to the detailed body dimensions on the output. However, these methods still require manual tape measuring of the few basic attributes, which may be inconvenient in certain application scenarios.
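The 1D formulation can be illustrated with a minimal NumPy sketch: a one-hidden-layer ReLU network trained by full-batch gradient descent to map a few basic measurements to a longer vector of detailed ones. The data generator, the choice of three inputs and five outputs, and all hyperparameters are hypothetical; the toy targets are an invented linear function of the inputs, not real anthropometric data, and the cited methods may use different architectures and training procedures.

```python
import numpy as np


def make_toy_data(n=200, seed=0):
    """Hypothetical toy data: 3 basic measures -> 5 detailed measures (invented mapping)."""
    rng = np.random.default_rng(seed)
    basic = rng.uniform(60.0, 120.0, size=(n, 3))   # made-up value range
    mapping = rng.uniform(0.1, 0.6, size=(3, 5))    # arbitrary linear ground truth
    return basic, basic @ mapping


def train_mlp(x, y, hidden=16, lr=0.05, epochs=1000, seed=0):
    """Fit a one-hidden-layer ReLU MLP with full-batch gradient descent on the MSE."""
    rng = np.random.default_rng(seed)
    # Standardize inputs and outputs so a single learning rate suits all columns.
    x_mu, x_sd = x.mean(axis=0), x.std(axis=0)
    y_mu, y_sd = y.mean(axis=0), y.std(axis=0)
    xs, ys = (x - x_mu) / x_sd, (y - y_mu) / y_sd
    w1 = rng.normal(0.0, 0.5, size=(x.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.5, size=(hidden, y.shape[1]))
    b2 = np.zeros(y.shape[1])
    losses = []
    for _ in range(epochs):
        # Forward pass.
        h_pre = xs @ w1 + b1
        h = np.maximum(h_pre, 0.0)
        err = h @ w2 + b2 - ys
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: gradients of the mean squared error.
        g_out = 2.0 * err / err.size
        g_w2, g_b2 = h.T @ g_out, g_out.sum(axis=0)
        g_pre = (g_out @ w2.T) * (h_pre > 0)
        g_w1, g_b1 = xs.T @ g_pre, g_pre.sum(axis=0)
        # Gradient-descent update (in place, so predict() sees the final weights).
        w1 -= lr * g_w1
        b1 -= lr * g_b1
        w2 -= lr * g_w2
        b2 -= lr * g_b2

    def predict(x_new):
        h_new = np.maximum(((x_new - x_mu) / x_sd) @ w1 + b1, 0.0)
        return (h_new @ w2 + b2) * y_sd + y_mu  # back to the original units

    return predict, losses
```

A call such as `predict, losses = train_mlp(*make_toy_data())` returns a function that maps new rows of basic measurements to detailed ones; the manual-measuring drawback noted above remains, since the basic inputs must still be obtained somehow.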
2.2 Image Input Data
Methods inferring from 2D input images were proposed to estimate body measurements from visual data, avoiding the need for manual measuring in deployment. Most frequently, the input data are RGB images (Yan and Kämäräinen, 2021; Anisuzzaman et al., 2019; Shigeki et al., 2018), although the three color channels may bring little benefit for this particular task while adding the cost of processing three-channel data. Thus, several other approaches use gray-scale images (Tejeda and Mayer, 2021) as input data, while achieving competitive results.
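As a concrete illustration of the channel reduction, the sketch below collapses an RGB image into a single intensity channel using the ITU-R BT.601 luma weights. These weights are one common convention assumed here for illustration; the cited works may convert or render their gray-scale images differently.

```python
import numpy as np

# ITU-R BT.601 luma weights; other standards (e.g. BT.709) use different coefficients.
BT601 = np.array([0.299, 0.587, 0.114])


def rgb_to_gray(rgb: np.ndarray) -> np.ndarray:
    """Collapse an (H, W, 3) RGB image into a single (H, W) intensity channel."""
    if rgb.ndim != 3 or rgb.shape[2] != 3:
        raise ValueError("expected an (H, W, 3) array")
    # Matrix-vector product over the last axis: weighted sum of R, G and B.
    return rgb.astype(np.float64) @ BT601
```

The gray-scale result holds one third of the values of the RGB input, which is the processing saving these approaches trade against any color information.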
Binary silhouette images of the human body have also been used in some strategies (Gonzalez-Tejeda and Mayer, 2019; Song et al., 2017), suggesting that the contours of the body shape are the most important feature for this task.
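The sketch below shows how little information a silhouette retains: a gray-scale image is thresholded into a binary body mask, and the contour is taken as the foreground pixels that have at least one 4-connected background neighbor. The fixed threshold assumes a body that is brighter than a uniform background, as in a hypothetical synthetic render; real photographs generally require proper segmentation.

```python
import numpy as np


def silhouette(gray, threshold):
    """Binary body mask; assumes the body is brighter than a uniform background."""
    return np.asarray(gray) > threshold


def contour(mask):
    """Foreground pixels with at least one 4-connected background neighbor."""
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    interior = (
        padded[1:-1, 1:-1]      # the pixel itself
        & padded[:-2, 1:-1]     # neighbor above
        & padded[2:, 1:-1]      # neighbor below
        & padded[1:-1, :-2]     # neighbor to the left
        & padded[1:-1, 2:]      # neighbor to the right
    )
    return mask & ~interior
```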
VISAPP 2022 - 17th International Conference on Computer Vision Theory and Applications