Automated Human Movement Segmentation by Means of Human Pose
Estimation in RGB-D Videos for Climbing Motion Analysis
Raul Beltrán B., Julia Richter and Ulrich Heinkel
Professorship Circuit and System Design, Chemnitz University of Technology,
Reichenhainer Straße 70, 09126 Chemnitz, Germany
Keywords:
Climbing Motion Analysis, Movement Segmentation, Human Pose Estimation, Video Analysis.
Abstract:
Characterizing the individual movements of the human body parts is a fundamental task in the study of the different activities executed by a person. Changes in the position, speed and frequency of the different limbs reveal the kind of activity and allow us to estimate whether an action is performed well. Part of this characterization consists of establishing when the action begins and ends, but this is difficult to do by purely optical means, since the subject’s pose must first be extracted from the image before the movement variables can be identified. Human motion analysis has been approached in multiple studies through methods ranging from stochastic modelling to artificial intelligence prediction, and the latest research has extended it to sport climbing by employing centre-of-mass analysis. In this paper, we present a method to identify the beginning and end of the movements of human body parts through the analysis of kinematic variables obtained from RGB-D videos, with the aim of motion analysis in climbing. Application tests with OpenPose, PoseNet and Vision are presented to determine the optimal framework for human pose estimation in this sports scenario, and finally the proposed method is validated by segmenting the movements of a climber on the climbing wall.
1 INTRODUCTION
With the increasing accessibility of devices for recording and analysing people and objects in 3-D space through image processing and artificial intelligence (AI), more and more products appear that provide us with real-time information about our activities. An example is real-time video processing applied to sport, which makes it possible to give athletes online feedback by simply recording exercise sequences on their smartphones and then analysing them on the spot using an application. In this type of application, a fundamental process consists of extracting the human figure, determining the pose, and finally characterizing the movement. Each of these phases is a matter of research, which has either been approached individually (Xiaohui et al., 2018; Khuangga and Widyantoro, 2018; Zheng et al., 2020) or jointly using AI (Fan et al., 2017; Cao et al., 2017; Papandreou et al., 2018).
Raul Beltrán B.: https://orcid.org/0000-0001-6612-3212
Julia Richter: https://orcid.org/0000-0001-7313-3013
Ulrich Heinkel: https://orcid.org/0000-0002-0729-6030
Human Pose Estimation (HPE) is a trending solution that AI offers to determine the position and orientation of a person’s body in a given image. While 2-D pose estimation has already reached an acceptable level of precision in many scenarios, the 3-D case still requires more work to produce accurate models with data fusion techniques, which remains a challenging task. In this area, sport climbing draws attention not only because of the widespread popularity it has gained in recent years, but also because of the challenges it poses for the recognition of human postures.
The characterization or classification of human movement according to kinematic variables such as displacement and distance, velocity and speed, acceleration, and time requires segmenting the motion observation sequences into smaller components, called motion primitives. This is a principal task in describing or analysing the execution of human activities, as it facilitates the identification, modelling and learning of movement (Lin et al., 2016). The climbing action is divided into phases, usually composed of movements into which it can be segmented: for example, first one hand reaches for a hold, then the feet are placed, and finally the climber stands up to grab the next grip
with the other hand. It is therefore especially important to determine when each of these phases starts and ends in order to analyse individual movements, especially since specific movement errors can occur in each phase. After the climber has reached the new hold and replaced the feet, for example, the reaching arm has to be kept straight to save energy within the same phase of the climbing action. To evaluate and determine whether there is an error, the segments of the whole climbing sequence have to be obtained.
The main objective of our research is the segmentation of the movements of the hands, feet, and waist using the 3-D skeleton joints of a climber recorded in an RGB-D video with an optical device. Thus, a mechanism is provided to model individual climbing movements using the pose estimated in 2-D images and its projection into 3-D using the point cloud delivered by the device. Our contributions lie in the first phases of the climbing analysis, namely in the selection of a suitable HPE framework and the application of techniques for the segmentation of movement primitives.
The paper is structured as follows: Section 2 presents existing work in the research areas related to this paper. Thereupon, Section 3 explains the methods used to carry out the data collection, model construction, and information analysis. Next, in Section 4, the results are presented and discussed. Finally, conclusions of the overall work are drawn and future work in climbing analysis is outlined.
2 RELATED WORKS
This paper focuses on the field of climbing motion
analysis and is linked to the segmentation of human
motion primitives. Recent work in these two research
areas is presented below.
2.1 Motion Primitives Segmentation
In existing studies, human motion and human posture have commonly been modelled by means of dynamic systems and approximations with stochastic methods to carry out temporal and dimensional segmentation of the body parts associated with the displacement (Kulić et al., 2011). Other works in the same direction, such as Meier et al. (2011), have reformulated the problem from the perspective of trajectory recognition in parameterized libraries of motion primitives, an approach that is valid for the study of specific locomotion activities. With the rise of AI in the last decade, projects on movement primitive segmentation have emerged (Lin et al., 2016; Vögele et al., 2014), but as Lin et al. (2016) point out, everything from the definition of what a segment is to how the data are collected ties each solution to specific application requirements. These studies, like others (Jansen et al., 2019; Aoki et al., 2016), have used inertial measurement unit (IMU) sensors attached to the joints of the body to collect position and velocity data of the limbs, obtaining consistent signals that facilitate the analysis. Cutting-edge investigations (Zago et al., 2020; Colyer et al., 2018) using marker-less sensors have been facing problems such as the occlusion of body parts and the quality of the gathered data, requiring multiple sensors at different viewpoints to improve data quality, or data virtualization to predict unknown poses with predefined kinematic models. There are also investigations in this direction (McCay et al., 2020) where the posture analysis is carried out on RGB-D images to train neural networks, translating it into a classification problem.
2.2 Climbing Motion Analysis
In the field of sport climbing, there are recent studies that analyse the trajectory of the climber’s centre-of-mass (CoM) through optical devices, such as Richter et al. (2020a), where information on fluency, force, and distance to the wall is acquired to help the climber prevent possible injuries from a therapeutic point of view. Richter et al. (2020b) moreover provide a profound survey on climbing motion analysis. Cha et al. (2015) analyse postures and movements also employing optical devices, but with an orientation towards the construction of 3-D graphic animations. Others, like Seifert et al. (2014), using IMUs attached to the climber’s body, attempt the recognition of climbing patterns using cluster analysis to process the positions of the limbs and waist of the subject. Nevertheless, these studies involving wearables are confined to laboratory analysis, since they are difficult to transfer to applications of daily use due to the cost of the implements (Jansen et al., 2019) and how cumbersome these accessories can be for climbers.
3 METHODS
In this section we first introduce a segmentation con-
cept for our climbing scenario, after which we pro-
ceed with the explanation of the steps taken to collect,
process and analyse the information.
3.1 Segment Definition
As in other sports, climbing movements aim to reduce effort and improve performance, seeking to solve a task while saving energy. Climbing is an acyclic sport with three phases: preparation, continue-reaching, and stabilization, usually with a combination of movements in a single phase (Winter, 2012). The period in which each of these movements is executed is what we define here as a segment. Figure 1 shows an example of a sequence of climber movements, where the speed changes indicate when the climbing actions begin and end.
Figure 1: Schemes of the temporal segmentation of climb-
ing movements.
For the movement analysis, we took five target joints from the skeleton data: the climber’s wrists and ankles, plus the hip. For each joint, an independent discrete signal was constructed from the velocity measured at each frame. Thus, the movement of one hand to the next grip, the rearrangement of the feet or the hip’s displacement when standing up on the supporting leg, for example, can be identified as peaks in the speed signal. Considering that detections of interest consist of several consecutive peaks in the obtained signal, the analysis must use the signal envelope or its cumulative value to find the local maxima. To rule out small peaks due to jittering in the localization of the skeleton joints, we decided to use the cumulative value of the signal, in which sustained slopes can be seen when a long movement is executed.
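To illustrate why the cumulative value is more robust than the raw signal, the following minimal Python sketch (with synthetic, illustrative numbers, not our recorded data) contrasts the two: an isolated jitter spike barely moves the cumulative curve, whereas a sustained movement produces a clear, lasting slope.

```python
import numpy as np

fps = 60
rng = np.random.default_rng(0)
n_frames = 10 * fps                        # 10 s of frames

# Synthetic per-frame speed of one joint: low-level tracking jitter,
# one single-frame noise spike, and one 2 s sustained movement.
speed = 0.01 * rng.random(n_frames)
speed[180] += 0.5                          # isolated jitter spike
speed[300:420] += 0.3                      # sustained movement

cumulative = np.cumsum(speed) / fps        # roughly the travelled distance

# The spike adds ~0.008 to the cumulative curve, the sustained movement
# ~0.6: only the latter appears as a marked slope.
print(cumulative[181] - cumulative[179])   # spike contribution
print(cumulative[420] - cumulative[300])   # movement contribution
```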
3.2 Movement Segmentation Procedure
Figure 2 depicts the overall process followed to
achieve the movement segmentation through a 3-D
optical device, which is described in detail in the fol-
lowing.
Figure 2: Block diagram of the movement segmentation al-
gorithm.
3.2.1 RGB-D Video Recording
Our study was conducted on RGB-D videos captured with two different devices. Initially, an Intel RealSense D435 camera was used, with which video samples were taken at 4 m from the climbing wall with a resolution of 848×480 pixels at 30 fps, obtaining a point cloud density of one point per pixel. Subsequently, an iPad Pro 12.9-inch 4th Generation was used to record videos of 1440×1920 pixels at 60 fps, at 4 m and 6 m from the wall, with a density of one point per 8.62 and 6.12 pixels, respectively. The distances to the climbing wall depended on the sensor used, considering that the entire wall should fit within its angle of view.
3.2.2 2-D Human Pose Estimation
The pose detection in the RealSense (RS) videos was carried out by means of OpenPose (Cao et al., 2019), a real-time multi-person detection library capable of jointly detecting human body, face, and foot keypoints. For the iPad case, the PoseNet (Papandreou et al., 2018) framework was tested first with a Machine Learning (ML) model developed for iOS in TensorFlow Lite; however, better results were obtained when using the Vision framework for HPE built into the device SDK provided by Apple Inc.
In the skeleton model obtained, fluctuations be-
tween good and bad joint detections translate into
a high rate of jittering in the position of the recog-
nized body joints. In our work, we reduced the rate
of this jittering by applying an implementation of
the Savitzky-Golay filter algorithm (Savitzky and Go-
lay, 1964), whose principle is the calculation of local
polynomial regression to determine the new value of
each non-conforming point.
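As an illustration, this smoothing step can be sketched in Python with SciPy’s savgol_filter; the window length and polynomial order below are illustrative placeholders rather than our tuned values.

```python
import numpy as np
from scipy.signal import savgol_filter

def smooth_joint_track(xy: np.ndarray, window: int = 9, order: int = 2) -> np.ndarray:
    """Smooth an (n_frames, 2) array of 2-D joint positions: each
    coordinate is replaced by the value of a local polynomial fitted
    over `window` consecutive frames, damping single-frame jitter
    while preserving the overall trajectory."""
    return savgol_filter(xy, window_length=window, polyorder=order, axis=0)

# Example: smooth a left-wrist track delivered by the HPE framework.
wrist = np.random.default_rng(1).random((300, 2))   # placeholder detections
wrist_smoothed = smooth_joint_track(wrist)
```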
3.2.3 3-D Model Construction
The estimation of the third coordinate for the 3-D
skeleton joints was performed in the post-processing
phase, using the collected data from the device and
the Point Cloud Library (PCL).
The RS-D435 device includes an active infrared (IR) stereo vision sensor to capture the depth of the scene, producing point clouds in a modified Rosbag file accessible through the camera’s SDK. In the iPad’s case, the device uses a LiDAR scanner that performs depth sensing with the help of its pro cameras, motion sensors, and the GPU, such that the more it scans an area, the more details are resolved. The iOS SDK allows communication with the GPU memory through shaders and thus controls the delivery density of the point cloud in each frame, to the detriment of the available RAM; for this reason, a threshold is introduced balancing the duration of the video against the desired density of the point cloud. We chose this threshold based on the distance to the climbing wall, carrying out tests from 4 m and 6 m as shown in Table 1.
Table 1: Values for the suitable point cloud size. The low values for the iPad stem from the threshold between video length, PCL persistence time, and available memory on the device, with a maximum of 49,192 depth points, each with an individual confidence level.
Determining the depth of the body limbs presents a problem when the point cloud density is low, since the distance measured for the elbows, hands, knees, and feet then often coincides with that of the climbing wall. As a solution, we use the Kalman filter (Kalman, 1960) to predict the correct distance and thus reduce the jittering produced in the z-coordinate of the skeleton joints.
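A minimal sketch of such a filter for the z-coordinate is given below, using a 1-D constant-velocity state model; the noise parameters q and r are illustrative assumptions that would have to be tuned per device.

```python
import numpy as np

def kalman_smooth_z(z, q: float = 1e-3, r: float = 5e-2) -> np.ndarray:
    """Smooth the per-frame z-coordinate of one joint with a 1-D
    constant-velocity Kalman filter. The process noise q and the
    measurement noise r are placeholder values."""
    z = np.asarray(z, dtype=float)
    F = np.array([[1.0, 1.0], [0.0, 1.0]])   # state transition for (z, dz)
    H = np.array([[1.0, 0.0]])               # we only measure z
    Q = q * np.eye(2)
    R = np.array([[r]])

    x = np.array([z[0], 0.0])                # initial state estimate
    P = np.eye(2)                            # initial state covariance
    out = np.empty_like(z)
    for i, meas in enumerate(z):
        # Predict the next state from the motion model.
        x = F @ x
        P = F @ P @ F.T + Q
        # Correct with the raw depth measurement.
        y = meas - H @ x                     # innovation
        S = H @ P @ H.T + R                  # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
        x = x + K @ y
        P = (np.eye(2) - K @ H) @ P
        out[i] = x[0]
    return out
```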
3.2.4 Joint Signal Construction
For the analysis, each joint’s position along the entire climbing route is recorded independently of the other joints. Hence, the velocity and acceleration at the i-th frame of the video are given by:
$$v_i = f \sqrt{(x_i - x_{i-1})^2 + (y_i - y_{i-1})^2 + (z_i - z_{i-1})^2} \quad (1)$$

$$a_i = \frac{1}{2} \sqrt{\left(\frac{v_{x,i}^2 - v_{x,i-1}^2}{x_i - x_{i-1}}\right)^2 + \left(\frac{v_{y,i}^2 - v_{y,i-1}^2}{y_i - y_{i-1}}\right)^2 + \left(\frac{v_{z,i}^2 - v_{z,i-1}^2}{z_i - z_{i-1}}\right)^2} \quad (2)$$
where f is the sampling frequency depending on the recording device, 30 Hz or 60 Hz. The series of n values for all frames of the video sequence constitutes the discrete velocity and acceleration signals that model the movement of the climber’s limbs. As shown in Equation 3, to reduce the noise influence and facilitate the detection of the joints’ state, a low-pass moving average filter was applied on v, with a period of f/3 samples.
$$V[i] = \frac{1}{M} \sum_{j=1}^{M-1} v[i+j], \quad \text{with } M = f/3 \quad (3)$$
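The following Python sketch gives our reading of Equations 1–3 for a single joint track; the function names and the handling of edge frames are our own choices.

```python
import numpy as np

def joint_speed(p: np.ndarray, f: float) -> np.ndarray:
    """Equation 1: per-frame speed from an (n, 3) array of joint
    positions at sampling frequency f (30 or 60 Hz).
    In the 2-D case the z-column is simply zero."""
    d = np.diff(p, axis=0)                       # (x_i - x_{i-1}, ...)
    return f * np.sqrt((d ** 2).sum(axis=1))

def joint_acceleration(p: np.ndarray, f: float) -> np.ndarray:
    """Equation 2: acceleration magnitude from per-axis velocities,
    using a = (v_i^2 - v_{i-1}^2) / (2 * dx) per axis."""
    v = f * np.diff(p, axis=0)                   # per-axis velocities
    dv2 = np.diff(v ** 2, axis=0)                # v_i^2 - v_{i-1}^2
    dx = np.diff(p, axis=0)[1:]                  # matching displacements
    with np.errstate(divide="ignore", invalid="ignore"):
        a_axis = np.where(dx != 0, dv2 / dx, 0.0)
    return 0.5 * np.sqrt((a_axis ** 2).sum(axis=1))

def moving_average(v: np.ndarray, f: float) -> np.ndarray:
    """Equation 3: V[i] = (1/M) * sum_{j=1}^{M-1} v[i+j] with M = f/3;
    frames where the window runs past the end are truncated."""
    M = int(f / 3)
    out = np.zeros(v.size - M + 1)
    for i in range(out.size):
        out[i] = v[i + 1:i + M].sum() / M
    return out
```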
The following preconditions were defined: i) The start and end of a motion segment is the moment when the climber is at rest, with the hands gripping the holds and the feet firm on the grips. ii) The minimum window for a movement detection is 1/3 s (f/3), considering that below this limit, climbing movements occur between close grips and are too short for our analysis. iii) The study is conducted on the climber’s extremities and their CoM, so the joints involved are the subject’s wrists, ankles and hips. iv) In the 2-D case, the z-coordinate is zero.
3.2.5 The Z-Scoring Algorithm
The peaks in the signal were located employing an implementation of the z-score algorithm, a procedure based on the principle of dispersion, which identifies as local maxima those data points that lie beyond a certain number of standard deviations (σ) from the mean (µ) of a moving window. This procedure uses three parameters: the window lag, the threshold or number of σs at which a peak is marked, and the influence of a marked peak on µ and σ. To illustrate the technique employed, Figure 3 shows in dark blue the velocity signal related to the left wrist of a climber, as schematized in Figure 1. The algorithm identifies the movements as peaks in the velocity signal, so there are four detections, around frame numbers 160, 264, 656 and 852. This method detects significant variations in the signal, leaving out sudden peaks considered as noise, such as the one occurring in the area of frame 372.

Figure 3: Application of the z-score algorithm on the wrist velocity signal with a lag window of 35 and a σ threshold of 6.
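A compact version of this dispersion-based detector, parameterized as in Figure 3 (lag 35, threshold 6), could look as follows; this is a sketch of the principle, not our exact implementation, and the influence value is an assumed placeholder.

```python
import numpy as np

def zscore_peaks(v: np.ndarray, lag: int = 35, threshold: float = 6.0,
                 influence: float = 0.1) -> np.ndarray:
    """Mark a frame as a peak when it lies more than `threshold`
    standard deviations above the mean of the previous `lag` filtered
    samples; `influence` controls how much a marked peak feeds back
    into the running mean and deviation."""
    signals = np.zeros(v.size, dtype=int)
    filtered = v.astype(float)
    for i in range(lag, v.size):
        window = filtered[i - lag:i]
        mu, sigma = window.mean(), window.std()
        if sigma > 0 and (v[i] - mu) > threshold * sigma:
            signals[i] = 1
            # A marked peak only partially enters the running window.
            filtered[i] = influence * v[i] + (1 - influence) * filtered[i - 1]
        else:
            filtered[i] = v[i]
    return signals
```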
3.2.6 Joint Movement Intervals
Through testing, we determined that for long or sustained climbing movements, the cumulative acceleration works better than the plain velocity signal to identify the movement peaks. Figure 4 shows the cumulative acceleration of the wrist joint analysed in Figure 3: the four significant movements detected along the signal were transformed into gentle slopes for long and sustained movements, while the relatively short and fast ones were interpreted as steep slopes. This is also the case for the limbs when they are occluded and the uncertainty in position is reflected as jittering in the signal, as both images show around frame 372. Frame intervals were created around the peaks obtained in the accumulated acceleration signal, and only those peaks whose intervals intersect adjacent ones were kept. This process provided us with a series of frame intervals in which the joint showed consistent changes in velocity, i.e. a starting point with increasing acceleration and an end point with decreasing magnitude. These local maxima indicate when the limb is in motion and are therefore directly related to the sought movement segmentation.
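A sketch of this interval logic is shown below: a half-width of 10 frames around each flagged peak (the value quoted in Section 4.2), with chains of intersecting intervals fused into one segment and isolated peaks discarded.

```python
import numpy as np

def movement_segments(peak_frames: np.ndarray, half_width: int = 10):
    """Build [start, end] frame intervals around detected peaks and fuse
    chains of intersecting intervals into movement segments; isolated
    peaks are discarded, since a real movement produces several
    consecutive peaks in the cumulative acceleration signal."""
    if peak_frames.size == 0:
        return []
    intervals = [[p - half_width, p + half_width]
                 for p in np.sort(peak_frames)]
    segments, run, fused = [], list(intervals[0]), 1
    for lo, hi in intervals[1:]:
        if lo <= run[1]:                 # interval intersects the run
            run[1] = max(run[1], hi)
            fused += 1
        else:
            if fused > 1:                # keep only multi-peak runs
                segments.append(tuple(run))
            run, fused = [lo, hi], 1
    if fused > 1:
        segments.append(tuple(run))
    return segments
```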
Figure 4: Signal analysis in 2-D of a wrist joint obtained from a 16-second video sequence at 60 fps, and application of the z-score algorithm to the cumulative acceleration.
4 RESULTS AND DISCUSSION
This section presents an evaluation of the quality of the data delivered by the skeleton extraction algorithms used, followed by a comparison of the results using pure 2-D images against those complemented with 3-D information.
4.1 HPE Framework Selection
In the images recorded with the RS-D435 camera, OpenPose results are notably good for most positions taken by the climber. The skeleton extraction for people seen from behind is precise, and in the climbing scenario it is possible to detect particular climber poses as long as the occlusion of the extremities is not very pronounced. In addition, OpenPose provides the location of the hands and feet, which is important when it comes to analysing how effective the climber’s pose is before making a move (Kosmalla et al., 2020). Nevertheless, the algorithm detects false hand and foot positions when these are confused with the shapes of the holds on the climbing wall, especially when lighting conditions or the colour of the grips make it difficult to differentiate them from body parts, see Figure 5a. It is relevant to mention here that OpenPose has a licence restriction that strictly prohibits its use in sports for commercial purposes (CMU, 2019); additionally, the annual cost of this licence makes it impractical to deploy outside the research field.
Figure 5: Pose estimation by three different frameworks: (a) Hold misidentified as a right hand by OpenPose in an RS image. (b) Twisted skeleton by OpenPose trying to fit the subject in front. (c) Correct skeleton detection by Vision at 6 m from a wall tilted at 25°.
For the sequences recorded with the iPad, PoseNet allowed the extraction of the skeleton with an accuracy of up to 53 % per video. This index was calculated from the number of false orientations of the skeleton in 12 videos of 19.8 s on average at 60 fps (approximately 14,270 images), plus the number of duplicate joints resulting from partial twists of the detected skeleton, see Figure 5b. This does not include false detections of limbs hidden by the climber’s body, which are difficult to count algorithmically; hence the percentage of good detections could decrease as a function of the time the climber covers a limb. The problem with PoseNet lies in the Microsoft COCO keypoint dataset (Lin et al., 2014) used to train the algorithm, whose content does not provide enough knowledge for the algorithm to recognize climbers on a climbing wall. After correcting the laterality of the body limbs through heuristic rules, the result reached 67 % effectiveness in the detection of the poses.
Tests with the Vision framework were successful, with 93 % effectiveness, obtaining the skeleton without the need for orientation corrections, as Figure 5c shows, applying the same metric used to evaluate PoseNet, as shown in Table 2. Detection difficulties occur when the subject occupies less than 1/3 of the overall image height, as noted in the documentation; the detections are then confused with the adjacent holds, just like with OpenPose. A curious situation arises when a climber wears their hair in a ponytail: the algorithm then detects this kind of hair knot as if it were the subject’s nose and tries to rotate the skeleton.
Regardless of the algorithm used for the pose estimation, they all provide a likelihood for the detection of each skeleton joint. By averaging the individual probabilities of all the detected joints per frame to assign a global certainty to the climber’s skeleton, it could be observed that, in the case of OpenPose, the detections are generally made with a certainty of 67 %, while with PoseNet and Vision this value is 68 % and 82 %, respectively. Although this calculation is not conclusive, it does make it possible to identify the certainty as a parameter inherent to the algorithm employed, which cannot be used as a cross-framework indicator.
Table 2: Effectiveness comparison of the three tested pose detection frameworks. OpenPose ran only on the RealSense data, without rotation problems, while PoseNet and Vision ran on the iPad.
4.2 Algorithm Evaluation
The climber’s movement segmentation algorithm was evaluated in six scenarios, which derive from the use of three groups of videos analysed in 2-D and 3-D, see Table 3. The first video group is made up of 24 recordings created with a RealSense D435 camera, 4 m away from the climbing wall, for which HPE was done using the OpenPose framework. The second and third video groups were recorded on an iPad Pro 4th Generation, at 4 m and 6 m from the climbing wall respectively, using the device’s framework for HPE. The six parameters required for the algorithm execution were calculated in advance for each scenario.
Table 3: Specifications of the three datasets used to tune the segmentation algorithm.
The ground truth for the evaluation of the algorithm was constructed manually by observing each of the climber’s movements in the different sets of videos, taking time measurements for the actual segments in which a movement of each limb was observed. To classify a detection as a True Positive, an intersection of at least 70 % between the interval lengths of the detected and the expected segment was required; otherwise, it was treated as a False Negative.
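The overlap criterion can be sketched as follows; how exactly the 70 % ratio is normalized is our assumption (here against the longer of the two intervals).

```python
def is_true_positive(detected, expected, min_overlap: float = 0.70) -> bool:
    """Check whether a detected segment (start, end) matches a manually
    annotated segment: the intersection must cover at least 70 % of the
    interval lengths, here normalized by the longer of the two."""
    lo = max(detected[0], expected[0])
    hi = min(detected[1], expected[1])
    intersection = max(0.0, hi - lo)
    reference = max(detected[1] - detected[0], expected[1] - expected[0])
    return reference > 0 and intersection / reference >= min_overlap

# Example with segments in seconds: detected vs. annotated.
print(is_true_positive((2.0, 4.0), (2.2, 4.1)))   # True
```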
In the video group of the RS-D435 camera, both in the 2-D and in the 3-D case, the skeleton jittering could be significantly reduced using the Savitzky-Golay filter, relying only on those skeleton nodes whose likelihood was greater than 65 % and 76 %, respectively. Given that the point cloud density was higher there, the calculation of the z-coordinate for the joints presented less abrupt variations. Thus, as shown in Figure 6, the detections of the movement of the climber’s limbs remained similar in both scenarios. However, rapid movements between adjacent holds were discarded by the algorithm. This may be attributed to the fact that they occurred within the sliding window of the z-scoring algorithm, in a time lower than 1/3 s, or to the fact that no intersections of the 10-frame interval around the detected peak were found. In general, the detections were made with a probability of 73.74 % in 2-D and 69.51 % in 3-D, where part of the failures can be attributed to the fact that many skeleton joints below the certainty threshold were discarded to avoid signals from doubtful nodes.

Figure 6: ROC curve comparison of the RS-D435 videos segmented in 2-D and 3-D.

For the iPad videos, the results differed notably between the 2-D and 3-D cases. On the one hand, in the 2-D detections, the jittering present in the skeleton joint positions was not reduced as much as expected. This is because the Vision algorithm produces high certainty values for the detected joints, so that even invalid positions cannot be discarded. Despite
this, we obtained 74.16 % good detections with a medium rate of false positives for videos recorded at 4 m from the wall, see Figure 7. That was not the case for recordings at 6 m from the wall, where good detections decreased to 67.59 % and false positives increased accordingly. On the other hand, the 3-D detections showed considerable variations from the expected results. The z-coordinate calculated for the joints exhibited a high rate of jittering, which produced many spikes in the speed signal. For now, this presents a technical restriction in our research, since the density of the point cloud depends on the physical memory of the device and the duration of the recorded sequence. Nonetheless, the results for 4 m and 6 m were detections with 65.5 % and 59.56 % effectiveness, respectively.
Figure 7: ROC curve for iPad Pro data sets recorded at 4 m
from the wall, showing a comparison between 2-D and 3-D.
5 CONCLUSIONS AND FUTURE
WORK
The objective of segmenting the climber’s movements by analysing changes in the speed and acceleration of their limbs was met according to expectations. The study showed that there are currently algorithms skilled enough to detect various poses of climbers in action, such as OpenPose and Vision. It was possible to prove that the cumulative acceleration metric is valid for detecting the peaks of the climber’s limb movements. However, a significant problem remains to be solved: the sudden change of position of limbs hidden by the climber’s body. A viable solution is to retrain one of the evaluated HPE frameworks with a proper climbing image dataset. The latter is feasible using PoseNet, considering that Vision is a proprietary framework from Apple Inc.
Even though the results of the automatic movement segmentation, in both the 2-D and 3-D scenarios, were more consistent with the observations in the videos recorded with the RS-D435 using OpenPose, the quality of the video recorded with the iPad and its Vision framework cannot be discarded. Poor results at 6 m could be expected, as the iOS documentation recommends a maximum of 5.5 m from the object, but it helped us to test the reach of the device’s technology in our project. Considering the OpenPose licensing restrictions on the one hand, and the versatility of the iPad hardware and software on the other, this device is a suitable tool to continue our research. However, we are aware that Vision can only be used within the environment of Apple Inc. devices, so its utilization limits the portability of the applications.
The results presented above allow us to continue
our research to evaluate the execution of technique
in sport climbing, considering the relationships be-
tween posture, momentum and the effectiveness of
the climber’s movements.
ACKNOWLEDGEMENTS
This research was funded by the “Zentrales Inno-
vationsprogramm Mittelstand (ZIM)” by the Federal
Ministry for Economic Affairs and Energy (BMWi)
with the project ID ZF4095809DH9.
REFERENCES
Aoki, T., Lin, J. F.-S., Kulić, D., and Venture, G. (2016).
Segmentation of human upper body movement us-
ing multiple imu sensors. In 2016 38th Annual In-
ternational Conference of the IEEE Engineering in
Medicine and Biology Society (EMBC), pages 3163–
3166. IEEE.
Cao, Z., Hidalgo, G., Simon, T., Wei, S.-E., and Sheikh,
Y. (2019). OpenPose: realtime multi-person 2d pose
estimation using part affinity fields. In IEEE trans-
actions on pattern analysis and machine intelligence,
volume 43, pages 172–186. IEEE.
Cao, Z., Simon, T., Wei, S.-E., and Sheikh, Y. (2017). Real-
time multi-person 2d pose estimation using part affin-
ity fields. In Proceedings of the IEEE conference on
computer vision and pattern recognition, pages 7291–
7299. IEEE.
Cha, K., Lee, E.-Y., Heo, M.-H., Shin, K.-C., Son, J., and
Kim, D. (2015). Analysis of climbing postures and
movements in sport climbing for realistic 3d climbing
animations. In Procedia Engineering, volume 112,
pages 52–57. Elsevier.
CMU (2019). Carnegie Mellon University: OpenPose - non-exclusive commercial license. https://cmu.flintbox.com/technologies/b820c21d-8443-4aa2-a49f-8919d93a8740, visited on 24/09/2021.
Colyer, S. L., Evans, M., Cosker, D. P., and Salo, A. I.
(2018). A review of the evolution of vision-based mo-
tion analysis and the integration of advanced computer
vision methods towards developing a markerless sys-
tem. In Sports medicine-open, volume 4, pages 1–15.
SpringerOpen.
Fan, K., Wang, P., Hu, Y., and Dou, B. (2017). Fall de-
tection via human posture representation and support
vector machine. In International journal of distributed
sensor networks, volume 13. SAGE Publications Sage
UK: London, England.
Jansen, W., Laurijssen, D., Daems, W., and Steckel, J.
(2019). Automatic calibration of a six-degrees-of-
freedom pose estimation system. In IEEE Sensors
Journal, volume 19, pages 8824–8831. IEEE.
Kalman, R. E. (1960). A new approach to linear fil-
tering and prediction problems. In Transactions of
the ASME–Journal of Basic Engineering, volume 82,
pages 35–45.
Khuangga, M. C. and Widyantoro, D. H. (2018). Human
identification using human body features extraction.
In 2018 International Conference on Advanced Com-
puter Science and Information Systems (ICACSIS),
pages 397–402. IEEE.
Kosmalla, F., Zenner, A., Tasch, C., Daiber, F., and Krüger,
A. (2020). The importance of virtual hands and feet
for virtual reality climbing. In Extended Abstracts of
the 2020 CHI Conference on Human Factors in Com-
puting Systems, pages 1–8. Association for Comput-
ing Machinery.
Kulić, D., Kragic, D., and Krüger, V. (2011). Learning ac-
tion primitives. Visual analysis of humans, pages 333–
353. Springer.
Lin, J. F.-S., Karg, M., and Kulić, D. (2016). Movement
primitive segmentation for human motion modeling:
A framework for analysis. In IEEE Transactions on
Human-Machine Systems, volume 46, pages 325–339.
IEEE.
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., and Zitnick, C. L. (2014).
Microsoft coco: Common objects in context. In Euro-
pean conference on computer vision, pages 740–755.
Springer.
McCay, K. D., Ho, E. S., Shum, H. P., Fehringer, G., Mar-
croft, C., and Embleton, N. D. (2020). Abnormal in-
fant movements classification with deep learning on
pose-based features. In IEEE Access, volume 8, pages
51582–51592. IEEE.
Meier, F., Theodorou, E., Stulp, F., and Schaal, S. (2011).
Movement segmentation using a primitive library. In
2011 IEEE/RSJ International Conference on Intelli-
gent Robots and Systems, pages 3407–3412. IEEE.
Papandreou, G., Zhu, T., Chen, L.-c., Gidaris, S., Tompson,
J., and Murphy, K. (2018). PersonLab: Person pose
estimation and instance segmentation with a bottom-
up, part-based, geometric embedding model.
Richter, J., Beltrán B., R., and Heinkel, U. (2020a).
Camera-based climbing analysis for a therapeutic
training system. In Current Directions in Biomedical
Engineering, volume 6. De Gruyter.
Richter, J., Beltrán B., R., Köstermeyer, G., and Heinkel,
U. (2020b). Human climbing and bouldering motion
analysis: A survey on sensors, motion capture, anal-
ysis algorithms, recent advances and applications. In
VISIGRAPP (5: VISAPP), pages 751–758.
Savitzky, A. and Golay, M. J. (1964). Smoothing and dif-
ferentiation of data by simplified least squares pro-
cedures. In Analytical chemistry, volume 36, pages
1627–1639. ACS Publications.
Seifert, L., Dovgalecs, V., Boulanger, J., Orth, D., Hérault,
R., and Davids, K. (2014). Full-body movement pat-
tern recognition in climbing. In Sports Technology,
volume 7, pages 166–173. Taylor & Francis.
Vögele, A., Krüger, B., and Klein, R. (2014). Efficient unsupervised temporal segmentation of human motion. In ACM SIGGRAPH/Eurographics Symposium on Computer Animation.
Winter, S. (2012). Klettern & Bouldern: Kletter- und Sicherungstechnik für Einsteiger, pages 90–91. Rother Bergverlag.
Xiaohui, T., Xiaoyu, P., Liwen, L., and Qing, X. (2018).
Automatic human body feature extraction and per-
sonal size measurement. In Journal of Visual Lan-
guages & Computing, volume 47, pages 9–18. Else-
vier.
Zago, M., Luzzago, M., Marangoni, T., De Cecco, M., Tara-
bini, M., and Galli, M. (2020). 3d tracking of human
motion using visual skeletonization and stereoscopic
vision. In Frontiers in bioengineering and biotechnol-
ogy, volume 8, page 181. Frontiers.
Zheng, C., Wu, W., Yang, T., Zhu, S., Chen, C., Liu, R.,
Shen, J., Kehtarnavaz, N., and Shah, M. (2020). Deep
learning-based human pose estimation: A survey.