Exploitation of Noisy Automatic Data Annotation and Its Application to Hand Posture Classification