The Tracking Algorithm is beneficial both with RGB-D
information and with RGB information alone. When
the GPU is used, the impact of the Tracking Algorithm
is smaller, but it remains beneficial in both situations.
It is important to emphasize that, when the
Tracking Algorithm and the GPU are used, the depth
information has no impact on the leather
segmentation speed. The FPS results were measured
with the models predicting on every frame; when
higher tracking speed is needed, the prediction can be
run on only a subset of frames, increasing the FPS.
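To illustrate this frame-skipping strategy, the following Python sketch runs the prediction only on every third frame and reuses the last mask in between; predict_mask, the video source, and the skip interval are hypothetical placeholders, not the system's actual API.

import cv2
import numpy as np

PREDICT_EVERY = 3  # illustrative interval: predict on every 3rd frame

def predict_mask(frame):
    # placeholder for the segmentation model's prediction (hypothetical);
    # returns an empty mask with the frame's spatial size
    return np.zeros(frame.shape[:2], dtype=np.uint8)

cap = cv2.VideoCapture("leather.mp4")  # hypothetical video source
last_mask, frame_idx = None, 0
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    if frame_idx % PREDICT_EVERY == 0:
        last_mask = predict_mask(frame)  # full prediction on this frame
    # intermediate frames reuse last_mask, raising the effective FPS
    frame_idx += 1
cap.release()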
4 CONCLUSIONS
Using a U-Net architecture (Ronneberger et al.,
2015) for the Deep Learning model helped us to obtain
better results than creating an architecture from
scratch: the model did not have to be trained for many
epochs and the results were still satisfactory.
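For reference, a minimal Keras sketch of such a U-Net-style encoder-decoder follows; the depth, filter counts, and input size are illustrative assumptions, not the exact configuration used in this work.

from tensorflow.keras import layers, Model

def conv_block(x, filters):
    # two 3x3 convolutions, the basic U-Net building block
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return layers.Conv2D(filters, 3, padding="same", activation="relu")(x)

def build_unet(input_shape=(256, 256, 3)):
    inputs = layers.Input(input_shape)
    # encoder: convolutions plus downsampling, keeping skip tensors
    c1 = conv_block(inputs, 16)
    p1 = layers.MaxPooling2D()(c1)
    c2 = conv_block(p1, 32)
    p2 = layers.MaxPooling2D()(c2)
    b = conv_block(p2, 64)  # bottleneck
    # decoder: upsampling with skip connections from the encoder
    u2 = layers.UpSampling2D()(b)
    c3 = conv_block(layers.Concatenate()([u2, c2]), 32)
    u1 = layers.UpSampling2D()(c3)
    c4 = conv_block(layers.Concatenate()([u1, c1]), 16)
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(c4)  # binary mask
    return Model(inputs, outputs)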
In general, depth information is important for the
Deep Learning Model when segmenting a
deformable object, but the dataset size must be taken
into account. With a larger dataset the depth
information has more impact, whereas if the dataset
is small or the images have low resolution, the depth
information contributes little to the Deep Learning
Model.
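For context, a common way to give the model access to depth (an assumption here, not necessarily the exact pipeline of this work) is to stack the depth map onto the RGB image as a fourth input channel:

import numpy as np

def make_rgbd(rgb, depth):
    # rgb: (H, W, 3) uint8 image; depth: (H, W) depth map
    rgb = rgb.astype(np.float32) / 255.0
    depth = depth.astype(np.float32)
    if depth.max() > 0:
        depth = depth / depth.max()  # illustrative normalization to [0, 1]
    # concatenate depth as a fourth channel -> (H, W, 4) network input
    return np.concatenate([rgb, depth[..., None]], axis=-1)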
The Tracking Algorithm proved useful for
increasing the system's processing speed: it raises the
FPS while still tracking the leather just as well.
However, attention must be paid to situations in
which the Bounding Box Model loses the object. To
handle such cases, it is possible to create an
architecture that, in those situations, falls back to the
Full Image Model to guarantee the correct location of
the leather, as sketched below.
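A minimal sketch of that fallback logic, assuming hypothetical bbox_model and full_model callables that each return a mask and a bounding box (the Bounding Box Model returning None for the box when it loses the object):

def locate_leather(frame, last_bbox, bbox_model, full_model):
    # fast path: the Bounding Box Model searches only near the last location
    if last_bbox is not None:
        mask, bbox = bbox_model(frame, last_bbox)
        if bbox is not None:
            return mask, bbox
    # object lost (or first frame): the Full Image Model re-locates the leather
    return full_model(frame)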
For future work, it is important to increase the
dataset size and include different types of leather; this
way, the depth information will have a greater impact
on the models. In addition, more data processing
steps can be added, but the impact of each technique
on the system must be considered, since it will have
to work in real time.
ACKNOWLEDGEMENTS
This work is supported by European Structural
and Investment Funds in the FEDER component,
through the Operational Competitiveness and
Internationalization Programme (COMPETE 2020)
[Project nº 42778; Funding Reference: POCI-01-
0247-FEDER-042778].
REFERENCES
Hu, Z., Han, T., Sun, P., Pan, J., & Manocha, D. (2019). 3-D Deformable Object Manipulation Using Deep Neural Networks. IEEE Robotics and Automation Letters, 4(4), 4255–4261. https://doi.org/10.1109/LRA.2019.2930476
Kingma, D. P., & Ba, J. (2017). Adam: A Method for Stochastic Optimization. ArXiv:1412.6980 [Cs]. http://arxiv.org/abs/1412.6980
Lai, K., Bo, L., Ren, X., & Fox, D. (2011). A large-scale hierarchical multi-view RGB-D object dataset. 2011 IEEE International Conference on Robotics and Automation, 1817–1824. https://doi.org/10.1109/ICRA.2011.5980382
Liu, Z., Shi, S., Duan, Q., Zhang, W., & Zhao, P. (2019). Salient object detection for RGB-D image by single stream recurrent convolution neural network. Neurocomputing, 363, 46–57. https://doi.org/10.1016/j.neucom.2019.07.012
Perazzi, F., Pont-Tuset, J., McWilliams, B., Van Gool, L., Gross, M., & Sorkine-Hornung, A. (2016). A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 724–732. https://doi.org/10.1109/CVPR.2016.85
Pont-Tuset, J., Perazzi, F., Caelles, S., Arbeláez, P., Sorkine-Hornung, A., & Van Gool, L. (2018). The 2017 DAVIS Challenge on Video Object Segmentation. ArXiv:1704.00675 [Cs]. http://arxiv.org/abs/1704.00675
Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional Networks for Biomedical Image Segmentation. ArXiv:1505.04597 [Cs]. http://arxiv.org/abs/1505.04597
Russell, B. C., Torralba, A., Murphy, K. P., & Freeman, W. T. (2008). LabelMe: A Database and Web-Based Tool for Image Annotation. International Journal of Computer Vision, 77(1–3), 157–173. https://doi.org/10.1007/s11263-007-0090-8
Song, S., & Xiao, J. (2013). Tracking Revisited Using RGBD Camera: Unified Benchmark and Baselines. 2013 IEEE International Conference on Computer Vision, 233–240. https://doi.org/10.1109/ICCV.2013.36
Voigtlaender, P., Krause, M., Osep, A., Luiten, J., Sekar, B. B. G., Geiger, A., & Leibe, B. (2019). MOTS: Multi-Object Tracking and Segmentation. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 7934–7943. https://doi.org/10.1109/CVPR.2019.00813