Figure 14: Reconstructed points across all subsets, averaged every 2.5 minutes.
In future work, support for B-frames should be added to take further advantage of the motion estimation performed during video compression for feature detection and matching. Moreover, our approach should be studied with additional SfM pipelines, including simultaneous localization and mapping (SLAM) techniques.
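As a rough illustration of how B-frames might contribute, the sketch below combines the forward and backward motion vectors of a bi-predicted block into a single displacement estimate, assuming the two reference frames are temporally equidistant. The function name, data layout, and averaging rule are assumptions for illustration only, not part of the method evaluated in this paper.

```python
def predict_with_bframe(x, y, mv_forward=None, mv_backward=None):
    """Combine bidirectional B-frame motion vectors into one prediction.

    mv_forward  : (dx, dy) toward the past reference frame, or None.
    mv_backward : (dx, dy) toward the future reference frame, or None.
    Returns the predicted (x, y) in the past reference frame, or None if
    the block carries no usable motion vector.
    """
    if mv_forward is not None and mv_backward is not None:
        # Under (assumed) linear motion with equidistant references, the
        # negated backward vector is a second estimate of the displacement
        # toward the past reference, so the two estimates are averaged.
        dx = 0.5 * (mv_forward[0] - mv_backward[0])
        dy = 0.5 * (mv_forward[1] - mv_backward[1])
    elif mv_forward is not None:
        dx, dy = mv_forward
    elif mv_backward is not None:
        dx, dy = -mv_backward[0], -mv_backward[1]
    else:
        return None
    return (x + dx, y + dy)
```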
6 CONCLUSION
This paper introduces a near real-time feature detection and matching algorithm for SfM reconstruction that exploits the motion estimation performed by H.264 video compression encoders. By reusing the motion vectors of the compressed video frames, we can accurately predict feature matches between frames over time. The design has been evaluated against multiple feature extraction techniques using an identical incremental pipeline to generate a sparse 3D point cloud. In this comparison, our approach achieves very low execution time while maintaining the accuracy of the generated sparse point cloud.
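To make the core idea concrete, the following sketch shows one simplified way that decoded macroblock motion vectors could be used to predict feature matches between a frame and its reference frame. The array layout, block size, and function name are illustrative assumptions, not the exact implementation described in this paper.

```python
import numpy as np

def predict_matches(keypoints_cur, motion_vectors, block_size=16):
    """Predict, for keypoints detected in the current frame, their matching
    positions in the reference (previous) frame.

    keypoints_cur  : (N, 2) array of (x, y) positions in the current frame.
    motion_vectors : (rows, cols, 2) array holding one (dx, dy) vector per
                     macroblock, pointing from the current frame toward its
                     reference frame (the usual P-frame convention).
    Returns a list of ((x, y) in current frame, (x, y) predicted in reference).
    """
    rows, cols, _ = motion_vectors.shape
    matches = []
    for x, y in keypoints_cur:
        bx, by = int(x) // block_size, int(y) // block_size
        if not (0 <= by < rows and 0 <= bx < cols):
            continue  # keypoint falls outside the motion-vector grid
        dx, dy = motion_vectors[by, bx]
        # The macroblock containing (x, y) was predicted from a block shifted
        # by (dx, dy) in the reference frame, so the same shift gives the
        # expected location of the matching feature.
        matches.append(((x, y), (x + dx, y + dy)))
    return matches
```

In a full pipeline, such predicted correspondences would be chained across frames into feature tracks and passed to the incremental SfM stage in place of descriptor-based matching.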