Simultaneous Monocular Visual Odometry and Depth Reconstruction with Scale Recovery
Abstract
In this paper, we propose a deep neural network that simultaneously
estimates camera poses and reconstructs full-resolution depth maps of the
environment using only consecutive monocular images. Traditional
monocular visual odometry methods cannot estimate metrically scaled
depths. In contrast, we show that scale information can be recovered by
using a sparse depth image as a supervision signal during training.
Building on these scaled depths, the proposed network also estimates the
relative poses between consecutive images. Another novelty lies in the
use of view synthesis, which synthesizes a new image of the scene from a
different viewpoint (camera pose) given an input image. View synthesis is
the core technique used to construct the loss function of the proposed
network. Because it requires both the predicted depths and the relative
poses, the method couples visual odometry and depth prediction. In this
way, the sparse depth supervision used during training scales both the
estimated poses and the predicted depths. Experimental results on the
KITTI dataset show that our method achieves competitive performance in
challenging environments.
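The coupling of depth and pose through view synthesis can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a pinhole camera with intrinsics `K`, grayscale images, a 4x4 relative pose `T_tgt_to_src` that maps target-camera points into the source camera, nearest-neighbour sampling, and L1 losses. The function names `inverse_warp`, `photometric_loss`, and `sparse_depth_loss` are hypothetical. Each target pixel is back-projected with its predicted depth, moved by the predicted pose, and re-projected into the source image. The photometric error therefore depends on both network outputs, and the sparse depth term anchors their scale.

```python
import numpy as np

def inverse_warp(src_img, tgt_depth, K, T_tgt_to_src):
    """Synthesize the target view by sampling src_img (H x W, grayscale).

    Assumed formulation (hypothetical sketch): for each target pixel p_t,
        p_s ~ K @ (R @ (D(p_t) * K^-1 @ p_t) + t)
    with nearest-neighbour sampling. Pixels that land behind the camera or
    outside the source image are marked invalid in the returned mask.
    """
    H, W = tgt_depth.shape
    v, u = np.mgrid[0:H, 0:W]                                   # row / column grids
    pix = np.stack([u, v, np.ones_like(u)], 0).reshape(3, -1).astype(float)
    # Back-project to 3-D points in the target camera frame, scaled by depth.
    cam = (np.linalg.inv(K) @ pix) * tgt_depth.reshape(1, -1)
    # Move the points into the source camera frame with the relative pose.
    R, t = T_tgt_to_src[:3, :3], T_tgt_to_src[:3, 3:4]
    proj = K @ (R @ cam + t)
    z = proj[2]
    valid = z > 1e-6                                            # in front of the camera
    us = np.zeros_like(z)
    vs = np.zeros_like(z)
    us[valid] = proj[0, valid] / z[valid]
    vs[valid] = proj[1, valid] / z[valid]
    ui = np.round(us).astype(int)
    vi = np.round(vs).astype(int)
    valid &= (ui >= 0) & (ui < W) & (vi >= 0) & (vi < H)       # inside the source image
    out = np.zeros(H * W)
    out[valid] = src_img[vi[valid], ui[valid]]
    return out.reshape(H, W), valid.reshape(H, W)

def photometric_loss(tgt_img, synth, mask):
    # Assumed L1 view-synthesis loss over pixels with a valid projection.
    return np.abs(tgt_img - synth)[mask].mean()

def sparse_depth_loss(pred_depth, sparse_depth):
    # Assumed L1 supervision only where the sparse depth image has a measurement (> 0).
    m = sparse_depth > 0
    return np.abs(pred_depth[m] - sparse_depth[m]).mean()
```

As a quick check, with an identity pose the synthesized view reproduces the source image. A translation of 2.5 units along x at a constant depth of 5 with focal length 2 shifts every sample by exactly one pixel (2 × 2.5 / 5 = 1), so the last column has no valid source pixel. Without the sparse depth term, scaling both the depths and the translation by the same factor would leave the projected pixel coordinates, and hence the photometric loss, unchanged. That scale ambiguity is what the sparse supervision resolves.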