Monocular weakly supervised depth and pose estimation method based on multi-information fusion

Zhimin Zhang; Jianzhong Qiao; Shukuan Lin

doi:10.48129/kjs.12929

Authors

Zhimin Zhang, Dept. of Computer Science and Engineering, Northeastern University, China
Jianzhong Qiao, Northeastern University
Shukuan Lin, Dept. of Computer Science and Engineering, Northeastern University, China

Abstract

Current monocular visual odometry methods usually either require a large amount of expensive ground truth data or need extensive training yet still obtain only suboptimal results. This paper presents a weakly supervised monocular depth and camera pose estimation method based on the fusion of video sequences, inertial measurement unit (IMU) data, and "ground truth" labels. First, we propose a label generation model that uses transfer learning to obtain high-precision depth and six-degree-of-freedom (6-DOF) pose data from a very small number of ground truth disparity maps; these data serve as the "ground truth" labels for our monocular model. Then, we construct a multi-information fusion network that estimates depth and camera pose from the "ground truth" labels, video sequences, and IMU information. Finally, we design a loss function that combines supervised cues based on the "ground truth" labels with self-supervised cues. In the testing phase, the network can output high-precision pose and depth data separately from a monocular video sequence. The model is evaluated on the KITTI dataset, where it outperforms other mainstream monocular depth and pose estimation methods.
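The abstract does not state the exact form of the loss, so the following is only a minimal sketch of how supervised cues from generated "ground truth" labels and a self-supervised cue might be combined. It assumes an L1 depth term restricted to pixels with a valid (positive) label, a mean-squared error on 6-DOF pose vectors, and an L1 photometric term between the target frame and a source frame already warped into the target view. The function name `weak_supervision_loss` and the weights `w_depth`, `w_pose`, `w_photo` are hypothetical and are not taken from the paper.

```python
import numpy as np


def weak_supervision_loss(pred_depth, label_depth,
                          pred_pose, label_pose,
                          target_img, warped_img,
                          w_depth=1.0, w_pose=1.0, w_photo=1.0):
    """Hypothetical weighted sum of supervised and self-supervised terms.

    Assumptions (not from the paper): L1 depth loss, MSE pose loss on
    6-DOF vectors, L1 photometric loss on a pre-warped source image.
    """
    # Supervised depth cue: mean absolute error against the pseudo
    # "ground truth" depth, only where the label is valid (> 0).
    valid = label_depth > 0
    if valid.any():
        depth_term = float(np.abs(pred_depth[valid] - label_depth[valid]).mean())
    else:
        depth_term = 0.0

    # Supervised pose cue: mean squared error over the 6-DOF pose vector
    # (assumed layout: 3 translation + 3 rotation parameters).
    pose_term = float(np.mean((pred_pose - label_pose) ** 2))

    # Self-supervised cue: photometric L1 error between the target frame
    # and the source frame warped into the target view. The warping step
    # (which would use predicted depth and pose) is assumed done elsewhere.
    photo_term = float(np.abs(target_img - warped_img).mean())

    total = w_depth * depth_term + w_pose * pose_term + w_photo * photo_term
    return total, {"depth": depth_term, "pose": pose_term, "photo": photo_term}


if __name__ == "__main__":
    pred_depth = np.full((2, 2), 2.0)
    label_depth = np.array([[1.0, 0.0], [3.0, 2.0]])  # 0.0 marks a missing label
    total, parts = weak_supervision_loss(
        pred_depth, label_depth,
        np.zeros(6), np.ones(6),
        np.ones((2, 2, 3)), np.zeros((2, 2, 3)),
    )
    print(total, parts)
```

Returning the individual terms alongside the total makes it easy to monitor whether the supervised or the self-supervised cue dominates during training; in practice the relative weights would need tuning.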

P.O. Box 17225, Khaldia 72453, Kuwait
kjs@ku.edu.kw
kuwaitjournals@gmail.com
(+965) 249 86180 / 249 84625