For self-supervised monocular depth estimation (SDE), recent works have introduced additional learning objectives, for example semantic segmentation, into the training pipeline and have demonstrated improved performance. However, such multi-task learning frameworks require extra ground-truth labels, neutralising the biggest advantage of self-supervision. In this paper, we propose SUB-Depth to overcome this limitation. Our main contribution is an auxiliary self-distillation scheme incorporated into the standard SDE framework, which provides the benefits of multi-task learning without labelling cost. Further, instead of using a simple weighted sum of the multiple objectives, we employ generative task-dependent uncertainty to weight each task in our proposed training framework. We present extensive evaluations on KITTI to demonstrate the improvements achieved by training a range of existing networks with the proposed framework, and we achieve state-of-the-art performance on this task.
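The task-dependent uncertainty weighting mentioned above is commonly formulated with a learnable log-variance per task, where each task loss is scaled by the inverse of its uncertainty plus a regularising term. A minimal sketch of this idea, assuming the homoscedastic-uncertainty form of Kendall et al. (the exact objective in SUB-Depth may differ):

```python
import math

def uncertainty_weighted_loss(task_losses, log_vars):
    """Combine per-task losses with task-dependent uncertainty.

    Each task i contributes exp(-s_i) * L_i + s_i, where
    s_i = log(sigma_i^2) is a learnable log-variance. Tasks the
    model is uncertain about (large s_i) are down-weighted, while
    the additive s_i term discourages trivially inflating all
    uncertainties. This is an illustrative sketch, not the paper's
    exact formulation.
    """
    total = 0.0
    for loss, s in zip(task_losses, log_vars):
        total += math.exp(-s) * loss + s
    return total

# With all log-variances at zero, the scheme reduces to a plain sum:
combined = uncertainty_weighted_loss([1.0, 2.0], [0.0, 0.0])
print(combined)  # 3.0
```

In practice the `log_vars` would be trainable parameters optimised jointly with the network weights, so the relative weighting of the depth, distillation, and any auxiliary objectives adapts during training rather than being hand-tuned.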
Title of host publication: The 33rd British Machine Vision Conference Proceedings
Published: Nov 2022