Dissertation/Thesis

Learning dense prediction : from correspondence to segmentation

Bibliographic Details
Title: Learning dense prediction : from correspondence to segmentation
Authors: Zhang, Feihu
Contributors: Torr, Philip, Prisacariu, Victor
Publication Information: University of Oxford, 2022.
Publication Year: 2022
Collection: University of Oxford
Subject Terms: Computer vision, Deep learning (Machine learning)
Description: Dense prediction is the task of predicting a label for each pixel in an image. Given 3D data (point clouds or RGB-D images) as input, dense prediction can also be extended to 3D space, assigning a label to each 3D point or location. According to the label type, dense prediction can be broadly categorized as depth estimation, motion prediction, segmentation, and other related tasks. There are four major challenges in learning dense prediction: i) significantly improving accuracy and resolving ambiguous regions, ii) high memory and computational costs, iii) the dependency on large amounts of labeled training data, and iv) poor cross-domain generalization to novel datasets. This integrated thesis focuses on dense prediction tasks, from correspondence estimation (stereo matching and optical flow) to 2D/3D semantic segmentation. Seven robust deep neural network models are proposed to achieve state-of-the-art accuracy, to enable effective training with only synthetic data or unlabeled real data, and to improve cross-domain generalization to various unseen datasets. For the first task, traditional 3D geometry constraints are embedded into end-to-end trainable stereo matching networks, achieving state-of-the-art accuracy on two stereo matching benchmarks (at the time of publication). Building on this work, a domain-invariant stereo matching network is proposed. It is trained on synthetic data but outperforms many models fine-tuned on real data. For the second task, a Separable Flow network is developed for optical flow estimation, which ranked first on two standard optical flow benchmarks (at the time of publication). It is also one of the best methods for predicting optical flow on various unseen datasets. Moreover, research is conducted on unsupervised pre-training and domain adaptation for semantic image segmentation. Finally, 2D image segmentation knowledge is leveraged to tackle 3D segmentation. The proposed 3D segmentation networks achieved the leading position on large-scale point-cloud segmentation benchmarks (at the time of publication).
Document Type: Electronic Thesis or Dissertation
Language: English
Access URL: https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.886668
Accession Number: edsble.886668
Database: British Library EThOS