Unsupervised computer vision

Dr. Bogdan Savchynskyy, SoSe 2022

This seminar belongs to the Master in Physics (specialization Computational Physics, code "MVJC") and the Master of Applied Informatics (code "IS"), but is also open to students of Scientific Computing and anyone interested.

Credits: 2 / 4 CP depending on course of study


The topic of this semester is monocular depth estimation and unsupervised scene segmentation from videos.

Ground truth data for computer vision tasks such as depth estimation and pixel-wise semantic segmentation is costly and time-consuming to obtain. Therefore, techniques that can work without it, or that can leverage existing annotated data to produce state-of-the-art results on video sequences from other, distinct sources, are of great interest. Such methods are the topic of this seminar. Most of them are technically based on neural networks, so familiarity with this technique is required.
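To give a flavor of how methods like [1,2] train without ground truth: they warp a source video frame into the target view using the predicted depth and camera motion, and penalize the photometric difference between the warped and the actual target frame. The sketch below is only illustrative, not taken from any of the papers; it computes a simplified photometric loss (weighted L1 plus an SSIM term evaluated over global image statistics, whereas real implementations use local windows) and assumes the warping has already been done elsewhere.

```python
import numpy as np

def photometric_loss(target, warped, alpha=0.85, eps=1e-6):
    """Simplified photometric reconstruction loss between a target frame
    and a source frame warped into the target view.
    alpha balances the SSIM term against plain L1 (a common choice)."""
    target = np.asarray(target, dtype=float)
    warped = np.asarray(warped, dtype=float)
    l1 = np.abs(target - warped).mean()
    # SSIM over global image statistics (real code uses local windows)
    mu_t, mu_w = target.mean(), warped.mean()
    var_t, var_w = target.var(), warped.var()
    cov = ((target - mu_t) * (warped - mu_w)).mean()
    c1, c2 = 0.01 ** 2, 0.03 ** 2
    ssim = ((2 * mu_t * mu_w + c1) * (2 * cov + c2)) / (
        (mu_t ** 2 + mu_w ** 2 + c1) * (var_t + var_w + c2) + eps)
    return alpha * (1 - ssim) / 2 + (1 - alpha) * l1
```

If depth and pose are predicted correctly, the warped frame matches the target and this loss is near zero, which is exactly the training signal that replaces ground-truth depth.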

General Information

Please register for the seminar in Müsli. The seminar will take place in person. All links will be sent by email via Müsli.
The first seminar will take place on Wednesday, April 20 at 11:00. Please make sure to participate!

  • Seminar: Wed, 11:00 – 13:00 in Mathematikon B (Berliner Str. 43), SR B128
    Ring the doorbell labelled "HCI am IWR" to be let in. The seminar room is on the 3rd floor.
  • Credits: 2 / 4 CP depending on course of study, see LSF

Seminar Repository:

The seminar repository is hosted on HeiBox here. The password will be sent via Müsli.

Papers to choose from:

Monocular depth estimation approaches:
[1] Clément Godard, Oisin Mac Aodha, Michael Firman, and Gabriel J. Brostow. Digging into self-supervised monocular depth estimation.
[2] Clément Godard, Oisin Mac Aodha, and Gabriel J. Brostow. Unsupervised monocular depth estimation with left-right consistency.

Learning to estimate disparity on absolute scale by exploiting semantic and geometric cues indirectly:
[3] Tom van Dijk and Guido de Croon. How do neural networks see depth in single images?

State-of-the-art Optical Flow network:
[4] Deqing Sun, Xiaodong Yang, Ming-Yu Liu, and Jan Kautz. PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume.

Combination of [1-4] + original contribution = Monocular Scene Flow:
[5] J. Hur and S. Roth. Self-supervised monocular scene flow estimation. In CVPR, 2020.

Contrastive formulation for learning representations:
[6] M. Gutmann and A. Hyvärinen. Noise-contrastive estimation: A new estimation principle for unnormalized statistical models. In AISTATS, volume 9, pages 297–304, 2010

Self-training loss:
[7] D.-H. Lee. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In ICML Workshops, volume 3, 2013
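The core idea of [7] is simple enough to sketch in a few lines: the model's own predictions on unlabeled data are turned into training targets ("pseudo-labels"), optionally keeping only confident ones. The following is an illustrative sketch, not code from the paper; the confidence threshold is a common refinement rather than part of the original formulation.

```python
import numpy as np

def pseudo_labels(probs, threshold=0.95):
    """Convert class probabilities predicted on unlabeled samples into
    training targets. Returns (labels, mask): the argmax class per sample
    and a mask selecting only confident predictions; masked-out samples
    are excluded from the self-training loss."""
    probs = np.asarray(probs)
    labels = probs.argmax(axis=1)
    mask = probs.max(axis=1) >= threshold
    return labels, mask
```

The selected pseudo-labeled samples are then mixed into training as if they were ground truth, which is why the method counts as semi-supervised.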

Equipping view-invariant objectives with temporally persistent constraints:
[8] C. Feichtenhofer, H. Fan, B. Xiong, R. B. Girshick, and K. He. A large-scale study on unsupervised spatiotemporal representation learning. In CVPR, pages 3299–3309, 2021.

Learning dense feature representations by exploiting the equivariance constraint:
[9] P. O. Pinheiro, A. Almahairi, R. Y. Benmalek, F. Golemo, and A. C. Courville. Unsupervised learning of dense visual representations. In NeurIPS*2020, pages 4489–4500.

Temporal coherence for visual tracking:
[10] N. Wang, W. Zhou, and H. Li. Contrastive transformation for self-supervised correspondence learning. In AAAI, pages 10174–10182, 2021.

Contrastive learning of visual representations:
[11] T. Chen, S. Kornblith, M. Norouzi, and G. E. Hinton. A simple framework for contrastive learning of visual representations. In ICML, pages 1597–1607, 2020.
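The loss at the heart of [11] is the NT-Xent (normalized temperature-scaled cross-entropy): for each embedding in a batch of 2N augmented views, its augmented counterpart is the positive and all other 2N−2 embeddings serve as negatives. A minimal NumPy sketch of that loss, written for clarity rather than efficiency:

```python
import numpy as np

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent loss over two batches of embeddings, where z1[i] and z2[i]
    are two augmented views of the same image. tau is the temperature."""
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarities
    sim = z @ z.T / tau
    n = len(z1)
    np.fill_diagonal(sim, -np.inf)                     # exclude self-pairs
    # index of each embedding's positive counterpart
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    log_prob = sim[np.arange(2 * n), pos] - np.log(np.exp(sim).sum(axis=1))
    return -log_prob.mean()
```

Minimizing this loss pulls the two views of each image together and pushes all other images in the batch apart, which is the sense in which the learned representation is "contrastive".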

Baseline for unsupervised video segmentation, including a description of the degenerate solutions that arise when learning feature representations in a fully convolutional manner:
[12] A. Jabri, A. Owens, and A. A. Efros. Space-time correspondence as a contrastive random walk. In NeurIPS*2020, pages 19545–19560.

Baseline for unsupervised video segmentation:
[13] Z. Lai, E. Lu, and W. Xie. MAST: A memory-augmented self-supervised tracker. In CVPR, pages 6478–6487, 2020.

State-of-the-art technique based on [6-13]:
[14] N. Araslanov, S. Schaub-Meyer, and S. Roth, “Dense unsupervised learning for video segmentation,” in Advances in Neural Information Processing Systems (NeurIPS), vol. 34, 2021

Optional papers:

[15] N. Araslanov and S. Roth, “Self-supervised augmentation consistency for adapting semantic segmentation,” in Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual, June 2021, pp. 15384–15394
[16] J. Hur and S. Roth, “Self-supervised multi-frame monocular scene flow,” in Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual, June 2021, pp. 2684–2694.
[17] F. Saeedan and S. Roth, “Boosting monocular depth with panoptic segmentation maps,” in Proc. of the IEEE Winter Conference on Applications in Computer Vision (WACV), Waikoloa Beach, HI, Jan. 2021.