Learning Where to Drive by Watching Others

Bautista, Miguel; Fuchs, Patrick; Ommer, Björn
GCPR 2017: 39th German Conference on Pattern Recognition, 2017


The most prominent approach for autonomous cars to learn which areas of a scene are drivable relies on tedious human supervision in the form of pixel-wise image labeling for training deep semantic segmentation algorithms. However, the underlying CNNs require vast amounts of this training information, which makes the expensive pixel-wise labeling of images a bottleneck. We therefore propose a self-supervised approach that uses the many readily available dashcam videos from YouTube or from autonomous vehicles to perform fully automatic training by simply watching others drive. We play training videos backwards in time and track the patches that cars have driven over, together with their spatio-temporal interrelations, which are a rich source of context information. Collecting large numbers of these local regions enables fully automatic self-supervision for training a CNN. The proposed method has the potential to extend and complement the popular supervised CNN learning of drivable pixels by using a rich, presently untapped source of unlabeled training data.
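The idea of harvesting drivable regions from other cars can be illustrated with a small sketch. It is not the authors' implementation: it assumes car bounding boxes are already available for each frame (a hypothetical input), treats the bottom strip of each box as road the car is touching, and assumes a static camera so that patches can be accumulated without the patch tracking the abstract describes. All function names and the `patch_height` parameter are made up for illustration.

```python
from typing import List, Tuple

# (x0, y0, x1, y1) in pixel coordinates, y grows downward, x1/y1 exclusive.
Box = Tuple[int, int, int, int]


def ground_patch(box: Box, patch_height: int) -> Box:
    """Return the strip at the bottom of a car box, taken as road under the car."""
    x0, y0, x1, y1 = box
    top = max(y0, y1 - patch_height)  # never extend above the box itself
    return (x0, top, x1, y1)


def collect_patches(detections: List[List[Box]], patch_height: int = 2):
    """Walk the frames backwards in time and gather (frame_index, patch) pairs."""
    patches = []
    for t in range(len(detections) - 1, -1, -1):
        for box in detections[t]:
            patches.append((t, ground_patch(box, patch_height)))
    return patches


def accumulate_mask(patches, width: int, height: int):
    """Mark every collected patch as drivable (1) in one binary mask.

    Assumption: static camera. A moving dashcam would require tracking
    each patch across frames, which this sketch does not attempt.
    """
    mask = [[0] * width for _ in range(height)]
    for _, (x0, y0, x1, y1) in patches:
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                mask[y][x] = 1
    return mask


# Two frames with one hypothetical car detection each.
detections = [[(1, 0, 3, 4)], [(4, 2, 6, 6)]]
patches = collect_patches(detections, patch_height=2)
mask = accumulate_mask(patches, width=8, height=6)
```

A mask built this way could serve as automatically generated positive labels for a segmentation network; how negatives are chosen and how patches are tracked under ego-motion are left open here.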

Supplementary Material

Video Examples


Zero-shot Prediction

Difficult Examples

Article (PDF, 3.74 MB)
Supplementary Material (PDF, 3.36 MB)