| Title | Cross and Learn: Cross-Modal Self-Supervision |
| Publication Type | Conference Paper |
| Year of Publication | 2018 |
| Authors | Sayed, N, Brattoli, B, Ommer, B |
| Conference Name | German Conference on Pattern Recognition (GCPR) (Oral) |
| Conference Location | Stuttgart, Germany |
| Keywords | action recognition, cross-modal, image understanding, unsupervised learning |
| Abstract | In this paper, we present a self-supervised method to learn feature representations for different modalities. Based on the observation that cross-modal information has a high semantic meaning, we propose a method to effectively exploit this signal. For our method, we utilize video data, since it is available on a large scale and provides easily accessible modalities given by RGB and optical flow. We demonstrate state-of-the-art performance on highly contested action recognition datasets in the context of self-supervised learning. We also show the transferability of our feature representations and conduct extensive ablation studies to validate our core contributions. |
| URL | https://arxiv.org/abs/1811.03879v1 |
| Citation Key | sayed:GCPR:2018 |



