Robust Vision Challenge

in Association with the 2012 ECCV Workshop on Unsolved Problems in Optical Flow and Stereo Estimation

Video cameras provide information about a scene at high spatio-temporal resolution, and at low cost in acquisition, space, and energy. To extract depth and motion information from a video, computer vision algorithms make strong assumptions about the scene. The algorithms are thus easily distracted by phenomena that violate these assumptions, such as reflecting or transparent surfaces, lens flares, and changing illumination.
We recorded multiple real-world scenes that contain instances of challenging phenomena as they occur in everyday traffic scenarios. To be applicable in industry, image-based depth and motion estimation needs to deal robustly and reliably with these scenes.
We pose the estimation of depth and motion on our recorded sequences as a challenge to the scientific community. Can you develop a stereo or optical flow algorithm that can deal with these sequences?
For the evaluation of participating algorithms we bring together a jury of renowned experts on the application of stereo and motion estimation. In the absence of ground-truth information, the jury will thoroughly inspect and evaluate the submitted correspondences and their applicability in industry. A prize for the best-performing algorithm is awarded by Bosch.

The Winner

The winners of the challenge were Simon Hermann and Reinhard Klette with an SGM variant.
Please find details in our ECCV Workshop Winner Announcement!

The Task

Estimate robust and reliable depth or motion fields on our challenging real-world videos!
Download image sequences
The sequences contain many examples of regularly occurring situations that cause problems for most methods known today. We are looking for algorithms that can produce reliable depth or motion estimates for all images, including the indicated keyframes. Participation is open to all ideas that improve the state of the art in automatic motion and depth estimation on the given input videos!
This could involve (but is not limited to):
  • Previously unknown methods of correspondence estimation
  • Correspondence estimation making use of confidence measures
  • Correspondence estimation with a detection of unreliable input images or unreliable image regions
  • ...
In fact, we look forward to any creative, visionary way to estimate motion or depth from our video sequences!
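One common realisation of the "detect unreliable regions" idea above is a forward-backward consistency check: estimate flow in both temporal directions and flag pixels where the two disagree. The following minimal NumPy sketch is our own illustration (the function name and threshold are assumptions, not part of the challenge):

```python
import numpy as np

def fb_consistency_mask(flow_fwd, flow_bwd, thresh=1.0):
    """Return a boolean (H, W) mask, True where forward and backward
    flow agree (i.e. the correspondence is likely reliable)."""
    h, w = flow_fwd.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # Positions each pixel maps to under the forward flow
    # (nearest-neighbour sampling, clipped to the image border).
    xt = np.clip(np.round(xs + flow_fwd[..., 0]).astype(int), 0, w - 1)
    yt = np.clip(np.round(ys + flow_fwd[..., 1]).astype(int), 0, h - 1)
    # Backward flow sampled at the forward-warped positions; for a
    # consistent correspondence, fwd + bwd(x + fwd) is close to zero.
    err = np.linalg.norm(flow_fwd + flow_bwd[yt, xt], axis=-1)
    return err < thresh

# Toy example: a uniform 2 px rightward flow with a matching backward flow.
fwd = np.full((4, 6, 2), (2.0, 0.0))
bwd = np.full((4, 6, 2), (-2.0, 0.0))
mask = fb_consistency_mask(fwd, bwd)
print(mask.all())  # prints True: every pixel is consistent
```

Pixels failing such a check could be reported as "unreliable" rather than filled with a guess, which is exactly the kind of self-assessment the challenge encourages.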


All authors of algorithms that estimate stereo correspondences, motion correspondences, or both from the input images, and that are accepted for publication in any journal or conference proceedings, can participate.

Due to heavy server load, we extended the deadline to September 24, 2012.
You can find detailed submission instructions here. (The deadline has passed!)

The evaluation of this challenge is part of the "2012 ECCV Workshop on Unsolved Problems in Optical Flow and Stereo Estimation" taking place October 12th, 2012 in Florence, Italy.
For each submission, at least one team member should take part in a discussion during the workshop.
Note that participation in the challenge is independent of the publication of any articles within the workshop. However, we encourage the participants to also publish or revise their ideas and methods within this framework.
If you have any questions concerning your participation, please contact us at hcibenchmark -at-


The input images for the challenge are real-world videos recorded in an uncontrolled environment. The actual depths and object motions visible in the scenes are unknown. For the evaluation we rely on the capability of human observers to judge whether a correspondence is roughly correct. Although this does not allow for a quantitative evaluation, we believe the sequences are challenging enough that a qualitative evaluation is sufficient.
The human jury that evaluates the motion and stereo results consists of eight experts in the use of image correspondences in various applications. Each jury member will look at the provided visualizations and may additionally use his or her own tools to evaluate the submitted material. We have documented some of the expectations on motion fields as well as the tools that will be used to visualize correspondence fields. Among the sufficiently stable and reliable correspondence algorithms, the jury will award the most applicable one a prize of 3000€, kindly provided by Bosch.
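To give a feel for how correspondence fields are typically inspected, a widespread visualization maps flow direction to hue and flow magnitude to saturation. The sketch below is our own illustration of that scheme, not the challenge's official toolbox:

```python
import numpy as np

def flow_to_color(flow, max_mag=None):
    """Map an (H, W, 2) flow field to RGB: hue encodes direction,
    saturation encodes magnitude (brightness is kept at 1)."""
    dx, dy = flow[..., 0], flow[..., 1]
    mag = np.hypot(dx, dy)
    if max_mag is None:
        max_mag = mag.max() or 1.0      # avoid division by zero
    hue = (np.arctan2(dy, dx) / (2 * np.pi)) % 1.0   # direction -> [0, 1)
    sat = np.clip(mag / max_mag, 0.0, 1.0)
    val = np.ones_like(sat)
    # Vectorised HSV -> RGB conversion.
    i = (hue * 6).astype(int)
    f = hue * 6 - i
    p, q, t = val * (1 - sat), val * (1 - f * sat), val * (1 - (1 - f) * sat)
    conds = [i == k for k in range(6)]
    r = np.select(conds, [val, q, p, p, t, val])
    g = np.select(conds, [t, val, val, q, p, p])
    b = np.select(conds, [p, p, t, val, val, q])
    return np.stack([r, g, b], axis=-1)

demo = np.zeros((2, 3, 2))
demo[..., 0] = 1.0                  # uniform rightward flow
print(flow_to_color(demo)[0, 0])    # rightward motion comes out red
```

Such color-coded images make gross errors (e.g. noisy patches on reflecting surfaces) immediately visible to a human observer, which is why they are well suited to the qualitative evaluation described above.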


Simon Baker, Ph.D.
(Principal Researcher, Microsoft Research)

Simon is a researcher in the Interactive Visual Media Group at Microsoft Research Redmond. Before joining MSR in 2006, he was an Associate Research Professor in the Robotics Institute at Carnegie Mellon University. He obtained a Ph.D. in the Department of Computer Science at Columbia University in 1998, an M.A. in Mathematics from Trinity College, Cambridge in 1995, an M.Sc. in Computer Science from the University of Edinburgh in 1992 and a B.A. in Mathematics from Trinity College, Cambridge in 1991.

Goksel Dedeoglu, Ph.D.
(Vision R&D Manager, Texas Instruments)

Goksel leads the Embedded Vision team at the Systems and Applications R&D Center at Texas Instruments. He routinely designs and optimizes embedded software for real-world application performance and robustness, tackling challenges in automotive safety, video security, and gesture recognition. These solutions are being deployed worldwide in vehicle infotainment systems and consumer robots. Goksel holds a Ph.D. in Robotics from Carnegie Mellon University where he specialized in Computer Vision. He is also a graduate of the University of Southern California and Istanbul Technical University.

Jan Effertz, Dr.
(Head of Subdepartment Sensors&Fusion, Volkswagen Research Driver Assistance and Integrated Safety)

Dr. Jan Effertz has been working in the field of driver assistance and autonomous driving for the past 10 years. He has a background in Electrical Engineering and Control Engineering. During his PhD program at the Technical University Braunschweig he participated in the 2007 DARPA Urban Challenge, serving as technical lead with responsibility for system architecture and environment perception. He received his doctorate in 2009 for work on environment perception and sensor data fusion for autonomous driving. Dr. Effertz is now responsible for the sensor systems and data fusion activities at Volkswagen Research.

Oliver Erdler, Dipl.Ing. and MBA
(Senior Manager of Video Signal Processing Group, Sony)

Oliver Erdler studied Electrical Engineering at the Universities of Ulm and Dortmund and received his Diplom-Ingenieur degree in 1998. In 2002 he became Assignment Project Leader in algorithm development for motion-compensated format conversion at Sony, and in 2009 he became Manager of the Video Signal Processing Group. During his time at Sony he also graduated as a Master of Business Administration from the Open University, London, in 2011. He has held his current position since April 2011.

Wolfgang Niehsen, Dr.
(Chief Expert Computer Vision Systems, Corporate Research, Robert Bosch GmbH)

Wolfgang Niehsen is head of the Competence Center Computer Vision Systems (CCV) at Robert Bosch Corporate Research, Hildesheim, Germany. His main research fields are computer vision algorithms, statistical signal processing and tracking, embedded signal processing, and driver assistance systems. He is a senior member of the Institute of Electrical and Electronics Engineers (IEEE), where he serves as an associate editor of the IEEE Intelligent Vehicles Symposium. He is a member of the American Institute of Aeronautics and Astronautics (AIAA) and of the German Information Technology Society (ITG/VDE).

Phil Parsonage, M.Sc.
(Lead Engineer, The Foundry)

Phil Parsonage is Lead Engineer at The Foundry, guiding development and QA across their engineering teams. After graduating from the University of Oxford, where he won the Maurice Lubbock prize, Phil joined NaturalMotion to work on their animation product, Endorphin. Moving to The Foundry in 2004, he worked on the AMPAS Sci-Tech Award®-winning technology in the Furnace toolset. This coincided with the launch of the open-source OpenFX plug-in API, which he evangelised, developed, and supported. After a stint in 2007 as a Marie Curie Research Fellow at Trinity College Dublin as part of the AXIOM project, he returned to The Foundry in his present role.

Stephan Simon, Dr.
(Senior Expert, Computer Vision, Corporate Research, Robert Bosch GmbH)

Stephan Simon is a Senior Expert at the Competence Center Computer Vision Systems (CCV) at Robert Bosch Corporate Research, Hildesheim, Germany. His main research interests are optical flow, object tracking, and stereo vision. At Bosch he is responsible for the development of optical flow algorithms for several applications such as driver assistance, robotics, and surveillance. The main focuses of his work are the robustness of computer vision algorithms under all environmental conditions and their tailoring to the low-power constraints of embedded systems in future computer vision products.

Christian Unger, M.Sc.
(Camera-based Driver Assistance, BMW Group)

Christian Unger received the Dipl. Inf. (FH) degree in computer science from the Munich University of Applied Sciences in 2005 and the M.Sc. degree in computer science from the Technische Universität München, Germany, in 2007. He then worked towards the PhD degree in computer science at the BMW Group, Munich, Germany. In his current function at the BMW Group he focuses on advanced driver assistance systems using monocular image processing, stereo, and motion-stereo. His research interests include stereo, multi-view reconstruction, image-based rendering, and driver assistance.

Further Reading

This challenge is part of a larger project that aims, on the one hand, to assess the quality of image-based depth and motion estimation and, on the other hand, to improve its reliability and robustness for industrial applications. Please follow these links to learn more:
  • The website containing the data of the contest: [URL]
  • The recording and processing of the images for this contest: [URL]
  • A detailed description of how the image-based estimation of depth and motion differs from the estimation of optical flow: [URL]
  • A toolbox to visualize image correspondences: [URL]
  • Single slide presentation template: [URL]
  • Paper: "On Performance Analysis of Optical Flow Algorithms": [PDF]
    (to appear in: "Outdoor Large-Scale Real-World Scene Analysis", Springer LNCS, bibtex entry to be created.)


Daniel Kondermann (HCI)
Anita Sellent (Bosch)
Bernd Jähne (HCI)
Jochen Wingbermühle (Bosch)