H. Abu Alhaija, A. Sellent, D. Kondermann, C. Rother. “GraphFlow – 6D Large Displacement Scene Flow via Graph Matching”, German Conference on Pattern Recognition (GCPR, a.k.a. DAGM), 2015. [pdf]
We present an approach for computing dense scene flow from two large displacement RGB-D images. When dealing with large displacements the crucial step is to estimate the overall motion correctly. While state-of-the-art approaches focus on RGB information to establish guiding correspondences, we explore the power of depth edges. To achieve this, we present a new graph matching technique that brings sparse depth edges into correspondence. An additional contribution is the formulation of a continuous-label energy which is used to densify the sparse graph matching output. We present results on challenging Kinect images, for which we outperform state-of-the-art techniques.
In our two stage scene flow approach, instead of using sparse texture matches only, we additionally utilize depth edges extracted from the RGB-D images that describe object boundaries well.However, in the presence of large motion, they are actually not trivially described and matched.While exact edge description suffers from occlusion and distortion effects, more robust edge descriptors often lead to ambiguous matches. To disambiguate edge matches with robust descriptors, we use a structured matching approach in the form of graph matching that profits from non-local information to assign edge matches.
Fig. 1: Details of graph matching. (a) Our edge description segments are represented by their center point, average appearance descriptor that describe the foreground region and normalized depth gradient vector. In order to compute a description segment we accumulate neighboring edge pixels whose descriptor variance is lower than a threshold t and whose count is between rmin = 20 and rmax = 30 pixels. (b) For graph matching, the description segments centers are connected to form a graph. In particular, each description segment center is connected to its N = 3 nearest neighbors with respect to the geodesic distance of the depth map to avoid connections across large depth changes. (c) Illustration of the geometry term used for graph matching.
Given ground truth matches, we can assign three class labels to each description segment matched by the graph matching algorithm:
a) correct match; b) almost correct match and c) wrong match. We compare our graph matching with conventional nearest neighbor (SIFT) matching and Hungarian matching of the description segments that admits at most one-to-one matching between description segments.
Figure 02: Visual comparison of graph matching results. For illustration pur- pose both RGB images are super-imposed. Green means a correct match (or occlusion), blue is an almost correct match (denition in text) and red is a wrong match. Our result is clearly superior to the other techniques.
Figure 03 : results of the Scene Flow: (a) Original image pairs (b) using SphereFlow [Hornacek et at] (c) using SphereFlow initialized with our Graph Matching (d) Our results using GraphFlow.
The RGBD sequences can be downloaded here