Voting by Grouping Dependent Parts

The complexity of multi-scale, category-level object detection in cluttered scenes is handled efficiently by Hough voting methods. The primary weakness of this approach, however, is that mutually dependent local observations vote independently for intrinsically global object properties such as object scale. The implicit assumption is that an object is merely the sum of its parts, which contradicts the fundamental conviction of Gestalt theory that the whole is more than the sum of its parts.
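
To make the independence assumption concrete, here is a minimal sketch of standard Hough voting in which every matched feature casts its own vote for object centre and scale. It is an illustration only, not the implementation used in our experiments: the function names, the codebook layout (a stored offset to the object centre and a scale per training feature), and the bin sizes are all assumptions.

```python
import numpy as np

def hough_votes(query_xy, query_scale, offsets, train_scale):
    """Cast one independent vote (x, y, scale) per matched feature.

    offsets[i] is the (hypothetical) stored displacement from the matched
    training feature to the object centre, at training scale.
    """
    query_xy = np.asarray(query_xy, dtype=float)
    query_scale = np.asarray(query_scale, dtype=float)
    offsets = np.asarray(offsets, dtype=float)
    train_scale = np.asarray(train_scale, dtype=float)
    # Each feature estimates the global object scale on its own.
    s = query_scale / train_scale
    # The stored offset is rescaled by that purely local scale estimate.
    centers = query_xy + s[:, None] * offsets
    return np.column_stack([centers, s])

def accumulate(votes, bin_xy=10.0, bin_log_scale=0.25):
    """Bin votes in (x, y, log2 scale); return the strongest cell and its count."""
    cells = np.column_stack([
        np.floor(votes[:, 0] / bin_xy),
        np.floor(votes[:, 1] / bin_xy),
        np.floor(np.log2(votes[:, 2]) / bin_log_scale),
    ]).astype(int)
    uniq, counts = np.unique(cells, axis=0, return_counts=True)
    best = int(np.argmax(counts))
    return tuple(int(v) for v in uniq[best]), int(counts[best])

# Toy data: three consistent features and one unrelated match.
votes = hough_votes(
    query_xy=[[100, 100], [120, 100], [100, 140], [300, 300]],
    query_scale=[2, 2, 2, 1],
    offsets=[[5, 5], [-5, 5], [5, -15], [0, 0]],
    train_scale=[1, 1, 1, 1],
)
best_cell, support = accumulate(votes)
```

Because each vote depends on a single feature's scale estimate, noise in any one local measurement shifts that vote independently of its neighbours, which is the source of the scatter discussed on this page.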

We address this problem by incorporating local feature dependencies into the voting framework through an objective function that combines three intimately related problems: i) grouping mutually dependent parts, ii) jointly solving the correspondence problem (matching parts of the query image to model parts of the training images) for all dependent parts, and iii) letting groups of dependent parts vote for concerted object hypotheses.

The figure below depicts the processing pipeline of our approach. Given a novel image, we compute its probabilistic edge map and uniformly sample the edges to obtain local features. We then jointly optimize the grouping of query features, the correspondences between query features and training features, and the transformation matrix that maps a group of query features to training features.
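
As a rough illustration of one ingredient of this optimization, the sketch below fits a single transformation to a whole group of corresponding features and lets the group cast one joint vote. It covers only the transformation step for a fixed grouping and fixed correspondences, not the joint optimization itself. The affine model, the least-squares fit, and the names `fit_group_transform` and `group_vote` are assumptions made for illustration. For convenience the fit runs from training to query coordinates, i.e. the inverse direction of the mapping described above, so that the training object centre can be projected directly into the query image.

```python
import numpy as np

def fit_group_transform(train_xy, query_xy):
    """Least-squares affine map T (2x3) such that query ≈ T @ [train; 1].

    Needs at least three non-collinear correspondences in the group.
    Returns T and the RMS residual, a simple measure of how well the
    group agrees on one common transformation.
    """
    train_xy = np.asarray(train_xy, dtype=float)
    query_xy = np.asarray(query_xy, dtype=float)
    # Homogeneous training coordinates, one row per feature (n x 3).
    A = np.hstack([train_xy, np.ones((len(train_xy), 1))])
    sol, *_ = np.linalg.lstsq(A, query_xy, rcond=None)  # 3 x 2
    errors = np.sum((A @ sol - query_xy) ** 2, axis=1)
    return sol.T, float(np.sqrt(np.mean(errors)))

def group_vote(T, train_center):
    """One concerted vote: project the training object centre into the query."""
    center = T @ np.append(np.asarray(train_center, dtype=float), 1.0)
    # Isotropic scale implied by the linear part of the affine map.
    scale = float(np.sqrt(abs(np.linalg.det(T[:, :2]))))
    return center, scale

# Toy group: the query features are the training features scaled by 2
# and shifted by (50, 30).
train = np.array([[0, 0], [10, 0], [0, 10], [10, 10]], dtype=float)
query = 2.0 * train + np.array([50.0, 30.0])
T, rms = fit_group_transform(train, query)
center, scale = group_vote(T, train_center=[5, 5])
```

Since all features of the group share one transformation, the object scale is estimated from the group as a whole rather than from each feature separately, and a large residual signals that the features should not have been grouped or matched this way.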


The left panel in the figure below shows the votes obtained by the standard Hough voting procedure. The mutual independence assumption leads to significant scatter in the votes, whereas our approach yields concerted Hough votes and thereby a strong object hypothesis. The Hough votes obtained by our approach are shown in the right panel. Details about the experimental results can be found in our ECCV publication.


See publication section.