Joint Spatio-Temporal Alignment of Sequences

Abstract

In this chapter, we propose a joint spatio-temporal video alignment method that handles the most general form of the problem and includes the 'classic' case of a fixed geometric transformation and/or a linear temporal mapping as a particular case. Specifically, video alignment is formulated as a single inference problem over a large number of parameters, namely a non-parametric temporal correspondence and a non-fixed geometric transformation, instead of tackling temporal and spatial alignment independently. The solution simultaneously provides a frame correspondence (synchronization) and a frame alignment (image registration) along the whole sequence. This joint similarity discriminates accurately among successive frames that share much of their content, thus reducing spatial and temporal misalignments. The reduction is reinforced by exploiting the similarities between neighboring frames: we integrate the estimation of the spatio-temporal parameters into a standard pairwise Markov random field (MRF), which restricts the frame correspondence and the spatial transformation of each frame according to its neighborhood.
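To make the role of the pairwise MRF more concrete, the following is a minimal sketch of MAP inference on a chain-structured MRF, restricted to the temporal labels only. It is an illustration under stated assumptions, not the chapter's actual model: the function name `map_chain_alignment`, the parameters `max_jump` and `smooth_weight`, and the particular pairwise cost are hypothetical, and the spatial transformation that the chapter estimates jointly is left out. Each observed frame is a node, its label is the index of a reference frame, the unary cost measures frame dissimilarity, and the pairwise cost ties the labels of neighboring frames together.

```python
import numpy as np


def map_chain_alignment(unary, max_jump=2, smooth_weight=1.0):
    """MAP labeling of a chain MRF by min-sum dynamic programming.

    unary[t, x] is the (assumed) dissimilarity between observed frame t
    and reference frame x. The illustrative pairwise cost allows a label
    step of 0..max_jump between neighboring frames and penalizes
    deviations from a step of exactly 1; other steps are forbidden.
    """
    n_obs, n_ref = unary.shape
    prev = np.arange(n_ref)[:, None]
    cur = np.arange(n_ref)[None, :]
    step = cur - prev
    allowed = (step >= 0) & (step <= max_jump)
    pairwise = np.where(allowed, smooth_weight * np.abs(step - 1), np.inf)

    cost = unary[0].astype(float).copy()
    back = np.zeros((n_obs, n_ref), dtype=int)
    for t in range(1, n_obs):
        total = cost[:, None] + pairwise          # total[prev, cur]
        back[t] = np.argmin(total, axis=0)        # best predecessor per label
        cost = total[back[t], np.arange(n_ref)] + unary[t]

    # Backtrack from the cheapest final label.
    labels = np.empty(n_obs, dtype=int)
    labels[-1] = int(np.argmin(cost))
    for t in range(n_obs - 1, 0, -1):
        labels[t - 1] = back[t, labels[t]]
    return labels


# Toy usage: one scalar "descriptor" per frame (a stand-in for real features).
ref_feat = np.arange(10, dtype=float)
obs_feat = np.array([2.0, 3.0, 4.0, 5.0, 6.0])
unary_cost = (obs_feat[:, None] - ref_feat[None, :]) ** 2
labels = map_chain_alignment(unary_cost)
print(labels)
```

The pairwise term is what lets neighboring frames disambiguate one another: a frame whose appearance matches several reference frames equally well is pulled toward the label consistent with its neighbors. Extending each label to a pair (reference frame, spatial transformation) would follow the same pattern, at the price of a much larger label space.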

Sequences

To validate the proposed algorithm, experiments are conducted on video sequence pairs that cover all the alignment cases, as shown in the table below.

Cameras	Temporal correspondence: affine	Temporal correspondence: unknown
Static	Water [1]	Jump [2] and Dancing [4]
Moving	Indoor-2 [3]	Highway, Back-road, Campus and Indoor-1 [5]

Highway

Observed Sequence	Reference Sequence
Joint Video Alignment	Liu's method [6]

Campus

Observed Sequence	Reference Sequence
Joint Video Alignment	Liu's method [6]

Back-road

Observed Sequence	Reference Sequence
Joint Video Alignment	Liu's method [6]

Indoor-1

Observed Sequence	Reference Sequence
Joint Video Alignment	Sand and Teller [5]

Dancing

Observed Sequence	Reference Sequence
Joint Video Alignment

Jump

Observed Sequence	Reference Sequence
Joint Video Alignment

Indoor-2

Observed Sequence	Reference Sequence
Joint Video Alignment

Water

Observed Sequence	Reference Sequence
Joint Video Alignment

References

[1] Y. Caspi and M. Irani, "Spatio–temporal alignment of sequences," IEEE Trans. PAMI, vol. 24, no. 11, pp. 1409–1424, 2002. Available at: http://www.wisdom.weizmann.ac.il/~vision/VideoAnalysis/Demos/Seq2Seq/Seq2Seq.html

[2] L. Gorelick, M. Blank, E. Shechtman, M. Irani, and R. Basri, "Actions as space-time shapes," IEEE Trans. PAMI, vol. 29, no. 12, pp. 2247–2253, December 2007. Available at: www.wisdom.weizmann.ac.il/~vision/SpaceTimeActions.html

[3] P. Kelly, N. O'Connor, and A. Smeaton, "A framework for evaluating stereo-based pedestrian detection techniques," IEEE Transactions on Circuits and Systems for Video Technology, vol. 18, pp. 1163–1167, August 2008. Available at: http://www.cdvp.dcu.ie/datasets/pedestrian_detection/

[4] C. Rao, A. Gritai, M. Shah, et al., "View–invariant alignment and matching of video sequences," in Proc. IEEE International Conference on Computer Vision, 2003, pp. 939–945. Available at: http://server.cs.ucf.edu/~vision/projects/ViewInvariance/ViewInvariance.html

[5] P. Sand and S. Teller, "Video matching," ACM Transactions on Graphics (Proc. SIGGRAPH), vol. 23, no. 3, pp. 592–599, 2004. Available at: http://rvsn.csail.mit.edu/vid-match/

[6] C. Liu, J. Yuen, and A. Torralba, "SIFT flow: Dense correspondence across scenes and its applications," IEEE Trans. PAMI, vol. 99, 2010.
