| We present a method for recovering
three-dimensional (3D) human body motion from monocular video sequences
based on a robust image matching metric, joint limits and
non-self-intersection constraints, and a new sample-and-refine search strategy
guided by rescaled cost-function covariances. Monocular 3D body tracking
is challenging: besides the difficulty of matching an imperfect, highly
flexible, self-occluding model to cluttered image features, realistic body
models have at least 30 joint parameters subject to highly nonlinear physical
constraints, and at least a third of these degrees of freedom are nearly
unobservable in any given monocular image. For image matching we use a carefully
designed robust cost metric combining robust optical flow, edge energy,
and motion boundaries. The nonlinearities and matching ambiguities make
the parameter-space cost surface multimodal, ill-conditioned and highly
nonlinear, so searching it is difficult. We discuss the limitations of CONDENSATION-like
samplers, and describe a novel hybrid search algorithm that combines inflated-covariance-scaled
sampling and robust continuous optimization subject to physical constraints
and model priors. Our experiments on challenging monocular sequences show
that robust cost modeling, joint and self-intersection constraints, and
informed sampling are all essential for reliable monocular 3D motion estimation. |
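
The sample-and-refine idea summarized above can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the paper: a two-parameter toy cost stands in for the image-based cost, box bounds stand in for joint limits, and a simple projected pattern search stands in for the constrained continuous optimizer. All function names (`sample_scaled`, `refine`, `toy_cost`, `find_modes`) are hypothetical. Samples are drawn from a Gaussian whose covariance is the inverse Hessian at a known minimum, inflated by a scale factor, so they spread furthest along the poorly constrained direction; each sample is then refined to a nearby local minimum.

```python
import math
import random


def cholesky(a):
    """Lower-triangular Cholesky factor of a small symmetric positive-definite matrix."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(s) if i == j else s / l[j][j]
    return l


def sample_scaled(mean, cov, scale, rng):
    """Draw one sample from N(mean, scale^2 * cov)."""
    l = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    return [m + scale * sum(l[i][k] * z[k] for k in range(i + 1))
            for i, m in enumerate(mean)]


def refine(cost, x, lower, upper, step=0.5, tol=1e-6, max_iters=10000):
    """Projected pattern search inside box bounds (assumed stand-in for a
    constrained continuous optimizer)."""
    x = [min(max(v, lo), hi) for v, lo, hi in zip(x, lower, upper)]
    best = cost(x)
    for _ in range(max_iters):
        if step <= tol:
            break
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                cand = list(x)
                # Clamp the trial point so it never leaves the feasible box.
                cand[i] = min(max(x[i] + delta, lower[i]), upper[i])
                c = cost(cand)
                if c < best:
                    x, best, improved = cand, c, True
        if not improved:
            step *= 0.5  # shrink the pattern when no move helps
    return x


def toy_cost(p):
    """Toy multimodal, ill-conditioned cost: minima at x = +1 and x = -1,
    with a very weakly constrained second parameter y."""
    x, y = p
    return (x * x - 1.0) ** 2 + 0.01 * y * y


def find_modes(seed=0, n_samples=50, scale=4.0):
    rng = random.Random(seed)
    mean = [1.0, 0.0]                       # a known local minimum
    cov = [[1.0 / 8.0, 0.0], [0.0, 50.0]]   # inverse Hessian of toy_cost at that minimum
    lower, upper = [-2.0, -3.0], [2.0, 3.0]  # box bounds (assumed "joint limits")
    modes = set()
    for _ in range(n_samples):
        s = sample_scaled(mean, cov, scale, rng)
        r = refine(toy_cost, s, lower, upper)
        # Adding 0.0 turns a rounded -0.0 into 0.0 for cleaner output.
        modes.add((round(r[0], 2) + 0.0, round(r[1], 2) + 0.0))
    return sorted(modes)
```

Calling `find_modes()` starts from the minimum at `x = 1` and also reaches the second minimum at `x = -1`, because the inflated covariance pushes some samples across the barrier between the two basins before refinement. With `scale=1.0` far fewer samples would cross, which is the motivation for inflating the covariance in this sketch.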