Speaker: Victor Adrian Prisacariu
Friday, August 31st 2012, 11 am, in BC 329
Victor Adrian Prisacariu is a postdoctoral research assistant with the Active Vision Lab, Robotics Group, Department of Engineering Science, University of Oxford, having finished his PhD at the same lab in 2012. His current research interests include developing methods for segmentation, tracking, and shape knowledge and analysis, and using them in applications ranging from driver assistance systems to body tracking for human-computer interaction and 3D ultrasound segmentation of the heart.
Shape Knowledge for Segmentation and Tracking
In this talk I will detail methods for using high level shape information in simultaneous segmentation and tracking.
I base my work on the assumption that the space of possible 2D object shapes can be either generated by projecting down known rigid 3D shapes or learned from 2D shape examples. I maximise the discrimination between statistical foreground and background appearance models with respect to the parameters governing the shape generative process (the 6 degree-of-freedom 3D pose of the 3D shape or the parameters of the learned space). The foreground region is delineated by the zero level set of a signed distance function, and I define an energy over this region and its immediate background surroundings based on pixel-wise posterior membership probabilities. I obtain the differentials of this energy with respect to the parameters governing shape and search for the correct shape using standard non-linear minimisation techniques.
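As a rough illustration of the kind of energy described above, the following Python sketch evaluates a pixel-wise posterior energy for a shape encoded by a signed distance function. It is a minimal sketch under stated assumptions, not the implementation from the talk: the sign convention (positive inside the foreground), the arctangent-smoothed Heaviside function, the exact form of the posterior normalisation, and the helper names `smoothed_heaviside`, `pwp_energy` and `edge_sdf` are all assumptions made for illustration. The "shape" here is a single vertical edge controlled by one hypothetical parameter `t`, standing in for the pose or latent-space parameters.

```python
import math


def smoothed_heaviside(phi, eps=1.0):
    """Smooth step: close to 1 where phi >> 0 (assumed inside), close to 0 where phi << 0."""
    return 0.5 + math.atan(phi / eps) / math.pi


def pwp_energy(phi, p_fg, p_bg, eps=1.0):
    """Negative log of per-pixel posterior membership, summed over all pixels.

    phi  : 2D list, signed distance function (assumed positive inside the shape)
    p_fg : 2D list, per-pixel likelihood under the foreground appearance model
    p_bg : 2D list, per-pixel likelihood under the background appearance model
    """
    h = [[smoothed_heaviside(v, eps) for v in row] for row in phi]
    n_pixels = sum(len(row) for row in phi)
    eta_f = sum(sum(row) for row in h)   # soft foreground area
    eta_b = n_pixels - eta_f             # soft background area
    energy = 0.0
    for h_row, f_row, b_row in zip(h, p_fg, p_bg):
        for hv, f, b in zip(h_row, f_row, b_row):
            denom = eta_f * f + eta_b * b
            post_f = f / denom           # assumed foreground posterior term
            post_b = b / denom           # assumed background posterior term
            energy -= math.log(hv * post_f + (1.0 - hv) * post_b)
    return energy


def edge_sdf(rows, cols, t):
    """Hypothetical one-parameter shape: foreground is everything left of x = t."""
    return [[t - (c + 0.5) for c in range(cols)] for _ in range(rows)]


# Toy 4x4 image: the two left columns look like foreground, the two right like background.
P_FG = [[0.9, 0.9, 0.1, 0.1] for _ in range(4)]
P_BG = [[0.1, 0.1, 0.9, 0.9] for _ in range(4)]


def best_edge(candidates):
    """Exhaustive search over candidate edge positions (a stand-in for gradient-based minimisation)."""
    return min(candidates, key=lambda t: pwp_energy(edge_sdf(4, 4, t), P_FG, P_BG))
```

Calling `best_edge([0.0, 1.0, 2.0, 3.0, 4.0])` compares the energy at a handful of edge positions. The exhaustive search is used only to keep the example short; the approach described in the abstract differentiates the energy with respect to the shape parameters and applies non-linear minimisation instead.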
This methodology first leads to a novel rigid 3D object tracker. For a known 3D shape, the optimisation aims to find the 3D pose whose 2D projection best segments a given image. I also extend my approach to track multiple objects from multiple views and propose novel enhancements at the pixel level based on temporal consistency. Finally, owing to the per-pixel nature of much of the algorithm, I support the theoretical approach with a real-time GPU-based implementation.
Next, I explore deformable 2D/3D object tracking. Unlike previous work, I use a non-linear, probabilistic dimensionality reduction technique, the Gaussian Process Latent Variable Model (GP-LVM), to learn spaces of shape. Segmentation then becomes a minimisation of an image-driven energy function in the learned space. I can represent both 2D and 3D shapes, which I compress with Fourier-based transforms to keep inference tractable. I extend this method by learning joint shape-parameter spaces which, new to the literature, enable simultaneous segmentation and generic parameter recovery. These parameters can describe anything from 3D articulated pose to eye gaze. I also propose two novel extensions to the standard GP-LVM: a method for exploring the multimodality in the joint space efficiently, by learning a mapping from the latent space to a space that encodes the similarity between shapes; and a method for obtaining faster convergence and greater accuracy by using a hierarchy of latent embeddings.