Over the last few years, with the immense popularity of the Kinect, there has been renewed interest in developing methods for human gesture and action recognition from 3D skeletal data. A number of approaches have been proposed to extract representative features from 3D skeletal data, most commonly hard-wired geometric or bio-inspired shape-context features. We propose a hierarchical dynamic framework that first extracts high-level skeletal joint features and then uses the learned representation to estimate emission probabilities for inferring action sequences. Currently, Gaussian mixture models are the dominant technique for modeling the emission distribution of hidden Markov models. We show that better action recognition from skeletal features can be achieved by replacing Gaussian mixture models with deep neural networks that contain many layers of features and predict probability distributions over the states of the hidden Markov model. The framework can easily be extended with an ergodic state to segment and recognize actions simultaneously.
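The hybrid neural-network/HMM idea described in the abstract can be sketched as follows: a network outputs per-frame posteriors p(state | skeleton features), these are converted to scaled emission likelihoods by dividing by the state priors, and a Viterbi pass over the HMM transition structure recovers the action sequence. This is a minimal illustrative sketch, not the paper's implementation; all function names, the two-state toy HMM, and the random "network outputs" are assumptions.

```python
import numpy as np

def posteriors_to_scaled_likelihoods(posteriors, state_priors):
    """Convert network posteriors p(s|x) into scaled likelihoods p(x|s) ∝ p(s|x)/p(s)."""
    return posteriors / state_priors

def viterbi(log_emissions, log_trans, log_init):
    """Standard Viterbi decoding: most likely hidden state sequence."""
    T, S = log_emissions.shape
    delta = np.full((T, S), -np.inf)   # best log-score ending in each state
    back = np.zeros((T, S), dtype=int) # backpointers
    delta[0] = log_init + log_emissions[0]
    for t in range(1, T):
        # scores[i, j] = score of being in state i at t-1 and moving to j
        scores = delta[t - 1][:, None] + log_trans
        back[t] = np.argmax(scores, axis=0)
        delta[t] = scores[back[t], np.arange(S)] + log_emissions[t]
    path = np.zeros(T, dtype=int)
    path[-1] = np.argmax(delta[-1])
    for t in range(T - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path

# Toy example: 2 HMM states, 5 frames of made-up network posteriors.
rng = np.random.default_rng(0)
posteriors = rng.dirichlet(np.ones(2), size=5)   # stand-in for DNN outputs
priors = np.array([0.5, 0.5])                    # assumed uniform state priors
emissions = posteriors_to_scaled_likelihoods(posteriors, priors)
log_trans = np.log(np.array([[0.9, 0.1],         # sticky transitions favor
                             [0.1, 0.9]]))       # staying in the same state
path = viterbi(np.log(emissions), log_trans, np.log(priors))
print(path)  # decoded state sequence, one state per frame
```

Replacing a GMM emission model with such network posteriors is the standard hybrid-HMM recipe; the ergodic extension mentioned in the abstract would add a resting state with transitions to and from every action model, so segmentation falls out of the same decoding pass.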
Publication status: Published - 25 Sep 2014
Event: 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA
Duration: 23 Jun 2014 → 28 Jun 2014