Spatio-temporal steerable pyramid for human action recognition

Xiantong Zhen, Ling Shao

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

11 Citations (Scopus)


In this paper, we propose a novel holistic representation for human action recognition based on the spatio-temporal steerable pyramid (STSP). The spatio-temporal Laplacian pyramid provides an effective technique for multi-scale analysis of video sequences: decomposing spatio-temporal volumes into band-passed sub-volumes localizes the spatio-temporal patterns residing at different scales. Three-dimensional separable steerable filters are then applied to each sub-volume to capture spatio-temporal orientation information efficiently. The outputs of each quadrature pair of steerable filters are squared and summed to yield a more robust measure of motion energy. To make the representation invariant to shifts and applicable to coarsely extracted bounding boxes around the performed actions, max pooling is applied between filter responses at adjacent scales and over local spatio-temporal neighborhoods. By combining multi-scale and multi-orientation analysis with feature pooling, STSP produces a compact yet informative and invariant representation of human actions. We conduct extensive experiments on the KTH, IXMAS and HMDB51 datasets, and the proposed STSP achieves results comparable to state-of-the-art methods.
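The squaring-and-summing and max-pooling steps described in the abstract can be illustrated with a minimal one-dimensional sketch. This is not the authors' implementation: it assumes a sine/cosine Gabor pair as an approximate stand-in for the 3-D separable steerable quadrature filters, and the function names (`gabor_pair`, `energy`, `max_pool`, `max_across_scales`) and parameter values are hypothetical, chosen only for illustration.

```python
import math

def gabor_pair(sigma=4.0, freq=0.125, radius=12):
    """Return (even, odd) 1-D Gabor taps: an approximate quadrature pair.

    Assumption: a Gabor pair stands in for the paper's steerable filters.
    """
    xs = range(-radius, radius + 1)
    env = [math.exp(-(x * x) / (2.0 * sigma * sigma)) for x in xs]
    even = [e * math.cos(2.0 * math.pi * freq * x) for e, x in zip(env, xs)]
    odd = [e * math.sin(2.0 * math.pi * freq * x) for e, x in zip(env, xs)]
    return even, odd

def correlate_valid(signal, taps):
    """Slide taps over signal, keeping only fully overlapping positions."""
    n, k = len(signal), len(taps)
    return [sum(signal[i + j] * taps[j] for j in range(k))
            for i in range(n - k + 1)]

def energy(signal, even, odd):
    """Square and sum the even and odd filter outputs at each position."""
    e = correlate_valid(signal, even)
    o = correlate_valid(signal, odd)
    return [a * a + b * b for a, b in zip(e, o)]

def max_pool(values, size):
    """Non-overlapping max pooling over local neighborhoods."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]

def max_across_scales(a, b):
    """Element-wise max of two equally sized responses (adjacent scales)."""
    return [max(x, y) for x, y in zip(a, b)]

if __name__ == "__main__":
    even, odd = gabor_pair()
    # A sinusoid at the filters' tuned frequency.
    sig = [math.cos(2.0 * math.pi * 0.125 * t) for t in range(64)]
    en = energy(sig, even, odd)
    even_sq = [v * v for v in correlate_valid(sig, even)]
    # The single squared output oscillates with the signal's phase,
    # while the summed energy stays nearly flat.
    print("even^2 range:", min(even_sq), max(even_sq))
    print("energy range:", min(en), max(en))
    print("pooled energy:", max_pool(en, 8))
```

In this sketch the summed energy is nearly independent of the local phase of the signal, which is the sense in which the quadrature-pair measure is more robust than a single filter output; the pooling helpers show how taking maxima over neighborhoods or across scales discards exact position.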
Original language: English
Title of host publication: Automatic Face and Gesture Recognition (FG), 2013 10th IEEE International Conference and Workshops on
Publisher: The Institute of Electrical and Electronics Engineers (IEEE)
ISBN (Electronic): 978-1-4673-5546-9, 978-1-4673-5544-5
ISBN (Print): 978-1-4673-5545-2
Publication status: Published - 15 Jul 2013
