Saliency-Informed Spatio-Temporal Vector of Locally Aggregated Descriptors and Fisher Vectors for Visual Action Recognition

Abstract

Feature encoding has been extensively studied for the task of visual action recognition (VAR). Recently proposed super-vector-based encoding methods, such as the Vector of Locally Aggregated Descriptors (VLAD) and Fisher Vectors (FV), have significantly improved recognition performance. Despite this success, they still struggle with the superfluous information present during the training stage, which makes them computationally expensive when applied to a large number of extracted features. To address this challenge, this paper proposes a Saliency-Informed Spatio-Temporal VLAD (SST-VLAD) approach, which selects the extracted features corresponding to a small number of videos in the data set by considering both spatial and temporal video-wise saliency scores. The same extension principle is also applied to the FV approach. Experimental results indicate that the proposed feature encoding schemes consistently outperform existing ones at significantly lower computational cost.
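To make the idea in the abstract concrete, the following is a minimal sketch of the two ingredients it mentions: picking a small subset of videos by their saliency scores, and encoding local descriptors with standard VLAD. It is not the authors' implementation. The function names, the `keep_ratio` parameter, and the choice to combine spatial and temporal scores by a plain sum are all assumptions made for illustration; the abstract does not state how the scores are computed or combined. The signed square-root step is a commonly used VLAD post-processing choice, also assumed here.

```python
import numpy as np


def select_videos(spatial, temporal, keep_ratio=0.25):
    """Return indices of the highest-scoring videos (hypothetical helper).

    ASSUMPTION: the two video-wise saliency scores are combined by a plain
    sum; the paper may combine them differently.
    """
    combined = np.asarray(spatial, dtype=float) + np.asarray(temporal, dtype=float)
    n_keep = max(1, int(round(keep_ratio * combined.size)))
    # argsort is ascending, so reverse it and keep the top n_keep entries.
    return np.argsort(combined)[::-1][:n_keep]


def vlad_encode(descriptors, centers):
    """Standard VLAD: sum residuals to the nearest center, then normalise."""
    descriptors = np.asarray(descriptors, dtype=float)  # shape (N, D)
    centers = np.asarray(centers, dtype=float)          # shape (K, D)

    # Squared Euclidean distance from every descriptor to every center: (N, K).
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)

    vlad = np.zeros_like(centers)
    for k in range(centers.shape[0]):
        members = descriptors[nearest == k]
        if members.size:
            # Accumulate residuals of the descriptors assigned to center k.
            vlad[k] = (members - centers[k]).sum(axis=0)

    vlad = vlad.ravel()                               # shape (K * D,)
    vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))      # signed square root (assumed)
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad          # L2 normalisation
```

In a pipeline of this kind, `select_videos` would decide which videos contribute descriptors to the expensive training step, and `vlad_encode` would then encode each video against the resulting cluster centers. Which training step the selection applies to is not spelled out in the abstract, so this pairing is only one plausible reading.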

Publication

Zheming Zuo, Daniel Organisciak, Hubert P. H. Shum and Longzhi Yang,
"Saliency-Informed Spatio-Temporal Vector of Locally Aggregated Descriptors and Fisher Vectors for Visual Action Recognition",
Proceedings of the 2018 British Machine Vision Conference Workshop on Image Analysis for Human Facial and Activity Recognition (IAHFAR), 2018

References

BibTeX

@inproceedings{zuo18saliency,
 author={Zuo, Zheming and Organisciak, Daniel and Shum, Hubert P. H. and Yang, Longzhi},
 booktitle={Proceedings of the 2018 British Machine Vision Conference Workshop on Image Analysis for Human Facial and Activity Recognition},
 series={IAHFAR '18},
 title={Saliency-Informed Spatio-Temporal Vector of Locally Aggregated Descriptors and Fisher Vectors for Visual Action Recognition},
 year={2018},
 month={Sep},
 numpages={11},
 location={Newcastle upon Tyne, UK},
}

EndNote/RefMan

TY  - CONF
AU  - Zuo, Zheming
AU  - Organisciak, Daniel
AU  - Shum, Hubert P. H.
AU  - Yang, Longzhi
T2  - Proceedings of the 2018 British Machine Vision Conference Workshop on Image Analysis for Human Facial and Activity Recognition
TI  - Saliency-Informed Spatio-Temporal Vector of Locally Aggregated Descriptors and Fisher Vectors for Visual Action Recognition
PY  - 2018
Y1  - Sep 2018
ER  - 

Plain Text

Zheming Zuo, Daniel Organisciak, Hubert P. H. Shum and Longzhi Yang, "Saliency-Informed Spatio-Temporal Vector of Locally Aggregated Descriptors and Fisher Vectors for Visual Action Recognition," in IAHFAR '18: Proceedings of the 2018 British Machine Vision Conference Workshop on Image Analysis for Human Facial and Activity Recognition, Newcastle upon Tyne, UK, Sep 2018.

Last updated on 21 April 2022