A novel feature fusion based Human Action Recognition in 2D Videos

  • Authors

    • K Rajendra Prasad
    • P Srinivasa Rao
    2018-04-18
    https://doi.org/10.14419/ijet.v7i2.20.13297
  • Keywords

    Human action recognition (HAR), Feature fusion, Support vector machine, Scale-invariant feature transform, Speeded-up robust features, Histogram of oriented gradients, Local binary patterns.
  • Abstract

    Human action recognition from 2D videos is a demanding area owing to its broad applications. Researchers have proposed many methods for recognizing human actions, yet improved accuracy remains desirable. This paper presents an improved human action recognition method based on a support vector machine (SVM) classifier, and proposes a novel feature descriptor constructed by fusing several investigated features. Handcrafted features, namely scale-invariant feature transform (SIFT) features, speeded-up robust features (SURF), histogram of oriented gradients (HOG) features and local binary pattern (LBP) features, are extracted from online 2D action videos. The proposed method is tested on action datasets with both static and dynamically varying backgrounds, and it achieves the best recognition rates in both settings. The datasets considered for the experimentation are KTH, Weizmann, UCF101, UCF Sports Action, MSR Action and HMDB51. The performance of the proposed feature-fusion model with the SVM classifier is compared against each individual feature with SVM, and the fusion method yields the best results. The efficiency of the classifier is also tested by comparison with other state-of-the-art classifiers, namely k-nearest neighbors (KNN), artificial neural networks (ANN) and the AdaBoost classifier. The method achieved an average recognition rate of 94.41%.
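    The fusion idea described in the abstract, reducing each descriptor to a fixed-length vector and concatenating the vectors into one fused descriptor before classification, can be sketched as follows. The `hog_like` and `lbp_like` helpers below are crude illustrative stand-ins, not the paper's actual SIFT/SURF/HOG/LBP extractors, and in the real pipeline the fused vector would be fed to an SVM rather than printed.

    ```python
    import math

    def hog_like(frame, bins=9):
        """Crude histogram of unsigned gradient orientations (HOG stand-in)."""
        h = [0.0] * bins
        rows, cols = len(frame), len(frame[0])
        for r in range(1, rows - 1):
            for c in range(1, cols - 1):
                gx = frame[r][c + 1] - frame[r][c - 1]
                gy = frame[r + 1][c] - frame[r - 1][c]
                ang = math.atan2(gy, gx) % math.pi   # fold into [0, pi)
                mag = math.hypot(gx, gy)
                h[min(int(ang / math.pi * bins), bins - 1)] += mag
        return h

    def lbp_like(frame):
        """Crude 8-neighbour local binary pattern histogram (256 bins)."""
        h = [0] * 256
        offs = [(-1,-1), (-1,0), (-1,1), (0,1), (1,1), (1,0), (1,-1), (0,-1)]
        for r in range(1, len(frame) - 1):
            for c in range(1, len(frame[0]) - 1):
                code = 0
                for bit, (dr, dc) in enumerate(offs):
                    if frame[r + dr][c + dc] >= frame[r][c]:
                        code |= 1 << bit
                h[code] += 1
        return h

    def fuse(frame):
        """Fused descriptor: concatenation of the individual feature vectors."""
        return hog_like(frame) + lbp_like(frame)

    # Toy 8x8 grayscale "frame" with a vertical edge.
    frame = [[0] * 4 + [255] * 4 for _ in range(8)]
    vec = fuse(frame)
    print(len(vec))  # 9 HOG bins + 256 LBP bins = 265
    ```

    The design choice mirrors the paper's premise: each feature type captures a different cue (shape, texture, keypoints), so a simple concatenation gives the classifier a richer joint representation than any single descriptor alone.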


  • References

      [1] L. Gorelick, M. Blank, E. Shechtman, M. Irani, and R. Basri. Actions as space-time shapes. In ICCV, 2005.

      [2] C. Fanti, L. Zelnik-manor, and P. Perona. Hybrid models for human motion recognition. In ICCV, 2005.

      [3] Chen MY, Hauptmann AG. MoSIFT: recognizing human actions in surveillance videos. Technical report CMU-CS-09-161, Carnegie Mellon University; 2009.

      [4] Sheikh Y, Sheikh M, Shah M. Exploring the space of a human action. Int Conf Comput Vision, ICCV IEEE 2005:144–9.

      [5] Ke Y, Sukthanka R, Hebert M. Efficient visual event detection using volumetric features. Int Conf Comput Vision, ICCV IEEE 2005;1:166–73.

      [6] Fathi A, Mori G. Action recognition by learning mid-level motion features. Computer Vision Pattern Recognition, CVPR IEEE 2008:1–8.

      [7] Gemert J, Geusebroe J, Veenman C, Smeulders A. Kernel codebooks for scene categorization. Proc Euro Conf Comput Vision, ECCV 2008:696–709.

      [8] Schuldt C, Laptev I, Caputo B. Recognizing human actions: a local SVM approach. Int Conf Pattern Recogn, ICPR IEEE 2004;3:32–6.

      [9] D. G. Lowe, Object Recognition from Local Scale-Invariant Features, International Conference on Computer Vision, 1999.

      [10] D. G. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision, 2004.

      [11] C Harris, M. Stephens. A combined corner and edge detector, In M.M. Mathews, editor, Proc of Alvey vision conference, pages 147-151, University of Manchester, England. September, 1988.

      [12] N. Dalal and B. Triggs, Histograms of oriented gradients for human detection, IEEE Conference on Computer Vision and Pattern Recognition(CVPR), 2005.

      [13] N. Buch, J. Orwell, S. A. Velastin. 3D Extended Histogram of Oriented Gradients (3DHOG) for Classification of Road Users in Urban Scenes, British Machine Vision Conference, 2009.

      [14] M. Calonder, V. Lepetit, C. Strecha, P. Fua. Brief: Binary robust independent elementary features. European Conference on Computer Vision, 2010.

      [15] H. Bay, T. Tuytelaars, L. Van Gool. Surf: Speeded up robust features. European Conference on Computer Vision, May 2006.

      [16] Kushwaha, A.K.S.; Srivastava, S.; Srivastava, R. Multi-view human activity recognition based on silhouette and uniform rotation invariant local binary patterns. Multimedia Syst. 2016.

      [17] Zhang, Jia-Tao, Ah-Chung Tsoi, and Sio-Long Lo, "Scale invariant feature transform flow trajectory approach with applications to human action recognition," Neural Networks (IJCNN), 2014 International Joint Conference on. IEEE, 2014.

      [18] Qazi, Hassaan Ali, Umar Jahangir, Bilal M. Yousuf, and Aqib Noor. "Human action recognition using SIFT and HOG method." In Information and Communication Technologies (ICICT), 2017 International Conference on, pp. 6-10. IEEE, 2017.

      [19] Chen MY, Hauptmann AG. MoSIFT: recognizing human actions in surveillance videos. Technical report CMU-CS-09-161, Carnegie Mellon University; 2009.

      [20] Souvenir, R.; Babbs, J. Learning the viewpoint manifold for action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2008 (CVPR 2008), Anchorage, AK, USA, 23–28 June 2008; pp. 1–7.

      [21] Chaaraoui, A.A.; Climent-Pérez, P.; Flórez-Revuelta, F. Silhouette-based human action recognition using sequences of key poses. Pattern Recognit. Lett. 2013, 34, 1799–1807.

      [22] Ahmad, M.; Lee, S.-W. HMM-based human action recognition using multiview image sequences. In Proceedings of the 18th International Conference on Pattern Recognition 2006 (ICPR 2006), Hong Kong, China, 20–24 August 2006; pp. 263–266.

      [23] Weinland, D.; Özuysal, M.; Fua, P. Making Action Recognition Robust to Occlusions and Viewpoint Changes. In Computer Vision–ECCV 2010; Springer: Berlin/Heidelberg, Germany, 2010; pp. 635–648.

      [24] Sargano, Allah Bux, Plamen Angelov, and Zulfiqar Habib. "Human action recognition from multiple views based on view-invariant feature descriptor using support vector machines." Applied Sciences 6, no. 10 (2016): 309.

      [25] H. Wang, M. M. Ullah, A. Klaser, I. Laptev, and C. Schmid. Evaluation of local spatio-temporal features for action recognition. In BMVC, 2010.

      [26] Schuldt, Christian, Ivan Laptev, and Barbara Caputo. "Recognizing human actions: a local SVM approach." In Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on, vol. 3, pp. 32-36. IEEE, 2004.

      [27] Gorelick, Lena, Moshe Blank, Eli Shechtman, Michal Irani, and Ronen Basri. "Actions as space-time shapes." IEEE transactions on pattern analysis and machine intelligence 29, no. 12 (2007): 2247-2253.

      [28] University of Central Florida, UCF sports action dataset, February 2012. <http://vision.eecs.ucf.edu/datasetsActions.html>.

      [29] K. Soomro, A. R. Zamir, and M. Shah. UCF101: A dataset of 101 human actions classes from videos in the wild. CoRR, abs/1212.0402, 2012.

      [30] J. Yuan, Z. Liu, and Y. Wu. Discriminative video pattern search for efficient action detection. http://users.eecs.northwestern.edu/~jyu410/index_files/actiondetection.html, January 2012.

      [31] Serre lab. Hmdb: A large video database for human motion recognition. http://serre-lab.clps.brown.edu/resources/HMDB/index.htm, November 2011.

      [32] Tao, Michael, Jiamin Bai, Pushmeet Kohli, and Sylvain Paris. "SimpleFlow: A Non-iterative, Sublinear Optical Flow Algorithm." In Computer Graphics Forum, vol. 31, no. 2pt1, pp. 345-353. Blackwell Publishing Ltd, 2012.

      [33] Ali Borji and Laurent Itti. State-of-the-art in visual attention modeling. IEEE transactions on pattern analysis and machine intelligence, 35(1):185–207, 2013.

      [34] Lowe, David G. "Distinctive image features from scale-invariant keypoints." International journal of computer vision 60, no. 2 (2004): 91-110.

      [35] Svab, Jan, Tomas Krajnik, Jan Faigl, and Libor Preucil. "FPGA based speeded up robust features." In Technologies for Practical Robot Applications, 2009. TePRA 2009. IEEE International Conference on, pp. 35-41. IEEE, 2009.

      [36] Bay, Herbert, Andreas Ess, Tinne Tuytelaars, and Luc Van Gool. "Speeded-up robust features (SURF)." Computer vision and image understanding 110, no. 3 (2008): 346-359.

      [37] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, volume 1, pages 886–893. IEEE, 2005.

      [38] D.G. Lowe. Distinctive image features from scale-invariant keypoints. International journal of computer vision, 60(2):91–110, 2004.

      [39] T. Ojala, M. Pietikäinen, and T. Mäenpää, "Multiresolution gray-scale and rotation invariant texture classification with local binary patterns," IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 7, pp. 971–987, Jul. 2002.

      [40] V. Vapnik, "Statistical Learning Theory," John Wiley, New York, 1998.

      [41] C. Cortes, V. Vapnik, "Support vector networks," Machine Learning, vol. 20, pp. 273-297, 1995.

      [42] C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining Knowledge Discovery, vol. 2(2), pp. 121-167, 1998.

      [43] Liu J, Shah M. Learning human actions via information maximization. Comput Vision Pattern Recogn, CVPR IEEE 2008:1–8.

      [44] Niebles J, Wang H, Fei-Fei L. Unsupervised learning of human action categories using spatial-temporal words. Int J Computer Vision 2008;79(3):299–318.

      [45] Schuldt C, Laptev I, Caputo B. Recognizing human actions: a local SVM approach. Int Conf Pattern Recogn, ICPR IEEE 2004;3:32–6.

      [46] Dollar P, Rabaud V, Cottrell G, Belongie S. Behavior recognition via sparse spatio-temporal features. IEEE international workshop on visual surveillance and performance evaluation of tracking and surveillance 2005:65–72.

      [47] Zhang Z, Hu Y, Chan S, Chia LT. Motion context: a new representation for human action recognition. Proceedings of the European conference on computer vision, ECCV Springer 2008; 5305:817–29.

      [48] Lin Z, Jiang Z, Davis LS. Recognizing actions by shape motion prototype trees. Int Conf Comput Vision, ICCV IEEE 2009. p. 1–8.

      [49] Bregonzio M, Xiang T, Gong S. Fusing appearance and distribution information of interest points for action recognition. Pattern Recognition 2012;45(3):1220–34.
  • How to Cite

    Rajendra Prasad, K., & Srinivasa Rao, P. (2018). A novel feature fusion based Human Action Recognition in 2D Videos. International Journal of Engineering & Technology, 7(2.20), 207-213. https://doi.org/10.14419/ijet.v7i2.20.13297

    Received date: 2018-05-26

    Accepted date: 2018-05-26

    Published date: 2018-04-18