Visual recognition and classification of videos using deep convolutional neural networks

 
 
 
  • Abstract


    Classification of videos based on their content is a challenging and significant research problem. In this paper, a simple and efficient model is proposed for the classification of sports videos using deep convolutional neural networks. In the proposed approach, grayscale variants of the image frames are classified by applying convolution at varied levels of abstraction through a sequence of hidden layers. The image frames considered for classification are obtained after duplicate-frame elimination, and each frame is rescaled to a dimension of 120x240. The sports video categories used for experimentation are badminton, football, cricket and tennis, downloaded from various Google and YouTube sources. Classification in the proposed method is performed with a Deep Convolutional Neural Network (DCNN) using around 20 filters, each of size 5x5 with a stride length of 2, and its outcomes are compared with the Local Binary Patterns (LBP) and Bag of Words Features (BWF) techniques. In the BWF technique, SURF features are extracted and the strongest 80% of feature points are used to cluster the image frames with K-Means clustering, achieving an average classification accuracy of about 87%. The LBP technique produced an average accuracy of 73% in differentiating one image frame from another, whereas the DCNN showed a promising outcome with an accuracy of about 91% for a 40% training / 60% test split and 99% for a 60% training / 40% test split. The results show that the proposed method outperforms the image processing-based techniques LBP and BWF.
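
    The abstract fixes only a few details of the proposed network (around 20 convolution filters of size 5x5 with a stride of 2, applied to 120x240 grayscale frames over four sport classes). The following minimal Keras sketch illustrates a network of that shape; the pooling layer, dense-layer width, optimiser and training settings are assumptions and are not taken from the paper.

        # Minimal sketch of a DCNN matching the figures quoted in the abstract:
        # 20 filters of size 5x5, stride 2, 120x240 grayscale input, 4 sport classes.
        # Pooling, dense width, optimiser and training settings are assumptions.
        import tensorflow as tf
        from tensorflow.keras import layers, models

        def build_dcnn(num_classes=4):
            model = models.Sequential([
                layers.Input(shape=(120, 240, 1)),                        # rescaled grayscale frame
                layers.Conv2D(20, (5, 5), strides=2, activation='relu'),  # 20 filters, 5x5, stride 2
                layers.MaxPooling2D(pool_size=(2, 2)),
                layers.Flatten(),
                layers.Dense(64, activation='relu'),
                layers.Dense(num_classes, activation='softmax'),          # badminton/football/cricket/tennis
            ])
            model.compile(optimizer='adam',
                          loss='sparse_categorical_crossentropy',
                          metrics=['accuracy'])
            return model

        # Example usage on de-duplicated, rescaled frames (e.g. a 60%/40% split):
        # model = build_dcnn()
        # model.fit(train_frames, train_labels, epochs=10,
        #           validation_data=(test_frames, test_labels))

    For the bag-of-words baseline, the abstract states that SURF descriptors are extracted and the strongest 80% of feature points are clustered with K-Means. The sketch below illustrates one way to realise that pipeline; it assumes an OpenCV build that still ships the contrib SURF detector (cv2.xfeatures2d) together with scikit-learn's KMeans, and the vocabulary size k is an assumption.

        # Hypothetical bag-of-visual-words pipeline: SURF descriptors per frame,
        # strongest 80% of keypoints kept, K-Means vocabulary, per-frame histogram.
        import cv2
        import numpy as np
        from sklearn.cluster import KMeans

        def frame_descriptors(gray_frame, keep_ratio=0.8):
            surf = cv2.xfeatures2d.SURF_create()                     # needs an opencv-contrib build
            keypoints, descriptors = surf.detectAndCompute(gray_frame, None)
            if descriptors is None:
                return np.empty((0, 64))
            order = np.argsort([-kp.response for kp in keypoints])   # strongest keypoints first
            keep = order[:max(1, int(keep_ratio * len(order)))]      # keep the strongest 80%
            return descriptors[keep]

        def build_vocabulary(gray_frames, k=500):                    # vocabulary size k is an assumption
            stacked = np.vstack([frame_descriptors(f) for f in gray_frames])
            return KMeans(n_clusters=k, n_init=10).fit(stacked)

        def bow_histogram(gray_frame, vocabulary):
            words = vocabulary.predict(frame_descriptors(gray_frame))
            hist, _ = np.histogram(words, bins=np.arange(vocabulary.n_clusters + 1))
            return hist / max(1, hist.sum())                         # L1-normalised frame descriptor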

     


  • Keywords


    Sports videos, convolutional neural networks, local binary patterns, bag of words features, SURF, K-Means clustering, video processing

  • References


      [1] Krizhevsky A, Sutskever I & Hinton GE, “Imagenet classification with deep convolutional neural networks”, Advances in neural information processing systems, (2012), pp.1097-1105.

      [2] Ciregan D, Meier U & Schmidhuber J, “Multi-column deep neural networks for image classification”, IEEE conference on Computer vision and pattern recognition (CVPR), (2012), pp.3642-3649.

      [3] Simonyan K & Zisserman A, “Very deep convolutional networks for large-scale image recognition”, arXiv preprint arXiv:1409.1556, (2014).

      [4] Zeiler MD & Fergus R, “Visualizing and understanding convolutional networks”, European conference on computer vision, (2014), pp. 818-833.

      [5] Sermanet P, Eigen D, Zhang X, Mathieu M, Fergus R & LeCun Y, “Overfeat: Integrated recognition, localization and detection using convolutional networks”, arXiv preprint arXiv:1312.6229, (2013).

      [6] Karpathy A, Toderici G, Shetty S, Leung T, Sukthankar R. & Fei-Fei L, “Large-scale video classification with convolutional neural networks”, Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, (2014), pp.1725-1732.

      [7] Ng JYH, Hausknecht M, Vijayanarasimhan S, Vinyals O, Monga R & Toderici G, “Beyond short snippets: Deep networks for video classification”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2015), pp.4694-4702.

      [8] Brezeale D & Cook DJ, “Automatic video classification: A survey of the literature”, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), Vol.38, No.3, (2008), pp.416-430.

      [9] Zhou W, Vellaikal A & Kuo CC, “Rule-based video classification system for basketball video indexing”, Proceedings of the ACM workshops on Multimedia, (2000), pp.213-216.

      [10] Huang J, Liu Z, Wang Y, Chen Y & Wong EK, “Integration of multimodal features for video scene classification based on HMM”, IEEE 3rd Workshop on Multimedia Signal Processing, (1999), pp. 53-58.

      [11] Lin WH & Hauptmann A, “News video classification using SVM-based multimodal classifiers and combination strategies”, Proceedings of the tenth ACM international conference on Multimedia, (2002), pp.323-326.

      [12] Xu LQ & Li Y, “Video classification using spatial-temporal features and PCA”, International Conference on Multimedia and Expo, (2003).

      [13] Dimitrova N, Agnihotri L & Wei G, “Video classification based on HMM using text and faces”, 10th European Signal Processing Conference, (2000), pp.1-4.

      [14] Yang J, Jiang YG, Hauptmann AG & Ngo CW, “Evaluating bag-of-visual-words representations in scene classification”, Proceedings of the international workshop on Workshop on multimedia information retrieval, (2007), pp.197-206.

      [15] Zhao G, Ahonen T, Matas J & Pietikainen M, “Rotation-invariant image and video description with local binary pattern features”, IEEE Transactions on Image Processing, Vol.21, No.4, (2012), pp.1465-1477.

      [16] Lippmann RP, “Pattern classification using neural networks”, IEEE Communications Magazine, Vol.27, No.11, (1989), pp.47-50.

      [17] Hinton GE, Osindero S & Teh YW, “A fast learning algorithm for deep belief nets”, Neural computation, Vol.18, No.7, (2006), pp.1527-1554.

      [18] Rani NS & Ashwini PS, “A Standardized Framework for Handwritten and Printed Kannada Numeral Recognition and Translation using Probabilistic Neural Networks”, IJISET - International Journal of Innovative Science, Engineering & Technology, Vol.1, No.4, (2014).

      [19] Pushpa BR, Anand C & Mithun NP, “Ayurvedic Plant Species Recognition using Statistical Parameters on Leaf Images”, International Journal of Applied Engineering Research, Vol.11, No.7, (2016), pp.5142-5147.


 

Article ID: 13403
 
DOI: 10.14419/ijet.v7i2.31.13403



