Comparative Analysis of Neural Networks for Speech Emotion Recognition

  • Authors

    • Hemanta Kumar Palo
    • Mihir N. Mohanty
    Published: 2018-12-13
    DOI: https://doi.org/10.14419/ijet.v7i4.39.23820
  • Keywords

    Neural Network, Speech Emotion Recognition, Feature Extraction, Feature Reduction, Classification
  • Abstract

    This paper investigates the ability of Neural Network (NN) models to recognize speech emotions. Extensive simulation of NN models, namely the Radial Basis Function Network (RBFN), the Multilayer Perceptron (MLP), and the Probabilistic Neural Network (PNN), has been carried out to determine the Speech Emotion Recognition (SER) accuracy for the emotional states of anger, happiness, sadness, and boredom. The utterances for these states are drawn from the standard Berlin emotional speech database (EMO-DB). Efficient cepstral-domain vocal tract features, namely the Linear Predictive Cepstral Coefficients (LPCC), the Mel Frequency Cepstral Coefficients (MFCC), and the Perceptual Linear Prediction (PLP) coefficients, are put to the test for their emotion-discriminating ability with the proposed setup. These features are extracted at the frame level and clustered into their corresponding Vector Quantized (VQ) coefficients to remove redundant information before the chosen classifiers are trained. The NN-based identification models reach the desired level of SER accuracy, as these classifiers remain effective on low-dimensional feature sets. An improved accuracy of 83% has been observed with the PNN using the LPCC-VQ feature set, compared to 82% with the RBFN and 78% with the MLP. Among the derived feature sets, LPCC-VQ remains the most reliable in characterizing the intended speech emotions, while the PNN outperforms the other NN classifiers, as our results reveal.
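
    As a concrete, hedged illustration of the front end described above, the sketch below extracts frame-level MFCCs and reduces them to a small VQ codebook per utterance. The tooling (librosa, scikit-learn) and the parameters (13 coefficients, a 16-entry codebook) are assumptions made for illustration; the paper specifies neither.

```python
# Hypothetical sketch of the front end: frame-level MFCC extraction followed by
# VQ reduction. Tooling and parameters are assumptions, not the authors' settings.
import numpy as np
import librosa
from sklearn.cluster import KMeans

def mfcc_vq(wav_path, n_mfcc=13, codebook_size=16):
    """Summarize one utterance by the VQ centroids of its frame-level MFCCs."""
    y, sr = librosa.load(wav_path, sr=None)                   # EMO-DB audio is 16 kHz
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # shape (frames, n_mfcc)
    km = KMeans(n_clusters=codebook_size, n_init=10, random_state=0).fit(mfcc)
    # The codebook centroids replace hundreds of frames with one fixed-length,
    # low-dimensional vector: the redundancy-removal step the abstract describes.
    return km.cluster_centers_.flatten()
```

    With 16 centroids of 13 coefficients each, every utterance maps to a 208-dimensional vector regardless of its duration, the kind of low-dimensional input on which the compared NN classifiers remain effective.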

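    The winning classifier, the PNN, is simple enough to sketch directly from Specht's formulation (reference [13] below): one Gaussian pattern unit per training vector, a summation layer that averages the kernel responses per class, and an argmax output layer. This is a generic sketch under those assumptions, not the authors' implementation; the smoothing parameter sigma would need tuning on held-out data.

```python
# Minimal Probabilistic Neural Network in the spirit of Specht (1990), ref. [13].
# A generic illustration, not the authors' code or settings.
import numpy as np

class PNN:
    def __init__(self, sigma=1.0):
        self.sigma = sigma  # Parzen-window smoothing parameter

    def fit(self, X, y):
        # "Training" is just storing one pattern unit per training vector.
        self.classes_ = np.unique(y)
        self.patterns_ = {c: X[y == c] for c in self.classes_}
        return self

    def predict(self, X):
        preds = []
        for x in np.atleast_2d(X):
            # Summation layer: average Gaussian kernel response per class.
            dens = [np.mean(np.exp(-np.sum((self.patterns_[c] - x) ** 2, axis=1)
                                   / (2.0 * self.sigma ** 2)))
                    for c in self.classes_]
            preds.append(self.classes_[int(np.argmax(dens))])  # argmax output layer
        return np.array(preds)

# Hypothetical usage with the VQ features sketched earlier:
#   clf = PNN(sigma=0.5).fit(train_vectors, train_labels)
#   accuracy = np.mean(clf.predict(test_vectors) == test_labels)
```
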
  • References

    [1] D. Torres-Boza, M. C. Oveneke, F. Wang, D. Jiang, W. Verhelst, and H. Sahli, "Hierarchical sparse coding framework for speech emotion recognition," Speech Communication, Vol. 99, 80-89, 1st May 2018.

    [2] C. Quan, B. Zhang, X. Sun, and F. Ren, "A combined cepstral distance method for emotional speech recognition," International Journal of Advanced Robotic Systems, Vol. 14, No. 4, Jul 2017.

    [3] A. Majkowski, M. Kolodziej, R. J. Rak, and R. Korczyński, "Classification of emotions from speech signal," In Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA), IEEE, 276-281, 21st Sep 2016.

    [4] H. K. Palo, M. N. Mohanty, and M. Chandra, "Efficient feature combination techniques for emotional speech classification," International Journal of Speech Technology, Vol. 19, No. 1, 135-150, Mar 2016.

    [5] S. Wu, T. H. Falk, and W. Y. Chan, "Automatic speech emotion recognition using modulation spectral features," Speech Communication, Vol. 53, No. 5, 768-785, 1st May 2011.

    [6] S. G. Koolagudi, Y. S. Murthy, and S. P. Bhaskar, "Choice of a classifier, based on properties of a dataset: case study-speech emotion recognition," International Journal of Speech Technology, Vol. 21, No. 1, 167-183, Mar 2018.

    [7] P. Khanna, and M. S. Kumar, "Application of vector quantization in emotion recognition from human speech," In International Conference on Information Intelligence, Systems, Technology and Management, Springer, Berlin, Heidelberg, 118-125, 10th Mar 2011.

    [8] H. K. Palo, and M. N. Mohanty, "Wavelet based feature combination for recognition of emotions," Ain Shams Engineering Journal, 28th Jan 2017 (in press).

    [9] H. Wenjing, L. Haifeng, and G. Chunyu, "A hybrid speech emotion perception method of VQ-based feature processing and ANN recognition," In WRI Global Congress on Intelligent Systems (GCIS'09), IEEE, Vol. 2, 145-149, 19th May 2009.

    [10] M. E. Ayadi, M. S. Kamel, and F. Karray, "Survey on speech emotion recognition: Features, classification schemes, and databases," Pattern Recognition, Vol. 44, No. 3, 572-587, 1st Mar 2011.

    [11] S. S. Haykin, "Neural Networks and Learning Machines," Upper Saddle River: Pearson, Vol. 3, Nov 2009.

    [12] H. K. Palo, and M. N. Mohanty, "Modified-VQ features for speech emotion recognition," Journal of Applied Sciences, Vol. 16, No. 9, 406-418, 15th Aug 2016.

    [13] D. F. Specht, "Probabilistic neural networks," Neural Networks, Vol. 3, No. 1, 109-118, 1st Jan 1990.

    [14] H. K. Palo, M. N. Mohanty, and M. Chandra, "New features for emotional speech recognition," In IEEE Power, Communication and Information Technology Conference (PCITC), 424-429, 15th Oct 2015.

    [15] F. Burkhardt, A. Paeschke, M. Rolfes, W. F. Sendlmeier, and B. Weiss, "A database of German emotional speech," In Ninth European Conference on Speech Communication and Technology, Vol. 5, 1517-1520, 4th Sep 2005.

    [16] K. S. Rao, and S. G. Koolagudi, "Robust emotion recognition using pitch synchronous and sub-syllabic spectral features," SpringerBriefs in Speech Technology, 17-46, Springer, New York, NY, 2013.

    [17] D. Kamińska, T. Sapiński, and A. Pelikant, "Comparison of perceptual features efficiency for automatic identification of emotional states from speech," In Human System Interaction (HSI), 6th IEEE International Conference, 210-213, 6th Jun 2013.

    [18] J. Yuan, L. Chen, T. Fan, and J. Jia, "Dimension reduction of speech emotion feature based on weighted linear discriminate analysis," International Journal on Image Processing and Pattern Recognition, Vol. 8, No. 11, 299-308, 8th Nov 2015.

  • How to Cite

    Palo, H. K., & Mohanty, M. N. (2018). Comparative Analysis of Neural Networks for Speech Emotion Recognition. International Journal of Engineering & Technology, 7(4.39), 112-116. https://doi.org/10.14419/ijet.v7i4.39.23820

    Received date: 2018-12-12

    Accepted date: 2018-12-12

    Published date: 2018-12-13