Automatic Selection of Speech Data based on Confidence Measure

  • Abstract

    The amount of training data used in automatic speech recognition and pronunciation-aiding systems is one of the most important factors affecting the quality of the resulting systems. However, as the amount of training data grows, transcribing it demands a large effort from professional linguists, which is usually expensive in both time and money. In this paper, we present an algorithm that automatically selects high-accuracy subsets of speech data. The suggested algorithm uses confidence measures and posterior probabilities to extract parts of the data based on a confidence score. Experimental results, together with comparisons against a manually verified selection process and a random selection process, show that the proposed algorithm is robust and effective.
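
    The abstract does not give the details of the algorithm, so the following is only a minimal sketch of the general idea of confidence-based selection, not the authors' method. It assumes that a recognizer has already produced a posterior probability for each word in an utterance, that the utterance-level confidence score is the mean of those posteriors, and that an utterance is kept when its score reaches a fixed threshold. The names `Utterance`, `confidence`, and `select_confident`, and the threshold value, are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class Utterance:
    utt_id: str
    # Hypothetical per-word posterior probabilities from a recognizer, each in [0, 1].
    word_posteriors: List[float]


def confidence(utt: Utterance) -> float:
    """Utterance-level confidence, assumed here to be the mean word posterior.

    An utterance with no recognized words gets a score of 0.0.
    """
    if not utt.word_posteriors:
        return 0.0
    return mean(utt.word_posteriors)


def select_confident(utts: List[Utterance], threshold: float = 0.9) -> List[Utterance]:
    """Keep the utterances whose confidence score is at least `threshold`."""
    return [u for u in utts if confidence(u) >= threshold]


if __name__ == "__main__":
    corpus = [
        Utterance("utt_a", [0.95, 0.98, 0.91]),
        Utterance("utt_b", [0.99, 0.40, 0.85]),
        Utterance("utt_c", []),
    ]
    for utt in select_confident(corpus, threshold=0.9):
        print(utt.utt_id, round(confidence(utt), 3))
```

    Averaging is only one possible way to pool word-level posteriors; taking the minimum would penalize a single badly recognized word more strongly. The threshold trades the size of the selected subset against its expected transcription accuracy.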



  • Keywords

    data selection; confidence measure; speech processing; posterior probabilities; machine learning


Article ID: 28186
DOI: 10.14419/ijet.v8i1.11.28186

Copyright © 2012-2015 Science Publishing Corporation Inc. All rights reserved.