Abstract
This paper introduces the acquisition, evaluation and baseline Automatic Speech Recognition (ASR) experiments of a novel corpus containing speech from a set of impaired and unimpaired young speakers. A group of 14 speakers with different speech disorders recorded several sessions over a 57-word Spanish vocabulary, yielding more than 3 hours of speech. In addition, a parallel corpus of more than 6 hours of speech from unimpaired young speakers has been recorded with the same vocabulary. The impaired speech corpus has been evaluated through manual labeling to detect the mispronunciations made by the speakers, and the outcome of this work shows that 17.31% of the phonemes were either mispronounced or deleted in an isolated word task. A baseline evaluation of a state-of-the-art ASR system shows a Word Error Rate (WER) of 35.02% when using speaker-independent models based on adult speech. This WER is reduced to 27.60% using models based on children's speech and further reduced to 15.35% using speaker-dependent models. Finally, experiments on connected speech with 4 impaired speakers show how ASR performance degrades in the transition from isolated words to connected speech, due to the language impairments of the speakers and the coarticulation in connected speech.
Original language | English |
---|---|
Status | Published - 2008 |
Externally published | Yes |
Event | 1st Workshop on Child, Computer and Interaction, WOCCI 2008 - Chania, Greece Duration: 23 Oct 2008 → … |
Conference
Conference | 1st Workshop on Child, Computer and Interaction, WOCCI 2008 |
---|---|
Country/Territory | Greece |
City | Chania |
Period | 23/10/08 → … |