Abstract
This paper presents the acquisition, evaluation, and baseline Automatic Speech Recognition (ASR) experiments of a novel corpus containing speech from a set of impaired and unimpaired young speakers. A group of 14 speakers with different speech disorders recorded several sessions of a 57-word vocabulary in Spanish, gathering more than 3 hours of speech. In addition, a parallel corpus of more than 6 hours of speech from unimpaired young speakers was recorded with the same vocabulary. The impaired speech corpus was evaluated through manual labeling to detect the speakers' mispronunciations, and this analysis shows that 17.31% of the phonemes were either mispronounced or deleted in an isolated word task. A baseline evaluation of a state-of-the-art ASR system yields a Word Error Rate (WER) of 35.02% with speaker-independent models trained on adult speech. This WER is reduced to 27.60% with models trained on children's speech, and further reduced to 15.35% with speaker-dependent models. Finally, experiments on connected speech show that ASR performance degrades for 4 impaired speakers in the transition from isolated words to connected speech, due to the speakers' language impairments and the coarticulation in connected speech.
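The results above are reported as Word Error Rate. As a rough illustration, and not the evaluation tooling used in this work, WER is conventionally computed as (S + D + I) / N, where S, D, and I are the substitutions, deletions, and insertions in a minimum-edit alignment against N reference words. The sketch below assumes pre-tokenized word lists; the function names and the example words are hypothetical and are not taken from the corpus's 57-word vocabulary. A second helper computes the relative WER reduction between two systems.

```python
def word_error_rate(reference: list[str], hypothesis: list[str]) -> float:
    """WER = (S + D + I) / N using a word-level Levenshtein alignment."""
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one word")
    # dist[i][j]: minimum edits turning reference[:i] into hypothesis[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i  # i deletions
    for j in range(m + 1):
        dist[0][j] = j  # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
    return dist[n][m] / n


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER removed by the new system."""
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical isolated-word session: one substitution out of four words.
print(word_error_rate(["casa", "perro", "sol", "mesa"],
                      ["casa", "pero", "sol", "mesa"]))  # 0.25
```

Applied to the figures reported above, `relative_reduction(35.02, 27.60)` gives about 0.212 and `relative_reduction(35.02, 15.35)` about 0.562, i.e. children's models remove roughly a fifth of the adult-model errors and speaker-dependent models a little over half.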
Original language | English |
---|---|
Host publication title | Proceedings of the 2008 Workshop on Children, Computer and Interaction, Chania, Greece |
Status | Published - 2008 |
Externally published | Yes |