A Novel Corpus of Children's Disordered Speech

Research output: Contribution to conference › Paper › peer-review

7 Scopus citations

Abstract

This paper presents the acquisition, evaluation and baseline Automatic Speech Recognition (ASR) experiments for a novel corpus containing speech from impaired and unimpaired young speakers. A group of 14 speakers with different speech disorders recorded several sessions over a 57-word vocabulary in Spanish, yielding more than 3 hours of speech. In addition, a parallel corpus of more than 6 hours of speech from unimpaired young speakers has been recorded with the same vocabulary. The impaired speech corpus has been evaluated through manual labeling to detect the mispronunciations made by the speakers, and the outcome of this work shows that 17.31% of the phonemes were either mispronounced or deleted in an isolated word task. A baseline evaluation of a state-of-the-art ASR system shows a Word Error Rate (WER) of 35.02% when using speaker-independent models based on adult speech. This WER is reduced to 27.60% using models based on children's speech and further reduced to 15.35% using speaker-dependent models. Finally, experiments on connected speech with 4 impaired speakers show how ASR performance degrades in the transition from isolated words to connected speech, due to the language impairments of the speakers and the coarticulation in connected speech.
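For readers unfamiliar with the metric, the sketch below illustrates how a WER figure is commonly computed (word-level edit distance divided by the number of reference words) and how the relative reductions between the three reported WER values can be derived. It is a minimal illustration, not the authors' scoring tool: the function names and the Spanish example utterance are hypothetical, and only the three WER percentages come from the abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only the previous DP row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction of the baseline error removed by the improved system."""
    return (baseline - improved) / baseline


# Hypothetical utterance (not taken from the corpus): one substitution
# and one deletion against a four-word reference.
example_wer = word_error_rate("el gato come pescado", "el pato come")

# WER values reported in the abstract, in percent.
adult_si, child_si, speaker_dep = 35.02, 27.60, 15.35
gain_child_models = relative_reduction(adult_si, child_si)
gain_speaker_dep = relative_reduction(adult_si, speaker_dep)
```

Under these definitions, moving from adult-speech models to children's-speech models removes roughly a fifth of the errors, and speaker-dependent models remove somewhat more than half of the errors of the adult-speech baseline.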

Original language: English
State: Published - 2008
Externally published: Yes
Event: 1st Workshop on Child, Computer and Interaction, WOCCI 2008 - Chania, Greece
Duration: 23 Oct 2008 → …

Conference

Conference: 1st Workshop on Child, Computer and Interaction, WOCCI 2008
Country/Territory: Greece
City: Chania
Period: 23/10/08 → …
