TY - GEN
T1 - Avoiding speaker variability in pronunciation verification of children's disordered speech
AU - Saz, Oscar
AU - Lleida, Eduardo
AU - Rodríguez, W. Ricardo
PY - 2009
Y1 - 2009
N2 - This paper addresses the problem of speaker variability in pronunciation verification for the speech therapy of children and young adults in Computer-Aided Pronunciation Training (CAPT) tools. The baseline system evaluates two score normalization techniques: traditional Test normalization (T-norm) and a novel N-best based normalization, which outperforms T-norm by normalizing to the log-likelihood score of the first alternative phoneme in an unconstrained N-best list. When performing speaker adaptation, using all the adaptation data from the speaker improves performance, measured in Equal Error Rate (EER), over the speaker-independent systems; however, this can be outperformed by more precise models that adapt only to the phonetic units labeled as correctly pronounced by a set of human experts. The best EER obtained in all experiments is 15.63%, achieved when combining both elements: score normalization and speaker adaptation. Finally, the possibility of automating a more precise adaptation without human intervention is proposed and discussed.
AB - This paper addresses the problem of speaker variability in pronunciation verification for the speech therapy of children and young adults in Computer-Aided Pronunciation Training (CAPT) tools. The baseline system evaluates two score normalization techniques: traditional Test normalization (T-norm) and a novel N-best based normalization, which outperforms T-norm by normalizing to the log-likelihood score of the first alternative phoneme in an unconstrained N-best list. When performing speaker adaptation, using all the adaptation data from the speaker improves performance, measured in Equal Error Rate (EER), over the speaker-independent systems; however, this can be outperformed by more precise models that adapt only to the phonetic units labeled as correctly pronounced by a set of human experts. The best EER obtained in all experiments is 15.63%, achieved when combining both elements: score normalization and speaker adaptation. Finally, the possibility of automating a more precise adaptation without human intervention is proposed and discussed.
KW - Children's speech
KW - Pronunciation evaluation
KW - Speech disorders
UR - http://www.scopus.com/inward/record.url?scp=74049088197&partnerID=8YFLogxK
U2 - 10.1145/1640377.1640388
DO - 10.1145/1640377.1640388
M3 - Conference contribution
AN - SCOPUS:74049088197
SN - 9781605586908
T3 - Proceedings of the 2nd Workshop on Child, Computer and Interaction, WOCCI '09
BT - Proceedings of the 2nd Workshop on Child, Computer and Interaction, WOCCI '09
T2 - 2nd Workshop on Child, Computer and Interaction, WOCCI '09
Y2 - 5 November 2009
ER -