Abstract
This paper presents results for an Automatic Speech Recognition (ASR) framework that uses robust vocal tract length estimation to improve recognition performance across speakers of different ages and genders. Well-known techniques for Vocal Tract Length Normalization (VTLN) usually require a preliminary stage to estimate the best warping factor for a given speaker, either through Maximum Likelihood (ML) estimation or by computing acoustic features of the speaker, such as formant frequencies, over several utterances. This paper shows how to use robust framewise estimates of the vocal tract length to obtain a speaker-dependent warping factor that yields major improvements over all conditions of the TIDigits database. Finally, an updating function is used to compute an on-line estimate of the vocal tract length and the warping factor, enabling real-time VTLN in speech recognition with results similar to those of the off-line strategies.
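As a rough illustration of the on-line idea described in the abstract, the sketch below folds framewise vocal tract length estimates into a running mean and derives a warping factor from it. The abstract does not specify the updating function, the reference length, or the warping convention, so everything here is an assumption: the running mean, the 17 cm reference, the 0.80 to 1.20 clipping range, the linear warp, and all names (`OnlineVtlTracker`, `warp_frequency`) are hypothetical. The ratio of estimated to reference length follows from a uniform-tube model in which formant frequencies scale inversely with tract length.

```python
from dataclasses import dataclass

# Assumed values for illustration only; none of them come from the paper.
REFERENCE_VTL_CM = 17.0
ALPHA_MIN, ALPHA_MAX = 0.80, 1.20


@dataclass
class OnlineVtlTracker:
    """Running mean of framewise vocal tract length (VTL) estimates."""

    mean_cm: float = REFERENCE_VTL_CM
    frames: int = 0

    def update(self, frame_vtl_cm: float) -> float:
        """Fold one framewise estimate into the running mean and return it."""
        self.frames += 1
        self.mean_cm += (frame_vtl_cm - self.mean_cm) / self.frames
        return self.mean_cm

    def warping_factor(self) -> float:
        """Estimated-to-reference VTL ratio, clipped to the assumed range."""
        alpha = self.mean_cm / REFERENCE_VTL_CM
        return min(ALPHA_MAX, max(ALPHA_MIN, alpha))


def warp_frequency(freq_hz: float, alpha: float) -> float:
    """Linear warp (an assumption): scale the frequency axis by alpha."""
    return alpha * freq_hz


if __name__ == "__main__":
    tracker = OnlineVtlTracker()
    for estimate_cm in (15.0, 13.0):  # made-up framewise estimates
        tracker.update(estimate_cm)
    alpha = tracker.warping_factor()
    print(f"running VTL = {tracker.mean_cm:.2f} cm, alpha = {alpha:.3f}")
```

Under these assumptions, a speaker with a shorter vocal tract than the reference gets a warping factor below 1, which compresses the frequency axis toward the reference speaker. Because the factor is available after every frame, it can be applied without a separate estimation pass.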
Original language | English |
---|---|
Title of host publication | FALA 2010 Proceedings |
Subtitle of host publication | "VI Jornadas en Tecnología del Habla" and II Iberian SLTech Workshop |
Pages | 119-122 |
Number of pages | 4 |
Publication status | Published - 2010 |
Externally published | Yes |
Event | Fala 2010, VI Jornadas en Tecnología del Habla and II Iberian SLTech Workshop - Centro Social Caixanova, Vigo, Spain. Duration: 10 Nov 2010 → 12 Nov 2010 http://lorien.die.upm.es/~lapiz/rtth/JORNADAS/VI/index.html |
Conference
Conference | Fala 2010, VI Jornadas en Tecnología del Habla and II Iberian SLTech Workshop |
---|---|
Country/Territory | Spain |
City | Vigo |
Period | 10/11/10 → 12/11/10 |
Internet address |