On Line Vocal Tract Length Estimation for Speaker Normalization in Speech Recognition

William Ricardo Rodríguez Dueñas, Oscar Saz, Eduardo Lleida, Antonio Miguel

Research output: Chapter in book/report/conference proceedings; Conference contribution; peer-reviewed

Abstract

This paper presents results for an Automatic Speech Recognition (ASR) framework that uses robust vocal tract length estimation to improve recognition performance across speakers of different ages and genders. Well-known techniques for Vocal Tract Length Normalization (VTLN) usually require a prior stage that estimates the best warping factor for a given speaker, either through Maximum Likelihood (ML) estimation or by computing acoustic features of the speaker, such as formant frequencies, over several utterances. This paper shows how robust frame-wise estimates of the vocal tract length can be used to obtain a speaker-dependent warping factor that yields major improvements over all conditions of the TIDigits database. Finally, an updating function is used to compute an on-line estimate of the vocal tract length and the warping factor, enabling real-time VTLN in speech recognition with results similar to those of the off-line strategies.
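As a rough illustration of the pipeline the abstract outlines (a per-frame vocal tract length estimate, a speaker-dependent warping factor, and an on-line update), the sketch below may help. It is not the authors' method: the abstract does not give their estimator or updating function. The sketch assumes a uniform lossless tube model, in which the k-th resonance is F_k = (2k - 1)c / (4L), a running mean as the update rule, and a warping factor defined as the ratio of the speaker's estimated length to a reference length. The names `vtl_from_formants` and `OnlineWarpEstimator`, the reference length, and the speed-of-sound constant are all hypothetical choices made for this example.

```python
from dataclasses import dataclass

# Assumed approximate speed of sound in the vocal tract, in cm/s.
SPEED_OF_SOUND_CM_S = 35000.0


def vtl_from_formants(formants_hz):
    """Estimate vocal tract length (cm) for one frame from its formants.

    Assumes a uniform lossless tube closed at one end, where the k-th
    resonance is F_k = (2k - 1) * c / (4 * L). Each formant gives one
    estimate of L; the per-formant estimates are averaged.
    """
    estimates = [
        (2 * k - 1) * SPEED_OF_SOUND_CM_S / (4.0 * f)
        for k, f in enumerate(formants_hz, start=1)
    ]
    return sum(estimates) / len(estimates)


@dataclass
class OnlineWarpEstimator:
    """Hypothetical on-line estimator: running mean of frame-wise lengths."""

    reference_vtl_cm: float = 17.5  # assumed reference length, not from the paper
    mean_vtl_cm: float = 0.0
    frames_seen: int = 0

    def update(self, formants_hz):
        """Fold one frame into the running mean and return the warping factor."""
        frame_vtl = vtl_from_formants(formants_hz)
        self.frames_seen += 1
        # Incremental mean: no need to store past frames.
        self.mean_vtl_cm += (frame_vtl - self.mean_vtl_cm) / self.frames_seen
        return self.warping_factor()

    def warping_factor(self):
        """Ratio of the speaker's estimated length to the reference length."""
        return self.mean_vtl_cm / self.reference_vtl_cm
```

Calling `update` once per voiced frame gives a warping factor that settles as more speech is observed. Whether this ratio or its inverse is applied to the frequency axis depends on the warping convention of the recognizer's front end.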
Original language: English
Title of host publication: FALA 2010 Proceedings
Subtitle of host publication: "VI Jornadas en Tecnología del Habla" and II Iberian SLTech Workshop
Pages: 119-122
Number of pages: 4
Status: Published - 2010
Externally published
Event: Fala 2010, VI Jornadas en Tecnología del Habla and II Iberian SLTech Workshop - Centro Social Caixanova, Vigo, Spain
Duration: 10 Nov 2010 to 12 Nov 2010
http://lorien.die.upm.es/~lapiz/rtth/JORNADAS/VI/index.html

Conference

Conference: Fala 2010, VI Jornadas en Tecnología del Habla and II Iberian SLTech Workshop
Country/Territory: Spain
City: Vigo
Period: 10 Nov 2010 to 12 Nov 2010
