Abstract
This paper presents results for an Automatic Speech Recognition (ASR) framework that uses robust vocal tract length estimation to improve recognition performance for speakers of different ages and genders. Well-known techniques for Vocal Tract Length Normalization (VTLN) usually require a prior stage that estimates the best warping factor for a given speaker, either by Maximum Likelihood (ML) estimation or by computing acoustic features of the speaker, such as formant frequencies, over several utterances. This paper shows how robust frame-wise estimates of the vocal tract length can be used to obtain a speaker-dependent warping factor that achieves major improvements over all conditions of the TIDigits database. Finally, an updating function is used to compute an on-line estimate of the vocal tract length and the warping factor, enabling real-time VTLN in speech recognition with results similar to those of the off-line strategies.
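The abstract does not specify the updating function, so the following is only a minimal sketch of the general idea, under stated assumptions: frame-wise vocal tract length estimates (in cm) arrive one at a time, a running mean stands in for the paper's updating function, and the warping factor is taken as the ratio of a reference length to the current estimate. The class name `OnlineWarpEstimator`, the 17 cm default reference, and the ratio convention are all hypothetical choices for illustration, not the authors' method.

```python
from dataclasses import dataclass


@dataclass
class OnlineWarpEstimator:
    """Illustrative on-line VTL tracker (hypothetical, not the paper's method)."""

    reference_vtl_cm: float = 17.0  # assumed reference vocal tract length
    mean_vtl_cm: float = 0.0        # running mean of frame-wise estimates
    n_frames: int = 0               # number of frames seen so far

    def update(self, frame_vtl_cm: float) -> float:
        """Fold one frame-wise VTL estimate into the running mean.

        Returns the warping factor after the update.
        """
        if frame_vtl_cm <= 0:
            raise ValueError("vocal tract length must be positive")
        self.n_frames += 1
        # Incremental mean: no need to store past frames.
        self.mean_vtl_cm += (frame_vtl_cm - self.mean_vtl_cm) / self.n_frames
        return self.warping_factor()

    def warping_factor(self) -> float:
        """Assumed convention: alpha = reference VTL / estimated VTL."""
        if self.n_frames == 0:
            return 1.0  # no evidence yet, so apply no warping
        return self.reference_vtl_cm / self.mean_vtl_cm


if __name__ == "__main__":
    estimator = OnlineWarpEstimator(reference_vtl_cm=16.0)
    for vtl in [15.0, 17.0, 16.0]:
        alpha = estimator.update(vtl)
        print(f"frame VTL {vtl:.1f} cm -> warping factor {alpha:.3f}")
```

A running mean weights all frames equally; a real-time system might instead favor recent frames or discard unreliable ones, which is where a more robust updating function would differ from this sketch.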
| Original language | English |
|---|---|
| Title of host publication | FALA 2010 Proceedings |
| Subtitle of host publication | "VI Jornadas en Tecnología del Habla" and II Iberian SLTech Workshop |
| Pages | 119-122 |
| Number of pages | 4 |
| State | Published - 2010 |
| Externally published | Yes |
| Event | Fala 2010, VI Jornadas en Tecnología del Habla and II Iberian SLTech Workshop, Centro Social Caixanova, Vigo, Spain. Duration: 10 Nov 2010 → 12 Nov 2010. http://lorien.die.upm.es/~lapiz/rtth/JORNADAS/VI/index.html |
Conference
| Conference | Fala 2010, VI Jornadas en Tecnología del Habla and II Iberian SLTech Workshop |
|---|---|
| Country/Territory | Spain |
| City | Vigo |
| Period | 10 Nov 2010 → 12 Nov 2010 |
Paper title: On Line Vocal Tract Length Estimation for Speaker Normalization in Speech Recognition