
On Line Vocal Tract Length Estimation for Speaker Normalization in Speech Recognition

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

This paper presents results for an Automatic Speech Recognition (ASR) framework that uses robust vocal tract length estimation to improve recognition performance across speakers of different ages and genders. Well-known techniques for Vocal Tract Length Normalization (VTLN) usually require a preliminary stage to estimate the best warping factor for a given speaker, either through Maximum Likelihood (ML) estimation or by computing acoustic features of the speaker, such as formant frequencies, over several utterances. This paper shows how robust framewise estimates of the vocal tract length can be used to obtain a speaker-dependent warping factor that yields major improvements over all conditions of the TIDigits database. Finally, an updating function is used to compute an on-line estimate of the vocal tract length and the warping factor, enabling real-time VTLN in speech recognition with results similar to those of the off-line strategies.
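The abstract does not specify the updating function, so the following is only a minimal sketch of the general idea, assuming a cumulative running mean over framewise vocal tract length estimates and a linear warping factor defined as the ratio of the speaker's estimated length to a reference length. The class name `OnlineWarpEstimator`, the helper `warp_frequency`, the 17 cm default reference, and the direction of the ratio are all illustrative assumptions, not the paper's method.

```python
from dataclasses import dataclass


@dataclass
class OnlineWarpEstimator:
    """Hypothetical on-line estimator; NOT the paper's actual update rule."""

    reference_vtl_cm: float = 17.0  # assumed reference vocal tract length
    count: int = 0                  # number of frames seen so far
    mean_vtl_cm: float = 0.0        # running mean of framewise estimates

    def update(self, frame_vtl_cm: float) -> float:
        """Fold one framewise estimate into the running mean.

        Returns the warping factor after the update.
        """
        self.count += 1
        # Incremental mean: no need to store past frames.
        self.mean_vtl_cm += (frame_vtl_cm - self.mean_vtl_cm) / self.count
        return self.warping_factor()

    def warping_factor(self) -> float:
        """Assumed convention: alpha = speaker VTL / reference VTL."""
        if self.count == 0:
            return 1.0  # no evidence yet, so apply no warping
        return self.mean_vtl_cm / self.reference_vtl_cm


def warp_frequency(freq_hz: float, alpha: float) -> float:
    """Assumed linear warp: a shorter tract (alpha < 1) is mapped downwards."""
    return alpha * freq_hz


if __name__ == "__main__":
    estimator = OnlineWarpEstimator(reference_vtl_cm=16.0)
    for frame_vtl in (16.0, 14.0, 12.0):  # made-up framewise estimates
        alpha = estimator.update(frame_vtl)
        print(f"alpha={alpha:.4f}  1000 Hz -> {warp_frequency(1000.0, alpha):.1f} Hz")
```

Because the running mean needs only the previous mean and a frame counter, the warping factor can be refreshed after every frame, which is what makes a real-time variant possible in principle. A practical system would likely also smooth or bound the factor, which this sketch leaves out.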
Original language: English
Title of host publication: FALA 2010 Proceedings
Subtitle of host publication: "VI Jornadas en Tecnología del Habla" and II Iberian SLTech Workshop
Pages: 119-122
Number of pages: 4
State: Published - 2010
Externally published: Yes
Event: Fala 2010, VI Jornadas en Tecnología del Habla and II Iberian SLTech Workshop - Centro Social Caixanova, Vigo, Spain
Duration: 10 Nov 2010 to 12 Nov 2010
http://lorien.die.upm.es/~lapiz/rtth/JORNADAS/VI/index.html

Conference

Conference: Fala 2010, VI Jornadas en Tecnología del Habla and II Iberian SLTech Workshop
Country/Territory: Spain
City: Vigo
Period: 10/11/10 to 12/11/10
