Performance of a SCFG-based language model with training data sets of increasing size

Joan Andreu Sánchez, José Miguel Benedí, Diego Linares

Research output: Contribution to journal › Conference article › peer-review


Abstract

In this paper, a hybrid language model that combines a word-based n-gram and a category-based Stochastic Context-Free Grammar (SCFG) is evaluated on training data sets of increasing size. Different estimation algorithms for learning SCFGs in General Format and in Chomsky Normal Form are considered. Experiments on the UPenn Treebank corpus are reported in terms of test-set perplexity and word error rate in a speech recognition task.
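A common way to build such a hybrid model is to linearly interpolate the per-word probabilities of the two components; the sketch below illustrates this idea and the test-set perplexity metric mentioned in the abstract. The interpolation weight and all probability values are illustrative assumptions, not figures from the paper.

```python
import math

def hybrid_prob(p_ngram: float, p_scfg: float, a: float = 0.5) -> float:
    """Linear interpolation of two word-probability estimates:
    P(w|h) = a * P_ngram(w|h) + (1 - a) * P_scfg(w|h)."""
    return a * p_ngram + (1.0 - a) * p_scfg

def perplexity(word_probs: list) -> float:
    """Test-set perplexity: inverse geometric mean of word probabilities."""
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / len(word_probs))

# Toy example: per-word probabilities from each model over a 3-word test set
# (hypothetical numbers, for illustration only).
ngram_probs = [0.10, 0.05, 0.20]
scfg_probs = [0.08, 0.09, 0.15]
combined = [hybrid_prob(pn, ps, a=0.6) for pn, ps in zip(ngram_probs, scfg_probs)]
print(perplexity(combined))
```

A lower perplexity on held-out text indicates a better-fitting language model; the paper evaluates both this metric and word error rate as the training set grows.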

Original language: English
Pages (from-to): 586-594
Number of pages: 9
Journal: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3523
Issue number: II
DOIs
State: Published - 2005
Event: Second Iberian Conference on Pattern Recognition and Image Analysis, IbPRIA 2005 - Estoril, Portugal
Duration: 07 Jun 2005 - 09 Jun 2005

