LINGUIST List 13.1535

Tue May 28 2002

Diss: Computational Ling: Eisner "Smoothing..."

Editor for this issue: Karolina Owczarzak <>


  1. jason, Computational Ling: Eisner "Smoothing a Probabilistic Lexicon..."

Message 1: Computational Ling: Eisner "Smoothing a Probabilistic Lexicon..."

Date: Sat, 25 May 2002 02:07:18 +0000
From: jason <>
Subject: Computational Ling: Eisner "Smoothing a Probabilistic Lexicon..."

New Dissertation Abstract

Institution: University of Pennsylvania
Program: Computer and Information Science
Dissertation Status: Completed
Degree Date: 2001

Author: Jason Michael Eisner 

Dissertation Title: 
Smoothing a Probabilistic Lexicon Via Syntactic Transformations

Dissertation URL:

Linguistic Field: Computational Linguistics

Dissertation Director 1: Mitchell P. Marcus

Dissertation Abstract: 

Probabilistic parsing requires a lexicon that specifies each word's
syntactic preferences in terms of probabilities. To estimate these
probabilities for words that were poorly observed during training,
this thesis assumes the existence of arbitrarily powerful
transformations (also known to linguists as lexical redundancy rules
or metarules) that can add, delete, retype or reorder the argument and
adjunct positions specified by a lexical entry. In a given language,
some transformations apply frequently and others rarely. We describe
how to estimate the rates of the transformations from a sample of
lexical entries. More deeply, we learn which properties of a
transformation increase or decrease its rate in the language. As a
result, we can smooth the probabilities of lexical entries. Given
enough direct evidence about a lexical entry's probability, our
Bayesian approach trusts the evidence; but when less evidence or no
evidence is available, it relies more on the transformations' rates
to guess how often the entry will be derived from related

Abstractly, the proposed ``transformation models'' are probability
distributions that arise from graph random walks with a log-linear
parameterization. A domain expert constructs the parameterized graph,
and a vertex is likely according to whether random walks tend to halt
at it. Transformation models are suited to any domai where ``related''
events (as defined by the graph) may have positively covarying
probabilities. Such models admit a natural prior that favors simple
regular relationships over stipulative exceptions. The model
parameters can be locally optimized by gradient-based methods or by
Expectation-Maximization. Exact algorithms (matrix inversion) and
approximate ones (relaxation) are provided, with
optimizations. Variations on the idea are also discussed. We
compare the new technique empirically to previous techniques from the
probabilistic parsing literature, using comparable features, and
obtain a 20% perplexity reduction (similar to doubling the amount of
training data). Some of this reduction is shown to stem from the
transformation model's ability to match observed probabilities, and
some from its ability to generalize. Model averaging yields a final
24% perplexity reduction.
Mail to author|Respond to list|Read more issues|LINGUIST home page|Top of issue