Dissertation Information

Title: Corpus-based Parse Pruning - Applying Empirical Data to Symbolic Knowledge
Author: Sonja Müller
Institution: Saarland University, Department of Computational Linguistics and Phonetics
Completed in: 2000
Linguistic Subfield(s): Syntax
Director(s): Hans Uszkoreit
Manfred Pinkal

Abstract: In parsing natural language, the number of syntactically ambiguous situations inevitably grows with the coverage of the grammar. Most broad-coverage applications therefore use some supplementary mechanism to estimate the relative probability of competing (partial) analyses.

In this thesis, I propose corpus-based parse pruning: a database of probabilistically weighted, multi-level constituent structures is generated from a stratificational German corpus and used as a backbone for a broad-coverage dependency grammar (Slot Grammar).
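
The general idea of pruning with corpus-derived weights can be illustrated with a minimal sketch. It is not the thesis's implementation: the toy corpus, the category labels, the relative-frequency estimate, the threshold, and the function names build_structure_db and prune are all assumptions made for illustration, and the sketch uses flat, single-level structures rather than the multi-level ones the abstract describes.

```python
from collections import Counter

# Hypothetical toy "treebank": (parent label, tuple of child labels).
# The labels are illustrative only and not taken from the thesis.
TOY_CORPUS = [
    ("NP", ("ART", "NN")),
    ("NP", ("ART", "NN")),
    ("NP", ("ART", "ADJ", "NN")),
    ("NP", ("NN",)),
    ("PP", ("APPR", "NP")),
    ("PP", ("APPR", "NP")),
    ("PP", ("APPR", "NN")),
]


def build_structure_db(corpus):
    """Estimate P(children | parent) by relative frequency (an assumed weighting)."""
    pair_counts = Counter(corpus)
    parent_counts = Counter(parent for parent, _ in corpus)
    return {
        (parent, children): count / parent_counts[parent]
        for (parent, children), count in pair_counts.items()
    }


def prune(candidates, db, threshold):
    """Keep candidates whose estimated weight reaches the threshold.

    Structures never seen in the corpus get weight 0.0 and are dropped.
    """
    return [c for c in candidates if db.get(c, 0.0) >= threshold]


db = build_structure_db(TOY_CORPUS)

# Candidate partial analyses that a parser might propose (hypothetical).
candidates = [
    ("NP", ("ART", "NN")),   # seen 2 of 4 times for NP
    ("NP", ("NN",)),         # seen 1 of 4 times for NP
    ("NP", ("ADJ", "ART")),  # never seen
    ("PP", ("APPR", "NN")),  # seen 1 of 3 times for PP
]

kept = prune(candidates, db, threshold=0.3)
for parent, children in kept:
    print(parent, "->", " ".join(children))
```

In this sketch the threshold trades coverage against ambiguity: a lower value keeps rarer structures at the cost of leaving more competing analyses for the grammar to resolve.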

This pruning approach yields high-quality parsing results. An extensive evaluation of the syntactic variety in the training corpus and a series of experiments on the quantity and quality of the constituent structures used for pruning give further insight into the criteria that help a language model become representative and dynamically adaptable: corpus size, a multi-purpose annotation scheme, and a wide variety of authors.