|Title:||Corpus-based Parse Pruning - Applying Empirical Data to Symbolic Knowledge|
|Author:||Sonja Müller|
|Institution:||Saarland University, Department of Computational Linguistics and Phonetics|
|Abstract:||In parsing natural language, the number of syntactically ambiguous situations inevitably grows with the coverage of the grammar. Therefore, most broad-coverage applications use some supplementary mechanism to determine the respective probabilities of several ambiguous (partial) analyses.
In this thesis, I propose corpus-based parse pruning: A database of probabilistically weighted, multi-level constituent structures is generated from a stratificational German corpus and utilized as a backbone for a broad-coverage dependency grammar (Slot Grammar).
This pruning approach yields high-quality parsing results. An extensive evaluation of the syntactic variety in the training corpus and a series of experiments on the quantity and quality of the constituent structures used for pruning give further insight into the criteria that help a language model become representative and dynamically adaptable: corpus size, a multi-purpose annotation scheme, and a wide variety of authors.