LINGUIST List 17.3141
Thu Oct 26 2006
Diss: Computational Ling/Ling Theories: Buch-Kromann: 'Discontinuou...'
Editor for this issue: Hannah Morales
<hannahlinguistlist.org>
Directory
1. Matthias Buch-Kromann, Discontinuous Grammar: A dependency-based model of human parsing and language learning
Message 1: Discontinuous Grammar: A dependency-based model of human parsing and language learning
Date: 26-Oct-2006
From: Matthias Buch-Kromann <mtk.idcbs.dk>
Subject: Discontinuous Grammar: A dependency-based model of human parsing and language learning
Institution: Copenhagen Business School
Program: Department of Computational Linguistics
Dissertation Status: Completed
Degree Date: 2006
Author: Matthias Buch-Kromann
Dissertation Title: Discontinuous Grammar: A dependency-based model of human parsing and language learning
Dissertation URL: http://www.id.cbs.dk/~mtk/thesis
Linguistic Field(s):
Computational Linguistics
Linguistic Theories
Dissertation Director:
Sabine Kirchmeier-Andersen
Carl Vikner
Dissertation Abstract:
In the dissertation, Matthias Buch-Kromann presents his dependency-based grammar formalism, Discontinuous Grammar. The dissertation argues that grammars should do more than distinguish between grammatical and ungrammatical linguistic analyses. They should also assign a number (a cost) to the individual words in both grammatical and ungrammatical analyses, so that the cost measures the syntactic, semantic, and pragmatic well-formedness of each word. In that way, the grammar can be used to localize linguistic errors in the analysis precisely. In this setting, parsing, generation and machine translation can be viewed as optimization problems where the goal is to find the cheapest analysis that satisfies a given side condition: for example, that the analysis corresponds to a given text (parsing), semantic representation (generation), or source text (machine translation).
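To make the "parsing as optimization" idea concrete, here is a minimal sketch under stated assumptions. It is not Discontinuous Grammar itself. The cost table `ATTACH_COST`, the tag set, and the `PENALTY` value are all hypothetical. Each word receives its own cost from its attachment, ungrammatical attachments are allowed but expensive, and "parsing" means exhaustively searching for the cheapest analysis that satisfies the side condition "a dependency tree over exactly these words". The brute-force search is exponential and only suitable for toy input. It is not the dissertation's local optimality parsing.

```python
from itertools import product

# Hypothetical toy grammar: cost of attaching a dependent (by word class)
# to a head (by word class); "ROOT" is the artificial root. Unlisted pairs
# get a high penalty, so ungrammatical analyses stay possible but costly.
ATTACH_COST = {
    ("V", "ROOT"): 0.0,
    ("N", "V"): 1.0,
    ("D", "N"): 0.5,
    ("A", "N"): 0.7,
}
PENALTY = 10.0

def is_tree(heads):
    """heads[i] is the 1-based head of word i+1, or 0 for the root.
    Require exactly one root word and no cycles (self-loops included)."""
    if list(heads).count(0) != 1:
        return False
    for start in range(1, len(heads) + 1):
        seen = set()
        node = start
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node - 1]
    return True

def word_costs(tags, heads):
    """Cost assigned to each individual word in one analysis."""
    costs = []
    for tag, head in zip(tags, heads):
        head_tag = "ROOT" if head == 0 else tags[head - 1]
        costs.append(ATTACH_COST.get((tag, head_tag), PENALTY))
    return costs

def parse(tags):
    """Return (total_cost, heads) for the cheapest well-formed tree."""
    n = len(tags)
    best = None
    for heads in product(range(n + 1), repeat=n):
        if not is_tree(heads):
            continue
        total = sum(word_costs(tags, heads))
        if best is None or total < best[0]:
            best = (total, heads)
    return best
```

For `["D", "N", "V"]` (say, "the dog barks") the cheapest analysis attaches the determiner to the noun and the noun to the verb. For an ill-formed input such as `["D", "V"]`, the best analysis still exists, and `word_costs` shows that the whole penalty sits on the determiner, which is how per-word costs can localize the error.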
The dissertation demonstrates how the proposed formalism deals with a wide range of linguistic phenomena, including the complement-adjunct distinction; discontinuous word orders and island constraints; control constructions, relatives, and parasitic gaps; elliptic coordinations; anaphora and discourse structure; punctuation; and inflectional and derivational morphology. The dissertation also describes how these analyses have formed the theoretical basis for the construction of the Danish Dependency Treebank, a general-purpose corpus of 100,000 words of Danish annotated with complete dependency analyses.
The dissertation also proposes two methods, HPM and XHPM, for the statistical estimation of hierarchically classifiable data, such as words in dependency relations, which can be classified according to word class and ontological class. The dissertation moreover proposes a statistical language model based on the proposed grammar formalism and estimation method. Finally, the dissertation proposes a parsing algorithm, local optimality parsing, which can be used in combination with a manual or statistically induced grammar to segment and parse an entire discourse. The dissertation argues that the parsing algorithm has a number of theoretical advantages over other parsing algorithms, such as its speed (it has almost-linear time complexity) and its potential as a plausible model of human parsing.
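The abstract does not define HPM or XHPM, so the sketch below is explicitly *not* either method. It only illustrates the general problem they address: estimating a distribution for a sparsely observed class (for example a particular word) by borrowing evidence from its ancestors in a classification hierarchy (for example an ontological class and then a word class). The technique shown is generic additive smoothing toward the parent's estimate. The class names, outcomes, and `alpha` parameter are hypothetical.

```python
from collections import Counter

class HierarchicalEstimator:
    """Generic hierarchical smoothing (not HPM/XHPM): each class shrinks
    its empirical distribution toward its parent's smoothed estimate,
    and the root's parent is a uniform distribution over the outcomes."""

    def __init__(self, parent, vocab_size, alpha=1.0):
        self.parent = parent          # class -> parent class (top class maps to None)
        self.vocab_size = vocab_size  # number of possible outcomes (uniform base)
        self.alpha = alpha            # how strongly a class is pulled toward its parent
        self.counts = {}              # class -> Counter of observed outcomes

    def observe(self, cls, outcome):
        # A training event counts for its own class and every ancestor.
        node = cls
        while node is not None:
            self.counts.setdefault(node, Counter())[outcome] += 1
            node = self.parent[node]

    def prob(self, cls, outcome):
        if cls is None:
            return 1.0 / self.vocab_size
        base = self.prob(self.parent[cls], outcome)
        c = self.counts.get(cls, Counter())
        total = sum(c.values())
        return (c[outcome] + self.alpha * base) / (total + self.alpha)
```

For instance, with a hypothetical hierarchy `dog -> animal -> noun` and `cat -> animal`, observing only "cat sleeps" still gives "dog sleeps" a substantial probability through the shared class `animal`, while each class's distribution continues to sum to one.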