LINGUIST List 17.3141

Thu Oct 26 2006

Diss: Computational Ling/Ling Theories: Buch-Kromann: 'Discontinuou...'

Editor for this issue: Hannah Morales <hannah@linguistlist.org>


Directory         1.    Matthias Buch-Kromann, Discontinuous Grammar: A dependency-based model of human parsing and language learning


Message 1: Discontinuous Grammar: A dependency-based model of human parsing and language learning
Date: 26-Oct-2006
From: Matthias Buch-Kromann <mtk.id@cbs.dk>
Subject: Discontinuous Grammar: A dependency-based model of human parsing and language learning


Institution: Copenhagen Business School

Program: Department of Computational Linguistics

Dissertation Status: Completed

Degree Date: 2006

Author: Matthias Buch-Kromann

Dissertation Title: Discontinuous Grammar: A dependency-based model of human parsing and language learning

Dissertation URL: http://www.id.cbs.dk/~mtk/thesis

Linguistic Field(s): Computational Linguistics; Linguistic Theories

Dissertation Director(s): Sabine Kirchmeier-Andersen, Carl Vikner

Dissertation Abstract:

In the dissertation, Matthias Buch-Kromann presents his dependency-based grammar formalism, Discontinuous Grammar. The dissertation argues that grammars should not only distinguish between grammatical and ungrammatical linguistic analyses, but should also assign a number (a cost) to each individual word in both grammatical and ungrammatical analyses, so that the cost measures the syntactic, semantic, and pragmatic well-formedness of that word; in this way, the grammar can be used to localize linguistic errors in an analysis precisely. In this setting, parsing, generation, and machine translation can be viewed as optimization problems in which the goal is to find the cheapest analysis that satisfies a given side condition, e.g., that the analysis corresponds to a given text (parsing), a given semantic representation (generation), or a given source text (machine translation).
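The idea of parsing as cost minimization can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not Buch-Kromann's formalism or parser: the cost table `LEXICAL_COSTS`, the `PENALTY` value, the relation labels, and all function names are hypothetical. Each word receives its own cost given its head and relation; an unlicensed attachment gets a large but finite penalty instead of being ruled out; and parsing is a brute-force search for the cheapest analysis that satisfies the side condition of covering the given words with exactly one root.

```python
from itertools import product
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical per-word costs: (dependent, head word or None, relation) -> cost.
# Lower is better. Anything not listed gets PENALTY, so an ungrammatical
# attachment still receives a (large) finite cost rather than being rejected.
LEXICAL_COSTS: Dict[Tuple[str, Optional[str], str], float] = {
    ("barks", None, "root"): 0.0,
    ("dog", "barks", "subj"): 0.5,
    ("the", "dog", "det"): 0.2,
}
PENALTY = 10.0
RELATIONS = ["root", "subj", "det"]

# An analysis gives, for each word, its (head index or None, relation).
Analysis = List[Tuple[Optional[int], str]]


def word_costs(words: List[str], analysis: Analysis) -> List[float]:
    """Cost of each individual word, so errors can be localized word by word."""
    costs = []
    for word, (head, rel) in zip(words, analysis):
        head_word = None if head is None else words[head]
        costs.append(LEXICAL_COSTS.get((word, head_word, rel), PENALTY))
    return costs


def candidate_analyses(words: List[str]) -> Iterator[Analysis]:
    """Enumerate all head/relation assignments with exactly one root.

    Brute force, exponential in sentence length; it does not check for
    cycles, which a real dependency parser would also have to exclude.
    """
    n = len(words)
    heads = [None] + list(range(n))
    per_word = [[(h, r) for h in heads if h != i for r in RELATIONS] for i in range(n)]
    for combo in product(*per_word):
        if sum(h is None for h, _ in combo) == 1:  # side condition: one root
            yield list(combo)


def parse(words: List[str]) -> Tuple[Analysis, float]:
    """Return the cheapest analysis and its total cost (sum of word costs)."""
    best = min(candidate_analyses(words), key=lambda a: sum(word_costs(words, a)))
    return best, sum(word_costs(words, best))


if __name__ == "__main__":
    words = ["the", "dog", "barks"]
    best, total = parse(words)
    print(best)  # [(1, 'det'), (2, 'subj'), (None, 'root')]
    # Inspect a bad analysis in which "dog" is a determiner of "barks".
    print(word_costs(words, [(1, "det"), (2, "det"), (None, "root")]))  # [0.2, 10.0, 0.0]
```

Because costs are kept per word, the second call pinpoints "dog" as the source of the error in the bad analysis. The exhaustive search is used here only for clarity; it is not the dissertation's local optimality parsing algorithm.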

The dissertation demonstrates how the proposed formalism deals with a wide range of linguistic phenomena, including the distinction between complements and adjuncts; discontinuous word orders and island constraints; control constructions, relatives, and parasitic gaps; elliptic coordinations; anaphora and discourse structure; punctuation; and inflectional and derivational morphology. The dissertation also describes how these analyses have formed the theoretical basis for the construction of the Danish Dependency Treebank, a general-purpose corpus of 100,000 words of Danish annotated with complete dependency analyses.

The dissertation also proposes two methods, HPM and XHPM, for the statistical estimation of hierarchically classifiable data such as words in dependency relations, which can be classified according to word class and ontological class. It moreover proposes a statistical language model based on the proposed grammar formalism and estimation methods. Finally, the dissertation proposes a parsing algorithm, local optimality parsing, which can be used in combination with a manually written or statistically induced grammar to segment and parse an entire discourse. The dissertation argues that this parsing algorithm has a number of theoretical advantages over other parsing algorithms, such as its speed (it has almost-linear time complexity) and its potential as a plausible model of human parsing.
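Estimating statistics for hierarchically classified data can be illustrated with generic hierarchical shrinkage. This is a minimal sketch under stated assumptions, not HPM or XHPM, whose details the abstract does not give: it assumes a single ontological hierarchy, and the class names, the `PARENT` table, the observations, and the smoothing weight `m` are all hypothetical. Counts observed for individual words are aggregated up to their ancestor classes, and each node's estimate is pulled toward its parent's estimate in proportion to how little data the node has, so that a rarely or never seen word inherits a sensible estimate from its class.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterator, List, Optional, Tuple

# Hypothetical ontological hierarchy: child -> parent (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "animal": "entity",
    "artifact": "entity",
    "dog": "animal",
    "cat": "animal",
    "puppy": "animal",  # never observed below
    "table": "artifact",
}

# Hypothetical observations: (word, was it seen as the subject of "barks"?).
OBSERVATIONS: List[Tuple[str, bool]] = [
    ("dog", True), ("dog", True), ("dog", True),
    ("cat", False), ("table", False),
]


def ancestors(node: Optional[str]) -> Iterator[str]:
    """Yield the node itself, then each ancestor up to the root."""
    while node is not None:
        yield node
        node = PARENT[node]


def aggregate(obs: List[Tuple[str, bool]]) -> Tuple[DefaultDict[str, int], DefaultDict[str, int]]:
    """Add each observation to the word and to every class above it."""
    hits: DefaultDict[str, int] = defaultdict(int)
    trials: DefaultDict[str, int] = defaultdict(int)
    for word, ok in obs:
        for node in ancestors(word):
            trials[node] += 1
            hits[node] += int(ok)
    return hits, trials


def estimate(node: str, hits, trials, m: float = 2.0, root_prior: float = 0.5) -> float:
    """Shrink the node's relative frequency toward its parent's estimate.

    With few trials the parent's estimate dominates; with many trials the
    node's own counts dominate. The root is shrunk toward root_prior.
    """
    parent = PARENT[node]
    prior = root_prior if parent is None else estimate(parent, hits, trials, m, root_prior)
    return (hits[node] + m * prior) / (trials[node] + m)


if __name__ == "__main__":
    hits, trials = aggregate(OBSERVATIONS)
    for word in ["dog", "puppy", "cat", "table"]:
        print(word, round(estimate(word, hits, trials), 3))
```

With these sample observations, the unseen word `puppy` receives exactly the `animal` estimate (29/42, about 0.69), whereas `table`, which sits under `artifact`, receives about 0.25.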