LINGUIST List 17.3141

Thu Oct 26 2006

Diss: Computational Ling/Ling Theories: Buch-Kromann: 'Discontinuou...'

Editor for this issue: Hannah Morales <hannahlinguistlist.org>

Directory
        1.    Matthias Buch-Kromann, Discontinuous Grammar: A dependency-based model of human parsing and language learning


Message 1: Discontinuous Grammar: A dependency-based model of human parsing and language learning
Date: 26-Oct-2006
From: Matthias Buch-Kromann <mtk.idcbs.dk>
Subject: Discontinuous Grammar: A dependency-based model of human parsing and language learning


Institution: Copenhagen Business School
Program: Department of Computational Linguistics
Dissertation Status: Completed
Degree Date: 2006

Author: Matthias Buch-Kromann

Dissertation Title: Discontinuous Grammar: A dependency-based model of human parsing and language learning

Dissertation URL: http://www.id.cbs.dk/~mtk/thesis

Linguistic Field(s): Computational Linguistics
                            Linguistic Theories

Dissertation Director(s):
Sabine Kirchmeier-Andersen
Carl Vikner

Dissertation Abstract:

In this dissertation, Matthias Buch-Kromann presents his dependency-based
grammar formalism, Discontinuous Grammar. The dissertation argues that
grammars should not merely distinguish between grammatical and
ungrammatical linguistic analyses; they should also assign a number (a
cost) to each word in both grammatical and ungrammatical analyses. The cost
measures the syntactic, semantic, and pragmatic well-formedness of the
individual word, so the grammar can be used to localize linguistic errors
in an analysis precisely. In this setting, parsing, generation, and machine
translation can be viewed as optimization problems in which the goal is to
find the cheapest analysis that satisfies a given side condition: for
example, that the analysis corresponds to a given text (parsing), semantic
representation (generation), or source text (machine translation).
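
As a rough illustration of the cost-based view described above, the following
Python sketch treats parsing as picking the cheapest candidate analysis whose
words match the input text. It is not taken from the dissertation: the
Analysis class, the parse function, the example sentence, and all cost values
are hypothetical and hand-assigned, and a real grammar would compute the costs
and generate the candidates itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Analysis:
    """A toy dependency analysis with one cost per word (hypothetical)."""
    words: List[str]
    heads: List[int]      # index of each word's head; -1 marks the root
    costs: List[float]    # per-word cost; 0.0 means fully well-formed

    def total_cost(self) -> float:
        # The cost of an analysis is the sum of its word costs.
        return sum(self.costs)

    def worst_word(self) -> str:
        # The highest-cost word is where the analysis localizes the error.
        i = max(range(len(self.words)), key=lambda k: self.costs[k])
        return self.words[i]


def parse(text: str, candidates: List[Analysis]) -> Analysis:
    """Return the cheapest candidate satisfying the side condition
    that its words correspond to the given text."""
    tokens = text.split()
    matching = [a for a in candidates if a.words == tokens]
    return min(matching, key=Analysis.total_cost)


# Hand-assigned costs, for illustration only.
a1 = Analysis(["the", "dogs", "barks"], [1, 2, -1], [0.0, 0.0, 2.0])
a2 = Analysis(["the", "dogs", "barks"], [2, 2, -1], [3.0, 0.0, 2.0])
a3 = Analysis(["the", "dog", "barks"], [1, 2, -1], [0.0, 0.0, 0.0])

best = parse("the dogs barks", [a1, a2, a3])
print(best.total_cost(), best.worst_word())
```

Here a3 is cheaper overall but fails the side condition, so the sketch selects
a1 and points at "barks" as the word carrying the agreement error. Generation
and machine translation would, under the same assumptions, differ only in the
side condition used to filter the candidates.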

The dissertation demonstrates how the proposed formalism deals with a wide
range of linguistic phenomena, including the complement and adjunct
distinction; discontinuous word orders and island constraints; control
constructions, relatives, and parasitic gaps; elliptic coordinations;
anaphora and discourse structure; punctuation; and inflectional and
derivational morphology. The dissertation also describes how these analyses
have formed the theoretical basis for the construction of the Danish
Dependency Treebank, a general-purpose corpus of Danish comprising 100,000
words annotated with complete dependency analyses.

The dissertation also proposes two methods, HPM and XHPM, for the
statistical estimation of hierarchically classifiable data such as words in
dependency relations, which can be classified according to word class and
ontological class. The dissertation moreover proposes a statistical
language model based on the proposed grammar formalism and estimation
method. Finally, the dissertation proposes a parsing algorithm, local
optimality parsing, which can be used in combination with a manually
written or statistically induced grammar to segment and parse an entire
discourse. The
dissertation argues that the parsing algorithm has a number of theoretical
advantages compared with other parsing algorithms, such as its speed (it
has an almost-linear time complexity) and its potential as a plausible
model of human parsing.
