LINGUIST List 21.221

Thu Jan 14 2010

Diss: Comp Ling: Junczys-Dowmunt: 'German Compound Nouns and their...'

Editor for this issue: Di Wdzenczny <dilinguistlist.org>


Directory
        1.    Marcin Junczys-Dowmunt, German Compound Nouns and their Polish Equivalents: Automatic extraction, analysis and verification based on parallel corpora

Message 1: German Compound Nouns and their Polish Equivalents: Automatic extraction, analysis and verification based on parallel corpora
Date: 11-Jan-2010
From: Marcin Junczys-Dowmunt <junczysamu.edu.pl>
Subject: German Compound Nouns and their Polish Equivalents: Automatic extraction, analysis and verification based on parallel corpora

Institution: Adam Mickiewicz University
Program: Linguistics Program
Dissertation Status: Completed
Degree Date: 2009

Author: Marcin Junczys-Dowmunt

Dissertation Title: German Compound Nouns and their Polish Equivalents: Automatic extraction, analysis and verification based on parallel corpora

Dissertation URL: http://www.staff.amu.edu.pl/~junczys/index.php?title=Publications

Linguistic Field(s): Computational Linguistics

Subject Language(s): German, Standard (deu)
                            Polish (pol)

Dissertation Director:
Krzysztof Jassem
Jerzy Pogonowski

Dissertation Abstract:

We apply methods first used for statistical machine translation to the
automatic extraction and analysis of German compound nouns and their Polish
equivalents.
A large German-Polish parallel corpus is used as the main source of data.
In the course of this work, several new applications are developed and
described. With the help of these applications, a set of more than 140,000
unique German compound nouns is created, for which we are able to identify
more than 200,000 unique Polish counterparts in the corpus.
From this data we extract several subsets of equivalence pairs that have
been filtered either automatically or semi-automatically. Additionally, one
manually verified subset of equivalence pairs is created. These data sets
serve as reference material for the verification of several claims from
other works in contrastive linguistics that based their results on much
smaller amounts of data. Apart from that, we supply information about the
quantitative distribution of German compound nouns and their Polish
equivalents, which has not been provided in any earlier work.


