LINGUIST List 21.221
Thu Jan 14 2010
Diss: Comp Ling: Junczys-Dowmunt: 'German Compound Nouns and their...'
Editor for this issue: Di Wdzenczny
<di linguistlist.org>
Directory
1. Marcin Junczys-Dowmunt, German Compound Nouns and their Polish Equivalents: Automatic extraction, analysis and verification based on parallel corpora
Message 1: German Compound Nouns and their Polish Equivalents: Automatic extraction, analysis and verification based on parallel corpora
Date: 11-Jan-2010
From: Marcin Junczys-Dowmunt <junczys amu.edu.pl>
Subject: German Compound Nouns and their Polish Equivalents: Automatic extraction, analysis and verification based on parallel corpora
Institution: Adam Mickiewicz University
Program: Linguistics Program
Dissertation Status: Completed
Degree Date: 2009
Author: Marcin Junczys-Dowmunt
Dissertation Title: German Compound Nouns and their Polish Equivalents: Automatic extraction, analysis and verification based on parallel corpora
Dissertation URL: http://www.staff.amu.edu.pl/~junczys/index.php?title=Publications
Linguistic Field(s):
Computational Linguistics
Subject Language(s): German, Standard (deu)
Polish (pol)
Dissertation Director:
Krzysztof Jassem
Jerzy Pogonowski
Dissertation Abstract:
We apply methods first used in statistical machine translation to the automatic extraction and analysis of German compound nouns and their Polish equivalents. A large German-Polish parallel corpus serves as the main source of data. In the course of this work, several new applications are developed and described. With the help of these applications, we create a set of more than 140,000 unique German compound nouns, for which we identify more than 200,000 unique Polish counterparts in the corpus. From this data we extract several subsets of equivalence pairs that have been filtered either automatically or semi-automatically. Additionally, one manually verified subset of equivalence pairs is created. These data sets serve as reference material for verifying several claims from other works in contrastive linguistics that based their results on much smaller amounts of data. Beyond that, we supply information about the quantitative distribution of German compound nouns and their Polish equivalents, which has not been provided in any earlier work.