LINGUIST List 12.526

Sun Feb 25 2001

Sum: Corpora English and German

Editor for this issue: Lydia Grebenyova <lydia@linguistlist.org>


Directory

  1. Frank Oswalt, Corpora English and German

Message 1: Corpora English and German

Date: Wed, 21 Feb 2001 05:12:33 -0600
From: Frank Oswalt <f_oswalt@hotmail.com>
Subject: Corpora English and German

For Query: Linguist 11.1877

Howdy y'all,

a long while back I asked for information on German and English corpora 
which are tagged for grammatical functions, as well as for accessible 
parallel English-German corpora. Here is a summary of the replies I got.


ENGLISH GRAMMATICALLY TAGGED CORPORA

Joybrato Mukherjee (j.mukherjee@uni-bonn.de) drew my attention to the 
International Corpus of English, which can be ordered at the following 
website (which also allows you to download a very nice demo version):

 http://www.ucl.ac.uk/english-usage/ice/


GERMAN GRAMMATICALLY TAGGED CORPORA

George Smith (george@bloomfield.phil1.uni-potsdam.de) drew my attention to 
the NEGRA and TIGER projects, which can be reached via the following 
websites:

 http://www.coli.uni-sb.de/sfb378/negra-corpus/
 http://www.coli.uni-sb.de/cl/projects/tiger/


PARALLEL CORPORA GERMAN-ENGLISH

Anatol Stefanowitsch (anatol@rice.edu) drew my attention to a small 
web-accessible parallel corpus at the University of Chemnitz:

 http://www.tu-chemnitz.de/phil/InternetGrammar/

Some people have their own collections of parallel texts, which they may or 
may not be willing to share with others (there may be copyright issues 
here). The two who agreed to be mentioned here are:
- Raphael Salkie (R.M.Salkie@bton.ac.uk), who has a collection of parallel 
texts from websites, literature, manuals, EU documents, political writing 
and speeches, amounting to about 800,000 words in each language.
- Anatol Stefanowitsch, who has a small collection of parallel texts from 
news magazines (about 15,000 words), and who is in the process of 
assembling a larger parallel corpus of narrative writing.


VARIOUS

Martin Frost (Martin@sinequa.com) drew my attention to the following 
websites:

 http://www.mpi.nl/world/tg/corpora/corpora.html
 http://www.ifi.unizh.ch/CL
 http://www.ims.uni-stuttgart.de/projekte/corplex/
 http://www.icp.grenet.fr/ELRA/fr/cata/tabtext.html

Thanks also to Klaus Abels, Petra Steiner, and Monika Budde for other 
helpful hints.

Take care now,
Frank Oswalt

