LINGUIST List 7.1205

Sat Aug 31 1996

FYI: Indo-European course register, Text corpus of Dutch

Editor for this issue: Ann Dizdar <dizdar@tam2000.tamu.edu>


Directory

  1. "Fco. Javier Martínez García", TITUS: Indo-European Course Register
  2. Rob van Strien, Text Corpus of Dutch

Message 1: TITUS: Indo-European Course Register

Date: Mon, 26 Aug 1996 15:04:53 +0200
From: "Fco. Javier Martínez García" <martinez@em.uni-frankfurt.de>
Subject: TITUS: Indo-European Course Register

The Indo-European Course Register is offered by the Indogermanische
Gesellschaft and TITUS.

The Indo-European Course Register lists the courses related to
Indo-European studies offered at German-speaking universities.

See the following URL:
http://titus.uni-frankfurt.de/curric/idg-ws96.html

Message 2: Text Corpus of Dutch

Date: Fri, 23 Aug 1996 16:00:11 -0000
From: Rob van Strien <ROB@rulxho.LeidenUniv.nl>
Subject: Text Corpus of Dutch

 INSTITUUT VOOR NEDERLANDSE LEXICOLOGIE


On-line access to the INL 38 Million Words Text Corpus of Dutch, for
non-commercial purposes.

The Institute for Dutch Lexicology (INL) offers on-line access, via the
Internet, to a Dutch text corpus of ca. 38 million words. In 1994 and
1995, a 5 Million Words Corpus with diversified composition and a 27
Million Words Newspaper Corpus were made accessible in a similar way.
Access is free for non-commercial purposes.

The 38 Million Words Corpus 1996 consists of three main components: a
component with varied composition (1970-1989), a newspaper component
(Meppeler Courant, 1992-1995) and a legal component (1814-1989).

The user can define subcorpora, either on the basis of the parameters
(1) corpus component, (2) topic, (3) publication medium/text type, and
(4) period, or on the basis of selections from text surveys presented
on the screen. The user can ask for the size of each defined subcorpus.
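
The parameter-based selection described above can be pictured with a
minimal Python sketch. It is not the INL retrieval system: the Text
record, its field names, the function names and the sample records are
all invented for illustration. Only the four selection parameters and
the idea of asking for a subcorpus size come from the announcement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Text:
    # Hypothetical metadata record; field names are assumptions.
    component: str  # corpus component, e.g. "varied", "newspaper", "legal"
    topic: str
    medium: str     # publication medium / text type
    year: int
    n_words: int


def select_subcorpus(texts, component=None, topic=None, medium=None, period=None):
    """Keep the texts that match every parameter that is given.

    period is an inclusive (first_year, last_year) tuple.
    """
    selected = []
    for t in texts:
        if component is not None and t.component != component:
            continue
        if topic is not None and t.topic != topic:
            continue
        if medium is not None and t.medium != medium:
            continue
        if period is not None and not (period[0] <= t.year <= period[1]):
            continue
        selected.append(t)
    return selected


def subcorpus_size(texts):
    """Size of a subcorpus in words."""
    return sum(t.n_words for t in texts)


# Invented sample records, for illustration only.
SAMPLE = [
    Text("varied", "politics", "book", 1985, 1200),
    Text("newspaper", "politics", "newspaper", 1993, 800),
    Text("newspaper", "sports", "newspaper", 1994, 500),
    Text("legal", "law", "statute", 1901, 3000),
]

news = select_subcorpus(SAMPLE, component="newspaper", period=(1992, 1995))
print(len(news), subcorpus_size(news))
```

Parameters left unset do not restrict the selection, so combining
several of them narrows the subcorpus step by step.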

The texts have been automatically annotated with lemma (head word) and
two types of part of speech (POS): a global one (13 POS categories)
and a fine-grained one (with subcategorization) conforming to the
MECOLB standard (EC project MLAP93-21 MECOLB; coordinator R. Neumann,
Institut fuer Deutsche Sprache, Mannheim). The MECOLB tagset for Dutch
was developed in cooperation with the TOSCA Research Group (University
of Nijmegen), under the responsibility of Prof. dr. J. Aarts.

Most of the data has not been corrected, either at the level of the
text or at the level of POS and head word.

The retrieval system allows you to search for single words or for word
patterns, including some predefined syntactic patterns that can be
changed by the user. There are two query languages, which differ in
formalism. Searches may address the levels of word form, two types of
part of speech, and head word, both separately and in combination by
use of Boolean operators and proximity searches. During the search,
data concerning frequency and distribution over the texts are provided
at several levels. The output is most often a list of items or a
series of concordances (words in context) with a variable,
user-defined textual context. Sorting facilities support the analysis
of the output data. Subject to some limitations due to copyright, the
output of your searches can be transferred to your own computer by
e-mail. Transferring complete texts or substantial parts of texts is
not permitted.
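
To make the idea of a combined search and a concordance with
user-defined context concrete, here is a small Python sketch. It does
not reproduce either of the INL query languages, which the announcement
does not describe. The Token record, the POS labels, the function names
and the sample sentence are assumptions made for illustration. The
constraints are combined with a Boolean AND only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    word: str   # word form as it appears in the text
    lemma: str  # head word
    pos: str    # global POS label (labels here are hypothetical)


def matches(token, word=None, lemma=None, pos=None):
    """True if the token satisfies every constraint that is given (AND)."""
    if word is not None and token.word != word:
        return False
    if lemma is not None and token.lemma != lemma:
        return False
    if pos is not None and token.pos != pos:
        return False
    return True


def concordance(tokens, context=3, **constraints):
    """Return (left context, hit, right context) for each matching token.

    context is the number of words shown on each side of the hit.
    """
    lines = []
    for i, tok in enumerate(tokens):
        if matches(tok, **constraints):
            left = " ".join(t.word for t in tokens[max(0, i - context):i])
            right = " ".join(t.word for t in tokens[i + 1:i + 1 + context])
            lines.append((left, tok.word, right))
    return lines


# Invented, hand-annotated sample sentence.
SENTENCE = [
    Token("de", "de", "ART"),
    Token("kat", "kat", "N"),
    Token("zit", "zitten", "V"),
    Token("op", "op", "PREP"),
    Token("de", "de", "ART"),
    Token("mat", "mat", "N"),
]

for left, hit, right in concordance(SENTENCE, context=2, lemma="zitten", pos="V"):
    print(f"{left:>12} [{hit}] {right}")
```

Searching on the head word rather than the word form finds "zit" via
its lemma "zitten", which is the practical benefit of the annotation
levels mentioned above.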

The providers of the texts have given permission for use of the texts
for non-commercial, research purposes only.

Please note that, for optimal use of the retrieval system, a VT 220
(or higher) terminal or an appropriate terminal emulator (e.g. Kermit)
is recommended.

For access to the corpora, an individual user agreement must be
signed. There is a separate user agreement for each corpus. An
electronic user agreement form can be obtained from our mailserver
Mailserv@Rulxho.Leidenuniv.NL. Type in the body of your e-mail
message:

SEND [38MLN96]AGREEMNT.USE for the 38 Million Words Corpus 1996
SEND [27MLN95]AGREEMNT.USE for the 27 Million Words Newspaper Corpus 1995
SEND [5MLN94]AGREEMNT.USE for the 5 Million Words Corpus 1994
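
For readers who prefer to script the request, the following Python
sketch builds such a message without sending it. The SEND commands and
corpus identifiers are taken from the list above. The function names
are invented, the recipient address is supplied by the caller, and the
address in the usage lines is a placeholder, not the INL mailserver.

```python
from email.message import EmailMessage

# Corpus identifiers and names as given in the announcement.
CORPORA = {
    "38MLN96": "38 Million Words Corpus 1996",
    "27MLN95": "27 Million Words Newspaper Corpus 1995",
    "5MLN94": "5 Million Words Corpus 1994",
}


def agreement_command(corpus_id):
    """Return the mailserver command that requests one agreement form."""
    if corpus_id not in CORPORA:
        raise ValueError(f"unknown corpus: {corpus_id}")
    return f"SEND [{corpus_id}]AGREEMNT.USE"


def build_request(mailserver_address, corpus_id):
    """Build (but do not send) a message whose body is the SEND command."""
    msg = EmailMessage()
    msg["To"] = mailserver_address
    msg.set_content(agreement_command(corpus_id))
    return msg


# Placeholder address, for illustration only.
request = build_request("mailserver@example.org", "38MLN96")
print(request.get_content().strip())
```

Because there is a separate agreement for each corpus, the sketch
builds one message per corpus.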

Please print the agreement form, sign it, keep a copy for yourself,
and return a signed copy to: Institute for Dutch Lexicology INL,
P.O. Box 9515, 2300 RA Leiden, The Netherlands. Fax: 31 71 527 2115.

After receipt of the signed user agreement, you will be informed about
your username and password.

If you need additional information, please send an e-mail message to
Helpdesk@Rulxho.Leidenuniv.NL, or send a fax to Mrs. dr. J.G. Kruyt.