LINGUIST List 4.999

Sun 28 Nov 1993

FYI: Tamil software, CELEX

Editor for this issue: <>


  1. Vasu Renganathan, Experimental NLP Software for Tamil
  2. Richard Piepenbrock, CELEX lexical CD-ROM -- updated information

Message 1: Experimental NLP Software for Tamil

Date: Wed, 15 Sep 1993 13:04:24
From: Vasu Renganathan <>
Subject: Experimental NLP Software for Tamil

 Experimental NLP Software for Tamil

I would like to announce that an experimental NLP software package for
Tamil is available at the following ftp site.

 Site name:
 Sub directory: /doc/misc/tamil/pulavan

All the stand-alone executable programs are kept in the file and
the source code, written in Turbo Prolog, is kept in the file
The source code can be easily linked with functions written in C. I have
already tested and linked some of it with functions written in Turbo C. A
readme file and an install file are also available with these two files.

Features of this system:

1) A morphological processor for Tamil: An automatic morpheme recognizer is
 used to recognize most of the Tamil morphemes and their other inflectional
 forms. This component converts every Tamil word into an intermediate list
 structure, provided with suitable codes for affixes and the root form of
 the words. (files:, and

2) A context-free top-down parser processes the list structure created by the
 morphological component and identifies the constituents viz. words,
 phrase, npmax, and sentence. (file:

3) Special list manipulation predicates have been written using the
 principles of set-theory and GB theory. Predicates like
 subset_sent(sentence,sentence), subset_npm(npmax,npmax),
 member_npm(npmax,sentence), member_wor(words,phrase) etc., are written to
 process Tamil sentences using the list manipulation power of PROLOG.

4) An experimental Tamil script generator in graphics mode (file:

5) A script conversion program from Roman form to Tamil script and vice versa.
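
As a rough illustration of features 1 and 3 above, the following Python
sketch strips suffixes from a romanized word to build an intermediate list
structure (root followed by affix codes) and defines two set-style list
predicates. It is not the Turbo Prolog code from the package: the suffix
table, the affix codes, and the function names analyze, member, and subset
are all assumptions made for this example, and real Tamil morphology
(sandhi, oblique stems, roots that happen to end in a suffix string) needs
far more than plain suffix stripping.

```python
# Toy suffix table mapping a romanized suffix to an affix code.
# Hypothetical data for illustration only; not taken from the package.
SUFFIXES = {
    "kaL": "PLURAL",
    "ai": "ACC",
    "il": "LOC",
    "ukku": "DAT",
}


def analyze(word):
    """Strip known suffixes from the right end of a romanized word.

    Returns a list [root, code1, code2, ...] with codes in surface order.
    This is naive: a root that merely ends in a suffix string would be
    over-stripped.
    """
    codes = []
    stripped = True
    while stripped:
        stripped = False
        # Try longer suffixes first so that longer matches win.
        for suffix in sorted(SUFFIXES, key=len, reverse=True):
            # Only strip if a non-empty root would remain.
            if word.endswith(suffix) and len(word) > len(suffix):
                codes.insert(0, SUFFIXES[suffix])
                word = word[: -len(suffix)]
                stripped = True
                break
    return [word] + codes


def member(item, collection):
    """True if item occurs in the list (a set-style membership test)."""
    return item in collection


def subset(smaller, larger):
    """True if every element of smaller also occurs in larger."""
    return all(member(x, larger) for x in smaller)


if __name__ == "__main__":
    analysis = analyze("viiTukaLai")
    print(analysis)
    print(subset(["viiTu", "PLURAL"], analysis))
```

The list returned by analyze plays the role of the intermediate list
structure; a parser in the spirit of feature 2 would then consume such
lists, which is left out here to keep the sketch short.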

The knowledge base of this system has been tested in the following:
 1) A sample English to Tamil translation system (seattle.exe).

 2) Tamil verb conjugation system (inflex.exe).

 3) A sample Natural Language Interface system (talktaml.exe).

System requirements:
a) IBM-compatible PC, preferably a 386-based system
b) A high-resolution graphics adapter and monitor, preferably SVGA

Please send your comments and suggestions to I would
appreciate it if other NLP researchers who work on Tamil and other Dravidian
languages could contact me by email. Thanks in advance.

Vasu Renganathan
Do-21, Gowen Hall,
Department of Asian Languages and Literature,
University of Washington,
Seattle WA 98195.

Message 2: CELEX lexical CD-ROM -- updated information

Date: Thu, 25 Nov 1993 19:58 +01
From: Richard Piepenbrock <>
Subject: CELEX lexical CD-ROM -- updated information

Detailed information concerning the English, German and Dutch lexical
databases on the CELEX CD-ROM can now be obtained by anonymous ftp from
the Linguistic Data Consortium as follows:

 connect to:
 go to directory: pub/ldc
 set transfer mode: binary
 get file: celex.readme (information about the CELEX CD)
           (linguistic introduction)
           celex.userguide.tar.Z (the complete User Guide)
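
For readers who prefer to script the retrieval, the steps above can be
sketched in Python with the standard ftplib module. This is only a sketch
under assumptions: the helper name fetch_files is made up for this example,
the host name is not given here and must be supplied by the caller, and
only the two file names that appear above are used.

```python
import os
from ftplib import FTP


def fetch_files(ftp, directory, filenames, dest_dir="."):
    """Log in anonymously, change directory, and download files in binary mode.

    ftp is an already-connected ftplib.FTP object (or anything offering the
    same login/cwd/retrbinary methods). Returns the local paths written.
    """
    ftp.login()          # no arguments means anonymous login
    ftp.cwd(directory)   # e.g. "pub/ldc"
    written = []
    for name in filenames:
        local_path = os.path.join(dest_dir, name)
        with open(local_path, "wb") as out:
            # retrbinary transfers in binary (image) mode, as required above.
            ftp.retrbinary("RETR " + name, out.write)
        written.append(local_path)
    return written


# Usage sketch (HOST is a placeholder for the host name, which is not
# reproduced in this message):
#
#   with FTP(HOST) as ftp:
#       fetch_files(ftp, "pub/ldc", ["celex.readme", "celex.userguide.tar.Z"])
```

Passing the connection in as an argument keeps the function easy to try out
with a stand-in object before pointing it at a real server.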

The readme file is uncompressed and in ASCII format. The other two, which
correspond to sections of the hardcopy CELEX User Guide written by Gavin
Burnage and which are subject to CELEX copyright, can be decompressed and
sent to a PostScript-capable printer. The content of this document
should provide answers to most questions regarding the content and use of

Persons outside of Europe who are interested in CELEX, but are unable
to retrieve and print the introductory text themselves, may request a
hard copy of the document from the LDC.

Persons in Europe who want a hard copy of the document mailed to
them, and anyone who still has technical questions after reading the
document, should direct their inquiries to:

 Richard Piepenbrock
 CELEX Project Manager
 Max Planck Institute for Psycholinguistics
 Wundtlaan 1
 The Netherlands

 Tel: (+31) (0)80 - 615797
 Fax: (+31) (0)80 - 521213

 EARN/BITNET: celex@hnympi51
 SURFNET: celex::celexmail

Apart from making the introductory text freely available, the LDC is
not equipped to answer detailed technical questions about the CELEX
CD-ROM. Please contact the LDC only if you need assistance in obtaining
the document or would like to purchase the disc.