LINGUIST List 6.755

Thu 01 Jun 1995

Qs: Terminology system, Dictionary, Parallel corpora

Editor for this issue: <>


  1. "Milde Jordaan-Weiss 012 314-6165", REQUEST FOR INFORMATION
  2. Phil Bralich, Machine Usable Dictionary
  3. Maria Gavrilidou ILSP, Parallel corpora


Date: Mon, 8 May 1995 10:28:41
From: "Milde Jordaan-Weiss 012 314-6165" <>
Subject: REQUEST FOR INFORMATION

Dear colleague

The National Terminology Services (NTS) of South Africa is looking
for a Terminology Management System which will be able to accommodate
all 11 official languages. Some of the African Languages have special
diacritics not yet available in commercial software.

Attached you will find the RFI from the NTS. Please pass it on to anyone who
might be interested. The rather bulky USER REQUIREMENT SPECIFICATION
will be e-mailed to interested parties as soon as they request it.
Please take note of the closing date of 29 May.

We appreciate your help in this matter.

Yours sincerely

Ms Milde Jordaan-Weiss
National Terminology Services
Department of Arts, Culture, Science and Technology
Private Bag X894

Tel +27 12 314-6165
Fax +27 12 325-4943

Message 2: Machine Usable Dictionary

Date: Sun, 28 May 1995 21:22:18
From: Phil Bralich <bralich@uhunix.uhcc.Hawaii.Edu>
Subject: Machine Usable Dictionary

As you may know from postings I have made to this list over the last
couple of months, Derek Bickerton and I are developing a parser
based on a theory of syntax that he and I have been developing over
the last four years. We are about to purchase a machine usable
dictionary with approximately 70,000 entries for $2500. If anyone
could advise us whether or not that is our best bet, or where we might
find other dictionaries, we would appreciate hearing from you.

We are currently working with a dictionary of under 1000 words, so it
is imperative that we obtain a larger one, so we may begin working
with larger corpora. Toward that end we would also like to find out
which texts were used in past parsing competitions and where the
results of these competitions are published. We believe that with a
few weeks of work we should be able to modify a dictionary
sufficiently to allow us to begin experimenting with texts that were
used in past parsing competitions.

Here are the specs for the parser. It is based on a series of algorithms that
have been four years in the making, but the programming required to
create this parser has taken only 300 hours using C++. There
are approximately 3000 lines of code that compile to a 150k executable on
disk. About 100k of RAM is required to run the parser. 30k on disk is
required for a 300 word dictionary. An average sentence takes under
4 seconds to process on a 486 IBM compatible. Since this is only a
development version, we expect these numbers to change. To date, no
optimizations have occurred, and we expect to significantly shrink the
dictionary disk usage and the execution time.

Phil Bralich

Message 3: Parallel corpora

Date: Tue, 30 May 1995 14:23:14
From: Maria Gavrilidou ILSP <>
Subject: Parallel corpora

Dear linguists,

A short time ago, I posted to the list a query on parallel corpora.
Since answers are still coming in, I will not give a summary of the
answers at this point. (However, a summary will be given as soon as
I have gathered all answers.)

Due to e-mail problems, I believe some e-mail messages must have been
lost. So, I give here below the list of the people whose messages
I have received. If you have written me and your name is not included
here, please re-send your answer to my personal address! I also repeat
here the original query for those who have not already seen it.

List of addresses of people who have answered :

The original message is the following:

) Dear linguists,
) I am involved in a project concerning parallel text-corpora, and
) I would like to know if anybody has already had any experience on
) the matter. Specifically, I would like to know if there already
) are any efforts ongoing (or completed!) about specs for parallel
) corpora, for representation issues, text typology etc.
) If anybody has the time to answer my query I would greatly appreciate
) it! Please reply to my personal address.

Apologies to those who are seeing this message again!

Thank you all,
Maria Gavrilidou
Institute for Language and Speech Processing
Athens, Greece