LINGUIST List 2.417

Sat 17 Aug 1991

Qs: Seeking Information about Lexical Resources

Editor for this issue: <>


Directory

  1. Seeking Information About Lexical Resources

Message 1: Seeking Information About Lexical Resources

Date: Fri, 16 Aug 91 13:54:57 EDT
From: <ingriaBBN.COM>
Subject: Seeking Information About Lexical Resources
[The following is the text of a message which has been sent to various
 natural language researchers throughout the world. If you know of
 any site not on the list and can provide a contact or if you are
 involved in a project that should be surveyed, we would like to hear
 from you. Thanks in advance.]
Dear Colleague,
	We are writing to you as members of the Computational Lexicon
Working Group of the Text Encoding Initiative. The overall goal of
the Text Encoding Initiative is to produce standards for interchanging
electronic documents of various types. The specific goal of the
Lexicon Working Group is to propose standards for interchanging data
stored in lexicons, i.e. lexical databases intended for use by natural
language processing systems of all sorts. (Please note that we are
not dealing with Machine-Readable Dictionaries, which are electronic
versions of printed documents intended for human consumption. A
distinct Dictionary Working Group is concerned with standards for
MRDs.)
In order to achieve this goal in a way that will satisfy the broadest
community of lexicon users, we need to have the widest possible survey
of currently existing lexicons. We will be presenting a preliminary
report on the results of our survey to the TEI Steering Committee at
Oxford on October 2, 1991. Therefore we are asking all those to whom
we are sending this survey to respond to us within one month (i.e. by
September 15). We will need the intervening time to compile
responses, so any information received after this date will not be
included in the October report. We appreciate that we may be imposing
upon your research time with this request. However, your response
will help create a more accurate standard for the community as a
whole. All those who respond will receive a copy of the final
standards proposal. We have already surveyed approximately 30 systems
and we expect the final survey to include between 50 and 70 systems at
a minimum. (We have included a list of centers to which we have sent
this survey. If you notice any site we might have overlooked, we
would be grateful if you informed us or passed on a copy of this
survey to the appropriate person.)
Thank you for your cooperation and quick response.
Robert Ingria, Chair, BBN Systems and Technologies (ingriabbn.com)
Nicoletta Calzolari, CNR, Pisa (glottoloicnucevm.cnuce.cnr.it)
James Pustejovsky, Brandeis University (jamespchaos.cs.brandeis.edu)
Susan Warwick-Armstrong, ISSCO (susandivsun.unige.ch)
===================================Cut Here===================================
We would like to have example entries, including both syntax and
semantics, for the following classes of lexical items from your
system. (Please include the name of your system and the size(s) of
the lexicon(s) you have developed.) If you have developed bilingual
or multilingual entries for your system, please also provide
translation examples for the classes listed below. Note that all our
examples are in English; please use the analogous classes for the
language(s) handled by your system.
(1) Nouns: entity nouns - apple, book, etc.
           relational nouns - speed, age, height, father, brother, etc.
           abstract nouns - courage, love, altruism, etc.
           mass nouns - wine, sand, etc.
           proper names - John, Europe, IBM, etc.
           Also please indicate whether complement-taking properties
           are represented: e.g. ``factive'' nouns like ``story'',
           ``transitive'' nouns, etc.
(2) Pronouns (I, he, she, it, etc.) and bound anaphors (myself,
	himself, herself, each other, etc.)
(3) Verbs: a wide variety of valency classes:
           intransitive
           transitive
           ditransitive
           clausal complement taking
           infinitival complement taking (raising and control)
           ``small-clause'' taking verbs including naked infinitives
           etc.
           If your verbal entries include an indication of variants of
           a basic valency class (e.g. whether a transitive verb
           passivizes, whether an indirect object-taking verb allows
           ``Dative movement'', etc.), please indicate this by example.
           If your system deals with a language like German in which
           the nominal complements of a verb may appear in different
           Cases (e.g. helfen takes a dative object while sehen takes
           an accusative object), please show how this is represented.
(4) Modals and auxiliaries
(5) Prepositions - any indication of subclasses of prepositions,
        e.g. ``case-marking'' prepositions vs. semantically
        contentful ones.
        If English particles are a subset of prepositions in
        your system, please indicate this.
(6) Adjectives - please indicate whether complement-taking properties are
	represented; e.g. ``proud of'', ``likely to'', etc.
	Please indicate what semantic classes of adjectives you
	distinguish: e.g. scalar vs. bi-polar, intersective vs.
	subsective, etc.
	Also, please include any other information you may represent,
	such as the position in which an adjective can appear
	(pre-nominal, post-nominal, predicate position, etc.)
(7) Determiners and other similar nominal modifiers (e.g. articles,
	quantifiers, demonstratives, etc.) - please indicate whether
	you represent polarity, monotonicity, etc.
If your lexicon includes multi-word lexical entries, please supply
examples.
If your lexicon uses an inheritance mechanism, please describe it
and/or provide examples.
For the inflected categories of noun, verb, and adjective, please
indicate how irregular forms, inflectional paradigm, and other
morphological information are stored.
For translation entries, please indicate how they are interpreted or
used in the system.
Finally, if there are any other special characteristics of your
lexicon or the system controlling it that are not adequately covered
by the above categories, please provide a description and examples.
We would also appreciate it if you could send us any documentation on
your system that could help us to understand the examples, such as
technical reports, coding guidelines, etc.
Please send your EMail responses to:
ingriabbn.com
If you have any trouble responding to this address, the following
addresses may be used:
jamespchaos.cs.brandeis.edu
susandivsun.unige.ch
Hardcopy responses may be sent to:
Robert Ingria
BBN Systems and Technologies
10 Moulton Street
Mailstop 6/4c
Cambridge, MA 02139
USA
or
Susan Warwick-Armstrong
ISSCO
University of Geneva
54, Rte. des Acacias
1227 Geneva
SWITZERLAND
Again, thank you for your cooperation.
======================================================================
List of sites being surveyed, by country:
U.K.:
Alvey
SRI-Cambridge
Edinburgh
Sussex
Manchester ET
DATR
UK-Eurotra
France:
LADL (Paris VII)
IRIT
Gsi-ERLI
Eurotra-France
The Netherlands:
Utrecht: Lexic project
Utrecht: Mimo
University of Amsterdam
BSO DLT
Philips/Rosetta
Finland:
IBM Finland
Switzerland:
ISSCO: GB-Parsing
ISSCO: ELU
ISSCO: French unification grammar
ISSCO: MT
Avalanche report MT system
Italy:
Pisa
Trento
Venice
Eurotra Italy
Portugal:
Israel:
Belgium:
Brussels: KRS
Metal-Leuven
Germany:
Saarbruecken
Stuttgart
Bonn: IKP Center
Tuebingen
GMD
Eurotra Germany
Greece:
Eurotra Greece
Canada:
METEO
Toronto
Simon Fraser
USA:
BBN Delphi
BBN IRUS
Brandeis/CTI lexicon
MIT Spoken Language System
MIT Fast Parser
IBM Lexical Resources
IBM Stochastic Grammar
IBM MT
NYU Proteus
CMU Spoken Language System
CMU MT
Unisys
AT&T Spoken Language System
AT&T Fidditch
Bellcore
University of Maryland
NMSU CRL
NMSU MT
ISI MT
ISI Generation
HP Labs
SRI
Xerox PARC LFG
Xerox PARC MT
Boeing (Washington)
Japan:
Kyoto University
EDR