[The following is the text of a message which has been sent to various natural language researchers throughout the world. If you know of any site not on the list and can provide a contact or if you are involved in a project that should be surveyed, we would like to hear from you. Thanks in advance.]

Dear Colleague,

We are writing to you as members of the Computational Lexicon Working Group of the Text Encoding Initiative. The overall goal of the Text Encoding Initiative is to produce standards for interchanging electronic documents of various types. The specific goal of the Lexicon Working Group is to propose standards for interchanging data stored in lexicons, i.e. lexical databases intended for use by natural language processing systems of all sorts. (Please note that we are not dealing with Machine-Readable Dictionaries, which are electronic versions of printed documents intended for human consumption. A distinct Dictionary Working Group is concerned with standards for MRDs.)

In order to achieve this goal in a way that will satisfy the broadest community of lexicon users, we need to have the widest possible survey of currently existing lexicons. We will be presenting a preliminary report on the results of our survey to the TEI Steering Committee at Oxford on October 2, 1991. Therefore we are asking all those to whom we are sending this survey to respond to us within one month (i.e. by September 15). We will need the intervening time to compile responses, so any information received after this date will not be included in the October report.

We appreciate that we may be imposing upon your research time with this request. However, your response will help create a more accurate standard for the community as a whole. All those who respond will receive a copy of the final standards proposal. We have already surveyed approximately 30 systems and we expect the final survey to include between 50 and 70 systems at a minimum.

(We have included a list of centers to which we have sent this survey. If you notice any site we might have overlooked, we would be grateful if you informed us or passed on a copy of this survey to the appropriate person.)

Thank you for your cooperation and quick response.

Robert Ingria, Chair, BBN Systems and Technologies (ingria@bbn.com)
Nicoletta Calzolari, CNR, Pisa (glottolo@icnucevm.cnuce.cnr.it)
James Pustejovsky, Brandeis University (jamesp@chaos.cs.brandeis.edu)
Susan Warwick-Armstrong, ISSCO (susan@divsun.unige.ch)

===================================Cut Here===================================

We would like to have example entries, including both syntax and semantics, for the following classes of lexical items from your system. (Please include the name of your system and the size(s) of the lexicon(s) you have developed.) If you have developed bilingual or multilingual entries for your system, please also provide translation examples for the classes listed below. Note that all our examples are in English; please use the analogous classes for the language(s) handled by your system.

(1) Nouns:
      entity nouns - apple, book, etc.
      relational nouns - speed, age, height, father, brother, etc.
      abstract nouns - courage, love, altruism, etc.
      mass nouns - wine, sand, etc.
      proper names - John, Europe, IBM, etc.

    Also please indicate whether complement-taking properties are represented: e.g. ``factive'' nouns like ``story'', ``transitive'' nouns, etc.

(2) Pronouns (I, he, she, it, etc.) and bound anaphors (myself, himself, herself, each other, etc.)

(3) Verbs: a wide variety of valency classes:
      intransitive
      transitive
      ditransitive
      clausal complement taking
      infinitival complement taking (raising and control)
      ``small-clause'' taking verbs including naked infinitives
      etc.

    If your verbal entries include an indication of variants of a basic valency class (e.g. whether a transitive verb passivizes, whether an indirect object-taking verb allows ``Dative movement'', etc.), please indicate this by example.

    If your system deals with a language like German in which the nominal complements of a verb may appear in different Cases (e.g. helfen takes a dative object while sehen takes an accusative object), please show how this is represented.

(4) Modals and auxiliaries

(5) Prepositions - any indication of subclasses of prepositions, e.g. ``case-marking'' prepositions vs. semantically contentful ones.

    If English particles are a subset of prepositions in your system, please indicate this.

(6) Adjectives - Please indicate whether complement-taking properties are represented; e.g. ``proud of'', ``likely to'', etc. Please indicate what semantic classes of adjectives you distinguish: e.g. scalar vs. bi-polar, intersective vs. subsective, etc. Also, please include any other information you may represent, such as the position in which an adjective can appear (pre-nominal, post-nominal, predicate position, etc.)

(7) Determiners and other similar nominal modifiers (e.g. articles, quantifiers, demonstratives, etc.) - Please indicate whether you represent polarity, monotonicity, etc.

If your lexicon includes multi-word lexical entries, please supply examples.

If your lexicon uses an inheritance mechanism, please describe it and/or provide examples.

For the inflected categories of noun, verb, and adjective, please indicate how irregular forms, inflectional paradigm, and other morphological information are stored.

For translation entries, please indicate how they are interpreted or used in the system.

Finally, if there are any other special characteristics of your lexicon or the system controlling it that are not adequately covered by the above categories, please provide a description and examples.

We would also appreciate it if you could send us any documentation on your system that could help us to understand the examples, such as technical reports, coding guidelines, etc.

Please send your EMail responses to:

    ingria@bbn.com

If you have any trouble responding to this address, the following addresses may be used:

    jamesp@chaos.cs.brandeis.edu
    susan@divsun.unige.ch

Hardcopy responses may be sent to:

    Robert Ingria
    BBN Systems and Technologies
    10 Moulton Street
    Mailstop 6/4c
    Cambridge, MA 02139
    USA

or

    Susan Warwick-Armstrong
    ISSCO
    University of Geneva
    54, Rte. des Acacias
    1227 Geneva
    SWITZERLAND

Again, thank you for your cooperation.

======================================================================

List of sites being surveyed, by country:

U.K.: Alvey SRI-Cambridge Edinburgh Sussex Manchester ET DATR UK-Eurotra

France: LADL (Paris VII) IRIT Gsi-ERLI Eurotra-France

The Netherlands: Utrecht: Lexic project Utrecht: Mimo University of Amsterdam BSO DLT Philips/Rosetta

Finland: IBM Finland

Switzerland: ISSCO: GB-Parsing ISSCO: ELU ISSCO: French unification grammar ISSCO: MT Avalanche report MT system

Italy: Pisa Trento Venice Eurotra Italy

Portugal:

Israel:

Belgium: Brussels: KRS Metal-Leuven

Germany: Saarbruecken Stuttgart Bonn: IKP Center Tuebingen GMD Eurotra Germany

Greece: Eurotra Greece

Canada: METEO Toronto Simon Fraser

USA: BBN Delphi BBN IRUS Brandeis/CTI lexicon MIT Spoken Language System MIT Fast Parser IBM Lexical Resources IBM Stochastic Grammar IBM MT NYU Proteus CMU Spoken Language System CMU MT Unisys AT&T Spoken Language System AT&T Fidditch Bellcore University of Maryland NMSU CRL NMSU MT ISI MT ISI Generation HP Labs SRI Xerox PARC LFG Xerox PARC MT Boeing (Washington)

Japan: Kyoto University EDR