GSIL Publications List
Effective November 15, 1993

Graduate Students in Linguistics (G.S.I.L.)
Department of Linguistics
University of Southern California
Los Angeles, CA 90089-1693
U.S.A.

Titles available:

Dissertations:
1 Authier, J-M. Syntax of unselective binding (1988)
2 Franco, J. On object agreement in Spanish (1993)
3 Heggie, L. Syntax of copular structures (1988)

Titles available shortly:
4 Katada, F. The representation of anaphoric relations in logical form (1990)

For e-mail information, please contact jcamacho@scf.usc.edu

The updated publication list of the Graduate Linguistic Student Association (GLSA) of the University of Massachusetts, Amherst is available online in three different ways:

1) A short version of the list (without full tables of contents) is available on the Linguist List Listserver;
2) A long version of the list (with tables of contents) is available by anonymous ftp to linguistics.archive.umich.edu in the directory /linguistics/papers/available;
3) Both versions of the Publications List are available by emailing glsa@linguist.umass.edu

--glsa@linguist.umass.edu

This message announces the release of a CD-ROM with lexical data by the Dutch Centre for Lexical Information, which can be obtained from the Linguistic Data Consortium.

CONTENTS AND AVAILABILITY OF THE CD-ROM
---------------------------------------

The CD-ROM, which contains the CELEX lexical databases of English (version 2.5), Dutch (version 3.1) and German (version 2.0), is now available for research purposes from the Linguistic Data Consortium for $150.

For each language, the CD-ROM contains detailed information on both wordforms and lemmas:

- the orthography (variations in spelling, hyphenation)
- the phonology (phonetic transcriptions, variations in pronunciation, syllable structure, primary stress)
- the morphology (derivational and compositional structure, inflectional paradigms)
- the syntax (word class, word-class specific subcategorisations, argument structures)
- word frequency (summed word and lemma counts, based on recent and representative text corpora)

The databases cover the following numbers of entries:

- English: 52446 lemmas, 160594 wordforms
- German: 50708 lemmas, 359611 wordforms
- Dutch: 124136 lemmas, 381292 wordforms

PostScript files describe the available lexical information in detail.

The original CELEX databases can be consulted interactively either by using the SQL*PLUS query language within an ORACLE RDBMS environment, or by means of the specially designed user interface FLEX. The databases on this CD-ROM have not been tailored to fit any particular database management program. Instead, the information is presented in a series of plain ASCII files in a UNIX directory tree that can be queried with tools such as AWK or ICON. Unique identity numbers allow the linking of information from different files. As in the original databases, some kinds of information have to be computed on-line. Wherever necessary, AWK functions have been provided to recover this information. README files specify the details of their use.
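
As a rough illustration of how identity numbers can link information held in different plain-text files, here is a minimal Python sketch. The file contents, the backslash field separator, the column order and the ID values are all assumptions made for the example; the README files on the disc describe the real layout.

```python
# Hypothetical excerpts from two plain-text files that share an identity number
# in the first field. The backslash separator, the column order, and the ID
# values are assumptions made for illustration only.
SPELLING_FILE = "1\\cell\n2\\cellar\n3\\cello\n"
FREQUENCY_FILE = "1\\1210\n2\\228\n3\\25\n"


def read_table(text, sep="\\"):
    """Map each identity number to the remaining fields on its line."""
    table = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split(sep)
        table[int(fields[0])] = fields[1:]
    return table


def link(left, right):
    """Join two tables on the identity numbers they have in common."""
    return {key: left[key] + right[key] for key in left if key in right}


linked = link(read_table(SPELLING_FILE), read_table(FREQUENCY_FILE))
for id_number in sorted(linked):
    print(id_number, *linked[id_number])
```

The same join could be written in AWK, which is one of the tools named above; Python is used here only to keep the sketch self-contained.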

The CD-ROM is mastered using the ISO 9660 data format, with the Rock Ridge extensions, allowing it to be used in VMS, MS-DOS, Macintosh (*) and UNIX environments.

Anyone who would like to purchase the CD-ROM should send a check or purchase order made payable to the "Trustees of the University of Pennsylvania" to:

    Judith Storniolo
    Administrative Assistant, LDC
    Linguistic Data Consortium
    441 Williams Hall
    University of Pennsylvania
    Philadelphia, PA 19104-6305
    storniol@unagi.cis.upenn.edu
    Tel: +1/215/898-0464
    Fax: +1/215/573-2175

(*) A Mac with a CD-ROM drive that was obtained before 12/92, and on which no system upgrades have been installed since that date, will not be able to read the CELEX CD-ROM. In such a case, all that is needed is to obtain the upgraded driver software (a very small amount of code) and copy it onto the system in place of the existing driver. The upgrade can be obtained as follows:

    Connect to ftp server: ftp.apple.com
    Go to directory:       dts/mac/sys.soft/cdrom
    Get file:              cd-rom-setup

GETTING INTRODUCTORY INFORMATION ON THE CD-ROM
----------------------------------------------

Further details concerning the lexical databases can be obtained by anonymous ftp from the LDC as follows:

    connect to:        ftp.cis.upenn.edu
    go to directory:   pub/ldc
    set transfer mode: binary
    get file:          celex.info.tar.Z

This file, which corresponds to Chapter 1 of the CELEX User Guide written by Gavin Burnage and which is subject to CELEX copyright, can be decompressed and output to a PostScript-capable printer. The content of this document should provide answers to most questions regarding the content and use of CELEX.

Persons outside of Europe who are interested in CELEX, but are unable to retrieve and print the introductory text themselves, may request a hard copy of the document from the LDC.
Persons in Europe who want a hard copy of the document mailed to them, and anyone who still has technical questions after reading the document, should direct their inquiries to:

    Richard Piepenbrock
    CELEX Project Manager
    Max-Planck-Institut fuer Psycholinguistik
    Wundtlaan 1
    6525 XD NIJMEGEN
    The Netherlands
    Tel: (+31) (0)80 - 615797
    Fax: (+31) (0)80 - 521213
    EARN/BITNET: celex@hnympi51
    Internet:    celex@mpi.nl
    SURFNET:     celex::celexmail
    JANET:       celex%hnympi51@uk.ac.earn-relay

Apart from making the introductory text freely available, the LDC is not equipped to provide detailed replies as to technical details of the CELEX CD-ROM. Please contact the LDC only if you need assistance in obtaining the document, or would like to purchase the disc.

APPENDIX: A BRIEF OVERVIEW OF THE ENGLISH DATA ON THE CD-ROM
------------------------------------------------------------

When starting to use the English database, the user first has to choose between two so-called `lexicon types':

- a lemma lexicon
- a wordform lexicon

Each lexicon type uses a specific kind of entry. The CELEX lemma lexicon is the one most similar to an ordinary dictionary since every entry in this lexicon represents a set of related inflected words. In a lexicon, a lemma can be represented by using a headword (cf. traditional dictionary entries) such as, for example, `call' or `cat'. The wordform lexicon yields all possible inflected words: every entry in the lexicon is an inflectional variant of the related headword or stem. So, a wordform lexicon contains words like `call', `calls', `calling', `called', `cat', `cats' and so on.

For both types of lexicons, the user may subsequently select any number of columns -- from approximately 150 database columns -- combining information on the orthography, phonology, morphology, syntax and frequency of the entries. The information sheet `Lexical Data, English' summarizes the types of information available. An exhaustive overview of the columns available is given in the CELEX User Guide.

LEXICAL DATA, ENGLISH

The lexical data that can be selected for each entry in the different English lexicon types can be divided into five categories: orthography, phonology, morphology, syntax and frequency. In a separate section, example data are given for each of these categories.

*------------------------------------------------------------------
Orthography       - with or without diacritics
(spelling)        - with or without word division positions
                  - alternative spellings
                  - number of letters/syllables

Phonology         - phonetic transcriptions (using SAMPA notation or
(pronunciation)     Computer Phonetic Alphabet (CPA) notation) with:
                    - syllable boundaries
                    - primary and secondary stress markers
                  - consonant-vowel patterns
                  - number of phonemes/syllables
                  - alternative pronunciations

Morphology        - Derivational/compositional:
(word structure)    - division into stems and affixes
                    - flat or hierarchical representations
                  - Inflectional:
                    - stems and their inflections

Syntax            - word class
(grammar)         - subcategorisations per word class

Frequency         - COBUILD frequency(*)
*------------------------------------------------------------------

(*) These frequency data are based on the COBUILD corpus (sized 18 million words) built up by the University of Birmingham, Great Britain.

EXAMPLE DATA, ENGLISH

An arbitrary query using a small English lemma lexicon (that is, one with very few columns) might yield the following result:

*------------------------------------------------------------------
Headword     Pronunciation      Morphology:            M:  Cl  Freq
                                Structure              Cl
*----------- ------------------ ---------------------- --  --  ----
celebrant    "sE-lI-br@nt       ((celebrate),(ant))    Vx  N      6
celebration  %sE-lI-"breI-Sn,   ((celebrate),(ion))    Vx  N    201
cell         "sEl               (cell)                 N   N   1210
cellar       "sE-l@r*           (cellar)               N   N    228
cellarage    "sE-l@-rIdZ        ((cellar),(age))       Nx  N      0
cellist      "tSE-lIst          ((cello),(ist))        Nx  N      5
cello        "tSE-l@U           (cello)                N   N     25
cellular     "sEl-jU-l@r*       ((cell),(ular))        Nx  A     21
celluloid    "sEl-jU-lOId       ((cellulose),(oid))    Nx  N     29
*------------------------------------------------------------------

An example selection from a small English wordform lexicon, showing the inflectional variants of the headwords given in the previous example, is presented in the next table:

*------------------------------------------------------------------
Word          Word division    Pronunciation        Cl  Type  Freq
*------------ ---------------- -------------------- --  ----  ----
celebrant     cel-e-brant      "sE-lI-br@nt         N   sing     2
celebrants    cel-e-brants     "sE-lI-br@nts        N   plu      4
celebration   cel-e-bra-tion   %sE-lI-"breI-Sn,     N   sing   144
celebrations  cel-e-bra-tions  %sE-lI-"breI-Sn,z    N   plu     57
cell          cell             "sEl                 N   sing   655
cells         cells            "sElz                N   plu    555
cellar        cel-lar          "sE-l@r*             N   sing   187
cellars       cel-lars         "sE-l@z              N   plu     41
cellarage     cel-lar-age      "sE-l@-rIdZ          N   sing     0
cellarages    cel-lar-ag-es    "sE-l@-rI-dZIz       N   plu      0
cellist       cel-list         "tSE-lIst            N   sing     5
cellists      cel-lists        "tSE-lIsts           N   plu      0
cello         cel-lo           "tSE-l@U             N   sing    24
cellos        cel-los          "tSE-l@Uz            N   plu      1
cellular      cel-lu-lar       "sEl-jU-l@r*         A   pos     21
celluloid     cel-lu-loid      "sEl-jU-lOId         N   sing    29
*------------------------------------------------------------------
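
In the two example tables above, each lemma frequency appears to equal the sum of the frequencies of its wordforms (for instance, 655 + 555 = 1210 for `cell'). The Python sketch below checks this on the example data only. The pairing of each wordform with its headword is entered by hand here, which is an assumption of the sketch; in the databases that link would come from the identity numbers.

```python
from collections import defaultdict

# (wordform, headword, frequency) triples copied from the wordform example.
# The wordform-to-headword pairing is hand-entered for this illustration.
WORDFORMS = [
    ("celebrant", "celebrant", 2), ("celebrants", "celebrant", 4),
    ("celebration", "celebration", 144), ("celebrations", "celebration", 57),
    ("cell", "cell", 655), ("cells", "cell", 555),
    ("cellar", "cellar", 187), ("cellars", "cellar", 41),
    ("cellarage", "cellarage", 0), ("cellarages", "cellarage", 0),
    ("cellist", "cellist", 5), ("cellists", "cellist", 0),
    ("cello", "cello", 24), ("cellos", "cello", 1),
    ("cellular", "cellular", 21),
    ("celluloid", "celluloid", 29),
]

# Lemma frequencies copied from the lemma example.
LEMMA_FREQ = {
    "celebrant": 6, "celebration": 201, "cell": 1210, "cellar": 228,
    "cellarage": 0, "cellist": 5, "cello": 25, "cellular": 21,
    "celluloid": 29,
}


def lemma_totals(wordforms):
    """Sum wordform frequencies per headword."""
    totals = defaultdict(int)
    for _form, headword, freq in wordforms:
        totals[headword] += freq
    return dict(totals)


totals = lemma_totals(WORDFORMS)
mismatches = {h: (totals.get(h), f) for h, f in LEMMA_FREQ.items() if totals.get(h) != f}
print("mismatches:", mismatches)
```

An empty mismatch set for these rows says nothing about the full databases; the CELEX User Guide is the place to confirm how lemma counts are actually derived.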