LINGUIST List 9.96

Wed Jan 21 1998

FYI: Novel, Release of CoreLex

Editor for this issue: Elaine Halleck <elaine@linguistlist.org>


Directory

  • Daniel L. Everett, Interesting novel
  • Paul Buitelaar, Release of CoreLex

    Message 1: Interesting novel

    Date: Mon, 19 Jan 1998 12:31:38 -0500 (EST)
    From: Daniel L. Everett <dever@verb.linguist.pitt.edu>
    Subject: Interesting novel


    Folks,

    There is an interesting novel out that linguists ought to enjoy. It is entitled The Sparrow. The author is Mary Doria Russell, a Ph.D. in paleoanthropology. It is published by Random House/Ballantine. The story is largely about Emilio Sandoz, a Jesuit with a Ph.D. in Linguistics, who travels with a small group to the planet Rakhat. He has to conduct fieldwork on the languages of this planet (and sends articles back to Earth for publication). Russell captures some interesting aspects of fieldwork well. There is a lot more to the novel than linguistics, though. I highly recommend it. And I do not usually read or enjoy novels very much.

    - Dan Everett

    ****************************** ******************************

    Daniel L. Everett
    Department of Linguistics
    University of Pittsburgh
    2816 CL
    Pittsburgh, PA 15260
    Phone: 412-624-8101; Fax: 412-624-6130
    http://verb.linguist.pitt.edu/~dever

    Message 2: Release of CoreLex

    Date: Tue, 20 Jan 1998 18:36:07 -0500
    From: Paul Buitelaar <paulb@cs.brandeis.edu>
    Subject: Release of CoreLex


    Announcing the release of CoreLex

    An ONTOLOGY, LEXICAL SEMANTIC DATABASE and TAGSET for nouns, organized around SYSTEMATIC POLYSEMY and UNDERSPECIFICATION.

    CoreLex grew out of a thesis on systematic polysemy and underspecification of nouns. It establishes an ontology and semantic database of 126 semantic types covering around 40,000 nouns, and defines a large number of systematically polysemous classes derived from a careful analysis of sense distributions in WordNet. The semantic types are underspecified representations based on Generative Lexicon theory. They are used in an underspecified approach to semantic tagging that addresses two problems: sense enumeration (the difficulty of deciding the number of discrete senses), which is due to systematic polysemy; and multiple reference (NPs denoting more than one model-theoretic referent), which is due to underspecification.

    Semantic tags based on traditional, discrete senses tend to be too fine-grained for practical use. For instance, WordNet has, at the lowest level, around 60,000 different tags (synsets) for nouns alone. The CoreLex approach, on the other hand, offers a concise set of 126 tags that are inherently more coarse-grained because they take systematic polysemy and underspecification into account.
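
    To make the idea of an underspecified tag concrete, here is a minimal Python sketch. It is an illustration only, not the CoreLex implementation or its data format: the nouns, the type labels, and the function names are all invented for this example. It assigns each noun a single tag that covers all of its coarse types, and groups nouns that share the same combination of types into a candidate polysemous class.

```python
from collections import defaultdict

# Hypothetical toy data: each noun maps to the set of coarse types that its
# senses fall under. Labels and entries are invented for illustration and are
# NOT taken from CoreLex or WordNet.
TOY_SENSE_TYPES = {
    "book": {"artifact", "communication"},
    "novel": {"artifact", "communication"},
    "lunch": {"event", "food"},
    "dinner": {"event", "food"},
    "rock": {"natural_object"},
}


def underspecified_tag(noun, sense_types=TOY_SENSE_TYPES):
    """Return one tag covering all coarse types of the noun, or None if unknown."""
    types = sense_types.get(noun)
    if types is None:
        return None
    # One tag stands for the whole set of types; no single sense is chosen.
    return "+".join(sorted(types))


def polysemy_classes(sense_types=TOY_SENSE_TYPES):
    """Group nouns by their set of coarse types.

    Only combinations of two or more types shared by two or more nouns are
    kept, as a rough stand-in for "systematic" (non-accidental) polysemy.
    """
    groups = defaultdict(list)
    for noun, types in sense_types.items():
        groups[frozenset(types)].append(noun)
    return {
        types: sorted(nouns)
        for types, nouns in groups.items()
        if len(types) > 1 and len(nouns) > 1
    }


if __name__ == "__main__":
    print(underspecified_tag("book"))
    for types, nouns in polysemy_classes().items():
        print(sorted(types), nouns)
```

    In this toy setting "book" and "novel" receive the same tag instead of separate sense-specific tags, which is the sense in which such a tagset is more coarse-grained than one tag per discrete sense.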

    The CoreLex database is freely available for research purposes, including commercial ones. For more information on the database and on the thesis that describes its motivation, construction and use, see the CoreLex webpage:

    http://www.cs.brandeis.edu/~paulb/CoreLex/corelex.html