LINGUIST List 15.2475

Tue Sep 7 2004

Diss: Comp Ling/Semantics: Old: 'The Semantic...'

Editor for this issue: Takako Matsui <takolinguistlist.org>


Directory

  • leonard_old, The Semantic Structure of Roget's, A Whole-Language Thesaurus

    Message 1: The Semantic Structure of Roget's, A Whole-Language Thesaurus

    Date: Mon, 6 Sep 2004 13:42:09 -0400 (EDT)
    From: leonard_old <leonard_oldhotmail.com>
    Subject: The Semantic Structure of Roget's, A Whole-Language Thesaurus


    Institution: Indiana University
    Program: School of Lib and Information Science
    Dissertation Status: Completed
    Degree Date: 2003

    Author: Leonard J Old

    Dissertation Title: The Semantic Structure of Roget's, A Whole-Language Thesaurus

    Dissertation URL: http://www.dcs.napier.ac.uk/~cs171/LJOld/publications_l_john_old.htm

    Linguistic Field: Computational Linguistics, Semantics, Lexicography, Cognitive Science

    Dissertation Director 1: Charles H Davis
    Dissertation Director 2: Ralf Shaw

    Dissertation Abstract:

    This study analyzed a database version of Roget's Thesaurus (Roget's International Thesaurus, 3rd Edition, 1962) for frequency and connectivity patterns among its words, senses, and cross-references in order to identify its implicit conceptual structure. Descriptive statistics, lattices, and information maps were used to identify semantic patterns implicit in the data at both the local and global levels of the structure.

    The explicit organizational structure of the Thesaurus consists, at the local level, of sets of synonyms and, at the global level, of a hierarchy of concepts. In contrast, the implicit organization at the local level has the characteristics of dictionary sense definitions (genus and differentiae), and at the global level has the characteristics of a small-world social network. The concept of genus and differentiae provides a model that can be seen to account for the distribution of polysemy within senses and across the Thesaurus. The small-world network model can be seen to account for the incidence of semantic hubs and authorities among cross-references, and of conceptual and semantic switching centers among senses and words in the Thesaurus.

    Previous work on Roget's Thesaurus calculated chains and equivalence relations algorithmically from senses and words. That research found an inner semantic core of most-densely-connected words and senses. This study expanded on that research by identifying the semantic structure of the inner core and relating it to the most polysemous words in Roget's.

    While the largest thesaurus Categories relate to concrete objects such as plants, animals, food, clothing and technology, the most-connected words (in terms of numbers of senses and synonyms) were found to relate to abstract concepts such as motion, agitation and what appear to be concepts related to survival. This observation was supported by frequency counts and by global cross-reference and word connectivity patterns.