LINGUIST List 15.2475
Tue Sep 7 2004
Diss: Comp Ling/Semantics: Old: 'The Semantic...'
Editor for this issue: Takako Matsui <takolinguistlist.org>
Directory
leonard_old, The Semantic Structure of Roget's, A Whole-Language Thesaurus
Message 1: The Semantic Structure of Roget's, A Whole-Language Thesaurus
Date: Mon, 6 Sep 2004 13:42:09 -0400 (EDT)
From: leonard_old <leonard_oldhotmail.com>
Subject: The Semantic Structure of Roget's, A Whole-Language Thesaurus
Institution: Indiana University
Program: School of Library and Information Science
Dissertation Status: Completed
Degree Date: 2003
Author: Leonard J Old
Dissertation Title: The Semantic Structure of Roget's, A
Whole-Language Thesaurus
Dissertation URL:
http://www.dcs.napier.ac.uk/~cs171/LJOld/publications_l_john_old.htm
Linguistic Field: Computational Linguistics, Semantics, Lexicography,
Cognitive Science
Dissertation Director 1: Charles H Davis
Dissertation Director 2: Ralf Shaw
Dissertation Abstract:
This study analyzed a database version of Roget's Thesaurus (Roget's
International Thesaurus, 3rd Edition, 1962) for frequency and
connectivity patterns among its words, senses, and cross-references in
order to identify its implicit conceptual structure. Descriptive
statistics, lattices, and information maps were used to identify
semantic patterns implicit in the data at both the local and global
levels of the structure.
The explicit organizational structure of the thesaurus is, at the
local level, sets of synonyms; and at the global level, a hierarchy of
concepts. In contrast, the implicit organization at the local level
has the characteristics of dictionary sense definitions (genus and
differentiae), and at the global level has the characteristics of a
small-world social network. The concept of genus and differentiae
provides a model that can be seen to account for the distribution of
polysemy within senses and across the Thesaurus. The small-world
network model can be seen to account for the incidence of semantic
hubs and authorities among cross-references, and conceptual and
semantic switching centers among senses and words in the Thesaurus.
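As an illustration only: the abstract does not say how hubs and
authorities were identified among cross-references. The sketch below
swaps in the well-known HITS iteration (Kleinberg) on a toy
cross-reference graph. The category names, the edge list XREFS, and
the function hits() are all hypothetical and are not taken from the
dissertation.

```python
def hits(edges, iterations=50):
    """Plain HITS power iteration over directed (source, target) edges.

    Assumption: a cross-reference from category A to category B is
    modeled as a directed edge A -> B. This is an illustrative choice,
    not the method used in the dissertation.
    """
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of hub scores of nodes pointing at it.
        new_auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]
        norm = sum(v * v for v in new_auth.values()) ** 0.5 or 1.0
        auth = {n: v / norm for n, v in new_auth.items()}
        # Hub score: sum of authority scores of nodes it points at.
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += auth[dst]
        norm = sum(v * v for v in new_hub.values()) ** 0.5 or 1.0
        hub = {n: v / norm for n, v in new_hub.items()}
    return hub, auth


# Hypothetical cross-references between thesaurus categories.
XREFS = [
    ("Agitation", "Motion"),
    ("Velocity", "Motion"),
    ("Travel", "Motion"),
    ("Agitation", "Excitement"),
    ("Velocity", "Haste"),
]

hub, auth = hits(XREFS)
top_authority = max(auth, key=auth.get)
```

In this toy graph the category that receives the most cross-references
ends up with the highest authority score, while categories that point
to several well-referenced targets score higher as hubs.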
Previous work on Roget's Thesaurus calculated chains and equivalence
relations algorithmically from senses and words. In that research it
was found that there is an inner semantic core of
most-densely-connected words and senses. This study expanded on that
research by identifying the semantic structure of the inner core and
relating it to the most polysemous words in Roget's.
While the largest thesaurus Categories relate to concrete objects such
as plants, animals, food, clothing and technology, the most-connected
words (in terms of numbers of senses and synonyms) were found to
relate to abstract concepts such as motion, agitation and what appear
to be concepts related to survival. This observation was supported by
frequency counts, and global cross-reference and word connectivity
patterns.
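
A minimal sketch of the kind of frequency count described above,
assuming the thesaurus is available as a mapping from sense
identifiers to the words listed under each sense. The toy data in
SENSES and the function names are hypothetical. The sketch only shows
how "numbers of senses and synonyms" per word could be tallied and is
not the dissertation's actual procedure.

```python
from collections import defaultdict

# Hypothetical toy data: sense id -> words sharing that sense.
SENSES = {
    "s1": ["run", "move", "go"],
    "s2": ["run", "stir", "shake"],
    "s3": ["go", "travel"],
    "s4": ["fern", "moss"],
    "s5": ["shake", "move"],
}


def word_connectivity(senses):
    """Return {word: (number of senses, number of distinct synonyms)}."""
    sense_count = defaultdict(int)
    synonyms = defaultdict(set)
    for words in senses.values():
        for w in words:
            sense_count[w] += 1  # polysemy: senses the word appears in
            synonyms[w].update(x for x in words if x != w)
    return {w: (sense_count[w], len(synonyms[w])) for w in sense_count}


def rank_words(senses):
    """Order words by sense count, then synonym count, then name."""
    stats = word_connectivity(senses)
    return sorted(stats, key=lambda w: (-stats[w][0], -stats[w][1], w))


stats = word_connectivity(SENSES)
ranking = rank_words(SENSES)
```

Here ranking lists the most-connected words first, so the head of the
list plays the role of the "most polysemous words" in the toy data.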