LINGUIST List 16.1436
Thu May 05 2005
FYI: 100 Million Corpus: registers, WordNet, synonyms
Editor for this issue: Ann Sawyer
Message 1: 100 Million Corpus: registers, WordNet, synonyms
From: Mark Davies <mark_davies@byu.edu>
Subject: 100 Million Corpus: registers, WordNet, synonyms
There is a free resource that may be of interest -
"Variation in English Words and Phrases" found at:
This is a new interface to the 100-million-word British National Corpus,
probably the best-known corpus of English. One can carry out the
following types of searches -- most of which are not possible with any
1. Quickly find the frequency of words and phrases in any combination of
more than 70 registers that you define (spoken, academic, poetry, medical,
tabloids, email, etc); e.g.:
-- the most common nouns in natural sciences texts, adjectives in
engineering texts, or verbs in medical texts
-- which collocates (co-occurring words) occur more in one register than
another; e.g. the collocates of [chair] in fiction vs. academic texts
-- variation in grammatical constructions across registers; e.g. the
relative frequency of the passive in academic vs. spoken texts, the
relative frequency of [whom] in all 70 registers, etc.
2. Compare synonyms and other semantically related words. One simple
search, for example, shows the most frequent nouns that appear with one of
[sheer], [complete], or [utter] (sheer nonsense, complete account, utter
dismay) but not with the others. Another would look for adjectives that
occur with [woman] but not with [man] or [child].
3. You can also input information from WordNet (a semantically organized
lexicon of English) directly into the search form. This allows you to find
the frequency and distribution of words with similar, more general, or more
specific meanings (e.g. the frequency of synonyms of [world], or the
frequency of more specific words for [jump]).
4. Search for words and phrases by exact word or phrase, wildcard or part
of speech, or combinations of these (e.g. *ly good/bad [n*]: really good
time, extremely bad idea).
5. Use anchors and targets for fuzzy matches (e.g. all nouns somewhere near
[paper], all adjectives near [woman], or all nouns near [spin]).
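
For readers who want a concrete picture of search types 2 and 4, here is a
minimal Python sketch. It is not the code behind the interface, which is not
described in this message. The toy tagged corpus, the simplified tags, and
the function names (slot_matches, search, exclusive_collocates) are all
assumptions made for illustration only.

```python
from collections import Counter
from fnmatch import fnmatchcase

# Toy POS-tagged corpus of (word, tag) pairs. The tags are simplified,
# hypothetical stand-ins, not the tagset the BNC actually uses.
TAGGED = [
    ("a", "det"), ("really", "adv"), ("good", "adj"), ("time", "n"),
    ("an", "det"), ("extremely", "adv"), ("bad", "adj"), ("idea", "n"),
    ("sheer", "adj"), ("nonsense", "n"),
    ("utter", "adj"), ("dismay", "n"),
    ("utter", "adj"), ("nonsense", "n"),
    ("complete", "adj"), ("account", "n"),
]


def slot_matches(slot, word, tag):
    """Test one pattern slot against one token.

    "[n*]" tests the part-of-speech tag; anything else is a slash-separated
    list of word wildcards, such as "good/bad" or "*ly".
    """
    if slot.startswith("[") and slot.endswith("]"):
        return fnmatchcase(tag, slot[1:-1])
    return any(fnmatchcase(word, alt) for alt in slot.split("/"))


def search(pattern, tagged):
    """Count every contiguous token sequence that matches the pattern."""
    slots = pattern.split()
    hits = Counter()
    for i in range(len(tagged) - len(slots) + 1):
        window = tagged[i:i + len(slots)]
        if all(slot_matches(s, w, t) for s, (w, t) in zip(slots, window)):
            hits[" ".join(w for w, _ in window)] += 1
    return hits


def exclusive_collocates(node, others, tagged, tag="n"):
    """Words with the given tag that directly follow `node` but never
    directly follow any word in `others`."""
    def following(target):
        return {
            w2
            for (w1, _), (w2, t2) in zip(tagged, tagged[1:])
            if w1 == target and t2.startswith(tag)
        }

    excluded = set()
    for other in others:
        excluded |= following(other)
    return following(node) - excluded
```

On this toy data, search("*ly good/bad [n*]", TAGGED) counts "really good
time" and "extremely bad idea" once each, and exclusive_collocates("utter",
["sheer", "complete"], TAGGED) returns {"dismay"}, because "nonsense" also
follows [sheer]. A real corpus would need a proper tagset and, for type 5
searches, a wider window than the immediately following word.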
Please feel free to email me with any questions that you might have.
Dept. Linguistics, Brigham Young University
Linguistic Field(s): Computational Linguistics