LINGUIST List 27.4332

Wed Oct 26 2016

Qs: Dictionary Wordlist from List of Generated Words

Editor for this issue: Kenneth Steimel <kenlinguistlist.org>


Date: 25-Oct-2016
From: Tim Stewart <timoteostewart1977gmail.com>
Subject: Dictionary Wordlist from List of Generated Words

I've written a specialized dictionary about a religious sociolect, and now I'm attempting to write an article about how I used a computer program to help generate the word list for it. I'm curious how often other lexicographers have taken a programmatic approach: generating a list of hypothetical forms and then testing those forms against corpora to determine which of them represent actual lexical items in use. (I believe it has been done at least once before; details below.) So far my efforts to dig up information about this topic in JSTOR and other academic databases have been fruitless. Maybe the LINGUIST List community can help!

My dictionary contains 350 lexical items, each of which is a blend of two (or more) names of Christian denominations. Examples of these items are bapticostal (Baptist + Pentecostal), fundagelical (fundamentalist + evangelical), and lutholic (Lutheran + Catholic). All the items are formed by blending syllables from a small set of about two dozen names of denominations (Anglican, Baptist, Catholic, Episcopal, etc.). Given the very narrow morphological and phonological criteria involved, it occurred to me to generate a list of possible items by programmatically combining parts of these denomination names. Then I searched corpora and online text databases for these hypothetical forms to determine which ones I could find evidence for. I don't have the exact results in front of me, but my computer program generated several thousand hypothetical forms, and my searches then turned up quotational evidence for around 100 terms. So the success rate was somewhere in the neighborhood of 2%.
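
The generate-and-test approach described above can be sketched roughly as follows. This is only an illustrative sketch, not the program actually used for the dictionary: the four-name list, the character-based (rather than syllable-based) splitting, the `min_part` threshold, and the function names `blends`, `generate_candidates`, and `attested` are all assumptions made for the example, and the "corpus" is just a string.

```python
import re
from itertools import permutations

# Illustrative subset only; the real input was about two dozen names.
NAMES = ["baptist", "pentecostal", "lutheran", "catholic"]


def blends(first, second, min_part=3):
    """Join each proper prefix of `first` to each proper suffix of `second`.

    Assumption: parts are cut at character positions, not at syllable
    boundaries, and each part keeps at least `min_part` letters.
    """
    for i in range(min_part, len(first)):
        for j in range(1, len(second) - min_part + 1):
            yield first[:i] + second[j:]


def generate_candidates(names):
    """Collect hypothetical blends for every ordered pair of names."""
    candidates = set()
    for a, b in permutations(names, 2):
        candidates.update(blends(a, b))
    # A blend that merely reproduces an input name is not a new form.
    return candidates - set(names)


def attested(candidates, corpus_text):
    """Return the candidates that occur as whole words in the corpus text."""
    tokens = set(re.findall(r"[a-z]+", corpus_text.lower()))
    return sorted(candidates & tokens)


candidates = generate_candidates(NAMES)
sample = "She grew up bapticostal; he calls himself Lutholic."
hits = attested(candidates, sample)
hit_rate = len(hits) / len(candidates)
print(hits, f"{hit_rate:.1%}")
```

Cutting at character positions overgenerates compared with a syllable-based split, which is acceptable here because the corpus check does the winnowing. A real run would replace the sample string with searches against actual corpora or text databases.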

My question: Have there been other dictionaries whose word list was (even partly) generated using a method of programmatically generating hypothetical forms and then winnowing the word list?

My understanding is that it has happened at least once before. In "A Krio-English Dictionary" (OUP, 1980), Fyle and Jones describe a method they used in the early 1970s to rapidly build up their Krio word list:

“A search [was made] for all known monosyllables in the language, using native-speaker competence. The method was simply to note all the combinations of consonant(s) + vowel + consonant(s) ((C^n)V(C^n)) allowable by the phonology of the language, and to record all those that turned out to be actual Krio monosyllables. This search yielded well over 1,000 monosyllables” (xii).
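
The enumeration step in the quoted method can be illustrated with a short sketch. The onset, vowel, and coda inventories below are placeholders invented for the example and are not a description of Krio phonology; treating the consonant slots as optional (the empty string) is also an assumption about how to read (C^n)V(C^n). In the quoted method the filtering was done by native-speaker judgment, which the code does not attempt to model.

```python
from itertools import product

# Placeholder inventories (hypothetical, NOT Krio phonology).
ONSETS = ["", "b", "k", "t", "st"]   # "" means no initial consonant
VOWELS = ["a", "i", "o"]
CODAS = ["", "n", "t"]               # "" means no final consonant


def monosyllables(onsets, vowels, codas):
    """List every onset + vowel + coda combination as a candidate form."""
    return ["".join(parts) for parts in product(onsets, vowels, codas)]


forms = monosyllables(ONSETS, VOWELS, CODAS)
print(len(forms), forms[:5])
```

Each candidate would then be accepted or rejected by a speaker, much as the blend candidates are checked against corpora.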

Any leads and suggestions are appreciated.

Tim Stewart
timdictionaryofchristianese.com


Linguistic Field(s): Lexicography

Page Updated: 26-Oct-2016