Today, 25 May 1994, at 14:25 Melbourne local time, I uploaded the file
GLOTTO02.ZIP into the /pc/incoming directory at

By the time you read this message it will probably have been moved to
directory /pc/linguistics, where it will replace GLOTTO01.ZIP.

One line description: Language classification and simulation
Suggested Garbo directory: /pc/linguistics
Replaces: /pc/linguistics/
Uploader name & email: Jacques B.M. Guy --
Author or company: Jacques B.M. Guy
Email address:
Surface address: Telecom Research Laboratories PO Box 249 Clayton 3168 Australia
Special requirement: nil
Shareware payment from private users: no
Shareware payment required from corporate users: no
Distribution limitations: nil
Demo: no
Nagware: no
Self-documenting: no
External documentation included: yes (220K)
Source included: no
Size: 173k compressed, 394k expanded

10-line description:

This package, consisting of six programs, twenty sample data files, and
two documentation files, lets you:
1. Classify languages from sample wordlists, with or without identifying
 cognates.
2. Classify languages from existing tables of cognate percentages.
3. Generate whole language families to test the validity and accuracy of
 any classification method relying on sample vocabulary lists or
 on proportions of shared cognates.

Detailed description:

Name           Size  Contents
GLOTTO.DOC   111028  Documentation file.
GLOTED.EXE    32832  Program for typing existing tables of cognate
                     percentages into computer files.
GLOTTREE.EXE  24144  Program for reconstructing the genealogical trees
                     of language families from tables of cognate
                     percentages.
GLOTLPP.EXE   22544  Program for computing tables of lexicophonological
                     percentages directly from wordlists. Those tables
                     can be used instead of cognate percentages.
GLOTPC.EXE    22384  Program for computing cognate percentages from files
                     of identified cognate groups.
GLOTMRG.EXE   12768  Program for merging wordlists, listing them not by
                     language, but by list item. Useful for identifying
                     cognates by hand.
GLOTSIM.EXE   35934  Program for simulating the evolution and
                     diversification of the vocabularies of language
                     families. Useful for testing the validity and
                     accuracy of various reconstruction methods.
GLOTTO.TXT   104202  "On Glottochronology and Lexicostatistics", XVth
                     Pacific Science Congress, Dunedin, New Zealand,
                     1983.
VANUATU.PC      239  Percentages of cognates shared by eight languages
                     of Vanuatu, formerly New Hebrides.
VANUATU.SIM     442  Description of the evolution and diversification
                     of a language family. Running GLOTSIM with
                     VANUATU.SIM as input generates a language family
                     with lexicostatistical properties closely
                     mimicking those of the real languages in
UTOAZTEC.PC    2487  Percentages of cognates shared by 32 Uto-Aztecan
                     languages (from W.R. Miller's "The Classification
                     of the Uto-Aztecan Languages Based on Lexical
                     Evidence" (IJAL vol.40, no.1, January 1984,
UTOAZTEC.SIM   2193  Description of the evolution and diversification
                     of a language family mimicking the
                     lexicostatistical properties of the languages of
                     UTOAZTEC.PC.
DANISH.VOC     1291  Sixteen languages each represented by a 200-item
NORWEGIA.VOC   1277  wordlist, for testing and experimenting.
SWEDISH.VOC    1309  Selected from Peter Bergman's "The Concise
FRENCH.VOC     1412  Dictionary of 26 Languages in Simultaneous
ITALIAN.VOC    1507  Translation", Signet Books, 1968.

 What is New in this Version

1. I have finally located a PC with a monochrome monitor and found that
 program GLOTED did not work at all on such a PC. I have corrected the
 error and it now works.

2. GLOTED has commands such as Alt-X for "exit", but if you are using
 a foreign-language keyboard you might well find that pressing Alt-X
 does nothing of the kind. I have added a command (F9) which causes
 GLOTED to ask you a few questions so that it may adapt to your
 keyboard.

3. You no longer have to identify cognates to classify languages from
 wordlists. Program GLOTLPP computes similarity measures directly from
 wordlists, which can be used instead of cognate percentages. The
 process is much faster than cognate recognition. On a 386DX-33
 without a math co-processor GLOTLPP took 20 seconds to process the
 seventeen 200-item wordlists provided as examples. On a 486DX-50 it
 took just under 9 seconds. Computing time is, very roughly,
 proportional to the number of items in the sample wordlist and to the
 square of the number of languages. The values computed by GLOTLPP
 being measures of phonological as well as lexical similarity, I would
 argue that, theoretically, they ought to give better classifications
 than cognate percentages proper.
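
 The rough scaling rule above can be turned into a back-of-the-envelope
 estimate. The sketch below is not part of GLOTTO; it simply scales the
 386DX-33 timing quoted above (20 seconds for seventeen 200-item
 wordlists) by the number of items and the square of the number of
 languages. The function name and its parameters are illustrative
 assumptions, and the result is only as good as the "very roughly"
 in the text.

```python
def estimated_seconds(n_languages, n_items,
                      ref_seconds=20.0, ref_languages=17, ref_items=200):
    """Scale a reference timing by items x languages**2.

    The reference values default to the 386DX-33 figures quoted in the
    announcement; they are an assumption for any other machine.
    """
    scale = (n_items / ref_items) * (n_languages / ref_languages) ** 2
    return ref_seconds * scale


if __name__ == "__main__":
    # Doubling the number of languages roughly quadruples the time.
    print(estimated_seconds(34, 200))
    # Doubling the number of items roughly doubles the time.
    print(estimated_seconds(17, 400))
```

 To estimate for another machine, pass its own measured timing as
 ref_seconds instead of the default.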

4. In the first version of GLOTTO, if you wanted to classify languages
 from wordlists in the traditional way, you had to type those
 wordlists into computer files, merge them by item (using program
 GLOTMRG), insert your identification of cognates by hand, and
 finally run GLOTPC on that file to produce cognate percentages.

 This had two great disadvantages:

 a. Since GLOTTO allows handling up to 180 wordlists of up to 2000
 items each, the resulting file could be too large for many editors
 to handle. GLOTMRG now produces not one single file, but as many
 as necessary so that none is larger than 64K. GLOTPC has
 been modified to accept the new output from GLOTMRG.

 b. Typing wordlists into computer files is very time-consuming, and
 you might have preferred to have only to type in cognate groups,
 working directly from printed or handwritten wordlists. You can
 now do so. GLOTPC has been modified to accept this type of input
 data as well.
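
 For readers unfamiliar with the idea, here is a minimal sketch of how
 a table of cognate percentages can be derived from cognate groups. It
 does not reproduce GLOTPC's input format or its exact method, neither
 of which is described here. The data layout (a dictionary mapping each
 list item to the cognate-group label of each language) and the sample
 languages A, B and C are made up for illustration. Items missing from
 either language of a pair are left out of that pair's count, which is
 one common convention and an assumption on my part.

```python
from itertools import combinations


def cognate_percentages(groups):
    """Return {(lang_a, lang_b): percentage of shared cognates}.

    groups maps item -> {language: cognate-group label}. Two words are
    counted as cognate when they carry the same label for the same item.
    """
    languages = sorted({lang for by_lang in groups.values() for lang in by_lang})
    result = {}
    for a, b in combinations(languages, 2):
        compared = shared = 0
        for by_lang in groups.values():
            if a in by_lang and b in by_lang:  # skip items missing in either list
                compared += 1
                if by_lang[a] == by_lang[b]:
                    shared += 1
        result[(a, b)] = 100.0 * shared / compared if compared else None
    return result


# Hypothetical cognate judgements for three invented languages.
sample = {
    "hand":  {"A": 1, "B": 1, "C": 2},
    "eye":   {"A": 1, "B": 1, "C": 1},
    "stone": {"A": 1, "B": 2, "C": 2},
    "water": {"A": 1, "B": 2},
}

if __name__ == "__main__":
    for pair, pct in cognate_percentages(sample).items():
        print(pair, pct)
```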

5. I have added an option to GLOTSIM which lets you specify unequal
 retention rates for lexical items, so that you may investigate
 their effects.
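
 To give an idea of what such an experiment looks like, here is a toy
 simulation. It is not GLOTSIM and does not use its input format. It
 assumes a very simple model: two daughter languages inherit the same
 wordlist, and at every time step each item is replaced by a brand-new
 word with probability one minus that item's retention rate. All
 names, the step-based time scale and the example rates are
 hypothetical.

```python
import itertools
import random


def evolve(vocab, retention, steps, rng, fresh_ids):
    """Replace item i with a new word with probability 1 - retention[i] per step."""
    vocab = list(vocab)
    for _ in range(steps):
        for i, rate in enumerate(retention):
            if rng.random() > rate:
                vocab[i] = next(fresh_ids)  # a new word never matches an old one
    return vocab


def shared_percentage(a, b):
    """Percentage of list items for which both languages kept the same word."""
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def simulate(retention, steps, seed=0):
    """Split one ancestor into two daughters and compare their wordlists."""
    rng = random.Random(seed)
    ancestor = list(range(len(retention)))
    fresh_ids = itertools.count(len(retention))
    first = evolve(ancestor, retention, steps, rng, fresh_ids)
    second = evolve(ancestor, retention, steps, rng, fresh_ids)
    return shared_percentage(first, second)


if __name__ == "__main__":
    equal = [0.80] * 200                    # every item retained at the same rate
    unequal = [0.95] * 100 + [0.65] * 100   # same mean rate, unequal items
    print("equal rates:  ", simulate(equal, steps=5))
    print("unequal rates:", simulate(unequal, steps=5))
```

 Running it with several seeds and step counts shows how far the
 shared percentages under unequal rates drift from those obtained
 under a single uniform rate.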

6. GLOTSIM now records the vocabularies of the languages it creates
 in formats compatible with the other programs in the package (GLOTMRG,

7. I have added to this package, in file GLOTTO.TXT, the text of my
 paper "On Glottochronology and Lexicostatistics" which was presented
 in 1983 at the XVth Pacific Science Congress. It critically discusses
 the main contributions to the topic from Swadesh 1950 to
 Blust 1981. Amongst other things it shows how Lees (1953)
 misinterpreted his data as evidence for a universal constant of
 lexical retention rate -- when it was evidence to the contrary, and
 how Blust's findings on the retention rates of Austronesian languages
 (1981) were corroborated by Dyen's independent observations presented
 on the same occasion (Third International Conference on Austronesian
 Linguistics, Denpasar, Indonesia). What prompted me to include it in
 this package was its conclusion:

 In which light we can only conclude that the present
 study is unlikely to have much impact, and that misuses
 of lexicostatistical data will continue as in the past
 for many years to come, perhaps even increasing with the
 easier and easier availability of cheap, high-speed
 computational facilities.

 Now, ten years later, not only have I indeed observed a resurgence
 of interest in glottochronology, but the model, with all its false
 assumptions, is even being reinvented in biology: viz. the late Allan
 Wilson's "biological clock" which is nothing but the notion of a
 universal, constant rate of change translated into genetics, even
 though it has long been observed by geneticists to be contrary to

8. GLOTPC has been enhanced to let you compute pseudo-cognate
 percentages from biological data, should you want to try
 reconstructing genetic trees without resorting to the "biological
 clock".

9. I have corrected an error in GLOTTREE, which sometimes caused a
 branch showing no replacements to be malformed.
