Review of Ethnologue

Reviewer: Harald Hammarström
Book Title: Ethnologue
Publisher: SIL International Publications
Linguistic Field(s): Language Documentation
Anthropological Linguistics
Book Announcement: 16.2637

Date: Mon, 5 Sep 2005 18:40:01 +0200 (MEST)
From: Harald Hammarström <harald2@cs.chalmers.se>
Subject: Ethnologue: Languages of the World, 15th edition

EDITOR: Gordon, Raymond J.
TITLE: Ethnologue
SUBTITLE: Languages of the World
PUBLISHER: SIL International
YEAR: 2005

Harald Hammarström, Department of Computing Science, Chalmers University
of Technology


The Ethnologue (2005) is the 15th edition of the SIL International effort
to gather a catalogue of all the living languages of the world. The
hardbound 1272-page volume is organized as follows:
Introduction 7-14
Statistical Summaries 15-36
Languages of the World 37-648
References 649-672
Language Maps 673-888
Indexes 889-1272

I will concentrate on the bulk of the work, i.e. the language entries and
information about them in the introduction. Inasmuch as they are correct
there is not much for a linguist to say about the statistical summaries,
maps and indexes (except that the maps, in colour, look great and will be
very useful). The first edition of the Ethnologue came out in 1951 and had
information on 46 languages. This 15th edition sports 7299 language
entries and the system of (lowercase) three-letter identifiers for each
language entry is now a draft ISO/DIS 639-3 standard. All the information
in the book version is also available free of charge on the web at
http://www.ethnologue.com, which greatly facilitates access and
searchability. SIL deserve a huge thank you for posting the web edition,
which will doubtless also increase the outreach of Ethnologue.


Out of the 7299 entries, 6912 represent living languages. "Living"
means "definitely having native speakers" so e.g. Latin is not counted as
living and there are another 27 'second language only'/'no data' entries.
Most of the remaining 360 extinct languages represent languages which have
died relatively recently (say within the last 100 years). Ethnologue does
not aim to catalogue all dead languages, so even well-attested ones like
Timucua (Granberry 1993) or Akkadian (Ungnad 1964) are missing. However, a
selection of ancient extinct languages is still listed (such as Ge'ez and
Coptic), perhaps those which have a bible translation. Likewise, there is
no aim towards completeness as to relatively recently extinct languages
either, whether poorly attested or well-attested. Consequently one can
find literally hundreds of extinct New World languages and language
families in the lists of (Campbell 1997; Landar 1996; Adelaar 2004;
Kaufman 1994; Fabre 2005; Garza Cuarón and Lastra 1991) that are not in
Ethnologue. As plenty of extinct languages are not listed, their
respective family trees silently appear without these branches (e.g.
Ethnologue's Semitic is listed without its East Semitic branch consisting
of the long dead Eblaite and Akkadian (Faber 1997)).

The 6912 living languages include 124 living sign languages, 1 living
artificial language and 5 living pidgins (namely hmo, chn, nef, lir, cpi),
despite the standard definition of pidgins as having no native speakers
(Bakker 2002, p. 7). Since Ethnologue admits (p. 13) that the inventory of
pidgins-jargons-special languages, e.g. sorcerers' languages, is not
complete there may well exist more of this kind.

Ethnologue commendably includes known unknown languages, i.e. where there
are speakers known to exist who presumably speak something but, since they
are not in contact, we don't know what. Examples of these are Sentinelese
of the Andamans (Abbi 2004; Shashi 1994), Uru-Pa-In (Angenot-de-Lima 2002,
p. 38) of Brazil, Yarí (Adelaar 2004, p. 624) of Colombia. Carabayo seems
to be a case where the group's 3 houses are known from airplane
observations. Another five Brazilian languages I know of only from the
Ethnologue: Himarimã, Iapama, Karahawyana, Kohoroxitari and Papavô.

1.1 UNLISTED LANGUAGES. Four quite solidly extant Brazilian languages are
missing: Máku is still reported to have 1 speaker (Rodrigues 2005; Seki
1999; Migliazza 1985, pp. 37, 280, 52). Kwazá and Aikanã are excellently
discussed in the introduction of van der Voort (2000). Either the isolate
Kanoê is not the Tupi Kanoé [kxo], or the [kxo] entry is quite erroneous;
see p. 23-24 of Bacelar (2004).

Adelaar (2004, p. 164) mentions another three living South American languages
that are missing: Pisamira, Nonuya and Yurí. He also sheds light on a
couple of languages which can be presumed extinct on good grounds but
which do not have entries (as living or dead) in the Ethnologue: Opón-
Carare (5 speakers in 1944) (p. 114-115), Mochica (p. 172), some Tucanoan
e.g. Coretú and Icaguate (p. 621) as well as Culli/Culle (p. 172-173).

In 2005, Roger Blench and associates uncovered data from the Dogon
Plateau of West Africa that prove the existence of several "new"
languages, including one of unknown affiliation; manuscript sources (cited
with permission) are available on his Dogon and other webpages
http://homepage.ntlworld.com/roger_blench/Dogon/. Without doubt, there
will be more "discoveries" in the future on the languages in Northern
Nigeria and adjacent regions. Likewise, two Australian mixed languages
have been brought to light (McConvell and Meakins 2005; O'Shannessy 2005)
too recently to make it into Ethnologue.

1.2 SPURIOUS LANGUAGES. In a work of this size it's hard to completely
exclude languages whose existence is really unsupported, such as Mutús of
the 14th edition (which is now removed, see also Adelaar (2004, p. 125)).

Since Ethnologue does not systematically include attested extinct
languages, the extinct unclassified Colombian languages Cagua, Chipiaje,
Coxima and Natagaima look suspicious, especially since they are not mentioned
by Adelaar or sources therein. Likewise, extinct unclassified Monimbo
of Nicaragua is not to be found in Meso-American sourcebooks. These cases
need of course not be spurious but their inclusion is highly arbitrary in
the masses of extinct unclassified, better documented, South American
languages that could have been taken up.

Pankararú [paz] and Pankararé [pax] are treated by most, e.g. Fabre
(2005), as one extinct language isolate whereas Ethnologue has two
entries, one extinct language isolate and one extinct unclassified.

Yauma is given in Ethnologue as an unclassified language of Angola.
Nothing else suggests that this should be anything more exotic than a
regular Bantu language. In fact it is explicitly listed as a Lucazi
dialect in Fleisch (2000, p. 1).

For extinct languages which are really living or vice versa see the
section below on speaker population. For languages that are better treated
as dialects see the section below on languages and dialects.

2.1 IN THEORY. The thoughts behind the Ethnologue language vs. dialect
divisions are so important that I will quote the section (p. 8):

"Not all scholars share the same set of criteria for what constitutes
a 'language' and what features define a 'dialect.' The Ethnologue applies
the following three basic criteria:
* Two related varieties are normally considered varieties of the same
language if speakers of each variety have inherent understanding of the
other variety at a functional level (that is, can understand based on
knowledge of their own variety without needing to learn the other variety).
* Where spoken intelligibility between varieties is marginal, the
existence of a common literature or of a common ethnolinguistic identity
with a central variety that both understand can be a strong indicator that
they should nevertheless be considered varieties of the same language.
* Where there is enough intelligibility between varieties to enable
communication, the existence of well-established distinct ethnolinguistic
identities can be a strong indicator that they should nevertheless be
considered to be different languages."

The problem with the second and third is that we don't know what a "well-
established ethnolinguistic identity" is. What would have been in order
is: a few examples, a systematic indication of when which criteria have
been utilized or else a rough indication of frequency of application.
Unclear cases are noted sporadically in the comments to the individual
entries in question but e.g. the latter criterion has obviously been
applied without indication in cases like the division of Serbian-Croatian-
Bosnian and Gitxsan-Nisga'a. Anyway, the bottom line is that little is won
by trying to break intelligibility ties with criteria that introduce new

The usage of the second criterion has some peculiar implications. If
accepted, the language/dialect status of two languages A and B can no
longer be established solely by inspection of all properties of A and B,
but depends on the existence of a third variety C. For example, two Vulgar
Latin dialects are one language as long as there is a Latin literature,
but if the speakers of one dialect become illiterate or we eradicate all
Latin writing, then they are perhaps two languages. Moreover, the number
of languages of three such varieties depends on the distribution of the
three over people. If they are distributed over two people, speaking A,C
and B,C respectively then the three are one and the same language. If they
are distributed over three people such that they all speak only one each,
then the number of languages is 1-3 (arguably 2). Note that there is no
inconsistency in the latter example. It is perfectly possible for A and B
to understand C, but not produce it, so that A and B could not communicate
with each other alone. As a native Swedish speaker I can understand a lot
of Danish, but I can't produce credible Danish.

Finally, in their present formulation, the application of the first and
second criteria may lead to inconsistencies. Consider four varieties
A,B,C,D such that they all have the same ethnolinguistic identity (e.g.
Kurds), B and C have independent literatures (e.g. Sorani and Kurmanji
Kurdish), B and C are mutually intelligible (according to some, Sorani and
Kurmanji Kurdish are). Finally let A be mutually intelligible to B and
marginally to C, and likewise D to be mutually intelligible to C and
marginally to B. (It should be possible to find such Kurdish dialects.)
Now, A and D are not mutually intelligible at all and therefore, by
application of the first criterion, at least those two are separate
languages. The second criterion does not apply because, between A and D,
intelligibility is not even marginal. However, if we start by using the
second criterion on A,B,C and on B,C,D, all four must be one and the same

A popular belief holds that one cannot count the number of languages by
the mutual intelligibility criterion even if one sets an arbitrary
definition for when mutual intelligibility holds (say 85% shared
vocabulary) because of inconsistencies when applied to dialect continua.
This view is premature: it is perfectly possible to do this in an
intuitive way without any inconsistencies (Hammarström 2005).
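The difficulty that the popular belief points to can be made concrete with a small sketch. It assumes a hypothetical chain of four varieties in which only neighbours are mutually intelligible, and it treats intelligibility as a plain yes/no relation. It is not the counting method of Hammarström (2005); it only shows why naive chaining and pairwise judgments disagree on a continuum.

```python
from itertools import combinations

# Hypothetical dialect chain: neighbours understand each other, the ends do not.
varieties = ["A", "B", "C", "D"]
intelligible = {frozenset(p) for p in [("A", "B"), ("B", "C"), ("C", "D")]}


def same_language_pairwise(x, y):
    """Mutual intelligibility criterion applied to one pair in isolation."""
    return frozenset((x, y)) in intelligible


def count_by_chaining(names, links):
    """Count the groups obtained by merging every intelligible pair."""
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for link in links:
        x, y = tuple(link)
        parent[find(x)] = find(y)
    return len({find(n) for n in names})


groups = count_by_chaining(varieties, intelligible)
conflicts = [
    (x, y)
    for x, y in combinations(varieties, 2)
    if not same_language_pairwise(x, y)
]
print(groups)     # number of "languages" when links are chained
print(conflicts)  # pairs inside the chain that fail the pairwise test
```

Chaining yields a single group, yet A and D fail the pairwise test, which is exactly the tension that any consistent counting procedure has to resolve.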

2.2 IN PRACTICE. Many authors have noted the tendency of Ethnologue to be
extreme 'splitters' i.e. to prefer to split speech varieties into distinct
languages whenever possible. The Middle American expert Kaufman (1994, p. 33)
writes condescendingly of the 11th edition of Ethnologue (Grimes 1988)
suggesting the ratio 1:2 between 'reality' and Ethnologue (by argumentum
ad his own auctoritatem). In a more well-argued manner, traversing
handbooks area by area, the Africanist Maho (2004) finds 1441 living
African languages versus 2058 in the 14th edition of Ethnologue (Grimes
2000); this 15th edition has 2092.

However, as Maho notes (p. 12), maybe the discrepancy is due rather to the
handbooks and overviews being lumpers. For instance, Ethnologue splits
into 31 English-based Creoles, 46 Quechua, 69 Mayan languages, 21 Gbe, 35
Arabic (+3 Arabic-based Creoles), 25 Naga, 26 Berber, 21 Manding, 9 Fulani
but only one Hausa language -- whereas we are used to reading about these
in one-liners rather than as full-fledged families.

Since this is quite an important question, I have made a dive into the
specialist literature to compare the Ethnologue judgments. I think it's
fair to say that, most of the time, Ethnologue is consistent with the
specialists even where their sources must be independent. A lot of times
Ethnologue counts more languages than the specialists -- sometimes
wrongly, sometimes out of due caution. Less often, but still often, the
specialists count more mutually unintelligible varieties than Ethnologue.
Examples follow:

2.2.1 Ethnologue Undercounts.
* Lauje [law] should be split in two (Himmelmann 2001, p. 21) ".. both
Lauje and Ampibabo-Lauje speakers do not consider their speech varieties
mutually intelligible".
* Lenca is (or was) two languages and Xinca was more like four languages
(Campbell 1997, p. 166-167).
* Kilii Boni may be split from Boni (Heine 1982, p. 12).
* Kamona may be split from Bijogo (Segerer 2002, p. 7) as Ethnologue
* Befang could be split into Bangui and Modele (Boum 1981, p. 19) on
decent grounds.
* Bade [bde] could be split following Schuh: "Bade is dialectally diverse,
with some dialects differing enough from each other that one is tempted to
call the distinct languages" (Schuh 2005, p. 1).
* Panoan Katukína and Shanenawa are better treated as separate languages
(Vieira Cândido 2004, p. 13).
* Lemiting was distinct from Kiput (Blust 2003, p. 1).
* Mambila could be split into more than 2 (Connell 2000, p. 202).
* Tetun [tet] consists of two ".. virtually mutually unintelligible" (van
Engelenhoven and van Klinken 2005, p. 735) dialects. See also (van Klinken
1999; Williams-van Klinken, Hajek, and Nordlinger 2002, p. 3, 6) and
section 1.1 of (Williams-van Klinken, Hajek, and Nordlinger 2001).

2.2.2 Ethnologue Overcounts
* Kanuri/Kanembu is 1 language rather than 4 (Cyffer 1998, p. 31).
* Turkana is 1 language rather than 4 (Dimmendaal 1983, p. 2).
* Batak should be 2 perhaps 3 languages rather than 7 (Woollams 2005, p.
* Adang and Hamap are the same: "Adang speakers and Hamap speakers always
understand each other, when speaking their languages, though there are a
few differences (mainly phonological) between the two" (Haan 2001, p. 5).
* Many Australian languages, e.g. [piu], [pjt] and [kdd], are mutually intelligible
varieties (Dixon 2002, p. 5).
* Ibani [iby], Okrika [okr] and Kalabari [ijn] Ijo should be one language
(Williamson 1969, p. 2) "These three dialects are ... mutually
intelligible", instead of the confused [okr] and [ijn] as two separate
languages of an [East, Ibani-Okrika-Kalabari] branch, and [iby] of an
[Eastern, Northeastern, Ibani-Okrika-Kalabari] branch.
* Cacua and Nukak as well as Huoda and Yuhup may perhaps be merged
(Andrade Martins 2004, p. 7).
* Perhaps [nyn], [nyo] are the same [ttj] the same (Rubongoya 1999, p.
* The division of Mumuye Proper into 5 languages is not supported by
(Shimizu 1979, p. 11-19) but then Ethnologue lists several varieties that
are not mentioned by Shimizu.
* In MacKay (1999, p. 12) 4 rather than 8 Totonac languages are recognized.
* The Bankon/Barombi split is ok but giving them as one language would
also have been ok (Atindogbé 1996).
* Northern (in Burkina Faso) and Southern (in Ghana) Dagaare are the same
according to (Naden 1988, p. 42) "can understand each other without undue
difficulty" and is not contradicted by the more recent source (Bodomo
* Furthermore in Ghana, "there is mutual intelligibility between ..."
(Dolphyne and Kropp Dakubu 1988, p. 54) Ahanta and Nzema and Nzema and
Anyi. There is (Dolphyne and Kropp Dakubu 1988, p. 77) "considerable
amount of mutual intelligibility" between Nchumbulu and Dwang.
* As for the notoriously difficult !Kung dialect continuum, Maho (1998, p.
113) states that "They all speak ... mutually intelligible forms of
speech". Both Maho's book and the volume (Haacke and Elderkin 1997) in
which the dialect study by Snyman appealed to Maho appear in Ethnologue's
* According to mutual intelligibility, there are only 3 Miao, 5 Bunu and 2
She languages (Bradley and Harlow 1994, p. 166).
* The split of the Makhuwa languages looks to be out of due caution
(Kisseberth 2003).
* The split of Bima-Sumba languages looks to be out of due caution (Klamer
2005, p. 709).
* Lampungic is better analyzed as 3 rather than 9 languages (Anderbeck
* Peripheral and Khalkh Mongolian are intelligible (Svantesson 2003;
Janhunen 2003a). The split into 3 Buryat languages must be due to partly
extralinguistic criteria (Skribnik 2003).

2.2.3 Ethnologue is in Harmony
* Good resolution of vexed Pashai dialect situation [aae, glh, psh, psi]
(Bashir 2003, p. 826).
* Zulgwa-Minew-Gemzek is one and the same [gnd], as judged by Barreteau (1984,
p. 170).
* Mekeo really is three languages, especially when one takes cultural
differences into account, although speakers can learn to understand the
other dialects in less than a week's time (Jones 1998, p. 19).
* The division of Eskimo accords well with (Fortescue 1984; Miyaoka 1996).
* 4 Kham languages is a decent interpretation of (Watters 2002, p. 12-13).
* Nyamwanga-Iwa and Lungu-Mambwe are recognized in accordance with (Walsh
and Swilla 2000).
* The Chinese Mongolian dialect situation is well-handled (Janhunen 2003b).
* The heavy division of Banda receives support from detailed study of
(Cloarec-Heiss 2000).
* Two Araucanian languages is entirely accurate (Smeets 1989, p. 9-10).
* 2 Slave languages is not contradicted by section 2.3 in (Rice 1989).
* 8 Songhai is not a bad idea (Tersis 1972; Zima 1994; Heath 1999, p.
* Ekoti is justly a separate language (Schadeberg and Mucanheia 2000, p.
* Kilivila is fine (Lawton 1993, p. 6).
* Treating Matses [mcf] and Matis [mpg] as two separate languages is good
(Fleck 2003).
* Ngoe languages are consistent with (Hedinger 1987, p. 27).
* 2 Balanta languages is what (Wilson 1961, p. 139) postulates.
* 21 Mano & Dan languages is not contradicted by (Becker-Donner 1965, p.
* 21 Gbe languages may be too much but not impossible (Lefebvre and
Brousseau 2002; Capo 1990, p. 1-3,62).
* 1 May Brat language is optimal (Philomena Hedwig 1999).
* The Moken/Moklen division agrees with (Larish 2005, p. 514).
* ...

Indexable by the three-letter identification code, language entries have
the following main fields: Primary Name, Alternate Names, Speaker
Population, Classification, and Location. I regard them as primary since
they seem to be systematically indicated. The meaning and accuracy of the
data in these fields is scrutinized below.

In addition, but not with complete systematicity, the following pieces of
language information are usually given: dialect names, intelligibility
degree/lexical similarity with some neighbouring language(s), language
function(s) (e.g. official), language domain (e.g. liturgical), script,
typological remarks (e.g. basic word order), publications and use in media
(usually means presence of bible translation), status (e.g. extinct,
second language only, jargon, language of herb doctors) and other remarks.
Moreover, further information about the speakers is also usually supplied,
such as degree of bilingualism, literacy, religion, attitude to language,
means of subsistence (e.g. hunter-gatherers) and geo-ecological
environment (e.g. rain forest).

This additional data is welcome to the reader but will not be reviewed
here because it is not clear what the intended aim of coverage is. For
instance, my computer calculations show that 2675 entries have religion
annotated, 3730 language development, 4108 language use and 1097 basic word
order, with the six basic orders distributed as follows:
SOV 558
SVO 322
VSO 133
VOS 24
OSV 12
OVS 10
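To put these counts in proportion, the following sketch computes the share of each order and the overall annotation coverage. All numbers are copied from the review. The variable names are illustrative only. The six listed orders sum to 1059 rather than 1097, and the review does not break down the remainder, so shares are taken over the listed orders.

```python
# Counts copied from the list above.
word_order_counts = {
    "SOV": 558, "SVO": 322, "VSO": 133, "VOS": 24, "OSV": 12, "OVS": 10,
}
annotated_total = 1097  # entries with a basic word order annotation
total_entries = 7299    # all entries in the 15th edition

listed = sum(word_order_counts.values())
shares = {order: n / listed for order, n in word_order_counts.items()}
coverage = annotated_total / total_entries

for order, share in shares.items():
    print(f"{order}: {share:.1%}")
print(f"listed orders: {listed} of {annotated_total} annotated entries")
print(f"entries with word order annotated: {coverage:.1%}")
```

Only about 15% of all entries carry a word-order annotation, which illustrates how partial this field is.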

However, these annotations are frequently partial and/or unsystematic and
have little to do with availability of data. For instance, Tundra Yukaghir
[ykg] is marked 'nontonal', whereas non-tonal Slovak [slk] and the 8-tone
language Iau [tmu] (Bateman 1986) have no information about tone or other
typological data.

3.1 PRIMARY AND ALTERNATE NAMES. Each entry is given a primary name which
is usually an established name from the literature. This hardly ever
coincides with the speakers' own name for the language (for an idea of the
discrepancy, check e.g. Appleyard in Irvine (1994)), but Ethnologue aims to
set the primary name accordingly if there is a strong known desire (p. 10)
from the speakers to be rid of an entrenched foreign or offensive name.
Therefore, the Ethnologue has e.g. Tohono O'odham as primary name instead
of Papago (Zepeda 1983), Shabo for Mikeyir (Teferra 1991, p. 371) and,
more observantly than others, Nivaclé for Chulupí, Ashlushlay etc. But one
has missed e.g. Nuuchanulth for Nootka (Nakayama 2001, p. 2) and Nivkh for
Gilyak (Panfilov 1965).

The alternate names comprise some genuinely different names (often familiar
from the literature) and, as is well-known to all ethnolinguists, a multitude of
franco-, anglo-, hispanico-, portugo-phone spelling variants with or
without diacritics. In fact, the 7299 entries yield 39418 names in total,
of which about 45% are spelling variants. (This figure is from a rather
crude computerized statistical analysis.)
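A crude analysis of this kind might look like the sketch below. It is an assumed reconstruction, not the procedure actually used for the 45% figure: names are reduced to a key by dropping diacritics, case, punctuation and doubled letters, and every name that shares a key with an earlier name is counted as a spelling variant. The sample names are taken from this review.

```python
import unicodedata
from collections import defaultdict


def spelling_key(name):
    """Reduce a name to a rough key: no diacritics, case, punctuation or doubled letters."""
    decomposed = unicodedata.normalize("NFKD", name)
    letters = [
        c.lower()
        for c in decomposed
        if c.isalpha() and not unicodedata.combining(c)
    ]
    collapsed = []
    for c in letters:
        if not collapsed or collapsed[-1] != c:
            collapsed.append(c)
    return "".join(collapsed)


def count_variants(names):
    """Return the number of names that duplicate an earlier key, plus the groups."""
    groups = defaultdict(list)
    for name in names:
        groups[spelling_key(name)].append(name)
    variants = sum(len(group) - 1 for group in groups.values())
    return variants, dict(groups)


sample = ["Sediq", "Seediq", "Wasu", "Wassú", "Jiwarli", "Djiwarli", "Phun", "Hpon"]
variants, groups = count_variants(sample)
print(variants)
print(groups)
```

The sketch pairs Sediq with Seediq and Wasu with Wassú but misses Jiwarli/Djiwarli and Phun/Hpon, which shows why such a figure can only be a rough estimate.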

In numerous cases, neither the primary nor alternate names coincides with
the name used in the most recent/most authoritative piece of literature,
e.g. Sediq vs. Seediq (Tsukida 2005, p. 291), Phun vs. Hpon (Bradley and
Harlow 1994, p. 179), Qwarenya vs. Qwara (Appleyard 1998), Jiwarli vs.
Djiwarli (Austin 2001). When searching, the user should be prepared to
try spelling variants with great persistence and creativity, and I have
tried to exercise extra care that none of the issues raised in this review
are mistakes in this respect.

3.2 SPEAKER POPULATION. Speaker populations are generally given with a
source, which may be a publication, person, organization or governmental
institution, as well as year of source. Some 750 entries do not give a
source-year pair at all, of which 274 are 'Extinct' and 238 (no overlap
with 'Extinct') are marked 'No estimate available' (a statement for which
one arguably does not need a source). Sometimes the year of the source is
that of the publication (1998) rather than the survey (1991) (Maho 1998),
sometimes the year is that of the survey (1995) rather than the
publication (2001) (Berthelette and Berthelette 2001) and sometimes both
are given e.g. eki 5,000 (1988, in Crozier and Blench 1992:36).

For the entries which have source years, the distribution of entries over
years is as follows (average 1993.01):

1922 1
1925 1
1931 1
1934 1
1954 1
1956 1
1959 1
1961 7
1962 6
1963 3
1965 1
1966 1
1967 1
1969 8
1970 9
1971 33
1972 24
1973 53
1974 1
1975 29
1976 20
1977 87
1978 38
1979 23
1980 67
1981 413
1982 162
1983 176
1984 49
1985 48
1986 111
1987 237
1988 70
1989 168
1990 383
1991 384
1992 98
1993 288
1994 172
1995 317
1996 143
1997 204
1998 297
1999 250
2000 1181
2001 243
2002 310
2003 337
2004 84
TOTAL 6543
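
The summary figures quoted around this table can be checked directly from it. The sketch below copies the year counts from the table and recomputes the total, the weighted average year and the number of entries with sources from 1975 or earlier. Variable names are illustrative.

```python
# Source year -> number of entries, copied from the table above.
source_years = {
    1922: 1, 1925: 1, 1931: 1, 1934: 1, 1954: 1, 1956: 1, 1959: 1,
    1961: 7, 1962: 6, 1963: 3, 1965: 1, 1966: 1, 1967: 1, 1969: 8,
    1970: 9, 1971: 33, 1972: 24, 1973: 53, 1974: 1, 1975: 29, 1976: 20,
    1977: 87, 1978: 38, 1979: 23, 1980: 67, 1981: 413, 1982: 162,
    1983: 176, 1984: 49, 1985: 48, 1986: 111, 1987: 237, 1988: 70,
    1989: 168, 1990: 383, 1991: 384, 1992: 98, 1993: 288, 1994: 172,
    1995: 317, 1996: 143, 1997: 204, 1998: 297, 1999: 250, 2000: 1181,
    2001: 243, 2002: 310, 2003: 337, 2004: 84,
}

total = sum(source_years.values())
mean_year = sum(year * n for year, n in source_years.items()) / total
old_entries = sum(n for year, n in source_years.items() if year <= 1975)

print(total)                # 6543
print(round(mean_year, 2))  # 1993.01
print(old_entries)          # 183
```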

Of those 183 entries with sources from 1975 and older only a handful
represent extinct languages. There is a certain persistent antiquity,
which is more revealing when we look at who the sources are. The sources
which account for 100 or more entries are:

SIL 1816
None 1270
Census 733
World Christian Database 545
Wurm and Hattori (1981) 337
United Bible Societies 145
Wurm 120
... ...

None means that only the year is given and 'Census' represents many
different censuses.

The interesting point is that Wurm and Hattori (1981) is the
source for (exactly) 337 entries, and the figures in that volume stem
mostly from surveys in the 1970s (Wurm and Hattori 1981) (no page number
given since this publication does not have page numbers). The poor effort
to update the figures from Wurm and Hattori (1981), landmark publication
though it is, has a particular effect on the number of non-extinct
Australian languages. In
Dixon (2002, p. 2) we are told that "more than half of these [240-250
indigenous languages] are no longer spoken or remembered"; see also
McConvell (2001). Ethnologue lists 263 Australian languages of which 224
are listed as not (yet) extinct. This is a gross overestimate and SIL
should have consulted an Australian specialist here. From e.g. Wurm (2003,
pp. 42-43) one can glean a list of now deceased languages that Ethnologue
cites as still having speakers as of Wurm and Hattori (1981): lrg, nrx,
umr, bpt, fln, bym, gdc, gyf, gyy, gwu, kgl, zmk, zmc, wdu, wrg, zmu, nyt,
wkw, wga, wrb, djl, ....

There are too many other cases where a newer and better source for speaker
population exists than the one Ethnologue uses, e.g. the figures on Ket by
Krivogonov, who visited every village in 1991-1995 (Georg 2003, p. 99-103),
so I will just give a selection of the more important ones below. There are
also lots of cases where the Ethnologue figures are up-to-date (although not
extremely up-to-date), e.g. those following Salminen on Saami languages in
http://www.helsinki.fi/~tasalmin/fu.html and those for Ongota, for which
slightly newer figures are given in Savà (2003, p. 173).

3.2.1 Endangered Languages. Ethnologue marks languages which have a
speaker population of less than 50 or a very small fraction of the actual
ethnic group as 'nearly extinct'. They do not try to take on a more
sophisticated approach so e.g. Masep with 30-40 speakers is classed as
endangered despite the fact that it is used vigorously by all ages
(Clouse, Donohue, and Ma 2002, p. 4), and has been in the same state at
least since 1955.
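
The labelling rule described above amounts to a two-part threshold test, sketched below. The cutoff of 50 speakers comes from the text; the value used for "a very small fraction" is an assumption made for illustration, as are the function and parameter names.

```python
def nearly_extinct(speakers, ethnic_population=None,
                   speaker_threshold=50, min_fraction=0.05):
    """Return True if the two-part rule described above would flag the language.

    min_fraction is an assumed stand-in for "a very small fraction".
    """
    if speakers < speaker_threshold:
        return True
    if ethnic_population:
        return speakers / ethnic_population < min_fraction
    return False


print(nearly_extinct(35))             # small community, flagged whatever its vitality
print(nearly_extinct(2000, 100000))   # flagged through the fraction test
print(nearly_extinct(2000, 4000))     # not flagged
```

Nothing in such a rule looks at whether the language is being passed on to children, which is why a small but vigorous community like Masep ends up flagged.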

3.2.2 Wrongly Extinct. To label languages as extinct is a bit sensitive
since it may deter people from searching for remaining speakers. Languages
like Tinigua, Kusunda and Leco have been said to be dead earlier but then
speakers were found.

Itene (Angenot-de-Lima 2002; Crevels 2002, p. 39, 34), Cayuvava (Crevels
2002, p. 34), Yahuna (Adelaar 2004, p. 621) and Senhaja de Srair
(Behnstedt 2002) are not (yet) extinct. Kusunda [kgg] is listed both as
extinct and as having 3 speakers. It's best to list it as not (yet) extinct (Rana

There are a number of languages which are really presumed extinct rather
than definitely extinct, e.g. Jorá (Crevels 2002, p. 55), Tekiraka (Adelaar
2004, p. 456) and perhaps Wappo. I don't know what to say of Yavitero
since Adelaar says it is extinct on p. 162 but has 1 speaker on p. 612.
Canichana is however correctly classified as extinct in spite of Adelaar's
mention of semi-speakers (p. 613) since Crevels (2002, p. 55) clarifies
their nature "Estos hablantes sólo se acuerdan de algunas palabras y una o
dos frases".

3.2.3 Overestimated Populations. Hayu is mentioned as nearly extinct
(Bradley and Harlow 1994, p. 172) so the figure of 1743 speakers is
suspicious and probably refers to the ethnic group. The So language is said to have
5000 speakers (source dated 1972) but there are at most 100 speakers
(Carlin 1993, p. 5). (5000 is a plausible size for the ethnic group.)
Bubburè is claimed to have 500 speakers whereas actually it's more like 10
(Haruna 1998). Luo (also known as Kasabe) died in 1995 (Connell 1998, p.
216). Ona is extinct according to Adelaar (2004, p. 615). Wotapuri-
Katarwalai is probably extinct (Bashir 2003, p. 869), so the Ethnologue
number of 2000 probably refers to the ethnic group. Tyua is extinct
(Batibo 1998, p. 277) so the figure 817, like many others of Cook's 2004
figures, probably refers to the ethnic group.

3.3 CLASSIFICATION. Ethnologue's language family index lists 103 families,
40 isolates, 21 mixed languages, 18 pidgins, 86 creoles, and 78
unclassified languages. From the introduction (p. 14) it is clear that the
intent is genetic classification rather than some convenience grouping.
The basis for the classification is said to be the International
Encyclopedia of Linguistics 2nd ed. (IEL) (Frawley 2003), but that is
really an empty self-reference since IEL follows the 14th edition of
Ethnologue in its classification (Frawley 2003, p. xiv): "These lists [of
language families and their members] were compiled by Barbara Grimes --
not by the authors of the articles -- using the Ethnologue ... There
remain great controversies in the field over which languages belong to
which families, and, indeed, some of the groupings in the lists are at
odds with the positions of the authors of the articles. The goal of
including the lists was not to resolve controversies -- or promote them! --
but to ensure that the user has maximum information."

The IEL adds no substance to the classification and the argument given is
obviously a smokescreen to avoid effort. Surely, one can provide the user
with more 'maximum information' than arbitrariness and contradiction. I am
not asking that SIL embark on a large-scale enterprise of historical
linguistics, only that they report the latest well-argued expert opinions
on the matter.

A good case in point is Khoisan which is listed as a family even here in
the 15th edition of Ethnologue. But Khoisan specialists have denied the
establishment of genetic unity of its six genetically independent units
for ages (Bleek 1927; Westphal 1963; Westphal 1971; Köhler 1975; Winter
1981; Güldemann and Vossen 2000; Güldemann 2003), and other Khoisanists'
beliefs have never amounted to anything more than belief. Note that the
list includes Güldemann in the IEL, which is the newest published family
overview in wait for the ever-forthcoming Khoisan handbook from Routledge.

Although the 15th edition has incorporated some recent findings, there is
still a notable hangover of highly controversial groupings, to name a few:
Altaic (Róna-Tas 1998), Australian (Dixon 2002) see also (Evans 2005) and
references therein, Andamanese (Abbi 2004), Kadugli-Krongo should be a
stand-alone family outside Nilo-Saharan (Reh 1985; Ehret 2001, p. 2, 68),
East Papuan (Dunn, Reesink, and Terrill 2002, p. 31), Arutani-Sape
(Migliazza 1985), Trans New Guinea and Geelvink Bay need update (Foley
2000, p. 362), the North American Na-Dene, Penutian, Hokan, Coahuiltecan
(Tonkawa wrongly included) and Gulf need further splitting following
the well-argued divisions of Mithun (1999) and Campbell (1997), as do
several groupings in South America (information scattered in Fabre (2005)
and Adelaar (2004)). The internal subclassification in many families does not follow
the latest well-argued accounts either, e.g. Nilo-Saharan (Ehret 2001) and
Sino-Tibetan (Thurgood and LaPolla 2003); cf. van Driem (2003). Many
groupings, however, are quite satisfactory, such as e.g. Grassfields Bantu
(Watters 2003).

There is no mention of the definition used for 'Mixed Language' but it seems
to follow the discussion in Matras and Bakker (2003) since the category
contains the commonly discussed cases: Ma'a/Mbugu, Media Lengua, Michif,
Callahuaya plus quite a few more (totalling 21), including some poorly
known European travellers' languages. (Cocama-Cocamilla [cod] (Adelaar
2004, p. 432) may belong here but is classified under Tupi.) Similarly,
although it is not directly mentioned, one can infer that the most
important aspects of the definition used for creole is "native speaker"
and "full expressivity".

3.4 ISOLATES AND UNCLASSIFIED LANGUAGES. Although Ethnologue never states
it, the meaning of 'unclassified' vs. 'isolate' ought to be that
unclassified languages have too little data to be classified, whereas
isolate means that there is sufficient data but that any attempts to link
to it have failed. There are also many languages, apart from the 78 stand-
alone unclassifieds, which are unclassified within families. This, I
gather from the entries in question, should be interpreted as
either "sufficient data to classify into family but insufficient for lower-
level assignment" or "full data on language is available but current
research on lower-level assignment inconclusive".

Following the definition of isolate vs. (stand-alone) unclassified, a
number of unclassified languages should be moved to isolate: Beothuk
(Mithun 1999), Kunza/Atacameño (Adelaar 2004, p. 375-385), Puquina,
Yuwana (Migliazza 1985; Fabre 2005), and Yaruro (Adelaar 2004, p. 163).

Luo (aka Kasabe) and Yeni, if at all different from Njerep (Connell 1998,
p. 214-217), are close relatives of Njerep rather than unclassifieds
(Connell and Zeitlyn 2000). Likewise, as Ethnologue admits, Bung may go
with Ndung-Kwanja.

The unclassified category further includes a number of languages whose
unclassified status is harder to attack: Brazilian Wasu (better known as
Wassú), Amikoana, Arára, Agavotaguerra, Miarrã, Tapeba, Tingui-Boto (sic),
Tremembé, Truká, a couple of Papuan and Nigerian languages and some second-
language special languages like Haitian Vodoun Culture Language and
Traveller Scottish.

The unclassified, extinct, poorly attested Brazilian languages Kaimbé,
Kamba, Kambiwá, Karirí-Xocó, Pankararé, Uamué, Xukurú, Pataxó-Hãhaãi,
Wakoná and Tuxá seem to be listed only because they appear in the SIL
Publication (Meader 1978). Otherwise, extinct unclassified languages,
whether Amazonian or non-Amazonian (e.g. Kenaboi (Hajek 1998)), usually
do not get an entry.

The status of the Indian and Afghan unclassifieds Andh, Bhatola, Majhwar,
Mukha-dora, Aariya, Malakhel and Warduji, as well as Waxianghua of China,
will hopefully be examined in the near future.

3.5 OTHER ISOLATES. A number of individual languages that Ethnologue
classifies into families are better treated as isolates, such as: Masep
(Clouse, Donohue, and Ma 2002, p. 5), Kusunda (Rana 2002), Lenca, Xinca
(Campbell 1997, p. 166-167), and the African isolates Ongota (Fleming, Yilma,
Mitiku, Hayward, Miyawaki, Mikesh, and Seelig 1993; Savà and Tosco 2000;
Savà 2003), Jaláa (Kleinewillinghöfer 2001), and Shabo (Teferra 1991;
Ehret 2001, p. 68). Kujarge and Laal are two other unclassified languages
which seem to have enough material to be called isolates. Kara is
problematic to place in Central Sudanic (Djarangar 2000, p. 219) so it's
not clear what to do with it.

3.6 LOCATION. I am not competent to scrutinize the location data so I have
no comments.

As a catalogue the Ethnologue is of very high absolute value and by far
the best of its kind. However, it is not a reference book and one should
always double check to get the latest and most authoritative information
on individual entries. The relative number of errors is low but the
Ethnologue is leaking in various places where it should not have to. I
don't think the Ethnologue deserves much beating for their practice of
splitting dialects into languages. My impression is that, at any rate, the
specialist literature (as a whole) is not any better. The language/dialect
implementation, although still relatively eager to split, is now rather
informed and can boast many recent dialect surveys conducted by SIL
themselves. Therefore I look forward to an even sharper 16th edition.

Thanks to all language speakers, fieldworkers and libraries.

The name of the last speaker of Ubykh, given as 'Tevfik Esen', should be
spelled with a 'ç' at the end.

Data which belong to 'remarks on classification' seem to have been
systematically misplaced into the 'dialects' field. For instance, we find
under 'dialects' such comments as:
* "Greenberg places it in Macro-Chibchan" [kuz]
* "It may be distantly related to Altaic or Uralic" [ykg]
* "Ruhlen says it is Andean. Adelaar says it is in the Hibito-Cholon
family" [cht]
* "May be in a Takelma-Kalapuyan subgroup, but not conclusive." [tkm]
* "Mason (1950:246 with disclaimer), Tax (1960:433), and Kaufman (1990:43
tentatively) say this is Witotoan. Tovar (1961:150), Witte (1981:1), and
Aschmann (1993:2) say it is an isolate." [ano]

The introduction (p. 13) claims that there have been 50,000 updates since
the last edition. With 7299 entries, that would average roughly 7 updates
per entry. Clearly, 7 fields per entry have not been updated, so
this leaves us with a very diluted notion of an update.

The index says Kolyma Yukaghir (p. 1225) under "Yukaghir, southern [yux]"
has its entry on p. 499 instead of the correct p. 507.

The list of sources has roughly one immediately spottable typo per page:
'Die nordjemenitischen Dialaekte' (p. 650) should be '.. Dialekte .. '
'Northern Ter ritory' (p. 651) should be '.. Territory ..'
'Die Sprach von Wotapur' (p. 652) should be '.. Sprache ..'
'Annales' (p. 652) should be 'Annales de l'Université d'Abidjan, série H,
'Paris: Laroux' (p. 654) should be '.. Leroux' or '.. Ernest Leroux'
'des perlers dardes' (p. 655) should be '.. parlers ..'
'1903-1928. Linguistic Survey of India, 3 vols.' (p. 656) should be '.. 11
'Leningrad.', 'Moscow' on three entries by Grjunberg (p. 657) should be
prefixed 'Izdatel´stvo Akademii Nauk SSSR'
'A dialektologii' (p. 657) should be 'O dialektologii'
'A. Jazyery' (p. 657) should be 'M. Jazayery' (or M. A. for Mohammad Ali)
'Rudiger Koppe' (p. 657) should be 'Rüdiger Köppe'
'Togorestsprachen. Kölner Beiträge zur Afrikanistik, Band )' (p. 658)
should be '.. Band 1'
'Ein neuaramaischen Dielekt aus dem Vilayet Siirt (Ustanatolien). ZDub
121' (p. 659) should be 'Ein neuaramäischer Dialekt aus dem Vilayet Siirt
(Ostanatolien). Zeitschrift der Deutschen Morgenländischen Gesellschaft
'Mesopotamisch-Arabishen' (p. 659) should be 'Mesopotamisch-Arabischen'
'Neuaramaische Dialect' (p. 659) should be 'Neuaramäische Dialekt'
'Karassowitz' (p. 659) should be 'Harrassowitz'
'Beitrage' (p. 659) should be 'Beiträge' (and the publisher is probably
Afro-Pub and Beitr. zur Afrikanistik the series name).
'Kastenholz ... Vol. 2' (p. 659) should be '... Mande Languages and
Linguistics Vol. 2'
'Rudiger, Koppe' (p. 659) should be 'Rüdiger Köppe'
'Anthropological Linguistics 19.8.' (p. 660) could add the pages '378-
401', and there is a newer version of this article in the cited Manelis
Klein and Stark (1985).
'Societe' (p. 660) should be 'Société'
'Mahapatra, B. P. Malto 1979. An Ethnosemantic Study' (p. 661) should
be 'Mahapatra, B. P. 1979. Malto: An Ethnosemantic Study'
'Migliazza 1977 .. ms' (p. 662) was published in the cited Manelis Klein
and Stark 1985
'Heinz-Jurgen' (p. 664) should be 'Heinz-Jürgen'
'Sonsoral' (p. 665) should be 'Sonsorol'
'Saenz-Badillos' (p. 665) should be 'Sáenz-Badillos'
'filologica' (p. 667) should be 'filología'
'Afrika und Ubersee 40:110-112' (p. 668) should be '... Übersee ...' and
the full article is on pp. 73-84 and 93-115 as well as continued in vol
41:27-65, 117-153, 171-196.
'The Tati languages group' (p. 668) should be 'The Tati language group'
'langues parlees' (p. 669) should be 'langues parlées'
'Zhao, Xiangru ..' (p. 672) add 'pp. 260-287'


Harald Hammarström is a PhD Student in Computational Linguistics at the
Department of Computing Science at Chalmers University of Technology,
Gothenburg, Sweden. His current research topic is Unsupervised Learning of
Concatenative Morphology but interests go significantly wider and include
linguistic typology and computational linguistics in general.

Format: Hardback
ISBN: 155671159X
ISBN-13: 9781556711596
Pages: 1272
Prices: U.S. $ 80.00