
Review of The World Atlas of Language Structures

Reviewer: Mark Donohue
Book Title: The World Atlas of Language Structures
Book Author: Martin Haspelmath, Matthew S. Dryer, David Gil, Bernard Comrie
Publisher: Oxford University Press
Linguistic Field(s): Historical Linguistics
Issue Number: 17.1055

EDITORS: Haspelmath, Martin; Dryer, Matthew S.; Gil, David; Comrie, Bernard
TITLE: The World Atlas of Language Structures
PUBLISHER: Oxford University Press
YEAR: 2005

Mark Donohue, Monash University and National University of


The World Atlas of Language Structures (WALS) is a lot of fun. Before
anything else, that needs to be said. Addictively much fun. Beware!
This is not simply because of the interesting subject matter that it
deals with, but also because of the inclusion of a CD-ROM version of
the atlas; though calling the CD a 'version' of the atlas is unfair, and it
would be more accurate to call the large atlas a version of the CD-
ROM. But more on the specifics of the CD-ROM later.

WALS is an ambitious undertaking, and represents the first realisation
of a research program that will, no doubt, change the way we do
linguistics. This is the first attempt to map a large number of linguistic
features (142 different features, ranging from the phonetic,
phonological and morphological to the lexical and the paralinguistic) found
in a large number of languages (the core sample is 200 languages; see
for the list). Nichols (1992) is the only comparable published work
in terms of aims and scope, and it deals with 174 languages and
10 multivalued linguistic features, all morphological. To say that WALS
significantly increases the degree to which linguists are exposed to
typological mapping is a great understatement.

The four editors coordinated the work of 54 chapter authors, each of
whom produced a map (or maps) with an explanatory mini-chapter
describing one aspect of linguistic structure (see the end of this review
for the full list). The choice of which features were included on the
maps is one that cannot, of course, satisfy everyone, though I must
admit I did find most of my reasonable hopes fulfilled. (I have a number
of unreasonable hopes, the lack of fulfillment of which I do not
begrudge the editors.) A full list of the (sometimes multivalued)
features included in the atlas can be found at the end of this review.

On the other hand, the presence of some of the maps that made it into
WALS confuses me. Why was the presence of an affricate or a plain
stop as the onset in the word for 'tea' deemed map-worthy? If we
really wanted to know, we could, as the author of the map admits he
did in many (most?) cases, go to and look up
the word. The map plots the lexemes used for 'tea' in 230 languages,
sorting them into stop-initial 'tea'-like words, affricate-initial 'cha'-like
words, and others, and attributes the stop-initial versions to an ultimate
Min origin in China (the author mentions Amoy as the origin of the
etymon, but curiously does not include a language reference from this
area, the closest being found on Taiwan). Equally puzzling are the
absence of any attempt to indicate intermediate loci of diffusion and
the astonishing lack of analysis of many of the forms found. For example,
Soninke _dute_ is listed as one of the 'others' (not derivable
from 'Sinitic _cha_' or 'Min Nan Chinese _te55_'; _te55_ is a Chaozhou
[=Teochew] form, and why it, and not an Amoy form, is cited is left
unexplained). The most cursory analysis would reveal a French
etymology for the Soninke word (< _du thé_), entirely plausible given
the colonial history of Mali. Similarly, Hawaiian _kii_ is given as
an 'other', ignoring the well-known *t > k sound change that has
applied in Hawaiian, which would reveal a minimally changed English loanword.
The author notes that in English some dialects preserve a form
spelled as 'tay', reflecting an 'older' pronunciation; but why not also
mention northern English 'char', as in char-lady, reflecting the
alternative non-Min etymology and not particularly hard to find out about?

The reasons for the inclusion of this particular piece of 'language
structure' with this lack of analysis can only be known to the editors.
Another issue I have with the editing is the lack of consistency
between chapters; this is hailed as a sign of the vitality of the atlas
('no attempt was made to make chapters by different authors that
overlap in their features consistent' - from the Introduction), but to this
reviewer it feels more like a lack of editorial authority: what, other than
deciding on the languages and language features, was the role of the
editors, if not to ensure consistency across different chapters?

To answer my own question, another role that I would assume fell to
the editors was one of language selection, and typological
representation. The core set of languages chosen clearly reflects
careful thought, and is a set of languages that shows both areal and
genetic diversity (it is, however, remarkably hard to make a map of all
and only these 200 languages).

Many of the maps, however, clearly do not reflect adherence to the
WALS list of core languages at all, and are simply the authors' own
private databases made into publishable form. The maps of colour
terms, for instance, show no evidence of having consulted publicly
available dictionaries or lexicons of the languages in the sample, and
as such do not really justify two
foolscap pages of mapping: there simply isn't enough information (119
languages) to be able to make areal judgements (there are, for
instance, five entries for all of Australia, and three for all of mainland
east Asia [Mandarin, Korean and Japanese]). Here, too, there has
been no attempt to enforce the language 'minimum' (for more on the
number of languages represented, see below).

Most of the features are, however, the types of things that linguists
deal with every day: unusual consonants, size of the vowel inventory,
alignment (of 'full' NPs or pronouns), word order, demonstrative
contrasts, etc. What is there to be gained by displaying these
features on a map? To answer this question, and to fully appreciate
WALS, it is necessary to 'try it out'; short of purchasing it and poring
over it for days, interested readers can examine some sample pages
and maps on the Max Planck website.
This doesn't let you appreciate how useful it can be to correlate a
pair of features, which you have customised into the values that you
want, and spot the patterns (this is possible with the CD). It does,
however, allow you to see the wealth of information that some of the
'fuller' maps contain. And, more to the point, it allows you to appreciate
the geographical distribution of these features; a surprising number of
linguistic features show strong areal or macro-areal tendencies, cutting
across any genetic boundaries in their way. We have all known that there
are linguistic areas (such as South Asia, the Balkans, North America's
Pacific Northwest) in which languages converge towards a pan-
genetic norm; but with WALS we can see just _how_ prevalent areal
clusterings are in the distribution of linguistic structure.

The next point is not a criticism of WALS directly, or even of typology
specifically, but of any research that relies on secondary sources, and it
is obvious: such research can only be
as accurate as the reporting in the primary sources and as the
interpretation that the secondary author has of those primary sources.
The problems, however, are magnified when
undertaking a project that aims to have a minimum of 200 languages
on each of 142 maps.

The WALS organisers apparently recognised this, and requested of
the primary source authors that they would be available to answer any
questions of interpretation. This is an admirable safety measure, as
anyone who has browsed through a grammar knows that the data one
is looking for can sometimes be hard to find, can be spread out over
different sections with only indifferent cross-referencing, or even
completely absent. It can only be effective, however, if it is used. As
the author of a grammar of one of the languages in the 200-language core
sample, I was contacted by a WALS author fewer than a handful of times. This
might reflect superb clarity in the Tukang Besi grammar (I can only
hope), but the fact that Tukang Besi, a core language, is missing from
7 maps shows that this isn't the case. I cannot possibly examine the
representation of every feature for every language in detail, so I will
concentrate on a few areas for which I have expertise. Even taking
into account the explanatory chapterettes, I took exception to the
representation of Tukang Besi on no fewer than 20 of the 142 maps, 14%
of the total (the list can be found at [], then
following the link to WALS). I do not know how well this reflects the
accuracy of the maps as a whole; but I managed to find a large
number of encoded decisions which I strongly disagreed with (some of
which are mentioned at the end of this review). If we assume that
approximately 10% of the information encoded is inaccurate, then the
maps with small numbers of languages become highly
unrepresentative, and even the larger maps need to be approached
with caution.
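The worry about small maps can be made concrete with some simple arithmetic. The sketch below is only illustrative: it assumes that every coding has the same, independent 10% chance of being wrong (real errors are surely neither uniform nor independent), and the map sizes are hypothetical. The point it shows is that the absolute number of errors grows with map size, while the damage a single error does to the picture shrinks: on a 20-language map one miscoding shifts a value's share by 5 percentage points, on a 1,000-language map by only 0.1.

```python
def expected_errors(n_languages: int, error_rate: float) -> float:
    """Expected number of miscoded entries on a map, assuming independent errors."""
    return n_languages * error_rate


def prob_at_least_one_error(n_languages: int, error_rate: float) -> float:
    """Probability that at least one of n entries is miscoded."""
    return 1.0 - (1.0 - error_rate) ** n_languages


def share_shift_per_error(n_languages: int) -> float:
    """Percentage points by which one miscoded entry shifts a value's share."""
    return 100.0 / n_languages


if __name__ == "__main__":
    rate = 0.10  # assumed uniform error rate, from the review's 'approximately 10%'
    for n in (5, 20, 200, 1000):  # hypothetical numbers of dots on a map or zoomed area
        print(
            f"{n:>5} dots: ~{expected_errors(n, rate):.1f} miscoded, "
            f"P(at least one) = {prob_at_least_one_error(n, rate):.2f}, "
            f"one error moves a share by {share_shift_per_error(n):.1f} points"
        )
```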

Some of the maps abound in information; some do not. In general,
those maps that have 300 languages or more look nicely 'full'; the
occasional maps that hit the 1,000-language mark (Dryer's maps of
word order patterns) are especially satisfying (though I am not sure
how a word order was assigned to many of the languages described
as having free order of words in a clause, with no evidence for NP
constituents; Warlpiri, for instance, is given adjective-noun order, and
Dyirbal noun-adjective order, but how was this determined in the absence of
NPs? Dryer (p. 371) mentions text counts as a criterion, siding with
Greenberg on frequency (and against Dryer 1995; note, though, that the
maps plot _dominant_ word order, not basic word order), but in
the Dyirbal text I checked there were no instances of modificational
adjectives). On the other hand, there are quite a few maps that have
obvious gaps, and so are not representative. This is particularly
apparent if you examine a small area of the world (the CD comes with
built-in 'zooms' of Australia, the Caucasus, and Indonesia, and it is simple
to create your own default zoom areas). When you zoom to an area
5,000 km long and find two dots, you do not get a representative view
of that feature in that part of the world. In some cases this lack reflects
a genuine lack of carefully assembled data; and it might be that even
the 200 language 'core' sample lacks easily findable information on a
particular topic. On the other hand, it was for this very purpose that
various language experts were contacted and asked if they would
mind being questioned by some of the compilers. Similarly, there are
some topics that are relatively easily researched (such as relative
clauses) and which are not very satisfyingly represented.

The number of entries on a map of a particular feature varies wildly.
As stated above, the authors attempted to include a 'core sample' of
200 languages on every map, with authors encouraged to include
other languages. Many maps do not meet the 200-language minimum
target; this strikes this reviewer as an issue on which the editors
should have engaged a little more firmly with the authors. On the other
hand, other maps display over 1,000 different languages. The
promotional material states that 'Each world map shows an average of
400 languages'; this is true, but the median number of languages per
map is 301. As anyone familiar with statistics knows, when the mean and
the median differ this much, the distribution is strongly skewed, which
suggests that we are not dealing with a single population. Indeed, most of the more 'full' maps are
the work of three authors, Matthew Dryer, Ian Maddieson and Cecil
Brown. If these authors' maps are excluded from the count we find the
average and median both drop to 250 languages per map. This
means that only just over half the maps have that 'full' look.
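The gap between the advertised mean and the median is easy to reproduce. The following sketch uses invented languages-per-map counts (not the real WALS figures) in which ten 'ordinary' maps are joined by three very full ones. The few large values pull the mean well above the median, and removing them brings the two back together, much as the review reports. Strictly, a mean far above the median shows a right-skewed distribution; reading that skew as a distinct group of maps is an interpretation, supported here by the gap closing once those maps are excluded.

```python
from statistics import mean, median


def summarize(counts: list[int]) -> tuple[float, float]:
    """Return the mean and median number of languages per map."""
    return mean(counts), median(counts)


# Invented counts, NOT taken from WALS: ten 'ordinary' maps plus three very
# full maps of the kind produced by a few prolific chapter authors.
ordinary = [150, 180, 200, 220, 250, 260, 280, 300, 320, 350]
very_full = [900, 1000, 1100]

all_mean, all_median = summarize(ordinary + very_full)
sub_mean, sub_median = summarize(ordinary)
print(f"all maps:         mean {all_mean:.0f}, median {all_median:.0f}")
print(f"minus full maps:  mean {sub_mean:.0f}, median {sub_median:.0f}")
```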

The interactive CD version of WALS was implemented by Hans-Jörg
Bibiko, who deserves a huge thanks from the linguistics community. In
a real sense the CD is the publication, and the large, hard-bound,
attractive atlas is simply a by-product of the CD.

All the information necessary to produce the atlas can be found in the
CD (with a couple of exceptions - the map for writing systems, for
instance, is not available on the CD). Through the us of the CD the
user is also capable of customising maps. This can apply to trivial
things (You don't like that shade of brown? Change it to yellow! Fed
up with little diamond figures on the map? Make them squares!), but it
can also make for quite significant changes: You can't see the point in
distinguishing between value 3 and value 4 in a map? Combine them!
This allows you to simplify maps to display just the value, or cluster of
values, that you want for a particular grammatical feature. But even
more interestingly, it is possible to combine features together. For
instance, if we are interested in OV versus VO order in main clauses,
and wish to see if there are correlations with the presence of tone, we
can create a map combining these two grammatical features, and
plotting the intersection of the values for the relative clause feature
with those of the word order feature. This will create a bewilderingly
complex map; so we simplify it: not interested in languages with no
dominant order in the clause? Take'em off. Not interested in the
difference between 'pitch-accent' and 'full tone' languages? Combine
them. Now we have an interpretable map (I made this map; in fact, I
made several maps showing the intersection of various subsets of
these features. You can look at them by going to, and following the WALS link). And we find
that, of the 383 languages that meet the combined criteria, roughly
half of the VO languages show tonal behaviour, while only one quarter
of the OV languages show tone. Furthermore, VO languages with
tones and OV languages with tone are generally adjacent (the
exception being the OV tonal languages of New Guinea), implying that
tone is an areal feature independent of the order of the verb and its
object: there is no correlation between the order of a verb and its
object with the presence of tone.
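The two-feature combination described above amounts to a cross-tabulation over the languages coded on both maps, after dropping some values and merging others. The sketch below illustrates those steps with a handful of invented languages and codings; the language names, value labels and merge table are hypothetical, and this is not WALS's own data format or how the CD's composer is implemented.

```python
from collections import Counter

# Hypothetical codings for illustration only (NOT real WALS data).
word_order = {
    "lang_a": "VO", "lang_b": "OV", "lang_c": "VO", "lang_d": "OV",
    "lang_e": "no dominant order", "lang_f": "VO", "lang_g": "OV", "lang_h": "OV",
}
tone = {
    "lang_a": "full tone", "lang_b": "none", "lang_c": "none",
    "lang_d": "pitch-accent", "lang_e": "full tone", "lang_f": "pitch-accent",
    "lang_g": "none", "lang_h": "none",
    "lang_z": "full tone",  # coded for tone only, so it drops out of the combination
}

# Collapse a distinction we are not interested in: both tonal types count as 'tone'.
TONE_MERGE = {"full tone": "tone", "pitch-accent": "tone", "none": "no tone"}


def combine(order: dict, tone_map: dict) -> Counter:
    """Cross-tabulate two features over the languages coded for both."""
    table = Counter()
    for lang in order.keys() & tone_map.keys():  # only languages on both maps
        if order[lang] == "no dominant order":  # take these off the map
            continue
        table[(order[lang], TONE_MERGE[tone_map[lang]])] += 1
    return table


def tonal_share(table: Counter, order_value: str) -> float:
    """Proportion of languages with the given order value that have tone."""
    tonal = table[(order_value, "tone")]
    total = tonal + table[(order_value, "no tone")]
    return tonal / total if total else float("nan")


table = combine(word_order, tone)
for value in ("VO", "OV"):
    print(value, f"{tonal_share(table, value):.2f}")
```

A difference in proportions like the one printed here is only a starting point. As the paragraph argues, if the tonal languages of both orders cluster in the same regions, the difference may reflect geography rather than any link between word order and tone, so a fuller analysis would need to compare languages within areas (or families) rather than across the whole sample.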

ANY of the features in WALS can be so combined (though only two
features at a time, in the current version). This allows us to check
both possible but silly combinations (tone and verb-object order)
and some of the sensible ones (tone and syllable structure, for
instance). And who knows? With enough people trying enough 'silly'
combinations, we might well learn which ones are sensible, and
might well be surprised. One disadvantage of the compiling feature is
the fact that, as mentioned before, commonality between languages
plotted on different maps is elusive. As a result, there is only a small
set of languages that are guaranteed to be shared between two
maps, with the result that maps of combinations can yield
very small outputs unless you make sure you only take the
numerous-language maps as your inputs.

Nonetheless, the CD is wonderful. The ability to zoom into small
areas, to select exactly which features you want to view, and how to
view them, is likely to be the cause of many missed deadlines (and
hopefully several new and intriguing discoveries). The CD should,
ideally, be marketed as a separate teaching tool. The amount that can
be taught through the hands-on use of this software, about typology
and the distribution of language patterns across the world, is
immense. Unfortunately the current all-inclusive price for the atlas +
CD prohibits this.

Some other comments on the electronic WALS:
* Some menus insist on being in German; I have learned the words
Schreibtisch, sichern, and something else (meaning, I
gather, 'cancel'). Learning some German isn't a _terrible_ thing, but it
is, shall we say, an unexpected and insistent bonus.
* The CD-ROM will not scale font sizes to screen; you should set your
screen's resolution to a low value before working with WALS, unless
your eyesight is both very good, and unstrainable.
* The _topology_ and _ocean_ map option features are not
independent of each other; something's not working.
* Some important geographic features are missing - the Sepik river
does not appear, for instance.
* When using the language-name-on-scroll-over with a map that
employs expanded language dots, the scroll-over applies over an
area much, much greater than the dot.
* 'right click for editing' ?? - what does this mean for a Macintosh user?
* I have found exactly one typo, which is a pretty remarkable
achievement. In the CD version, if you take 'the tour' (a demonstration
user guide) you come across one point where you see 'five differnet'.
* On the CD, long strings of words are cut off from the map legend
when using the feature composer and collapsing distinctions between
different values of a feature (for instance, examining syllable types
and the size of a consonant inventory, I created 'Complex AND Small
OR Moderately small OR Average', which fits into the space allotted
for a map legend, but nothing longer will).


Some of the genetic affiliations used in the classification are
controversial, though none is reviled by reasonable
linguists. Examples include 'Australian' and 'Trans New
Guinea', used with various degrees of acceptance.

The dot indicating the position of Ekari (a western New Guinea
language) is a long way to the west of where I would have placed it.
Nara (in Ethiopia) appears to be in Eritrea on the maps. Kiwai is a lot
further north than I would have expected. Mangap-Mbula is not
spoken in the Vitiaz Strait, as appears on the maps.

Is there a gender distinction in 3sg pronouns in Mandarin? In the
written form, yes, but both are pronounced [tha]. If we want to go by
written forms it is also true that subtitles in Mandarin have a gender
distinction for 2sg as well. This subtitling distinction of feminine and
non-feminine has not spread to other writing domains as yet, but it
does exist. Are we deliberately excluding subtitling from our
considerations? It seems so, but on what reasoned grounds? And,
since most of the languages in the world lack (self-innovated) writing
systems, surely we should look at speech when comparing things?

The voiced stops in Thai are at least partly implosive, and minimally
deserve comment, if not necessarily coding as part of the set of
glottalised consonants.

I have documented a large number of what I consider to be coding
errors in WALS; I have conducted an exhaustive check of the codings
for Tukang Besi, and, as mentioned earlier, found that there is a 14%
error rate in the coding of the data. A list of the coding decisions I
disagree with, stated briefly, can be found at [],
and then following the WALS link. I have, along the way, compiled a
list of my objections to the coding decisions made for other languages,
necessarily less complete and with less authority, but nonetheless with
a significant number of entries. These examples, too, are available there.

I tend towards being a perfectionist, and so I have picked at the flaws
which appeared to me to be most salient in WALS - and they are not
insignificant. But we should remember that this is the first edition of the
first volume heralding a whole new research program: patience! This
is something that is different enough that we should be
accommodating of some teething problems.

Can you ignore WALS? No. Can you believe it as it stands? In
broad outline, yes; but for the details, no. The WALS maps have the
feel of the first release of a Microsoft product: useful, colourful, rather
exciting, but basically being beta-tested on the paying public. Knowing
how to interpret these broad outlines is also something that linguists
are not, in the main, well-trained in, and the experience of critically
assessing these data will be useful. Good linguists make a career out
of learning to interpret data, and carefully checking their sources.
WALS is not a shortcut that allows you to avoid that, but it certainly
enables a person to very quickly know which data to steer towards.

What's the bottom line on WALS, as I see it? I've had access to a copy
for 6 weeks now, and it _has_ changed the way I think of, and do,
linguistics, in a good way. I am currently writing an article that has
changed dramatically as a result of my access to WALS. Based on
this, I have no hesitation in saying that everyone should have access
to a copy of WALS. Should they rush out and buy it right now?
Contradictorily, no. There are enough easily-fixed mistakes and
omissions in the current version of WALS that a second edition should
be a quite marked improvement on the current first edition, and should
be available without too much delay. In effect, the current version of
WALS is being beta-tested on the wider audience. Should everyone
encourage their libraries to get a copy now? Without hesitation or
qualm of any kind, yes. This is a hold-in-your-hands demonstration of
a methodology that has been growing (see, for example, Bickel and
Nichols 2002), and is now available to be examined by anyone. This is
world-wide typology waiting to happen without having to own, or even
borrow, hundreds of books. This is a very real chance to spot
correlations between different linguistic features, between linguistic
features and areas, and between linguistic features and language
families. And, to return to the start of this review: it's a _lot_ of fun.


Bickel, Balthasar, and Johanna Nichols. 2002ff. The AUTOTYP research program.

Dryer, Matthew S. 1995. Frequency and pragmatically unmarked word
order. In Pamela Downing and Michael Noonan, eds., Word order in
discourse: 105-135. Typological Studies in Language 30. Amsterdam:
John Benjamins.

Nichols, Johanna. 1992. Linguistic diversity in space and time.
Chicago: University of Chicago Press.

Mark Donohue is primarily a syntactician, but is also very interested in
modelling tone and epenthesis, as well as the work and methodology
of historical linguistics. Initially working on languages and language
relationships in Southeast Sulawesi, he has also investigated
Austronesian languages from Flores and northern New Guinea, as
well as working on non-Austronesian languages of the Timor-Alor-
Pantar group, the Western Ok and Kwerba families, the Dani family,
the Skou family, the West-Papuan family and the Torricelli family. The
languages he has worked on all share the property of being on the
very edge of a genetic unit or units: they show the effects of language
contact and unusual grammaticalisation. Mark has published a
grammar of Tukang Besi and sketches of Warembori and I'saka, and
is currently preparing a grammar of Skou and a formal grammar of
Tukang Besi. When he finds the time, he intends to finish his
monograph on the use of lexical transitivity patterns as a comparative

Format: CD
ISBN: n/a
ISBN-13: N/A
Prices: U.S. $ free
Format: Hardback
ISBN: 0199255911
ISBN-13: N/A
Pages: 712
Prices: U.S. $ 495.00