EDITORS: Haspelmath, Martin; Dryer, Matthew S.; Gil, David; Comrie, Bernard
TITLE: The World Atlas of Language Structures
PUBLISHER: Oxford University Press
YEAR: 2005
Mark Donohue, Monash University and National University of Singapore
The World Atlas of Language Structures (WALS) is a lot of fun. Before anything else, that needs to be said. Addictively so. Beware! This is not simply because of its interesting subject matter, but also because of the inclusion of a CD-ROM version of the atlas; though calling the CD a 'version' of the atlas is unfair, and it would be more accurate to call the large atlas a version of the CD-ROM. But more on the specifics of the CD-ROM later.
WALS is an ambitious undertaking, and represents the first realisation of a research program that will, no doubt, change the way we do linguistics. It is the first attempt to map a large number of linguistic features (142 of them, ranging from the phonetic, phonological and morphological to the lexical and the paralinguistic) across a large number of languages (the core sample is 200 languages; see http://linguistics.buffalo.edu/people/faculty/dryer/dryer/atlas.map200.html for the list). Nichols (1992) is the only comparable published work in aims and scope, and it deals with 174 languages and 10 multivalued linguistic features, all morphological. To say that WALS significantly increases linguists' exposure to typological mapping is a great understatement.
The four editors coordinated the work of 54 chapter authors, each of whom produced a map (or maps) with an explanatory mini-chapter describing one aspect of linguistic structure (see the end of this review for the full list). The choice of which features were included on the maps is one that cannot, of course, satisfy everyone, though I must admit I did find most of my reasonable hopes fulfilled. (I have a number of unreasonable hopes, the lack of fulfillment of which I do not begrudge the editors.) A full list of the (sometimes multivalued) features included in the atlas can be found at the end of this review.
On the other hand, the presence of some of the maps that made it into WALS confuses me. Why was the presence of an affricate or a plain stop as the onset in the word for 'tea' deemed map-worthy? If we really wanted to know, we could, as the author of the map admits he did in many (most?) cases, go to http://www.travlang.com and look up the word. The map plots the lexemes used for 'tea' in 230 languages, sorting them into stop-initial 'tea'-like words, affricate-initial 'cha'-like words, and others, and attributes the stop-initial versions to an ultimate Min origin in China. (The author mentions Amoy as the origin of the etymon, but curiously does not include a language from this area; the closest is found on Taiwan.) There is no attempt to indicate intermediate loci of diffusion, and there is an astonishing lack of analysis of many of the forms. For example, Soninke _dute_ is listed as one of the 'others', not derivable from 'Sinitic _cha_' or 'Min Nan Chinese _te55_' (_te55_ is a Chaozhou [=Teochew] form; why it, and not an Amoy form, is cited is left unexplained). The most cursory analysis would reveal a French etymology for the Soninke word (< _du thé_), entirely plausible given the colonial history of Mali. Similarly, Hawaiian _kii_ is given as an 'other', ignoring the well-known *t > k sound change that has applied in Hawaiian, which reveals a minimally changed English loanword. The author notes that some English dialects preserve a form spelled 'tay', reflecting an 'older' pronunciation; but why not also mention northern English 'char' (as in 'a cup of char'), which reflects the alternative non-Min etymology and is not particularly hard to find out about?
The reasons for including this particular piece of 'language structure', with so little analysis, can only be known to the editors. Another issue I have with the editing is the lack of consistency between chapters. This is hailed as a sign of the vitality of the atlas ('no attempt was made to make chapters by different authors that overlap in their features consistent', from the Introduction), but to this reviewer it feels more like a lack of editorial authority: beyond deciding on the languages and features, what was the role of the editors, if not to ensure consistency across chapters?
To answer my own question, another role that I would assume fell to the editors was language selection and typological representation. The core set of languages chosen clearly reflects careful thought, and shows both areal and genetic diversity (it is, however, remarkably hard to make a map of all and only these 200 languages).
Many of the maps, however, clearly do not adhere to the WALS list of core languages at all, and are simply the authors' own private databases made into publishable form. The maps of colour terms, for instance, show no evidence of having consulted publicly available dictionaries or lexicons of the languages in the sample. As such, they do not contain enough information to justify two foolscap pages of mapping: there is simply too little information (119 languages) to make areal judgements (there are, for instance, five entries for all of Australia, and three for all of mainland East Asia [Mandarin, Korean and Japanese]). Here, too, there has been no attempt to enforce the language 'minimum' (for more on the number of languages represented, see below).
Most of the features are, however, the types of things that linguists deal with every day: unusual consonants, size of the vowel inventory, alignment (of 'full' NPs or pronouns), word order, demonstrative contrasts, etc. What is there to be gained by displaying these features on a map? To answer this question, and to fully appreciate WALS, it is necessary to 'try it out'. Short of purchasing it and poring over it for days, interested readers can examine some sample pages and maps on the Max Planck website (http://www.eva.mpg.de/lingua/files/wals.html). This doesn't let you appreciate how useful it can be to correlate a pair of features, customised into the values that you want, and spot the patterns (this is possible with the CD). It does, however, let you see the wealth of information that some of the 'fuller' maps contain. More to the point, it lets you appreciate the geographical distribution of these features: a surprising number of linguistic features show strong areal or macro-areal tendencies, cutting across any genetic boundaries in their way. We have all known that there are linguistic areas (such as South Asia, the Balkans and North America's Pacific Northwest) in which languages converge towards a pan-genetic norm; but with WALS we can see just _how_ prevalent areal clusterings are in the distribution of linguistic structure.
THE DATA
My first point is not a criticism of WALS directly, or even of typology specifically, but of any research that relies on secondary sources, and it is an obvious one. Any such research can only be as accurate as the reporting in the primary sources, and as the secondary author's interpretation of those sources. The problems, however, are magnified in a project that aims to have a minimum of 200 languages on each of 142 maps.
The WALS organisers apparently recognised this, and asked the authors of the primary sources to be available to answer any questions of interpretation. This is an admirable safety measure. Anyone who has browsed through a grammar knows that the data one is looking for can be hard to find, spread over different sections with only indifferent cross-referencing, or completely absent. It can only be effective, however, if it is used. As the author of one of the grammars behind the 200-language core sample, I was contacted by WALS authors fewer than a handful of times. This might reflect superb clarity in the Tukang Besi grammar (I can only hope), but the fact that Tukang Besi, a core language, is missing from 7 maps shows that this isn't so. I cannot possibly examine the representation of every feature for every language in detail, so I will concentrate on a few areas in which I have expertise. Even taking into account the explanatory chapterettes, I took exception to the representation of Tukang Besi on no fewer than 20 of the 142 maps, 14% of the total (the list can be found at [http://www.donohue.cc] by following the link to WALS). I do not know how well this reflects the accuracy of the maps as a whole, but I found a large number of encoded decisions with which I strongly disagreed (some of which are mentioned at the end of this review). If we assume that approximately 10% of the information encoded is inaccurate, then the maps with small numbers of languages become highly unrepresentative, and even the larger maps need to be approached with caution.
THE SIZE AND COVERAGE OF THE MAPS
Some of the maps abound in information; some do not. In general, maps with 300 or more languages look nicely 'full', and the occasional maps that reach the 1,000-language mark (Dryer's maps of word order patterns) are especially satisfying. (I am not sure, though, how a word order was assigned to many of the languages described as having free order of words in the clause, with no evidence for NP constituents. Warlpiri, for instance, is mapped as adjective-noun and Dyirbal as noun-adjective; how was this determined in the absence of NPs? Dryer (p. 371) mentions text counts as a criterion, siding with Greenberg on frequency (and against Dryer 1995; note, though, that the maps plot _dominant_ word order, not basic word order), but in the Dyirbal text I checked there were no instances of modificational adjectives.) On the other hand, quite a few maps have obvious gaps, and so are not representative. This is particularly apparent if you examine a small area of the world (the CD comes with built-in 'zooms' of Australia, the Caucasus and Indonesia, and it is simple to create your own default zoom areas). When you zoom to an area 5,000 km long and find two dots, you do not get a representative view of that feature in that part of the world. In some cases this reflects a genuine lack of carefully assembled data, and it might be that even the 200-language core sample lacks easily findable information on a particular topic. On the other hand, it was for this very purpose that language experts were contacted and asked if they would mind being questioned by the compilers. Similarly, some topics that are relatively easily researched (such as relative clauses) are not very satisfyingly represented.
The number of entries on a map varies wildly. As stated above, the aim was to include a 'core sample' of 200 languages on every map, with authors encouraged to include other languages. Many maps do not meet the 200-language minimum. This strikes this reviewer as an issue on which the editors should have engaged a little more firmly with the authors. On the other hand, other maps display over 1,000 languages. The promotional material states that 'Each world map shows an average of 400 languages'; this is true, but the median number of languages per map is 301. As anyone familiar with statistics knows, when the mean and the median differ radically, as they do here, it suggests that we are not dealing with one population. Indeed, most of the 'fuller' maps are the work of three authors: Matthew Dryer, Ian Maddieson and Cecil Brown. If these authors' maps are excluded from the count, both the mean and the median drop to 250 languages per map. This means that only just over half the maps have that 'full' look.
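The point about the mean and the median can be made concrete with a minimal Python sketch. The per-map counts below are invented for illustration and are not the actual WALS figures. They simply show how a handful of very full maps can pull the mean well above the median, and how the two converge once those maps are set aside.

```python
from statistics import mean, median

# Hypothetical per-map language counts (NOT the real WALS figures),
# chosen only to show how a few very full maps skew the mean.
counts = [120, 150, 200, 250, 300, 310, 1000, 1100]

def summarise(values):
    """Return (mean, median) for a list of per-map language counts."""
    return mean(values), median(values)

all_mean, all_median = summarise(counts)

# Drop the outlier maps (here, anything with 1,000+ languages) and recompute.
trimmed = [c for c in counts if c < 1000]
trim_mean, trim_median = summarise(trimmed)

print(f"all maps: mean={all_mean:.1f}, median={all_median:.1f}")
print(f"trimmed:  mean={trim_mean:.1f}, median={trim_median:.1f}")
```

With these invented numbers the mean (about 429) sits well above the median (275). Once the two 1,000+ maps are removed, the two figures are within a few languages of each other (about 222 and 225).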
THE CD
The interactive CD version of WALS was implemented by Hans-Jörg Bibiko, who deserves huge thanks from the linguistics community. In a real sense the CD is the publication, and the large, hard-bound, attractive atlas is simply a by-product of it.
All the information needed to produce the atlas can be found on the CD (with a couple of exceptions: the map of writing systems, for instance, is not available on the CD). Using the CD, you can also customise maps. This can apply to trivial things (You don't like that shade of brown? Change it to yellow! Fed up with little diamonds on the map? Make them squares!), but it can also make for quite significant changes: you can't see the point in distinguishing between value 3 and value 4 on a map? Combine them! This allows you to simplify maps to display just the value, or cluster of values, that you want for a particular grammatical feature. Even more interestingly, it is possible to combine features. For instance, if we are interested in OV versus VO order in main clauses, and wish to see whether it correlates with the presence of tone, we can create a map combining these two features, plotting the intersection of the values for the tone feature with those of the word order feature. This will create a bewilderingly complex map, so we simplify it. Not interested in languages with no dominant order in the clause? Take them off. Not interested in the difference between 'pitch-accent' and 'full tone' languages? Combine them. Now we have an interpretable map. (I made this map; in fact, I made several maps showing the intersection of various subsets of these features, which can be viewed at http://www.donohue.cc by following the WALS link.) And we find that, of the 383 languages that meet the combined criteria, roughly half of the VO languages show tone, while only one quarter of the OV languages do.
Furthermore, VO languages with tone and OV languages with tone are generally adjacent (the exception being the OV tonal languages of New Guinea). This implies that tone is an areal feature independent of the order of the verb and its object: there is no correlation between the order of a verb and its object and the presence of tone.
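The feature composer on the CD is a point-and-click tool, and this sketch makes no claim about how the CD itself is implemented. Assuming only that each map can be treated as a mapping from language to value, the following minimal Python sketch illustrates the operations described above. It collapses values ('pitch-accent' and 'full tone' into 'tone'), drops a value ('no dominant order'), and cross-tabulates the two features over the languages coded on both maps. The language IDs, the codings and the function name `combine` are hypothetical placeholders, not WALS data.

```python
from collections import Counter

# Hypothetical codings with placeholder language IDs (NOT real WALS data).
word_order = {
    "lang_a": "VO", "lang_b": "OV", "lang_c": "VO",
    "lang_d": "OV", "lang_e": "no dominant order", "lang_f": "VO",
}
tone = {
    "lang_a": "full tone", "lang_b": "no tone", "lang_c": "pitch-accent",
    "lang_d": "no tone", "lang_e": "full tone", "lang_g": "no tone",
}

# Collapse a distinction we do not care about: any tonal system counts as "tone".
TONE_COLLAPSE = {"pitch-accent": "tone", "full tone": "tone", "no tone": "no tone"}

def combine(feature_a, feature_b, collapse_b=None, drop_a=()):
    """Cross-tabulate two features over the languages coded for BOTH.

    Values of feature_a listed in drop_a are excluded; values of feature_b
    are renamed via collapse_b. Languages missing from either map are
    silently lost, which is why combining sparse maps leaves few data points.
    """
    collapse_b = collapse_b or {}
    shared = feature_a.keys() & feature_b.keys()  # languages on both maps
    table = Counter()
    for lang in shared:
        a = feature_a[lang]
        if a in drop_a:
            continue
        b = collapse_b.get(feature_b[lang], feature_b[lang])
        table[(a, b)] += 1
    return table

table = combine(word_order, tone, TONE_COLLAPSE, drop_a={"no dominant order"})
for (order, tone_value), n in sorted(table.items()):
    print(f"{order:>3} + {tone_value:<7}: {n}")
```

Note that `lang_f` and `lang_g`, each coded on only one of the two maps, drop out of the table without any warning. This is the same shrinking effect that can make combinations of sparsely overlapping maps very small.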
ANY of the features in WALS can be combined in this way (though only two at a time in the current version). This allows us to check both the possible but silly combinations (tone and verb-object order) and the sensible ones (tone and syllable structure, for instance). And who knows? With enough people trying enough 'silly' combinations, we might learn which ones are sensible, and might well be surprised. One disadvantage of the feature composer is that, as mentioned before, commonality between the languages plotted on different maps is elusive. As a result, only a small set of languages is guaranteed to be shared between two maps. Maps of combinations can therefore yield very small outputs, unless you make sure you take only maps with many languages as your inputs.
Nonetheless, the CD is wonderful. The ability to zoom into small areas, to select exactly which features you want to view, and how to view them, is likely to be the cause of many missed deadlines (and hopefully several new and intriguing discoveries). The CD should, ideally, be marketed as a separate teaching tool. The amount that can be taught through the hands-on use of this software, about typology and the distribution of language patterns across the world, is immense. Unfortunately the current all-inclusive price for the atlas + CD prohibits this.
Some other comments on the electronic WALS:
* Some menus insist on being in German; I have learned the words Schreibtisch ('desktop'), sichern ('save'), and something else (meaning, I gather, 'cancel'). Learning some German isn't a _terrible_ thing, but it is, shall we say, an unexpected and insistent bonus.
* The CD-ROM does not scale font sizes to the screen; you should set your screen's resolution to a low value before working with WALS, unless your eyesight is both very good and unstrainable.
* The _topology_ and _ocean_ map options are not independent of each other; something's not working.
* Some important geographic features are missing: the Sepik River does not appear, for instance.
* When using the language-name-on-scroll-over with a map that employs expanded language dots, the scroll-over applies over an area much, much larger than the dot.
* 'right click for editing'?? What does this mean for a Macintosh user?
* I have found exactly one typo, which is a pretty remarkable achievement. In the CD version, if you take 'the tour' (a demonstration user guide) you come across one point where you see 'five differnet values'.
* On the CD, long strings of words are cut off from the map legend when using the feature composer and collapsing distinctions between different values of a feature. For instance, examining syllable types and the size of the consonant inventory, I created 'Complex AND Small OR Moderately small OR Average', which fits into the space allotted for a map legend, but nothing longer will.
SOME GENERAL CRITIQUES
Some of the genetic affiliations used in the classification are controversial, though none would be rejected outright by reasonable linguists. Examples include 'Australian' and 'Trans New Guinea', which are accepted to varying degrees.
CONTROVERSIAL MAPPING DECISIONS
The dot indicating the position of Ekari (a western New Guinea language) is a long way west of where I would have placed it. Nara (in Ethiopia) appears to be in Eritrea on the maps. Kiwai is a lot further north than I would have expected. Mangap-Mbula is not spoken in the Vitiaz Strait, although that is where the maps place it.
CONTROVERSIAL CODING DECISIONS
Is there a gender distinction in 3sg pronouns in Mandarin? In the written form, yes, but both forms are pronounced [tʰa]. If we go by written forms, it is also true that Mandarin subtitles have a gender distinction for 2sg as well. This subtitling distinction between feminine and non-feminine has not yet spread to other writing domains, but it does exist. Are we deliberately excluding subtitling from our considerations? It seems so, but on what reasoned grounds? And, since most of the world's languages lack (self-innovated) writing systems, surely we should look at speech when comparing things?
The voiced stops in Thai are at least partly implosive, and minimally deserve comment, if not necessarily coding as part of the set of glottalised consonants.
I have documented a large number of what I consider to be coding errors in WALS. I have conducted an exhaustive check of the codings for Tukang Besi and, as mentioned earlier, found a 14% error rate in the coding of the data. A list of the coding decisions I disagree with, stated briefly, can be found at [http://www.donohue.cc] by following the WALS link. Along the way I have also compiled a list of my objections to the coding decisions made for other languages. It is necessarily less complete and less authoritative, but it nonetheless has a significant number of entries. These examples, too, are available online.
OVERALL COMMENTS
I tend towards perfectionism, and so I have picked at the flaws that appeared to me most salient in WALS, and they are not insignificant. But we should remember that this is the first edition of the first volume heralding a whole new research program: patience! This is something different enough that we should accommodate some teething problems.
Can you ignore WALS? No. Can you believe it as it stands? In broad outline, yes; in the details, no. The WALS maps have the feel of the first release of a Microsoft product: useful, colourful, rather exciting, but basically being beta-tested on the paying public. Interpreting these broad outlines is also something in which linguists are not, in the main, well trained, and the experience of critically assessing these data will be useful. Good linguists make a career out of learning to interpret data and carefully checking their sources. WALS is not a shortcut that allows you to avoid that, but it certainly lets you see very quickly which data to steer towards.
What's the bottom line on WALS, as I see it? I've had access to a copy for 6 weeks now, and it _has_ changed the way I think about, and do, linguistics, in a good way. I am currently writing an article that has changed dramatically as a result of my access to WALS. On this basis, I have no hesitation in saying that everyone should have access to a copy of WALS. Should they rush out and buy it right now? Contradictorily, no. There are enough easily fixed mistakes and omissions in the current version of WALS that a second edition should be a marked improvement on the first, and should be available without too much delay. In effect, the current version of WALS is being beta-tested on the wider audience. Should everyone encourage their libraries to get a copy now? Without hesitation or qualm of any kind, yes. This is a hold-in-your-hands demonstration of a methodology that has been growing (see, for example, Bickel and Nichols 2002), and is now available to be examined by anyone. This is world-wide typology waiting to happen without having to own, or even borrow, hundreds of books. This is a very real chance to spot correlations between different linguistic features, between linguistic features and areas, and between linguistic features and language families. And, to return to the start of this review: it's a _lot_ of fun.
REFERENCES
Bickel, Balthasar, and Nichols, Johanna. 2002ff. The Autotyp research program. http://www.uni-leipzig.de/~autotyp/
Dryer, Matthew S. 1995. Frequency and pragmatically unmarked word order. In Pamela Downing and Michael Noonan, eds., Word order in discourse: 105-135. Typological Studies in Language 30. Amsterdam: John Benjamins.
Nichols, Johanna. 1992. Linguistic diversity in space and time. Chicago: University of Chicago Press.
ABOUT THE REVIEWER:
Mark Donohue is primarily a syntactician, but is also very interested in modelling tone and epenthesis, as well as in the work and methodology of historical linguistics. Initially working on languages and language relationships in Southeast Sulawesi, he has also investigated Austronesian languages from Flores and northern New Guinea. He has also worked on non-Austronesian languages of the Timor-Alor-Pantar group, the Western Ok and Kwerba families, the Dani family, the Skou family, the West-Papuan family and the Torricelli family. The languages he has worked on all share the property of being on the very edge of a genetic unit or units: they show the effects of language contact and unusual grammaticalisation. Mark has published a grammar of Tukang Besi and sketches of Warembori and I'saka, and is currently preparing a grammar of Skou and a formal grammar of Tukang Besi. When he finds the time, he intends to finish his monograph on the use of lexical transitivity patterns as a comparative tool.