Reviewer: Geoffrey Sampson
Book Title: The Oxford Guide to the Transeurasian Languages
Book Author: Martine Robbeets, Alexander Savelyev
Publisher: Oxford University Press
Linguistic Field(s): General Linguistics, Historical Linguistics, Anthropological Linguistics
The term “Transeurasian languages” is novel, and this review had best begin with an explanation of what the editors mean by it. They use it to refer to a large set of languages spoken over an immense range stretching from places in eastern Europe almost all the way to the easternmost tip of Siberia; five well-known members of the set are Turkish, Mongolian, Manchu, Korean, and Japanese. Broadly speaking, each of these five belongs to a different subfamily of languages whose members are generally agreed to descend from a common ancestor (the subfamily containing Manchu is called Tungusic); none of the five subfamilies is known with certainty to be genetically related to any of the others, but there are many theories asserting such relationships, some more controversial than others. (Some of the five subfamilies contain numerous languages, often obscure, unwritten ones. Japanese has as sister languages only those of the Ryukyu islands; the sole present-day sister of Korean is the moribund language of Cheju – alias Jeju – island, though there is shadowy evidence for the past existence of lects on the Korean peninsula that may have ranked as separate languages rather than dialects.)

It is generally accepted that common features at all levels of structure run through the languages of the five subfamilies – a few examples are: vowel harmony; lack of words beginning with /r/, except for loans from other languages; negation expressed via a negative auxiliary verb; SOV word order; no finite subordinate clauses; kinship vocabularies which use the same word for “younger brother” and “younger sister”. (Not every single language in these five subfamilies has every one of the shared features, but most of the languages have traces, at least, of most of the features, and they are less common among other languages.) These editors use “Transeurasian” to codify their belief that all five subfamilies belong to a single superfamily related by descent – they do not deny that some similarities may be areal phenomena, but they assert that the evidence is enough to show that others reflect inheritance from a common proto-language. Sceptics, on the other hand, see all the common features as adequately explained as resulting from contact between languages of separate ancestry, or even from chance coincidence. At one sceptical extreme, Sir Gerard Clauson (1956) believed that not even Turkic and Mongolic share a common ancestor, though this has been seen by some as a relatively safe linkage – Martine Robbeets quotes Gustaf Ramstedt (1924) as writing that “Old Turkish and Old Mongolian were no more remote from each other than are English and German today”. Alexander Vovin (e.g. 2009) is a present-day scholar who does not believe that Japanese or Korean share an ancestor with each other or with other “Transeurasian” languages; András Róna-Tas remarked in 1988 that if Japanese was related to those other languages, that relationship must be more remote than the relationship between any two Indo-European languages.

“Transeurasian” in this sense overlaps heavily with the far better-established term “Altaic”. These editors prefer not to use that term, mainly because it has often referred to a narrower group of subfamilies, including Turkic and Mongolic but only one or another subset of the other three (though the wider sense would not be unprecedented – famously, in 1971 Roy Miller published a book entitled ‘Japanese and the Other Altaic Languages’, linking in the subfamily for which evidence of affiliation is thinnest). Not all contributors to the book reviewed here are happy with the neologism. The new name does not describe superfamily membership uniquely – the Indo-European family also includes both languages of Asia and languages of Europe. “Indo-European” is a satisfying term because it asserts a link between a purely Indian subfamily and purely European subfamilies, whereas the languages spoken near both western and eastern extremes of the “Transeurasian” range all belong to the same uncontroversial subfamily, Turkic; and Indo-European languages extend over a large part of India as well as almost all of Europe, whereas (leaving aside recent patterns of emigration into many continents) the overlap of “Transeurasian” languages into Europe is really marginal – mainly Turkish in the small area of “Turkey-in-Europe” since the conquest of Byzantium in 1453. “Altaic”, from the Altai mountains, does at least roughly indicate the geographical centre of gravity of the set of languages in question. Debates about names are unprofitable, but readers familiar with the name “Altaic” might be confused without this explanation of how it relates to Robbeets & Savelyev’s term.

After the editorial introduction, the book comprises 47 chapters by a total of 55 authors or co-authors; six chapters are authored or co-authored by Martine Robbeets. One of its purposes is to assemble evidence for the reality of the “Transeurasian” superfamily, but this is far from the only aim of the book, and indeed some contributors do not believe in the superfamily, or are explicitly agnostic. Much of the book comprises historically-oriented descriptions of the individual languages and of relationships within the subfamilies. (By “historically oriented” I mean that the book is full of detail on structures and comparison of them across the different languages; it pays little attention to modern sociolinguistic aspects.) The book will undoubtedly be consulted by many scholars concerned with one or another language subfamily who may have no particular interest in the validity or otherwise of the Transeurasian hypothesis.

The first two chapters of the book survey the history of the subfamilies and the sources of evidence for pre-modern periods. Chapters 3 to 9 set out to establish family-tree relationships between individual languages and major dialects within the separate subfamilies, and between the subfamilies within the hypothetical superfamily. These chapters (and some later chapters) make heavy use of computational cladistics, a technique first encountered by many linguists through writings by Don Ringe (e.g. Ringe et al. 2002) on relationships between the branches of Indo-European. Chapters 10 and 11 survey in detail the typological properties distinguishing the “Transeurasian” languages from others, particularly those such as Uralic languages which are geographically adjacent.

Chapters 12 to 27 give structural descriptions, organized on a fairly uniform pattern, of representative languages from the five subfamilies and their main subdivisions. Chapters 28 to 38 compare languages across the hypothetical superfamily with respect to successive aspects of structure: for instance, Chapter 29 deals with vowel systems and vowel harmony, Chapter 34 with categories of verb inflexion, and Chapter 38 with kinship terminologies. Chapters 39 to 42 examine arguments for and against explanations of shared features in terms of areal diffusion rather than inheritance; in Chapter 42 Cecil Brown applies a technique intended to make this aspect of historical linguistics “compatible with 21st-century norms of scientific inquiry” by replacing impressionistic judgements with quantitative significance tests.

Finally, Chapters 43 to 47 bring disciplines other than linguistics, particularly the archaeology of East Asian agriculture, and genomics, to bear on “Transeurasian” prehistory and identification of the proto-Transeurasian homeland. Choongwon Jeong and co-authors find that DNA evidence shows present-day Tungusic speakers “to be genetically closer to modern Koreans and Japanese than to Han Chinese or other southern Chinese populations”, tending to support the Transeurasian hypothesis.

Some readers may find it surprising that, if the Transeurasian hypothesis is true, it would not be fairly obviously true. Once a link between Sanskrit and European languages was first postulated in the late 18th century, it was quickly recognized as a real rather than coincidental relationship. In Chapter 43, Robbeets and co-authors see the five top-level Transeurasian protolanguages as having been located in adjacent areas in and around the Liaoning province of southern Manchuria as late as about 1000 B.C., which seems rather recent (and hence relatively easy to study) by comparison with the dispersal of the top-level Indo-European subfamilies (though in the following chapter Robbeets dates the split of proto-Transeurasian into those five languages far earlier).

But the research situation is much less favourable in the Transeurasian than in the Indo-European case. In the first place, classical languages of Europe and India became written languages centuries before the Christian Era, and they were written alphabetically. No Transeurasian language was written so early; Marc Miyake dates the earliest examples of continuous writing in any language of the respective subfamilies as (all centuries A.D.): Korean 6th c., Mongolic 6th–7th c., Japanese 7th c., Turkic 8th c., Tungusic 12th c. Many Transeurasian languages were not written before the 20th century – some have no written form now. Furthermore, the most controversial aspects of the Transeurasian hypothesis are the relationship between Japonic and Koreanic subfamilies, and their relationship to the rest of the hypothetical superfamily: when Korean and Japanese were first written, the only writing system available for adaptation to those languages was Chinese script, which is not alphabetic and was developed for a language which is certainly not “Transeurasian”, so it is often hard to know what an early Korean or Japanese inscription is saying about the sequence of phonemes it was intended to record. Then, many Transeurasian speakers were nomads (some still are), making areal-diffusion explanations of long-distance relationships between languages more plausible than they might be as between languages of settled European societies. And in recent centuries pressure from Russian and Chinese has been pushing many of the Transeurasian languages close to or all the way to extinction. 
(Not long ago, Manchu was a very significant language, being that of the ruling caste of China after the Manchu conquest of 1644, and considerable resources were put into trying to maintain it; but it became a political irrelevance with the Chinese revolution of 1911 – Volker Rybatzki tells us that by 1992 only about fifty speakers of Manchu remained, and “nowadays probably no one able to speak the language is left”.) Even when plenty of written material in a Transeurasian language existed in the past, it has not necessarily survived. After the Mongol Yuan dynasty was conquered by the Ming in 1368, Rybatzki says that a “rage against anything Mongolic … resulted in the destruction of every remnant of the Mongolic culture by Chinese people”, which included most material written in Mongolian. Indo-Europeanists do not always appreciate how favourable their research situation is.


This is a huge book (almost a thousand large-format pages, and accompanied by a website offering further data), and it is chock-full of detail. The senior editor, Martine Robbeets, is based at the highly-regarded Max Planck Institute, Jena, and other contributors have similarly respectable credentials. The book is surely destined to become a standard reference for any scholar working on any of the areas it covers, whether or not they are interested in the overarching Transeurasian hypothesis. (The publisher’s blurb claims that the book is the first major reference work on its field since Nicholas Poppe 1965.)

No one person would be qualified to work through the book language by language checking all the facts stated – I am certainly not. I do have some misgivings about the arguments deployed to support the Transeurasian hypothesis, and for Linguist List readers who do not happen to be specialists in any of these languages that might be the most interesting aspect of the book to examine.

Shared language features can only be evidence for a relationship between languages (whether a relationship of shared ancestry or contact) if the features are not equally common in other languages. The features Martine Robbeets mainly relies on to support the Transeurasian hypothesis are formalized in her Chapter 10. Some of the twenty features she lists, e.g. vowel harmony, clearly are unusual on a world scale; but others seem much less so. Feature F9, for instance, is “The imperative is expressed by a bare verb stem”. Yet, as Robbeets herself says, imperatives “are cross-linguistically commonly expressed by the bare root, or stem of the verb” – in which case how can F9 be relevant to common descent? Again, F11 is about the use of forms similar to /mi, Ti/ to express first and second person singular pronouns. But forms like this are so common among Indo-European and adjacent (non-Transeurasian) languages that Merritt Ruhlen suggested “Mitian” as an alternative to “Nostratic” for the hypothetical superfamily which some linguists believe to contain Indo-European as one branch!

There are also questions about how closely a feature needs to match across languages for them to count as sharing it. Robbeets’s Chapter 36 addresses a standard objection by sceptics, that these languages contain too few shared items of basic vocabulary to make descent from a common ancestor plausible. Using the Leipzig–Jakarta hundred-item basic vocabulary list (Tadmor et al. 2010), she finds a satisfactory number of shared words obeying regular sound-correspondences. But consider item (12), the concept “breast”, which she claims is represented by an etymon shared by all five subfamilies except Koreanic. The form /kɨkɨrǝ/, offered as the proto-Japonic version of the etymon, actually meant “heart”; she supports the equation with a reference to an (unrelated) Ryukyuan word which shifted between the meanings “breast” and “heart”. It is easy to agree that a word might slide between the senses “heart” and “breast”, singular, that is the part of a man’s or woman’s body containing the heart. However, the forms in the other subfamilies correspond only to /kɨkɨ/; Robbeets explains the /rǝ/ as a plural suffix. But plural “breasts” must surely refer to a woman’s mammary glands, and a semantic slide between that idea and “heart” feels less convincing. Doubtless such a slide is possible, but can this possibility be counted as positive evidence for a Japanese–Altaic relationship?

Then I am unclear about how much these authors are claiming to infer from their use of computational cladistics. As Robbeets points out, “the ‘mathemagic’ it involves is not made sufficiently transparent for classically trained historical linguists”, and while I might not count as “classically trained” I have only a limited understanding of the technique; it is taken for granted rather than explained in this book, and I have not found suitable tutorials on the Web. (At the first point in the book where a specific application of the technique is cited, Robbeets simply says “I applied Bayesian phylogenetic methods using BEAST … Through a statistical procedure, we determined that the pseudo Dollo covarion model with relaxed clock was the most suitable evolutionary model for our data”; she gives no explanations or citations to clarify the technical terms.) My – possibly mistaken – impression is that, given the assumption that a set of languages descend from a common ancestor, cladistics will identify a “best” tree-structure of successive splits among proto-languages leading to the observed languages at the leaf nodes (which is how several contributors use it), but that it does not say anything about whether that tree-structure is “good” enough to make genetic relationship likely; and furthermore, since the technique treats a language simply as a set of binary properties (“a sequence of 1s and 0s”, as Robbeets puts it), it does not seem that it could say anything about whether shared properties derive from common ancestry rather than areal diffusion. If this is wrong and the technique can yield inferences on one or both of these latter issues (and hence support the Transeurasian hypothesis), that is not spelled out clearly in this book. And indeed it is not quite clear whether the contributors are claiming to use the technique for those purposes – but repeatedly one finds remarks that suggest they are. 
For instance, Nataliia Hübler sums up her cladistic analysis applied to 226 binary properties of 38 “Transeurasian” languages by saying “The internal structure of the Transeurasian unity achieved in this study goes in line with the proposal of Robbeets & Bouckaert (2018) on the Transeurasian tree” – the words “unity achieved” sound like a claim that the cladistic analysis supports the Transeurasian hypothesis, rather than merely showing what the likely internal structure of the superfamily would be if one assumes that it is one. And Robbeets says that Hübler “shows that the concrete shape of the tree [produced by Hübler’s application of cladistics] is more suggestive of inheritance than of areal influence”, but I cannot see which passage in Hübler’s chapter relates to this (possibly because it is veiled by unfamiliar technical terminology). I wonder whether readers who share my ignorance of computational cladistics may take the book to have demonstrated more than it can, and I suspect that the soundest arguments it contains for the Transeurasian hypothesis could be those couched in terms familiar to “classically trained” linguists.
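
The first half of that impression can be made concrete with a toy sketch. It is emphatically not the Bayesian procedure used in the book (BEAST with its evolutionary models); it swaps in plain average-linkage clustering on Hamming distances, and the language names and feature values are invented for illustration. What it shows is only that a procedure of this general kind returns a "best" tree for any set of 0/1 vectors it is given, so the mere existence of an output tree cannot by itself be evidence that the languages are related.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two equal-length 0/1 vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def build_tree(features):
    """Average-linkage clustering over binary feature vectors.

    `features` maps a (hypothetical) language name to a tuple of 0s and 1s.
    Returns a nested tuple of names recording the order of merges.
    """
    # Each cluster is a pair: (tree built so far, list of member names).
    clusters = [(name, [name]) for name in sorted(features)]

    def avg_dist(c1, c2):
        # Mean Hamming distance over all cross-cluster pairs of languages.
        pairs = [(m, n) for m in c1[1] for n in c2[1]]
        return sum(hamming(features[m], features[n]) for m, n in pairs) / len(pairs)

    while len(clusters) > 1:
        # Pick the closest pair of clusters (first one found in case of ties).
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: avg_dist(clusters[ij[0]], clusters[ij[1]]))
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]

# Invented toy data: NOT taken from the book or from any real language.
toy = {
    "A": (1, 1, 1, 0, 0, 0),
    "B": (1, 1, 0, 0, 0, 0),
    "C": (0, 0, 0, 1, 1, 1),
    "D": (0, 0, 1, 1, 1, 1),
}
tree = build_tree(toy)
print(tree)
```

Nothing in the input records whether a shared 1 arose by inheritance, by borrowing, or by chance, which is why it is hard to see how a method working only from such vectors could distinguish those cases without further assumptions.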

A different issue with this book relates to audience. The book ought to appeal to historical linguists working on any of the language subfamilies it covers, few of whom will be expert on all the other Transeurasian subfamilies. Thus contributions should not require readers to be familiar with technical concepts and terminology relevant only to one language or subfamily. This principle is not always observed. For instance, Chapter 1 is Miyake’s chapter on “Historical sources of Japonic and Koreanic” – since the chapter is placed first, it will be influential in potential readers’ decisions on whether the book is for them. Comparing early written Korean and Japanese, Miyake says (I correct a misprinted example number) that “ ソ, a ‘kugyŏl’ abbreviation of the ‘idu’ semantogram 爲 <ho> ‘to be, do’ seen in (7), was a ‘katakana’ abbreviation of the ‘man’yōgana’ 曾 <so>.” Miyake’s point here is that both Korean and Japanese scribes formed phonographic signs representing syllables by taking small distinctive-looking parts of complete Chinese graphs, but this sometimes led to the same sign having different syllabic values in the two languages. ‘Idu’ was an earlier method of writing Korean with complete Chinese graphs, representing roots by Chinese translation-equivalents, and Korean grammatical inflexions (which have no equivalent in Chinese) by Chinese words with broadly relevant meanings; in ‘idu’ a verb-forming Korean suffix /ho/ could be written with the Chinese graph 爲 meaning “do”. In practice much documentation in Korea was in Chinese, which has a different word-order from Korean; ‘kugyŏl’ was a method of clarifying Chinese sentence-structure for Korean readers by adding small Chinese graphs to mark the grammatical function of the Chinese words written full-size, and when 爲 was used in this way it was reduced to ソ. 
‘Man’yōgana’ was an early method of writing Japanese comparable to ‘idu’ for Korean, in which the syllable /so/ could be written as 曾 (Chinese “past”); ‘katakana’ was (and is) a Japanese syllabic script formed from elements of Chinese graphs, and when 曾 was abbreviated for this purpose the reduced form was again ソ, so the same sign came to represent /ho/ in Korean but /so/ in Japanese. (It may not be easy to see ソ as an abbreviation of the respective Chinese graphs as they appear in modern printing fonts, but the sign is a natural reduction of either graph as they are handwritten.)

Miyake’s glosses of ‘idu’ and the other technical concepts here are so cursory that I believe someone unfamiliar with these languages – a specialist in Turkish, say – might find the quoted passage more or less impenetrable and be put off the book altogether. This would be a pity, particularly since the passage is actually fairly peripheral to the topic of Miyake’s chapter (but if you don’t understand it you won’t know that).

The book contains a number of inconsistencies. I do not underestimate the large editorial problems that must be involved in co-ordinating contributions by so many separate authors, but the most noticeable anomalies occur in material for which Martine Robbeets herself was responsible. On p. 37 she refers to blue and green elements in Fig. 3.7, but although the book contains many colour plates Fig. 3.7 is printed in black and white. The opening paragraph of the introduction to the book tells us that the Transeurasian languages stretch “from the Pacific in the East to the Baltic, the Black Sea, and the Mediterranean in the West”, which comes as a surprise since the previous page contains a large and detailed map (Fig. 1) of the “Distribution of the Transeurasian languages” that shows none of them anywhere near the Baltic. (Turkish accounts for the Black Sea and Mediterranean.) This puzzle is partly solved by a later map, Fig. 3.1, also labelled “The distribution of the Transeurasian languages”, which differs considerably from Fig. 1 and does show a Transeurasian language near the south-east shore of the Baltic; however, this language is identified only as “.UP”, a member of the family “7XUNLF”. The passage on p. 31 which discusses Fig. 3.1 says that its abbreviations are explained in the list of abbreviations in the prelims, but they are not. (Comparing the two maps, it is clear that “7XUNLF” is Turkic. I believe “.UP” must be Lithuanian Karaim, the Turkic language of a small Jewish community which originated in the Crimea and now lives in Trakai, west of Vilnius; according to Éva Csató & Lars Johanson in Chapter 23, fewer than twenty of its members still speak it.)

Some minor errors I spotted are: p. 90, note to Fig. 6.2, “centuries before the present” should be “years before the present”; p. 156, Fig. 11.3, the legend defines the meaning of two shades of grey in the Figure, but it uses three shades; p. 443 says that “In contrast to what is found in Turkic” Sakha (Yakut) and Dolgan express the concept “have” in a certain way, yet the chapter identifies Sakha and Dolgan as Turkic languages; p. 694, first paragraph of sec. 38.3.2, “do not make an age distinction” should be “do not make a sex distinction”.

This ‘Oxford Guide’ is a valuable, comprehensive historically-oriented survey of the languages and language subfamilies it covers. About the extended argument for the Transeurasian hypothesis which is grafted onto the survey I feel less sure. I wonder how many readers whose prior position was agnostic or sceptical will find themselves shifted to a significantly different posterior assessment of the hypothesis.

The wisest view may be that expressed in Chapter 41 by Edward Vajda, who sees it as unhealthy for linguists to think that “proving a genetic relationship between languages represents some sort of pinnacle of achievement …, with evidence of borrowing or other forms of language contact often treated as chaff to be winnowed away”. Modern historical linguists, Vajda urges, should be interested in all aspects of language history.


Geoffrey Sampson graduated in Chinese from Cambridge University in 1965. After several years studying linguistics and computing as a graduate student at Yale, he began his career as a fellow of Queen’s College, Oxford University. He went on to teach linguistics at the London School of Economics, at Lancaster University (where he was a member of Geoffrey Leech’s corpus linguistics group), and Leeds University, with sabbatical periods at Swiss and South African universities and British Telecom; he later moved to Sussex University to teach informatics. After reaching retirement age in 2009 he spent several years as a research fellow in South Africa. Sampson has published books and articles on most areas of linguistics, on computer science, and on several other subjects; his most recent book is ‘Voices from Early China: the “Odes” Demystified’ (2020).

