Editor for this issue: Maria Lucero Guillen Puon <luceroguillen@linguistlist.org>
Book announced at https://linguistlist.org/issues/33.3603
EDITOR: Thomas Olander
TITLE: The Indo-European Language Family
SUBTITLE: A Phylogenetic Perspective
PUBLISHER: Cambridge University Press
YEAR: 2022
REVIEWER: Geoffrey Sampson
SUMMARY
This book is about the prehistory of the Indo-European language family. Reconstructing the ancestry of languages before they were attested through writing is a discipline which was pursued throughout the twentieth century by methods which would have been perfectly familiar to the Neogrammarians of the late nineteenth century, until around the millennium it was electrified – some would say, revolutionized – by the introduction of a “computational cladistics” technique pioneered by a team led by Don Ringe (Ringe et al. 2002, Nakhleh et al. 2005). Ringe is one of seventeen contributors to the book; there are three U.S. contributors, while more than half of the seventeen work at either Leiden or Copenhagen Universities.
For Indo-European, the problem with which this book is concerned can be stated as follows. It is more or less uncontroversial today that Proto-Indo-European broke up into ten main branches: roughly from west to east, they are Celtic, Germanic, Balto-Slavic, Italic (i.e. Latin and extinct sister languages such as Oscan), Albanian, Greek, Anatolian, Armenian, Indo-Iranian, and Tocharian. (There were also meagrely-attested extinct languages such as Phrygian and Messapic, for which the evidence is enough to show that they were Indo-European but not enough to assign them with certainty to a particular branch.) Each of the main branches is taken to have begun as a single proto-language, though most of them went on to diversify into multiple later languages. (Armenian and Albanian have remained single-language branches, with dialect differences of course, and the same might be said of Greek, though Lucien van Beek in chapter 11 here argues that Phrygian was a sister language of Greek, so that the subfamily could appropriately be called Graeco-Phrygian.) It does not seem likely that all ten branches separated simultaneously; more probably, PIE broke into two or three daughter languages, which may have survived for long periods as single languages before some or all of them split again, and so forth. So we would like to draw a tree structure with PIE at its root and the ten subfamilies as leaves, showing which subfamilies separated relatively late and which are more remotely related. We would like to know more, of course: for instance, we would like to be able to give rough dates for the various splits, and the book under review contains some discussion of questions like that. But the central issue with which it is concerned is the shape of the family tree linking PIE to its ten subfamilies, and the detailed evidence for that phylogenetic structure.
For an idea about how wide the differences between various scholars’ hypotheses have been, consider a few tree-structures that have been proposed. (I represent tree-structures via bracketings.) The first proposal was by August Schleicher in the 1860s, when the Anatolian and Tocharian subfamilies had not yet been discovered:
( (Gmc BaSl) ( ( (Cel Ita) (Alb Grk) ) InIr) )
Armenian does not appear in Schleicher’s tree, because he believed it was part of the Indo-Iranian branch. In 1985 Tomas Gamkrelidze and Vyacheslav Ivanov (Mallory 1989: 21, Fig. 10) proposed:
( ( (Gmc BaSl) (Grk (Arm InIr) ) ) ( (Cel Ita) Toc) Ana)
Their tree omitted Albanian, which is generally recognized as difficult to place in the family tree because, unlike in the case of the other branches, our data on Albanian reach back only as far as the sixteenth century A.D. Don Ringe’s group offered the following (Nakhleh et al. 2005: 397), with uncertainty about the position of Albanian:
( ( ( (Cel Ita) ( (Gmc ?Alb) ( (BaSl InIr) (Grk Arm) ) ) ) Toc) Ana)
Eric Hamp (2013) suggested:
( ( ( ( (Cel Ita) (Gmc ( (BaSl Alb) Toc) ) ) (Grk Arm) ) InIr) Ana)
Hamp also proposed that Burushaski, a language of India normally regarded as an isolate, was in fact a top-level sister of the Indo-European family. He and some of the others mentioned made suggestions about how less well-attested languages fitted into their trees, but I show just structures proposed for the main subfamilies.
These structures are only a sample of those that have been proposed, but they give an impression of the extent of disagreement. Some features recur: none of these trees depart from Schleicher’s idea that the Celtic and Italic subfamilies were closely related, and ever since Anatolian (Hittite and its sister languages) was recognized as Indo-European, it has generally been seen as the first branch to split off from the rest of the family. But many other features differ widely. Most linguists have held that the Tocharian group also split off early (this was a pair of languages attested from the fifth to tenth centuries A.D. far from the Indo-European heartland, in oases on the northern edge of the Tarim Basin in what is now Chinese Sinkiang); but Hamp saw Tocharian as embedded deep within the language-family.
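The bracketings quoted above are regular enough to be read mechanically. As a minimal sketch (the names `TREES`, `parse` and `clades` are illustrative labels only, not drawn from any of the works cited), the following Python fragment parses the four sample trees and asks which groupings turn up as a clade in every one of them:

```python
# The four bracketings exactly as quoted in this review.
TREES = {
    "Schleicher": "( (Gmc BaSl) ( ( (Cel Ita) (Alb Grk) ) InIr) )",
    "Gamkrelidze and Ivanov 1985":
        "( ( (Gmc BaSl) (Grk (Arm InIr) ) ) ( (Cel Ita) Toc) Ana)",
    "Nakhleh et al. 2005":
        "( ( ( (Cel Ita) ( (Gmc ?Alb) ( (BaSl InIr) (Grk Arm) ) ) ) Toc) Ana)",
    "Hamp 2013":
        "( ( ( ( (Cel Ita) (Gmc ( (BaSl Alb) Toc) ) ) (Grk Arm) ) InIr) Ana)",
}


def parse(text):
    """Turn a bracketing into nested tuples; leaves are label strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token != "(":
            return token                  # a leaf label such as "Gmc"
        children = []
        while tokens[pos] != ")":
            children.append(node())
        pos += 1                          # step over the closing bracket
        return tuple(children)

    tree = node()
    if pos != len(tokens):
        raise ValueError("unbalanced bracketing")
    return tree


def clades(tree):
    """List the leaf set under every nonterminal node of a parsed tree."""
    found = []

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.append(members)
        return members

    leaves(tree)
    return found


# Groupings that are a clade in all four trees.
shared = set.intersection(*(set(clades(parse(t))) for t in TREES.values()))
print([sorted(c) for c in shared])  # [['Cel', 'Ita']]
```

On these four trees the only grouping common to all is Italo-Celtic. Anatolian's position as first branch does not show up as a shared clade here, partly because the trees do not cover identical sets of subfamilies.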
The novel approach introduced by Ringe is to use automated techniques to search the solution-space of possible tree structures to find the structure with fewest features making that structure implausible. For instance, if two subfamilies share some unusual language-change, rather than supposing that the change occurred independently in separate branches of the tree it is more parsimonious to hypothesize that it occurred only once, at a time when the subfamilies had not yet split apart – but after one takes numerous different language-changes into account, it becomes impossible to find any structure that avoids all such implausibilities, other than at the cost of introducing new anomalies (such as a branch which undergoes a language-change and later reverts to its previous state). The best one can do is to look for a tree structure which reduces anomalies to a minimum.
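To make the idea of "reducing anomalies to a minimum" concrete, here is a toy sketch in Python. It is not the procedure used by Ringe's team: it swaps in Fitch's parsimony count, a standard textbook way of finding the fewest changes a tree requires for each character. The four subfamily labels A to D and the three binary characters are invented purely for illustration.

```python
def fitch_changes(tree, states):
    """Return (candidate ancestral states, minimum changes) for one character.

    `tree` is a leaf label or a pair (left, right); `states` maps each
    leaf label to the value of the character in that subfamily.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_changes(left, states)
    right_set, right_cost = fitch_changes(right, states)
    common = left_set & right_set
    if common:                            # daughters can agree: no new change
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def tree_score(tree, characters):
    """Total number of changes the tree forces across all characters."""
    return sum(fitch_changes(tree, states)[1] for states in characters)


# Invented data: 1 = the subfamily shows some innovation, 0 = it does not.
CHARACTERS = [
    {"A": 1, "B": 1, "C": 0, "D": 0},
    {"A": 1, "B": 1, "C": 0, "D": 0},
    {"A": 0, "B": 1, "C": 1, "D": 0},
]

# Three of the possible binary trees over four leaves.
CANDIDATES = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]

scores = [tree_score(tree, CHARACTERS) for tree in CANDIDATES]
best = CANDIDATES[scores.index(min(scores))]
print(scores)  # [4, 6, 5]
print(best)    # (('A', 'B'), ('C', 'D'))
```

No candidate scores the theoretical minimum of three changes (one per character), because the third character conflicts with the first two. That is the situation described above: some anomaly is unavoidable, and the search is for the tree that leaves the fewest.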
Historical linguists who are sceptical about computer techniques often call them “quantitative methods”, which strikes me as a misunderstanding: linguists who prefer traditional methods are also seeking answers which reduce anomalies to a minimum. The advantage of what I would call “automated methods” is that they permit something closer to an exhaustive search of a solution-space than unaided human brains can achieve. (Even if we assume all splits are binary, there are more than 34 million alternative tree structures over ten subfamilies, if I am correct in thinking that the number of full binary trees with labelled leaves and n nonterminal nodes, but no ordering among daughters of a node, is (2n)! / (n! · 2^n), where n = 9 for ten leaves.) The term “quantitative” leads traditionalists to reject approaches like Ringe’s because of associations with glottochronology, which aimed to yield dates for language splits via an axiom about constancy of rates of language change which proved to be unjustified (Bergsland and Vogt 1962). But the real limitation on Ringe’s approach is that the metric for assessing “goodness” of a candidate tree can take into account only well-defined, countable features, whereas it is clear from this book that traditional methods draw on a much broader range of intuitive judgements. The easiest features to apply automated tree-optimization to are cognate vocabulary items, but they are among the least reliable indicators of relatedness between languages because of the propensity of languages to borrow vocabulary from one another.
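The figure of 34 million can be checked directly. The short Python sketch below evaluates the formula, read as (2n)! divided by the product of n! and 2^n, for the nine nonterminal nodes of a binary tree over ten leaves. It also cross-checks the result against the equivalent product of odd numbers 1 · 3 · 5 · … · (2n − 1). The function names are illustrative only.

```python
from math import factorial


def rooted_binary_trees(nonterminals):
    """Count binary trees with labelled leaves and unordered daughters."""
    n = nonterminals
    return factorial(2 * n) // (factorial(n) * 2 ** n)


def odd_product(nonterminals):
    """The same count written as 1 * 3 * 5 * ... * (2n - 1)."""
    result = 1
    for odd in range(1, 2 * nonterminals, 2):
        result *= odd
    return result


leaves = 10                  # the ten main subfamilies
n = leaves - 1               # a binary tree over k leaves has k - 1 nonterminals
print(rooted_binary_trees(n))  # 34459425
assert rooted_binary_trees(n) == odd_product(n)
```

With three leaves the same function gives 3, matching the three possible ways of pairing two leaves off against the third.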
Ringe himself offered his automated approach as merely one new tool for the historical linguist’s toolbox, and in his editorial Introduction to this book Thomas Olander comments that “Over the last couple of decades, computer-assisted approaches have, in my view, received more attention than can be justified by the results they have produced.” Olander goes on to describe the book under review as “in some way, an attempt at reinvigorating the traditional methodology”.
(Undue credulity about the potential of automated methods in historical linguistics is only one minor symptom of a far larger problem. Gross overestimation, by educated people who are not themselves computer professionals, of the extent to which computers can contribute to the achievement of human goals has become a serious disease of 21st-century public life, cf. Sampson 2012.)
In his Introduction, Olander outlines the problem situation which I have sketched, pointing out that standard general works on Indo-European linguistics have tended to devote surprisingly little attention to the early phylogeny of the family. Olander discusses the precise meaning of historical-linguistic terms that are often taken for granted without their ambiguity being noticed. “Proto-language”, for instance, might refer to a language immediately before it splits into distinct descendant languages, or might mean a language that has evolved through time from the point when it became distinct from its sister languages down to the point when it splits into separate daughter languages – in terms of tree diagrams, a node of a tree, or a branch between mother and daughter nodes. Olander ends his Introduction by briefly summarizing the contents of the fourteen following chapters.
The first three of these are on general subjects. James Clackson’s chapter 2, “Methodology in linguistic subgrouping”, surveys the history of research on the issue and identifies some of its methodological axioms – notably that recognition of a clade or “subgroup” must be based on shared innovations rather than on shared retentions of earlier features which have been lost from related languages, and that one must be cautious about treating shared features as evidence of relationship if they might have resulted from language contact rather than from shared ancestry. These two points, of course, account for much of the difficulty of linguistic genealogy: it is often hard to know whether a given feature is an innovation or a retention, and although vocabulary is specially prone to being borrowed, I am not sure that any kind of language property will not be found to have been transmitted in some cases through contact between unrelated languages.
Chapter 3, “Computational approaches to linguistic chronology and subgrouping” by Dariusz Piwowarczyk, surveys various ways in which computers have been harnessed to historical-linguistics research tasks. Don Ringe’s approach is one of these, but there have been others: for instance scholars including Adam Baker (2008) have developed software which simulates the application of sound-change rules to a vocabulary, so as to check that a series of sound laws reconstructed on paper do in fact collectively yield the end results that their authors suppose them to yield. Patrick Sims-Williams (2018) used a computer to “apply … forty-three sound changes [to] the material of 159 selected Common Celtic forms”, enabling him “to find amendments in the relative chronology of changes and identify usually overlooked Celtic cognates from a Proto-Indo-European root”.
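The kind of check described here, running a sequence of sound laws over a word-list to see what they actually yield, can be sketched in a few lines of Python. This is an illustration only and reproduces neither Baker's nor Sims-Williams's software; the two rules and the input form "anka" are invented and are not real sound laws of any language.

```python
import re

# Invented rules, each a (regular expression, replacement) pair.
NASAL_LOSS = (r"n(?=[ptk])", "")                 # n is lost before p, t, k
VOICING = (r"(?<=[aeiou])k(?=[aeiou])", "g")     # k > g between vowels


def apply_changes(form, rules):
    """Apply the rules to a form one after another, in the order given."""
    for pattern, replacement in rules:
        form = re.sub(pattern, replacement, form)
    return form


print(apply_changes("anka", [NASAL_LOSS, VOICING]))  # aga
print(apply_changes("anka", [VOICING, NASAL_LOSS]))  # aka
```

The two orderings give different outcomes from the same starting form, which is why a mechanical run of this sort can expose mistakes in a proposed relative chronology.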
The title of Don Ringe’s chapter 4, “What we can (and can’t) learn from computational cladistics”, is self-explanatory. Ringe says in his opening paragraph that “To at least some observers, it has not always been clear that what we can learn from computational cladistics is limited. This chapter is an attempt to explore those limits.”
The remaining chapters are each devoted to one of the main Indo-European branches, by an expert on the respective branch; and there is also a separate chapter, by Michael Weiss, on the Italo-Celtic grouping, since although all the sample trees I displayed above treated Italo-Celtic as a lowest-level clade, according to Weiss “It would be fair to say that Italo-Celtic is more debatable than any other higher order subgrouping, certainly much more so than Balto-Slavic” – and Tijmen Pronk, writing about Balto-Slavic, points out that that clade was itself controversial for much of the twentieth century, though not today. (Weiss nevertheless tentatively concludes that Italo-Celtic was a clade, the first group to have branched off after Anatolian and Tocharian.)
The chapters on individual subfamilies are written to a fairly standard pattern, with sections on the innovations distinguishing the respective subfamily from the rest of Indo-European, on its relationships with sister subfamilies, and on its internal structure (i.e. how members of the subfamily relate to one another).
These chapters do not contain surprising, novel theories about Indo-European phylogeny – that is not the purpose of the book. Rather, it aims to set out the current state of play in this field, spelling out the detailed evidence for (and against) conclusions about tree-structures which are often generally accepted. Nevertheless, where up-to-date evidence settles issues that have been seen as debatable until recently, the authors are not afraid to say so. Notably, Don Ringe describes the archaeologist Colin Renfrew’s “Anatolian hypothesis” (Renfrew 1987), that the Indo-European languages spread into Europe from Anatolia rather than from the Eurasian steppe, as having been decisively refuted (not by his own work!) – which will probably surprise few linguists who have taken an interest in this controversy. Inevitably, different contributors’ ideas about the relationships between “their” subfamily and sister subfamilies are not always mutually compatible. Bjarne Hansen and Guus Kroonen’s chapter on Germanic, for instance, surmises that Proto-Germanic “broke off from Proto-Indo-European after Anatolian and just before or after Tocharian”, which would presumably contradict Michael Weiss’s conclusion reported above about Italo-Celtic.
(Olander remarks that “Interestingly, the different conclusions reached in the various chapters only rarely seem to hinge on discrepancies in the reconstruction of Proto-Indo-European and its development into the individual daughter languages, although one might have expected such discrepancies to play a significant role.”)
EVALUATION
This is an outstandingly valuable book, bringing together facts and hypotheses which have been scattered through the professional literature. It will surely become a convenient reference source and a starting-point for scholars who develop new hypotheses in this field. Not the least of its virtues is that the Jubilee Fund of the Danish Riksbank has made it available from the publisher on open access, free for all to read.
The book is not flawless. I am not qualified to detect errors, if there are any, in forms from most of the ancient languages discussed, but I did wonder about apparent illogicalities in some of the contributors’ discussion of them (though I may well have misunderstood through lack of expertise). Hansen and Kroonen list phonological innovations distinguishing Proto-Germanic from the rest of Indo-European, one of which is *ā > *ō; their first example is Gothic /sōkjan/ “seek” v. Latin ‘sāgīre’, which is straightforward at least with respect to the phonemes, but their other example is Gothic /blōma/ “flower” v. Latin ‘flōs’. The Latin word is ‘flōs’ not ‘flās’, and according to Mallory and Adams (1997, s.v. FLOWER) the PIE form giving rise to the Gothic and Latin words is likely to have been *bhloHdhos (where H represents an unspecified laryngeal); so where did *ā come in?
Listing sound-changes which established Goidelic as a separate clade within Celtic, Anders Jørgensen includes a rule which deleted nasals before voiceless obstruents, giving as one example Proto-Celtic *krenxtV- > Old Irish ‘crécht’ “wound, scar”, to which in brackets he adds Middle Welsh ‘creith’. (In the modern language the word is ‘craith’.) But Welsh is Brittonic rather than Goidelic Celtic, so why is the /n/ lost there too? Two pages later the same author appears to say that Brittonic nasal mutation arises through a rule which nasalizes voiced stops, ND > NN. But Welsh nasal mutation applies to all stops, not just voiced stops – hence the surprise often felt by visitors to Wales who notice the exotic consonant sequence in the signs at the border, ‘Croeso yng Nghymru’, “Welcome to Wales”, with the /k/ of ‘Cymru’, “Wales”, mutated to a voiceless /ŋ/ following ‘yn’, “in”.
In his Tocharian chapter, Michaël Peyrot argues that it is necessary to posit a Proto-Tocharian ancestral to the attested Tocharian A and Tocharian B languages, because neither attested language can be derived from the other; for instance, Tocharian A ‘want’, “wind”, cannot have yielded Tocharian B ‘yente’, so some intermediate proto-form must be hypothesized. But if the problem is the initial consonants, then I note that all cases of Proto-Semitic word-initial /w/ became /j/ in Hebrew, so why could a similar sound-law not have applied in Tocharian? (I do not doubt that the two Tocharian languages were sisters rather than mother and daughter, but I cannot see how Peyrot’s remark is evidence for that.)
In a few places contributors assume general principles which look questionable. Responding to a scenario proposed by Henning Andersen according to which alternative forms of a given word in dialects of Proto-Balto-Slavic “continued to coexist throughout the Proto-Slavic and Proto-East-Baltic periods and, in some cases, in the modern Slavic and Baltic languages”, Tijmen Pronk objects that “Such a long period of coexisting variants of the same words is highly unlikely”. Is that clearly true? Some English-speakers pronounce ‘economic’ with initial /i:/ and others with /ɛ/; so far as I know, that has been so for quite a long time, and I see no reason to assume that one pronunciation is bound eventually to eliminate the other. In Chinese there are many words with alternative pronunciations resulting from contacts between dialects, with the alternatives typically belonging to different speech registers but recognized as the same word (Ho 2015: 155–6).
A recurring problem (as it seems to me) is that contributors often treat words from separate languages as cognates without feeling that they need to explain large differences in meaning. Adam Hyllested and Brian Joseph in the Albanian chapter derive Albanian ‘thundër’, “hoof”, from a PIE derived noun of the form “sting + er” which also yielded Greek ‘kéntron’ “point, goad; nail”. From “stinger” to “point” or “goad” is straightforward, but how is “stinger” to “hoof” a plausible semantic development? Michael Weiss sees a Hittite word meaning “press” as cognate with the PIE root meaning “sow”, from which that English word, Latin ‘serere’, and other Indo-European words all derive. A gardener may sow individual seeds by pressing them into the soil, but surely serious agriculture would have proceeded by broadcasting seeds and hoeing soil over them? (And I notice that Mallory and Adams, 1997 s.v. SOW, say that the PIE root is “Ultimately the same” as a homophone meaning “throw”, which sounds more like broadcasting than pressing.)
There are quite a few instances like these. Conversely, Birgit Olsen and Rasmus Thorsø in their Armenian chapter see the fact that Greek ‘mētryiā’ and Armenian ‘mawrow’ both mean “stepmother” rather than “mother’s sister”, which is the meaning of a parallel Germanic derivate, as a coincidence so striking that it suggests joint innovation. I should have thought it a very natural development which could easily have happened more than once independently. It is a sticky moment for any man when he introduces his young children to his new wife; surely it would be natural enough for him to smooth the situation by saying “Come and meet your auntie Sue” (or whatever her name might have been)?
Some of these contributors link their conclusions about language relationships to relevant findings from archaeology or from human genetics. For instance, Tijmen Pronk writes “A study of the Y chromosome of Slavic populations supports the hypothesis that the Slavic expansion started from present-day Ukraine”, and, later, “If the Balto-Slavic proto-language is associated with the (earlier phases of the) Middle Dnieper culture, which seems reasonable, the split between Baltic and Slavic can be dated no later than [2000 B.C.]. … After the split, Baltic and Slavic developed independently for over two millennia … [and] shifted to a more agriculture-based mode of subsistence, as is shown by their distinct agricultural terminology …” Other contributors discuss exclusively linguistic facts. That could be because archaeological or genetic research relevant to the languages in question is not available, or is available but suggests no particular implications for language relationships. Alternatively, it might be that some contributors prefer not to examine non-linguistic evidence.
If the latter were true, it would be regrettable. Modern conditions of academic employment encourage narrow specialization, but those who prioritize the expansion of human knowledge over career advancement have to resist that pressure. In his discussion of the failure of Renfrew’s Anatolian hypothesis, Ringe says that the decisive evidence against it has been produced “neither by archaeologists nor by linguists; the crucial evidence is ancient DNA evidence” (Ringe cites Haak et al. 2015). For Ringe, the most important contention of his own chapter is “that information from all disciplines must be used”. If one aims to discover the truth about a topic, it is illogical to ignore evidence that bears on that truth, wherever it comes from.
The issue here is not just how best to arrive at truth; it is also that the public is entitled to expect academic researchers to make connexions which lead outside their specialization. Someone might be able to demonstrate that, without any doubt, linguistic subfamily A split from the ancestor of subfamilies B and C centuries before B and C separated; but if the demonstration tells us about nothing other than the early history of the A, B, and C language groups, many of the taxpayers who have paid for the research might reasonably respond “So what?” If on the other hand the linguistic developments shed light on the evolution of human cultures and the origins of nations, far more members of the public will accept that their money has been well spent.
One very minor niggle is that it is a pity that the editor of this book did not amalgamate the reference lists in the many separate chapters into a single list at the end of the book. It would have saved a great deal of flicking back and forth to check individual citations.
But enough nit-picking. This is a fine book.
REFERENCES
Baker, A. 2008. “Computational approaches to the study of language change”. Language and Linguistics Compass 2.289–307.
Bergsland, K. and H. Vogt. 1962. “On the validity of glottochronology”. Current Anthropology 3.115–53.
Gamkrelidze, T. and V. Ivanov. 1985. Indo-European and the Indo-Europeans (in Russian). English translation published by de Gruyter Mouton (Berlin), 1995.
Haak, W., et al. 2015. “Massive migration from the steppe was a source for Indo-European languages in Europe”. Nature 522.207–11.
Hamp, E.P. 2013. “The expansion of the Indo-European languages: an Indo-Europeanist’s evolving view”. Sino-Platonic Papers, 239. Online at <sino-platonic.org/complete/spp239_indo_european_languages.pdf>.
Ho, Dah-An. 2015. “Chinese dialects”. In W. S.-Y. Wang and Chaofen Sun, eds, The Oxford Handbook of Chinese Linguistics, pp. 149–59. Oxford University Press.
Mallory, J.P. 1989. In Search of the Indo-Europeans: language, archaeology and myth. Thames & Hudson.
Mallory, J.P. and D.Q. Adams, eds. 1997. Encyclopedia of Indo-European Culture. Fitzroy Dearborn Publishers.
Nakhleh, L., D. Ringe, and T. Warnow. 2005. “Perfect phylogenetic networks: a new methodology for reconstructing the evolutionary history of natural languages”. Language 81.382–420.
Renfrew, C. 1987. Archaeology and Language: the puzzle of Indo-European origins. Cambridge University Press.
Ringe, D., T. Warnow, and A. Taylor. 2002. “Indo-European and computational cladistics”. Transactions of the Philological Society 100.59–129.
Sampson, G.R. 2012. “Whistleblowing for health”. Journal of Biological Physics and Chemistry 12.37–43. Online at <www.grsampson.net/AWfh.pdf>.
Sims-Williams, P. 2018. “Mechanising historical phonology”. Transactions of the Philological Society 116.555–73.
ABOUT THE REVIEWER
Geoffrey Sampson graduated in Oriental Studies at Cambridge University in 1965, and studied Linguistics and Computer Science as a graduate student at Yale University before teaching at the universities of Oxford, LSE, Lancaster, Leeds, and Sussex. After retiring from his Computing chair at Sussex he spent several years as a research fellow in Linguistics at the University of South Africa. Sampson has published in most areas of Linguistics and on a number of other subjects. His recent books include The Linguistics Delusion (2017), Voices from Early China (2020), and God Proofs (2022).
Page Updated: 20-Sep-2023