Sat 17 Dec 1994

Disc: Comparative Method

The ongoing discussions about the comparative method do not seem to
be getting anywhere on achieving real consensus in Greenberg and
anti-Greenberg camps on the question of what would count as valid
evidence that certain language families ARE related at a large time
depth. I wonder if it would not be a good idea to hear something --
from the defenders of wide-ranging and large-time-depth comparison,
preferably -- concerning what would count as evidence AGAINST a genetic
relationship? As a concrete example, take the fact, recently cited by
Poser, that ALL the Muskogean evidence in Greenberg's book has been
found to be tainted by data errors (Geoffrey D. Kimball, A Critique of
Muskogean, "Gulf", and Yukian Material in _Language in the Americas_,
IJAL 58 (1992), 447-501). I can imagine how one might want to maintain
that even this total collapse of the case on Muskogean merely puts us
back in a state of being neutral, a priori, on whether Muskogean
languages are related to other Amerindian languages, or to Nostratic
for that matter. Anti-Greenberg Amerindianists are perfectly prepared
to agree that the Amerind languages MIGHT have descended from a common
source now lost. That's neutrality.

But suppose we move from that neutrality to the position that we will
assume as a default that Muskogean IS Amerind, and so are all the
languages of South America, and indeed, that Amerind is related to
Sino-Tibetan and both to Indo-European and thus Nostratic and all of
the above to Khoisan... Let us assume for the sake of argument that
the world's languages are all genetically related; but let us take this
to be an empirical assumption -- not just a willingness to reject the
closet racism that Poser says Ruhlen once alleged in his critics, or a
yearning to find universal brotherhood, but an assumption against which
evidence can in principle count. Now, what sort of linguistic evidence
would count, for Greenberg and Ruhlen and Illich-Svitych, as DISPROVING
the inclusion of Muskogean or any other family in the conjectural
(though tentatively assumed) Proto-Gaeic?

That is, what sort of data pattern or configuration of phonological
and grammatical properties could suffice to make the macrocomparativists
throw in the towel and go outside to meet the press and concede defeat?
There ought to be some imaginable scenario that would end up with Ruhlen
telling a group of reporters from the Stanford Daily and American
Scientist and other supermarket tabloids, "Well, we thought we could
sustain the whole Proto-Gaeic thing, but that set of paradigms on Haida
has us beat; we've had to concede the Haida case; according to our tests,
Haida is unrelated to the other human languages." (Much scope for new
ADMITS.") But what sort of scenario would it have to be, to get the
Greenberg camp to admit that it was in grave trouble on some relatedness

To be fair, orthodox comparativists might well say that if you put it
like this, no answer should be expected. One can argue that a certain
methodology applied to a certain set of data yields no evidence for
relatedness between Burushaski and Bushman, but not that it refutes
such a relatedness. A positivist view of historical linguistics would
see it as maintaining hypotheses about verifiable relatednesses in a
very particular form: when I say that German "pfennig" comes from an
earlier Germanic form with initial "p" that will be seen in languages
like English with no history of a High German sound shift, I am counted
as having been supported by the observation that English speakers say
"penny"; if the form turned out to be "twenny" I would be in trouble;
given German "Pfund" I am committed to something like "pund" in English,
and (given the Great English Vowel Shift) the discovery of "pound" is
more good news for me; and so on. The predictions I am making are
about an indefinitely extensible set of pairs (Ger:pfxxx, Eng:pxxx).
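
The kind of prediction just described can be sketched in a few lines of
Python. This is only a toy illustration: the function name
check_correspondence and the use of spelling as a stand-in for sound are
assumptions made for the example, not part of any established procedure.
The word pairs are the ones mentioned above, with "twenny" as the
hypothetical counterexample.

```python
# Toy check of the prediction "German pf- corresponds to English p-".
# Spelling stands in for pronunciation here, which is a simplification.

def check_correspondence(pairs):
    """Split German pf- pairs into supporting and contradicting cases.

    Pairs whose German member does not begin with pf- are skipped,
    because the prediction says nothing about them.
    """
    supported, contradicted = [], []
    for german, english in pairs:
        if not german.lower().startswith("pf"):
            continue
        if english.lower().startswith("p"):
            supported.append((german, english))
        else:
            contradicted.append((german, english))
    return supported, contradicted


if __name__ == "__main__":
    pairs = [
        ("pfennig", "penny"),
        ("Pfund", "pound"),
        ("pfennig", "twenny"),  # hypothetical counterexample from the text
    ]
    supported, contradicted = check_correspondence(pairs)
    print("supported:", supported)
    print("contradicted:", contradicted)
```

A single entry in the contradicted list would bear only on the brittle
form of the pf-/p- hypothesis, not on the relatedness claim as a whole.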

Now, the falsity of one of these could conceivably be taken to refute
brittle forms of the hypothesis that English cognates of German
pf-words always begin with p-, but it isn't nearly enough to be
counterevidence to the whole English/German relatedness claim, of
course. That claim would not be given up unless there was a complete
collapse of all the evidence: if "pound" was established textually to
have been a coinage by a novelist who had never heard German, if
"penny" was shown to be borrowed from Italian "penne" during a period
when pasta had been used for small change, and then all the other
sound correspondences started collapsing as well.

I'm asking this: if the 100% collapse of Greenberg's Muskogean evidence,
as alleged by Kimball, does NOT count as a complete collapse of the case
that Muskogean is included in Amerind (hence, a fortiori, of the case that
it is in Proto-Gaeic), then I think I need some help in understanding what
COULD be evidence against that inclusion. There had better be something.

In response to Poser, Nichols does on p. 6 of her book
claim that there is no way for the comparative method
to distinguish between Nostratic and "a much larger
grouping of most lineages of the Old World and New World",
that is, as she herself says, between hypothetical
groupings of c. 12000 and c. 40000 years ago,
and she does say that this is "Because the cut-off point
is so shallow", the cut-off point being the ceiling of
6000-10000 years which she imposes (without any basis, as
noted in my earlier messages) on the comparative method.
Since this amounts to a rejection of the Nostratic
hypothesis (not as false perhaps but as unverifiable/
unfalsifiable, I guess), this means that I am right and
Poser is wrong about whether there have been people who
have rejected particular theories of linguistic
relationship on the basis of this mythical ceiling idea.

In response to Teeter: I think (and I hope Karl will
endorse this) that our disagreements are really quite
minor, but they are real as far as they go. For example,
while Karl is obviously 100% right about Meillet's position
in the Scientia article (where Meillet says that lexical
comparisons can never prove a relationship, and only
morphological ones can), in his 1925 book Meillet repeatedly
states that you CAN establish a linguistic relationship
purely on the basis of lexical correspondences, makes the
same point that I have been making over and over again here
on LINGUIST that for some language families this is the ONLY
way of showing relationship since they lack morphology, and
even makes the same point that I did about how certain
things can only be done ONCE you have established, at least
tentatively, that the languages you are dealing with are
related. As a matter of fact, he even shows how you could
demonstrate the relatedness of the Romance languages PURELY
on the basis of a lexical comparison, using the numerals 1-10,
and then shows how you could do that for the older Indo-European
languages too (although there he begins to slip in a little

I would also like to add that I think it is a serious
mistake to pretend that there are no models for
comparative linguistics besides Indo-European, because
it is so utterly atypical of the language families of
the world. There ARE plenty of equally well established
families, several of which are OLDER in the only sense
that matters, that is, not in years before the present
but in years before the earliest written records and many
of which are more useful models for those working on
families not yet established (Afroasiatic, Austronesian,
Austroasiatic, Uto-Aztecan, Altaic, etc.). Which is not
to say that there is anything wrong with knowing as much
as possible about IE, but rather that there is much wrong
with knowing naught BUT Indo-European. I am not sure
but I think that this is what Eric Hamp had in mind
in a recent paper in the Davis/Iverson volume when
he complained about how the teaching of historical
linguistics is hampered by textbooks which largely
draw their material from IE (or indeed from some
favored parts of it, such as Romance).

And I am happy to have Sally Thomason point out that
morphological elements can be borrowed. Meillet
we must remember was greatly troubled by the possibility
of such a thing and of the existence of mixed languages.
He tried to debunk every example around, and thought
(wrongly I think) that if such languages exist, then
they cannot be handled by the comparative method.
The fact that such languages do exist (e.g., Mitchif)
and yet pose no problem (so that we have no trouble
tracing certain parts of Mitchif to French and others
to Cree) means that Meillet was worried for naught. But
it also means that language classification on the basis
of morphology is no more infallible than that on the basis of
lexical material. You work with what you have available,
which in some cases may be largely morphology and only
a few obvious lexical parallels (that's how Afro-Asiatic
was first established), morphology AND lexicon (Indo-European),
lexicon and A SINGLE morphological parallel (Algic, as Victor
Golla reminded me just the other day), lexicon only (Vietnamese
and the rest of Mon-Khmer), and so on and so forth.

Typology of Historical Change

This note tries to make explicit what I take for granted, and have discussed
with others on occasion, but which perhaps needs a more explicit statement.

One of the most fruitful avenues of research in distant language comparison, I
believe, is the growth of the field I call

Typology of Historical Change.

Under this rubric I include for example the work of Johanna Nichols (whether or
not I agree with any data, findings or particulars of method, is not relevant
to my point; I still think it helps our thinking along).

I also include, and this is a challenge I want to issue, the

***mode of discourse***

in which Mr. Vovin asked recently for help in finding typological parallels to
a hypothesis he was interested in that a phrase meaning "water falls" could
fossilize (?) into a basic word for "rain". In response to his query, he got
back some positive answers, examples which people claimed fit this description.

As a method of reasoning, this is what we need more of. That is, more
accumulations of attested examples of particular changes, to educate our
intuitions of what we naively think are "possible" semantic shifts by ever more
experience with what actual semantic shifts are known or suspected. It will
help us to improve our methods of guesstimating possible language relationships,
because it will at least say that a given hypothesized semantic shift is
frequently attested, so it is not straining to compare lexical items whose
meanings differ in such and such a way. Whereas by contrast another
hypothesized semantic shift is not firmly attested. So such an unattested
semantic shift should probably not be used in those distant language
comparisons which are themselves the most difficult to do, because over large
time spans the number of context-sensitive conditioning environments is as
great as the number of lexical items available to compare, and thus there are
few or no ***recurring*** sound correspondences.

In other words, as we move towards deeper comparisons, we must more and more
rely on ways of measuring "distance" of semantic shift and "distance" of
phonological change, rather than measuring repeating sound correspondences and
semantic identities. We do not yet have our tools for doing this very well
sharpened, but we can proceed gradually to sharpen them. A study of the known
attested cases is the best start.

In other words, if someone really wants to see how our methods fare with
gradually more distant language comparisons, and to see how some new methods
may fare, they should tabulate, for all known language relationships,

(a) the proportion of sound-correspondence repetition in the comparable
 vocabulary (and what "comparable" means is itself a variable,
 not exclusively defined by (b) and (c))

(b) the "semantic distance" along attested paths of semantic change of
 lexical items being compared. Where multiple such shifts have
 been attested, the estimated "distance" counts as closer, smaller.
 Where few such shifts have been attested, the estimated "distance"
 counts as greater. We of course do not have enough such information
 in database form to use at present, but whatever we do have can
 be used provisionally, as explained in (d)

(c) The "phonetic distance" along attested paths of phonetic change of
 lexical items being compared. There is relatively more of this
 knowledge available than for phonological change.

(d) Exploring how the three measures above vary as we go to greater time
 depths. That is, using first the more assured cases, then the less
 assured ones,

 How does a weighted average of "closeness" of compared lexical items
 vary as we go to increasing time depths?

 How does the proportion of regular and often recurring sound
 correspondences to unique or rarely recurring sound correspondences
 vary as we go to increasing time depths?

It is the development of the tools in (b,c) which will most advance our
 abilities to compare at greater time depths, improve our methods.
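
As a very rough sketch of what such a tabulation might look like in
practice, the Python below computes measure (a) and a weighted
"closeness" from (b) and (c) for each relationship, then orders the rows
by time depth as in (d). Everything here is an assumption made for
illustration: the Comparison record, the numeric distance scales, the
equal default weights, and the formula closeness = 1 / (1 + distance) are
placeholders, not measures anyone has established.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Comparison:
    """One compared pair of lexical items (hypothetical record layout)."""
    correspondence: str       # label for the sound correspondence, e.g. "pf:p"
    semantic_distance: float  # 0.0 = same meaning; larger = less attested shift
    phonetic_distance: float  # 0.0 = same sounds; larger = less attested change


def recurrence_proportion(items):
    """Measure (a): share of items whose correspondence occurs more than once."""
    if not items:
        return 0.0
    counts = Counter(item.correspondence for item in items)
    recurring = sum(1 for item in items if counts[item.correspondence] > 1)
    return recurring / len(items)


def weighted_closeness(items, w_sem=0.5, w_phon=0.5):
    """Average closeness built from (b) and (c); the weights are assumptions."""
    if not items:
        return 0.0
    total = 0.0
    for item in items:
        distance = w_sem * item.semantic_distance + w_phon * item.phonetic_distance
        total += 1.0 / (1.0 + distance)
    return total / len(items)


def tabulate(relationships):
    """Step (d): rows of (time_depth, recurrence, closeness), shallowest first.

    `relationships` is a list of (time_depth_in_years, [Comparison, ...]).
    """
    rows = [
        (depth, recurrence_proportion(items), weighted_closeness(items))
        for depth, items in relationships
    ]
    return sorted(rows)
```

With real data one would plot the second and third columns against the
first to see how recurrence and closeness fall off with time depth. The
hard part is, of course, assigning the distances, not the arithmetic.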

I will be very grateful to anyone who points me to studies which approximate to
parts of the program outlined just above.

"The Comparative Method" currently does not have the benefit of fully developed
tools of this kind. To that degree, the current comparative methods can be
considered less rigorous than they ought to be, and for that reason not as
powerful at distant language comparisons as they will sometime come to be.
A future comparative method can use these tools more and more precisely.

The real challenge today to existing comparativists is to avoid artificially
fossilizing the term "The Comparative Method", to avoid treating its methods as
fixed and not subject to improvement and supplementing with newer and more
powerful methods, as are the methods in any other science. It would be healthy
if the word "the" were dropped from the term and it were made a mass or plural
term "comparative methods". That implies no lessening of rigor. Indeed, as I
have been at pains to point out above, I firmly believe some of the limitations
of the present state of comparative methods result from a ***lack of rigor***
in the area of the typology of possible changes (phonetic, semantic,

Work discussed by Bill Croft in the topic of syntactic reconstruction and
typology is certainly relevant to the concerns raised here. I think we are
seeing the beginnings of a new paradigm in the focus on paths of change in
language, and comparative-historical linguists will be left behind if they do
not add these techniques to their box of tools (while keeping all the good
techniques they already have).

Lloyd Anderson
