Sooner or later, I guess, someone will find it utterly incredible that 6 objects can be fitted into a row of 20 slots no less than 38,760 different ways (as I asserted in a recent posting). If that person is you, the least I can do is show you my calculations. The calculation proceeds in two parts.

FIRST PART. Imagine that we are actually sitting before a row of twenty holes, with six actual objects to put into them. In placing the first object we have 20 choices (since at that point all the slots are empty). *No matter which choice we make*, we then have another 19 choices for the second object (since at that point 19 slots are still empty); in other words, *each* of the 20 choices for the first object must be multiplied by 19 for the second (20 x 19). So also for the third, except that now we have only 18 slots open (20 x 19 x 18), and so on; and since we have 6 objects overall, we must multiply 20 x 19 x 18 x 17 x 16 x 15 = 27,907,200.

But of course that's *much* too large, for the following reason. The way I've set the problem up, it makes a difference what order we follow in filling the slots; for example, putting the first object in slot 3 and the second in slot 17 is reckoned as different from putting the first object in slot 17 and the second in slot 3. But that's no good for our purposes, because all we care about is whether both slot 3 and slot 17 are filled (in this example); 3-then-17 and 17-then-3 are duplications of the same thing. So we have to remove *all* the duplicates from the above result, and that's done in the

SECOND PART. Suppose we had only two objects instead of six. In that case removing the duplicates would be easy: there'd be only two orders in which the two slots could be filled, and all we'd have to do is divide the number calculated above by 2. Very well: mark two objects a and b, and add a third. For the two marked objects, the same two possibilities exist.
For the third there are *three* possibilities relative to a and b (or b and a, as the case may be): to count the third object before the two marked, or between the two marked, or after the two marked; so we'd have to multiply the two possibilities for a and b by 3 for the third object (2 x 3). Now mark the third object c and add a fourth. For a, b, and c we have the same 6 (2 x 3) possibilities, multiplied by four for the fourth (2 x 3 x 4; you can work out for yourself why that's so). And so on; so that the duplications for *6* objects are given by 2 x 3 x 4 x 5 x 6 = 720. So we divide 27,907,200 by 720 to remove the duplications, and we get 38,760.

This answer is *wildly* counterintuitive, but correct. (This is a standard way of calculating, or so I understand; I learned it from John Allen Paulos' book *Innumeracy*, pp. 22-23.)

Such a counterintuitive answer is TYPICAL. In general, intuitions are the worst guide I can think of in trying to deal with probabilities or any related area of mathematics--so bad, in fact, that it's actually worth asking whether genuinely random guessing (determined by coin-tossing, for example) might not actually give better results. More seriously, there's just no alternative to actually doing the math.

--Don Ringe
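
The two-part calculation above can be checked mechanically. What follows is a minimal Python sketch and not part of the original posting; the names SLOTS, OBJECTS, ordered, duplicates, unordered, and brute_force are illustrative assumptions. It multiplies out the ordered placements (first part), divides by the number of orderings of six objects (second part), and then confirms the result by simply listing every way of choosing 6 slots out of 20.

```python
from itertools import combinations
from math import comb, factorial, perm

SLOTS, OBJECTS = 20, 6  # the figures used in the posting

# First part: ordered placements, 20 x 19 x 18 x 17 x 16 x 15.
ordered = perm(SLOTS, OBJECTS)

# Second part: the number of orders in which the same six slots
# could have been filled, 2 x 3 x 4 x 5 x 6.
duplicates = factorial(OBJECTS)

# Dividing out the duplicates leaves the count of distinct fillings.
unordered = ordered // duplicates

# Independent check: enumerate every set of 6 slots and count them.
brute_force = sum(1 for _ in combinations(range(SLOTS), OBJECTS))

print(ordered)      # 27907200
print(duplicates)   # 720
print(unordered)    # 38760
print(brute_force)  # 38760
print(comb(SLOTS, OBJECTS) == unordered)  # True
```

This needs Python 3.8 or later, since math.perm and math.comb were added in that version. The brute-force count is there only as a cross-check that does not rely on the division argument at all.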
I have been following with interest the Scientific American article by Greenberg and Ruhlen and the discussion that it has caused. I am not a linguist, but I do make extensive use of statistics in my computer science research.

There have been many criticisms of Greenberg and Ruhlen's statistical arguments. I would characterize these criticisms as correct in detail but misleading overall. The critics correctly point out that there are a lot of languages and a lot of words that are available for comparison. These factors make the conclusions less reliable by several powers of 1000 than would be suggested by the simplified calculations in Scientific American. On the other hand, the authors have noticed these effects in more than one word, and each word improves the statistical argument by a power of a million or so.

It seems pretty clear to me that Greenberg and his followers have indeed noticed things in their statistics that need explanation. I will leave it to the linguists to argue about what the explanation is. Surely with time the statistical arguments will be refined. I would expect that a lot of work will be needed to develop statistical techniques that clearly separate instances of word borrowing from cases where two languages develop from the same parent, but it seems that it should be possible to do so.

Paul Purdom, Professor of Computer Science, Indiana University.