LINGUIST List 3.886

Tue 10 Nov 1992

Disc: Probabilistic Reasoning

Editor for this issue: <>


Directory

  1. Don Ringe, explanation of a recent statement regarding probabilities
  2. "Paul Purdom", Re: 3.872 Probabilistic Reasoning

Message 1: explanation of a recent statement regarding probabilities

Date: Sun, 08 Nov 92 17:21:33 ES
From: Don Ringe <dringeunagi.cis.upenn.edu>
Subject: explanation of a recent statement regarding probabilities

Sooner or later, I guess, someone will find it utterly incredible that 6
objects can be fitted into a row of 20 slots no less than 38,760 different ways
(as I asserted in a recent posting). If that person is you, the least I can do
is show you my calculations. The calculation proceeds in two parts.
FIRST PART.
Imagine that we are actually sitting before a row of twenty holes, with six
actual objects to put into them. In placing the first object we have 20
choices (since at that point all the slots are empty). *No matter which choice
we make*, we then have another 19 choices for the second object (since at that
point 19 slots are still empty); in other words, *each* of the 20 choices for
the first object must be multiplied by 19 for the second (20 x 19). So also
for the third, except that now we have only 18 slots open (20 x 19 x 18), and
so on; and since we have 6 objects overall, we must multiply 20 x 19 x 18 x 17
x 16 x 15 = 27,907,200.
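A minimal Python sketch of the first part, assuming Python 3.8 or later for
`math.perm`: it multiplies the shrinking number of empty slots for each object
in turn and cross-checks the result against the library's count of ordered
selections. The function name `ordered_placements` is ours, for illustration
only.

```python
import math

def ordered_placements(slots: int, objects: int) -> int:
    """Count placements of `objects` distinguishable objects into `slots`
    slots, one object per slot, when the order of filling matters."""
    total = 1
    for k in range(objects):
        # The (k+1)-th object still has (slots - k) empty slots to choose from.
        total *= slots - k
    return total

count = ordered_placements(20, 6)
print(count)                       # 27907200
print(count == math.perm(20, 6))   # True
```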
But of course that's *much* too large, for the following reason. The way I've
set the problem up, it makes a difference what order we follow in filling the
slots; for example, putting the first object in slot 3 and the second in slot
17 is reckoned as different from putting the first object in slot 17 and the
second in slot 3. But that's no good for our purposes, because all we care
about is whether both slot 3 and slot 17 are filled (in this example); 3-then-
17 and 17-then-3 are duplications of the same thing. So we have to remove
*all* the duplicates from the above result, and that's done in the
SECOND PART.
Suppose we had only two objects instead of six. In that case removing the
duplicates would be easy: there'd be only two orders in which the two slots
could be filled, and all we'd have to do is divide the number calculated above
by 2. Very well: mark two objects a and b, and add a third. For the two
marked objects, the same two possibilities exist. For the third there are
*three* possibilities relative to a and b (or b and a, as the case may be): it
can go before the two marked objects, between them, or after them; so we'd
have to multiply the two possibilities for a and
b by 3 for the third object (2 x 3). Now mark the third object c and add a
fourth. For a, b, and c we have the same 6 (2 x 3) possibilities, multiplied
by four for the fourth (2 x 3 x 4; you can work out for yourself why that's
so). And so on; so that the duplications for *6* objects are given by 2 x 3 x
4 x 5 x 6 = 720.
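The insertion argument can be checked with a short Python sketch (the helper
name `orderings_by_insertion` is hypothetical): start with one object, insert
each new object into every possible position of every ordering built so far,
and compare the count with a brute-force enumeration from
`itertools.permutations`.

```python
from itertools import permutations

def orderings_by_insertion(n: int) -> list:
    """Build all orderings of objects 0..n-1 (assumes n >= 1) by inserting
    each new object into every position of the orderings built so far."""
    orders = [(0,)]  # a single object has exactly one ordering
    for new in range(1, n):
        # An ordering of `new` objects offers new + 1 places for the next one:
        # before all of them, between any two, or after all of them.
        orders = [o[:i] + (new,) + o[i:] for o in orders for i in range(new + 1)]
    return orders

six = orderings_by_insertion(6)
print(len(six))                                        # 720
print(len(six) == len(list(permutations(range(6)))))   # True
print(len(set(six)) == len(six))                       # True: no duplicates
```

Each pass multiplies the count by the number of available positions, which
is exactly the 2 x 3 x 4 x 5 x 6 chain described above.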
So we divide 27,907,200 by 720 to remove the duplications, and we get 38,760.
This answer is *wildly* counterintuitive, but correct. (This is a standard way
of calculating, or so I understand; I learned it from John Allen Paulos' book
*Innumeracy*, pp. 22-23.)
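As a cross-check on the final figure, here is a Python sketch (assuming Python
3.8 or later for `math.comb` and `math.perm`) that divides the ordered count
by the number of duplicate orderings, compares the result with the standard
binomial coefficient, and also counts every set of 6 filled slots directly.

```python
import math
from itertools import combinations

ordered = math.perm(20, 6)        # 20 x 19 x 18 x 17 x 16 x 15
duplicates = math.factorial(6)    # 2 x 3 x 4 x 5 x 6
distinct = ordered // duplicates  # exact: each set of slots is counted 720 times

print(ordered, duplicates, distinct)   # 27907200 720 38760
print(distinct == math.comb(20, 6))    # True

# Brute force: enumerate every choice of 6 slots out of 20 and count them.
print(sum(1 for _ in combinations(range(20), 6)))  # 38760
```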
Such a counterintuitive answer is TYPICAL. In general, intuitions are the
worst guide I can think of in trying to deal with probabilities or any related
area of mathematics--so bad, in fact, that it's actually worth asking whether
genuinely random guessing (determined by coin-tossing, for example) might not
give better results. More seriously, there's just no alternative to
actually doing the math. --Don Ringe

Message 2: Re: 3.872 Probabilistic Reasoning

Date: Mon, 9 Nov 1992 13:51:52 -
From: "Paul Purdom" <pwpmoose.cs.indiana.edu>
Subject: Re: 3.872 Probabilistic Reasoning

I have been following with interest the Scientific American article by
Greenberg and Ruhlen and the discussion that it has caused. I am not a
linguist, but I do make extensive use of statistics in my computer science
research. There have been many criticisms of Greenberg and Ruhlen's
statistical arguments. I would characterize these criticisms as correct in
detail but misleading overall. The critics correctly point out that there are
a lot of languages and a lot of words that are available for comparison.
These factors make the conclusions several powers of 1000 less reliable than
the simplified calculations in Scientific American would suggest. On the other
hand, the authors have noticed these effects in more than one word, and each
word strengthens the statistical argument by a factor of a million or so. It
seems pretty clear to me that Greenberg and his followers
have indeed noticed things in their statistics that need explanation. I will
leave it to the linguists to argue about what the explanation is.
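The argument above turns on how two effects combine: one penalty for the many
languages and words available for comparison, against a multiplier for each
matching word. A minimal Python sketch, in which the penalty of 1000^3 and the
per-word factor of 10^6 are purely illustrative assumptions (the text gives
only "several powers of 1000" and "a million or so"), treats both as
multiplicative factors on the strength of the evidence. That also assumes the
words count as independent evidence; `net_factor` is a hypothetical name.

```python
from fractions import Fraction

def net_factor(words: int, penalty_powers: int,
               per_word: int = 10**6, penalty_base: int = 1000) -> Fraction:
    """Hypothetical model: each matching word multiplies the evidence by
    `per_word`; the many possible comparisons divide it by
    `penalty_base ** penalty_powers`."""
    return Fraction(per_word ** words, penalty_base ** penalty_powers)

for words in (1, 2, 3):
    print(words, net_factor(words, penalty_powers=3))
```

Under these made-up numbers a single word does not survive the penalty (net
factor 1/1000), but two words leave a factor of 1000 and three leave a factor
of a billion. This shows only the shape of the argument, not a measurement.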

Surely with time the statistical arguments will be refined. I would expect
that a lot of work will be needed to develop statistical techniques that
clearly separate instances of word borrowing from cases where two languages
develop from the same parent, but it seems that it should be possible to do
so.

Paul Purdom, Professor of Computer Science, Indiana University.