LINGUIST List 5.1196
Sat 29 Oct 1994
Disc: Corpus analysis of -BODY/-ONE
Editor for this issue: <>
Jane A. Edwards, Corpus analysis of -BODY/-ONE
Message 1: Corpus analysis of -BODY/-ONE
Date: Mon, 24 Oct 94 16:28:21 -0
From: Jane A. Edwards <edwards@cogsci.Berkeley.EDU>
Subject: Corpus analysis of -BODY/-ONE
Ellen Prince writes:
>the forms in -BODY are the normal ones for an oral/informal
>register and the ones in -ONE are normal for a formal/written register. i've
>noticed this only because i find myself changing -BODY to -ONE in the writing
>of students of mine who are fluent in english but not native speakers. also,
>presumably because of the register clash, i find 1 seriously weird and 2
>normal, and 3 normal and 4 somewhat weird:
>1. ???everybody brought his wife.
>2. everybody brought their wife.
>3. everyone brought his wife.
>4. ?everyone brought their wife.
>(i purposely made the predicate appropriate of males only, to avoid the issue
>of ideologically based gender-related preferences.)
>anybody (???anyone) out there have the same intuitions?
Rather than intuition, I append some corpus data bearing on
these issues. The data support her overall intuitions, but also bring
to light a couple of additional factors. I report them in
two sections, corresponding to her two main claims.
(1) Prince's first claim is that -BODY and -ONE are distributed differently
with respect to register, and, more specifically, that: (a) -ONE is favored
in written/formal language, and (b) -BODY is favored in spoken/informal.
a. In the Wall Street Journal Corpus (written), -ONE is used 82% of the time.
This pattern (i.e., -ONE preferred over -BODY) also holds for each of the 4
pairs taken separately (i.e., NO ONE vs. NOBODY, ANYONE vs. ANYBODY, etc.).
b. In word frequency lists for two additional written corpora--the LOB
(British) and the Brown (American)--the same pattern is replicated for each
of the 3 paired terms that I could check in listings. ("no one" is not listed
so I couldn't check "no-one"/"nobody"; I would be very interested to obtain
these two data points from someone who has these corpora.) The overall
dominance of -ONE over -BODY for the 3 pairs was 79% in LOB and 65% in Brown.
a. In the London-Lund Corpus (British, spoken), -BODY is used 75% of the time.
Furthermore, this pattern (i.e., -BODY preferred over -ONE) holds for each
of the 4 pairs taken separately (i.e., NOBODY vs. NO ONE, etc.).
b. The London-Lund Corpus consists of 12 texts devoted to different
kinds of spoken language. So, I checked to see whether the amount of
dominance would vary with formality of text type. And here again I found
support for Prince's general claim: -BODY was used more frequently
than -ONE in every one of the 12 texts in the LLC. Furthermore, the
degree of dominance varied with text type in exactly the direction she
predicted. Thus, the text consisting of the most formal type of talk
in the LLC ("prepared oration") has the lowest percent of -BODY
dominance (52%), while the most informal type ("conversations between
equals") has the highest percent of -BODY dominance (71% to 91% for the
6 texts of this type).
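
A tally of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used to obtain the figures above: the function names and the toy texts are invented, the corpus is assumed to be available as plain text, and "no one" is matched whether written with a space or a hyphen (so a phrase like "no one thing" would be miscounted).

```python
import re
from collections import Counter

# The four pairs: every-, some-, any-, no- with -BODY or -ONE.
FORMS = re.compile(
    r"\b(?:everybody|somebody|anybody|nobody"
    r"|everyone|someone|anyone|no[ -]one)\b",
    re.IGNORECASE,
)


def count_suffixes(text):
    """Count -BODY and -ONE forms in a plain-text string."""
    counts = Counter()
    for match in FORMS.finditer(text):
        suffix = "BODY" if match.group(0).lower().endswith("body") else "ONE"
        counts[suffix] += 1
    return counts


def percent_body(counts):
    """Percentage of -BODY among all -BODY/-ONE forms (0.0 if none found)."""
    total = counts["BODY"] + counts["ONE"]
    return 100.0 * counts["BODY"] / total if total else 0.0


# Hypothetical per-text comparison; these two "texts" are invented examples.
texts = {
    "conversation": "Nobody told me. Did anybody see? Someone must have.",
    "oration": "No one doubts that everyone here wishes somebody would act.",
}
for name, text in texts.items():
    print(name, round(percent_body(count_suffixes(text)), 1))
```

Running the same count separately for each text type gives the per-text percentages that the comparison by formality relies on.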
Discussion for Claim #1: These data provide overwhelming support for
the gist of Prince's claim. At the same time, they show the need for
one small refinement: -BODY and -ONE are not in strictly complementary
distribution across registers but overlap. In fact, a further
look at the data revealed that -BODY and -ONE sometimes even occur
in the same sentence:
- [Wall Street Journal:wsj7]:
"EVERYBODY's confused and NO ONE has an opinion that lasts longer than
30 seconds," said Mr. Zipper.
- [London-Lund:2 5a]:
and EVERYBODY will know it's snide and NO ONE will take it seriously #
There are 11 such sentences in the Wall Street Journal Corpus and 5 in the LLC.
Since register shouldn't change within a sentence, this suggests the
operation of additional factors besides register. A look at further
instances might give some clues as to what these might be, but that
must await some further efforts.
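
For anyone wanting to pull such mixed sentences out of their own data, a minimal Python sketch follows. It assumes the corpus has already been split into sentences (or tone units) held as strings; the function name `mixed_sentences` is invented for illustration, and the pattern for "no one" will also match unrelated sequences such as "no one thing".

```python
import re

# -BODY forms and -ONE forms of the four pairs, matched case-insensitively.
BODY = re.compile(r"\b(?:every|some|any|no)body\b", re.IGNORECASE)
ONE = re.compile(r"\b(?:(?:every|some|any)one|no[ -]one)\b", re.IGNORECASE)


def mixed_sentences(sentences):
    """Return the sentences that contain both a -BODY and a -ONE form."""
    return [s for s in sentences if BODY.search(s) and ONE.search(s)]


# The first two examples are the ones quoted in this posting.
examples = [
    '"EVERYBODY\'s confused and NO ONE has an opinion that lasts longer '
    'than 30 seconds," said Mr. Zipper.',
    "and EVERYBODY will know it's snide and NO ONE will take it seriously #",
    "Vidor is SOMEBODY who gets bees in HIS bonnet #",
]
for sentence in mixed_sentences(examples):
    print(sentence)
```

The list returned for each corpus can then be read through by hand for factors other than register.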
(2) Prince's second claim, also in two parts, is that: (a) sentences #1
and #4 are "weird", and (b) their weirdness is due to "register clash".
The term "clash" implies discord or anomaly, which suggests that
sentences #1 and #4 should not be acceptable to speakers. The table
below gives the frequencies of all relevant forms from the London-Lund
(spoken) corpus, notated to the right with her sentence numbers:
                                          -BODY      -ONE
Singular Pronoun Substitution             14 (#1)    2 (#3)
Plural Pronoun Substitution ("informal")  31 (#2)    9 (#4)
Contrary to expectation, both #1 and #4 *do* occur in the data, multiple
times. And so, if by "weird" she means "anomalous", her claim is
falsified by these data.
If, however, "weird" means simply "atypical", then the conflict disappears,
and the data can be taken to support her claim in two ways:
(a) the combination she judged as most "normal" for spoken/informal (i.e.,
sentence #2), is the one that occurred most frequently in the corpus;
(b) the two which she found weird (i.e., #1 and #4) each occur less than
half as frequently as #2 (14 and 9 instances, against 31).
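
The arithmetic behind (b) can be checked with a short Python sketch. The counts are the London-Lund frequencies from the table above, keyed by Prince's sentence numbers; the helper name `ratio_to` is invented for illustration.

```python
# London-Lund frequencies from the table above, keyed by sentence number:
# #1 -BODY + singular, #2 -BODY + plural, #3 -ONE + singular, #4 -ONE + plural.
counts = {1: 14, 2: 31, 3: 2, 4: 9}


def ratio_to(counts, target, baseline):
    """Frequency of one sentence type relative to another."""
    return counts[target] / counts[baseline]


# Compare each "weird" type (#1 and #4) with the most frequent type (#2).
for number in (1, 4):
    print(f"#{number} relative to #2: {ratio_to(counts, number, 2):.2f}")
```

Both ratios come out below 0.5, which is what "less than half as frequently" amounts to.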
Before putting too much weight on raw frequencies, of course, this
should be checked in additional corpora, but this softer interpretation
of "weird" is closely related to the view that acceptability judgments are
graded rather than all-or-none, a position which others have argued
for on independent grounds.
Now, if #1 and #4 are not anomalous, but simply atypical, then they
aren't examples of "register clash", and that raises the final question,
namely, how best to describe them with respect to the other 2 sentences.
Svartvik and Leech (1975, p. 163) provide a partial answer.
In comparing BODY+Singular with BODY+Plural, they state that both
are acceptable and that BODY+Singular is simply *more formal* than
BODY+Plural. That is, these utterances differ along a continuum, with
no clash. As for the rest of the answer, a look at the instances of
#1 in the corpus indicates that some of them may be directly determined
by factors other than register. For example:
2 14: Vidor is SOMEBODY who gets bees in HIS bonnet #
In this one, because the speaker has a specific person in mind,
it would sound weird *not* to use the Singular pronoun, no matter
how informal the discourse.
In this posting the corpus data give overwhelming support for the gist
of Prince's two claims. They also indicate that there may be other
factors besides register that still await discovery. And so, as Prince
did, I close with an appeal for further data.
--Jane Edwards (edwards@cogsci.berkeley.edu)