LINGUIST List 5.1196

Sat 29 Oct 1994

Disc: Corpus analysis of -BODY/-ONE

Editor for this issue: <>


Directory

  • Jane A. Edwards, Corpus analysis of -BODY/-ONE

    Message 1: Corpus analysis of -BODY/-ONE

    Date: Mon, 24 Oct 94 16:28:21 -0
    From: Jane A. Edwards <edwards@cogsci.Berkeley.EDU>
    Subject: Corpus analysis of -BODY/-ONE


    Ellen Prince writes:

    > the forms in -BODY are the normal ones for an oral/informal
    > register and the ones in -ONE are normal for a formal/written register. i've
    > noticed this only because i find myself changing -BODY to -ONE in the writing
    > of students of mine who are fluent in english but not native speakers. also,
    > presumably because of the register clash, i find 1 seriously weird and 2
    > normal, and 3 normal and 4 somewhat weird:
    >
    > 1. ???everybody brought his wife.
    > 2. everybody brought their wife.
    > 3. everyone brought his wife.
    > 4. ?everyone brought their wife.
    >
    > (i purposely made the predicate appropriate of males only, to avoid the issue
    > of ideologically based gender-related preferences.)
    >
    > anybody (???anyone) out there have the same intuitions?

    Rather than offering intuitions, I append some corpus data bearing on these issues. The data support her overall intuitions, but they also bring to light a couple of additional factors. I report them in two sections, corresponding to her two main claims.

    ---

    (1) Prince's first claim is that -BODY and -ONE are distributed differently with respect to register. More specifically:

      (a) -ONE is favored in written/formal language, and
      (b) -BODY is favored in spoken/informal language.

    WRITTEN:

      a. In the Wall Street Journal Corpus, -ONE is used 82% of the time. This preference for -ONE over -BODY also holds for each of the 4 pairs taken separately (NO ONE vs. NOBODY, ANYONE vs. ANYBODY, etc.).

      b. Word frequency lists for two additional written corpora, the LOB (British) and the Brown (American), replicate the same pattern for each of the 3 pairs I could check in the listings. ("no one" is not listed, so I could not check "no-one"/"nobody"; I would be very interested to obtain these two data points from someone who has these corpora.) Across the 3 pairs, -ONE accounted for 79% of occurrences in LOB and 65% in Brown.

    SPOKEN:

      a. In the London-Lund Corpus (British), -BODY is used 75% of the time, and this preference for -BODY over -ONE holds for each of the 4 pairs taken separately (NOBODY vs. NO ONE, etc.).

      b. The London-Lund Corpus consists of 12 texts devoted to different kinds of spoken language, so I checked whether the degree of -BODY dominance varied with the formality of the text type. Here again I found support for Prince's general claim: -BODY was used more frequently than -ONE in every one of the 12 texts in the LLC, and the degree of dominance varied with text type in exactly the direction she predicted. The most formal type of talk in the LLC ("prepared oration") has the lowest percentage of -BODY (52%), while the most informal type ("conversations between equals") has the highest (71% to 91% across the 6 texts of this type).

    Discussion for Claim #1: These data provide overwhelming support for the gist of Prince's claim. At the same time, they call for one small refinement: -BODY and -ONE are not in strictly complementary distribution; their distributions overlap. In fact, a further look at the data revealed that -BODY and -ONE sometimes occur in the same sentence:

      - [Wall Street Journal:wsj7]: "EVERYBODY's confused and NO ONE has an opinion that lasts longer than 30 seconds," said Mr. Zipper.
      - [London-Lund:2 5a]: and EVERYBODY will know it's snide and NO ONE will take it seriously #

    There are 11 such sentences in the Wall Street Journal Corpus and 5 in the LLC. Since register should not change within a sentence, this points to additional factors besides register. A look at further instances might give clues as to what these factors are, but that must await further work.

    ---

    (2) Prince's second claim, also in two parts, is that:

      (a) sentences #1 and #4 are "weird", and
      (b) their weirdness is due to "register clash".

    The term "clash" implies discord or anomaly, which suggests that sentences #1 and #4 should not be acceptable to speakers. The table below gives the frequencies of all relevant forms in the London-Lund (spoken) corpus, with her sentence numbers in parentheses:

                                                -BODY     -ONE
      Singular pronoun substitution             14 (#1)   2 (#3)
      Plural pronoun substitution ("informal")  31 (#2)   9 (#4)

    Contrary to this expectation, both #1 and #4 *do* occur in the data, several times each. So if by "weird" she means "anomalous", these data falsify her claim.

    If, however, "weird" simply means "atypical", the conflict disappears, and the data can be taken to support her claim in two ways: (a) the combination she judged most "normal" for spoken/informal language (sentence #2) is the one that occurs most frequently in the corpus; (b) the two she found weird (#1 and #4) each occur less than half as often as #2 (14 and 9 times, versus 31). Before too much weight is put on raw frequencies, of course, this should be checked in additional corpora; but this softer interpretation of "weird" is closely related to the view that acceptability judgments are graded rather than all-or-none, a position that others have argued for on independent grounds.
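    The comparison with #2 in point (b) can be made explicit by dividing each London-Lund count for Prince's four sentences by the count for #2. The short Python sketch below does just that; the dictionary `LLC_COUNTS` and the helper `relative_to` are hypothetical names introduced for illustration, not part of any corpus toolkit, and the counts are simply the four figures reported for the London-Lund corpus.

    ```python
    # Hypothetical helper (not from any corpus toolkit): express each
    # London-Lund count as a fraction of the count for sentence #2.

    # Counts as reported for the London-Lund corpus; keys are Prince's sentence numbers.
    LLC_COUNTS: dict[int, int] = {
        1: 14,  # -BODY + singular pronoun
        2: 31,  # -BODY + plural pronoun
        3: 2,   # -ONE + singular pronoun
        4: 9,   # -ONE + plural pronoun
    }


    def relative_to(counts: dict[int, int], reference: int) -> dict[int, float]:
        """Divide every count by the count of the reference sentence."""
        base = counts[reference]
        return {sentence: n / base for sentence, n in counts.items()}


    ratios = relative_to(LLC_COUNTS, reference=2)
    for sentence in sorted(ratios):
        # Prints one line per sentence, e.g. "#1: 14 occurrences = 0.45 x #2"
        print(f"#{sentence}: {LLC_COUNTS[sentence]:2d} occurrences = {ratios[sentence]:.2f} x #2")
    ```

    Run as is, it reports ratios of 0.45 for #1 and 0.29 for #4, so each falls below one half of #2. This is a purely descriptive comparison of raw counts; it carries no test of significance.
    
    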

    Now, if #1 and #4 are not anomalous but simply atypical, then they are not examples of "register clash", and that raises the final question: how best to describe them relative to the other 2 sentences. Svartvik and Leech (1975, p. 163) provide a partial answer. Comparing BODY+Singular with BODY+Plural, they state that both are acceptable and that BODY+Singular is simply *more formal* than BODY+Plural. That is, these utterances differ along a continuum, with no clash. As for the rest of the answer, a look at the instances of #1 in the corpus indicates that some of them may be determined directly by factors other than register. For example:

      - [London-Lund:2 14]: Vidor is SOMEBODY who gets bees in HIS bonnet #

    Because the speaker has a specific person in mind here, it would sound weird *not* to use the singular pronoun, no matter how informal the discourse.

    ---

    In sum, the corpus data reported in this posting give overwhelming support for the gist of Prince's two claims. They also indicate that there may be other factors besides register that still await discovery. And so, as Prince did, I close with an appeal for further data.

    --Jane Edwards (edwards@cogsci.berkeley.edu)