LINGUIST List 13.1932

Wed Jul 17 2002

Disc: New: Language Description, Huddleston & Pullum

Editor for this issue: Karen Milligan


  • Geoffrey K. Pullum, a response concerning The Cambridge Grammar

    Message 1: a response concerning The Cambridge Grammar

    Date: Mon, 15 Jul 2002 23:00:56 -0700 (PDT)
    From: Geoffrey K. Pullum
    Subject: a response concerning The Cambridge Grammar

    Response to Joybrato Mukherjee regarding "The Cambridge Grammar of the English Language" by Rodney Huddleston and Geoffrey K. Pullum

    Joybrato Mukherjee's recent Book Discussion Forum posting (LINGUIST 13.1853, July 4, 2002) draws an unfavorable comparison between "The Cambridge Grammar of the English Language" (Huddleston and Pullum 2002; hereafter CGEL) and two grammars (closely related to each other) that Mukherjee prefers: Quirk et al. (1985; henceforth Quirk) and Biber et al. (1999; henceforth Biber). He criticizes CGEL for not being corpus-based, and for adopting analyses on grounds of dogma rather than evidence. But it is Mukherjee who fails to show respect for textual evidence. He makes misattributions with respect to all three of the grammars he discusses, and fails to check facts before delivering opinions. Essentially all his negative criticisms of CGEL rest on false claims. Here we offer a brief response to half a dozen especially egregious ones.

    1. Binary branching. Mukherjee announces that CGEL analyses "take for granted that syntactic constituent structure should be represented by strictly binary-branching (or, in some cases, singulary-branching) trees." He thinks this is "but one example of the influence that generative concepts have obviously exerted on the Cambridge Grammar." But even a cursory glance at the 40 tree diagrams in CGEL would show that the claim about our alleged binarism is false (a full list of trees is provided on p. xiii, so he could easily have checked). Multiple branching is visible in the coordination example on p. 1279. And the discussion in chapter 4 of phrases with two or more complements, such as "give her this", makes it clear that we assume ternary branching. Binary branching would make "this" the complement of "give her" rather than of "give" -- or else would assign "give" a single complement, a phrase of the form "her this". CGEL adopts many insights from generative grammatical research, but sides with Jackendoff (1990) against much other current generative work in rejecting both kinds of binary analysis. CGEL maintains that in "give her this", the NPs "her" and "this" are both complements of (and sisters of) "give".

    2. The subject-predicate division. Mukherjee thinks CGEL's acceptance of the NP-VP (subject-predicate) analysis of clauses stems from the authors' bigoted binarism, and he also thinks that Quirk rejects the binary analysis. Both claims are mistaken. Quirk does make a binary division in clause structure between subject and predicate. True, a remark is made to the effect that "we shall find little need to refer to the predicate as a separate structural unit", but immediately after that comes a note about a case where it has real significance (see Quirk, p. 79). CGEL's acceptance of the subject--predicate division certainly has nothing to do with acceptance of results from generative grammar; the subject--predicate division is familiar from traditional grammar, as Quirk notes (p. 78).

    3. Multiple analyses. Mukherjee alleges that CGEL does not allow for multiple analyses. He prefers Quirk's treatment of "She looked after her son" as both Subject - Verb - "Adverbial" (She - looked - after her son) and Subject - Verb - Object (She - looked after - her son). This criticism is another double error. CGEL provides detailed syntactic evidence in favour of the former bracketing (though with "after her son" as a complement, not an adjunct (Quirk's ill-advised term is "adverbial") -- a point on which we believe Quirk got things wrong). Mukherjee gives no reason for wanting to allow the second as well. And there is solid evidence against it: it would predict the possibility of postposing a heavy NP object, yielding:

    *She looked after all morning the children from several other families in her street.

    But CGEL does allow multiple analyses where appropriate. In the construction "Bob is as generous as Sue", the complement of "as" may be either an NP or an elliptical clause consisting of nothing but a subject NP. This differs from the prepositional verb case in that there is no compelling syntactic evidence to choose one analysis over the other. Thus CGEL uses evidence to distinguish between situations in which a constituent structure claim is motivated and cases where it is not. Quirk fails to do this. Mukherjee has things backwards.

    4. Corpus use. Mukherjee sees it as a "major weakness of the Cambridge Grammar" that it is not more corpus-based. He asks: "Can a reference grammar of the English language, published in the year 2002, really be based on corpus material containing three million words only? I would say no." He says no, but he does not say why. The fact is that CGEL was not based on three million words. Its range of sources was vast: the authors' lifelong experience of the English language; the similar experience possessed by a dozen other native-speaker collaborating authors; further evidence pointed out by others; facts cited in hundreds of technical articles and books; the large grammars of Poutsma, Jespersen, Quirk, and others; the Oxford English Dictionary; various collections of texts that we happened to have on computer, including the 44 million words of the Wall Street Journal corpus (WSJ); and where necessary the World Wide Web. (The British National Corpus became available to us only when CGEL was almost complete.)

    It is true that for examples we standardly mined the Brown corpus for American English, the London-Oslo-Bergen corpus for British English, and the Australian Corpus of English for Australian English (we had convenient interactive access to these through the courtesy of Macquarie University), and these total three million words. But these corpora were merely sources of illustrative examples, nearly always edited for expository reasons. (It is one of the errors of strictly corpus-oriented grammars to use only raw attested data for purposes of illustration. We think it is counterproductive to quote a sentence with a subject NP containing a long and distracting relative clause when all we are concerned to illustrate is the order of adjuncts in the verb phrase.)

    Mukherjee maintains the peculiar view that WSJ is not a corpus at all. He says: "the Wall Street Journal, in my view, does not qualify as a representative 'corpus' but is an example of a linguistically unstructured 'archive' (which may be used as a source of authentic examples but from which general trends in language cannot be extrapolated)." What improvement results in a descriptive grammar if we rigorously restrict our attention to some "representative" corpus is not made clear. Mukherjee may be confusing the purpose of a descriptive reference grammar with the aim of statistical studies of frequencies of specific words or constructions across genres, dialects, or times (Biber specializes in providing this sort of information). We were not attempting a survey of trends or genre differences; we were writing a grammar of international Standard English.

    5. Extraposition. The one way to show that a descriptive grammar failed by not being corpus-based enough would be to point out something that was missed because of a failure to attend to corpus evidence. Mukherjee's only attempt at making a point like this concerns extraposition. He asserts that Quirk and Biber, taken together, give a more convincing account of English than CGEL does. Mukherjee cites Biber's corpus-restricted study based on a 40-million-word collection of texts (which follows Quirk on most points of syntactic analysis) in support of the claim that extraposition constructions like "It would be pointless to resist" are more basic than non-extraposition clausal-subject counterparts like "To resist would be pointless".

    Now, let there be no disagreement about frequency: CGEL states clearly that the extraposition construction is much more frequent. (No need of a 40-million-word corpus to establish this, incidentally. Huddleston 1971 had no Brown, LOB, ACE, or computer, and worked with a corpus of only about 135,000 words. He found 3 examples of the clausal subject construction to 89 with extraposition, certainly a dramatic enough difference to establish the conclusion.) However, CGEL goes on to explain, in the paragraph right after the page Mukherjee cites (p. 1403), why the non-extraposition construction is analytically more basic: it is syntactically simpler, and has a structure that is normally the only one available for NP subjects.

    Mukherjee cites the CGEL distinction between canonical and non-canonical structures as one of the book's "many examples of refreshingly innovative concepts and/or terminology"; but he apparently does not see its relevance here. Non-extraposition clauses exemplify canonical clause structure and are in that sense more basic. The canonical vs non-canonical distinction permits a simplified grammar presentation: we first confine attention to elementary constructions and then deal with others in terms of how they differ. We hold (contrary to what is implicit in many generative accounts) that it would be a mistake to include extraposed subjects among the elements figuring in the structure of canonical clauses.

    When we turn to Biber we find that, contrary to what Mukherjee suggests, there is no substantive difference with CGEL anyway: Biber's section 3.5 (pp. 141-52) deals with "major clause patterns", while extraposition is introduced in section 3.6, headed "Variations on clause patterns" and beginning, "In addition to the basic clause patterns...". Mukherjee has failed to notice that the grammar he prefers takes the same analytical view as CGEL.

    6. Conclusion. Essentially all of Mukherjee's critical comments about CGEL stem from factual errors about the book or about other books. This is true not only for points of analysis, but for simple points about presentation that anyone could check. For example, he complains that "only very few tables and diagrams are used" in CGEL relative to Quirk. We have not done a full comparative listing of all tables and diagrams (it is unclear where to draw the line between tables and mere columned displays), but we checked two comparable chapters for trees: there are just 4 tree diagrams (two merely skeletal) in Quirk's chapter on coordination (Ch. 13), whereas CGEL's corresponding chapter (Ch. 15) contains 16 fully detailed trees. We believe these chapters are representative. Mukherjee has not done the homework to back up his critique even on simple counting such as this.

    The reader of Mukherjee's review should be cautioned, therefore, that he does not practice his quantitative preaching. He talks the corpus-based talk, but when elaborating his impressionistic comparison of CGEL with Quirk and Biber, he does not walk the walk.


    Biber, Douglas, Stig Johansson, Geoffrey Leech, Susan Conrad and Edward Finegan (1999): Longman Grammar of Spoken and Written English. Harlow: Pearson Education.

    Huddleston, Rodney (1971): The Sentence in Written English: A Syntactic Study Based on an Analysis of Scientific Texts. Cambridge University Press.

    Huddleston, Rodney, and Geoffrey K. Pullum (2002): The Cambridge Grammar of the English Language. Cambridge University Press.

    Jackendoff, Ray S. (1990): On Larson's treatment of the double object construction. Linguistic Inquiry 21:427-456.

    Quirk, Randolph, Sidney Greenbaum, Geoffrey Leech and Jan Svartvik (1985): A Comprehensive Grammar of the English Language. London: Longman.

    Rodney Huddleston, University of Queensland

    Geoffrey K. Pullum, University of California, Santa Cruz