LINGUIST List 13.1952

Sun Jul 21 2002

Disc: Language Description, Huddleston & Pullum

Editor for this issue: Karen Milligan


  1. Disc: A response concerning The Cambridge Grammar of the English Language

Message 1: Disc: A response concerning The Cambridge Grammar of the English Language

Date: Sat, 20 Jul 2002 22:32:39 +0200
From: Joybrato Mukherjee
Subject: Disc: A response concerning The Cambridge Grammar of the English Language

A reply to Rodney Huddleston and Geoffrey K. Pullum 
concerning The Cambridge Grammar of the English Language

Joybrato Mukherjee, University of Bonn

In their response (LINGUIST 13.1932) to my review of The 
Cambridge Grammar of the English Language, Rodney Huddleston 
and Geoffrey K. Pullum claim that my "critical comments 
about CGEL stem from factual errors about the book or other 
books." In this context, they point out six issues in 
particular ("especially egregious ones", as they put it) to 
which I will briefly turn in the following. Given the 
strangely offensive tone, it is with some reluctance that I 
respond to their comments. Yet, as things stand, the 
picture of the reviewer (as someone who lacks even basic 
reading skills) that the authors of the Cambridge Grammar 
have drawn in the Book Discussion Forum needs to be 
corrected. However, the authors should not expect any
further messages from me.

The following abbreviations will be used:
CamGr - The Cambridge Grammar of the English Language 
(Huddleston and Pullum 2002)
CGEL - A Comprehensive Grammar of the English Language 
(Quirk et al. 1985)
LGSWE - Longman Grammar of Spoken and Written English 
(Biber et al. 1999)
REV - Review of CamGr (by J. Mukherjee, LINGUIST 13.1853)
RESP - Authors' response (by R. Huddleston and G. K. Pullum, 
LINGUIST 13.1932)

(NB: It is unfortunate that Huddleston and Pullum seem to 
insist on using CGEL for the Cambridge Grammar, although 
this abbreviation has already been widely used for the 
Comprehensive Grammar.)

1. Binary branching

As Huddleston and Pullum themselves point out, "there will 
be relatively few readers who begin at the beginning and 
work their way through the chapters to the end" (CamGr, p. 
44). It stands to reason, then, that most readers are 
expected to read the first two chapters ("Preliminaries" 
and "Syntactic Overview") and proceed to those chapters and 
sections that are relevant to their individual needs. From 
a complementary perspective, it should be quite clear that 
all other chapters and sections will be read against the 
background of the basic concepts and general principles 
introduced at the beginning.

The two kinds of branching that are introduced and 
visualised in the introductory part are binary branching 
and singulary branching (CamGr, p. 26). In RESP,
the authors introduce multiple branching and ternary 
branching and point to a specific coordination example 
(CamGr, p. 1279) in which multiple branching is visible. 
This had not escaped my notice. As a matter of fact, I did 
check all tree diagrams (by the way, diagram (24) is not on 
p. 1098, as listed on p. xiii of CamGr, but on p. 1089): a 
genuine ternary/multiple branching can only be found for 
coordination and ditransitive verbs. However, the authors 
give the impression in RESP that CamGr shows no 
preference whatsoever for binary/singulary branching. But 
it is obvious to any reader that there is a clear focus 
throughout CamGr on syntactic analyses on the basis of 
binary/singulary branching. And this is a focus that cannot 
be found in CGEL. In criticising REV, the authors 
repeatedly seem to confuse general core principles and 
peripheral special cases. In fact, binary/singulary 
branching is presented at the beginning of CamGr as a 
general principle from which it would be necessary to 
deviate only in order to account for special phenomena. 
That coordination is a syntactic phenomenon of a peculiar 
type in this regard is mentioned by the authors themselves 
(CamGr, p. 66). By the way, the clear focus on - and 
preference for - binary/singulary branching is borne out 
by the fact that the conceptual index of CamGr only 
includes entries for 'binary branching' and 'singulary 
branching', but not for 'multiple branching' or 'ternary 
branching' - terms that the authors introduce in RESP but 
that obviously play a peripheral role only in CamGr.

Similarly, the authors seem to resent my claim that CamGr -
unlike CGEL - is strongly influenced by generative 
concepts, although they themselves speak of "many insights 
from generative grammatical research" (RESP) that have been 
included in CamGr. Since a review, as I see it, is intended 
to provide potential readers with information on what to 
expect from the book at hand, it is necessary to point 
out that CamGr is influenced by generative grammar in 
general and favours analyses along the lines of 
binary/singulary branching in particular. On the whole, 
these are features that are typical of CamGr, although in 
comparatively few cases multiple branching may be adopted 
and particular generative concepts/analyses may not be 
taken over (as mentioned in REV). What is more, these 
features clearly distinguish CamGr from CGEL.

2. The subject-predicate division

Huddleston and Pullum are right in pointing out in RESP 
that the subject-predicate division, which is at the heart 
of CamGr, can also be found in CGEL. However, they ignore 
the fundamentally different extents to which this binary 
division is capitalised on in the two grammars. Firstly, it 
is at the basis of virtually all syntactic analyses in 
CamGr, while in CGEL it is drawn on in order to explain 
fewer than a handful of particular phenomena, e.g. 
clause negation (CGEL, p. 1064ff.). In fact, what is much 
more important than the predicate in CGEL is the concept of 
predication, i.e. the predicate excluding the operator. 
Secondly, the predicate is regarded as the head of the 
clause in CamGr (p. 24), while this is not the case in 
CGEL. Thirdly, the term VP is used for the realisation of 
the predicate in its entirety in CamGr: predicate and VP 
are co-extensive. In CGEL, on the other hand, the verb 
phrase is not co-extensive with the predicate, because VP 
is the realisation of a more rigidly defined clause 
element, i.e. the verb without any other complements. It 
would be another question altogether, by the way, whether 
the subject-predicate distinction and the NP-VP distinction 
(which are conflated by Huddleston and Pullum in RESP) can 
both be traced back to traditional grammar or whether the 
traditional subject-predicate distinction was taken up by 
generativists in terms of NP-VP. The important point here 
is that the subject-predicate division is a cornerstone of 
CamGr, while it is not at all central to CGEL.

3. Multiple analyses

Huddleston and Pullum state that CamGr "does allow multiple 
analyses where appropriate" (RESP). They point to the 
construction "Bob is as generous as Sue" for which two 
analyses of the complement of "as" are offered (CamGr, p. 
1113ff.). (One piece of quisquilia should be mentioned: the 
construction "" is not listed in the lexical index 
of CamGr). It has not escaped my notice that there are also 
other examples of different analytical approaches to 
specific phenomena, e.g. the "dependent-auxiliary analysis" 
and the "catenative-auxiliary analysis" of core auxiliaries 
(CamGr, p. 1210ff.). In criticising REV, however, the 
authors seem to lose sight of the fact that multiple 
analyses play a fundamentally different role in CamGr and 
CGEL. Notwithstanding the few fields in which CamGr makes 
use of multiple analysis, the reader is never given the 
impression that alternative/competing/multiple analyses are 
a significant principle of CamGr. Again, it is telling that 
the conceptual index lists neither "multiple analysis" 
(which is not surprising since it is a term peculiar to 
CGEL) nor "alternative/competing analysis" (the terms that 
Huddleston and Pullum use when discussing the examples 
mentioned above). On the other hand, CGEL concludes its 
introductory second chapter with two sections on gradience 
as "a guiding principle" and multiple analysis as a window 
on grammar as an "indeterminate system" (CGEL, p. 90f.). 
While multiple analysis is central to CGEL, it is adopted 
in CamGr only if the authors see no compelling evidence in 
favour of one particular analysis (which is the exception). 
Thus, I stand by my line of argumentation in REV that the 
extents to which multiple analyses come into play in CamGr 
and CGEL are fundamentally different.

An aside: in this context, Huddleston and Pullum pick up on 
the example of "She - looked - after her son" vs. "She - 
looked after - her son" and state that "Mukherjee gives no 
reason for wanting to allow the second as well" (RESP). 
There are some good reasons, all of which are hinted at in 
CGEL (e.g. p. 1155f.): the prepositional verb which is at 
the basis of the second analysis is a semantic unit (one 
could hypothesise that it is also an acquisitional unit), 
it can usually be replaced by a single-word lexical verb, and 
there is a structural analogy that can be drawn between the 
prepositional verb and the object on the one hand and a 
non-prepositional verb and the object on the other hand. 
(Structural analogy, of course, is another key concept 
which distinguishes CGEL from CamGr, but this is another 
issue.) The point here is not that one particular analysis 
is inherently better (needless to say, there is evidence 
for and against either analysis); rather, it is one 
illustrative example of the fact that CamGr very often 
favours one particular analysis while CGEL does not.

4. Corpus use

Huddleston and Pullum's criticism of my remarks on corpus 
use in CamGr is unacceptable, because they quote two 
sentences in isolation. However, my line of argumentation 
is not captured by those two sentences alone.

To begin with, I explicitly listed all kinds of data that 
CamGr is based on: "(1) their (i.e. the authors') own 
intuitions as native speakers; (2) other native speakers' 
intuitions; (3) computer corpora; (4) other (pre-corpus and 
corpus-based) dictionaries and grammars" (REV). (By the 
way, I gave the correct page number (p. 11) in REV, but 
mistakenly referred to the preface, although the 
information is given in Chapter 1.) With regard to (3), 
three one-million-word corpora are specified: Brown, LOB 
and ACE. If the authors used other corpora directly, they 
should have specified them. In RESP, they list other - 
certainly valuable - text databases that they had access 
to. However, none of them, in my view, is a corpus. As for 
the OED, the WWW and the collection of texts the authors 
had on computer, Huddleston and Pullum do not attempt to 
subsume them under the notion of 'corpus'. As for the 44 
million words of the Wall Street Journal (WSJ), they state 
that "Mukherjee maintains the peculiar view that WSJ is not 
a corpus at all" (RESP). For one, this is not a peculiar 
view of mine. The distinction between representative 
corpora and linguistically unstructured archives (such as 
the WSJ) can also be found, for example, in Leech (1991: 
11) and Kennedy (1998: 57). In a wider setting, it seems to 
me that 'corpus use' has become a buzzword, but it is often 
neglected that there is more to a corpus than the sheer 
amount of data it includes. (Of course, it remains a matter 
of dispute what exactly representativeness in corpus design 
means and, accordingly, what a corpus is. However, the 
authors should not dismiss the reviewer's view as exotic 
and untenable.) What is more, I did point out in general 
terms (contrary to what Huddleston and Pullum claim in 
RESP) why representativeness of the database is useful for 
a descriptive grammar (otherwise, "general trends in 
language cannot be extrapolated", REV). Furthermore, 
Huddleston and Pullum think that "Mukherjee may be 
confusing the purpose of a descriptive reference grammar 
with the aim of statistical studies of frequencies" (RESP). 
I am not. The simple fact is that frequency and grammar are 
inseparable, because, in a sense, there is always a 
frequency-based threshold level: not everything that appears 
in performance data can/should be included in a grammar, 
and the decision on what to include - picking up on Aarts' 
(1991) terminology - usually has to do with 'normalcy' and 
'frequency'. Why, for example, do the authors include 
specific words in the numerous wordlists they give (e.g. in 
the list of mandative verbs, adjectives, and nouns, CamGr, 
p. 999)? On a merely intuitive basis? And/or because these 
words are attested at least once in their database? And/or 
on grounds of frequency of occurrence? If occurrence and/or 
frequency in natural discourse are relevant, two questions 
arise if the grammatical description is to be testable: (1) 
Where do the data come from? (2) Where do the frequencies 
(of, say, relevant corpus-based resources) come from?

CamGr provides no answer to either of the questions. The 
reader does not know which of the examples are invented, 
edited or natural (nor, if they are authentic, where they 
come from). The authors seem to think that this is 
irrelevant anyway. On the other hand, I would contend that 
this kind of information is of great importance from an 
empirical point of view and not at all a lightweight 
matter. Of course, it is fair enough to draw on 
corpus-based insights provided by dictionaries and grammars that 
are already available. (In fact, the authors fail to 
acknowledge the true nature of my criticism: I did not 
accuse them of having ignored corpus data. They simply do 
not go into details about the data resources and where they 
come into play in CamGr.) However, it is certainly 
unfortunate that the reader is never told which of the 
wordlists are taken over from, say, specific corpus-based 
grammars (and the corpora on which they are based). In this 
context, it is for example telling that the reader does not 
even know whether it is the first edition of the Collins 
COBUILD English Dictionary (Sinclair 1987, cf. CamGr, p. 
1765), based on 20 million words, or the second edition 
(Sinclair 1995, cf. CamGr, p. 1772), based on 200 million 
words, that has been used by the authors in the first place.

Generally speaking, then, the authors and the reviewer 
disagree on two crucial points: the notion of corpus and 
the theoretical and methodological implications of the use 
of corpus data. In a sense, the different opinions 
culminate in the authors' description of LGSWE as a 
"corpus-restricted study" (RESP), whereas I prefer to 
regard it as a corpus-based grammar. Which brings me to the 
issue of extraposition/non-extraposition.

5. Extraposition

Huddleston and Pullum think that the reviewer is unable to 
acknowledge the distinction between canonical and non-
canonical structures in CamGr ("he apparently does not see 
its relevance here", RESP). This is not the case. In fact, 
I did not call into question that "non-extraposition is 
analytically more basic: it is syntactically simpler, and 
has a structure that is normally the only one available for 
NP subjects" (RESP). From a syntactic point of view, there 
is unanimous agreement on this description. And, indeed, 
one would not need corpus data for this conclusion.

The point is that whatever counts as basic is a matter of 
linguistic interpretation and, more important, of the 
criteria that are taken into consideration: basicness is 
not out there. In REV, I took the liberty of simply 
pointing to an alternative approach which could have been 
mentioned in CamGr. Frequency, for example, is a criterion 
that the authors do not take into account. Note that I am 
not talking about frequency for its own sake but as a 
quantitative signpost of something that is qualitative in 
nature (about which more later). From this quantitative-
qualitative perspective, it would indeed be "better to 
regard the extraposed form as the more basic form" (REV). 
In REV, there is a reference to the corpus-based findings 
that can be found in LGSWE.

Before coming to the LGSWE findings and their implications, 
it should be noted that Huddleston and Pullum think that 
there is no difference between CamGr and LGSWE in 
considering non-extraposition as the basic form. At first 
sight, this seems to be true. Interestingly enough, they 
only refer to sections 3.5 and 3.6 of LGSWE - sections in 
which frequency and distribution play a peripheral role: 
"This characteristic of the grammar (i.e. quantitative, 
empirical investigations) is less striking in Section B 
(Chapters 2 and 3), since the primary purpose of those 
chapters is to provide a descriptive framework of English 
word classes and grammatical structures" (LGSWE, p. 44). 
Corpus data and, more important, discussions of corpus-
based findings and their implications for grammatical 
description, are at the heart of the subsequent chapters. 
As for extraposition of to-clauses (the example mentioned 
in REV), sections 9.4.6 and 9.4.7 are of particular 
interest (LGSWE, pp. 722 ff.). The starting point here is 
that, firstly, non-extraposed to-clauses are less frequent 
than extraposed to-clauses in general and that, secondly, 
there is a difference between spoken and written genres. 
LGSWE gives several reasons for these findings, e.g. 
reasons of processability, different production constraints 
in spoken and written medium, and marked topicalisation by 
means of non-extraposition. In the light of the 
quantitative findings, LGSWE explicitly speaks of 
extraposition as "the unmarked choice" (LGSWE, p. 725), and 
this conclusion can be explained by factors such as the 
ones mentioned above. In this case, frequency is thus 
symptomatic of important discourse and processing factors. 
And from this perspective, one could easily argue that 
extraposition is 'more basic'. Whether one prefers the 
syntactic approach outlined in CamGr or the frequency-based 
approach sketched out in LGSWE, neither analysis 
invalidates the other. There is, however, no use in 
ignoring the differences between CamGr and LGSWE when 
comparing the two grammars in their entirety.

6. Number of figures, tables, diagrams etc.

Huddleston and Pullum take issue with my complaint 
about the "lack of graphical visualisation" (REV), i.e. the 
number of tables, figures and diagrams. 

However, in order to prove me wrong, they simply count 
trees in chapter 15 of CamGr and in chapter 13 of CGEL. 
The result is not at all surprising. They confine 
themselves to trees because "it is unclear where to draw 
the line between tables and mere columned displays" (RESP). 
This is certainly true. But may I add that this very line 
could be much more easily drawn (say, for counting 
purposes) if the tables, diagrams and figures had been 
numbered separately in CamGr (as is done, with some 
inconsistencies, in CGEL). Be this as it may, the fact is 
that my criticism, put forward in REV, was not about the 
number of tree diagrams in CamGr. Also, the authors give no 
reason why "they believe these chapters are representative" 
(RESP).

In preparing REV, I conducted a more refined count of 
tables/table-like displays and diagrams/figures (incl. tree 
diagrams) in chapter 3 of CamGr ("The verb", 141 pages) and 
in chapters 3/4 of CGEL ("Verbs and auxiliaries"/"The 
semantics of the verb phrase", 154 pages). They are of 
comparable size, and I still think that the description of 
the verb is much more central to grammar than a more 
specialised chapter (say, on coordination) and thus more 
'representative' of the grammars at hand.

In CGEL, there are 29 tables (3.2, 3.5a, 3.5b, 3.5c, 3.12, 
3.13, 3.14, 3.15, 3.16, 3.17, 3.18, 3.19, 3.20, 3.32, 3.33, 
3.36, 3.39, 3.40a, 3.40b, 3.42, 3.52, 3.56a, 3.56b, 3.64, 
4.17, 4.28, 4.30, 4.33, 4.66) and 15 figures (3.21, 3.55, 
3.65, 4.2a, 4.2b, 4.7, 4.14, 4.18, 4.19, 4.20, 4.24a, 
4.24b, 4.24c, 4.27, 4.51).

In CamGr, I took into account the following 14 tables (as 
stated above, the numbering is not reader-friendly): in 
section 3.1 no. 1, 2, 3, fn 1, 35; in section 3.2 no. 17, 
43, 48, 49; in section 3.3 no. 1; in section 3.4 no. 2, 6; 
in section 3.7 no. 2, 3. Additionally, there are 2 figures: 
in section 3.3 no. 6; in section 3.5 no. 4. (There are some 
other borderline cases, but I would contend that as soon as 
one starts to discuss whether it is a table/figure or not, 
it is certainly not as clear-cut a visual aid as all the 
above-mentioned tables and figures in CGEL and CamGr which 
everyone would readily regard as tables and figures).

The quantitative differences between CamGr and CGEL in this 
sample analysis are statistically significant at a one 
percent level. 
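Neither REV nor this reply states which test underlies the significance claim. As a rough cross-check only, one can treat each grammar's number of tables and figures as a Poisson count proportional to the number of pages sampled. Conditioning on the combined total, CGEL's share is then binomial with p0 = 154/(154+141) under the null hypothesis of equal density per page. The Python sketch below assumes this model; the test choice and the helper name conditional_binomial_p are illustrative assumptions, not the method behind the original claim. It computes an exact two-sided p-value from the counts listed above.

```python
from math import comb


def conditional_binomial_p(count_a, pages_a, count_b, pages_b):
    """Exact two-sided p-value for H0: both samples have the same number of
    visual aids per page (hypothetical helper; assumes Poisson counts).

    Conditioning on the total n, count_a ~ Binomial(n, p0) under H0,
    where p0 is sample A's share of the pages.
    """
    n = count_a + count_b
    p0 = pages_a / (pages_a + pages_b)
    pmf = [comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(n + 1)]
    observed = pmf[count_a]
    # Two-sided: sum every outcome no more likely than the observed one
    # (a small tolerance guards against floating-point ties).
    return sum(p for p in pmf if p <= observed * (1 + 1e-9))


# Counts as listed above: CGEL chapters 3/4 (154 pages), CamGr chapter 3 (141 pages)
cgel_items = 29 + 15   # tables + figures
camgr_items = 14 + 2   # tables + figures

p_value = conditional_binomial_p(cgel_items, 154, camgr_items, 141)
print(f"CGEL: {cgel_items} items, CamGr: {camgr_items} items, p = {p_value:.4f}")
```

Under this particular model the p-value comes out below 0.01. A chi-square test, or separate tests for tables and for figures, would be other reasonable choices and could give somewhat different figures.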

Final remarks

It is a pity that the authors' response to REV and my 
response to RESP have entirely focused on those aspects of 
CamGr that the reviewer does not find convincing. It is, 
thus, more than appropriate to summarise the positive 
aspects that were mentioned in REV:

- comprehensiveness: breadth and depth of coverage
- systematisation of previous linguistic research
- a reference grammar with many new/promising concepts
- many examples of innovative terminology
- many wordlists
- in-depth treatment of morphology and word-formation
- well-structured

What is perhaps most important in the light of REV, RESP 
and the present response is the fact that CamGr is an 
unprecedented reference work that is different from all 
other standard grammars of the English language. Therefore, 
it is beyond reasonable doubt that many linguists will 
agree with the authors that it "bridge(s) the large gap 
between traditional grammar and the partial descriptions of 
English grammar proposed by those working in the field of 
linguistics" (CamGr, p. xv). Let me thus emphasise once 
again that there are very good reasons why CamGr "is 
without any doubt a reference work that should be available 
to all grammarians" (REV). For some, it will certainly turn 
out to be the preferred choice, for others it will not. And 
many will use CamGr and other reference grammars side by 
side - in general or for particular purposes.

I received quite a few replies to REV from people who have 
already worked with CamGr. The range of the feedback 
- ranging from "a great contribution" to "false claims" -
makes it clear that in grammar, too, beauty is in the eye 
of the beholder.

References

Aarts, Jan (1991): "Intuition-based and observation-based 
grammars", English Corpus Linguistics: Studies in Honour of 
Jan Svartvik, ed. Karin Aijmer and Bengt Altenberg. 
London: Longman. 44-62.

Biber, Douglas, Stig Johansson, Geoffrey Leech, Susan 
Conrad and Edward Finegan (1999): Longman Grammar of Spoken 
and Written English. Harlow: Pearson Education. (LGSWE)

Huddleston, Rodney and Geoffrey K. Pullum (2002): The 
Cambridge Grammar of the English Language. Cambridge: 
Cambridge University Press. (CamGr)

Kennedy, Graeme (1998): An Introduction to Corpus 
Linguistics. London: Longman.

Leech, Geoffrey (1991): "The state of the art in corpus 
linguistics", English Corpus Linguistics: Studies in Honour 
of Jan Svartvik, ed. Karin Aijmer and Bengt Altenberg. 
London: Longman. 8-29.

Quirk, Randolph, Sidney Greenbaum, Geoffrey Leech and Jan 
Svartvik (1985): A Comprehensive Grammar of the English 
Language. London: Longman. (CGEL)

Sinclair, John (ed.) (1987): Collins COBUILD English 
Language Dictionary. London: Collins.

Sinclair, John (ed.) (1995): Collins COBUILD English 
Dictionary. London: HarperCollins. 