Discussion Details

Title: Re: Remarks by Noam Chomsky in London
Submitter: Mark Brenchley
Description: It seems to us that there is one aspect of Noam Chomsky's talk that
really stands out (and this includes the papers Geoffrey Pullum
mentions; namely, Chomsky 2011 and Berwick et al. [BPYC] 2011):
scholars need to stop and reflect upon what they are doing.

(1) Nowhere is this more true than in the question of whether machine learning is
relevant (and if so, how relevant) for linguistic theory in general and
language acquisition in particular. Whilst it is true that the field of
mathematical linguistics has yielded many interesting results (some of
which were initiated by Chomsky himself), Chomsky has been rather
adamant regarding their limited relevance to the study of language qua
biological system. This is undoubtedly true, in our opinion.

When someone states that language is mildly context-sensitive, surely
they do not mean it in a literal sense (how could they?). Rather, what
scholars really mean by statements like these is that the expressive
power of language, when the latter is described in terms of strings of
symbols that stand for terminals and non-terminals, is mildly
context-sensitive (that is, generable by the right collection of rewriting
rules), which is a slightly different matter. Thus, even if an 'efficient,
correct' algorithm
(to reference Clark, 2011; cited by Pullum in his discussion piece) is
shown to successfully acquire multiple context-free grammars, this is
not ipso facto a demonstration that is directly relatable to the
acquisition of natural language.

As many authors have pointed out before, the expressive power of a
(formal) language and its place within the so-called Chomsky Hierarchy
constitute a fact about what has come to be known as 'weak
generativity' (i.e. string-generation), but what the linguist ought to be
studying is the generation and conceptualization of structure (i.e.,
strong generativity). Consequently, whilst it may be true that Chomsky
misunderstood/misheard Clark's question, Clark misses the point that
we ought to be interested in strong generativity, and not in the weak
equivalence between strings of symbols and the structures they
supposedly stand for.
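
The distinction between weak and strong generativity can be made concrete
with a minimal sketch. The two toy grammars below are purely illustrative
assumptions (they are not taken from Clark 2011 or any other cited work),
and the helper names are hypothetical. Both grammars generate exactly the
same strings, so they are weakly equivalent, yet they assign those strings
different structures, so they are not strongly equivalent.

```python
from itertools import product

# Two toy grammars, assumed for illustration only.
# Each maps a non-terminal to its possible right-hand sides.
RIGHT_BRANCHING = {"S": [("a", "S"), ("a",)]}
LEFT_BRANCHING = {"S": [("S", "a"), ("a",)]}


def trees(grammar, symbol, depth):
    """Yield every tree rooted in `symbol` using at most `depth` nested rule applications."""
    if symbol not in grammar:  # terminal symbol: a leaf
        yield symbol
        return
    if depth == 0:  # no rule applications left
        return
    for rhs in grammar[symbol]:
        child_options = [list(trees(grammar, s, depth - 1)) for s in rhs]
        for children in product(*child_options):
            yield (symbol, *children)


def terminal_yield(tree):
    """Return the string of terminals at the leaves of a tree."""
    if isinstance(tree, str):
        return tree
    return "".join(terminal_yield(child) for child in tree[1:])


def bracket(tree):
    """Return the unlabelled bracketing of a tree, e.g. '[a [a]]'."""
    if isinstance(tree, str):
        return tree
    return "[" + " ".join(bracket(child) for child in tree[1:]) + "]"


def strings(grammar, depth=4):
    """Weak generative capacity (up to `depth`): the set of strings."""
    return {terminal_yield(t) for t in trees(grammar, "S", depth)}


def structures(grammar, depth=4):
    """Strong generative capacity (up to `depth`): the set of bracketings."""
    return {bracket(t) for t in trees(grammar, "S", depth)}


if __name__ == "__main__":
    print(sorted(strings(RIGHT_BRANCHING)) == sorted(strings(LEFT_BRANCHING)))
    print(sorted(structures(RIGHT_BRANCHING)))
    print(sorted(structures(LEFT_BRANCHING)))
```

A learner evaluated only on the output of `strings` could not tell the two
grammars apart; only a comparison along the lines of `structures`
distinguishes them, which is the sense in which string-level results
underdetermine claims about structure.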

We are certain that both Pullum and Clark are aware of this, but some
of their publications appear to show the suspension (temporary, we
hope) of belief in these facts. In Rogers & Pullum (2011), we find a very
careful analysis of the different grammars and languages of the
Chomsky Hierarchy, but there is much at fault when these authors seek
to identify the 'psychological correlates' that would show, in an
experimental setting, what system subjects are employing/have
internalized. The supposed connection between these cognitive
abilities (e.g. the ability to recognize that every A is immediately
followed by a B versus the ability to detect that at least one B was
present somewhere) and the expressive power of an underlying
grammar tells us very little indeed about mental properties and
principles. Plausibly, the psychological correlates they list are the result
of hierarchical mechanisms that operate over hierarchical (mental)
representations, and the cognitive science literature contains myriad
examples of theories that explicitly make use of these two components.
Miller et al.'s (1960) TOTE units, or those studies that focus on Control
operations (such as Simon 1962, Newell 1980 or Pylyshyn 1984) are
some of the clearest examples we can think of. Crucially, these
complex systems bear no relation whatsoever to formal grammars or
languages. Much like in natural language, the key notion here is
structure (incidentally, Miller & Chomsky 1963 already pointed to the
analogy between TOTE units and the syntactic trees linguists
postulated for sentences, something they did not consider

In a way, computational linguists are hostage to the fact that strong
generativity has so far resisted formalization and that, therefore, their
results do not appear to be directly relatable to the careful descriptions
and explanations linguists propose; a fortiori, their formulae do not tell
us much about the psychological facts of human cognition. In our
opinion, then, Chomsky's analysis does not show an 'extremely shallow
acquaintance' with computational models, but a principled opposition to
them because of what these models assume and attempt to show.

(2) We also take issue with Pullum’s comment that the aforementioned
papers ‘share a steadfast refusal to engage with anything that might
make the debate about the poverty of the stimulus (POS) an empirical
one.’ We think this is both false and not a little unfair.

It is true, of course, that Chomsky seems to have little interest in what
we might call empirical “number crunching” with respect to POS (e.g.
quantifying the actual syntactic patterns in the child’s environmental
input and relating these quantifications to the actual frequencies of
equivalent patterns within the child’s developing output). However, the
fact that he himself has not undertaken such research is entirely
orthogonal to the claim that he has not provided empirical grounds for
debating the POS. On the contrary, the last fifty-plus years have seen
Chomsky build up a substantial body of actual natural language
analysis. And it is this analysis which we would argue constitutes a
clear empirical contribution to POS arguments.

In particular, it seems to us that what Chomsky’s work does (or, at
least, looks to do) is provide an explication which is grounded in the
study of natural language syntax, thereby attempting to establish the
nature of human syntactic knowledge. As such, it necessarily
establishes a framework within which all learning models must operate,
defining the particular target structures that these models are to
converge on. So, for example, whatever learning model/algorithm is
eventually worked out - a task we believe to be both important and
non-trivial - it must account for the fact that languages are hierarchical
in structure; for it indeed seems to be an empirical fact that human
languages have such structure (unlike, say, the linear strings of formal
language theory; see BPYC for evidence to this effect). If a proposed
general learning model does not produce such structures, it
necessarily fails to provide a viable account of language acquisition,
and does so precisely because it fails to match the empirically
established account of natural language structure.

And, indeed, if you listen to the talk, this seems to be precisely the
grounds on which Chomsky criticizes the computational cognitive
science research literature raised in the Q+A session. So, when he
criticizes the Perfors article in the talk, he does so because the
researchers’ specific approach simply fails to capture the syntactic
knowledge that (some) linguistic theory has not only argued for, but
argued for through detailed empirical analyses of natural language.
Hence, perforce, their work fails outright to constitute an adequate
rebuttal to POS (UCL video, 65:00; see also the relevant section in

A similar point applies to his comments regarding Clark’s question (or,
rather, what he takes to be Clark’s question; not at all, as Pullum points
out, the same thing). That is, Chomsky seems to argue against it (past
it?) because the approach does not provide a realistic model of human
syntactic knowledge. And the approach is not realistic because it
doesn’t stand up to (what he believes to be) the independent, viable
and empirically established account of what this knowledge consists of
(UCL video, 69:00; see Chomsky 2011 and BPYC for a brief
recapitulation of certain pertinent features of this account). Hence, it
couldn’t possibly constitute a genuine POS counterargument.

The basic schema of the argument would, therefore, seem to be
something like this: (1) As linguists, we are interested in the nature of
human linguistic knowledge. (2) Our analyses of actual natural
language syntax lead us to believe certain facts to be true of this
knowledge (e.g. structure dependent movement), which we account for
in a certain way (e.g. Merge). (3) The computational cognitive science
literature has so far failed to provide domain-general learning models
that adequately capture these facts about human language. (4)
Therefore, they do not constitute POS counterarguments.

Now, whilst this may of course turn out to be a bad argument, perhaps
even a terrible one, it is prima facie one that looks to ground itself in
empirically-derived content; content that Chomsky has surely been
instrumental in contributing to.

Mark Brenchley
David J. Lobina

Berwick, R. C., Pietroski, P., Yankama, B. & Chomsky, N. 2011.
Cognitive Science, 35, 1207-1242.

Chomsky, N. 2011. Language and other cognitive systems. What is
special about language? Language Learning and Development, 7,

Miller, G. A. & Chomsky, N. 1963. Finitary models of language users.
Handbook of Mathematical Psychology, vol. 2, John Wiley and Sons,
Inc., 419-492.

Miller, G. A., Galanter, E. & Pribram, K. H. 1960. Plans and the
Structure of Behaviour. Holt, Rinehart and Winston, Inc.

Newell, A. 1980. Physical symbol systems. Cognitive Science, 4, 135-

Pylyshyn, Z. 1984. Computation and Cognition. The MIT Press.

Rogers, J. & Pullum, G. K. 2011. Aural Pattern Recognition
Experiments and the Subregular Hierarchy. Journal of Logic, Language
and Information, 20, 329-42.

Simon, H. 1962. The architecture of complexity. Proceedings of the
American Philosophical Society, 106, 467-82.
Date Posted: 21-Nov-2011
Linguistic Field(s): Computational Linguistics; Cognitive Science; Language Acquisition; Discipline of Linguistics
LL Issue: 22.4650
Posted: 21-Nov-2011
