Review of Linguistic Nativism and the Poverty of the Stimulus
AUTHORS: Clark, Alexander and Lappin, Shalom
TITLE: Linguistic Nativism and the Poverty of the Stimulus
Nick Moore, Khalifa University, United Arab Emirates
“Linguistic Nativism and the Poverty of the Stimulus” tackles a key issue in
linguistics over 260 pages. The book is intended for a general linguistics
audience, but the reader needs some familiarity with basic concepts in formal
linguistics, at least an elementary understanding of computational linguistics,
and enough statistical, programming or mathematical knowledge not to shy away
from algorithms. The book will most benefit linguists working in formal grammar,
computational linguistics and linguistic theory.
The main aim is to replace the view that humans’ innate bias towards learning
language is specific to language with the view that an innate bias towards
language acquisition depends on abilities used in other domains of learning.
The first view is characterised as the argument for a strong bias,
or linguistic nativism, while the second view is characterised as a weak bias or
domain-general view. The principal line of argument is that computational,
statistical and machine-learning methods demonstrate superior success in
modeling, describing and explaining language acquisition, especially when
compared to studies from formal linguistic models based on strong bias arguments.
Chapter 1 establishes the boundaries of the discussion for the book. The
authors’ main focus is on a viable computationally-explicit model of language
acquisition. Clark and Lappin (C&L) do not dismiss nativism entirely. They point
out that it is fairly uncontroversial, first, that humans alone acquire
language, and, second, that the environment plays a significant role in
determining the language and the level of acquisition. What they intend to
establish in the book, however, is that any innate cognitive faculties employed
in the acquisition of language are not specific to language, as suggested by
Chomsky and the Universal Grammar (UG) framework, but are general cognitive
abilities that are employed in other learning tasks. It is from this ‘weak bias’
angle that they critique the Argument from the Poverty of Stimulus (APS).
The APS is considered central to the nativist position because it provides an
explanation for Chomsky’s core assumptions for UG: (1) grammar is rich and
abstract; (2) data-driven learning is inadequate; (3) children are exposed to
qualitatively and quantitatively degenerate linguistic data; and (4) the
acquired grammar for a given language is uniform despite variations in
intelligence. These assumptions are dealt with in chapter 2 and throughout the
book and replaced with assumptions from a machine-learning, ‘weak bias’
perspective, although the reader is referred to other sources to counter
assumption (3). C&L point out that arguments against connectionist or
frequency-based theories do not prove the UG assumption of linguistic nativism
correct -- strong bias remains unproven. C&L claim that many strong bias
arguments become self-serving. C&L preview machine-learning alternatives to
linguistic nativism based on distributional, or relative frequency, criteria and
a Bayesian statistical model to account for these same learning phenomena.
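The Bayesian idea previewed here can be made concrete with a schematic sketch (my illustration, not the model presented in the book): a learner weighs candidate grammars by their prior probability times the likelihood each assigns to the observed data, and converges on the grammar with the highest posterior.

```python
# Toy Bayesian grammar selection (illustrative only; the grammars,
# priors and likelihoods below are invented for the example).
priors = {"G1": 0.5, "G2": 0.5}
likelihood = {
    "G1": {"the dog barks": 0.10, "dog the barks": 0.0001},
    "G2": {"the dog barks": 0.02, "dog the barks": 0.02},
}

# Observed primary language data: the same sentence heard twice.
data = ["the dog barks", "the dog barks"]

def posterior(grammar):
    """Unnormalised posterior: prior times likelihood of all observations."""
    p = priors[grammar]
    for sentence in data:
        p *= likelihood[grammar][sentence]
    return p

scores = {g: posterior(g) for g in priors}
best = max(scores, key=scores.get)
print(best)  # G1: it assigns higher likelihood to the observed data
```

With equal priors, the data alone decide: G1's posterior (0.5 × 0.1 × 0.1 = 0.005) beats G2's (0.5 × 0.02 × 0.02 = 0.0002), so the learner prefers G1 without any grammar being ruled out categorically.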
Chapter 3 examines the extent to which the stimulus really is impoverished.
While C&L encourage the reader to examine empirical evidence provided by corpora
(e.g. MacWhinney, 2004), they focus on the role of negative evidence in the
learning process because a small amount of negative evidence can make a
significant difference to language acquisition. Indirect negative evidence, such
as the non-existence of hypothesised structures, can also significantly alter
the learning task. C&L challenge the premise of no negative evidence partly
because it is a central tenet of Gold’s (1967) “Identification in the Limit”
(IIL) paradigm. Gold’s highly influential study argues that, because learning
cannot succeed within the cognitive and developmental limits imposed, children
must have prior knowledge of the language system. However, C&L point out that
this view is partly a consequence of ignoring non-syntactic information in the
learning environment; ignoring semantic, pragmatic and prosodic information
produces a circular argument about the centrality of syntax.
Clark and Lappin are keen to point out that demonstrating the tractability of
the learning problem through viable formal computational descriptions does not
equate to modelling the actual neurological or psychological processes that may
be employed in language acquisition. In Chapter 4, C&L reject a number of key
assumptions in Gold’s IIL model. They reject the unnatural condition of
malevolent data presentation -- intentionally withholding crucial evidence
samples and offering evidence in a sequence detrimental to learning. They
reject Gold’s failure to place any time limit on learning. They reject the
impossibility of learners querying the grammaticality of a string. They reject
the view that learning is through positive evidence only. Most significantly,
C&L reject limiting the hypothesis space available to the learner. Rather, they
insist that while the class of learnable languages is limited, the learner is
free to form unlimited hypotheses.
C&L introduce their machine-learning approach in chapter 5, “Probabilistic
Learning Theory”. The initial step in the weak bias argument is to replace a
convergent grammar defined categorically (a string is DEFINITELY correct) with a
probabilistic definition (a string is PROBABLY correct). C&L again object to the
simplistic lines of argument employed by Chomsky and his followers in their
rejection of statistically-based models of learning. While it may be true that
the primitive statistical models critiqued by Chomsky are incapable of producing
a satisfactory distinction between grammatical and ungrammatical strings, this
does not prove that all statistical methods are inferior to UG descriptions. C&L
introduce a range of statistical methods that they propose can better represent
the nature of language acquisition. Central to a plausible probabilistic
approach to modelling language is the distinction between simple measures of
frequency and distributional measures. C&L propose that learners hypothesise the
likelihood of a sequence, based on observations, in order to converge on the
most likely grammar.
Replacing Gold’s paradigm with three key assumptions ((1) language data is
presented unlabelled; (2) the data includes ungrammatical sentences; and (3)
efficient learning takes place despite negative examples) allows C&L to
introduce the Disjoint Distribution Assumption in chapter 6. This probabilistic
algorithm depends on the observed distribution of segmented strings and on the
principle of converging on a probabilistic grammar. Distributional measures
ensure that the probability of any utterance will never be zero, allowing for
errors in the presented language data. This model predicts over-generalisation
and under-generalisation in the learner’s output because of the unlimited
hypothesis space. The addition of word class distributional data also ensures
greater reliability of judging the probability of a string being grammatical.
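The point that no utterance's probability ever falls to zero can be illustrated with a standard smoothing trick (my example, not a method taken from the book): by adding a small constant to every count, unseen or erroneous items become improbable rather than impossible.

```python
from collections import Counter

# Toy observed word counts standing in for the learner's evidence
# (illustrative only; "wug" is deliberately unattested).
counts = Counter({"dog": 5, "cat": 3, "barks": 2})
vocab_size = 10_000  # assumed vocabulary size, including unseen words

def smoothed_probability(word, k=1.0):
    """Add-k (Laplace) smoothing: unseen items get a small nonzero share."""
    total = sum(counts.values())
    return (counts[word] + k) / (total + k * vocab_size)

# A never-observed word is improbable, not impossible -- so noisy or
# ungrammatical input cannot drive any utterance's probability to zero.
print(smoothed_probability("wug") > 0)   # True
print(smoothed_probability("dog") > smoothed_probability("wug"))  # True
```

This tolerance for errors in the presented data is exactly what a categorical grammar, which must label every string as definitely in or out, cannot provide.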
A major aim of this book is to provide a formal account of the language learning
puzzle that will make the acquisition problem tractable. C&L contend that UG
theories have made the wrong assumptions in relation to the learning task and
the learning conditions, and in chapter 7, they set out the assumptions that
allow efficient learning with a weak bias. They advocate algorithms that can
simulate learning under natural conditions. They assume that the input data is
not homogeneous -- some language items are ‘more learnable’ than others, and
more complex learning tasks can be deferred according to a Fixed-Parameter
Tractability algorithm. Ultimately, C&L argue that complex grammatical problems
are no better solved by a UG Principles and Parameters approach. When UG
theories use the strong bias position as the only argument to deal with
complexity, they have not solved the problem posed by a seemingly intractable
learning task.
If we are to reject the presumption of the strong bias in linguistic nativism,
we need to be confident that its replacement can produce reliable results. The
proposed algorithms in chapter 8 start to provide those results. The process of
hypothesis generation in Gold’s IIL is described as being close to random, and
consequently “hopelessly inefficient” (p.153). Various replacements that have
been tested include (Non-)Deterministic Finite State Automata algorithms which
have proved effective in restricted language learning contexts. Simple
distributional and statistical learning algorithms offer promising results, but
must be adapted to also simulate grammar deduction.
Objections to distributional models are countered in chapter 9 by presenting the
results of real algorithms working on real data. Typically the algorithm
performs a parsing, tagging, segmenting or similar task on a corpus, and the
results are measured against a previously-annotated version of the corpus -- a
‘gold standard.’ Corpora in these experiments tend to be samples of English
intended for adults, such as the Penn Treebank (Marcus, Marcinkiewicz and
Santorini, 1993). Learning algorithms can be divided into “supervised” --
requiring the corpus to carry some form of information such as part of speech
tagging -- and “unsupervised” -- working on a ‘bare’ corpus of language.
Supervised learning algorithms, such as the Maximum Likelihood Estimate, match
the ‘gold standard’ in about 88-91% of cases. More surprising, perhaps, are the
high success rates of unsupervised learning algorithms in word segmentation, in
learning word classes and morphology, and in parsing.
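The supervised case can be illustrated with a minimal sketch (my example, far simpler than the taggers the book reports): a Maximum Likelihood Estimate in its crudest form assigns each word the tag it most often carried in the annotated training corpus, and accuracy is then measured against a held-out gold standard.

```python
from collections import Counter, defaultdict

# A tiny hand-tagged corpus standing in for supervised training data
# (illustrative only; real experiments use e.g. the Penn Treebank).
train = [("the", "DT"), ("dog", "NN"), ("barks", "VBZ"),
         ("the", "DT"), ("bark", "NN"), ("bark", "NN"),
         ("dogs", "NNS"), ("bark", "VBP")]

# MLE in its simplest form: for each word, choose its most frequent tag.
tag_counts = defaultdict(Counter)
for word, tag in train:
    tag_counts[word][tag] += 1
model = {w: c.most_common(1)[0][0] for w, c in tag_counts.items()}

# Evaluate against a 'gold standard' annotation: "bark" here is a verb,
# but the model has mostly seen it as a noun, so it errs.
gold = [("the", "DT"), ("dog", "NN"), ("bark", "VBP")]
correct = sum(model.get(w) == t for w, t in gold)
accuracy = correct / len(gold)
print(f"accuracy: {accuracy:.0%}")  # accuracy: 67%
```

Even this crude baseline scores well on frequent unambiguous words; the 88-91% figures the review cites come from far richer models that also condition on context.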
Chapter 10 examines ‘Principles and Parameters’ and other UG models. C&L claim
that the strong bias in this UG model would require an innate phonological
segmenter and part of speech tagger, and that by limiting the hypothesis and
parameter space available to the learner, the language learning task becomes
intractable, particularly as the highly abstract UG parameters appear
to have very little direct relationship to the primary language data. C&L lament
the paucity of theoretical and experimental evidence for the Principles and
Parameters (P&P) framework. Even more worrying for UG theories is the
near-indifference to questions of acquisition in Minimalist Program research,
the latest version of UG. In place of UG models, C&L offer “Probabilistic
Context-Free Grammars” and “Hierarchical Bayesian Models” to account for
language acquisition through comprehensive descriptions and high levels of
success in simulations. In chapter 11, “A Brief Look at Some Biological and
Psychological Evidence”, C&L quickly review accounts of language learning that
support a weak bias. Even where evidence from genetic, biological or
psychological studies has been used to support a strong bias, C&L are able to
show that this evidence does not necessarily favour nativist arguments.
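The “Probabilistic Context-Free Grammar” idea mentioned above can be sketched briefly (my illustration, not a model from the book): each rewrite rule carries a probability, the probabilities for each nonterminal sum to one, and a derivation's probability is the product of the probabilities of the rules it uses.

```python
# Toy PCFG: for each left-hand side, rule probabilities sum to 1
# (grammar and probabilities invented for the example).
pcfg = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("the", "dog")): 0.6,
    ("NP", ("the", "cat")): 0.4,
    ("VP", ("barks",)): 0.7,
    ("VP", ("sleeps",)): 0.3,
}

def derivation_probability(rules):
    """Probability of a derivation = product of its rule probabilities."""
    prob = 1.0
    for rule in rules:
        prob *= pcfg[rule]
    return prob

# "the dog barks" via S -> NP VP, NP -> the dog, VP -> barks
p = derivation_probability([
    ("S", ("NP", "VP")),
    ("NP", ("the", "dog")),
    ("VP", ("barks",)),
])
print(p)  # product 1.0 * 0.6 * 0.7
```

Because every derivation receives a graded probability rather than a binary verdict, such grammars can be compared against corpus data and estimated from it, which is what makes them candidates for the acquisition simulations C&L describe.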
In the concluding chapter, C&L review their evidence against the argument from
the poverty of the stimulus. They argue for a weak innate bias towards learning
language based on general-domain learning abilities. They point out that the UG
framework has produced few concrete experiments or cases that “produce an
explicit, computationally viable, and psychologically credible account of
language acquisition” (p.214). They have attempted to introduce explicit, formal
computational models of learning that have produced a credible account of
learning. Although they are far from perfect and much work needs to be done,
computational models have already provided a more adequate account than the UG
models: “We hope that if this book establishes only one claim it is to finally
expose, as without merit, the claims that Gold’s negative results motivate
linguistic nativism. IIL is not an appropriate model” (p.215). Instead C&L
advocate the use of domain-general statistical models of language acquisition.
In 12 chapters, Clark and Lappin use “Linguistic Nativism and the Poverty of
the Stimulus” to evaluate a key concept in modern linguistics, taking a clearly
computational perspective and examining a wide variety of topics. Limitations of
space have led me to simplify or ignore a number of arguments presented in this
book, and to skim over presentations of learning algorithms. I would suggest,
however, that C&L present a very cogent and compelling argument.
Linguistic nativism, and the argument from the poverty of the stimulus in
particular, can be attacked from many sides. Opponents have argued that
most UG theories are unfalsifiable (e.g. Pullum and Scholz, 2002), that corpora
designed to reflect children’s exposure to language demonstrate that the
stimulus is not impoverished (e.g. MacWhinney, 2004), and that it is absurd to
posit the notion that the brain adapted to language, as if language exists in
the environment prior to man, rather than language adapting to the general
abilities of the brain (Deacon, 1998; Christiansen and Chater, 2008). These
arguments, alongside alternatives to linguistic nativism from functional
linguistics (e.g. Halliday, 1975; 1993; Halliday and Matthiessen, 1999), are
often dismissed as irrelevant to the theory of UG. What sets Clark and Lappin’s
book apart, and why it must be taken seriously by everyone who proposes some
form of linguistic nativism to explain language acquisition and typology, is
that it attacks from within. It claims the very ground claimed by theories of
UG. UG attempts to formally and explicitly account for the apparent mismatch
between the complexity of the language learning task and the near-universal
success of humans in achieving it with such apparently meagre resources. Clark
and Lappin identify methods that could be applied to make the complex task
tractable. Specifically, these methods are not restricted
to language, but are generally useful learning methods -- they are
domain-general. If there is one criticism I would make of Clark and Lappin’s
argument it is that they do not demonstrate clearly enough how likely it is that
we all use the domain-general learning methods that they propose. For instance,
we are left to presume that Probabilistic Approximately Correct and
Probabilistic Context-Free Grammar algorithms represent general, non-language
specific, models of learning.
I fear that we may be fooled by the apparent sophistication of the tools at our
disposal. We need to remember that computational tools may force us to see a
phenomenon as the tool represents it, even when that phenomenon is more complex
than the tool can capture. It seems more than coincidental that a computational
approach to
language acquisition mirrors findings of corpus linguistics; for instance, that
language can be viewed as inherently statistically structured. That it can be
analysed in this way, or in the form of tree diagrams, does not prove that this
is how humans learn language. Fortunately, Clark and Lappin are well aware of
this trap and frequently warn readers that their computational theories aim to
demonstrate what is possible, not what really happens in the human mind. Until
we better understand exactly what neurological processes are actually involved
in language acquisition, our task is to try to represent acquisition as best we
can. In this endeavour, we have been expertly assisted by Clark and Lappin’s book.
“Linguistic Nativism and the Poverty of the Stimulus” is a challenging book. It
challenges the reader to deal with a range of linguistic, philosophical,
mathematical and computational issues, and to remember a dizzying array of
acronyms and abbreviations (including APS, CFG, DDA, EFS, GB, HBM, IIL, MP,
PCFG, PLD and UG). Most of all, it challenges basic concepts in mainstream
linguistics. It rejects key tenets of UG in the light of advances in machine
learning theory, and research in the computational modelling of the language
acquisition process. It exposes so-called proofs supporting the poverty of
stimulus, and reveals alternatives that are formally more comprehensive than the
explanations previously provided by UG theories, and empirically more likely to
match natural language acquisition processes.
REFERENCES
Christiansen, Morten H. and Chater, Nick. 2008. Language as Shaped by the Brain.
Behavioral and Brain Sciences 31, pp.489-558.
Deacon, Terrence W. 1998. The Symbolic Species: The Co-Evolution of Language and
the Brain. New York: W.W. Norton.
Gold, E.M. 1967. Language identification in the limit. Information and Control
10/5.
Halliday, Michael A.K. 1975. Learning How to Mean: Explorations in the
Development of Language. London: Edward Arnold.
Halliday, Michael A.K. 1993. Towards a language-based theory of learning.
Linguistics and Education 5, pp.93-116.
Halliday, Michael A.K. & Matthiessen, Christian M.I.M. 1999. Construing
Experience Through Meaning. London: Continuum.
MacWhinney, Brian. 2004. A Multiple Solution to the Logical Problem of Language
Acquisition. Journal of Child Language 31, pp. 883-914.
Marcus, Mitchell P., Marcinkiewicz, Mary Ann and Santorini, Beatrice. 1993.
Building a Large Annotated Corpus of English: The Penn Treebank. Computational
Linguistics 19.
Pullum, Geoffrey K. and Scholz, Barbara C. 2002. Empirical Assessment of
Stimulus Poverty Arguments. The Linguistic Review 19, pp.9-50.
ABOUT THE REVIEWER
Nick Moore has worked in Brazil, Oman, Turkey, UAE and UK with students and
teachers of English as a foreign language, English for specific and
academic purposes, and linguistics. His PhD in applied linguistics from the
University of Liverpool addressed information structure in written English.
Other research interests include systemic functional linguistics, corpus
linguistics, theories of embodiment, lexis and skills in language teaching,
and reading programmes. Dr. Moore is the co-editor of 'READ' and he
currently coordinates and teaches undergraduate courses in English
composition and technical writing, as well as an introductory linguistics
course, at Khalifa University.