AUTHORS: Clark, Alexander and Lappin, Shalom
TITLE: Linguistic Nativism and the Poverty of the Stimulus
PUBLISHER: Wiley-Blackwell
YEAR: 2011
Nick Moore, Khalifa University, United Arab Emirates
INTRODUCTION
“Linguistic Nativism and the Poverty of the Stimulus” tackles a key issue in linguistics over 260 pages. The book is intended for a general linguistics audience, but the reader needs some familiarity with basic concepts in formal linguistics, at least an elementary understanding of computational linguistics, and enough statistical, programming or mathematical knowledge not to shy away from algorithms. The book will most benefit linguists working in formal grammar, computational linguistics and linguistic theory.
The main aim is to replace the view that humans have an innate, language-specific bias towards learning language with the view that any innate bias towards language acquisition depends on abilities that are also used in other domains of learning. The first view is characterised as the argument for a strong bias, or linguistic nativism, while the second is characterised as a weak bias, or domain-general, view. The principal line of argument is that computational, statistical and machine-learning methods are more successful at modelling, describing and explaining language acquisition than formal linguistic models based on strong bias arguments.
SUMMARY
Chapter 1 establishes the boundaries of the discussion. The authors’ main focus is on a viable, computationally explicit model of language acquisition. Clark and Lappin (C&L) do not dismiss nativism entirely. They point out that it is fairly uncontroversial, first, that humans alone acquire language, and, second, that the environment plays a significant role in determining which language is acquired and the level of acquisition. What they intend to establish, however, is that any innate cognitive faculties employed in the acquisition of language are not specific to language, as suggested by Chomsky and the Universal Grammar (UG) framework, but are general cognitive abilities that are also employed in other learning tasks. It is from this ‘weak bias’ angle that they critique the Argument from the Poverty of the Stimulus (APS).
The APS is considered central to the nativist position because it provides an explanation for Chomsky’s core assumptions for UG: (1) grammar is rich and abstract; (2) data-driven learning is inadequate; (3) children are exposed to qualitatively and quantitatively degenerate linguistic data; and (4) the acquired grammar for a given language is uniform despite variations in intelligence. These assumptions are examined in chapter 2 and throughout the book, and are replaced with assumptions from a machine-learning, ‘weak bias’ perspective, although the reader is referred to other sources for arguments against assumption (3). C&L point out that arguments against connectionist or frequency-based theories do not prove the UG assumption of linguistic nativism correct -- strong bias remains unproven. They also claim that many strong bias arguments become self-serving. Finally, C&L preview machine-learning alternatives to linguistic nativism, based on distributional, or relative frequency, criteria and on a Bayesian statistical model, that account for these same learning phenomena.
Chapter 3 examines the extent to which the stimulus really is impoverished. While C&L encourage the reader to examine empirical evidence provided by corpora (e.g. MacWhinney, 2004), they focus on the role of negative evidence in the learning process, because even a small amount of negative evidence can make a significant difference to language acquisition. Indirect negative evidence, such as the absence of hypothesised structures from the input, can also significantly alter the learning task. C&L challenge the premise that there is no negative evidence partly because it is a central tenet of Gold’s (1967) “Identification in the Limit” (IIL) model. Gold’s highly influential study argues that, because learning cannot succeed within the cognitive and developmental limits imposed, children must have prior knowledge of the language system. However, C&L point out that this view partly results from ignoring non-syntactic information in the learning environment; setting aside semantic, pragmatic and prosodic information produces a circular argument for the centrality of syntax.
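To make the idea of indirect negative evidence concrete, here is a minimal sketch under two simplifying assumptions that are not taken from the book: sentences are independent draws, and an over-general hypothesis predicts some structure with a fixed per-sentence probability p. The chance that the structure never turns up in n sentences is then (1 - p)^n, which shrinks quickly as n grows, so persistent absence counts as evidence against that hypothesis. The function name is illustrative only.

```python
def prob_never_observed(p: float, n: int) -> float:
    """Probability that a structure predicted with per-sentence probability p
    fails to appear in n independent sentences (an illustrative assumption)."""
    if not 0.0 <= p <= 1.0 or n < 0:
        raise ValueError("p must be in [0, 1] and n must be non-negative")
    return (1.0 - p) ** n


if __name__ == "__main__":
    # A structure predicted once per 100 sentences, checked against growing samples.
    for n in (10, 100, 1000):
        print(f"n={n:5d}  P(never observed)={prob_never_observed(0.01, n):.6f}")
```

If the restrictive grammar is simply the over-general one renormalised to exclude the structure, each sentence lacking the structure shifts the likelihood ratio against the over-general grammar by a factor of (1 - p).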
Clark and Lappin are keen to point out that demonstrating the tractability of the learning problem through viable formal computational descriptions is not the same as modelling the actual neurological or psychological processes that may be employed in language acquisition. In chapter 4, C&L reject a number of key assumptions in Gold’s IIL model. They do not allow for the unnatural condition of a malevolent presentation of data -- intentionally withholding crucial evidence and presenting evidence in a sequence detrimental to learning. They reject Gold’s failure to place any time limit on learning. They reject the assumption that learners cannot query the grammaticality of a string. They reject the view that learning proceeds from positive evidence only. Most significantly, C&L reject limiting the hypothesis space available to the learner. Rather, they insist that while the class of learnable languages is limited, the learner is free to form unlimited hypotheses.
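For readers unfamiliar with Gold’s framework, the sketch below shows identification by enumeration, a learning strategy Gold (1967) discussed: after each positive example, the learner conjectures the first hypothesis, in a fixed ordering, that is consistent with everything seen so far. The toy class of three nested finite languages and the function name are hypothetical, chosen only to show how the outcome depends on the ordering of the hypothesis space when the learner sees positive data alone.

```python
from typing import Iterable, List, Optional, Set


def enumeration_learner(hypotheses: List[Set[str]], text: Iterable[str]) -> List[Optional[int]]:
    """After each positive example, guess the index of the first hypothesis
    in the enumeration that contains every string observed so far."""
    seen: Set[str] = set()
    guesses: List[Optional[int]] = []
    for sentence in text:
        seen.add(sentence)
        guess = next((i for i, h in enumerate(hypotheses) if seen <= h), None)
        guesses.append(guess)
    return guesses


if __name__ == "__main__":
    # Hypothetical nested languages over the single symbol "a".
    small_first = [{"a"}, {"a", "aa"}, {"a", "aa", "aaa"}]
    target_text = ["a", "aa", "a", "aa"]  # positive examples from {"a", "aa"}
    print(enumeration_learner(small_first, target_text))        # [0, 1, 1, 1]
    print(enumeration_learner(small_first[::-1], target_text))  # [0, 0, 0, 0]
```

With the smaller languages listed first, the guesses settle on the target; with the largest language first, the learner over-generalises at once, and positive examples alone can never show that guess to be wrong. This is one reason the absence of negative evidence and of grammaticality queries weighs so heavily in Gold’s setting.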
C&L introduce their machine-learning approach in chapter 5, “Probabilistic Learning Theory”. The initial step in the weak bias argument is to replace a convergent grammar defined categorically (a string is DEFINITELY correct) with a probabilistic definition (a string is PROBABLY correct). C&L again object to the simplistic lines of argument employed by Chomsky and his followers in their rejection of statistically-based models of learning. While it may be true that the primitive statistical models critiqued by Chomsky are incapable of producing a satisfactory distinction between grammatical and ungrammatical strings, this does not prove that all statistical methods are inferior to UG descriptions. C&L introduce a range of statistical methods that they propose can better represent the nature of language acquisition. Central to a plausible probabilistic approach to modelling language is the distinction between simple measures of frequency and distributional measures. C&L propose that learners hypothesise the likelihood of a sequence, based on observations, in order to converge on the most likely grammar.
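The idea of converging on the most likely grammar can be illustrated with a toy comparison, offered as a sketch rather than as C&L’s own procedure. Each candidate ‘grammar’ here is a hypothetical bigram model giving P(next word | previous word), with `<s>` and `</s>` marking sentence boundaries; the learner scores every candidate by the log-likelihood of the observed sentences and keeps the highest-scoring one. The grammar names and probabilities are invented for illustration.

```python
import math
from typing import Dict, List, Tuple

# P(next | previous); each previous word's outgoing probabilities sum to 1.
Bigrams = Dict[Tuple[str, str], float]


def log_likelihood(grammar: Bigrams, sentences: List[List[str]]) -> float:
    """Sum of log P(next | previous) over all bigrams in the sentences."""
    total = 0.0
    for sentence in sentences:
        tokens = ["<s>"] + sentence + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            p = grammar.get((prev, nxt), 0.0)
            if p == 0.0:
                return float("-inf")  # the grammar rules this sentence out entirely
            total += math.log(p)
    return total


def most_likely_grammar(grammars: Dict[str, Bigrams], sentences: List[List[str]]) -> str:
    """Name of the candidate grammar that assigns the data the highest likelihood."""
    return max(grammars, key=lambda name: log_likelihood(grammars[name], sentences))


# Two hypothetical candidates: "strict" fixes the word order, "loose" hedges.
GRAMMARS: Dict[str, Bigrams] = {
    "strict": {("<s>", "the"): 1.0, ("the", "dog"): 1.0,
               ("dog", "barks"): 1.0, ("barks", "</s>"): 1.0},
    "loose": {("<s>", "the"): 0.5, ("<s>", "dog"): 0.5, ("the", "dog"): 1.0,
              ("dog", "barks"): 0.5, ("dog", "the"): 0.5, ("barks", "</s>"): 1.0},
}

if __name__ == "__main__":
    observed = [["the", "dog", "barks"], ["the", "dog", "barks"]]
    for name, grammar in GRAMMARS.items():
        print(name, log_likelihood(grammar, observed))
    print("most likely:", most_likely_grammar(GRAMMARS, observed))  # strict
```

The strict grammar wins on this data, but it assigns probability zero (a log-likelihood of minus infinity) to any sentence containing an unseen bigram, so a single noisy example would eliminate it. Probabilistic models intended for realistic input therefore keep every probability above zero.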
Replacing Gold’s paradigm with three key assumptions (1. language data is presented unlabelled; 2. the data includes ungrammatical sentences; and 3. efficient learning takes place despite negative examples) allows C&L to introduce the Disjoint Distribution Assumption in chapter 6. The resulting probabilistic model depends on the observed distribution of segmented strings and on the principle of converging on a probabilistic grammar. Distributional measures ensure that the probability of any utterance is never zero, allowing for errors in the presented language data. Because the hypothesis space is unlimited, the model predicts both over-generalisation and under-generalisation in the learner’s output. Adding distributional data on word classes also improves the reliability of judgements about the probability that a string is grammatical.
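The requirement that no utterance receive zero probability can be illustrated with a smoothed bigram model. The sketch below uses add-one (Laplace) smoothing, one of the simplest smoothing techniques; it is a stand-in chosen for brevity, not the distributional method C&L develop. It assumes a fixed vocabulary taken from the training corpus (words never seen in training are not handled properly), and the function names and toy corpus are hypothetical.

```python
from collections import Counter
from typing import Callable, List


def train_add_one(corpus: List[List[str]]) -> Callable[[str, str], float]:
    """Return P(next | prev) estimated with add-one smoothing over the corpus vocabulary."""
    bigrams: Counter = Counter()
    history: Counter = Counter()
    vocab = {"</s>"}  # every token that can follow a history
    for sentence in corpus:
        vocab.update(sentence)
        tokens = ["<s>"] + sentence + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            bigrams[(prev, nxt)] += 1
            history[prev] += 1
    size = len(vocab)

    def prob(prev: str, nxt: str) -> float:
        # Unseen pairs get a count of 0 + 1, so no probability is ever zero.
        return (bigrams[(prev, nxt)] + 1) / (history[prev] + size)

    return prob


def sentence_prob(prob: Callable[[str, str], float], sentence: List[str]) -> float:
    """Product of smoothed bigram probabilities, including sentence boundaries."""
    tokens = ["<s>"] + sentence + ["</s>"]
    p = 1.0
    for prev, nxt in zip(tokens, tokens[1:]):
        p *= prob(prev, nxt)
    return p


if __name__ == "__main__":
    corpus = [["the", "dog", "barks"], ["the", "cat", "sleeps"]]
    prob = train_add_one(corpus)
    for s in (["the", "dog", "barks"], ["the", "cat", "barks"], ["barks", "the", "dog"]):
        print(" ".join(s), sentence_prob(prob, s))
```

The attested sentence scores highest, the novel but plausible “the cat barks” comes next, and the scrambled string is least probable, yet none receives zero, which leaves room for noisy and ungrammatical input.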
A major aim of this book is to provide a formal account of the language learning puzzle that will make the acquisition problem tractable. C&L contend that UG theories have made the wrong assumptions about the learning task and the learning conditions, and in chapter 7 they set out the assumptions that allow efficient learning with a weak bias. They advocate algorithms that can simulate learning under natural conditions. They assume that the input data is not homogeneous -- some language items are ‘more learnable’ than others, and more complex learning tasks can be deferred according to a Fixed Parameter Tractability algorithm. Ultimately, C&L argue that complex grammatical problems are no better solved by a UG Principles and Parameters approach. When UG theories use the strong bias position as the only argument to deal with complexity, they have not solved the problem posed by a seemingly intractable learning task.
If we are to reject the presumption of a strong bias in linguistic nativism, we need to be confident that its replacement can produce reliable results. The algorithms proposed in chapter 8 begin to provide those results. The process of hypothesis generation in Gold’s IIL is described as being close to random, and consequently “hopelessly inefficient” (p.153). Replacements that have been tested include algorithms for learning deterministic and non-deterministic finite-state automata, which have proved effective in restricted language learning contexts. Simple distributional and statistical learning algorithms offer promising results, but must be adapted to simulate grammar induction as well.
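As a concrete picture of the finite-state approach, the sketch below builds a prefix tree acceptor: a deterministic finite-state automaton whose states are the prefixes of the training strings and which accepts exactly those strings. State-merging learners for regular languages (RPNI is a well-known example) typically begin from such a tree and then merge states to generalise. The merging step is omitted here, and the class name is hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


class PrefixTreeAcceptor:
    """A deterministic finite-state automaton whose states are the prefixes
    of the training strings; it accepts exactly the training strings."""

    def __init__(self, samples: Iterable[str]):
        self.transitions: Dict[Tuple[int, str], int] = {}
        self.accepting: Set[int] = set()
        self.n_states = 1  # state 0 is the empty prefix
        for sample in samples:
            state = 0
            for symbol in sample:
                key = (state, symbol)
                if key not in self.transitions:
                    # A new prefix gets a fresh state.
                    self.transitions[key] = self.n_states
                    self.n_states += 1
                state = self.transitions[key]
            self.accepting.add(state)

    def accepts(self, string: str) -> bool:
        """Follow transitions symbol by symbol; reject on a missing transition."""
        state = 0
        for symbol in string:
            if (state, symbol) not in self.transitions:
                return False
            state = self.transitions[(state, symbol)]
        return state in self.accepting


if __name__ == "__main__":
    pta = PrefixTreeAcceptor(["ab", "abb", "b"])
    print(pta.n_states)  # 5 states: "", "a", "ab", "abb", "b"
    for s in ("ab", "abb", "a", "abbb"):
        print(s, pta.accepts(s))
```

Because the tree accepts nothing beyond the data, it does not generalise at all. The interesting learning happens when states are merged, which is also where the risk of over-generalisation enters.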
Objections to distributional models are countered in chapter 9 by presenting the results of real algorithms working on real data. Typically, an algorithm performs a parsing, tagging, segmenting or similar task on a corpus, and the results are measured against a previously annotated version of the corpus -- a ‘gold standard.’ Corpora in these experiments tend to be samples of English intended for adults, such as the Penn Treebank (Marcus, Marcinkiewicz and Santorini, 1993). Learning algorithms can be divided into “supervised” ones, which require a corpus annotated with some form of information such as part-of-speech tags, and “unsupervised” ones, which work on a ‘bare’ corpus of language. Supervised learning algorithms, such as the Maximum Likelihood Estimate, match the ‘gold standard’ in about 88-91% of cases. More surprising, perhaps, are the high success rates of unsupervised learning algorithms in word segmentation, in learning word classes and morphology, and in parsing.
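The gold-standard evaluation described here can be sketched in a few lines. The toy tagger below is supervised: it assigns each word the tag it received most often in a small hand-tagged training set (a maximum-likelihood estimate of that word’s most probable tag) and falls back to the overall most frequent tag for unknown words. Its accuracy is the proportion of tokens whose predicted tag matches the gold-standard annotation. The data and function names are hypothetical, and this baseline is not one of the systems whose scores C&L report.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

TaggedSentence = List[Tuple[str, str]]


def train_mle_tagger(train: List[TaggedSentence]) -> Tuple[Dict[str, str], str]:
    """Map each word to its most frequent training tag; also return a fallback tag."""
    per_word: Dict[str, Counter] = defaultdict(Counter)
    all_tags: Counter = Counter()
    for sentence in train:
        for word, tag in sentence:
            per_word[word][tag] += 1
            all_tags[tag] += 1
    lexicon = {word: tags.most_common(1)[0][0] for word, tags in per_word.items()}
    fallback = all_tags.most_common(1)[0][0]
    return lexicon, fallback


def accuracy(lexicon: Dict[str, str], fallback: str, gold: List[TaggedSentence]) -> float:
    """Proportion of gold-standard tokens whose predicted tag is correct."""
    correct = total = 0
    for sentence in gold:
        for word, gold_tag in sentence:
            correct += lexicon.get(word, fallback) == gold_tag
            total += 1
    return correct / total


# Hypothetical hand-tagged training data and gold standard.
TRAIN: List[TaggedSentence] = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("dog", "NN"), ("food", "NN")],
]
GOLD: List[TaggedSentence] = [
    [("the", "DT"), ("cat", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("bird", "NN"), ("sings", "VBZ")],
]

if __name__ == "__main__":
    lexicon, fallback = train_mle_tagger(TRAIN)
    # 5 of 6 tokens correct: the unknown verb "sings" gets the fallback tag NN.
    print(accuracy(lexicon, fallback, GOLD))
```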
Chapter 10 examines ‘Principles and Parameters’ and other UG models. C&L claim that the strong bias in this UG model would require an innate phonological segmenter and part-of-speech tagger, and that limiting the hypothesis and parameter space available to the learner makes the language learning task intractable, particularly as the highly abstract UG parameters appear to have very little direct relationship to the primary language data. C&L lament the paucity of theoretical and experimental evidence for the Principles and Parameters (P&P) framework. Even more worrying for UG theories is the near-indifference to questions of acquisition in research on the Minimalist Program, the latest version of UG. In place of UG models, C&L offer “Probabilistic Context-Free Grammars” and “Hierarchical Bayesian Models”, which account for language acquisition through comprehensive descriptions and high levels of success in simulations. In chapter 11, “A Brief Look at Some Biological and Psychological Evidence”, C&L briefly review accounts of language learning that support a weak bias. Even where evidence from genetic, biological or psychological studies has been used to support a strong bias, C&L are able to show that this evidence does not necessarily favour nativist arguments.
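For readers new to Probabilistic Context-Free Grammars, the essential idea is that each rule carries a probability, the probabilities of all rules sharing a left-hand side sum to one, and the probability of a parse tree is the product of the probabilities of the rules it uses. The toy grammar and tree encoding below are hypothetical and far smaller than anything C&L discuss; the sketch only computes the probability of a given tree and does not parse.

```python
from typing import Dict, Tuple, Union

# A tree is either a terminal word or a tuple (label, child1, child2, ...).
Tree = Union[str, Tuple]

# Hypothetical toy PCFG: rule probabilities for each left-hand side sum to 1.
RULES: Dict[Tuple[str, Tuple[str, ...]], float] = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Det", "N")): 0.7,
    ("NP", ("N",)): 0.3,
    ("VP", ("V", "NP")): 0.6,
    ("VP", ("V",)): 0.4,
    ("Det", ("the",)): 1.0,
    ("N", ("dog",)): 0.5,
    ("N", ("cat",)): 0.5,
    ("V", ("sees",)): 0.5,
    ("V", ("sleeps",)): 0.5,
}


def tree_prob(tree: Tree) -> float:
    """Multiply the probability of the rule at this node by those of its subtrees."""
    if isinstance(tree, str):
        return 1.0  # terminals contribute no rule of their own
    label, *children = tree
    rhs = tuple(child if isinstance(child, str) else child[0] for child in children)
    p = RULES.get((label, rhs), 0.0)  # a rule not in the grammar has probability 0
    for child in children:
        p *= tree_prob(child)
    return p


if __name__ == "__main__":
    tree = ("S", ("NP", ("Det", "the"), ("N", "dog")), ("VP", ("V", "sleeps")))
    # 1.0 * 0.7 * 1.0 * 0.5 * 0.4 * 0.5, i.e. approximately 0.07
    print(tree_prob(tree))
```

A PCFG learner estimates these rule probabilities from data; with a treebank, the simplest estimate is each rule’s relative frequency among the rules with the same left-hand side.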
In the concluding chapter, C&L review their evidence against the argument from the poverty of the stimulus. They argue for a weak innate bias towards learning language based on domain-general learning abilities. They point out that the UG framework has yielded few concrete experiments or cases that “produce an explicit, computationally viable, and psychologically credible account of language acquisition” (p.214). By contrast, they have set out explicit, formal computational models of learning that offer a credible account of acquisition. Although these models are far from perfect and much work remains to be done, computational models have already provided a more adequate account than UG models: “We hope that if this book establishes only one claim it is to finally expose, as without merit, the claims that Gold’s negative results motivate linguistic nativism. IIL is not an appropriate model” (p.215). Instead, C&L advocate the use of domain-general statistical models of language acquisition.
EVALUATION
In 12 chapters, Clark and Lappin use “Linguistic Nativism and the Poverty of the Stimulus” to evaluate a key concept in modern linguistics, taking a clearly computational perspective and examining a wide variety of topics. Limitations of space have led me to simplify or ignore a number of the arguments presented in this book, and to skim over the presentations of learning algorithms. I would suggest, however, that C&L present a very cogent and coherent argument.
There are many sides from which to attack linguistic nativism, and the argument from the poverty of the stimulus in particular. Opponents have argued that most UG theories are unfalsifiable (e.g. Pullum and Scholz, 2002), that corpora designed to reflect children’s exposure to language demonstrate that the stimulus is not impoverished (e.g. MacWhinney, 2004), and that it is absurd to posit that the brain adapted to language, as if language existed in the environment prior to man, rather than language adapting to the general abilities of the brain (Deacon, 1998; Christiansen and Chater, 2008). These arguments, alongside alternatives to linguistic nativism from functional linguistics (e.g. Halliday, 1975; 1993; Halliday and Matthiessen, 1999), are often dismissed as irrelevant to the theory of UG. What sets Clark and Lappin’s book apart, and the reason it must be taken seriously by everyone who proposes some form of linguistic nativism to explain language acquisition and typology, is that it attacks from within. It stakes a claim to the very ground occupied by theories of UG. UG attempts to account formally and explicitly for the apparent mismatch between the complexity of the language learning task and the near-universal success of humans in achieving it with such apparently meagre resources. Clark and Lappin identify methods that could make this complex task tractable. Crucially, these methods are not restricted to language, but are generally useful learning methods -- they are domain-general.
If there is one criticism I would make of Clark and Lappin’s argument, it is that they do not demonstrate clearly enough how likely it is that we all use the domain-general learning methods that they propose. For instance, we are left to presume that Probably Approximately Correct (PAC) learning and Probabilistic Context-Free Grammar algorithms represent general, non-language-specific models of learning.
I fear that we may be fooled by the apparent sophistication of the tools at our disposal. We need to remember that computational tools may force us to see a phenomenon as the tool represents it, when that phenomenon is more complex than the tool can capture. It seems more than coincidental that a computational approach to language acquisition mirrors the findings of corpus linguistics, for instance that language can be viewed as inherently statistically structured. That language can be analysed in this way, or in the form of tree diagrams, does not prove that this is how humans learn it. Fortunately, Clark and Lappin are well aware of this trap and frequently warn readers that their computational theories aim to demonstrate what is possible, not what really happens in the human mind. Until we better understand exactly what neurological processes are involved in language acquisition, our task is to represent acquisition as best we can. In this endeavour, we have been expertly assisted by Clark and Lappin’s book.
“Linguistic Nativism and the Poverty of the Stimulus” is a challenging book. It challenges the reader to deal with a range of linguistic, philosophical, mathematical and computational issues, and to remember a dizzying array of acronyms and abbreviations (including APS, CFG, DDA, EFS, GB, HBM, IIL, MP, PCFG, PLD and UG). Most of all, it challenges basic concepts in mainstream linguistics. It rejects key tenets of UG in the light of advances in machine learning theory and research in the computational modelling of the language acquisition process. It exposes the weaknesses of so-called proofs supporting the poverty of the stimulus, and reveals alternatives that are formally more comprehensive than the explanations previously provided by UG theories, and empirically more likely to match natural language acquisition processes.
REFERENCES
Christiansen, Morten H. and Chater, Nick. 2008. Language as Shaped by the Brain. Behavioral and Brain Sciences 31, pp. 489-558.
Deacon, Terrence W. 1998. The Symbolic Species: The Co-Evolution of Language and the Brain. New York: W.W. Norton.
Gold, E.M. 1967. Language Identification in the Limit. Information and Control 10/5, pp. 447-474.
Halliday, Michael A.K. 1975. Learning How to Mean: Explorations in the Development of Language. London: Edward Arnold.
Halliday, Michael A.K. 1993. Towards a Language-Based Theory of Learning. Linguistics and Education 5, pp. 93-116.
Halliday, Michael A.K. & Matthiessen, Christian M.I.M. 1999. Construing Experience Through Meaning. London: Continuum.
MacWhinney, Brian. 2004. A Multiple Process Solution to the Logical Problem of Language Acquisition. Journal of Child Language 31, pp. 883-914.
Marcus, Mitchell P., Marcinkiewicz, Mary Ann and Santorini, Beatrice. 1993. Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics 19/2, pp. 313-330.
Pullum, Geoffrey K. and Scholz, Barbara C. 2002. Empirical Assessment of Stimulus Poverty Arguments. The Linguistic Review 19, pp. 9-50.
ABOUT THE REVIEWER
Nick Moore has worked in Brazil, Oman, Turkey, the UAE and the UK with students and teachers of English as a foreign language, English for specific and academic purposes, and linguistics. His PhD in applied linguistics from the University of Liverpool addressed information structure in written English. Other research interests include systemic functional linguistics, corpus linguistics, theories of embodiment, lexis and skills in language teaching, and reading programmes. Dr. Moore is the co-editor of 'READ' and he currently coordinates and teaches undergraduate courses in English composition and technical writing, as well as an introductory linguistics course, at Khalifa University.