LINGUIST List 23.1096

Sun Mar 04 2012

Review: Cog. Science; Lang. Acq.; Philosophy of Lang.: Clark & Lappin (2011)

Editor for this issue: Joseph Salmons <jsalmonslinguistlist.org>



Date: 04-Mar-2012
From: Nick Moore <nick.moorelycos.com>
Subject: Linguistic Nativism and the Poverty of the Stimulus

Announced at http://linguistlist.org/issues/22/22-654.html

AUTHORS: Clark, Alexander and Lappin, Shalom
TITLE: Linguistic Nativism and the Poverty of the Stimulus
PUBLISHER: Wiley-Blackwell
YEAR: 2011

Nick Moore, Khalifa University, United Arab Emirates

INTRODUCTION

"Linguistic Nativism and the Poverty of the Stimulus" tackles a key issue in linguistics over 260 pages. The book is intended for a general linguistics audience, but the reader needs some familiarity with basic concepts in formal linguistics, at least an elementary understanding of computational linguistics, and enough statistical, programming or mathematical knowledge not to shy away from algorithms. The book will most benefit linguists working in formal grammar, computational linguistics and linguistic theory.

The main aim is to replace the view that humans have an innate bias towards learning language that is specific to language with the view that an innate bias towards language acquisition depends on abilities that are used in other domains of learning. The first view is characterised as the argument for a strong bias, or linguistic nativism, while the second view is characterised as a weak bias or domain-general view. The principal line of argument is that computational, statistical and machine-learning methods demonstrate superior success in modeling, describing and explaining language acquisition, especially when compared to studies from formal linguistic models based on strong bias arguments.

SUMMARY

Chapter 1 establishes the boundaries of the discussion for the book. The authors' main focus is on a viable computationally-explicit model of language acquisition. Clark and Lappin (C&L) do not dismiss nativism entirely. They point out that it is fairly uncontroversial, first, that humans alone acquire language, and, second, that the environment plays a significant role in determining the language and the level of acquisition. What they intend to establish in the book, however, is that any innate cognitive faculties employed in the acquisition of language are not specific to language, as suggested by Chomsky and the Universal Grammar (UG) framework, but are general cognitive abilities that are employed in other learning tasks. It is from this 'weak bias' angle that they critique the Argument from the Poverty of the Stimulus (APS).

The APS is considered central to the nativist position because it provides an explanation for Chomsky's core assumptions for UG: (1) grammar is rich and abstract; (2) data-driven learning is inadequate; (3) children are exposed to qualitatively and quantitatively degenerate linguistic data; and (4) the acquired grammar for a given language is uniform despite variations in intelligence. These assumptions are dealt with in chapter 2 and throughout the book and replaced with assumptions from a machine-learning, 'weak bias' perspective, although the reader is referred to other sources to counter assumption (3). C&L point out that arguments against connectionist or frequency-based theories do not prove the UG assumption of linguistic nativism correct -- strong bias remains unproven. C&L claim that many strong bias arguments become self-serving. C&L preview machine-learning alternatives to linguistic nativism based on distributional, or relative frequency, criteria and a Bayesian statistical model to account for these same learning phenomena.

Chapter 3 examines the extent to which the stimulus really is impoverished. While C&L encourage the reader to examine empirical evidence provided by corpora (e.g. MacWhinney, 2004), they focus on the role of negative evidence in the learning process because a small amount of negative evidence can make a significant difference to language acquisition. Indirect negative evidence, such as the non-existence of hypothesised structures, can also significantly alter the learning task. C&L challenge the premise of no negative evidence partly because it is a central tenet of Gold's (1967) "Identification In the Limit" (IIL). Gold's highly-influential study argues that because learning cannot succeed within the cognitive and developmental limits imposed, children must have prior knowledge of the language system. However, C&L point out that this view is partly a consequence of ignoring non-syntactic information in the learning environment; ignoring semantic, pragmatic and prosodic information produces a circular argument about the centrality of syntax.

Clark and Lappin are keen to point out that demonstrating the tractability of the learning problem through viable formal computational descriptions does not equate to modelling the actual neurological or psychological processes that may be employed in language acquisition. In Chapter 4, C&L reject a number of key assumptions in Gold's IIL model. They do not allow for the unnatural condition of the malevolent presentation of data -- intentionally withholding crucial evidence samples and offering evidence in a sequence detrimental to learning. They reject Gold's lack of time limitation placed on learning. They reject the impossibility of learners querying the grammaticality of a string. They reject the view that learning is through positive evidence only. Most significantly, C&L reject limiting the hypothesis space available to the learner. Rather, they insist that while the learnable class of a language is limited, the learner is free to form unlimited hypotheses.

C&L introduce their machine-learning approach in chapter 5, "Probabilistic Learning Theory". The initial step in the weak bias argument is to replace a convergent grammar defined categorically (a string is DEFINITELY correct) with a probabilistic definition (a string is PROBABLY correct). C&L again object to the simplistic lines of argument employed by Chomsky and his followers in their rejection of statistically-based models of learning. While it may be true that the primitive statistical models critiqued by Chomsky are incapable of producing a satisfactory distinction between grammatical and ungrammatical strings, this does not prove that all statistical methods are inferior to UG descriptions. C&L introduce a range of statistical methods that they propose can better represent the nature of language acquisition. Central to a plausible probabilistic approach to modelling language is the distinction between simple measures of frequency and distributional measures. C&L propose that learners hypothesise the likelihood of a sequence, based on observations, in order to converge on the most likely grammar.
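The idea of converging on the most likely grammar can be illustrated with a minimal sketch. It is this reviewer-style illustration only, not an algorithm taken from the book: the two candidate grammars, their probability values and the function names are all invented, and each "grammar" is reduced to a probability distribution over three toy strings. The learner simply prefers the candidate that assigns the highest likelihood to what it has observed.

```python
import math

# Two hypothetical candidate "grammars", each a probability distribution
# over a tiny set of strings (toy values invented for illustration).
GRAMMARS = {
    "G1": {"the dog barks": 0.6, "dog the barks": 0.1, "barks the dog": 0.3},
    "G2": {"the dog barks": 0.2, "dog the barks": 0.5, "barks the dog": 0.3},
}

def log_likelihood(grammar, observations):
    """Sum of the log probabilities the grammar assigns to the observed strings."""
    return sum(math.log(grammar[s]) for s in observations)

def most_likely_grammar(grammars, observations):
    """Return the name of the candidate with the highest log-likelihood."""
    return max(grammars, key=lambda name: log_likelihood(grammars[name], observations))

observed = ["the dog barks", "the dog barks", "barks the dog", "the dog barks"]
best = most_likely_grammar(GRAMMARS, observed)
print(best)
```

Under these toy values the learner selects G1, because G1 gives more probability to the string that dominates the observations. No string is ruled in or out categorically; candidates are only ranked as more or less probable.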

Replacing Gold's paradigm with three key assumptions (1. language data is presented unlabelled; 2. the data includes ungrammatical sentences; and 3. efficient learning takes place despite negative examples) allows C&L to introduce the Disjoint Distribution Assumption in chapter 6. This probabilistic algorithm depends on the observed distribution of segmented strings and on the principle of converging on a probabilistic grammar. Distributional measures ensure that the probability of any utterance will never be zero, allowing for errors in the presented language data. This model predicts over-generalisation and under-generalisation in the learner's output because of the unlimited hypothesis space. The addition of word class distributional data also ensures greater reliability in judging the probability of a string being grammatical.
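How a model can guarantee that no utterance receives zero probability may be easier to see in code. The sketch below uses additive (add-k) smoothing over word bigrams, a standard textbook technique that is swapped in here for illustration; it is not claimed to be the method C&L describe. The corpus, the function names and the choice of k are assumptions.

```python
from collections import Counter

def train_bigram_counts(sentences):
    """Count word bigrams and their left contexts, with boundary markers."""
    bigrams, contexts, vocab = Counter(), Counter(), set()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts, vocab

def utterance_probability(utterance, bigrams, contexts, vocab, k=1.0):
    """Add-k smoothed bigram probability; never zero for any utterance."""
    tokens = ["<s>"] + utterance.split() + ["</s>"]
    v = len(vocab) + 1  # one extra slot reserves mass for unknown words
    p = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        p *= (bigrams[(prev, cur)] + k) / (contexts[prev] + k * v)
    return p

# Toy corpus, invented for illustration.
corpus = ["the dog barks", "the cat sleeps", "the dog sleeps"]
bigrams, contexts, vocab = train_bigram_counts(corpus)

p_seen = utterance_probability("the dog barks", bigrams, contexts, vocab)
p_unseen = utterance_probability("cat the barks", bigrams, contexts, vocab)
print(p_seen, p_unseen)
```

The scrambled utterance is given a small but non-zero probability, so an occasional error in the input does not break the model, while the attested word order is still scored higher.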

A major aim of this book is to provide a formal account of the language learning puzzle that will make the acquisition problem tractable. C&L contend that UG theories have made the wrong assumptions in relation to the learning task and the learning conditions, and in chapter 7, they set out the assumptions that allow efficient learning with a weak bias. They advocate algorithms that can simulate learning under natural conditions. They assume that the input data is not homogeneous -- some language items are 'more learnable' than others, and more complex learning tasks can be deferred according to a Fixed Parameter Tractability algorithm. Ultimately, C&L argue that complex grammatical problems are no better solved by a UG Principles and Parameters approach. When UG theories use the strong bias position as the only argument to deal with complexity, they have not solved the problem posed by a seemingly intractable learning task.

If we are to reject the presumption of the strong bias in linguistic nativism, we need to be confident that its replacement can produce reliable results. The proposed algorithms in chapter 8 start to provide those results. The process of hypothesis generation in Gold's IIL is described as being close to random, and consequently "hopelessly inefficient" (p.153). Various replacements that have been tested include (Non-)Deterministic Finite State Automata algorithms which have proved effective in restricted language learning contexts. Simple distributional and statistical learning algorithms offer promising results, but must be adapted to also simulate grammar deduction.

Objections to distributional models are countered in chapter 9 by presenting the results of real algorithms working on real data. Typically the algorithm performs a parsing, tagging, segmenting or similar task on a corpus, and the results are measured against a previously-annotated version of the corpus -- a 'gold standard.' Corpora in these experiments tend to be samples of English intended for adults, such as the Penn Treebank (Marcus, Marcinkiewicz and Santorini, 1993). Learning algorithms can be divided into "supervised" -- requiring the corpus to carry some form of information such as part of speech tagging -- and "unsupervised" -- working on a 'bare' corpus of language. Supervised learning algorithms, such as the Maximum Likelihood Estimate, match the 'gold standard' in about 88-91% of cases. More surprising, perhaps, are the high success rates of unsupervised learning algorithms in word segmentation, in learning word classes and morphology, and in parsing.
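The evaluation procedure described above, training on annotated data and scoring against a 'gold standard', can be sketched as follows. This is a deliberately simple assumption-laden example: the tagger assigns each word its most frequent training tag (a maximum likelihood estimate of the tag given the word), the two tiny corpora are invented, and the fallback tag for unknown words is an arbitrary choice. It does not reproduce any experiment reported in the book.

```python
from collections import Counter, defaultdict

def train_mle_tagger(tagged_sentences):
    """Map each word to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag(words, model, default_tag="NOUN"):
    """Tag a list of words; unknown words get the (assumed) default tag."""
    return [(w, model.get(w, default_tag)) for w in words]

def accuracy(predicted, gold):
    """Proportion of tokens whose predicted tag matches the gold-standard tag."""
    pairs = [(p, g)
             for ps, gs in zip(predicted, gold)
             for (_, p), (_, g) in zip(ps, gs)]
    return sum(p == g for p, g in pairs) / len(pairs)

# Toy annotated data, invented for illustration.
train = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
]
gold = [
    [("the", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("barks", "VERB")],
]

model = train_mle_tagger(train)
predicted = [tag([w for w, _ in sentence], model) for sentence in gold]
score = accuracy(predicted, gold)
print(f"{score:.3f}")
```

Here the only error is the unseen word "a", which falls back to the default tag, so five of the six tokens match the gold standard. Real evaluations work the same way in outline, only on far larger annotated corpora.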

Chapter 10 examines 'Principles and Parameters' and other UG models. C&L claim that the strong bias in this UG model would require an innate phonological segmenter and part of speech tagger, and that by limiting the hypothesis and parameter space available to the learner, the language learning task becomes intractable, particularly as the highly abstract nature of UG parameters appears to have very little direct relationship to the primary language data. C&L lament the paucity of theoretical and experimental evidence for the Principles and Parameters (P&P) framework. Even more worrying for UG theories is the near-indifference to questions of acquisition in Minimalist Program research, the latest version of UG. In place of UG models, C&L offer "Probabilistic Context-Free Grammars" and "Hierarchical Bayesian Models" to account for language acquisition through comprehensive descriptions and high levels of success in simulations. In chapter 11, "A Brief Look at Some Biological and Psychological Evidence", C&L quickly review accounts of language learning that support a weak bias. Even where evidence from genetic, biological or psychological studies has been used to support a strong bias, C&L are able to show that this evidence does not necessarily favour nativist arguments.
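For readers unfamiliar with the Probabilistic Context-Free Grammars mentioned in chapter 10, the core idea is that every rewrite rule carries a probability and the probability of a parse is the product of the rules it uses. The following sketch shows only that calculation; the grammar, its probabilities and the tuple-based tree encoding are hypothetical, and nothing here is drawn from C&L's own models.

```python
# Hypothetical toy PCFG: each left-hand side maps right-hand sides to
# probabilities that sum to 1 (values invented for illustration).
PCFG = {
    "S": {("NP", "VP"): 1.0},
    "NP": {("the", "N"): 0.7, ("N",): 0.3},
    "N": {("dog",): 0.5, ("cats",): 0.5},
    "VP": {("barks",): 0.6, ("sleep",): 0.4},
}

def tree_probability(tree, grammar):
    """Multiply the probabilities of every rule used in a parse tree.

    A tree is a tuple (label, child, child, ...); a leaf is a plain string.
    A rule that is absent from the grammar contributes probability 0.
    """
    if isinstance(tree, str):
        return 1.0
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = grammar[label].get(rhs, 0.0)
    for child in children:
        p *= tree_probability(child, grammar)
    return p

tree = ("S", ("NP", "the", ("N", "dog")), ("VP", "barks"))
p = tree_probability(tree, PCFG)
print(p)
```

The parse of "the dog barks" uses four rules with probabilities 1.0, 0.7, 0.5 and 0.6, giving roughly 0.21. Learning a PCFG then amounts to estimating such rule probabilities from data, which is where the statistical methods discussed in the book come in.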

In the concluding chapter, C&L review their evidence against the argument from the poverty of the stimulus. They argue for a weak innate bias towards learning language based on general-domain learning abilities. They point out that the UG framework has produced few concrete experiments or cases that "produce an explicit, computationally viable, and psychologically credible account of language acquisition" (p.214). They have attempted to introduce explicit, formal computational models of learning that have produced a credible account of learning. Although they are far from perfect and much work needs to be done, computational models have already provided a more adequate account than the UG models: "We hope that if this book establishes only one claim it is to finally expose, as without merit, the claims that Gold's negative results motivate linguistic nativism. IIL is not an appropriate model" (p.215). Instead C&L advocate the use of domain-general statistical models of language acquisition.

EVALUATION

In 12 chapters, Clark and Lappin use "Linguistic Nativism and the Poverty of the Stimulus" to evaluate a key concept in modern linguistics, taking a clearly computational perspective and examining a wide variety of topics. Limitations of space have led me to simplify or ignore a number of arguments presented in this book, and to skim over presentations of learning algorithms. I would suggest, however, that C&L present a very cogent and coherent argument.

There are many sides from which to attack linguistic nativism, and the argument from the poverty of the stimulus in particular. Opponents have argued that most UG theories are unfalsifiable (e.g. Pullum and Scholz, 2002), that corpora designed to reflect children's exposure to language demonstrate that the stimulus is not impoverished (e.g. MacWhinney, 2004), and that it is absurd to posit the notion that the brain adapted to language, as if language exists in the environment prior to man, rather than language adapting to the general abilities of the brain (Deacon, 1998; Christiansen and Chater, 2008). These arguments, alongside alternatives to linguistic nativism from functional linguistics (e.g. Halliday, 1975; 1993; Halliday and Matthiessen, 1999), are often dismissed as irrelevant to the theory of UG. What sets Clark and Lappin's book apart, and why it must be taken seriously by everyone who proposes some form of linguistic nativism to explain language acquisition and typology, is that it attacks from within. It claims the very ground claimed by theories of UG. UG attempts to formally and explicitly account for the apparent mismatch between the complexity of the language learning task and the near-universal success of humans in achieving it with such apparently meagre resources. Clark and Lappin identify methods that could be applied to make the complex task tractable. Specifically, these methods are not restricted to language, but are generally useful learning methods -- they are domain-general. If there is one criticism I would make of Clark and Lappin's argument it is that they do not demonstrate clearly enough how likely it is that we all use the domain-general learning methods that they propose. For instance, we are left to presume that Probably Approximately Correct and Probabilistic Context-Free Grammar algorithms represent general, non-language-specific, models of learning.

I fear that we may be fooled by the apparent sophistication of the tools at our disposal. We need to remember that computational tools may force us to see a phenomenon as the tool understands it when that phenomenon is more complex than computers. It seems more than coincidental that a computational approach to language acquisition mirrors findings of corpus linguistics; for instance, that language can be viewed as inherently statistically structured. That it can be analysed in this way, or in the form of tree diagrams, does not prove that this is how humans learn language. Fortunately, Clark and Lappin are well aware of this trap and frequently warn readers that their computational theories aim to demonstrate what is possible, not what really happens in the human mind. Until we better understand exactly what neurological processes are actually involved in language acquisition, our task is to try to represent acquisition as best we can. In this endeavour, we have been expertly assisted by Clark and Lappin's book.

"Linguistic Nativism and the Poverty of the Stimulus" is a challenging book. Itchallenges the reader to deal with a range of linguistic, philosophical,mathematical and computational issues, and to remember a dizzying array ofacronyms and abbreviations (including APS, CFG, DDA, EFS, GB, HBM, IIL, MP,PCFG, PLD and UG). Most of all, it challenges basic concepts in mainstreamlinguistics. It rejects key tenets of UG in the light of advances in machinelearning theory, and research in the computational modelling of the languageacquisition process. It exposes so-called proofs supporting the poverty ofstimulus, and reveals alternatives that are formally more comprehensive than theexplanations previously provided by UG theories, and empirically more likely tomatch natural language acquisition processes.

REFERENCES

Christiansen, Morten H. and Chater, Nick. 2008. Language as Shaped by the Brain. Behavioral and Brain Sciences 31, pp. 489-558.

Deacon, Terrence W. 1998. The Symbolic Species: The Co-Evolution of Language and the Brain. New York: W.W. Norton.

Gold, E.M. 1967. Language identification in the limit. Information and Control 10/5, pp. 447-474.

Halliday, Michael A.K. 1975. Learning How to Mean: Explorations in the Development of Language. London: Edward Arnold.

Halliday, Michael A.K. 1993. Towards a language-based theory of learning. Linguistics and Education 5, pp. 93-116.

Halliday, Michael A.K. and Matthiessen, Christian M.I.M. 1999. Construing Experience Through Meaning. London: Continuum.

MacWhinney, Brian. 2004. A Multiple Solution to the Logical Problem of Language Acquisition. Journal of Child Language 31, pp. 883-914.

Marcus, Mitchell P., Marcinkiewicz, Mary Ann and Santorini, Beatrice. 1993. Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics 19/2, pp. 313-330.

Pullum, Geoffrey K. and Scholz, Barbara C. 2002. Empirical Assessment of Stimulus Poverty Arguments. The Linguistic Review 19, pp. 9-50.

ABOUT THE REVIEWER

Nick Moore has worked in Brazil, Oman, Turkey, UAE and UK with students and teachers of English as a foreign language, English for specific and academic purposes, and linguistics. His PhD in applied linguistics from the University of Liverpool addressed information structure in written English. Other research interests include systemic functional linguistics, corpus linguistics, theories of embodiment, lexis and skills in language teaching, and reading programmes. Dr. Moore is the co-editor of 'READ' and he currently coordinates and teaches undergraduate courses in English composition and technical writing, as well as an introductory linguistics course, at Khalifa University.

Page Updated: 04-Mar-2012