Review of Sentence Comprehension: The Integration of Habits and Rules

Reviewer: Ngoni Chipere
Book Title: Sentence Comprehension: The Integration of Habits and Rules
Book Author: Thomas G. Bever, David J. Townsend
Publisher: MIT Press
Linguistic Field(s): Computational Linguistics
Cognitive Science
Issue Number: 12.2659

Townsend, David J., and Thomas G. Bever (2001) Sentence
Comprehension: The Integration of Habits and Rules. MIT
Press, xi+445pp, hardback ISBN 0-262-20132-1, paperback
ISBN 0-262-70080-8, $24.95, Language Speech and Communication
Series, a Bradford book.

Ngoni Chipere, School of Education, The University of Reading

The book attempts to integrate Symbolic processing, in
the form of Minimalism, with Connectionism. Minimalism
represents sentences as symbolic structures resulting
from a formal process of syntactic derivation.
Connectionism, on the other hand, represents sentences as
patterns of association between linguistic features.
These patterns are said to obey statistical regularities
of linguistic usage instead of formal linguistic rules.
The authors of the book argue that human sentence
processing displays both structural and statistical
characteristics and therefore requires the integration
of the two views.

The book is intended for a broad cognitive science
audience. It is most directly relevant to those engaged
in sentence processing research. However, the book is
generally accessible to those unfamiliar with either
sentence processing research or formal linguistic theory.
A number of text boxes contain succinct descriptions of
experimental paradigms and there is an introductory
chapter on linguistic theory. The proposal that
Minimalism plays a central role in sentence processing
will interest proponents of the theory. On the other
hand, proponents of Construction Grammar and allied
linguistic theories will be interested to learn about the
psychological role played by prefabricated grammatical
structures. Finally, Connectionists will be interested in
the experimental evidence for statistical influences in
sentence processing.

The text is well laid out and the style of writing is
clear. The book is organised into 10 chapters. Chapter 1
is a short outline of the main ideas in the book and how
they are developed in later chapters. Chapters 2 to 4
contain reviews of the literature and other background
material. Chapter 5 describes the authors' integrated
model. Chapters 6-8 present evidence in favour of the
model. Chapters 9-10 show how the model relates to other
aspects of linguistic cognition.

The review focuses on the largely theoretical sections of
the book (Chapters 1-5). Limitations of space do not
allow for a thoughtful discussion of the empirical
sections (Chapters 6-8) and the attempts to relate the
model to other aspects of language functioning (Chapters
9-10).

The book begins with a review of experimental literature
from the 1950s and 60s on the role of grammatical rules
in sentence processing (Chapter 2). According to the
authors, the "ultimate conclusion" to be drawn from this
literature "was that while grammatically defined
representations appear to be computed during language
behavior, the grammatical rules that define them may not
be used." (p. 45). There are two main critical
observations concerning this conclusion and the authors'
review of the literature.

Firstly, the wording of the conclusion glosses over some
problematic facts: subjects were sometimes found to
employ sentence representations which were *not*
grammatically defined and methodological problems were
such as to render suspect any strong conclusions about
the psychological reality of phrase structure (see the
extensive reviews by Levelt, 1974 and 1978). Secondly,
the authors' review omits the considerable evidence for
Markovian (probabilistic) models of language processing.
This evidence is important for an informed evaluation of
modern-day Connectionism and for the authors' own
proposals. Corpus Linguists may also be interested to
learn about the psychological reality of n-grams. Because
this important work appears to have been largely
forgotten, the following rather extensive list is
provided for the interested reader: (Miller, Heise and
Lichten, 1951; Miller and Selfridge, 1951; Marks and
Jack, 1952; Miller, Bruner and Postman, 1954; Deese and
Kaufman, 1957; Goldman-Eisler, 1957, 1958 and 1968;
Sharp, 1958; Maclay and Osgood, 1959; Richardson and
Voss, 1960; Onishi, 1962; Pollack and Pickett, 1964;
Traul and Black, 1965; Muise, Leblanc and Jeffrey, 1971,
1972; Lefton, Spragins and Byrnes, 1973; and Scheerer-
Neumann, Ahola, Koenig and Reckerman, 1978).

Chapter 3, "What Every Psychologist Should Know About
Grammar", provides useful descriptions of linguistic
terminology and concepts. It will be useful to readers
who are not familiar with linguistic theory. However, the
chapter offers an unrepresentative view of linguistic
theory because it focuses on theories which employ
movement and associated concepts of derivation and trace.
There is a danger that readers unaware of the breadth of
linguistic theory might be led to think that these
concepts are uncontroversial and must be accommodated by
any valid model of sentence processing. A more
representative coverage of linguistic theory would also
have revealed interesting parallels between the authors'
notion of canonical sentoid and the notion of
construction in Construction Grammar and the notions of
collocation and n-gram in Corpus Linguistics.

The quality of the discussion in this chapter is lowered
by the use of unsupported grammatical intuitions. For
instance, in part of a complex chain of argument, the
authors state categorically that "in a relative clause,
adverbs cannot appear before the relative pronoun" (p.
61). No support is given for this generalisation, apart
from the following example:

This is the horse frequently that raced.

However, a simple query on the Google search engine
[search for: "something sadly +that"] reveals many
examples of normal sounding relative clause sentences
which contain a relative pronoun preceded by an adverb:

Okinawa's experience offers a sober reminder of the
horrors of war, something, sadly, that mankind constantly
seems to forget.

... going to the synagogue on a weekly basis is something
sadly, that most Jews do not do ...

... and the puzzles give you feedback when you're close
but not quite--something, sadly, that no traditional
book-based puzzle can do for you ...

The pattern, <something> <evaluative adverb> <relative
pronoun> seems to be rather common, something, in fact,
which ironically supports the authors' general argument
for statistical patterning in language. Their own example
can be altered slightly in the light of this pattern to
read better as follows:

This is the horse, sadly, that died.

The use of corpus data requires insignificant effort and
it is difficult to understand why the authors jeopardise
their arguments by relying on personal intuition.
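
As a rough indication of how little effort such a check involves,
the pattern can be searched for with a few lines of Python. This is
only a sketch under stated assumptions: the function name
find_adverb_before_that is hypothetical, the sample text simply
reuses sentences quoted in this review, and treating any word ending
in -ly as an adverb is a crude heuristic rather than a proper
part-of-speech analysis.

```python
import re

# Assumption: "something", an optional comma, a word ending in -ly
# (a crude stand-in for "evaluative adverb"), an optional comma, "that".
PATTERN = re.compile(r"\bsomething,?\s+(\w+ly),?\s+that\b", re.IGNORECASE)

def find_adverb_before_that(text):
    """Return the -ly words found between 'something' and 'that'."""
    return [m.group(1).lower() for m in PATTERN.finditer(text)]

# Sample text built from sentences quoted in the review.
sample = (
    "Okinawa's experience offers a sober reminder of the horrors of war, "
    "something, sadly, that mankind constantly seems to forget. "
    "going to the synagogue on a weekly basis is something sadly, that "
    "most Jews do not do. "
    "This is the horse frequently that raced."
)

print(find_adverb_before_that(sample))  # ['sadly', 'sadly']
```

A real check would of course run over a large corpus rather than a
handful of sentences, and would need a tagger to identify adverbs
reliably.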

Chapter 4 describes contemporary models of sentence
processing. These are numerous and subject to constant
revision. Under the circumstances, the authors cover a
fairly wide and representative sample. They divide the
models into two groups: Structural models, which employ
phrase structure rules, and Connectionist models, which
are statistical. There is also a short section on hybrid
models.

The authors present evidence for statistical influences
on sentence processing but argue that Connectionist
models are limited in a fundamental way. These models
excel at pattern completion, but only by dint of having
learned a limited set of patterns against which to
compare the input. Sentences, the authors argue, are
infinite in number and cannot be handled exclusively in
terms of pattern completion. They therefore argue that
Connectionist models must be supplemented by a grammar.

The authors' critique of Connectionism could have been
sharpened by reference to previous work on this topic. A
succinct empirical argument for integrating structural
and statistical aspects was proposed by Goldman-Eisler
(1968) along the lines of a 19th century distinction made
by Hughlings Jackson between superior (novel) and inferior
(learned) speech. Ferdinand de Saussure (1916) also
integrates statistics and structure in his discussion of
the way paradigmatic series emerge from linguistic
experience. More recently, Miikkulainen (1996) and Marcus
(2001, appearing presumably too late for the authors to
reference), among others, have argued that hybrid
solutions can overcome certain non-trivial limitations in
the ability of Connectionist models to generalise to
novel inputs. Reference by the authors to this previous
work could have presented readers with alternative
possibilities of integration against which to assess the
authors' own model.

Chapter 5 describes the authors' hybrid model of the
listener. It combines dual processing with analysis-by-
synthesis, whereby approximations to the input are
successively synthesised and compared against the input
until a suitable match is found. The model operates as
follows. An input string is first placed in a temporary
store and then subjected to preliminary analysis by the
Connectionist component of the model. The preliminary
analysis determines major phrases and their conceptual
relationships. The output from this analysis constitutes
a 'numeration' which initialises the Minimalist
component. This component now carries out a standard
syntactic derivation and outputs logical and
phonological representations of the input. The
phonological representation is then compared against the
original input string. If the match is good, processing
is complete and the listener hears the two
representations played simultaneously, otherwise the
whole process starts again and a different candidate
representation is generated.
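
The cycle just described can be pictured as a simple loop. The
Python below is an illustration only, not the authors'
implementation: the names analysis_by_synthesis, toy_propose and
toy_derive are hypothetical, the Connectionist and Minimalist
components are replaced by trivial stand-ins, 'matching' is reduced
to string equality, and the max_attempts cut-off is an assumption
added so that the sketch terminates.

```python
from typing import Callable, Iterable, Optional, Tuple

Candidate = Tuple[str, ...]  # hypothetical stand-in for a 'numeration'

def analysis_by_synthesis(
    input_string: str,
    propose: Callable[[str], Iterable[Candidate]],
    derive: Callable[[Candidate], str],
    max_attempts: int = 10,  # assumption: not specified in the book
) -> Optional[Candidate]:
    """Propose candidates, synthesise each, and compare with the input."""
    store = input_string  # temporary store holding the input string
    for attempt, candidate in enumerate(propose(store)):
        if attempt >= max_attempts:
            break
        # Stand-in for the derivation's phonological output.
        synthesised = derive(candidate)
        if synthesised == store:  # good match: processing is complete
            return candidate
    return None  # no match found within the cut-off

def toy_propose(stored: str) -> Iterable[Candidate]:
    """Toy preliminary analysis: a wrong candidate first, then the right one."""
    words = tuple(stored.split())
    yield tuple(reversed(words))
    yield words

def toy_derive(candidate: Candidate) -> str:
    """Toy derivation: spell the candidate out as a string."""
    return " ".join(candidate)

print(analysis_by_synthesis("the horse raced", toy_propose, toy_derive))
```

In this toy run the first candidate fails the comparison and the
second succeeds, so the loop is traversed twice before a result is
returned.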

The piquancy of the irony in this proposal will not be
lost on those familiar with the history. Generativism was
born out of a vigorous rejection of Associationism and
the notion of linguistic habit. It takes an admirable
degree of integrity to admit that this rejection was
precipitous and that Generativism should have sought to
complement rather than to replace Associationism. Thomas
Bever, in fact, made this admission as early as 1970. The
proposed model is a courageous and ingenious attempt to
integrate the current forms of Generativism and
Associationism into one system. Chapters 6-8 of the book
indicate that there is actually a considerable amount of
experimental data which is broadly consistent with the
hybrid model. However, the authors' desire for empirical
validation is detrimental to the detailed elucidation of
the mechanics of the model. Several important issues are
not addressed, each of which can seriously undermine the
psychological plausibility of the model.

One set of issues relates to the memory requirements of
the model. It appears that temporary storage is required
for two complete phonological representations of the same
sentence. These representations must also presumably be
stored in two separate buffers, otherwise they would
interfere with each other. However, the authors do not
cite independent psychological evidence for a) the
requisite short-term memory (stm) capacity; b) the
maintenance in stm of two distinct phonological
representations of the same sentence or c) the existence
of two separate phonological buffers. Further, if these
buffers have a limited capacity, as one would expect, it
is hard to predict how the model would cope with buffer
overflow caused by excessive sentence length.

A second set of issues relates to the sequencing of
processing events. When precisely does the listener hear
the sentence? Only when a suitable match is found? What
mechanism prevents the listener from hearing the sentence
each time a comparison is made? And what if no match is
found? Does the listener then not hear the sentence? From
the information given in the book, the model seems to
predict a delay in hearing the sentence, spanning the
time the input string initially enters the temporary
store to the time a suitable match is found. The model
also seems to predict variations in this delay, depending
on the number of times candidate representations are
generated before a satisfactory match is found. Do the
authors therefore predict that some sentences are heard
systematically later than others relative to onset time?

There is a third set of problems concerning the memory
requirements of the model in relation to the sequencing
of processing events. If several approximations of the
input are generated, a record must be kept concerning
failed analyses. Otherwise the system runs the occasional
risk of looping infinitely through the same wrong
analyses. In the flow diagram on page 163, the authors
present a box which indicates that data from the
preceding analysis feeds into the preliminary analysis.
Does this data contain a record of previous analyses? If
so, what is the nature of the record? A key argument made
by the authors is that there are an infinite number of
possible sentences. This argument would seem to preclude
the use of a strategy in which a token of some kind is
stored in order to record each specific structure which
has been proposed and rejected. Such a strategy depends
on the forbidden assumption that it is possible to
enumerate all possible structures. If sentence tokens are
out of the question, does the record consist of entire
sentence structures? If so, the storage requirements are
considerable. And if excessive on-line memory demands
trigger an stm purge, records of previous analyses would
be lost and the system would presumably get locked into a
loop once again, repeating past errors indefinitely.
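
The looping problem and its memory cost can be made concrete with a
small sketch. The code is hypothetical throughout: the function
reanalyse, the max_steps limit and the use of a Python set as the
'record' are illustrative assumptions, not part of the authors'
model. The sketch stores each rejected candidate in full, which is
the costly option discussed above.

```python
from itertools import cycle, islice

def reanalyse(input_string, candidates, derive, max_steps=20):
    """Try candidates in turn, keeping a record of rejected analyses.

    Returns (accepted candidate or None, steps taken, size of record).
    """
    rejected = set()  # record of failed analyses, stored in full
    steps = 0
    for candidate in islice(candidates, max_steps):
        steps += 1
        if candidate in rejected:
            continue  # already tried and failed: do not re-derive it
        if derive(candidate) == input_string:
            return candidate, steps, len(rejected)
        rejected.add(candidate)  # the record grows with every failure
    return None, steps, len(rejected)

def join(candidate):
    """Toy derivation: spell the candidate out as a string."""
    return " ".join(candidate)

# A proposer that repeats a wrong analysis before offering the right one.
stream = iter([("raced", "horse"), ("raced", "horse"), ("horse", "raced")])
print(reanalyse("horse raced", stream, join))

# A proposer that only ever offers the same wrong analysis.
print(reanalyse("horse raced", cycle([("raced", "horse")]), join, max_steps=5))
```

Clearing the rejected set part-way through a run corresponds to the
purge scenario described above: the sketch would then re-derive
candidates it had already discarded.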

A fourth set of problems has to do with the fragmentary
nature of conversational language in relation to the
authors' claim that "The sentence level is the
fundamental object of language perception ..." (p. 5).
Consider the following conversation heard recently on
British radio:

Interviewer: are you on time?
Interviewee: ish
Interviewer: (laughs) are you on budget?
Interviewee: ish
Interviewer: (laughs)

This example illustrates a difficult problem for the
authors' model. The problem is that people communicate
effortlessly without using complete sentences. It is not
at all clear, given a sentence fragment as input, what
the Connectionist component of the model would output.
The Minimalist component seems to have two options. Given
an incomplete numeration, it could 'crash'. It is not
clear from the book what would happen to the parse then
or what the behavioural correlates of 'crashing' might
be. The other option is to accept the incomplete
numeration, generate a complete tree structure and output
a whole sentence. The difficulty is that it would not
then be possible to phonologically match a complete
sentence with a fragmentary input string.

And, in any case, it must be asked whether Minimalist
principles are so subtle that they can convert the
adjectival suffix -ish into an adverb meaning something
like, 'approximately' and then, using material from the
discourse context, build a tree structure, complete with
IP node and all the rest, to derive sentences like 'I am
approximately on time' and 'I am approximately on budget'
etc. It seems more plausible to regard the creative use
of -ish here as the product of fluid verbal intelligence
rather than something which a grammar can reasonably be
expected to predict.

A fifth set of problems concerns the relationship between
the Connectionist and Minimalist components. The entire
argument of the book seems to hinge on the ability of the
Minimalist component to inform the Connectionist
component in some way. However, it is not obvious from
the flow diagram on page 163 precisely how the Minimalist
component informs the Connectionist component. There is a
box which indicates that results from previous analyses
feed into the preliminary analysis, but it is not made
clear just what sort of information this box contains.
The two components also use different representational
formats: distributed representations versus symbolic
representations. How is one format translated into the
other? If translation between formats is possible, so
that the Minimalist component can feed into the
Connectionist component, how does that tally with the
authors' argument (p. 147) that Connectionist models
cannot represent detailed syntactic structure?

These questions neutralise the impact of the experimental
evidence presented in favour of the model in Chapters 6-
8. Some of this evidence is interesting in its own right.
For instance, Chapter 7 suggests that numerous findings
in sentence processing can be reduced to the operation of
a prefabricated N(oun)V(erb)N(oun) sentence schema. There
also seem to be cases where subjects compute meaning
first and syntax later, as predicted by the model
(Chapter 6). However, there could well be alternative
explanations for the data. It might be that
Connectionist-style processing interacts with conscious
verbal problem solving instead of a Minimalist component.

The main strength of the book lies in its wide coverage
of psycholinguistic data and models and in its search for
coherence. In this search, the authors find, contrary to
the spirit of Generativism, an important psychological
role for statistical influences such as the use of
prefabricated grammatical structures. The Connectionist
component of their model is therefore justified. However,
the case for the Minimalist component is weak. The chain
of argumentation is opaque around the links between the
Connectionist and Minimalist components. More extensive
discussion of previous work on the integration of
statistical and structural aspects would have been
helpful. The mechanics of the authors' own model need to
be specified in greater detail to exclude some
implausible consequences of the current formulation.

I am a Research Fellow in the School of Education at The
University of Reading, UK, where I am applying
quantitative techniques to a corpus of children's
writing in order to discover trends in native language
development during the school years. My doctoral thesis
evaluated Symbolic and Connectionist accounts of
individual differences in sentence processing in light of
my experimental findings. The findings indicate, contrary
to the prevailing assumption, that adult speakers are not
necessarily fully productive in the syntax of their
native language. There appears to be a schema-rule
continuum wherein individuals at the schema end of the
continuum are less syntactically productive than
individuals at the rule end (Chipere, in press). These
findings require the integration of elements from
Symbolic and Connectionist viewpoints. The theoretical
and empirical basis for an alternative approach to
integration is set out in my forthcoming book:
"Understanding Complex Sentences: Variations in Native
Speaker Competence" to be published by Palgrave.


