Review of Working Memory in Sentence Comprehension
Date: Thu, 21 Apr 2005 17:05:46 -0700
From: T. Florian Jaeger
Subject: Working Memory in Sentence Comprehension: Processing Hindi Center
AUTHOR: Vasishth, Shravan
TITLE: Working Memory in Sentence Comprehension
SUBTITLE: Processing Hindi Center Embeddings
SERIES: Outstanding Dissertations in Linguistics
T. Florian Jaeger, Linguistics Department, Stanford University
Self-center-embedding constructions (henceforth SCEs) as in (1) have
received an enormous amount of attention in the psycholinguistic
literature because of the difficulty they impose on the human sentence
processor. They have, however, only been studied for a small group of
languages (mostly Dutch, English, German, Japanese, and Korean).
(1) Don't you find [that, with the right intonation, sentences [that
people [that somebody has introduced to you] produce] are relatively easy
Shravan Vasishth's Working Memory in Sentence Comprehension (henceforth
WM) presents a detailed and carefully controlled psycholinguistic
investigation of SCEs in Hindi. Three models of sentence comprehension
(Gibson, 2000; Hawkins, 1994; Lewis, 1998) are evaluated. Weighing
evidence from seven experiments, Vasishth concludes that none of the
models is sufficient to capture the whole range of facts observed for
Hindi (this extends to Hawkins, 2004).
Vasishth proposes a new model based on the Retrieval Interference Theory
(Lewis, 1998; Lewis, 2001). Comprehenders generate a set of hypotheses
(about possible parses) consistent with what they have heard so far.
Whenever a word is processed, the associated processing complexity is
derived from both memory-related processes (the retrieval/construction of
elements from/in working memory) and the consistency of the word currently
being processed with the set of established hypotheses.
WM is of considerable interest to psycholinguists working on sentence
processing, as well as a must for researchers interested in processing-
based constraints on cross-linguistic variation (cf. Hawkins, 2004). In
the course of this review, I discuss several findings relevant to ongoing
typological research on, e.g., (a) word order freedom in South Asian
languages (and the extent to which it depends on discourse/information
structure constraints); (b) case OCP effects (Obligatory Contour
Principle, Leben, 1973); and (c) the cognitive basis of Differential Case
Marking (Aissen, 2003). The data presented in WM also pertain to the
ongoing debate about the extent to which comprehension complexity is reducible to
the predictability of a word given the information preceding it (e.g.
Gibson, submitted; Hale, 2001; Levy, 2005).
WM has 262 pages in seven chapters, an appendix with all experimental
stimuli (in Devanagari script only), a brief index, and a list of
references, as well as 17 additional pages containing a table of contents,
annotated lists of tables, figures, and algorithms, and a brief preface.
I first summarize each of WM's seven chapters (Section III of this review)
and then present an overall evaluation (Section IV) and my recommendation.
To make this review accessible to an audience beyond psycholinguists, I
have attempted to couch the issues addressed in WM in general terms,
providing additional background where necessary. Section VI provides
information about the reviewer (i.e. me).
CHAPTER BY CHAPTER SUMMARY
In this section, I give a brief summary of each chapter. The first chapter
(29 pp.) contains an introduction to the linguistic and psycholinguistic
background assumed in WM. Chapter 2 (11 pp.) introduces and summarizes the
predictions of three theories of sentence comprehension that WM evaluates.
Chapters 3 to 5 (47, 43, and 11 pp.) describe seven experiments comparing
the theories introduced in Chapter 2. Chapter 6 (17 pp.) takes the reader
on a brief detour related to positional and word length effects on reading
time relevant to several of the experiments. Finally, Chapter 7 (36 pp.)
constitutes the theoretical heart of WM, presenting Vasishth's theory of
sentence comprehension along with a discussion of its empirical coverage
(based on data from Dutch, German, Hindi, and Japanese).
Chapter 1, the introduction, contains three main parts. The first part
provides the reader with a background on the role of working memory in the
processing of SCEs. While all working memory-based processing theories
agree that increased processing load of SCEs is due to increased demands
on working memory for those constructions, the accounts differ in
precisely what is assumed to affect the memory-load and when (during
processing) the effects of additional working memory-load surface. These
differences are discussed in the summary of Chapter 2.
The second part of Chapter 1 summarizes those properties of Hindi that
pertain to the processing of SCEs. The SCEs investigated in WM are control
constructions (Bickel and Yadava, 2000):
(2) Siitaa-ne Hari-ko [PRO kitaab khariid-neko] kahaa.
Sita-ERG Hari-DAT PRO book buy-INF told
'Sita told Hari to buy a book.'
As seen in (2), Hindi is a dependent-marking, head-final language. Two
aspects of Hindi SCEs are of crucial importance for the current purpose.
First, Vasishth claims that both the indirect object (Hari-ko) and the
direct object (kitaab) can be fronted without ''rendering the sentence
ungrammatical'' (p. 10):
(3a) Hari-ko Siitaa-ne [PRO kitaab khariid-neko] kahaa.
Hari-DAT Sita-ERG PRO book buy-INF told
(3b) Kitaab Siitaa-ne Hari-ko [PRO khariid-neko] kahaa.
book Sita-ERG Hari-DAT PRO buy-INF told
'Sita told Hari to buy a book.'
Crucially, the experiments presented in WM investigate these fronting
constructions out of context, which raises the question to what extent
they are subject to discourse or information structure-based constraints
(I return to this issue in Section IV). The second aspect of Hindi
relevant here is Differential Object Marking (Aissen, 2003): While non-
prototypical direct objects (e.g. definite human direct objects) must be
case-marked in Hindi, the most prototypical direct objects (unspecific
indefinite inanimates) must not be case-marked. Some types of direct
objects (e.g. indefinite inanimates such as 'kitaab' - book) can occur with or
without a case marker:
(4) Siitaa-ne Hari-ko [kitaab(-ko) khariid-neko] kahaa.
Sita-ERG Hari-DAT book-ACC buy-INF told
'Sita told Hari to buy a book.'
In the final part of Chapter 1, Vasishth argues that (a) direct objects
with case-marking are specific and conversationally imply definiteness,
and (b) direct objects without case-marking are real indefinites. The
comparison of two of the theories discussed in the next chapter (Gibson,
2000 vs. Lewis, 1998) crucially relies on these two assumptions.
Chapter 2 summarizes three models of sentence processing: Hawkins' Early
Immediate Constituency (Hawkins, 1994, henceforth EIC), Gibson's
Dependency-Locality Theory (Gibson, 1998; Gibson, 2000, henceforth DLT),
and two variants of Lewis' Retrieval Interference Theory (Lewis, 1998,
henceforth RIT). Since a detailed discussion of these theories is beyond
the scope of this review, I limit myself to a summary of the crucial
differences. Both Gibson's DLT and Hawkins' EIC (as well as the revised
theory in Hawkins, 2004) predict that processing cost increases the more
material intervenes between a dependent (e.g. an argument) and the point
of its integration (the head of the dependent). This prediction is based
on the assumption that the working memory-load at the point of integration
is higher the more complex the information that intervenes between the
dependent and its head. Thus both of them would predict that (5b) is
harder than (5a) since the distance between the direct object
argument 'kitaab' (book) and the verb 'khariid-neko' (to buy) is larger in
(5b):
(5a) Siitaa-ne Hari-ko [kitaab-ko khariid-neko] kahaa.
Sita-ERG Hari-DAT book-ACC buy-INF told
(5b) Kitaab-ko Siitaa-ne Hari-ko [khariid-neko] kahaa.
book-ACC Sita-ERG Hari-DAT buy-INF told
'Sita told Hari to buy a book.'
Lewis' RIT on the other hand predicts (5a) to be harder to process than
(5b). This prediction follows from the assumption that similar items
(where similarity in this case is due to surface identical case-marking)
interfere in working memory at the point of retrieval (the verb). RIT
predicts that this difficulty is amplified if the identical items are
adjacent (e.g. the two adjacent '-ko' marked phrases in (5a)).
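The contrast between the two families of predictions can be made concrete with a small Python sketch. It is a toy illustration under simplifying assumptions that are not part of any of the three theories: distance is counted in intervening words (DLT itself counts discourse referents), similarity is reduced to adjacent phrases sharing the surface marker -ko, and the function names are hypothetical.

```python
# Toy illustration only: the counting scheme is a simplification assumed
# for this sketch, not the actual metric of DLT, EIC, or RIT.

def intervening_items(tokens, dependent, head):
    """Number of tokens strictly between a dependent and its head."""
    return abs(tokens.index(head) - tokens.index(dependent)) - 1


def adjacent_same_marker_pairs(tokens, marker="-ko"):
    """Number of adjacent token pairs that both end in the case marker."""
    return sum(
        1
        for left, right in zip(tokens, tokens[1:])
        if left.endswith(marker) and right.endswith(marker)
    )


# Examples (5a) and (5b), lower-cased and with brackets removed.
canonical = "siitaa-ne hari-ko kitaab-ko khariid-neko kahaa".split()  # (5a)
fronted = "kitaab-ko siitaa-ne hari-ko khariid-neko kahaa".split()    # (5b)

for label, sentence in (("5a", canonical), ("5b", fronted)):
    distance = intervening_items(sentence, "kitaab-ko", "khariid-neko")
    similar = adjacent_same_marker_pairs(sentence)
    print(label, "distance:", distance, "adjacent -ko pairs:", similar)
```

On these two word strings the distance proxy is larger for (5b), while the adjacency proxy is larger for (5a), which mirrors the opposite rankings described above.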
WM exploits this property of RIT to further distinguish RIT empirically
from DLT. DLT predicts that the processing cost at the point of
integration is higher (a) the more discourse referents intervene between
the dependent and the head (see above) and (b) the less accessible these
interveners are (Gibson, 2000; Warren and Gibson, 2002; note that this is
not quite how Vasishth summarizes DLT, a discrepancy I address in
Section IV). This predicts that definite interveners (e.g. a -ko marked
direct object) cause less processing cost than indefinite interveners
(e.g. a direct object without -ko marking). RIT on the other hand does not
attribute any processing cost to accessibility of referents. Instead, as
stated above, increased processing cost is predicted for cases with two or
more -ko marked objects. Thus DLT predicts that (4) with a -ko marked
object incurs less processing cost on the verb than (4) with an
indefinite (not -ko marked) object. RIT makes the opposite prediction.
Chapter 3 presents three experiments testing the effect of identical case
marking. The first two experiments (acceptability elicitation and moving
window self-paced reading) compare the effect of -ko marking in SCEs with
either one level of embedding, as in (4) above, or two levels of
embedding, as in (6).
(6) Siitaa-ne Hari-ko [Ravi-ko [kitaab(-ko) khariid-neko] bol-neko] kahaa.
Sita-ERG Hari-DAT Ravi-DAT book-ACC buy-INF tell-INF told
'Sita told Hari to tell Ravi to buy a book.'
Both experiments reveal main effects of nesting (double-nested SCEs are
harder than single-nested SCEs) and of case-marking: Crucially, -ko marked
objects were harder to process (both the object itself and the verb
integrating it) than non -ko marked objects. Since -ko marking in
Experiments 1 and 2 always results in two adjacent -ko marked phrases,
these results support Lewis' RIT over DLT and EIC. Recall that, contrary
to the facts in Hindi, DLT predicts that definite interveners will result
in less processing load at the integrating verb. Vasishth points out that
all evidence provided for the validity of this claim (Warren and Gibson,
2002) comes from intervening subjects, whereas all experiments in WM
contain intervening objects. The observed difference of definiteness
effects on processing complexity could thus be related to (violations of)
expectations about prototypical subjects and objects (although Vasishth
does not relate this intriguing evidence to the research on Differential
Case Marking, his findings provide experimental support for accounts
describing Differential Case Marking in terms of harmonic alignment of
grammatical functions and markedness hierarchies).
Similar results are found in the third experiment (self-paced reading),
which investigates more complex structures. Interestingly, -ko marking in
the absence of another -ko marker also leads to a (small) increase in
processing load, which is not predicted by any of the theories.
Furthermore, adjacency of two ablatives (marked by -se) does not lead to
increased processing load. Since several other comparisons remain
inconclusive (e.g. adjacent -ko phrases are not harder to process than non-
adjacent ones contrary to RIT), Vasishth tentatively concludes that
evidence from processing associated with case-marking favors RIT over DLT
and EIC but provides ''only limited support for Lewis' similarity-based
interference hypothesis [i.e. RIT]'' (p. 102; for an overview of the
results, see p. 100).
Chapter 4 presents three experiments investigating the effect of object-
fronting. One self-paced reading experiment tests the effect of direct
object fronting, and another self-paced reading experiment investigates
indirect object fronting. Since both experiments yield mostly identical
results, I describe only the direct object-fronting cases, illustrated
above in (5b) vs. (5a).
The experiment provides support for the sensitivity of DLT and EIC to
dependency length: Reading times on the integrating verb ('khariid-neko' -
to buy, in (5)) were significantly longer when the direct object was
fronted (i.e. when more material intervenes between the dependent and its
head). RIT cannot account for this effect without additional assumptions
(a potential revision of RIT is discussed in Chapter 5, pp. 156).
Potential but inconclusive support for RIT comes from the effect of -ko
marking (as in the first three experiments): for objects that occur in the
canonical position, -ko marking results in a slowdown. Fronted objects
show no effects of case-marking. Under the assumption that adjacent -ko
marked NPs interfere more than non-adjacent ones (for which no evidence
was found in the first three experiments), this effect is compatible with
RIT (and not predicted by DLT and EIC) since the canonical word order
results in adjacent -ko marked phrases, cf. (5a) vs. (5b).
Chapter 5 contains the final experiment, which provides evidence
explicitly arguing against EIC and RIT, and not predicted by DLT (but see
Gibson, submitted). The experiment shows a decrease in reading times on
the verb ('khariid-neko' - to buy) if an adverb intervenes before the most
deeply embedded verb, as in (7) but not (6):
(7) Siitaa-ne Hari-ko [Ravi-ko [kitaab-ko jitnee-jaldi-ho-sake khariid-neko]
bol-neko] kahaa.
Sita-ERG Hari-DAT Ravi-DAT book-ACC as-soon-as-possible buy-INF
tell-INF told
'Sita told Hari to tell Ravi to buy a book as soon as possible.'
Chapter 6 contains a methodological discussion of the most adequate way to
analyze reading time effects (the dependent measure in several of the
experiments presented above). Although the evidence presented is of
interest to researchers concerned with positional and word length effects
on reading times, it is neither particularly strong (it mostly stems from
null effects), nor suited for a review intended for a broader audience.
Importantly, Vasishth concludes that the effect observed in Chapter 5 is
not due to a positional confound (the verb is read later in those examples
that contain an intervening adverb and such positional differences have
been argued elsewhere to result in a speed-up).
Chapter 7 closes the discussion of the experiments with a concise
evaluation of each theory's predictions (see the table on p. 189; overall,
DLT fares better than EIC and RIT) and introduces a new model of sentence
comprehension, termed the Abductive Inference Model (henceforth AIM). Like
the revised Retrieval Interference Theory (Lewis, 2001), and in contrast
to DLT, EIC, and the original RIT (Gibson, 2000; Hawkins, 1994; Lewis,
1998), AIM combines memory-based principles with the construction of
expectations about the structure that has yet to be processed given the
information that has already been encountered.
AIM uses abductive reasoning to generate sets of hypotheses about possible
parses given the information encountered so far. Crucially, only minimally
consistent hypotheses are entertained. That is, AIM assumes that, out of
all parses consistent with the current input, comprehenders only consider
the minimal ones. For example, comprehenders do not consider any parses
that would require more NP arguments to be introduced than minimally
necessary to finish the sentence (e.g. in German, if the first word is a
nominative case-marked NP, comprehenders would not consider that this
could be followed by a transitive verb; instead only an intransitive verb
is considered at this point). In other words, comprehenders construct
as 'cheap' a hypothesis space as possible given the available input (this
bears resemblance to Frazier's (1987) Active Filler Strategy). AIM
calculates processing difficulty at each encountered word as the sum of
three main factors: (a) the construction of referents for each NP
encountered; (b) the number of predicates expected given current hypotheses
about possible parses; (c) the number of available minimally consistent
hypotheses. A fourth factor, termed Mismatch Cost, can add to the overall
processing cost: whenever a verb is processed, the processing load is
increased for each failed attempt to match the verb with one of the
hypothesized predicates (see (b) above). The verb-predicate matching
algorithm is assumed to proceed from the outermost predicate inwards.
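The additive structure of this cost calculation can be sketched in Python. This is a hypothetical illustration, not Vasishth's implementation: the review does not give AIM's actual cost values, so unit weights for every factor, the function names, and the predicate labels ("told", "tell", "buy") are all assumptions made for the sketch.

```python
# Toy illustration only: unit weights and the data layout are assumptions
# made for this sketch, not values taken from WM.

def mismatch_cost(expected_predicates, verb_slot):
    """Count failed matching attempts before the verb finds its slot.

    expected_predicates lists the hypothesized predicate slots from the
    outermost to the innermost; matching tries them in that order.
    """
    failures = 0
    for slot in expected_predicates:
        if slot == verb_slot:
            return failures
        failures += 1
    raise ValueError("verb matches none of the hypothesized predicates")


def word_cost(new_referents, expected_predicates, n_hypotheses, verb_slot=None):
    """Sum the three main factors, plus Mismatch Cost if the word is a verb."""
    cost = new_referents + len(expected_predicates) + n_hypotheses
    if verb_slot is not None:
        cost += mismatch_cost(expected_predicates, verb_slot)
    return cost


# Double embedding as in (6): three predicates are expected, and the first
# verb encountered is the innermost one, so two matching attempts fail.
slots = ["told", "tell", "buy"]
print(mismatch_cost(slots, "buy"))
print(word_cost(new_referents=0, expected_predicates=slots,
                n_hypotheses=1, verb_slot="buy"))
```

Under these assumptions, the innermost verb of a head-final embedding incurs one failed attempt per enclosing predicate, because matching starts from the outermost slot.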
In the final part of Chapter 7, Vasishth discusses evidence in favor of
AIM coming from Dutch, German, Japanese, and Hindi.
WM is the result of an impressive research project. With almost no
earlier processing literature on Hindi available, Vasishth presents
thoroughly controlled experimental studies that yield intriguing insights
into the structure of Hindi as well as the processing of dependencies in
head-final, dependent-marking languages. Several of the experimental
findings pertain to important questions in the processing literature
(e.g., the nature of working memory effects in sentence processing;
predictability vs. locality effects of the distance between a head and its
dependents). The argumentation and presentation of the results are clear
and well-structured throughout WM. Thanks to this clarity, the book should
be very accessible even to readers so far unfamiliar with the literature
on SCEs. Below I briefly discuss three issues raised in WM that I deem of
particular interest to a broad community of researchers.
First, the effect of multiple -ko marking and the lack of such an effect
for -se relates to the research on case OCP effects in Hindi. Moreover,
the theoretical motivation of similarity-based interference in working
memory can be seen as providing the motivation for the case marking OCP
effects discussed in the linguistic literature. It is somewhat unfortunate
that this issue is not raised in WM, especially since Mohanan (1994:
208) presents a potentially revealing example: (8) is supposedly more
acceptable with the additional -ko marked intervener raat-ko:
(8) Ramm-ko (raat-ko) bacco-ko samhaalnaa paadaa.
Ram-DAT night-at children-ACC take-care-INF fall-PERF
'Ram had to take care of the children at night.'
This lends support to Vasishth's observation that RIT may be too
restrictive if only form-similarity is considered (in which case (8)
should be harder with the additional -ko intervener). Apparently, raat-ko
is not similar enough to cause interference (due to its different semantic
and syntactic status).
Second, the object-fronting effect is relevant for ongoing research on
word order freedom in Hindi and other South Asian languages (see several
articles in Butt et al., 1994). Interestingly, Vasishth cites several
follow-up studies (conducted by him) showing that fronting effects
disappear for indirect objects but not for direct objects once a proper
discourse is provided. This may be taken to indicate that indirect object
fronting is subject to discourse/information structure constraints while
direct object fronting is not.
Third, the speed-up on the verb observed in the final experiment pertains to
predictability-based models of processing such as Hale (2001) (and more
recently Gibson, submitted; Levy, 2005). The effect adds to similar
evidence coming from German (Konieczny, 2000) and Japanese and reiterates
the necessity of a predictability-based component in theories of sentence
processing. While Vasishth (ibid) fairly comments that precise predictions
are hard to derive for such accounts given the lack of large parsed
corpora of Hindi, it seems rather clear that predictability-based accounts
would, at least for some cases, make predictions similar to those of Vasishth's
AIM. WM could therefore have benefited from a more detailed discussion of
the role predictability plays in language processing (e.g. Hale's 2001
model is only mentioned in passing, p. 221).
Given the task it takes on, it is unsurprising that WM also has some minor
shortcomings. Here, I will briefly mention one: Vasishth presents at times
a slightly distorted version of Gibson's DLT (this is pervasive throughout
the book and potentially confusing). Contrary to Vasishth's claims, DLT
considers indefinite NPs (in this case bare direct objects) to only cause
a higher processing load on the verb if they intervene between the verb
and its dependent. Definiteness of the dependent itself is not predicted
to matter (Gibson, 2000, and personal communication).
Researchers working on aspects of Hindi morphology and/or syntax may find
some of the assumptions Vasishth makes in the introduction problematic,
but I strongly recommend approaching WM with an open mind, keeping in mind
that Vasishth accomplishes what still relatively few even approach:
typologically interesting, experimentally well-controlled work on sentence
processing. In sum, I highly recommend WM. WM provides crucial insights
into the nature of the human language processor that cannot be obtained
from the study of English alone.
AISSEN, JUDITH. 2003. Differential Object Marking: Iconicity vs. Economy.
Natural Language and Linguistic Theory, 21.435-83.
BICKEL, B. and YADAVA, Y. P. 2000. A fresh look at grammatical relations
in Indo-Aryan. Lingua, 110.343-73.
BUTT, MIRIAM; KING, TRACY HOLLOWAY and RAMCHAND, GILLIAN (eds.) 1994.
Theoretical perspectives on word order in South Asian languages. vol. 50.
CSLI Lecture Notes. Stanford: CSLI.
FRAZIER, LYNN. 1987. Syntactic processing: Evidence from Dutch. Natural
Language and Linguistic Theory, 5.519-60.
GIBSON, EDWARD. 1998. Linguistic complexity: Locality of syntactic
dependencies. Cognition, 68.1-76.
GIBSON, EDWARD. 2000. The Dependency Locality theory: A Distance-based
theory of linguistic complexity, 95-126.
GIBSON, EDWARD. submitted. The interaction of top-down and bottom-up
statistics in syntactic ambiguity resolution.
HALE, JOHN. 2001. A Probabilistic Earley Parser as a Psycholinguistic
Model. Paper presented at Second Meeting of the North American Chapter of
the Association for Computational Linguistics.
HAWKINS, J. A. 1994. A Performance Theory of Order and Constituency.
Cambridge: Cambridge University Press.
HAWKINS, J. A. 2004. Efficiency and Complexity in Grammars. Oxford: Oxford
University Press.
KONIECZNY, LARS. 2000. Locality and parsing complexity. Journal of
Psycholinguistic Research, 29.627-45.
LEBEN, WILL. 1973. Suprasegmental Phonology, Linguistics, MIT.
LEVY, ROGER. 2005. Processing difficulty in verb-final clauses matches
syntactic expectations. Annual meeting of the Linguistic Society of America
LEWIS, RICHARD L. 1998. Interference in Working Memory: Retroactive and
proactive interference in parsing. Paper presented at CUNY Sentence
LEWIS, RICHARD L. 2001. Language. Berkeley Springs, West Virginia
MOHANAN, TARA. 1994. Case OCP: A Constraint on Word Order in Hindi.
Theoretical Perspectives on Word Order in South Asian Languages, ed. by
Miriam Butt, Tracy Holloway King and Gillian Ramchand. Stanford: CSLI.
WARREN, TESSA and GIBSON, EDWARD. 2002. The influence of referential
processing on sentence complexity. Cognition, 85.79-112.
ABOUT THE REVIEWER
Florian Jaeger is a Ph.D. student at the Linguistics Department, Stanford
University supposedly in the process of writing his thesis on production-
driven variation. His current research interests include English prosody
(phrasing, as well as post-nuclear prominences), and processing-based
models of linguistic variation. This includes work (more often than not
with hordes of other researchers) on wh-phrase ordering (and Superiority),
work on relativizer and complementizer omission, work on choice of
linguistic expressions (the distribution of anaphors vs. pronouns), as
well as work on constructional choice (existential vs. canonical subject