Review of Movement in Language
Date: Mon, 8 Sep 2003 23:33:47 +0200
From: Gisbert Fanselow
Subject: Movement in Language: Interactions and Architectures
Richards, Norvin (2001) Movement in Language: Interactions and
Architectures, Oxford University Press.
Gisbert Fanselow, University of Potsdam
The insights of Chomsky (1964) and, in particular, Ross (1967) led to
the establishment of a new research topic in syntax: constraints on
movement. This new line of research generated an impressive number of
empirical insights, and culminated in attempts, such as Chomsky (1981),
Chomsky (1986), or Baker (1988), to find one or two simple principles
from which all constraints on movement can be derived.
However, empirical data that might prove fatal for a purely syntactic
account of the restrictions on movement already came in around 1980.
Huang (1982) observed that some of the local domains that restrict
movement in English also constrain scope assignment in Chinese
questions, in spite of the fact that question words do not undergo
(visible/audible) wh-movement in this language but may stay in situ.
Earlier, Erteschik (1973) had made the discovery that the degree of
acceptability of extractions from certain domains is a function of
information structure. These findings did not lead to a dismissal of
syntax-based accounts of constraints on movement, however. Rather,
models were developed in which the assignment of semantic scope to
operators is conceived of as the construction of a formal level of
representation (viz., Logical Form, LF), which involves essentially the
same type of operations that we find in visible syntax, including
movement (see, e.g., Huang (1982)). According to the GB-model (Chomsky
(1981)), the ultimate target of a syntactic derivation is Logical Form.
Much of the derivation of LF consists of a sequence of movement
operations. During the derivation, there is a point (identified as a
level of representation, S-structure, in early approaches, and simply
called Spellout nowadays) at which the phonological and the syntactic
aspects of the derivation split up. Movement taking place before
Spellout has a phonological effect (visible displacement, overt
movement); movement taking place after Spellout (covert movement) has
no phonological effect: it is invisible/inaudible.
1. Overview of Movement in Language
This classical view of the distinction between covert and overt
movement prevailed through the eighties, but in the nineties, its
assumptions were questioned: is the difference between overt and covert
movement really expressible in terms of a Spellout point in the
derivation, or does it have to be specified independently, so that
covert operations may precede overt ones? Are overt and covert movement
really identical? Norvin Richards has written his book Movement in
Language: Interactions and Architectures (MiL) as a contribution to
this discussion (chapter 6), and he argues for a neo-classical concept
of movement, in which the difference between overt and covert
operations is (in principle) one of the timing relative to Spellout.
MiL claims that the neoclassical view is supported by the existence of
similarities among languages that only have overt (Bulgarian) or covert
(Chinese, Japanese) wh-movement, respectively, as opposed to languages
such as English that employ both types of movement. Models in which the
difference between overt and covert movement is one of timing are
particular in predicting that Bulgarian and Chinese type languages have
common properties (because all wh-movement steps take place at the same
point in the derivation, before Spellout in Bulgarian, after Spellout
in Chinese), whereas the different instances of wh-movement in a
multiple question are carried out in different parts of the derivation
in English type languages.
In addition, MiL offers analyses for a number of phenomena that are
formulated in terms of the neoclassical view and support it to the
extent that these are compelling. These detailed analyses of various
phenomena related to movement make the book extremely interesting and
valuable. Chapter 2 presents evidence for the idea that UG allows two
different types of multiple questions: those in which all wh-phrases
cluster in the CP-domain, and those in which this clustering happens
within IP. Chapter 3 discusses strict ordering effects among multiple
specifiers of the same category (wh-phrases in Bulgarian, clitic
sequences, certain types of A-movement, etc.) and argues that they can
be derived from the Shortest Move condition and a particular way of
encoding cyclicity in grammar.
Chapter 4 is concerned with a fundamental problem of the (neo-)
classical model: there seem to exist positions P in natural languages
that are normally targeted by covert movement, but are passed through
by overt movement to higher positions Q. How can movement (to P)
applying after Spellout precede movement (from P to Q) before Spellout?
Richards solves this problem by formulating a model in which Pesetsky's
Earliness Principle and a constraint related to the phonological
realization of links in a chain imply that some instances of "covert"
movement may take place before Spellout.
Chapter 5 gives a detailed discussion of "minimal compliance":
sometimes, constraints such as subjacency or the superiority condition
do not have to be fulfilled by all links created by movement - rather,
it suffices that one dependency is in line with the constraint and
thereby licenses the later creation of dependencies violating it.
2. Two ways of forming multiple questions
The theory developed in MiL presupposes and elaborates on a proposal
originally made by Rudin (1988): in multiple questions, the wh-phrases
may either be all adjoined to IP (as in Serbo-Croatian), or they may be
made multiple specifiers of CP (as in Bulgarian). If long distance wh-
movement proceeds via the specifier position of CP only, we understand
why CP-absorption languages tolerate extractions from wh-clauses, while
IP-absorption languages do not. According to MiL, "IP-absorption"
languages are further characterized by allowing scrambling. They lack
superiority effects with local wh-movement (wh-objects may be placed in
front of wh-subjects in multiple questions), and they do not show weak
crossover-effects (as English does in ?who does his mother like). When
multiple wh-phrases from the same clause interact, they have the same
scope in IP-absorption languages, but CP-absorption languages are
different: wh-phrases with different scope are possible, and they
prefer crossing dependencies. Richards argues that this distinction
also applies to languages with covert wh-movement only, and to
languages such as German and English which combine overt and covert wh-
movement in multiple questions.
The discussion in MiL sheds an interesting new light on the possible
scope of a proposal that was originally made for languages with
multiple fronting of wh-phrases. Two remarks are in order, however.
First, some of the properties by which IP- and CP-absorbing languages
are distinguished are straightforward consequences of scrambling. That
scrambling languages show neither superiority nor weak crossover
effects was observed, e.g., by Haider (1986), who related this
property to the additional ordering possibilities created by
scrambling. Since objects may be scrambled to a position P c-commanding
the subject, the data in (1) have a derivation compatible with the
conditions responsible for superiority and weak crossover: whenever
object wh-movement starts in the position P c-commanding the subject,
it crosses neither a wh-subject nor a pronoun that it binds.
(1) a. ich weiss wen t-WH [wer t-SCRA liebt]
I know who.acc who.nom loves
b. wen liebt [t-WH [seine Mutter t-SCRA]]
who.acc loves his mother
The question arises, then, whether the differences between German and
English with respect to the descriptive properties of wh-movement do
not just reduce to the fact that German is a free word order language,
while English is not. Such a solution would be incorrect only if one
could show that A-scrambling must not precede wh-movement, so that wh-
movement in (1) cannot start from the position t-WH preceding the
subject, but must originate in the object position t-SCRA c-commanded
by the subject. Such a constraint on the interaction of scrambling and
wh-movement has in fact been postulated by Müller & Sternefeld (1993),
and it seems to be a consequence of the general approach pursued in
MiL, since a chain resulting from a succession of A-scrambling and wh-
movement contains two strong positions (see below). But empirically,
the claim that wh-movement must not be preceded by A-scrambling is hard
to defend, given data such as (2), in which the movement of the wh-
operator "was" strands the rest of the object noun phrase in front of
the subject, i.e., in a scrambling position (see Fanselow 2001).
(2) Wasi hätte denn [DP,acc t für Aufsätze] selbst Hubert nicht
what had PTC [ t for papers ] even Hubert not
'What kind of paper would even Hubert not have wanted to review?'
Apart from the question of whether the CP-IP-absorption distinction is
really supported by English-German contrasts, it also cannot be taken
for granted that the properties in the clusters associated with CP- vs.
IP-absorption always go hand in hand. Swedish does not show superiority
effects in simple multiple questions (so it should be an IP-absorption
language), but it is quite liberal with respect to wh-islands (a
property claimed to be characteristic of CP-absorption languages) and
does not have scrambling (but Object Shift). Spanish is like Swedish in
this respect, but its word order is much more flexible.
Of course, one cannot exclude that the existence of languages that do
not fall in line with the clustering of properties predicted in MiL is
due to additional parameters and further structural distinctions.
Nevertheless, the above remarks concerning German, English, and Swedish
relativize the merits of an attempt to extend Rudin's proposal beyond
the multiple wh-movement languages.
3. Tucking in
The superiority effect observed in English multiple questions has been
a topic of syntactic theorizing for more than thirty years, and a
number of diverging theories have been proposed. It was again Rudin
(1988) who enriched this discussion with new data from multiple filler
languages: in Bulgarian double questions, both wh-phrases must be
fronted in overt syntax, and the order in which they appear in clause
initial position must be identical to the order in which they were
merged in IP. Rudin's own account involves the adjunction of wh-phrases
to the specifier position of CP, which is unsatisfactory from a
theoretical point of view, given that this analysis violates the strict
cyclicity of derivations (but see Grewendorf 2001 for a modern version
of this account).
MiL accounts for the contrast in (3) in the following way (chapter 3).
Movement to Spec,CP is subject to a Shortest Move/Minimal Link
Condition requirement: only the wh-phrase closest to the attracting
position moves. Therefore, the subject koj is the first category in the
derivation of (3) that moves to Spec,CP. Since Bulgarian is a multiple
wh-movement language, the object kogo must be moved as well. The strict
order effects in (3) follow if the second specifier position created by
moving kogo must be created _below_ the position of the XP moved first,
i.e., if XPs are "tucked in" below the phrase moved previously in
multiple specifier constructions. This presupposes a specific
definition of cyclicity that Richards takes over from Chomsky (1995).
(3) a. koj kogo vizda
who whom sees
b. *kogo koj vizda
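
The derivation of (3) described above can be illustrated with a minimal
sketch. It is a toy rendering of the logic as summarized in this
review, not Richards' own formalization: the function name front_wh,
the list representation of specifiers, and the tuck_in switch are
assumptions introduced here for illustration only.

```python
def front_wh(base_order, tuck_in=True):
    """Return the order of multiple specifiers of CP, highest first.

    base_order lists the wh-phrases from highest to lowest position
    inside IP (hypothetical representation, for illustration only).
    """
    remaining = list(base_order)
    specifiers = []  # highest specifier first
    while remaining:
        # Shortest Move: attract the closest (highest) remaining wh-phrase.
        mover = remaining.pop(0)
        if tuck_in:
            # Tucking in: the new specifier is created below earlier ones.
            specifiers.append(mover)
        else:
            # Alternative without tucking in: the new specifier lands on top.
            specifiers.insert(0, mover)
    return specifiers


print(front_wh(["koj", "kogo"]))                 # order as in (3a)
print(front_wh(["koj", "kogo"], tuck_in=False))  # order as in (3b)
```

With tucking in, the base order subject-object is preserved after
movement; without it, Shortest Move alone would yield the reverse,
ungrammatical order.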
Chapter 3 is particularly interesting because Richards shows that the
scope of the phenomenon captured by the tucking-in operation goes
beyond multiple questions in Bulgarian: Object shift, cliticization,
and certain types of A-scrambling and quantifier raising are further
cases in point. It is a fairly new discovery that in quite a number of
constructions, the c-command relations among moved phrases must be the
same before and after movement!
Unfortunately, MiL does not contain a detailed comparison of its
strictly derivational "tucking in"-model with the strictly
representational accounts offered by Müller (2001) and Williams (2003).
E.g., Müller proposes a (violable) constraint according to which c-
command relations among phrases must be identical at all levels of
representation (PF, LF, etc.). The representational models are
compatible with a derivation of (3a) proceeding in a traditional way
(kogo moves first). Independent evidence for the tucking in-idea thus
seems to be called for, and would be extremely valuable, since it would
strongly support a derivational model of grammar. MiL contains a brief
discussion (pp. 49-53) of Bulgarian constructions in which local wh-
movement mitigates subjacency violations of later non-local wh-
movement. If the licensing local movement must precede the licensed
long movement, insights into the order in which wh-phrases move to the
specifiers of a CP seem possible, and Richards claims the empirical
facts support his view. However, I do not find the "crucial" contrast
between a "*" sentence (his (21) on p. 53) and a "??" sentence (his
(20) on p. 52) too impressive, in particular since it is based on the
intuitions of a single native speaker only.
4. Strong and weak features
A timing model of the contrast between overt and covert movement in
terms of a Spellout point in the derivation is confronted with the
problem that some constructions seem to involve an application of
covert movement that precedes overt movement steps. Richards dedicates
the fourth chapter of his book to a discussion of this problem. His
approach is framed in terms of the standard minimalist assumption that
movement serves the purpose of feature checking, and that there are two
types of features: strong and weak ones. In contrast to "standard"
minimalism, the strong-weak distinction is framed in terms of effects
on PF-chains (p. 105): a strong feature is an instruction that the
position (in the chain) that checks this feature must be pronounced.
Weak features do not imply any constraints in terms of pronunciation.
On this basic assumption, Richards builds a simple and elegant algorithm
for determining whether movement is overt or covert. The key idea lies
in the assumption (p. 105) that PF must receive unambiguous
instructions about which element in the chain must be pronounced.
If the chain contains one element only at PF, this condition is
trivially fulfilled. A two-membered chain with A checking a strong
feature is licensed, since the single strong feature constitutes an
unambiguous instruction for PF pronunciation. On the other hand, a
two-membered chain with A checking a weak feature is not a legal PF
object, since there are two positions that could be pronounced, and
weak features do not come with any pronunciation instructions. Such
illegal PF objects can be avoided by postponing movement to a weak
position until after Spellout (so that the chain will be an LF object
only).
Obviously, all chains containing a single strong position are predicted
to be grammatical PF-objects in this approach. Chains in which either A
or B is a strong position (but not both) possess a unique pronunciation
instruction. As MiL demonstrates, there are in fact many instances of
heads and phrases that need to target weak positions when they undergo
overt movement. Here are a few examples:
a. Case checking movement of objects is covert in English, but it needs
to precede overt wh-movement in wh-questions. Likewise, objects have
to move through specifier positions of participial phrases in French
wh-questions (triggering agreement there), although this movement must
be covert outside the context of wh-movement. [Final position strong,
intermediate position weak]
b. There is no V-to-Infl movement in Mainland Scandinavian clauses,
i.e., the feature of Infl checking V is weak in Mainland Scandinavian.
In verb second clauses, V must pass through Infl on its way to Comp,
however. [Final position strong, intermediate position weak]
c. In Malay, there is partial wh-movement: the wh-phrase undergoes
overt movement to an intermediate specifier position only, and then
moves to its scope position covertly. [Final position weak, intermediate
position strong]
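
The pronunciation algorithm described above can be sketched as follows.
This is only an illustration of the logic as summarized in this review,
not code or notation from MiL: the function name pf_pronunciation, the
representation of a chain as a list of (label, strength) pairs, and the
position labels are all assumptions made here.

```python
def pf_pronunciation(chain):
    """Return the label of the position PF pronounces, or None.

    chain is a list of (label, strength) pairs, highest position first,
    with strength one of "strong", "weak", or "root" (hypothetical
    encoding). None means PF receives no unambiguous instruction.
    """
    if len(chain) == 1:
        # A single link is trivially unambiguous.
        return chain[0][0]
    strong = [label for label, strength in chain if strength == "strong"]
    if len(strong) == 1:
        # Exactly one strong position: pronounce it.
        return strong[0]
    # No strong position (several candidates) or two strong positions
    # (conflicting instructions): ambiguous at PF.
    return None


# Pattern (a)/(b): final position strong, intermediate position weak.
print(pf_pronunciation([("final", "strong"), ("mid", "weak"), ("base", "root")]))
# Pattern (c): final position weak, intermediate position strong.
print(pf_pronunciation([("final", "weak"), ("mid", "strong"), ("base", "root")]))
# Weak movement before Spellout: no unambiguous instruction.
print(pf_pronunciation([("final", "weak"), ("base", "root")]))
```

In the sketch, a None result corresponds to the situation in which the
movement step would have to be postponed until after Spellout.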
Particularly strong support for the view developed in MiL comes from
the fact that weak movement may also be carried out visibly in the
overt component of grammar when ellipsis and other reduction operations
apply. Sluicing constructions such as (4) are a case in point: "from
which" has moved to a specifier position of CP on the basis of a weak
feature (normally, only one wh-phrase moves in English multiple
questions), but this does not imply an illegal PF-chain, because the
root position of this movement is deleted in the sluicing construction
together with the whole VP, so that the wh-phrase in Spec,CP is the
only link in the chain left at PF. Spellout can thus proceed in a
unique and unambiguous way.
(4) I know that in each instance one of the girls got something from
one of the boys. But they did not tell me which from which.
I think this is a very elegant theory of the covert-overt distinction,
and it is quite a pity that a number of data do not really fit into it.
In particular, the theory disallows chains involving two strong
features (both features force pronunciation, so there is an ambiguous
instruction), and a substantial part of the book (pp. 147-195) is
dedicated to a discussion of the problems that arise in this context. For
example, English subjects move to Spec,IP in overt syntax (attraction
by a strong feature), and, at least in long distance questions, they
are able to move on to the specifier of CP (attraction by a second
strong feature). Indeed, in quite a number of languages, subject wh-
movement must start from a "weaker" subject position (perhaps in VP),
but still, the grammaticality of (5a) must be accounted for. The
solution offered in MiL is that the ban against two strong features in
a chain is a violable one - it is violated whenever more important
principles must be respected. In the case of (5a), clausal pied piping
(who loves Irina do you think) is the competing derivation, but it is
excluded because the constraint ruling out the pied-piping of
complementizer-less CPs is stronger than the need to have unambiguous
pronunciation instructions. Probably, the ban against pied-piping IP-
complements is responsible for the possibility of overtly extracting
who from an ECM Spec,IP position reached by overt movement itself (5b).
(5) a. who do you think t loves Irina?
b. who do you expect t to kiss Irina?
Above, we have seen that overt scrambling may precede wh-movement in
German, a constellation that is also incompatible with a ban against
two strong features within a chain. Mahajan (1990, 1996) argues that
overt A-scrambling may be followed by overt A-bar-scrambling in Hindi.
Multiple strong features seem to be in general well-formed in PF-chains
(p. 190) when the strong features are of the same "type": overt V to Infl
and Infl to Comp movement may be combined in Icelandic (and phrases may
go from one subject position to the next one higher up in cyclic A-
movement). MiL leaves an account of these counterexamples open (p.
Covert movement may also have to precede overt movement in
constellations different from the ones considered in MiL. If islands
are made transparent by head incorporation (as in Baker (1988)), it is
often the case that overt movement is licensed by covert head
incorporation. The Minimal Compliance effects in English multiple
questions (see below) also presuppose that covert wh-movement of X may
precede the overt wh-movement of Y (as proposed by Pesetsky (2000)).
The general architecture of the model proposed in MiL allows such
constellations: PF-chains with a root position and a position checking
a weak feature do not come with a unique pronunciation instruction,
but, as we have seen, the ban against such ambiguous chains is a
violable one in the MiL model. Nothing in principle excludes that
constraints other than the ban against certain types of clausal pied
piping override the principle favoring unambiguous PF-chains.
Although it takes over from Chomsky (1981) the idea that covert
movement is one that applies after Spellout, the model Richards proposes
has various loopholes by which covert movement may be brought forward.
Application after Spellout is thus a sufficient, but not a necessary,
condition for a movement being covert.
5. Minimal Compliance
Examples such as (6) are a notorious problem for all theories of
superiority: the presence of a third wh-phrase in a clause renders
(certain) violations of the superiority condition possible. Similar
facts hold in Bulgarian: while the order of indirect and direct object
wh-phrases is not free in double questions, it is in triple questions,
where, e.g., both the wh-cluster nom-acc-dat and the cluster nom-dat-
acc are grammatical.
(6) what did who buy where
In MiL, Richards develops a very interesting account of such facts:
within certain domains, grammatical constraints must be respected by a
single grammatical relation only. Once the constraint has been
"checked" within a domain, other relations of the same kind in the same
domain need not obey the constraint in question. This is the Principle
of Minimal Compliance. Again, PMC effects can be observed in a large
number of different kinds of constructions: reflexivity in Dutch, weak
crossover effects in English, VP-ellipsis, and scrambling.
MiL argues that PMC effects support a derivational approach, because
the operation that satisfies the constraint in question must be
applied before operations violating it for there to be a PMC effect
saving the structure. Japanese and Bulgarian Shortest Move and
Subjacency facts seem to support this conclusion. Section 5.6 shows
that the PMC may be extended to an impressive number of further
The PMC proposal sheds a very interesting light on the way in which
constraints are applied in natural language. The amount of data the PMC
seems to characterize correctly is impressive.
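
The ordering logic behind the PMC, as described above, can be made
concrete with a small sketch. It abstracts away from the actual
analyses in MiL: the function name pmc_licensed, the labels d1 and d2,
and the encoding of each dependency as a (name, obeys_constraint) pair
in derivational order are assumptions introduced here for illustration.

```python
def pmc_licensed(dependencies):
    """Check one constraint within one domain under Minimal Compliance.

    dependencies is a list of (name, obeys_constraint) pairs given in
    the order in which they are created in the derivation (hypothetical
    encoding). A violating dependency is tolerated only if an earlier
    dependency in the same domain has already obeyed the constraint.
    """
    checked = False
    for _name, obeys in dependencies:
        if obeys:
            checked = True   # the constraint counts as "checked" from now on
        elif not checked:
            return False     # violation before any compliant dependency
    return True


print(pmc_licensed([("d1", True), ("d2", False)]))   # compliant step first
print(pmc_licensed([("d1", False), ("d2", True)]))   # violating step first
```

The asymmetry between the two calls is what makes the PMC an argument
for a derivational model: the same set of dependencies is licensed or
not depending on the order in which they are created.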
One question that does not really find an answer in MiL is the question
of what determines whether a principle is checked according to PMC or
not. Thus, in Hindi, a wh-phrase in an embedded clause (with matrix
scope) is not licensed by the presence of a wh-phrase in the matrix
clause (rather, the kyaa-construction must be used, see Mahajan 1990),
although short movement of the matrix wh-phrase should be sufficient
for satisfying the subjacency requirement of the relevant matrix Comp.
Clitic placement appears to be strictly local, independent of how many
clitics are attached to a head. An anaphor cannot be bound non-locally
just because a further anaphor is bound to the same antecedent in a
local fashion. The application of local scrambling within a certain
clause does not render long scrambling into that clause grammatical in
German. Presumably, one can formulate a model that allows one to
predict or at least describe whether a constraint is checked in the PMC
way or not, but such a theory still needs to be developed.
6. General assessment
MiL is certainly one of the most important book-length contributions to
minimalist syntax of recent years. It provides fresh insights into
the nature of the shortest move / minimal link condition. The Principle
of Minimal Compliance represents an original, stimulating way of
dealing with the fact that syntactic constraints may fail to be
respected by certain dependencies within a clause. And MiL offers an
elegant theory of the distinction between overt and covert movement.
The only major weakness of the book I can see is one that is not
uncommon in generative (minimalist) syntax: key decisions about the
architecture and fundamental properties of the grammatical model are
motivated on the basis of fairly complex constructions, the
acceptability status of which is not really established beyond doubt.
We have already mentioned one such case in section 3 of this review (a
single speaker giving a contrast between "*" and "??"), but the
situation may even be worse. As Richards himself concedes, "the claim
that some covert wh-movement languages exhibit wh-island effects while
others lack them is not uncontroversial. There are in fact Chinese
speakers who reject wh-island violations and Japanese speakers who
violate wh-islands fairly freely [...]. The possibility arises, then,
the contrast to be discussed in this chapter is not a real one, but an
accident of the particular Chinese and Japanese speakers who provide
the data that have become standard in the literature. Refuting a
possibility like this is not a simple one, and is beyond the scope of
this chapter" (p. 12).
Unfortunately, the problem is not really confined to wh-islands. We
know that speakers of a language show considerable variation in judging
the well-formedness of sentences when these are complex, or when their
well-formedness involves aspects of information structure (see, e.g.,
Schütze (1996)). Both factors play a crucial role in the relative degree
of acceptability in a substantial part of the data used in MiL.
Richards may be right in attributing certain properties to grammar and
languages rather than to processing and individual speakers, but given
that we have both the empirical (Schütze (1996), Cowart (1997), Keller
(2000)) and the theoretical (Keller (2000), Bresnan & Nikitina (2003),
among others) means to deal with constellations that are characterized
by variation and influences of information structure and
processability, the question is whether the readers of MiL can really
be satisfied by statements that substantiating certain claims has been
"beyond the scope" of a certain chapter or of the book. I really like
the theoretical approach taken in MiL, and I hope that it will
stimulate the empirical research necessary for establishing whether the
factual claims the book makes are correct.
Baker, M. 1988. Incorporation. Chicago.
Bresnan, J. & T. Nikitina. 2003. On the Gradience of the Dative
Alternation. Ms., Stanford.
Chomsky, N. 1964. Current Issues in Linguistic Theory. Den Haag: Mouton.
Chomsky, N. 1981. Lectures on Government and Binding. Dordrecht.
Chomsky, N. 1986. Barriers. Cambridge, Mass.
Chomsky, N. 1995. The Minimalist Program. Cambridge, Mass.
Cowart, W. 1997. Experimental Syntax. Thousand Oaks, CA.
Erteschik-Shir, N. 1973. On the nature of island constraints, MIT:
Fanselow, G. 2001. Features, θ-roles, and free constituent order.
Linguistic Inquiry 32,3.
Grewendorf, G. 2001. Multiple wh-movement. Linguistic Inquiry 32: 87-
Haider, H. 1986. Deutsche Syntax - generativ. Habilitation thesis,
Huang, C.-T. 1982. Move WH in a language without WH Movement. The
Linguistic Review 1: 369-416.
Keller, F. 2000. Gradience in Grammar. Doctoral dissertation,
Mahajan, A. 1990. The A/A-bar-distinction and movement theory. Doctoral
Müller, G. 2001. Order Preservation, Parallel Movement, and the
Emergence of the Unmarked. In: G. Legendre, J. Grimshaw and S. Vikner,
eds., Optimality-Theoretic Syntax. MIT Press, Cambridge, Mass., pp.
Müller, G. & W. Sternefeld. 1993. Improper movement and unambiguous
binding. Linguistic Inquiry 24: 461-507.
Pesetsky, D. 2000. Phrasal Movement and Its Kin. Cambridge, Mass.
Ross, J.R. 1967. Constraints on Variables in Syntax. Doctoral
Rudin, C. 1988. On multiple questions and multiple wh fronting. Natural
Language and Linguistic Theory 6: 445-501.
Schütze, C. 1996. The Empirical Basis of Linguistics. Chicago.
Williams, E. 2003. Representation Theory. Cambridge, Mass.
ABOUT THE REVIEWER
Gisbert Fanselow is a professor of syntax at the University of Potsdam, Germany. His research focuses on free word order phenomena (scrambling, discontinuous noun phrases) and aspects of wh-movement (scope marking constructions, MLC). He has done some experimental work on preferences in local ambiguities and processing influences on grammaticality judgements.