LINGUIST List 14.2623

Tue Sep 30 2003

Review: Syntax/Morphology: Richards (2001)

Editor for this issue: Naomi Ogasawara

What follows is a review or discussion note contributed to our Book Discussion Forum. We expect discussions to be informal and interactive; and the author of the book discussed is cordially invited to join in. If you are interested in leading a book discussion, look for books announced on LINGUIST as "available for review." Then contact Simin Karimi at


  1. Gisbert Fanselow, Movement in Language: Interactions and Architectures

Message 1: Movement in Language: Interactions and Architectures

Date: Mon, 29 Sep 2003 14:33:52 +0000
From: Gisbert Fanselow
Subject: Movement in Language: Interactions and Architectures

Richards, Norvin (2001) Movement in Language: Interactions and
Architectures, Oxford University Press.

Announced at

Gisbert Fanselow, University of Potsdam

0. Background

The insights of Chomsky (1964) and, in particular, Ross (1967) led
to the establishment of a new research topic in syntax: constraints on
movement. This new line of research generated an impressive number of
empirical insights, and culminated in attempts, such as Chomsky (1981),
Chomsky (1986), or Baker (1988), to find one or two simple
principles from which all constraints on movement can be derived.

However, empirical data that might prove fatal for a purely syntactic
account of the restrictions on movement already came in around 1980.
Huang (1982) observed that some of the local domains that restrict
movement in English also constrain scope assignment in Chinese
questions, in spite of the fact that question words do not undergo
(visible/audible) wh-movement in this language but may stay in situ.
Earlier, Erteschik (1973) had made the discovery that the degree of
acceptability of extractions from certain domains is a function of
information structure. These findings did not lead to a dismissal of
syntax-based accounts of constraints on movement, however. Rather,
models were developed in which the assignment of semantic scope to
operators is conceived of as the construction of a formal level of
representation (viz., Logical Form, LF), which involves essentially
the same type of operations that we find in visible syntax, including
movement (see, e.g., Huang (1982)). According to the GB-model (Chomsky
(1981)), the ultimate target of a syntactic derivation is Logical
Form. Much of the derivation of LF consists of a sequence of movement
operations. During the derivation, there is a point (identified as a
level of representation, S-structure, in early approaches, and simply
called Spellout, nowadays) at which the phonological and the syntactic
aspects of the derivation split up. Movement taking place before
Spellout has a phonological effect (visible displacement, overt
movement); movement taking place after Spellout (covert movement) has
no phonological effect: it is invisible/inaudible.

1. Overview of Movement in Language

This classical view of the distinction between covert and overt
movement prevailed through the eighties, but in the nineties, its
assumptions were questioned: is the difference between overt and
covert movement really expressible in terms of a Spellout point in the
derivation, or does it have to be specified independently, so that
covert operations may precede overt ones? Are overt and covert
movement really identical? Norvin Richards has written his book
Movement in Language: Interactions and Architectures (MiL) as a
contribution to this discussion (chapter 6), and he argues for a
neo-classical concept of movement, in which the difference between
overt and covert operations is (in principle) one of the timing
relative to Spellout.

MiL claims that the neoclassical view is supported by the existence of
similarities among languages that only have overt (Bulgarian) or
covert (Chinese, Japanese) wh-movement, respectively, as opposed to
languages such as English that employ both types of movement. Models
in which the difference between overt and covert movement is one of
timing are particular in predicting that Bulgarian and Chinese type
languages have common properties (because all wh-movement steps take
place at the same point in the derivation, before Spellout in
Bulgarian, after Spellout in Chinese), whereas the different instances
of wh-movement in a multiple question are carried out in different
parts of the derivation in English type languages.
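
The timing logic behind this prediction can be made concrete with a
small sketch. It is only an illustration under simplifying
assumptions: a derivation is modelled as a flat list of wh-movement
steps with a single Spellout marker, and the names SPELLOUT and
classify_wh_system are invented here, not taken from MiL.

```python
SPELLOUT = "SPELLOUT"  # hypothetical marker for the Spellout point


def classify_wh_system(derivation):
    """Split a derivation into overt and covert wh-movement steps.

    `derivation` is a list of step labels containing exactly one SPELLOUT
    marker. Steps before the marker count as overt, steps after it as covert.
    """
    cut = derivation.index(SPELLOUT)
    overt, covert = derivation[:cut], derivation[cut + 1:]
    return {
        "overt": overt,
        "covert": covert,
        # True when all wh-movement happens on one side of Spellout
        "uniform_timing": not (overt and covert),
    }


# Toy derivations for a double question (assumed, for illustration only)
bulgarian = classify_wh_system(["move wh1", "move wh2", SPELLOUT])
chinese = classify_wh_system([SPELLOUT, "move wh1", "move wh2"])
english = classify_wh_system(["move wh1", SPELLOUT, "move wh2"])

print(bulgarian["uniform_timing"], chinese["uniform_timing"],
      english["uniform_timing"])
```

In this toy model, the Bulgarian and Chinese type derivations pattern
together (uniform timing), while the English type derivation does not,
which is the grouping the neoclassical view is said to predict.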

In addition, MiL offers analyses for a number of phenomena that are
formulated in terms of the neoclassical view and support it to the
extent that these are compelling. These detailed analyses of various
phenomena related to movement make the book extremely interesting and
valuable. Chapter 2 presents evidence for the idea that UG allows two
different types of multiple questions: those in which all wh-phrases
cluster in the CP-domain, and those in which this clustering happens
within IP. Chapter 3 discusses strict ordering effects among multiple
specifiers of the same category (wh-phrases in Bulgarian, clitic
sequences, certain types of A-movement, etc.) and argues that they can
be derived from the Shortest Move condition and a particular way of
encoding cyclicity in grammar.

Chapter 4 is concerned with a fundamental problem of the (neo-)
classical model: there seem to exist positions P in natural languages
that are normally targeted by covert movement, but are passed through
by overt movement to higher positions Q. How can movement (to P)
applying after Spellout precede movement (from P to Q) before
Spellout? Richards solves this problem by formulating a model in
which Pesetsky's Earliness Principle and a constraint related to the
phonological realization of links in a chain imply that some instances
of ''covert'' movement may take place before Spellout.

Chapter 5 gives a detailed discussion of ''minimal compliance'':
sometimes, constraints such as subjacency or the superiority condition
do not have to be fulfilled by all links created by movement - rather,
it suffices that one dependency is in line with the constraint and
thereby licenses the later creation of dependencies violating it.

2. Two ways of forming multiple questions 

The theory developed in MiL presupposes and elaborates on a proposal
originally made by Rudin (1988): in multiple questions, the wh-phrases
may either be all adjoined to IP (as in Serbo-Croatian), or they may
be made multiple specifiers of CP (as in Bulgarian). If long distance
wh-movement proceeds via the specifier position of CP only, we
understand why CP-absorption languages tolerate extractions from
wh-clauses, while IP-absorption languages do not. According to MiL,
''IP-absorption'' languages are further characterized by allowing
scrambling. They lack superiority effects with local wh-movement
(wh-objects may be placed in front of wh-subjects in multiple
questions), and they do not show weak crossover effects (as English
does in ?who does his mother like). When multiple wh-phrases from the
same clause interact, they have the same scope in IP-absorption
languages, but CP-absorption languages are different: wh-phrases with
different scope are possible, and they prefer crossing
dependencies. Richards argues that this distinction also applies to
languages with covert wh-movement only, and to languages such as
German and English which combine overt and covert wh-movement in
multiple questions.

The discussion in MiL sheds an interesting new light on the possible
scope of a proposal that was originally made for languages with
multiple fronting of wh-phrases. Two remarks are in order, however.
First, some of the properties by which IP- and CP-absorbing languages
are distinguished are straightforward consequences of scrambling. That
scrambling languages show neither superiority nor weak crossover
effects, was, e.g., observed by Haider (1986), and he related this
property to the additional ordering possibilities created by
scrambling. Since objects may be scrambled to a position P
c-commanding the subject, the data in (1) have a derivation compatible
with the conditions responsible for superiority and weak crossover:
whenever object wh-movement starts in the position P c-commanding the
subject, it neither crosses a wh-subject nor a pronoun which it binds.

(1) a. ich weiss wen t-WH [wer t-SCRA liebt]
 I know who.acc who.nom loves
 b. wen liebt [t-WH [seine Mutter t-SCRA]]
 who.acc loves his mother

The question arises, then, whether the differences between German and
English with respect to the descriptive properties of wh-movement do
not just reduce to the fact that German is a free word order language,
while English is not. Such a solution would be incorrect only if one
could show that A-scrambling must not precede wh-movement, so that wh-
movement in (1) cannot start from the position t-WH preceding the
subject, but must originate in the object position t-SCRA c-commanded
by the subject. Such a constraint on the interaction of scrambling and
wh-movement has in fact been postulated by Müller & Sternefeld
(1993), and it seems to be a consequence of the general approach
pursued in MiL, since a chain resulting from a succession of
A-scrambling and wh-movement contains two strong positions (see
below). But empirically, the claim that wh-movement must not be
preceded by A-scrambling is hard to defend, given data such as (2), in
which the movement of the wh-operator ''was'' strands the rest of the
object noun phrase in front of the subject, i.e., in a scrambling
position (see Fanselow 2001).

(2) Was-i hätte denn [DP,acc t für Aufsätze] selbst Hubert nicht 
 what had PTC [ t for papers ] even Hubert not
 rezensieren wollen 
 review wanted
 'What kind of paper would even Hubert not have wanted to review?'

Apart from the question of whether the CP-IP-absorption distinction is
really supported by English-German-contrasts, it also cannot be taken
for granted that the properties in the clusters associated with CP-
vs. IP-absorption always go hand in hand. Swedish does not show
superiority effects in simple multiple questions (so it should be an
IP-absorption language), but it is quite liberal with respect to
wh-islands (a property claimed to be characteristic of CP-absorption
languages) and does not have scrambling (but Object Shift). Spanish is
like Swedish in this respect, but its word order is much more

Of course, one cannot exclude that the existence of languages that do
not fall in line with the clustering of properties predicted in MiL is
due to additional parameters and further structural distinctions.
Nevertheless, the above remarks concerning German, English, and
Swedish relativize the merits of an attempt to extend Rudin's proposal
beyond the multiple wh-movement languages.

3. Tucking in

The superiority effect observed in English multiple questions has been
a topic of syntactic theorizing for more than thirty years, and a
number of diverging theories have been proposed. It was again Rudin
(1988) who enriched this discussion with new data from multiple filler
languages: in Bulgarian double questions, both wh-phrases must be
fronted in overt syntax, and the order in which they appear in clause
initial position must be identical to the order in which they were
merged in IP. Rudin's own account involves the adjunction of
wh-phrases to the specifier position of CP, which is unsatisfactory
from a theoretical point of view, given that this analysis violates
the strict cyclicity of derivations (but see Grewendorf 2001 for a
modern version of this account).

MiL accounts for the contrast in (3) in the following way (chapter 3).
Movement to Spec,CP is subject to a Shortest Move/Minimal Link
Condition requirement: only the wh-phrase closest to the attracting
position moves. Therefore, the subject koj is the first category in
the derivation of (3) that moves to Spec,CP. Since Bulgarian is a
multiple wh-movement language, the object kogo must be moved as
well. The strict order effects in (3) follow if the second specifier
position created by moving kogo must be created _below_ the position
of the XP moved first, i.e., if XPs are ''tucked in'' below the phrase
moved previously in multiple specifier constructions. This presupposes
a specific definition of cyclicity that Richards takes over from
Chomsky (1995).

(3) a. koj kogo vizda
 who whom sees
 b. *kogo koj vizda
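
How Shortest Move and ''tucking in'' jointly preserve the base order
can be sketched in a few lines. This is a toy model, not Richards's
formalization: the function name front_wh_phrases and the tuck_in
switch are assumptions made for illustration, and the wh-phrases are
simply listed from highest to lowest base position.

```python
def front_wh_phrases(base_order, tuck_in=True):
    """Return the surface order of the fronted wh-phrases.

    `base_order` lists the wh-phrases from highest to lowest base position.
    The result lists the specifiers from highest to lowest.
    """
    specifiers = []
    # Shortest Move: the closest (highest) remaining wh-phrase moves first.
    for wh in base_order:
        if tuck_in:
            specifiers.append(wh)     # new specifier is created below earlier ones
        else:
            specifiers.insert(0, wh)  # new specifier is created above earlier ones
    return specifiers


print(front_wh_phrases(["koj", "kogo"]))                 # ['koj', 'kogo']
print(front_wh_phrases(["koj", "kogo"], tuck_in=False))  # ['kogo', 'koj']
```

With tucking in, the sketch yields the order of (3a). Without it, the
same movement sequence yields the ungrammatical order of (3b).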

Chapter 3 is particularly interesting because Richards shows that the
scope of the phenomenon captured by the tucking-in operation goes
beyond multiple questions in Bulgarian: Object shift, cliticization,
and certain types of A-scrambling and quantifier raising are further
cases in point. It is a fairly new discovery that in quite a number of
constructions, the c-command relations among moved phrases must be the
same before and after movement!

Unfortunately, MiL does not contain a detailed comparison of its
strictly derivational ''tucking in''-model with the strictly
representational accounts offered by Müller (2001) and Williams
(2003). E.g., Müller proposes a (violable) constraint according to
which c-command relations among phrases must be identical at all
levels of representation (PF, LF, etc.). The representational models
are compatible with a derivation of (3a) proceeding in a traditional
way (kogo moves first). Independent evidence for the tucking-in idea
thus seems to be called for, and would be extremely valuable, since it
would strongly support a derivational model of grammar. MiL contains a
brief discussion (pp. 49-53) of Bulgarian constructions in which local
wh-movement mitigates subjacency violations of later non-local wh-
movement. If the licensing local movement must precede the licensed
long movement, insights into the order in which wh-phrases move to the
specifiers of a CP seem possible, and Richards claims the empirical
facts support his view. However, I do not find the ''crucial''
contrast between a ''*'' sentence (his (21) on p. 53) and a
''??'' sentence (his (20) on p. 52) too impressive, in particular
since it is based on the intuitions of a single native speaker only.

4. Strong and weak features

A timing model of the contrast between overt and covert movement in
terms of a Spellout point in the derivation is confronted with the
problem that some constructions seem to involve an application of
covert movement that precedes overt movement steps. Richards dedicates
the fourth chapter of his book to a discussion of this problem. His
approach is framed in terms of the standard minimalist assumption that
movement serves the purpose of feature checking, and that there are
two types of features: strong and weak ones. In contrast to
''standard'' minimalism, the strong-weak distinction is framed in
terms of effects on PF-chains (p. 105): a strong feature is an
instruction that the position (in the chain) that checks this feature
must be pronounced. Weak features do not imply any constraints in
terms of pronunciation. On this basic assumption, Richards builds a
simple and elegant algorithm for determining whether movement is overt
or covert. The key idea lies in the assumption (p. 105) that PF must
receive unambiguous instructions about which element in the chain must
be pronounced.

If the chain contains one element only at PF, this condition is
trivially fulfilled. A chain <A,B> with A checking a strong feature is
licensed since the single strong feature constitutes an unambiguous
instruction for PF pronunciation. On the other hand, a chain <A,B>
with A checking a weak feature is not a legal PF object, since there
are two positions that could be pronounced, and weak features do not
come with any pronunciation instructions. Such illegal PF objects
<A,B> with A checking a weak feature can be avoided by postponing
movement to a weak position until after Spellout (so that <A,B> will be
an LF object only).

Obviously, all chains containing a single strong position are
predicted to be grammatical PF-objects in this approach. Chains
<A,B,C> in which either A or B (but not both) is a strong position
possess a unique pronunciation instruction. As MiL
demonstrates, there are in fact many instances of heads and phrases
that need to target weak positions when they undergo overt
movement. Here are a few examples:

a. Case checking movement of objects is covert in English, but it
needs to precede overt wh-movement in wh-questions. Likewise, objects
have to move through specifier positions of participial phrases in
French wh-questions (triggering agreement there), although this
movement must be covert outside the context of wh-movement [Final
position strong, intermediate position weak]

b. There is no V-to-Infl movement in Mainland Scandinavian clauses,
i.e., the feature of Infl checking V is weak in Mainland Scandinavian.
In verb second clauses, V must pass through Infl on its way to Comp,
however. [Final position strong, intermediate position weak]

c. In Malay, there is partial wh-movement: the wh-phrase undergoes
overt movement to an intermediate specifier position only, and then
moves to its scope position covertly [Final position weak,
intermediate position strong]
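
The pronunciation algorithm described above, applied to chains like
those in (a) to (c), can be sketched as follows. This is a minimal
illustration, not the formulation in MiL: the labels "strong", "weak"
and "root" and the function pronounced_index are assumptions of the
sketch, and chains with more than one strong position are simply
treated as ambiguous, in line with the ban on two strong features
discussed in this section.

```python
from typing import List, Optional


def pronounced_index(chain: List[str]) -> Optional[int]:
    """Return the index of the chain position PF should pronounce.

    `chain` lists positions from highest to lowest, each labelled
    "strong", "weak", or "root". None means PF receives no unique
    pronunciation instruction.
    """
    if len(chain) == 1:
        return 0  # a one-member chain is trivially unambiguous
    strong = [i for i, label in enumerate(chain) if label == "strong"]
    if len(strong) == 1:
        return strong[0]  # the single strong feature picks the position
    return None  # no strong position, or more than one: ambiguous


# (a)/(b): final position strong, intermediate position weak
print(pronounced_index(["strong", "weak", "root"]))  # 0
# (c): final position weak, intermediate position strong
print(pronounced_index(["weak", "strong", "root"]))  # 1
# <A,B> with A checking a weak feature: not a legal PF object
print(pronounced_index(["weak", "root"]))            # None
```

In the sketch, a result of None corresponds to a chain that must be
avoided at PF, for instance by delaying the weak movement step until
after Spellout.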

Particularly strong support for the view developed in MiL comes from
the fact that weak movement may also be carried out visibly in the
overt component of grammar when ellipsis and other reduction
operations apply. Sluicing constructions such as (4) are a case in
point: from which has moved to a specifier position of CP on the basis
of a weak feature (normally, only one wh-phrase moves in English
multiple questions), but this does not imply an illegal PF-chain,
because the root position of this movement is deleted in the sluicing
construction together with the whole VP, so that the wh-phrase in
Spec,CP is the only link in the chain left at PF. Spellout can thus
proceed in a unique and unambiguous way.

(4) I know that in each instance one of the girls got something from
one of the boys. But they did not tell me which from which.
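
The effect of ellipsis on such a chain can be illustrated with a
short sketch. It assumes, purely for illustration, that chain
positions are named by strings, that ellipsis removes the positions
inside the deleted constituent from the PF chain, and that a chain is
unambiguous when at most one link survives or exactly one surviving
link is strong. The function name and position labels are
hypothetical.

```python
def has_unique_pf_instruction(chain, strong=frozenset(), elided=frozenset()):
    """Check whether PF can tell which chain link to pronounce.

    `chain` is a list of position names, `strong` the positions that check
    a strong feature, `elided` the positions inside deleted material.
    """
    surviving = [pos for pos in chain if pos not in elided]
    if len(surviving) <= 1:
        return True  # zero or one link left: nothing to disambiguate
    # Otherwise exactly one strong position must single out the link.
    return sum(pos in strong for pos in surviving) == 1


# Second wh-phrase of (4): moved to Spec,CP on the basis of a weak feature
chain = ["Spec,CP", "VP-internal"]

print(has_unique_pf_instruction(chain))                          # False
print(has_unique_pf_instruction(chain, elided={"VP-internal"}))  # True
```

Without ellipsis the weak chain is ambiguous at PF. Once the root
position is deleted along with the VP, only the Spec,CP link remains.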

I think this is a very elegant theory of the covert-overt
distinction, and it is quite a pity that a number of data do not
really fit into it. In particular, the theory disallows chains
involving two strong features (both features force pronunciation, so
there is an ambiguous instruction), and a substantial part of the book
(pp. 147-195) is dedicated to a discussion of the problems that arise in
this context. For example, English subjects move to Spec,IP in overt
syntax (attraction by a strong feature), and, at least in long
distance questions, they are able to move on to the specifier of CP
(attraction by a second strong feature). Indeed, in quite a number of
languages, subject wh-movement must start from a ''weaker'' subject
position (perhaps in VP), but still, the grammaticality of (5a) must
be accounted for. The solution offered in MiL is that the ban against
two strong features in a chain is a violable one - it is violated
whenever more important principles must be respected. In the case of
(5a), clausal pied piping (who loves Irina do you think) is the
competing derivation, but it is excluded because the constraint ruling
out the pied-piping of complementizer-less CPs is stronger than the
need to have unambiguous pronunciation instructions. Probably, the ban
against pied-piping IP-complements is responsible for the possibility
of overtly extracting who from an ECM Spec,IP position reached by
overt movement itself (5b).

(5) a. who do you think t loves Irina?
 b. who do you expect t to kiss Irina?
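
The competition between the two derivations of (5a) can be pictured
as a comparison under ranked, violable constraints. The sketch below
is an assumption-laden illustration: the constraint names, the
ranking list, and the violation sets are invented labels for the two
principles mentioned above, not notation from MiL.

```python
# Higher-ranked constraint first (hypothetical names).
RANKING = ["NO_BARE_CP_PIED_PIPING", "UNAMBIGUOUS_PF_CHAIN"]

# Each candidate is paired with the constraints it violates.
CANDIDATES = {
    "who do you think t loves Irina": {"UNAMBIGUOUS_PF_CHAIN"},
    "who loves Irina do you think": {"NO_BARE_CP_PIED_PIPING"},
}


def violation_profile(violated):
    """Return one 0/1 entry per constraint, ordered by rank."""
    return tuple(int(constraint in violated) for constraint in RANKING)


def best_candidate(candidates):
    """Pick the candidate whose profile is lexicographically smallest."""
    return min(candidates, key=lambda c: violation_profile(candidates[c]))


print(best_candidate(CANDIDATES))  # who do you think t loves Irina
```

The long extraction wins because it violates only the lower-ranked
requirement of unambiguous pronunciation instructions.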

Above, we have seen that overt scrambling may precede wh-movement in
German, a constellation that is also incompatible with a ban against
two strong features within a chain. Mahajan (1990, 1996) argues that
overt A-scrambling may be followed by overt A-bar-scrambling in Hindi.
Multiple strong features seem to be in general well-formed in PF-chains
(p. 190) when the strong features are of the same ''type'': overt V to
Infl and Infl to Comp movement may be combined in Icelandic (and
phrases may go from one subject position to the next one higher up in
cyclic A-movement). MiL leaves an account of these counterexamples
open (p. 190).

Covert movement may also have to precede overt movement in
constellations different from the ones considered in MiL. If islands
are made transparent by head incorporation (as in Baker (1988)), it is
often the case that overt movement is licensed by covert head
incorporation. The Minimal Compliance effects in English multiple
questions (see below) also presuppose that covert wh-movement of X may
precede the overt wh-movement of Y (as proposed by Pesetsky (2000)).
The general architecture of the model proposed in MiL allows such
constellations: PF-chains with a root position and a position checking
a weak feature do not come with a unique pronunciation instruction,
but, as we have seen, the ban against such ambiguous chains is a
violable one in the MiL model. Nothing in principle excludes that
constraints other than the ban against certain types of clausal pied
piping override the principle favoring unambiguous PF-chains.

Although it takes over from Chomsky (1981) the idea that covert
movement is one that applies after Spellout, the model Richards
proposes has various loopholes by which covert movement may be brought
forward. An application after Spellout is thus a sufficient, but not
a necessary condition for a movement being covert.

5. Minimal Compliance 

Examples such as (6) are a notorious problem for all theories of
superiority: the presence of a third wh-phrase in a clause renders
(certain) violations of the superiority condition possible. Similar
facts hold in Bulgarian: while the order of indirect and direct object
wh-phrases is not free in double questions, it is in triple questions,
where, e.g., both the wh-cluster nom-acc-dat and the cluster nom-dat-
acc are grammatical.

(6) what did who buy where 

In MiL, Richards develops a very interesting account of such facts:
within certain domains, grammatical constraints must be respected by a
single grammatical relation only. Once the constraint has been
''checked'' within a domain, other relations of the same kind in the
same domain need not obey the constraint in question. This is the
Principle of Minimal Compliance. Again, PMC effects can be observed in
a large number of different kinds of constructions: reflexivity in
Dutch, weak crossover effects in English, VP-ellipsis, and scrambling.
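
The logic of the PMC can be sketched as a check over the dependencies
created within one domain, taken in derivational order. The encoding
is an assumption made for illustration only: each dependency is
reduced to a boolean saying whether it obeys the constraint, and the
function name pmc_licenses is hypothetical.

```python
def pmc_licenses(dependencies):
    """Check a domain against a constraint under Minimal Compliance.

    `dependencies` lists, in derivational order, whether each dependency
    obeys the constraint (True) or violates it (False). A violation is
    tolerated only if an earlier dependency has already obeyed the
    constraint in the same domain.
    """
    constraint_checked = False
    for obeys in dependencies:
        if obeys:
            constraint_checked = True  # the domain now counts as compliant
        elif not constraint_checked:
            return False  # violation before any compliant dependency
    return True


print(pmc_licenses([True, False]))   # True: later violation is licensed
print(pmc_licenses([False, True]))   # False: violation comes too early
print(pmc_licenses([False]))         # False: nothing satisfies the constraint
```

The order sensitivity of the sketch reflects the idea that a
compliant dependency licenses the later creation of violating ones,
not the other way round.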

MiL argues that PMC effects support a derivational approach, because
the operation that satisfies the constraint in question must be
applied before operations violating it for there to be a PMC effect
saving the structure. Japanese and Bulgarian Shortest Move and
Subjacency facts seem to support this conclusion. Section 5.6 shows
that the PMC may be extended to an impressive number of further

The PMC proposal sheds a very interesting light on the way
constraints are applied in natural language. The amount of data the
PMC seems to characterize correctly is impressive.

One question that does not really find an answer in MiL is the
question of what determines whether a principle is checked according
to PMC or not. Thus, in Hindi, a wh-phrase in an embedded clause (with
matrix scope) is not licensed by the presence of a wh-phrase in the
matrix clause (rather, the kyaa-construction must be used, see Mahajan
1990), although short movement of the matrix wh-phrase should be
sufficient for satisfying the subjacency requirement of the relevant
matrix Comp. Clitic placement appears to be strictly local,
independent of how many clitics are attached to a head. An anaphor
cannot be bound non-locally just because a further anaphor is bound to
the same antecedent in a local fashion. The application of local
scrambling within a certain clause does not render long scrambling
into that clause grammatical in German. Presumably, one can formulate
a model that allows one to predict or at least describe whether a
constraint is checked in the PMC way or not, but such a theory still
needs to be developed.

6. General assessment

MiL is certainly one of the most important book-length contributions
to minimalist syntax of the last years. It provides fresh insights
into the nature of the shortest move / minimal link condition. The
Principle of Minimal Compliance represents an original, stimulating
way of dealing with the fact that syntactic constraints may fail to be
respected by certain dependencies within a clause. And MiL offers an
elegant theory of the distinction between overt and covert movement.

The only major weakness of the book I can see is one that is not
uncommon in generative (minimalist) syntax: key decisions about the
architecture and fundamental properties of the grammatical model are
motivated on the basis of fairly complex constructions, the
acceptability status of which is not really established beyond doubt.
We have already mentioned one such case in section 3 of this review (a
single speaker giving a contrast between ''*'' and ''??''), but the
situation may even be worse. As Richards himself concedes, ''the claim
that some covert wh-movement languages exhibit wh-island effects while
others lack them is not uncontroversial. There are in fact Chinese
speakers who reject wh-island violations and Japanese speakers who
violate wh-islands fairly freely [...]. The possibility arises, then,
the contrast to be discussed in this chapter is not a real one, but an
accident of the particular Chinese and Japanese speakers who provide
the data that have become standard in the literature. Refuting a
possibility like this is not a simple one, and is beyond the scope of
this chapter'' (p. 12).

Unfortunately, the problem is not really confined to wh-islands. We
know that speakers of a language show considerable variation in
judging the well-formedness of sentences when these are complex, or
when their well-formedness involves aspects of information structure
(see, e.g., Schütze (1996)). Both factors play a crucial role in the
relative degree of acceptability in a substantial part of the data
used in MiL. Richards may be right in attributing certain properties
to grammar and languages rather than to processing and individual
speakers, but given that we have both the empirical (Schütze (1996),
Cowart (1997), Keller (2000)) and theoretical (Keller (2000), Bresnan
& Nikitina (2003), among others) means to deal with constellations
that are characterized by variation and influences of information
structure and processability, the question is whether the readers of
MiL can really be satisfied by statements that substantiating certain
claims has been ''beyond the scope'' of a certain chapter or of the
book. I really like the theoretical approach taken in MiL, and I hope
that it will stimulate the empirical research necessary for
establishing whether the factual claims the book makes are correct.


Baker, M. 1988. Incorporation. Chicago. 

Bresnan, J. & T. Nikitina. 2003. On the Gradience of the Dative
Alternation. Ms., Stanford.

Chomsky, N. 1964. Current Issues in Linguistic Theory. Den Haag: Mouton.

Chomsky, N. 1981. Lectures on Government and Binding. Dordrecht.

Chomsky, N. 1986. Barriers. Cambridge, Mass. 

Chomsky, N. 1995. The Minimalist Program. Cambridge, Mass. 

Cowart, W. 1997. Experimental Syntax. Thousand Oaks, CA. 

Erteschik-Shir, N. 1973. On the nature of island constraints, MIT:
Ph.D. Dissertation.

Fanselow, G. 2001. Features, θ-roles, and free constituent order.
Linguistic Inquiry 32,3.

Grewendorf, G. 2001. Multiple wh-movement. Linguistic Inquiry 32: 87-

Haider, H. 1986. Deutsche Syntax - generativ. Habilitation thesis,

Huang, C.-T., 1982. Move WH in a language without WH Movement. Texas
Linguistic Review 1: 369-416.

Keller, F. 2000. Gradience in Grammar. Doctoral dissertation,

Mahajan, A. 1990. The A/A-bar-distinction and movement
theory. Doctoral dissertation, MIT.

Müller, G. 2001. Order Preservation, Parallel Movement, and the
Emergence of the Unmarked. In: G. Legendre, J. Grimshaw and S. Vikner,
eds., Optimality-Theoretic Syntax. MIT Press, Cambridge, Mass., pp.

Müller, G. & W. Sternefeld. 1993. Improper movement and unambiguous
binding. Linguistic Inquiry 24: 461-507.

Pesetsky, D. 2000. Phrasal Movement and Its Kin. Cambridge, Mass. 

Ross, J.R. 1967. Constraints on Variables in Syntax. Doctoral
dissertation, MIT.

Rudin, C. 1988. On multiple questions and multiple wh
fronting. Natural Language and Linguistic Theory 6: 445-501.

Schütze, C. 1996. The Empirical Basis of Linguistics. Chicago. 

Williams, E. 2003. Representation Theory. Cambridge, Mass. 


Gisbert Fanselow is a professor of syntax at the University of
Potsdam, Germany. His research focuses on free word order
phenomena (scrambling, discontinuous noun phrases) and aspects of
wh-movement (scope marking constructions, MLC). He has done some
experimental work on preferences in local ambiguities and processing
influences on grammaticality judgements.