Review of The Phonology of Standard Chinese

Reviewer: Jie Zhang
Book Title: The Phonology of Standard Chinese
Book Author: San Duanmu
Publisher: Oxford University Press
Linguistic Field(s): Phonology
Subject Language(s): Chinese, Mandarin
Duanmu, San (2000) The Phonology of Standard Chinese,
Oxford University Press, hardback, xv, 300pp., in 'The
Phonology of the World's Languages' series, edited by
Jacques Durand, ISBN: 0-19-829987-7.

Jie Zhang, UCLA and Harvard University.


Duanmu's book is a comprehensive study of the phonology of
Standard Chinese (or Mandarin). It is aimed at anyone with
an interest in the synchronic or diachronic phonology of
Chinese, or in Chinese languages or phonology in general.
It strives to provide both accurate factual descriptions
and theoretical analyses, while avoiding esoteric jargon
specific to historical Chinese philology or theoretical
phonology. As the book succeeds on all counts, it should
appeal to a wide readership.

The book contains twelve chapters. It starts by
introducing the history of Chinese languages and their
speakers (Chapter 1), then moves on to discuss the sound
inventory (Chapter 2), co-occurrence restrictions of sounds
(Chapter 3), and syllable structure (Chapter 4) of Standard
Chinese (henceforth SC). Chapters 5 to 8 are a detailed
study of wordhood in Chinese, in particular, how the issues
of word length and word order can be addressed in the light
of a theory of stress in Chinese proposed by the author.
Chapter 9 discusses the phonology related to the diminutive
suffix [r] in SC. The next two chapters address the issue
of tone in SC: Chapter 10 focuses on the basic properties
of tone, such as its phonetic correlates, its featural
representation, the concept of tone-bearing unit, and the
tonal inventory of SC; Chapter 11 focuses on the infamous
third tone sandhi of SC. Chapter 12 raises further issues
of Chinese phonology not addressed in the book.

The introductory chapter of the book briefly reviews the
historical development of Chinese languages and the
establishment and preservation of a standard language. I
see four interesting points conveyed by the author. First,
although SC (or Mandarin, Putonghua 'Common Speech', Hanyu
'Chinese Language', Guoyu 'National Language') has only
been the official language of China for a few decades, it
follows from the tradition of having a standardized form of
the language, from Yayan 'Refined Speech' in the Chunqiu
period (722-482BC) to Guanhua 'The Official Language' in
the Ming Dynasty (1368-1644). Second, unlike some standard
European languages, such as RP British English, SC does not
carry superior social prestige. Third, there was a
great discrepancy between written and spoken Chinese
throughout history, and only recently (since the beginning
of the 20th century) has this discrepancy been significantly
reduced. And fourth, there are tremendous differences
among Chinese dialects, many of which are mutually
unintelligible. These properties of the Chinese language
mean that the study of spoken SC is necessarily different
in nature from the study of many other languages.

Chapter 2 discusses the sound inventory of SC. After
providing theoretical background on the phonological
representation of sounds, phonemic analysis, and syllable
structure, the author proposes that there are three levels
in the analysis of SC sounds: underlying, syllabic, and
phonetic. On the underlying level, SC has 19 consonants
and five vowels: p, ph, f, m, t, th, ts, tsh, s, n, l,
voiceless unaspirated retroflex affricate, voiceless
aspirated retroflex affricate, voiceless retroflex
fricative, r, k, kh, x, ng; i, y, u, E (=schwa), a. On the
syllabic level, there are also three glides that correspond
to high vowels i, y, and u in the onset position, 29
consonant-glide combinations that share one onset slot, 2
syllabic consonants z and r that are derived from empty
rhyme slots filled by the onset consonant, and a zero
onset, which represents an empty onset slot. The phonetic
level includes all other allophones, such as allophones of
E--o, e, E, and the mid back unrounded vowel--in various
environments. The most innovative proposal here is to
treat the palatal affricates and fricative as dental+j
combinations tsj, tshj, and sj underlyingly. The author's
main argument comes from the fact that there is a group of
SC speakers who realize the palatals as dental+j
combinations.

Chapter 3 is a detailed study of the co-occurrence
restrictions and surface variations of SC sounds. The
author starts by observing that of all the possible 2,280
syllables, only about 400 actually occur. E.g., [wau],
[fje], [pjEu] (E=schwa) are bad forms in SC. He then
proposes an analysis that accounts for the missing GVX
forms by referring to a harmony constraint that requires
the rhyme elements to agree in [back] and [round] features,
and a dissimilation constraint that bans adjacent palatal
sounds and adjacent rounded sounds. The author also
proposes an Optimality-theoretic account for the surface
realizations of the underlying forms using harmony
constraints and constraints on featural combinations.

Chapter 4 is the analysis of syllable structure in SC.
Traditionally, the structure of SC syllables is considered
variable from a minimum of one segment, V or C, to a
maximum of four segments, CGVC (coda C = n or ng) or CGVG.
The author here, following his dissertation (Duanmu 1990),
proposes a novel approach in which all SC syllables are
either full or weak; full syllables have the structure CVX,
with one onset slot and two rhyme slots; and weak syllables
have the structure CV, with one onset slot and only one
rhyme slot. A number of arguments are presented to support
this approach, the most convincing of which come from
phonetics: the prenuclear glide, when present, is
phonetically realized as the secondary articulation of the
onset consonant; the rhyme portion of full syllables has
comparable duration--an open full syllable has a long vowel
and a closed full syllable has a short vowel; weak
syllables are phonetically CV syllables with a short vowel.
When a full syllable with a diphthong or a nasal coda is
reduced to a weak syllable, the diphthong is
monophthongized and the nasal coda is realized as vowel
nasalization. These proposed syllable structures will also
turn out to be useful in the account of SC tonal
distribution (Chapter 10).

Chapter 5 marks the beginning of the discussion of word-
level phonology in SC. The author first calls attention to
the importance of the compound/phrase distinction in
Chinese by pointing out that they might have different tone
sandhi patterns. E.g., in Shanghai, [tsho ve], when used
as a compound meaning 'fried rice', has the tonal pattern
L-H, while when used as a phrase meaning 'to fry rice', it
has the tonal pattern LH-LH. He then reviews the tests that
have been previously proposed to distinguish compounds from
phrases, and points out the conflicts among them. The
tests that the author eventually adopts are the following:
Conjunction Reduction, Adverbial Modification, XP
Substitution, and Productivity. He also argues that the
tests Freedom of Parts, Semantic Composition, and
Exocentric Structure can detect compounds when failed, but
cannot guarantee that the structures which pass them are
phrases, and therefore should be adopted with limitations.
The consequence of applying the adopted tests is that in
Chinese, a modifier-noun [M N] without the particle 'de' is
a compound, and so are its derivatives [M [M N]], [[M N] N],
[[M N] [M N]] etc.

Chapter 6 discusses the author's innovative approach to
Chinese stress. The main proposals are the following.
First, contra traditional assumptions, SC has word stress.
Second, the basic stress unit is a syllabic trochee.
Third, a simple di- or polysyllabic word forms trochees
left to right; a compound or phrase assigns stress
cyclically according to the Nonhead Stress rule (Duanmu
1990, Cinque 1993). The advantages of this approach in
accounting for the disyllabic requirement, the restrictions
on word length, and the restrictions on word order in SC
are also briefly touched upon in this chapter, and are
spelled out in greater detail in later chapters. The
author also addresses the obvious question about stress in
Chinese: why do native speakers lack good intuitions about
where the stress falls? He argues that this is due to the
fact that Chinese is a tone language, thus one of the major
stress indicators, f0, cannot be freely altered to indicate
stress, since it is being used to mark lexical contrasts.

Chapter 7 elaborates on the author's approach to the word
length problem in SC under his theory of SC stress. The
basic data patterns to be explained are the following. For
modifier-noun [M N] constructions, a [1 2] (the numbers
indicate the numbers of syllables in each syntactic
category) structure, such as *mei shang-dian 'coal store',
is bad, while [2 2], [2 1], and [1 1] structures are good.
But for verb-object [V O] constructions, [2 1] (*zhong-zhi
suan 'plant garlic') is bad, while [2 2], [1 2], and [1 1]
are good. The author argues that the reason for the
difference between [M N] and [V O] is metrical. In [M N],
M is the nonhead and should receive stress. If it is [1
2], N can form a binary foot, but M cannot. Under the
assumption that the main stress must be in a binary foot,
the structure is ungrammatical. For [V O] however, O is
the nonhead and receives stress, therefore [1 2] is good,
but [2 1] is less optimal, since the object can only form a
binary foot with the following empty beat, thus the main
stress falls in a weak foot. The author also dismisses two
widely held myths about Chinese words: first, that most
Chinese words are monosyllabic; second, that Chinese
developed many disyllabic words recently due to sound
changes that resulted in a decrease in the size of its
syllable inventory. Citing two corpus studies (ZWGW 1959,
He and Li 1987), the author argues that the majority
of the modern vocabulary is in fact disyllabic. As for the
increase in disyllabic vocabulary, he argues that it was
primarily caused by the disyllabic or longer borrowings
from Japanese and English, not by homophone avoidance.
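
The metrical account of the [M N] / [V O] asymmetry
summarized above can be sketched in a few lines of Python.
This is only a simplified illustration, not Duanmu's
formalism: the function name and the encoding are
hypothetical, only one- and two-syllable constituents are
modeled, and the treatment of [1 1] as a single shared foot
is an assumption made here to reproduce the reported data.

```python
# Simplified sketch of the word-length pattern reported in the review.
# Assumption: only mono- and disyllabic constituents are modeled, and the
# function name and encoding are illustrative, not Duanmu's formalism.

def is_well_formed(construction: str, left: int, right: int) -> bool:
    """Return True if a [left right] syllable pattern is predicted good.

    construction: "MN" (modifier-noun) or "VO" (verb-object).
    left, right: syllable counts (1 or 2) of the two constituents.
    """
    if construction not in ("MN", "VO"):
        raise ValueError("construction must be 'MN' or 'VO'")
    if left not in (1, 2) or right not in (1, 2):
        raise ValueError("only 1- and 2-syllable constituents are modeled")

    # Nonhead Stress: the nonhead receives main stress.
    # In [M N] the nonhead is M (left); in [V O] it is O (right).
    nonhead, head = (left, right) if construction == "MN" else (right, left)

    # A disyllabic nonhead forms a binary foot by itself.
    if nonhead == 2:
        return True
    # Assumption: in [1 1] the two syllables share one binary foot.
    if head == 1:
        return True
    # Monosyllabic nonhead, disyllabic head: the head forms its own foot,
    # so the stressed nonhead is left without a proper binary foot.
    return False


if __name__ == "__main__":
    for kind in ("MN", "VO"):
        for pattern in ((1, 1), (1, 2), (2, 1), (2, 2)):
            print(kind, pattern, is_well_formed(kind, *pattern))
```

Under these assumptions the only patterns ruled out are
[1 2] for [M N] and [2 1] for [V O], matching the data cited
in the chapter.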

Chapter 8 continues the discussion of word-level phonology
and elaborates on the author's approach to the word order
problem in SC, again in the light of his theory of SC
stress. The basic patterns are as follows. For [V-O N]
compounds, when V and O are both monosyllabic, [V O N] is
the only possible order, e.g., qie cai dao 'cut vegetable
knife'; but when V and O are both disyllabic, [O V N] is
the only possible order, e.g., luo-bo jia-gong dao 'turnip
process knife'. For [X Y N] compounds where X and Y are
modifiers of N, when X and Y are both disyllabic, there is
a fixed order between the two modifiers, e.g., da-xing han-
yu ci-dian 'large-scale Chinese dictionary'; but when one
of them is monosyllabic, the preferred pattern is to have
the monosyllabic modifier in the Y position, e.g., han-yu da
ci-dian 'Chinese large dictionary'. The full analysis of
these patterns involves elaborate constraints and
arguments, but the basic idea still follows from the theory
of SC stress: first, word order variation is triggered by
metrical requirements; second, the mechanism for word
movement is Nonhead Fronting. The difference between [V-O
N] and [X Y N] compounds comes from the difference in the
position of the head.

Chapter 9 discusses the diminutive suffix [r] in SC.
Traditionally, there have been three mysteries to the
realization of stem syllables with the diminutive suffix.
First, why does the [r] suffix replace some portion of the
rhyme in certain rhymes, but is added to the rhyme in
others? E.g., pai+r --> par 'board', pau+r --> paur
'robe'. Second, why is a schwa added between the rhyme and
the [r] suffix when the rhyme has a high vowel? E.g., i+r
--> iEr 'clothes'. Third, after the [r] suffix is added,
why is the alveolar nasal coda lost, while the velar nasal
coda is preserved as vowel nasalization? E.g., pan+r -->
par 'plate', pang+r --> pa~r 'side'. The author answers
these questions as follows. First, when a sound is
articulatorily incompatible with [r], the sound is
replaced. Otherwise [r] is added to the sound. Assuming
that the phonetic realization of pang+r is pangr, this
accounts for both the first and the third mysteries.
Second, a high nuclear vowel spreads to the onset and thus
leaves the nucleus slot empty; the default height value of
the nucleus is then filled in, which is mid.
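
The surface patterns listed above can be collected in a toy
Python function. This is a sketch under stated assumptions:
syllables are passed as hypothetical (onset, nucleus, coda)
strings in the ASCII transcription used in this review
(E = schwa, ~ = vowel nasalization), the sets cover only the
examples cited here, and the plain open-syllable case is an
assumption not exemplified in the review. It models the
surface forms, not the derivation Duanmu proposes.

```python
# Toy model of the [r]-suffixation patterns listed in the review.
# Assumptions: syllables are given as (onset, nucleus, coda) strings in
# the review's ASCII transcription (E = schwa, ~ = vowel nasalization);
# the sets below cover only the review's examples.

REPLACED_BY_R = {"i", "n"}   # codas described as incompatible with [r]
KEPT_BEFORE_R = {"u"}        # codas to which [r] is simply added
SCHWA_TRIGGERS = {"i"}       # only high nucleus exemplified in the review


def add_r(onset: str, nucleus: str, coda: str = "") -> str:
    """Return the suffixed surface form of a stem syllable."""
    if coda == "ng":
        # Surface form as transcribed in the review: nasalized vowel + r.
        return onset + nucleus + "~r"
    if coda in REPLACED_BY_R:
        # The incompatible coda is replaced by [r].
        return onset + nucleus + "r"
    if coda in KEPT_BEFORE_R:
        # The compatible coda is kept and [r] is added.
        return onset + nucleus + coda + "r"
    if coda == "" and nucleus in SCHWA_TRIGGERS:
        # High nucleus: a schwa appears between the vowel and [r].
        return onset + nucleus + "Er"
    if coda == "":
        # Assumption (not exemplified in the review): [r] is simply added.
        return onset + nucleus + "r"
    raise ValueError(f"coda {coda!r} is not covered by this sketch")


if __name__ == "__main__":
    for stem in (("p", "a", "i"), ("p", "a", "u"), ("", "i", ""),
                 ("p", "a", "n"), ("p", "a", "ng")):
        print(stem, "-->", add_r(*stem))
```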

Chapter 10 lays out the basic properties of tones in SC.
The discussion starts from the phonetic correlates of tone
and the different systems of tonal transcription used in
Chinese linguistics including IPA, number marking, and Chao
letters. The author then moves on to discuss the
phonological representation of tone and proposes to
represent tone levels with two features--Register and
Pitch. The two values for Register are stiff vocal cords
and slack vocal cords, which indicate non-murmured and
murmured quality of the vowel respectively. The two values
for Pitch are thin vocal cords and thick vocal cords, which
indicate high and low pitch respectively. The cross-
classification of these two features yields four possible
levels of tonal contrasts. The author then takes on the
issues of contour tone representation and tone-bearing unit
(TBU). He argues that contour tones are combinations of
level tones, and the TBU is not the syllable or the rhyme,
but the moraic segment. This approach accounts for the
fact that a weak syllable in SC can carry only level tones,
a full syllable can carry simple contour tones, and complex
contour tones are usually restricted to final position
only, where the syllable is lengthened. In this system,
the four Mandarin tones on nonfinal full syllables--55, 35,
21, and 51 in Chao letters--are represented as [-mur, H],
[-mur, LH], [+mur, L], and [-mur, HL] respectively, each
associated with two TBUs. On monosyllables or final
position of di- or polysyllabic words, the third tone 21
lengthens the syllable to trimoraic and is realized as 214.
Various phonetic variations of these tones in different
contexts are discussed in the light of these
representations. The most interesting one also relates to
the stress theory of SC proposed earlier: there is a
difference in the realization of a final third tone between
[M N] and [V O]. For 'sai ma', when it is a [M N]
structure, which means 'a race-horse', the third-toned 'ma'
is more likely to be realized as 21; but when it is a [V O]
structure, which means 'to race horses', the third-toned
'ma' must be realized as 214. The account of these facts
is again Nonhead Stress. In [M N], N is the head and thus
not stressed, while in [V O], O is the nonhead and thus
stressed. It is more likely for 214 to surface on a more
heavily stressed syllable.
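
The link between syllable weight and tonal inventory
described above can be illustrated with a small Python
sketch. It assumes, for illustration only, that a tone is a
string of level targets ("H", "L"), that each target needs
its own mora, and that a level tone may spread over any
remaining moras. The names MORAS, SC_TONES, and can_carry
are hypothetical; the four tone representations are the
ones quoted in this review.

```python
# Sketch of the moraic tone-bearing-unit idea summarized above.
# Assumptions: a tone is written as a string of level targets ("H", "L"),
# each target needs its own mora, and a level tone may spread over any
# remaining moras. Names and encoding are illustrative only.

MORAS = {"weak": 1, "full": 2, "lengthened_final": 3}

# Nonfinal full-syllable tones as given in the review:
# Chao letters -> (register feature, level targets).
SC_TONES = {
    "55": ("-mur", "H"),
    "35": ("-mur", "LH"),
    "21": ("+mur", "L"),
    "51": ("-mur", "HL"),
}


def can_carry(targets: str, syllable_type: str) -> bool:
    """True if the syllable type has at least one mora per tone target."""
    return len(targets) <= MORAS[syllable_type]


if __name__ == "__main__":
    for chao, (register, targets) in SC_TONES.items():
        print(chao, register, targets, can_carry(targets, "full"))
```

On this encoding a weak syllable admits only level tones, a
full syllable admits two-target contours, and a three-target
contour fits only on a lengthened final syllable.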

Chapter 11 focuses on the infamous third tone sandhi rule
in SC. The exact application of this sandhi rule,
especially in longer strings, is dependent on many factors,
such as the syntactic branching, syntactic categories
involved in the structure, and emphasis. When the sandhi
can apply in more than one way, several alternative surface
patterns may result, and in some cases the rule is
optional. The author proposes an analysis
that accounts for all the facts. The analysis again relies
on the syllabic trochee foot structure and Nonhead Stress
and considers that the sandhi rule applies cyclically
starting from each foot, and it is optional between two
cyclic branches. The advantages of this metrical approach
over previous approaches are also discussed.

The final chapter of the book, Chapter 12, raises further
issues in SC phonology that the author has not covered.
They include various connected speech phenomena such as
consonant reduction, rhyme reduction, vowel devoicing, and
syllable merger; phonological processes in other Chinese
dialects, such as tone sandhi in Wu and Min dialects and
rhyme changes under affixation in various dialects; and
properties of Taiwanese-accented SC.


As I said at the beginning of the review, this book is
a comprehensive study of the phonology of Standard Chinese.
Not only is it rich in detailed and amazingly accurate
factual description, it also proposes elegant theoretical
solutions to many long-standing problems in Chinese
phonology, such as word length variation, word order, and
the application of the third tone sandhi. Another great
strength of the book is that in every chapter, the
generative literature on related issues is carefully
reviewed. Therefore it can also serve as a great reference
book for the past advances in Chinese generative phonology.
Moreover, the book is written in a down-to-earth fashion
and is very approachable by anyone with the slightest
interest in Chinese languages or phonology but relatively
little training in either area.

As for the specific issues discussed in the book, Chapters
5 to 8 are probably my personal favorites. I find the
author's arguments for the metrical structure of Chinese
extremely cogent. The stress, word length, and word order
problems have been long-standing in Chinese phonology, and
the author's approach is the most systematic and
comprehensive I have seen on these subjects. It is hard
not to be deterred by the numerous phonetic studies that
show the lack of consistent acoustic correlates and speaker
intuition for SC stress, as many researchers are, but the
author convincingly shows that stress must be
phonologically relevant for SC, since otherwise many
phenomena would go unexplained. Moreover, the author also
shows that this stance is not necessarily in conflict with
the phonetic studies, since unlike English, SC is a
language with tonal contrasts, therefore its pitch cannot
be freely used to mark stress.

I also like the author's position on the syllable structure
of SC. Distinguishing SC syllables into two categories CVX
and CV is phonetically accurate and phonologically
beneficial, considering, for example, rhyming, which does
not take into account the prenuclear glide; and tonal
distribution, which clearly displays the distinction
between full and weak syllables--full syllables can carry
contrastive tonal contours, while weak syllables cannot.

But there are also issues on which I do not completely
agree with the author.

In his discussion of the sound inventory of SC, the author
does not seem to have a clear stance on the status of
phonemic economy in phonemic analysis. At one point, he
questions the importance of this concept (p.17), but later
on, when proposing the treatment of palatals as dental+j
combinations, he takes phonemic economy as one of the
arguments for it (p.87). This reader would have liked to
see more detailed discussion on the psycholinguistic
importance of phonemic economy if it was going to be taken
as an argument for an analysis, since the concept seems to
have followed more from the tradition of treasuring
elegance and symmetry than from any such importance.

I am also not sure about the arguments presented in the
book for treating palatals as dental+j combinations. The
author's main argument comes from the palatal-fronting
speakers in Beijing, who pronounce palatal consonants as
dental+j combinations. The author argues that the rule
that makes Dorsal a major articulator alongside Coronal,
which would create the palatals phonetically, does not
apply to the palatal-fronting speakers, and that only by
treating the palatals this way can the relation between the
two types of speakers be captured. But the author has
noted that palatal-fronting speakers are mostly children
and young women. In fact, palatal fronting is associated
with high femininity. Therefore, even if the palatals are
just palatals underlyingly, when the speaker on the one
hand values femininity highly and thus prefers high
frication noise, but on the other hand only allows a
minimum deviation from the underlying form, palatals will
surface as dental+j combinations. There is no need to
consider them to have come from the same underlying source.
Moreover, this accounts for why palatal-fronting speakers
are mostly children and young women. But Duanmu's approach
here provides no explanation as to why this particular
group of speakers is more prone to dropping the rule that
makes Dorsal a major articulator.

In the discussion of phonetic realizations of SC rhymes,
Duanmu uses the rhyming groups as evidence for determining
the surface forms, and in turn, underlying forms of the
rhymes. E.g., since [in] and [yn] belong to the same
rhyming group, he considers them to have the same
surface rhyme, i.e., [yn] is in fact [yin], and [yin] comes
from underlying /win/ by spreading the high vowel to the
onset. This practice is somewhat worrisome, since it is
well known that semi-rhymes are commonly practiced in the
poetry of many languages. So there is no guarantee that
two rhymes are identical even though we know that they
rhyme. Thus it is entirely possible that [in] and [yn]
rhyme because they are phonetically similar, not because
they are identical. Duanmu is in fact aware of this
problem and discusses it in the 'Further Issues' section of
Chapter 3. But without a clear argument against this
approach, it is not clear to me why he opts for the
alternative that makes more assumptions. This also brings
me to a minor point: in the appendix, Duanmu lists all the
full syllables in SC, with both underlying and surface
forms. But a few of the listed items do not seem to agree
with what is argued for in the text. E.g., in the text,
all glides are underlyingly high vowels, but in the
appendix (p.276), the underlying forms for [waa], [wai],
[wan], etc. are listed with glide onsets--/wa/, /wai/,
/wan/, etc.; in the text, [yn] is argued to be [yin], which
comes from underlying /win/, but in the appendix (p.277),
the underlying form for the same syllable is written as

Regarding the n-ng asymmetry in [r] suffixation, Duanmu
argues that n is incompatible with [r] and must be
replaced, and ng is compatible with [r], so the
[+retroflex] feature is simply added onto the syllable with
an ng ending. He also argues that [CVngr] is
phonetically similar to [CV~r], since Wang's (1993)
phonetic study shows that SC nasal codas are nasal glides
without a velar closure. But Zhang (2000) shows that in
the unsuffixed form, [CVng] has a clear dorsal raising
movement, but in the suffixed form, there is no trace of
dorsal raising left. This indicates that [CVngr] and
[CV~r] are phonetically different. So if the analysis
outputs [CVngr] instead of [CV~r], the pattern is not fully
explained. Zhang (2000) offers a perceptual account
couched in Optimality Theory to explain the n-ng asymmetry
and shows that the factorial typology of the proposed
constraints produces only attested patterns.

I have some reservations about the use of binary features
Register and Pitch to represent tonal levels.
Phonetically, it seems unlikely that if a language has four
level tones, it uses exactly the combinations of
stiff/slack and thin/thick vocal cords to implement these
pitch levels. Moreover, it is not clear to me what the
author means by 'murmur'. On the one hand, he equates
murmur with breathiness, and he states that it 'correlates
with broader formant width and flatter spectral envelop'
(p. 213), although a flatter spectral envelope is the
spectral characteristic of creakiness. On the other hand,
he claims that the third tone in SC has a [+murmur]
quality, but it is well known that this tone is creaky and
is unlikely to have slack vocal cords. Phonologically, the
feature system makes the typologically odd prediction that
in a language with four level tones, the middle two tones,
although they phonetically have the most similar pitches,
will not behave as a natural class, since they have exactly
the opposite feature values. This prediction has been
successfully challenged by Tsay (1994) and her later work.
The scalar
representation proposed by Tsay seems to be a better
characterization of phonological pitch levels.
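
The natural-class prediction can be checked mechanically.
The Python sketch below cross-classifies the two binary
features and counts shared feature values between tone
levels. It assumes that Register outranks Pitch when the
four levels are ordered from highest to lowest; the names
LEVELS and shared_features are hypothetical and the feature
labels follow the summary of Chapter 10 given earlier.

```python
from itertools import product

# Assumptions: register outranks pitch when ordering the four levels,
# and the feature labels follow the review's summary of Chapter 10.
REGISTER = ("stiff", "slack")   # non-murmured, murmured
PITCH = ("thin", "thick")       # high, low

# Highest to lowest under the stated assumption:
# (stiff, thin), (stiff, thick), (slack, thin), (slack, thick)
LEVELS = list(product(REGISTER, PITCH))


def shared_features(a, b):
    """Count the feature values two tone levels have in common."""
    return sum(x == y for x, y in zip(a, b))


if __name__ == "__main__":
    for i in range(len(LEVELS) - 1):
        hi, lo = LEVELS[i], LEVELS[i + 1]
        print(hi, lo, "shared:", shared_features(hi, lo))
```

Under this ordering the two middle levels share no feature
value, while each of them shares one value with its outer
neighbor, which is the asymmetry questioned above.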

Regarding the representation of contour tones as sequences
of level tones, there might be both psycholinguistic and
phonetic evidence against it. Psycholinguistically, Wan
(1999) and Wan and Jaeger (1999) show from the behavior of
speech errors that SC tones are underlyingly unitary and
are not made of level tone sequences, since no tone blends
or tone spreading errors were observed. (On a related
note, it is somewhat unfortunate that the works by Wan and
Jaeger on speech errors were not cited as evidence for or
against various proposals on the sound inventory and
syllable structure of SC.) Phonetically, Xu (1998) shows
from the consistency of tone-syllable alignment across
different conditions that contour tones are probably
implemented as dynamic targets rather than sequences of
static targets. Zhang (2001), on the other hand, shows
that even if contour tones are considered units, all is not
lost regarding the relation between syllable duration and
contour tone distribution. If all tones are associated
with a tonal complexity index, and a complex contour tone
has a higher tonal complexity than a simple contour tone,
and a simple contour tone has a higher tonal complexity
than a level tone, then the relation between tonal
distribution and duration can still be successfully drawn.
Moreover, Zhang (2001) shows that only an enriched
representation like this can capture all the attested
patterns of contour tone distribution.

Finally, as I read the book, I was glad on the one hand
that it was generally presented in a theory-neutral
fashion, since this makes it approachable to a wider
audience and leaves more room for the presentation of the
data. On the other hand, I also secretly hoped for a
consistent theoretical framework that the book held onto,
so that more accurate predictions could be made using the
general principles proposed by the author, and more
accurate comparisons could be made with other proposals.
As it is, Chapter 3 (sound
combinations and variation) is presented in a somewhat odd
version of Optimality Theory, which looks like it has
restrictions on the input, but no faithfulness constraints.
The remaining chapters are presented in a combination of
rule-based and constraint-based frameworks, which at times
makes the proposals hard to evaluate.

In summary, although there are various points where my view
diverges from the author's, overall I agree with the author
much more than I do not. The book will prove to be an
invaluable resource for anyone interested in the study of
Chinese phonology, and it has set a high standard for
researchers in Chinese phonology to follow. Somewhat
unrelatedly, one particularly enjoyable moment of the book
comes in the preface, when the author compares the
question 'which language do you study' to a linguist with
the question 'which country do you study' to a geologist.
At least now we know what to say when we encounter a
question like that.


Jie Zhang received his Ph.D. in linguistics at UCLA in June
2001 and will be a lecturer in the Dept. of Linguistics at
Harvard University for the academic year 2001-2002. His
research interests include phonology, phonetics, the
effects of phonetics in phonological patterning, patterns
of tones and nasals, Chinese languages, Athapaskan
languages, and Otomanguean languages.


