LINGUIST List 12.11

Sun Jan 7 2001

Review: Packard: The Morphology of Chinese

Editor for this issue: Andrew Carnie <carnie@linguistlist.org>




What follows is another discussion note contributed to our Book Discussion Forum. We expect these discussions to be informal and interactive; and the author of the book discussed is cordially invited to join in. If you are interested in leading a book discussion, look for books announced on LINGUIST as "available for discussion." (This means that the publisher has sent us a review copy.) Then contact Andrew Carnie at carnie@linguistlist.org.

Directory

  • Richard Sproat, Review of Packard 2000

    Message 1: Review of Packard 2000

    Date: Tue, 19 Dec 2000 16:02:02 -0500
    From: Richard Sproat <rws@research.att.com>
    Subject: Review of Packard 2000


    Jerome L. Packard (2000) The Morphology of Chinese: A Linguistic and Cognitive Approach. Cambridge University Press, xvi, 335 pp. ISBN 0-521-77112-9

    Reviewed by: Richard Sproat, AT&T Labs -- Research, Florham Park, NJ, USA.

    The phrase "Morphology of Chinese" may seem to many to be almost an oxymoron: not only does Chinese at first glance appear to have very little of what could be called "morphology", but the very notion of "word" in Chinese is often called into question, stemming in part from the fact that Chinese orthography does not mark word boundaries.

    Packard's work on Chinese morphology, which spans more than a decade, argues against that view, and shows that there is a great deal more structure to Chinese morphology than meets the eye. The present book continues in that tradition. However, far from being merely a synthesis of Packard's previous writings, this book presents a wealth of new material, including studies of various types of Chinese word constructions at a level of detail found nowhere else. This book is the most comprehensive single treatment of Chinese morphology available in English, or possibly in any language.

    One point that needs to be clarified at the outset is that by "Chinese", Packard means Standard Chinese, or in other words Mandarin. This is a conventional enough equation, but it is as well to note that other Chinese languages are not treated at all.

    The book is divided into eight chapters. Chapter 1 comprises a brief introduction and a motivation for the rest of the work, discussing a few of the common misconceptions about Chinese. A particularly poignant example from page 3: "Time and again, when I tell people that I work in Chinese linguistics, I get a response like: `Oh, Chinese makes sentences by putting characters together, right?', as if, unlike the rest of the world's languages, Chinese enables spoken communication by the oral exchange of little visual icons." In my work on Chinese speech synthesis I've had occasion to hear or read plenty of similar nonsense about Chinese, so Packard has my sympathies.

    Chapter 2 turns to "Defining the word in Chinese". The chapter begins by going through some standard notions of word --- orthographic, sociological, lexical, semantic, phonological, morphological, syntactic --- and ends with the notion of "psycholinguistic word", defined to be the "`word' level of linguistic analysis that is ... salient and highly relevant to the operation of the language processor" (p. 13). The discussion then turns to the definition of word in Chinese, and in particular to the notion of word -- syntactic word -- that Packard uses as the basis for defining the topic of the rest of this work. Packard mentions in passing that the definition of the Chinese word has important implications for computational systems that process Chinese text, since such systems must segment it into words. This is true, and it might be interesting to consider the extant proposed segmentation standards (GB 1993; Huang et al. 1997; Xia 1999) in light of Packard's proposals.

    Chapter 3 turns to the consideration of the "inner constituents" of Chinese words, what they are, and how they relate to each other. After considering various alternatives, including the idea that Chinese word structure may actually be derived by purely syntactic principles, Packard arrives at the conclusion that the best description of the components of Chinese words is in terms of "form class", or grammatical "part of speech". Thus "chu1-ban3" (emit-edition) `publish' may be considered to be a (bimorphemic) verb with the structure [V N]_V. (I'll use numeric indicators of tone and "_" and "^" to denote, respectively, subscripts and superscripts.) Packard develops this idea by noting that clearly unambiguous morphemes tend to retain their form class identities: "zhi3" `paper' is a noun in whatever word it occurs in. The constancy of such morphemes allows for the formulation of the Headedness Principle, which is central to the rest of the discussion: nouns have their head on the right, and verbs have their head on the left. With ambiguous morphemes -- e.g. "zheng4" `proof' (N) or `prove' (V) -- one finds, as one might expect, that the form class appropriate for a particular usage in a given word depends upon the position in that word and the part of speech of the word: "zheng4" in constructions [X-zheng4]_N tends to have the noun reading, whereas in [zheng4-X]_V it tends to have the verb reading. Nonetheless, exceptions to the Headedness Principle do occur: so "cai3-pai2" (color-rehearse) `dress-rehearse' is a verb despite its [N-V] structure. These are to be allowed by the grammar, but are predicted to have exceptional properties, a point we return to below. 

    Another important property of Chinese morphology is the large number of bound morphemes, that is, morphemes that cannot occur as free-standing words. For example, "shi2" `rock' occurs in many derived words -- "shi2-tou" (rock-AFF) `stone', "shi2-you2" (rock-oil) `petroleum', "bao3-shi2" (treasure-rock) `gem' -- but never alone. Packard counters previous claims that the bound-free distinction is vague or fluid by noting that differences in the boundness of a given morpheme invariably relate to differences in meaning or in register. Of course, bound and free roots are not the only types of morphemes that Chinese possesses, and Packard argues that Chinese also has a (modest) set of word-forming and grammatical affixes.
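
    The Headedness Principle lends itself to a small mechanical check. The Python sketch below is purely illustrative and is not Packard's own formalism. The toy lexicon FORM_CLASS, the tagging of "cai3" as N and "pai2" as V (read off the [N-V] analysis above), and the helper names head_of, conforms and reading are all hypothetical. The sketch tests whether the head of a two-member word bears the word's own form class. It also shows how an ambiguous morpheme such as "zheng4" takes the word's reading when it sits in head position. "X" stands for an arbitrary partner morpheme.

```python
from typing import Dict, Set

# Hypothetical toy lexicon of form classes (pinyin with tone numbers).
# "zheng4" is ambiguous between noun `proof' and verb `prove'.
FORM_CLASS: Dict[str, Set[str]] = {
    "chu1": {"V"},    # emit
    "ban3": {"N"},    # edition
    "cai3": {"N"},    # color
    "pai2": {"V"},    # rehearse
    "zheng4": {"N", "V"},
}

# Headedness Principle: nouns are right-headed, verbs are left-headed.
HEAD_INDEX = {"N": 1, "V": 0}


def head_of(left: str, right: str, word_class: str) -> str:
    """Return the constituent in head position of a two-member word."""
    return (left, right)[HEAD_INDEX[word_class]]


def conforms(left: str, right: str, word_class: str) -> bool:
    """True if the head constituent can bear the word's own form class."""
    head = head_of(left, right, word_class)
    return word_class in FORM_CLASS.get(head, set())


def reading(morpheme: str, left: str, right: str, word_class: str) -> Set[str]:
    """Form-class readings of `morpheme` inside the word.

    An ambiguous morpheme in head position is resolved to the word's class;
    elsewhere all of its lexical classes are returned unchanged.
    """
    classes = FORM_CLASS.get(morpheme, set())
    if morpheme == head_of(left, right, word_class) and word_class in classes:
        return {word_class}
    return classes
```

On this encoding, conforms("chu1", "ban3", "V") holds, whereas conforms("cai3", "pai2", "V") fails, which matches the status of "cai3-pai2" as an exception. A real analysis would need graded tendencies rather than a yes/no test.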

    Chapter 4 provides an exhaustive taxonomy of the types of Chinese words, with a nice set of examples of each type. Two construction types are also discussed in some detail: resultative verbs, and verb-object compounds such as "chu1-ban3" (emit-edition) `publish'. Both of these constructions have been the topic of much controversy in Chinese linguistics, but verb-object (VO) compounds have been particularly controversial since they seem to be able to behave like phrases and words at the same time. Packard argues, somewhat along the lines of Huang (1984), that VO compounds may be either words or phrases depending upon the construction one finds them in, but that once a VO construction becomes a word via lexicalization, it is basically a word, and admits of only limited phrasal reanalysis. (Note though that, as Packard shows, the object from the verb-object construction can even prepose, so it is not entirely clear in what sense the reanalysis is "limited".) This dual status is not as odd as it may seem, and Packard invites us to compare it with such well-known psychological phenomena as the Necker cube. The chapter ends with a statistical analysis of Chinese word classes, with the statistics overwhelmingly supporting the Headedness Principle as a general tendency.

    The core of the book is Chapter 5, which offers an X-bar analysis of Chinese words. After reviewing previous proposals for X-bar analyses of morphology (especially the work of Selkirk and Sadock), Packard arrives at the following simple rule set for Chinese, where X^(-0) elements are free, X^(-1) elements are bound, X^W are word-forming affixes and G are grammatical affixes:

    X^(-0) -> X^(-0,-1,W), X^(-0,-1,W)

    X^(-0) -> X^(-0), G

    In addition, it is stipulated that there can be only one X^W at any level, or in other words that *X^W,X^W is ill-formed. Furthermore, only X^(-0) and X^(-1) can serve as the base of words since, for X^W, "the superscript is a letter and not a numeral" and therefore "can never be `incremented' to the point where they can serve the traditional function of `stem'". Similarly, "for grammatical affixes, the fact that the category designation is `G' and not `X' indicates that these items can never function as the heads of words" since "there are no words of the form *[X^(-0) G]_G" (page 165). This does, of course, have the flavor of hackery, though a more charitable interpretation would be that Packard is making use of a type system that he does not otherwise elaborate on. The X-bar system that Packard develops is recursive at the X^(-0) level, which allows for various kinds of branching structures, many of which occur, and some of which do not. Much of the rest of the chapter is devoted to a catalog of these cases, replete with a nice collection of examples of each. As Packard notes, some of the gaps are probably accidental, and indeed it is not hard to see how one might fill a few of them. So while Packard did not find examples of [X^(-1) [X^W X^(-1)]] (page 181), one can certainly construct plausible examples: [shi2-[lao3-hu3]] (stone PREF-tiger) `stone tiger', where "shi2" and "hu3" are both bound, and "lao3" is apparently a word-forming prefix. The chapter ends with a discussion of various notions of morphological "head" as applied to Chinese, and with an interesting application of the X-bar model to English morphology.
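
    The two rules and the one-X^W stipulation are simple enough to state as a recognizer. The following Python sketch is a hypothetical encoding, not anything given in the book. Leaf labels "X0", "X-1", "W" and "G" stand for X^(-0), X^(-1), X^W and grammatical affixes, and a binary pair stands for a branching node. The function name category is likewise an assumption. It returns "X0" for a licensed structure and None for an ill-formed one. Because no rule ever produces G from a branching node, the sketch also captures the claim that grammatical affixes cannot head words.

```python
from typing import Optional, Tuple, Union

# Hypothetical encoding of the review's notation:
#   "X0" = X^(-0) (free)              "X-1" = X^(-1) (bound)
#   "W"  = X^W (word-forming affix)   "G"   = grammatical affix
# A structure is either a leaf label or a (left, right) branching node.
Tree = Union[str, Tuple["Tree", "Tree"]]

# Daughters allowed by rule 1: X^(-0) -> X^(-0,-1,W), X^(-0,-1,W)
RULE1_DAUGHTERS = {"X0", "X-1", "W"}


def category(tree: Tree) -> Optional[str]:
    """Return the category of a structure, or None if no rule licenses it.

    A bare leaf simply returns its own label.
    """
    if isinstance(tree, str):
        return tree
    left, right = (category(t) for t in tree)
    if left is None or right is None:
        return None  # an ill-formed daughter spoils the whole word
    # Rule 2: X^(-0) -> X^(-0), G
    if left == "X0" and right == "G":
        return "X0"
    # Rule 1, plus the stipulation that *X^W,X^W is ill-formed
    if left in RULE1_DAUGHTERS and right in RULE1_DAUGHTERS and not (left == right == "W"):
        return "X0"
    return None  # e.g. G on the left, or G attached to a bound root
```

The constructed example [shi2-[lao3-hu3]] corresponds to ("X-1", ("W", "X-1")), which the sketch accepts. ("W", "W") is rejected by the one-X^W stipulation.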

    Chapter 6 starts with a discussion of lexicalization, exceptions to the Headedness Principle, and the availability of word-internal information to various (including morphological) processes. Words may be lexicalized to various degrees. These range from purely "conventional" lexicalization -- e.g. "chi1-fan4" (eat-rice) `eat (a meal)', where the constructions are fairly compositional semantically and completely regular morphologically, but have come to be treated as words through usage -- through "asemantic lexicalization" ("wen4-shi4" (ask-world) `to be published'), to complete lexicalization ("shao1-mai4" (burn-sell) `(type of) dumpling'). Lexicalization is related to Packard's earlier proposals for a stratum-ordered lexicon for Chinese, which he briefly introduces here: roughly speaking, the more lexicalized the construction, the deeper the stratum. (Note though that, in contrast to his extensive exemplification of other proposals, he does not offer much in the way of support for this linkage.) Exceptions to the Headedness Principle range from neologisms and phonetic loans (e.g. from English), to zero-derived words (e.g. the V-V noun "mai3-mai4" (buy-sell) `business'), to cases like N-V verbs that simply fail to observe the general trends. With respect to the latter, Packard reiterates a claim of earlier work that the N-V structure of verbs like "cai3-pai2" (color-rehearse) `dress-rehearse' prevents them from undergoing A-not-A question formation (*"cai3-bu4-cai3-pai2" (color-not-color-rehearse) `dress-rehearse?'). Packard then argues that various kinds of information -- phonological, morphological, syntactic, and semantic -- can be rendered unavailable by the effects of lexicalization.

    The final section of the chapter deals with the formation of new words in Chinese, and includes an interesting discussion of Chinese "abbreviations" (Chinese "suo1-xie3" or `shrunken writing'). These turn out to be an important source of new morphemes (or, if one prefers, new meanings for old morphemes). Thus "tuo1" `entrust' in "GONG1-ban4 TUO1-er2-suo3" (public-run entrust-child-place) `publicly run day-care center', via the abbreviation "gong1-tuo1" `publicly run day-care center', has now come to mean `day-care center'. This is evidenced by the verb "ru4-tuo1" (enter-entrust) `enter a day-care center'. But most new morphemes formed in this way are available only within other words, meaning that most new morphemes in Chinese are bound.

    Chapter 7 discusses the question of what is meant by "the lexicon". Packard reviews psycholinguistic evidence on the comprehension and production of both speech and written language. He concludes that the evidence supports the notion that what is accessed in "lexical access" is words rather than, for example, morphemes. What, then, is listed in the lexicon? Apart from bound roots and words with idiosyncratic meanings, Packard makes the specific proposal that any word that is familiar to a speaker is listed, whether it is regular in its construction or not. The only systematic exceptions to this are words formed with grammatical affixes, which are not stored because they are completely regular. The chapter ends with a brief discussion of the relation between the writing system of Chinese and the Chinese lexicon: is a Chinese speaker's knowledge of Chinese morphology in any way influenced by their knowledge of the written form of the language? While Packard argues that there is a relationship, he nonetheless is unequivocal in his adoption of the Bloomfieldian view that spoken language is primary and written language only secondary. In particular, he argues against the view that the entries in the mental lexicon (of literate speakers) *contain* orthographic entries, as opposed to merely positing *connections* between orthographic information and the legitimate (semantic, phonological, grammatical) information in lexical entries. This view is likely to be widely accepted. However, as the author of a recent work that proposes the alternative view -- Sproat (2000), where orthographic information is considered to be a part of lexical entries -- I must point out that it is not the only coherent view. Orthography is clearly learned later and furthermore with explicit instruction, but from this it does not follow that orthographic knowledge cannot become part of what a literate speaker knows about words, and therefore part of the lexical entries of those words.

    Chapter 8 provides a concise summary of the work, and what we can conclude from it.

    If I had to point to the single most important feature of this book, it would be the large number of examples with which Packard illustrates his claims about the structure of Chinese words. Thus, although the book's primary goal is theoretical, it also serves as a useful descriptive taxonomy of a large range of morphological constructions. Only a few constructions are not discussed at all. For example, morphologically complex adjectives (or "stative verbs") such as "fen3-hong2" (powder-red) `pink' are missing, and it is not entirely clear what Packard's theory should say about these. Some of the claims, such as the putative non-availability of A-not-A reduplication in N-V verbs and the arguments for phonological opacity, have been discussed elsewhere -- in particular, see Sproat and Shih (1993) -- and anyone considering developing Packard's ideas further would do well to look at this earlier literature.

    I applaud Packard's use of Chinese characters throughout the book, alongside pinyin transcriptions (though personal taste would have led me to use traditional characters, rather than the simplified ones that he uses).

    I always like to end with a note on production quality. On the whole, the quality is quite good. I noted only one table whose left edge was cut off and a few stray formatting characters that made their way into the text.

    REFERENCES.

    GB. 1993. Contemporary Chinese language word-segmentation specification for information processing. GB/T 13715-92.

    Huang, Chu-Ren; Chen, Keh-Jiann; Chang, Lili; and Chen, Feng-yi. 1997. Segmentation standard for Chinese natural language processing. International Journal of Computational Linguistics and Chinese Language Processing. 2(2), 47-62.

    Huang, James. 1984. Phrase structure, lexical integrity, and Chinese compounds. Journal of the Chinese Language Teachers Association. 19(2), 53-78.

    Sproat, Richard. 2000. A Computational Theory of Writing Systems. Cambridge University Press.

    Sproat, Richard; and Shih, Chilin. 1993. Why Mandarin morphology is not stratum-ordered. Yearbook of Morphology. 185-217.

    Xia, Fei. 1999. Segmentation guideline. Chinese Treebank Project. http://morph.ldc.upenn.edu/ctb/

    -

    Richard Sproat works in the Human/Computer Interaction Research department at AT&T Labs -- Research. His research interests include computational linguistics, speech synthesis and recognition, writing systems, morphology and Chinese linguistics.