AUTHORS: Richard Zhonghua Xiao, Paul Rayson, Tony McEnery
TITLE: A Frequency Dictionary of Mandarin Chinese
PUBLISHER: Routledge (Taylor and Francis)
YEAR: 2009
Michael Grosvald, Center for Mind and Brain & Department of Linguistics, University of California at Davis
This book is intended to help learners of Mandarin Chinese improve their vocabulary, and to serve as a resource for teachers of the language. It is centered on the idea that an efficient way to go about learning new vocabulary is to target the most frequently-used words first. To this end, the main body of the book presents a list of 5000 words, ordered according to frequency of occurrence within a 50-million word corpus that was created for this purpose.
The book begins with an introduction in which the usefulness and some history of this approach to vocabulary learning are discussed, and in which the corpus data are presented in some detail. As the authors explain, the corpus was compiled from a number of sources, broadly categorized as Spoken (3.4 million words), News (16 million), Fiction (15 million), and Non-fiction (15 million); the corpus is therefore weighted more heavily toward written- than spoken-language sources. The introduction also presents some basic information about the Chinese language itself, including an explanation of the difficulty of segmenting Chinese text into words and how segmentation and part-of-speech tagging were performed on this corpus.
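To make the written-language weighting concrete, the shares of each source category can be worked out from the figures quoted above. This is only a minimal sketch: the sizes are the rounded numbers given in this review (in millions of words), and the variable names are illustrative rather than anything used in the book.

```python
# Corpus composition as quoted in this review, in millions of words.
# These are rounded figures, so the total comes to slightly under 50 million.
corpus_millions = {
    "Spoken": 3.4,
    "News": 16.0,
    "Fiction": 15.0,
    "Non-fiction": 15.0,
}

total = sum(corpus_millions.values())

# Percentage share of each category within the whole corpus.
shares = {name: 100 * size / total for name, size in corpus_millions.items()}

# Everything that is not Spoken counts as written-language material here.
written_share = 100 - shares["Spoken"]

for name, pct in shares.items():
    print(f"{name:12s} {pct:5.1f}%")
print(f"{'Written':12s} {written_share:5.1f}%")
```

On these figures, spoken sources make up roughly 7 percent of the corpus, with written sources accounting for the rest.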
The book then proceeds to the frequency index itself, which forms the main body of the book. The words are listed in descending order of frequency; each word entry includes both the traditional- and simplified-character versions of the word, its part of speech and English gloss, a sentence illustrating the word's use (given in simplified Chinese with an English translation), and several related statistics including a normalized frequency rating, ''dispersion index'' within the corpus, and usage rate per million tokens. Also given where appropriate or available are a ''register code'' S or W in the case of words occurring significantly more often in the spoken or written texts, and the word's ''level'' (1 to 4) according to the Chinese government's HSK (Chinese Proficiency Test) Committee.
At the back of the book are some other resources, including an ''alphabetical index'' (more on this below), a part of speech index, and a character frequency index.
Many language learners and teachers have used frequency-based dictionaries as an aid for vocabulary study and instruction and can attest to their usefulness. For example, I found Langenscheidt's frequency-based ''Basic Vocabulary'' series extremely useful while studying German and other European languages, and later regretted the lack of such a resource for Mandarin while living and studying in Taiwan. Xiao, Rayson and McEnery's frequency dictionary is just the kind of book I was wishing for then, and is an impressive work that will undoubtedly prove useful to students, teachers, and - because of the detailed statistical information that is given - perhaps to researchers as well.
The large corpus that was used in determining the frequency rankings is a major contributing factor to the project's credibility. As the authors explain in detail, the corpus required an appreciable amount of work to compile, segment and tag. The 15-page introduction explaining this process is well-written and well-researched, provides a detailed discussion of a number of interesting and relevant issues, and makes for fascinating reading in its own right.
The frequency rankings in the main index do seem a bit quirky at times, probably due to the mix of sources used, which as noted above is weighted much more toward written- than spoken-language sources. For example, zhan4chang3 ''battlefield'' (#2602) and ai4zi1bing4 ''AIDS'' (#2605) are listed above (i.e. as more frequent than) shu1fu2 ''comfortable, (physically) well'' (#2632) and tang2 ''sugar, sweets'' (#2638). However, a goal of the book is to permit the efficient learning of vocabulary needed in both spoken- and written-language contexts, and the collective set of words and their general frequency ranking progression (if not their exact numerical rankings) will certainly be useful to learners in this way.
An added bonus is the set of 30 mini-sections of ''thematic vocabulary'' in which words with related meanings (body parts, foods, etc.) are given in separate tables, also in ranked order. The part-of-speech and character indices are also a thoughtful inclusion, as they permit targeted learning in a variety of ways. Even non-learners may find it interesting to browse frequency-sorted lists of nouns, verbs, classifiers, and so on.
One area in which I feel the book has significant shortcomings is formatting, where some of the decisions that were made may in places undermine this work's usefulness for learners. For example, the simplified characters are presented in a small bold typeface and as a result, complex characters can be difficult to read, particularly when placed against the gray backgrounds used to highlight the 30 thematic-vocabulary sections of the book. Along similar lines, transcriptions are always given in the pinyin system, which is certainly reasonable since this is the accepted standard, but such transcriptions are always given in /slashes/. This is redundant since pinyin transcriptions are unlikely to be confused with anything else; more problematically, the ever-present slashes make the pinyin transcriptions harder to read, particularly when the reader attempts to scan through the pinyin-based ''alphabetical'' Chinese-to-English index. Because the pinyin itself is presented in an italicized sans-serif font, the opening slash is easily mistaken for a letter, and every word looks as though it starts with ''l'' (i.e. a lower-case L).
The Chinese-to-English index is loosely ordered by pinyin transcription, but the ordering is really a hybrid of alphabetical and character-by-character arrangement, and so the index is not quite alphabetical despite being labeled as such. For example, /bei4dong4/ ''passive'' appears later in the index than /bei4hou4/ ''behind'' and /bei4jing3/ ''background'' because the ''bei4'' character in the first word is different from the one in the second and third words. On the plus side, this does permit one to scan quickly through word-forms that are related in the sense of sharing the same initial characters. On the other hand, relative to true alphabetical ordering, the current set-up tends to require more search time. Also, since this index is likely to be one of the most frequently referred-to parts of the book, it would probably be more appropriately placed at the very end of the book for easy access; as it is, other sections intervene between the index and the back of the book, requiring the reader to perform an initial ''search'' for the alphabetical index before being able to actually make use of the index itself. While a strategically-placed post-it note can probably solve that problem, the absence of any kind of English-to-Chinese word index may strike learners as a significant inconvenience.
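The difference between the two orderings can be illustrated with the three words mentioned above. This is a rough sketch under stated assumptions, not a reconstruction of the book's actual procedure: the simplified characters are supplied here (the review cites only pinyin), the function names are made up for illustration, and the tie-break between different characters sharing the same syllable (Unicode code point order) is an arbitrary stand-in, since the rule used in the book is not known to me.

```python
# (pinyin with tone numbers, simplified characters, English gloss)
entries = [
    ("bei4jing3", "背景", "background"),
    ("bei4dong4", "被动", "passive"),
    ("bei4hou4", "背后", "behind"),
]

def alphabetical_key(entry):
    """True alphabetical ordering: compare the full pinyin string only."""
    pinyin, _chars, _gloss = entry
    return pinyin

def first_syllable(pinyin):
    """Return the first syllable, i.e. everything up to the first tone digit."""
    for i, ch in enumerate(pinyin):
        if ch.isdigit():
            return pinyin[: i + 1]
    return pinyin

def hybrid_key(entry):
    """Hybrid ordering: group by first syllable, then by first character,
    and only then fall back to the full pinyin string.
    The character comparison uses code point order, which is an assumption."""
    pinyin, chars, _gloss = entry
    return (first_syllable(pinyin), chars[0], pinyin)

alphabetical = [e[0] for e in sorted(entries, key=alphabetical_key)]
hybrid = [e[0] for e in sorted(entries, key=hybrid_key)]

print("alphabetical:", alphabetical)
print("hybrid:      ", hybrid)
```

Under the first key, bei4dong4 sorts ahead of the other two words; under the second, it falls after both of them because its initial character differs, which is the pattern seen in the book's index.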
The formatting in the main body of the book (the frequency index) is efficient in terms of compactness but may also render this section less useful to the learner than it might have been. The entry for each word is a tightly-arranged block listing the word, its pinyin transcription, a Chinese sentence using that word, English translations for the word and sentence, and the several statistical items enumerated earlier in this review. From a learner standpoint, it would be convenient if a pinyin version of the sentence were also given, as this would make it possible for the ambitious learner to pick up on the spot much of the additional vocabulary present in the sentences as it is presented in context. Further, I wonder if the detailed statistical information given for each word in the main frequency index couldn't have been placed neatly next to the word in a different section of the book such as the part of speech index, leaving the word entries in the frequency index less cluttered. This additional space in the main index would then facilitate what I would consider a more learner-friendly format for this part of the book: having the Chinese word and example sentence on one side of the page, and the word and sentence translations in English on the opposite side of the page in an adjacent column. This would permit the learner to quiz him- or herself by covering one column and trying to derive the other (i.e. going from English to Chinese, or vice versa), word by word or sentence by sentence. This was possible with the aforementioned Langenscheidt series, which was formatted in such a way (e.g. with German vocabulary and example sentences in the left column and the English translations in the right column of each page).
In general, these criticisms have little to do with the book's content itself, which as noted above, will no doubt be of great use and interest to many. Since none of the issues I have noted rise to the level of fatal flaws, I have no hesitation in giving this frequency dictionary a positive evaluation overall. Indeed, my main reaction upon perusing this book has been a feeling of regret that it was not available to me years ago when I was first studying Chinese.
ABOUT THE REVIEWER
Michael Grosvald earned his doctorate in Linguistics in 2009 and now works
as a post-doctoral scholar at the Center for Mind and Brain at the
University of California at Davis. His background includes over a decade as
a language instructor in Prague, Berlin, Taipei and the U.S. His interests
include the phonetics and phonology of signed and spoken languages,
psycholinguistics, second language acquisition, and computational linguistics.