LINGUIST List 21.2576

Sat Jun 12 2010

Review: Lexicography: Xiao, Rayson & McEnery (2009)

Editor for this issue: Monica Macaulay

        1.    Michael Grosvald, A Frequency Dictionary of Mandarin Chinese

Message 1: A Frequency Dictionary of Mandarin Chinese
Date: 12-Jun-2010
From: Michael Grosvald
Subject: A Frequency Dictionary of Mandarin Chinese

AUTHORS: Richard Zhonghua Xiao, Paul Rayson, Tony McEnery
TITLE: A Frequency Dictionary of Mandarin Chinese
PUBLISHER: Routledge (Taylor and Francis)
YEAR: 2009

Michael Grosvald, Center for Mind and Brain & Department of Linguistics, University of California at Davis


This book is intended to help learners of Mandarin Chinese improve their vocabulary, and to serve as a resource for teachers of the language. It is centered on the idea that an efficient way to go about learning new vocabulary is to target the most frequently-used words first. To this end, the main body of the book presents a list of 5000 words, ordered according to frequency of occurrence within a 50-million-word corpus that was created for this purpose.

The book begins with an introduction in which the usefulness and some history of this approach to vocabulary learning are discussed, and in which the corpus data are presented in some detail. As the authors explain, the corpus was compiled from a number of sources, broadly categorized as Spoken (3.4 million words), News (16 million), Fiction (15 million), and Non-fiction (15 million); the corpus is therefore weighted more heavily toward written- than spoken-language sources. The introduction also presents some basic information about the Chinese language itself, including an explanation of the difficulty of segmenting Chinese text into words and how segmentation and part-of-speech tagging were performed on this corpus.
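To make the weighting concrete, the component sizes quoted above can be tallied in a few lines of Python. This is only an illustrative sketch: the figures are the rounded ones given in this review, and the `per_million` helper and the raw count passed to it are hypothetical, not taken from the book.

```python
# Component sizes in millions of words, as quoted in this review (rounded).
corpus = {"Spoken": 3.4, "News": 16.0, "Fiction": 15.0, "Non-fiction": 15.0}

total = sum(corpus.values())  # roughly the "50 million" cited for the corpus

# Share of the whole corpus contributed by each component.
shares = {name: size / total for name, size in corpus.items()}
written_share = 1.0 - shares["Spoken"]


def per_million(raw_count, corpus_size_millions=total):
    """Convert a raw token count into a rate per million tokens.

    Hypothetical helper: the book reports a per-million usage rate,
    but its exact computation is not described in this review.
    """
    return raw_count / corpus_size_millions


for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print(f"Written (all non-spoken sources): {written_share:.1%}")
print(f"Rate for a made-up count of 4940 tokens: {per_million(4940):.1f} per million")
```

On these figures the spoken component makes up well under a tenth of the corpus, which is worth keeping in mind when reading the rankings.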

The book then proceeds to the frequency index itself, which forms the main body of the book. The words are listed in descending order of frequency; each word entry includes both the traditional- and simplified-character versions of the word, its part of speech and English gloss, a sentence illustrating the word's use (given in simplified Chinese with an English translation), and several related statistics including a normalized frequency rating, ''dispersion index'' within the corpus, and usage rate per million tokens. Also given where appropriate or available are a ''register code'' S or W in the case of words occurring significantly more often in the spoken or written texts, and the word's ''level'' (1 to 4) according to the Chinese government's HSK (Chinese Proficiency Test) Committee.

At the back of the book are some other resources, including an ''alphabetical index'' (more on this below), a part of speech index, and a character frequency index.


Many language learners and teachers have used frequency-based dictionaries as an aid for vocabulary study and instruction and can attest to their usefulness. For example, I found Langenscheidt's frequency-based ''Basic Vocabulary'' series extremely useful while studying German and other European languages, and later regretted the lack of such a resource for Mandarin while living and studying in Taiwan. Xiao, Rayson and McEnery's frequency dictionary is just the kind of book I was wishing for then, and is an impressive work that will undoubtedly prove useful to students, teachers, and - because of the detailed statistical information that is given - perhaps to researchers as well.

The large corpus that was used in determining the frequency rankings is a major contributing factor to the project's credibility. As the authors explain in detail, the corpus required an appreciable amount of work to compile, segment and tag. The 15-page introduction explaining this process is well-written and well-researched, provides a detailed discussion of a number of interesting and relevant issues, and makes for fascinating reading in its own right.

The frequency rankings in the main index do seem a bit quirky at times, probably due to the mix of sources used, which as noted above is weighted much more toward written- than spoken-language sources. For example, zhan4chang3 ''battlefield'' (#2602) and ai4zi1bing4 ''AIDS'' (#2605) are listed above (i.e. as more frequent than) shu1fu2 ''comfortable, (physically) well'' (#2632) and tang2 ''sugar, sweets'' (#2638). However, a goal of the book is to permit the efficient learning of vocabulary needed in both spoken- and written-language contexts, and the collective set of words and their general frequency ranking progression (if not their exact numerical rankings) will certainly be useful to learners in this way.

An added bonus is the set of 30 additional mini-sections of ''thematic vocabulary'' in which words with related meanings (body parts, foods, etc.) are given in separate tables, also in ranked order. The part-of-speech and character indices are also a thoughtful inclusion, as they permit targeted learning in a variety of ways. Even non-learners may find it interesting to browse frequency-sorted lists of nouns, verbs, classifiers, and so on.

One area in which I feel the book has significant shortcomings is in some of the decisions that were made regarding formatting, which in places may undermine this work's usefulness for learners. For example, the simplified characters are presented in a small bold typeface and as a result, complex characters can be difficult to read, particularly when placed against the gray backgrounds used to highlight the 30 thematic-vocabulary sections of the book. Along similar lines, transcriptions are always given in the pinyin system, which is certainly reasonable since this is the accepted standard, but such transcriptions are always given in /slashes/. This is redundant since pinyin transcriptions are unlikely to be confused with anything else; more problematically, the ever-present slashes make the pinyin transcriptions harder to read, particularly when the reader attempts to scan through the pinyin-based ''alphabetical'' Chinese-to-English index. Because the pinyin itself is presented in an italicized sans-serif font, the opening slash is easily mistaken for a letter, and every word looks like it starts with ''l'' (i.e. a lower-case L).

The Chinese-to-English index is loosely ordered by pinyin transcription, but the ordering is really a hybrid of alphabetical and character-by-character arrangement, and so the index is not quite alphabetical despite being labeled as such. For example, /bei4dong4/ ''passive'' appears later in the index than /bei4hou4/ ''behind'' and /bei4jing3/ ''background'' because the ''bei4'' character in the first word is different from the one in the second and third words. On the plus side, this does permit one to scan quickly through word-forms that are related in the sense of sharing the same initial characters. On the other hand, relative to true alphabetical ordering, the current set-up tends to require more search time. Also, since this index is likely to be one of the most frequently referred-to parts of the book, it would probably be more appropriately placed at the very end of the book for easy access; as it is, other sections intervene between the index and the back of the book, requiring the reader to perform an initial ''search'' for the alphabetical index before being able to actually make use of the index itself. While a strategically-placed post-it note can probably solve that problem, the absence of any kind of English-to-Chinese word index may strike learners as a significant inconvenience.
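The difference between the two orderings can be sketched in Python. This is a rough illustration, not the book's actual collation procedure: the tie-break by Unicode code point of the first character is an assumption, chosen only because it reproduces the order described above, and the function names are hypothetical.

```python
import re

# (pinyin with tone numbers, simplified characters, English gloss)
entries = [
    ("bei4dong4", "被动", "passive"),
    ("bei4hou4", "背后", "behind"),
    ("bei4jing3", "背景", "background"),
]


def alphabetical_key(entry):
    """True alphabetical ordering: compare the full pinyin string only."""
    pinyin, _hanzi, _gloss = entry
    return pinyin


def hybrid_key(entry):
    """Hybrid ordering: group by first syllable, then by first character,
    and only then compare the full pinyin string.

    Assumption: characters sharing a syllable are ranked by code point;
    the criterion actually used in the book is not stated in the review.
    """
    pinyin, hanzi, _gloss = entry
    first_syllable = re.findall(r"[a-z]+[1-5]", pinyin)[0]
    return (first_syllable, hanzi[0], pinyin)


alphabetical = sorted(entries, key=alphabetical_key)
hybrid = sorted(entries, key=hybrid_key)

print("alphabetical:", [pinyin for pinyin, _, _ in alphabetical])
print("hybrid:      ", [pinyin for pinyin, _, _ in hybrid])
```

Under the hybrid key, the two words sharing an initial character stay together and /bei4dong4/ falls after them, whereas the purely alphabetical key puts it first.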

The formatting in the main body of the book (the frequency index) is efficient in terms of compactness but may also render this section less useful to the learner than it might have been. The entry for each word is a tightly-arranged block listing the word, its pinyin transcription, a Chinese sentence using that word, English translations for the word and sentence, and the several statistical items enumerated earlier in this review. From a learner standpoint, it would be convenient if a pinyin version of the sentence were also given, as this would make it possible for the ambitious learner to pick up on the spot much of the additional vocabulary present in the sentences as it is presented in context. Further, I wonder if the detailed statistical information given for each word in the main frequency index couldn't have been placed neatly next to the word in a different section of the book such as the part of speech index, leaving the word entries in the frequency index less cluttered. This additional space in the main index would then facilitate what I would consider a more learner-friendly format for this part of the book: having the Chinese word and example sentence on one side of the page, and the word and sentence translations in English on the opposite side of the page in an adjacent column. This would permit the learner to quiz him- or herself by covering one column and trying to derive the other (i.e. going from English to Chinese, or vice versa), word by word or sentence by sentence. This was possible with the aforementioned Langenscheidt series, which was formatted in such a way (e.g. with German vocabulary and example sentences in the left column and the English translations in the right column of each page).

In general, these criticisms have little to do with the book's content itself, which as noted above, will no doubt be of great use and interest to many. Since none of the issues I have noted rise to the level of fatal flaws, I have no hesitation in giving this frequency dictionary a positive evaluation overall. Indeed, my main reaction upon perusing this book has been a feeling of regret that it was not available to me years ago when I was first studying Chinese.


Michael Grosvald earned his doctorate in Linguistics in 2009 and now works as a post-doctoral scholar at the Center for Mind and Brain at the University of California at Davis. His background includes over a decade as a language instructor in Prague, Berlin, Taipei and the U.S. His interests include the phonetics and phonology of signed and spoken languages, psycholinguistics, second language acquisition, and computational linguistics.
