Review of A Frequency Dictionary of Mandarin Chinese

Reviewer: Michael Grosvald
Book Title: A Frequency Dictionary of Mandarin Chinese
Book Author: Richard Xiao, Paul Rayson, Anthony Mark McEnery
Publisher: Routledge (Taylor and Francis)
Linguistic Field(s): Lexicography
Subject Language(s): Chinese, Mandarin
Issue Number: 21.2576

Michael Grosvald, Center for Mind and Brain & Department of Linguistics,
University of California at Davis


This book is intended to help learners of Mandarin Chinese improve their
vocabulary, and to serve as a resource for teachers of the language. It is
centered on the idea that an efficient way to go about learning new vocabulary
is to target the most frequently used words first. To this end, the main body of
the book presents a list of 5000 words, ordered according to frequency of
occurrence within a 50-million-word corpus that was created for this purpose.

The book begins with an introduction in which the usefulness and some history of
this approach to vocabulary learning are discussed, and in which the corpus data
are presented in some detail. As the authors explain, the corpus was compiled
from a number of sources, broadly categorized as Spoken (3.4 million words),
News (16 million), Fiction (15 million), and Non-fiction (15 million); the
corpus is therefore weighted more heavily toward written- than spoken-language
sources. The introduction also presents some basic information about the Chinese
language itself, including an explanation of the difficulty of segmenting
Chinese text into words and how segmentation and part-of-speech tagging were
performed on this corpus.
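
As a rough illustration of this weighting, the following Python sketch computes
each component's share of the corpus. It uses only the rounded figures quoted
above, so the results are approximate; the names `components` and `shares` are
illustrative and do not come from the book.

```python
# Component sizes in millions of words, using the rounded figures
# quoted in this review (so the shares below are approximate).
components = {
    "Spoken": 3.4,
    "News": 16.0,
    "Fiction": 15.0,
    "Non-fiction": 15.0,
}


def shares(sizes):
    """Return each component's share of the total, as a percentage."""
    total = sum(sizes.values())
    return {name: 100.0 * size / total for name, size in sizes.items()}


corpus_shares = shares(components)

# Everything other than the Spoken component is written-language material.
written_share = 100.0 - corpus_shares["Spoken"]

for name, pct in corpus_shares.items():
    print(f"{name:12s} {pct:5.1f}%")
print(f"{'Written':12s} {written_share:5.1f}%")
```

On these figures the spoken material makes up roughly 7% of the corpus, with
the remainder drawn from written sources.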

The book then proceeds to the frequency index itself, which forms the main body
of the book. The words are listed in descending order of frequency; each word
entry includes both the traditional- and simplified-character versions of the
word, its part of speech and English gloss, a sentence illustrating the word's
use (given in simplified Chinese with an English translation), and several
related statistics including a normalized frequency rating, ''dispersion index''
within the corpus, and usage rate per million tokens. Also given where
appropriate or available are a ''register code'' S or W in the case of words
occurring significantly more often in the spoken or written texts, and the
word's ''level'' (1 to 4) according to the Chinese government's HSK (Chinese
Proficiency Test) Committee.

At the back of the book are some other resources, including an ''alphabetical
index'' (more on this below), a part of speech index, and a character frequency
index.


Many language learners and teachers have used frequency-based dictionaries as an
aid for vocabulary study and instruction and can attest to their usefulness. For
example, I found Langenscheidt's frequency-based ''Basic Vocabulary'' series
extremely useful while studying German and other European languages, and later
regretted the lack of such a resource for Mandarin while living and studying in
Taiwan. Xiao, Rayson and McEnery's frequency dictionary is just the kind of book
I was wishing for then, and is an impressive work that will undoubtedly prove
useful to students, teachers, and - because of the detailed statistical
information that is given - perhaps to researchers as well.

The large corpus that was used in determining the frequency rankings is a major
contributing factor to the project's credibility. As the authors explain in
detail, the corpus required an appreciable amount of work to compile, segment
and tag. The 15-page introduction explaining this process is well-written and
well-researched, provides a detailed discussion of a number of interesting and
relevant issues, and makes for fascinating reading in its own right.

The frequency rankings in the main index do seem a bit quirky at times, probably
due to the mix of sources used, which as noted above is weighted much more
toward written than spoken-language sources. For example, zhan4chang3
''battlefield'' (#2602) and ai4zi1bing4 ''AIDS'' (#2605) are listed above (i.e. as
more frequent than) shu1fu2 ''comfortable, (physically) well'' (#2632) and
tang2 ''sugar, sweets'' (#2638). However, a goal of the book is to permit the
efficient learning of vocabulary needed in both spoken- and written-language
contexts, and the collective set of words and their general frequency ranking
progression (if not their exact numerical rankings) will certainly be useful to
learners in this way.

An added bonus is the set of 30 additional mini-sections of ''thematic vocabulary'' in
which words with related meanings (body parts, foods, etc.) are given in
separate tables, also in ranked order. The part-of-speech and character indices
are also a thoughtful inclusion, as they permit targeted learning in a variety
of ways. Even non-learners may find it interesting to browse frequency-sorted
lists of nouns, verbs, classifiers, and so on.

One area in which I feel the book has significant shortcomings is formatting:
some of the decisions made here may in places undermine this work's usefulness
for learners. For example, the simplified characters are
presented in a small bold typeface and as a result, complex characters can be
difficult to read, particularly when placed against the gray backgrounds used to
highlight the 30 thematic-vocabulary sections of the book. Along similar lines,
transcriptions are always given in the pinyin system, which is certainly
reasonable since this is the accepted standard, but such transcriptions are
always given in /slashes/. This is redundant since pinyin transcriptions are
unlikely to be confused with anything else; more problematically, the
ever-present slashes make the pinyin transcriptions harder to read, particularly
when the reader attempts to scan through the pinyin-based ''alphabetical''
Chinese-to-English index. Because the pinyin itself is presented in an
italicized sans-serif font, the opening slash resembles a lower-case L, so
every word looks like it starts with ''l''.

The Chinese-to-English index is loosely ordered by pinyin transcription, but the
ordering is really a hybrid of alphabetical and character-by-character
arrangement, and so the index is not quite alphabetical despite being labeled as
such. For example, /bei4dong4/ ''passive'' appears later in the index than
/bei4hou4/ ''behind'' and /bei4jing3/ ''background'' because the ''bei4'' character in
the first word is different from the one in the second and third words. On the
plus side, this does permit one to scan quickly through word-forms that are
related in the sense of sharing the same initial characters. On the other hand,
relative to true alphabetical ordering, the current set-up tends to require more
search time. Also, since this index is likely to be one of the most frequently
referred-to parts of the book, it would probably be more appropriately placed at
the very end of the book for easy access; as it is, other sections intervene
between the index and the back of the book, requiring the reader to perform an
initial ''search'' for the alphabetical index before being able to actually make
use of the index itself. While a strategically-placed post-it note can probably
solve that problem, the absence of any kind of English-to-Chinese word index may
strike learners as a significant inconvenience.
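
The difference between the two orderings can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not a
description of how the book's index was actually built: the three pinyin forms
and glosses are the ones cited above, the characters are supplied here for
illustration, and ordering the character groups by Unicode code point is an
assumption chosen simply because it reproduces the order observed for these
three entries.

```python
import re

# (word, numbered pinyin, English gloss). Pinyin and glosses are the
# examples cited in the review; the characters are added for illustration.
entries = [
    ("被动", "bei4dong4", "passive"),
    ("背后", "bei4hou4", "behind"),
    ("背景", "bei4jing3", "background"),
]


def first_syllable(pinyin):
    """Return the first syllable of a numbered-pinyin string, e.g. 'bei4'."""
    return re.match(r"[a-z]+[1-5]", pinyin).group(0)


def alphabetical_key(entry):
    """Sort purely on the pinyin transcription."""
    _word, pinyin, _gloss = entry
    return pinyin


def hybrid_key(entry):
    """Sort on first syllable, then group by first character, then pinyin.

    ASSUMPTION: character groups are ordered by code point; the book's
    real tie-breaking rule is not stated in the review.
    """
    word, pinyin, _gloss = entry
    return (first_syllable(pinyin), word[0], pinyin)


alphabetical = sorted(entries, key=alphabetical_key)
hybrid = sorted(entries, key=hybrid_key)

print("alphabetical:", [pinyin for _w, pinyin, _g in alphabetical])
print("hybrid:      ", [pinyin for _w, pinyin, _g in hybrid])
```

Under the hybrid key, the two words sharing an initial character stay together
and /bei4dong4/ falls after them, which matches the arrangement described
above; under the purely alphabetical key it comes first.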

The formatting in the main body of the book (the frequency index) is efficient
in terms of compactness but may also render this section less useful to the
learner than it might have been. The entry for each word is a tightly-arranged
block listing the word, its pinyin transcription, a Chinese sentence using that
word, English translations for the word and sentence, and the several
statistical items enumerated earlier in this review. From a learner standpoint,
it would be convenient if a pinyin version of the sentence were also given, as
this would make it possible for the ambitious learner to pick up on the spot
much of the additional vocabulary present in the sentences as it is presented in
context. Further, I wonder if the detailed statistical information given for
each word in the main frequency index couldn't have been placed neatly next to
the word in a different section of the book such as the part of speech index,
leaving the word entries in the frequency index less cluttered. This additional
space in the main index would then facilitate what I would consider a more
learner-friendly format for this part of the book: having the Chinese word and
example sentence on one side of the page, and the word and sentence translations
in English on the opposite side of the page in an adjacent column. This would
permit the learner to quiz him- or herself by covering one column and trying to
derive the other (i.e. going from English to Chinese, or vice versa), word by
word or sentence by sentence. This was possible with the aforementioned
Langenscheidt series, which was formatted in such a way (e.g. with German
vocabulary and example sentences in the left column and the English translations
in the right column of each page).

In general, these criticisms have little to do with the book's content itself,
which, as noted above, will no doubt be of great use and interest to many. Since
none of the issues I have noted rise to the level of fatal flaws, I have no
hesitation in giving this frequency dictionary a positive evaluation overall.
Indeed, my main reaction upon perusing this book has been a feeling of regret
that it was not available to me years ago when I was first studying Chinese.

Michael Grosvald earned his doctorate in Linguistics in 2009 and now works as a post-doctoral scholar at the Center for Mind and Brain at the University of California at Davis. His background includes over a decade as a language instructor in Prague, Berlin, Taipei and the U.S. His interests include the phonetics and phonology of signed and spoken languages, psycholinguistics, second language acquisition, and computational linguistics.

