Review of A Frequency Dictionary of Spanish

Reviewer: Matthew T. Carlson
Book Title: A Frequency Dictionary of Spanish
Book Author: Mark Davies, Paul Rayson, Anthony Mark McEnery
Publisher: Routledge (Taylor and Francis)
Linguistic Field(s): Lexicography
Subject Language(s): Spanish
Issue Number: 17.1962

AUTHOR: Davies, Mark
EDITORS: Rayson, Paul; McEnery, Anthony
TITLE: A Frequency Dictionary of Spanish
SUBTITLE: Core vocabulary for learners
SERIES: Routledge Frequency Dictionaries
PUBLISHER: Routledge (Taylor and Francis)
YEAR: 2006

Matthew T. Carlson, Department of Spanish, Italian, and Portuguese, The
Pennsylvania State University


This volume contains a listing of the 5000 most frequent words in Spanish,
based on a corpus of 20 million words, with the stated purpose of enabling
learners of Spanish to maximize their efforts in acquiring Spanish
vocabulary. The primary purpose of this dictionary is therefore
pedagogical, and its scope and format are designed to provide learners with
a variety of ways of using frequency information in their study including
generalizations across parts of speech, the relative range of items across
the registers represented in the corpus, and thematic groupings of frequent
words.

The volume begins with an introduction that outlines the contents of the
volume, argues for the usefulness of frequency dictionaries in general and
this dictionary in particular, and describes the construction and
organization of the corpus as well as the procedures used to arrive at the
final list of 5,000 words. The language used is aimed at an audience
without extensive technical knowledge of linguistics or corpus linguistics.
The argument for the use of frequency information in foreign vocabulary
learning is supported by a small number of practical, though very brief,
references to the experiences of both classroom and independent learners
and teachers confronted with ''typical'' textbooks and other texts.

The subsequent sections of the introduction summarize the format of the
dictionary and show how it addresses the weaknesses of earlier similar
works by applying the most current analytical techniques to a large corpus
of very recent written and oral texts. The problems encountered in the
annotation of the corpus and the selection of the final list of 5,000 are
given substantial attention. The discussion is at times more technical, but
an effort is made to provide practical and accessible definitions of terms
such as ''tagging'' and ''lemmatization''. Davies provides support for his
claims that the corpus is both robust and representative. Detailed
information about the texts and corpora making up the present corpus is
given in Table 1 (p. 3), and the proportions from various registers (one
third each of spoken, fiction, and nonfiction) and locations (43% from
Spain, 57% from Latin America) are summarized. The procedures used to tag
the corpus for part of speech and lemma are subsequently described, with
considerable discussion of the problems involved in tagging single word
forms that might have more than one lemma, e.g. 'limpio' (clean), which
might be either a verb form or an adjective. Attention is also given to
words that seem to fall at the boundaries of the traditional part of speech
categories, as in the case of nominal uses of adjectives of nationality and
religion, and of verbal and adjectival uses of past participles. Different
solutions were applied to these cases, but most were resolved by conflating
categories for the words in question. While greater precision might be
desired for the purposes of linguistic analysis, this solution is in
agreement with the practical and pedagogical intent of the dictionary,
given the similarity in meaning between different uses of such words.

Following the summary of the construction of the corpus, Davies presents
and justifies the procedure used to select the most frequent 5,000 words.
In addition to the raw frequency of lemmas, the calculation of an item's
range, or distribution across the registers included in the corpus, is
explained. A formula is given that yields a score for each lemma based on
its range and frequency information, with a weighting assigned to each of
the registers: fiction, nonfiction, and oral texts. The oral texts are
divided between transcripts of speech for print and unmodified spoken
texts, and the transcripts are given a much lower weight because they have
been modified. The selection of the 5,000 most frequent words is based on
this calculation.
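
The review does not reproduce the formula itself, so the following is only
a minimal sketch of how a score of this kind might combine frequency and
range. The 40/30/30 weighting of fiction, nonfiction, and combined oral
texts is the one reported later in this review; the split of the oral
share, the range threshold, the rule for combining the two measures, and
all function names are assumptions made for illustration and should not be
read as Davies's actual procedure.

```python
from typing import Mapping

# Fiction 40%, nonfiction 30%, combined oral 30%, as reported in the review.
# The split of the oral share between the two oral sub-registers is an
# ASSUMPTION; the review says only that transcripts get "a much lower weight".
REGISTER_WEIGHTS = {
    "fiction": 0.40,
    "nonfiction": 0.30,
    "oral_unmodified": 0.25,  # assumed value
    "oral_transcript": 0.05,  # assumed value
}


def weighted_frequency(per_million: Mapping[str, float]) -> float:
    """Register-weighted frequency of a lemma (occurrences per million words)."""
    return sum(w * per_million.get(r, 0.0) for r, w in REGISTER_WEIGHTS.items())


def range_share(per_million: Mapping[str, float], threshold: float = 1.0) -> float:
    """Weighted share of registers in which the lemma reaches the threshold.

    The threshold of 1 occurrence per million is hypothetical.
    """
    return sum(
        w for r, w in REGISTER_WEIGHTS.items()
        if per_million.get(r, 0.0) >= threshold
    )


def score(per_million: Mapping[str, float]) -> float:
    """Hypothetical combined score: weighted frequency scaled by range."""
    return weighted_frequency(per_million) * range_share(per_million)


# Two invented lemmas with the same total of per-million counts (400),
# one spread evenly across registers and one confined to fiction.
even = {"fiction": 100, "nonfiction": 100,
        "oral_unmodified": 100, "oral_transcript": 100}
skewed = {"fiction": 400, "nonfiction": 0,
          "oral_unmodified": 0, "oral_transcript": 0}

print(score(even), score(skewed))
```

Under these assumptions the evenly distributed lemma outranks the one that
is confined to fiction, even though the latter has the higher weighted
frequency. This is the kind of effect a range component is meant to
produce.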

The introduction ends with a more specific outline of the three indices in
the dictionary: the frequency index, the alphabetical index, and the index
by part of speech, and of the 30 thematic lists that appear throughout the
dictionary. The frequency index contains the most information about each
lemma, giving its rank frequency, part of speech, a basic English
translation (avoiding more particular or idiomatic meanings such as might
be included in a bilingual dictionary), an example sentence taken from a
corpus, and range and raw frequency information. The alphabetical and part
of speech indices are cross-referenced with the frequency index. The 30
thematic vocabulary lists give frequency-based lists of semantically,
morphologically, or grammatically related words. In addition to semantic
categories such as food and clothing, lists are given that show common
morphological processes (e.g. noun or diminutive formation), grammatical
patterns such as the verbs used most frequently in either the preterit or
imperfect or both past tenses, and differences in the use of certain parts
of speech across registers.


This dictionary is a well-grounded documentation of what the author terms
the most useful vocabulary for learners of Spanish and is in line with
important trends in both first and second language research that
demonstrate the importance of frequency in language learning and use. As
such it is a valuable resource for learners, teachers, and researchers
interested in foreign language vocabulary acquisition. At the same time,
this important information is provided to learners without significant
guidance as to how they might use it to maximize their vocabulary learning.
While no claim is made that the dictionary can stand on its own as a
language-learning tool, many potential users, both learners and teachers,
might require assistance in exploiting its potential, and its absence may
detract from the practical effectiveness of the dictionary.

Technological advances over the past decades have enabled researchers to
gather ever larger corpora of texts and to extract from them detailed
information that would previously have been impractical, if not impossible,
to obtain. In addition, the role of frequency in all levels of language
structure has been increasingly noted (cf. Bod et al., 2003; Bybee &
Hopper, 2001) and is receiving significant attention in second language
acquisition as well (see Ellis, 2002 and responses). Within the realm of
second-language vocabulary acquisition, there is a robust literature
describing the application of frequency information to the task of
vocabulary testing (e.g. Laufer & Nation, 1995; Read, 1993; Schmitt et al.,
2001) that builds on Nation's (1990) observation, cited by the series
editors in their preface to the dictionary reviewed here, that a limited
set of the most frequent words makes up the majority of written and oral
texts. While Davies does not provide an in-depth review of the
technical aspects of this research, he does make a strong case for the use
of frequency information in the learning of Spanish vocabulary, couched in
prose accessible to most learners of Spanish.
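
The observation that a small set of frequent words accounts for most of
the running text can be illustrated with a simple coverage calculation.
The sketch below is not taken from the dictionary or from Nation (1990);
the function name and the toy sentence are invented for illustration, and
a real estimate would require a large lemmatized corpus.

```python
from collections import Counter
from typing import Sequence


def coverage(tokens: Sequence[str], top_n: int) -> float:
    """Share of running tokens accounted for by the top_n most frequent types."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    covered = sum(count for _, count in counts.most_common(top_n))
    return covered / len(tokens)


# Invented toy text: 11 tokens, 6 distinct word forms.
tokens = "la casa de la playa y la casa de la montaña".split()

for n in (1, 3, 6):
    print(n, round(coverage(tokens, n), 2))
```

Even in this tiny sample, half of the word types account for well over
half of the tokens. The same calculation applied to a frequency list shows
how much of a text a learner could expect to recognize after mastering the
first n items.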

Moreover, having established the importance of frequency information, Davies
goes to great lengths to base his dictionary on a sufficiently robust,
historically relevant and representative corpus, and to exercise
appropriate care in deciding which words to count as the most frequent. The
analysis of corpora, particularly large and diverse corpora, requires that
a number of decisions be made regarding their content and labeling, not all
of which have clear answers. Davies is thorough in documenting many of
these decisions, offering examples and explanations of why, for example,
even nominal uses of adjectives are counted as adjectives. Many learners
might be surprised to find so many problems with tagging and lemmatization,
and these explanations are clearly called for. At other times, however,
little is said about the reasons for or consequences of these decisions. One
salient example is the weighting of the registers (fiction, non-fiction,
and modified and unmodified oral) and of range and frequency information. A
relatively equal weighting of these factors such as that utilized here
seems a reasonable place to start, and the need for some weighting of the
registers is well supported, but no explanation is given for the advantage
given to fiction (40% vs. 30% each for nonfiction and combined oral). This
may, however, be of more interest to the linguist than to the learner, and
in any case, the frequency index includes annotations for words that are
unusually frequent or infrequent in one particular register, making more
fine-grained information readily available. Notwithstanding the brevity of
some of the background information given, this dictionary is highly
representative of the kind of frequency information that can best serve
learners: a measure of the likelihood that they will encounter a given word
in their use of Spanish for communication.

Alongside the dictionary's obvious strengths and quality, however, are
significant weaknesses that may detract from its usefulness to the very
learners and teachers it is intended to serve. The dictionary's stated
goals are to enable learners of Spanish at all levels to more effectively
learn vocabulary, and the introduction addresses both individual and
classroom learners, as well as language teachers who might incorporate this
dictionary into their syllabi and instruction. However, many learners and
even teachers may not have any explicit understanding of the role of
frequency in language, and while they may be easily convinced that
frequency information is useful, they may require significant guidance in
exploiting the full advantages of a frequency dictionary. The introduction
to the dictionary, however, contains little more than a passing mention of
strategies such as looking up a word in the part of speech index to make
generalizations across larger segments of vocabulary, or even simply
working through the list. A notable exception is the section on the
thematic vocabulary lists. The bulk of the introduction is dedicated to the
construction and analysis of the 20 million word corpus on which the
dictionary is based. This discussion, while necessary and informative, if
a bit technical, does not offer much help to the learner who,
confronted with 5,000 Spanish words, is trying to decide where to start.

Related to the need for more guidance in using the dictionary, some of the
most useful frequency information, the frequent collocates of any given
word, is conspicuously absent from the dictionary. Such information is
available, for example, in Sebastián-Gallés et al. (2000), although as
Davies points out, this tool is difficult to obtain outside of Spain.
However, collocations and a variety of other searches are freely available
through the web interface of Corpus del Español (Davies, 2002), on which
much of the frequency dictionary is based. With this resource and others
like it, there is a relatively simple solution to the limitations of the
frequency dictionary, as well as to the lack of guidance in its use. A
listing of resources that might be used in conjunction with the dictionary,
as well as references on the pedagogical use of frequency in vocabulary
learning and teaching, would allow teachers and learners to most fully
exploit the advantages afforded by the frequency dictionary itself without
adding extensive explanations. It is unclear why such a list of references
was not included.

The thematic vocabulary lists do afford learners an obvious way to
synthesize frequency knowledge and apply it to larger problems in the use
of Spanish. The semantically grouped lists are a particular improvement
over many traditional textbooks that offer no frequency information at all,
and may include highly infrequent items in topical vocabulary lists.
Additionally, observations of potential interest, such as the relative bias
towards concrete nouns in fictional texts, are given in some of these
thematic lists. Other lists, however, may be less easy to incorporate into
study or pedagogy. The list showing verbs used most frequently with the
clitic 'se' contrasts particularly strongly with what learners may be
accustomed to, at least in many classrooms. To the extent that the list in
this dictionary more accurately represents actual usage, the frequency
information is crucial, but including the frequency information of the
reflexive verbs traditionally grouped together for pedagogical purposes
would provide a useful comparison, and allow for a smoother interface
between frequency information and other pedagogical approaches that may be
more familiar to learners and teachers. Of course, this information is
available in the indices of the dictionary itself, but it could have easily
been summarized along with other uses of 'se'. Nonetheless, the thematic
lists provide a springboard into a variety of interesting and informative
explorations, and their inclusion only increases the value of the dictionary.

This frequency dictionary is a valuable resource, not only for the
information it contains, but because it highlights the importance of word
frequency in language use and, more specifically, in foreign language
learning. The use of this dictionary in conjunction with other resources,
as it is intended to be used, will provide a way of exploiting frequency
information to the advantage of learners of all levels, although little
guidance is given in the volume itself as to its incorporation into a
program of study.


Bod, R., Hay, J., & Jannedy, S. (eds.) 2003. Probabilistic linguistics.
Cambridge, MA: MIT Press.

Bybee, J. L., & Hopper, P. J. (eds.) 2001. Frequency and the emergence of
linguistic structure. Amsterdam: John Benjamins.

Davies, M. 2002. Corpus del español.

Ellis, N. C. 2002. Frequency effects in language processing: A review with
implications for theories of implicit and explicit language acquisition.
Studies in Second Language Acquisition, 24.143-188.

Laufer, B., & Nation, I. S. P. 1995. Vocabulary size and use: Lexical
richness in L2 written production. Applied Linguistics, 16.307-322.

Nation, I. S. P. 1990. Teaching and learning vocabulary. Boston: Heinle and

Read, J. 1993. The development of a new measure of L2 vocabulary knowledge.
Language Testing, 10.355-371.

Schmitt, N., Schmitt, D., & Clapham, C. 2001. Developing and exploring the
behavior of two new versions of the vocabulary levels test. Language
Testing, 18.55-88.

Sebastián-Gallés, N., Martí, M. A., Carreiras, M., & Cuetos, F. 2000.
Lexesp, léxico informatizado del español. Barcelona: Ediciones de la
Universitat de Barcelona.

Matthew Carlson is a Ph.D. candidate in Hispanic Linguistics at the Penn
State University. His primary research is on the role of frequency and
usage in the adult second language acquisition of Spanish phonology. His
other interests include usage-based approaches to grammar, particularly
phonology, multicompetence, the role of working memory and phonological
memory in language acquisition and use, and the effects of literacy and
orthography on SLA.

Format: Paperback
ISBN: 0415334292
ISBN-13: N/A
Pages: 320
Prices: U.S. $ 31.95
Format: Hardback
ISBN: 0415334284
ISBN-13: N/A
Pages: 320
Prices: U.S. $ 100.00