LINGUIST List 17.1962

Wed Jul 05 2006

Review: Applied Linguistics: Davies (2006)

Editor for this issue: Laura Welcher <laura@linguistlist.org>


Directory         1.    Matthew Carlson, A Frequency Dictionary of Spanish (Applied Linguistics)


Message 1: A Frequency Dictionary of Spanish (Applied Linguistics)
Date: 05-Jul-2006
From: Matthew Carlson <mtc173@psu.edu>
Subject: A Frequency Dictionary of Spanish (Applied Linguistics)


Announced at http://linguistlist.org/issues/17/17-701.html

AUTHOR: Davies, Mark
EDITORS: Rayson, Paul; McEnery, Anthony
TITLE: A Frequency Dictionary of Spanish
SUBTITLE: Core vocabulary for learners
SERIES: Routledge Frequency Dictionaries
PUBLISHER: Routledge (Taylor and Francis)
YEAR: 2006

Matthew T. Carlson, Department of Spanish, Italian, and Portuguese, The Pennsylvania State University

SUMMARY:

This volume lists the 5,000 most frequent words in Spanish, drawn from a 20-million-word corpus. Its stated purpose is to help learners of Spanish make the most of their efforts to acquire Spanish vocabulary. The dictionary's primary purpose is therefore pedagogical, and its scope and format are designed to give learners a variety of ways to use frequency information in their study. These include generalizations across parts of speech, the relative range of items across the registers represented in the corpus, and thematic groupings of frequent vocabulary.

The volume begins with an introduction that outlines the contents of the volume, argues for the usefulness of frequency dictionaries in general and this dictionary in particular, and describes the construction and organization of the corpus as well as the procedures used to arrive at the final list of 5,000 words. The language used is aimed at an audience without extensive technical knowledge of linguistics or corpus linguistics. The argument for the use of frequency information in foreign vocabulary learning is supported by a small number of practical, though very brief, references to the experiences of both classroom and independent learners and teachers confronted with ''typical'' textbooks and other texts.

The subsequent sections of the introduction summarize the format of the dictionary. They also show how it addresses the weaknesses of earlier, similar works by applying the most current analytical techniques to a large corpus of very recent written and oral texts. The problems encountered in annotating the corpus and selecting the final list of 5,000 words receive substantial attention. The discussion is at times more technical, but Davies makes an effort to give practical and accessible definitions of terms such as ''tagging'' and ''lemmatization''. He also supports his claims that the corpus is both robust and representative. Table 1 (p. 3) gives detailed information about the texts and corpora that make up the present corpus. The proportions from the various registers (one third each of spoken, fiction, and nonfiction) and locations (43% from Spain, 57% from Latin America) are also summarized. The procedures used to tag the corpus for part of speech and lemma are then described. There is considerable discussion of the problems involved in tagging single word forms that might belong to more than one lemma. One example is 'limpio', which may be either the adjective 'clean' or a form of the verb 'limpiar' ('I clean'). Attention is also given to words that fall at the boundaries of the traditional part-of-speech categories. Examples include nominal uses of adjectives of nationality and religion, and verbal and adjectival uses of past participles. Different solutions were applied to these cases, but most were resolved by conflating the categories for the words in question. Greater precision might be desirable for the purposes of linguistic analysis. Even so, this solution is consistent with the practical and pedagogical intent of the dictionary, given the similarity in meaning between the different uses of such words.
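To make the tagging and lemmatization problem concrete, the Python sketch below counts frequencies by (lemma, part of speech) rather than by word form, and applies one conflation rule. It is only an illustration under stated assumptions. The tokens, their tags, and the CONFLATE rule are hypothetical and hand-assigned, whereas a real tagger must infer tags from context. The sketch does not reproduce Davies' actual tagging procedure.

```python
from collections import Counter

# Hypothetical hand-tagged tokens: (word form, part of speech, lemma).
# A real tagger would have to infer these labels from context.
TAGGED_TOKENS = [
    ("limpio", "verb", "limpiar"),     # yo limpio la mesa ("I clean the table")
    ("limpio", "adj", "limpio"),       # un vaso limpio ("a clean glass")
    ("limpia", "adj", "limpio"),       # una casa limpia ("a clean house")
    ("limpiamos", "verb", "limpiar"),  # ("we clean")
    ("español", "adj", "español"),     # un libro español ("a Spanish book")
    ("español", "noun", "español"),    # un español ("a Spaniard")
]

# Hypothetical conflation rule: count a nominal use of a nationality
# adjective under the adjective, as the review says the dictionary
# generally does for such boundary cases.
CONFLATE = {("español", "noun"): ("español", "adj")}


def form_frequencies(tokens):
    """Count raw word forms, ignoring tags and lemmas."""
    return Counter(form for form, _pos, _lemma in tokens)


def lemma_frequencies(tokens, conflate=None):
    """Count tokens per (lemma, part of speech), optionally merging categories."""
    conflate = conflate or {}
    counts = Counter()
    for _form, pos, lemma in tokens:
        key = (lemma, pos)
        counts[conflate.get(key, key)] += 1
    return counts


if __name__ == "__main__":
    print(form_frequencies(TAGGED_TOKENS))
    print(lemma_frequencies(TAGGED_TOKENS))
    print(lemma_frequencies(TAGGED_TOKENS, CONFLATE))
```

Counted by form, 'limpio' occurs twice. Counted by lemma, one occurrence goes to the verb 'limpiar' and one to the adjective 'limpio'. This is why tagging decisions of the kind Davies describes directly affect the final rankings.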

Following the summary of the construction of the corpus, Davies presents and justifies the procedure used to select the 5,000 most frequent words. In addition to the raw frequency of lemmas, he explains the calculation of an item's range, or its distribution across the registers included in the corpus. A formula is given that yields a score for each lemma from its range and frequency, with a weight assigned to each register: fiction, nonfiction, and oral texts. The oral texts are further divided into transcripts of speech prepared for print and unmodified spoken texts. The transcripts prepared for print receive a much lower weight because of their modified nature. The selection of the 5,000 most frequent words is based on this score.
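The review does not reproduce Davies' formula. The Python sketch below is therefore only a hypothetical illustration of how range and frequency might be combined under register weights. It rests on several assumptions:

- Frequencies are given per million words in each register.
- The score is a weighted frequency multiplied by a weighted range, that is, the summed weights of the registers in which a lemma occurs.
- The 0.40 / 0.30 / 0.30 split for fiction, nonfiction, and combined oral follows the figures reported later in this review.
- The division of the oral share between unmodified speech and transcripts prepared for print is invented.

```python
import math

# Hypothetical register weights. Fiction 0.40, nonfiction 0.30 and a combined
# oral share of 0.30 follow the figures reported in the review; the split of
# the oral share (0.25 unmodified, 0.05 prepared for print) is invented here.
WEIGHTS = {
    "fiction": 0.40,
    "nonfiction": 0.30,
    "oral_unmodified": 0.25,
    "oral_for_print": 0.05,
}
assert math.isclose(sum(WEIGHTS.values()), 1.0)


def weighted_frequency(per_million):
    """Weighted mean of a lemma's per-million frequency across registers."""
    return sum(w * per_million.get(reg, 0.0) for reg, w in WEIGHTS.items())


def weighted_range(per_million):
    """Sum of the weights of the registers in which the lemma occurs at all."""
    return sum(w for reg, w in WEIGHTS.items() if per_million.get(reg, 0.0) > 0)


def score(per_million):
    """Hypothetical combined score: frequency scaled down by narrow range."""
    return weighted_frequency(per_million) * weighted_range(per_million)


def top_n(lemmas, n):
    """Return the n lemmas with the highest scores, best first."""
    return sorted(lemmas, key=lambda lemma: score(lemmas[lemma]), reverse=True)[:n]


# Invented per-million counts for two made-up lemmas.
EXAMPLE = {
    "evenly_spread": {"fiction": 50, "nonfiction": 50,
                      "oral_unmodified": 50, "oral_for_print": 50},
    "fiction_only": {"fiction": 200},
}

if __name__ == "__main__":
    for lemma in top_n(EXAMPLE, 2):
        print(lemma, round(score(EXAMPLE[lemma]), 2))
```

With these invented numbers, the evenly spread lemma scores about 50 and the fiction-only lemma about 32. The former therefore ranks first even though its peak per-register frequency is lower, which is the kind of effect a range component is meant to have.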

The introduction ends with a more detailed outline of the dictionary's three indices (the frequency index, the alphabetical index, and the index by part of speech). It also outlines the 30 thematic lists that appear throughout the dictionary. The frequency index contains the most information about each lemma:

- its rank frequency;
- its part of speech;
- a basic English translation, avoiding the more particular or idiomatic meanings that a bilingual dictionary might include;
- an example sentence taken from the corpus;
- range and raw frequency information.

The alphabetical and part-of-speech indices are cross-referenced to the frequency index. The 30 thematic vocabulary lists give frequency-based lists of semantically, morphologically, or grammatically related words. In addition to semantic categories such as food and clothing, some lists show common morphological processes (e.g. noun or diminutive formation). Others show grammatical patterns, such as the verbs used most frequently in the preterit, in the imperfect, or in both past tenses. Still others show differences in the use of certain parts of speech across registers.
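The relation between the frequency index and the two cross-referenced indices can be sketched as a small data structure. In the Python sketch below, the Entry fields mirror the information the review lists for each frequency-index entry. The entries themselves (ranks, glosses, example sentences, range and frequency figures) are invented placeholders, not data from the dictionary. The way cross-references are stored, as pairs of lemma and rank, is also an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One frequency-index entry, with the fields the review describes."""
    rank: int          # rank frequency
    lemma: str
    pos: str           # part of speech
    gloss: str         # basic English translation
    example: str       # example sentence (invented here, not from the corpus)
    range_pct: float   # placeholder range figure
    raw_freq: int      # placeholder raw frequency


# Invented placeholder entries; none of these values come from the dictionary.
FREQUENCY_INDEX = [
    Entry(1, "tener", "verb", "to have", "Tengo dos hermanos.", 99.0, 900),
    Entry(2, "casa", "noun", "house", "Vivo en una casa grande.", 97.0, 600),
    Entry(3, "grande", "adj", "big, large", "Es un problema grande.", 96.0, 500),
    Entry(4, "hablar", "verb", "to speak, talk", "Hablamos mañana.", 95.0, 400),
]


def alphabetical_index(entries):
    """(lemma, rank) pairs in alphabetical order, pointing back to the rank.

    Plain code-point sorting is used; proper Spanish collation of accented
    letters and ñ would need locale-aware sorting.
    """
    return sorted((e.lemma, e.rank) for e in entries)


def part_of_speech_index(entries):
    """Group (rank, lemma) pairs by part of speech, each group in rank order."""
    groups = defaultdict(list)
    for e in sorted(entries, key=lambda e: e.rank):
        groups[e.pos].append((e.rank, e.lemma))
    return dict(groups)


if __name__ == "__main__":
    print(alphabetical_index(FREQUENCY_INDEX))
    print(part_of_speech_index(FREQUENCY_INDEX))
```

Because both derived indices store the rank, a reader or a program can move from any alphabetical or part-of-speech listing back to the full entry in the frequency index.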

CRITICAL EVALUATION:

This dictionary is a well-grounded documentation of what the author terms the most useful vocabulary for learners of Spanish. It is in line with important trends in both first and second language research that demonstrate the importance of frequency in language learning and use. As such, it is a valuable resource for learners, teachers, and researchers interested in foreign language vocabulary acquisition. At the same time, this important information is provided to learners without significant guidance on how they might use it to maximize their vocabulary learning. The dictionary makes no claim to stand on its own as a language-learning tool. Even so, many potential users, both learners and teachers, may need help exploiting its potential, and the absence of such help may detract from the dictionary's practical effectiveness.

Technological advances over the past decades have enabled researchers to gather ever larger corpora of texts and to extract ever more detailed information from them. Previously, such tasks would have been impractical, if not impossible. In addition, the role of frequency at all levels of language structure has been increasingly noted (cf. Bod et al., 2003; Bybee & Hopper, 2001). It is receiving significant attention in second language acquisition as well (see Ellis, 2002 and responses). Within second language vocabulary acquisition, there is a robust literature on applying frequency information to vocabulary testing (e.g. Laufer & Nation, 1995; Read, 1993; Schmitt et al., 2001). This literature builds on Nation's (1990) observation, cited by the series editors in their preface to the dictionary reviewed here, that the majority of the running words in written and oral texts belong to a limited set of the most frequent words. Davies does not provide an in-depth review of the technical aspects of this research. He does, however, make a strong case for using frequency information in learning Spanish vocabulary, in prose accessible to most learners of Spanish.

Moreover, having established the importance of frequency information, Davies goes to great lengths to base his dictionary on a sufficiently robust, historically relevant, and representative corpus. He also exercises appropriate care in deciding which words to count as the most frequent. The analysis of corpora, particularly large and diverse ones, requires a number of decisions about their content and labeling, not all of which have clear answers. Davies is thorough in documenting many of these decisions, offering examples and explanations of why, for example, even nominal uses of adjectives are counted as adjectives. Many learners might be surprised to find so many problems with tagging and lemmatization, and these explanations are clearly called for. At other times, however, little is said about the reasons for or consequences of these decisions. One salient example is the weighting of the registers (fiction, nonfiction, and modified and unmodified oral) and of range and frequency information. A roughly equal weighting of these factors, such as the one used here, seems a reasonable place to start, and the need for some weighting of the registers is well supported. However, no explanation is given for the extra weight assigned to fiction (40%, vs. 30% each for nonfiction and combined oral). This may be of more interest to the linguist than to the learner. In any case, the frequency index includes annotations for words that are unusually frequent or infrequent in a particular register, making more fine-grained information readily available. Notwithstanding the brevity of some of the background information, this dictionary is highly representative of the kind of frequency information that can best serve learners: a measure of the likelihood that they will encounter a given word in their use of Spanish for communication.

Alongside the dictionary's obvious strengths, however, are significant weaknesses that may detract from its usefulness to the very learners and teachers it is intended to serve. The dictionary's stated goal is to enable learners of Spanish at all levels to learn vocabulary more effectively. Accordingly, the introduction addresses both individual and classroom learners, as well as language teachers who might incorporate the dictionary into their syllabi and instruction. However, many learners and even teachers may have no explicit understanding of the role of frequency in language. They may be easily convinced that frequency information is useful, yet still need significant guidance to exploit the full advantages of a frequency dictionary. The introduction nonetheless contains little more than a passing mention of possible strategies. These include looking up a word in the part-of-speech index to make generalizations across larger segments of vocabulary, or simply working through the list. A notable exception is the section on the thematic vocabulary lists. The bulk of the introduction is devoted to the construction and analysis of the 20-million-word corpus on which the dictionary is based. This discussion, while necessary and informative, if somewhat technical, offers little help to the learner who is confronted with 5,000 Spanish words and trying to decide where to start.

Related to the need for more guidance in using the dictionary, one of the most useful kinds of frequency information, the frequent collocates of a given word, is conspicuously absent. Such information is available, for example, in Sebastián-Gallés et al. (2000), although, as Davies points out, this resource is difficult to obtain outside of Spain. However, collocations and a variety of other searches are freely available through the web interface of the Corpus del Español (Davies, 2002), on which much of the frequency dictionary is based. With this resource and others like it, there is a relatively simple remedy for both the limitations of the frequency dictionary and the lack of guidance in its use. The dictionary could have included a list of resources to use alongside it, together with references on the pedagogical use of frequency in vocabulary learning and teaching. Such a list would allow teachers and learners to exploit the advantages of the frequency dictionary more fully without requiring extensive explanations. It is unclear why such a list of references was not included.

The thematic vocabulary lists do give learners an obvious way to synthesize frequency knowledge and apply it to larger problems in the use of Spanish. The semantically grouped lists are a particular improvement over many traditional textbooks, which offer no frequency information at all and may include highly infrequent items in topical vocabulary lists. Some of these thematic lists also include observations of potential interest, such as the relative bias toward concrete nouns in fictional texts. Other lists, however, may be harder to incorporate into study or teaching. The list of verbs used most frequently with the clitic 'se' contrasts particularly strongly with what learners may be accustomed to, at least in many classrooms. To the extent that this list more accurately represents actual usage, the frequency information is crucial. Even so, the list could also have given frequency information for the reflexive verbs traditionally grouped together for pedagogical purposes. This would provide a useful comparison and allow a smoother interface between frequency information and other pedagogical approaches that may be more familiar to learners and teachers. Of course, this information is available in the dictionary's indices, but it could easily have been summarized along with other uses of 'se'. Nonetheless, the thematic lists provide a springboard into a variety of interesting and informative explorations, and their inclusion only increases the value of the dictionary.

This frequency dictionary is a valuable resource, not only for the information it contains, but because it highlights the importance of word frequency in language use and, more specifically, in foreign language learning. The use of this dictionary in conjunction with other resources, as it is intended to be used, will provide a way of exploiting frequency information to the advantage of learners of all levels, although little guidance is given in the volume itself as to its incorporation into a program of study.

REFERENCES:

Bod, R., Hay, J., & Jannedy, S. (eds.) 2003. Probabilistic linguistics. Cambridge, MA: MIT Press.

Bybee, J. L., & Hopper, P. J. (eds.) 2001. Frequency and the emergence of linguistic structure. Amsterdam: John Benjamins.

Davies, M. 2002. Corpus del español.

Ellis, N. C. 2002. Frequency effects in language processing: A review with implications for theories of implicit and explicit language acquisition. Studies in Second Language Acquisition, 24.143-188.

Laufer, B., & Nation, I. S. P. 1995. Vocabulary size and use: Lexical richness in L2 written production. Applied Linguistics, 16.307-322.

Nation, I. S. P. 1990. Teaching and learning vocabulary. Boston: Heinle and Heinle.

Read, J. 1993. The development of a new measure of L2 vocabulary knowledge. Language Testing, 10.355-371.

Schmitt, N., Schmitt, D., & Clapham, C. 2001. Developing and exploring the behavior of two new versions of the vocabulary levels test. Language Testing, 18.55-88.

Sebastián-Gallés, N., Martí, M. A., Carreiras, M., & Cuetos, F. 2000. Lexesp, léxico informatizado del español. Barcelona: Ediciones de la Universitat de Barcelona.

ABOUT THE REVIEWER:

Matthew Carlson is a Ph.D. candidate in Hispanic Linguistics at The Pennsylvania State University. His primary research concerns the role of frequency and usage in the adult second language acquisition of Spanish phonology. His other interests include usage-based approaches to grammar (particularly phonology), multicompetence, the role of working memory and phonological memory in language acquisition and use, and the effects of literacy and orthography on SLA.