Date: Wed, 11 May 2005 03:07:45 -0700 From: Danko Sipka Subject: Language Testing and Validation: An Evidence-Based Approach
AUTHOR: Weir, Cyril J. TITLE: Language Testing and Validation SUBTITLE: An Evidence-Based Approach SERIES: Research and Practice in Applied Linguistics PUBLISHER: Palgrave Macmillan YEAR: 2005
Danko Sipka, Department of Languages and Literatures, Arizona State University
The present work addresses one of the most socially important aspects of applied linguistics -- the issue of lingometric validity. The book is written by an insider to the field, Prof. Cyril Weir of Roehampton University (United Kingdom), who holds an impressive international record of testing and curriculum development (see http://www.roehampton.ac.uk/staff/CyrilWeir). Prof. Weir's work appears in the Palgrave (http://www.palgrave.com) series Research and Practice in Applied Linguistics, and is designed for the "MA or PhD student in Applied Linguistics, TESOL or similar subject areas and for the language professional keen to extend their research experience" (Weir, 2005:ii).
Confronted with the lack of validity evidence for many tests, and the serious social consequences of using unvalidated tests, the author sets the following highly demanding task: "To improve test fairness we need an agenda for reform, which sets out clearly the basic minimum requirements for sound testing practice" (Weir, 2005:12). In order for a test to be considered valid, "...multifaceted and different types of evidence are needed to support claims for the validity scores on a test" (Weir, 2005:13). The goal of this book, then, is to set out a model that allows a thorough, reliable, and multi-pronged approach to validation.
The book comprises four sections. The introductory part, "Testing as Validity Evidence," sets the stage by emphasizing the importance of language testing validity, reviewing the history of English language testing, discussing the nature of test validity, and introducing a priori and a posteriori validity evidence. A priori validity evidence (i.e., evidence gathered prior to the test event) draws on theory-based validation and context validity. The term context validity is the author's replacement for the traditional concept of content validity. Adopting the socio-cognitive approach to testing, the author sees context validity as: "...the extent to which the choice of tasks in a test is representative of the larger universe of tasks of which the test is assumed to be a sample" (Weir, 2005:19). A posteriori validity evidence lies in scoring validity (an umbrella term encompassing various aspects of reliability), criterion-related validity (correlation between the test score and a relevant external performance criterion), and consequential validity (concern for the social consequences of testing).
The second part, "New Framework for Developing and Validating Tests of Reading, Listening, Speaking and Writing," places the five key elements of a validation framework (context validity, theory-based validity, scoring validity, consequential validity, criterion-related validity) into socio-cognitive models for validating reading, listening, speaking and writing tests. The model for all four domains of testing starts with test taker characteristics and then includes interrelated context and theory-based validity. The response is then evaluated using scoring validity parameters. Finally, the score/grade is evaluated on the basis of consequential and criterion-related validity.
Test taker characteristics include physical/physiological conditions (short-term ailments and longer-term disabilities), psychological traits (cognitive and affective), and experiential characteristics (education, experience, etc.). Context validity comprises task setting, task demands, and test administration. Theory-based validity addresses various aspects and components of reading, writing, listening, and speaking. Considerable space in this second part is devoted to discussing various response formats and the techniques aimed at securing scoring validity. Criterion-related validity entails the use of an external criterion, such as comparison with other tests or measurements, with future performance, or with external benchmarks. Consequential validity needs to address the issues of differentiality (different effects of the results on different groups of test takers), washback (the influence of tests on teaching and learning in a variety of settings), and effect on society.
The third part, "Generating Validity Evidence," explores research methodologies, techniques, and instruments for investigating the validity of a test. The author examines the instruments for all five types of validity. Having primarily MA and PhD students in mind, Prof. Weir provides a succinct summary of the criteria any research in this field should meet: "It should be believable..., It should be logical..., It should be feasible..., It should be important to the person doing it..., It should have value/interest..., It should have relevance" (Weir, 2005: 221-222).
The fourth part, "Further Resources in Language Testing," provides a list of books, journals, professional associations, principal testing conferences, email lists and bulletin boards, internet sites, databases, and statistical packages related to language testing.
The book includes a list of references, a subject index, and lists of further reading after each subsection.
The present work should be commended for its serious and meticulous examination of an important social problem. Language testing is one of the areas of linguistic activity with a direct and tangible bearing on the human condition. One's educational and employment prospects are oftentimes determined by the results of various language tests, a fact that necessitates the utmost professionalism in preparing, administering, and grading/scoring them. However, we are frequently confronted with criticism of major language tests, such as that of Johnson (2001), who pointed out the inadequacies of the Oral Proficiency Interview (OPI, see http://www.dlielc.org/testing/opi_test.html).
Prof. Weir's book is an attempt to dissect the testing process into its constituents and provide a credible model for validating each segment of the process. The comprehensiveness and thoroughness of the model presented in this book are impressive. The model is furthermore consistent, and concrete solutions for each segment of the validation process are provided. Equally strong are the breadth and depth of the literature in the field as presented in each section of the book. The fact that each chapter lists further reading is most valuable. An additional forte of the present work is the rich exemplification of all key aspects of validation using major language proficiency tests. Finally, the summary of resources for language testing is highly useful.
There are two minor formal shortcomings in this work. First, the overabundance of boxed citations is at times distracting. One can fully understand and justify the use of boxes within the text in those cases where key concepts need to be explained. However, in this book the graphic convention is overused, even in instances where a simple quotation embedded in the text would fully suffice. Second, the book is not equipped with a name index. Given the frequent references to various authors in the field, and with graduate students as the book's principal target audience, an author index should have been an integral part of this work.
The present work addresses validity in a general context, and more specifically in the context of major language proficiency tests. Language ability is also assessed in classes of less commonly taught languages at different levels, where no major tests and testing procedures are available. It may be useful, perhaps in the second edition of this work, to address the issue of securing the maximal possible validation with limited resources. For example, how can a teacher of Tatar achieve maximum validity for his/her final exam, or for a test assessing whether students meet a university foreign language requirement, in a situation where limited time and resources prevent him/her from deploying all validation procedures?
These minor problems notwithstanding, Prof. Weir's book provides a very useful language testing resource to a broad community of applied linguists. This study will be particularly beneficial for graduate students of applied linguistics, the principal target group of the series "Research and Practice in Applied Linguistics."
Let us hope that the following wishes of the author find fertile ground in the world of language testing: "It is hoped that this book will provide some help in clarifying the areas of test validity that we need to address and that it will encourage Examining Boards and all test developers to embark on a validity research agenda tailored to the level of stakes of the tests they are involved in" (Weir, 2005: 284).
Johnson, M. (2001). The Art of Non-Conversation. New Haven: Yale University Press.
Weir, C. (2005). Language Testing and Validation: An Evidence-Based Approach. London: Palgrave Macmillan.
The reviewer would like to extend his gratitude to Bryan Moore for proofreading this review.
ABOUT THE REVIEWER:
Danko Sipka (http://www.public.asu.edu/~dsipka) holds a PhD and Habilitation in Slavic Linguistics and a doctorate in Psychology. He is a professor of Slavic languages at the Arizona State University Department of Languages and Literatures. His numerous publications include the recent volumes Serbo-Croatian-English Colloquial Dictionary (2000) and A Dictionary of New Bosnian, Croatian, and Serbian Words (2002).