Date: Wed, 11 May 2005 03:07:45 -0700
Danko Sipka
Language Testing and Validation: An Evidence-Based Approach

Danko Sipka, Department of Languages and Literatures, Arizona State University


The present work addresses one of the most socially important aspects of
applied linguistics -- the issue of lingometric validity. The book is
written by an insider to the field, Prof. Cyril Weir of Roehampton
University (United Kingdom), who holds an impressive international record
of testing and curriculum development. Prof. Weir's work appears in
the Palgrave series Research and Practice in
Applied Linguistics, and is designed for the "MA or PhD student in Applied
Linguistics, TESOL or similar subject areas and for the language
professional keen to extend their research experience" (Weir, 2005:ii).

Confronted with the lack of validity evidence for many tests, and the
serious social consequences of using unvalidated tests, the author sets
himself the following demanding task: "To improve test fairness we need an
agenda for reform, which sets out clearly the basic minimum requirements
for sound testing practice" (Weir, 2005:12). In order for a test to be
considered valid, "...multifaceted and different types of evidence are
needed to support claims for the validity scores on a test" (Weir,
2005:13). The goal of this book is then to set a model allowing a
thorough, reliable, and multi-pronged approach to validation.

The book consists of four sections. The introductory part, "Testing as
Validity Evidence," sets the stage by emphasizing the importance of
language testing validity, reviewing the history of English language
testing, discussing the nature of test validity, and introducing a priori
and a posteriori validity evidence. A priori validity evidence (i.e.,
prior to the test event) draws from theory-based validation and context
validity. The term context validity is the author's replacement for the
traditional concept of content validity. Adopting the socio-cognitive
approach to testing, the author sees context validity as: "...the extent
to which the choice of tasks in a test is representative of the larger
universe of tasks of which the test is assumed to be a sample" (Weir,
2005:19). A posteriori validity evidence lies in scoring validity (an
umbrella term encompassing various aspects of reliability), criterion-
related validity (correlation between the test score and a relevant
external performance criterion), and consequential validity (concern for
the social consequences of testing).
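
Because criterion-related validity is described above as a correlation
between the test score and an external criterion, a small worked example
may help. The following is only a minimal sketch, not a procedure taken
from Weir's book: the choice of Pearson's r, the function name pearson_r,
and all scores below are assumptions made for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares of the deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one set of scores is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: scores of six test takers on the test being validated,
# and their ratings on an external criterion (e.g. a later performance rating).
test_scores = [52, 61, 70, 74, 83, 90]
criterion = [2.0, 2.5, 3.0, 3.5, 3.5, 4.5]

r = pearson_r(test_scores, criterion)
print(f"criterion-related validity coefficient: r = {r:.2f}")
```

A coefficient close to 1 would suggest that the test ranks candidates much
as the external criterion does. On its own it says nothing about the other
kinds of validity evidence discussed in the book.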

The second part, "New Framework for Developing and Validating Tests of
Reading, Listening, Speaking and Writing," places the five key elements of
a validation framework (context validity, theory-based validity, scoring
validity, consequential validity, criterion-related validity) into socio-
cognitive models for validating reading, listening, speaking and writing
tests. The model for all four domains of testing starts with test taker
characteristics, and then includes interrelated context and theory-based
validity. The response is then evaluated using scoring validity
parameters. Finally, the score/grade is evaluated based on consequential
and criterion-related validity.

Test taker characteristics include physical/physiological conditions
(short-term ailments and longer-term disabilities), psychological traits
(cognitive and affective), and experiential characteristics (education,
experience, etc.). Context validity comprises task setting, task demands,
and test administration. Theory-based validity addresses various aspects
and components of reading, writing, listening, and speaking. Considerable
space in this second part is devoted to discussing various response
formats and to the techniques aimed at securing scoring validity.
Criterion-related validity entails the use of an external criterion, such
as comparison with other tests or measurements, with future performance,
or with external benchmarks. Consequential validity needs to address the issues
of differentiality (different effects of the results on different groups
of test takers), washback (influence of tests on teaching and learning in
a variety of settings), and effect on society.

The third part, "Generating Validity Evidence," surveys research
methodologies, techniques, and instruments for investigating the validity of a
test. The author examines the instruments for all five types of validity.
Having primarily MA and PhD students in mind, Prof. Weir provides a
succinct summary of the criteria any research in this field should
meet: "It should be believable..., It should be logical..., It should be
feasible..., It should be important to the person doing it..., It should
have value/interest..., It should have relevance" (Weir, 2005: 221-222).

The fourth part, "Further Resources in Language Testing," provides a list
of books, journals, professional associations, principal testing
conferences, email lists and bulletin boards, internet sites, databases,
and statistical packages related to language testing.

The book is equipped with a bibliography, a subject index, and lists
of further reading after each subsection.


The present work should be commended for a serious examination of an
important social problem with due meticulousness. Language testing is one
of the fields of linguistic activities with direct and tangible bearing on
the human condition. One's educational and employment prospects are
oftentimes determined by the results of various language tests, which
necessitates the utmost professionalism in preparing, administering, and
grading/scoring them. However, major language tests are frequently
criticized; Johnson (2001), for example, pointed out the inadequacies of
the Oral Proficiency Interview (OPI).

Prof. Weir's book is an attempt to dissect the testing process into its
constituents and provide a credible model of validating each segment of
the process. The comprehensiveness and thoroughness of the model presented
in this book are impressive. The model is furthermore consistent, and
concrete solutions for each segment of the validation process are
provided. Equally strong are the breadth and depth of the literature in the
field as presented in each section of the book. The fact that each chapter
lists further reading is most valuable. An additional forte of the present
work is the rich exemplification of all key aspects of validation using
major language proficiency tests. Finally, a summary of the resources for
language testing is highly useful.

There are two minor formal disadvantages of this work. First, the
overabundance of boxed citations is at times distracting. One can
fully understand and justify the use of the boxes within the text in those
cases where the key concepts need to be explained. However, in this book
this graphic convention was overused, even for those instances where a
simple quotation embedded in the text would fully suffice. Second, the
book lacks an index of names. Given frequent references to
various authors in the field, and with graduate students as the book's
principal target audience, an author index should have been an integral
part of this work.

The present work addresses validity in a general context, and more
specifically in the context of major language proficiency tests. Language
ability is also assessed in classes for less commonly taught languages at
different levels, where no major tests and testing procedures are
available. It may be useful, perhaps in a second edition of this work,
to address the issue of securing the fullest possible validation with limited
resources. For example, how can a teacher of Tatar achieve maximum
validity of his/her final exam, or a test assessing if the students meet a
university foreign language requirement, in a situation where limited time
and resources prevent him/her from deploying all validation procedures?

All aforementioned minor problems notwithstanding, Prof. Weir's book
provides a very useful language testing resource to a broad community of
applied linguists. This study will be particularly beneficial for graduate
students of applied linguistics, the principal target group of the
series "Research and Practice in Applied Linguistics."

Let us hope that the following wishes of the author find fertile ground
in the world of language testing: "It is hoped that this book will provide
some help in clarifying the areas of test validity that we need to address
and that it will encourage Examining Boards and all test developers to
embark on a validity research agenda tailored to the level of stakes of
the tests they are involved in" (Weir, 2005: 284).


Johnson, M. (2001). The Art of Non-Conversation. New Haven: Yale
University Press.

Weir, C. (2005). Language Testing and Validation: An Evidence-Based
Approach. London: Palgrave Macmillan.


The author would like to extend his gratitude to Bryan Moore for
proofreading this review.


Danko Sipka holds a PhD and
Habilitation in Slavic Linguistics and a doctorate in Psychology. He is a
professor of Slavic languages at the Arizona State University Department
of Languages and Literatures. His numerous publications include the recent
volumes Serbo-Croatian-English Colloquial Dictionary (2000) and A
Dictionary of New Bosnian, Croatian, and Serbian Words (2002).

