Review of Fairness, Justice and Language Assessment

Reviewer: Carmen Ebner
Book Title: Fairness, Justice and Language Assessment
Book Author: Tim McNamara, Ute Knoch, Jason Fan
Publisher: Oxford University Press
Linguistic Field(s): Applied Linguistics
Consisting of nine chapters, Fairness, Justice and Language Assessment provides a detailed introduction to, and overview of, Rasch analysis, a type of psychometric measurement used to analyse categorical data not only in language assessment, but also in fields such as healthcare and social science research. The aims of this book are twofold: besides exploring the distinction between the concepts of ‘fairness’ and ‘justice’ and their role in language assessment, the main focus lies on demonstrating the usefulness of various Rasch measurements in increasing fairness in language assessments. By using Rasch analysis, it is possible, for instance, to identify the difficulty of each test item, which enables test creators to improve the test’s fairness by revising the test item composition.

Starting with a general description of test validity, McNamara et al. draw on Samuel Messick’s (1989, p. 13) definition of validity as “an integrated evaluative judgement” of how appropriate and adequate inferences about a test taker’s abilities are, based on test scores. Fairness and justice tie in closely with the concept of validity as they specifically target the appropriateness and adequacy of these inferences. Yet, it is important to emphasise the difference between fairness and justice. McNamara et al. (2019, p. 10) describe fairness as an internal quality of language assessments that includes, for instance, various types of rater effects: Does the language background of test raters have an influence on their assessment? Do novice examiners rate test takers’ performances differently than veteran raters? Justice, on the other hand, is considered external to the test and deals with how the test is being used by society. The book contains a few examples illustrating this concept, such as the widely debated suggestion to use the International English Language Testing System (IELTS) as a means of proving language proficiency in Australian citizenship applications (McNamara et al., 2019, pp. 192-193). How appropriate and adequate the use of IELTS, which covers all four language skills (writing, reading, listening, and speaking), is for obtaining Australian citizenship constitutes a valid question. The proposed pass mark of Band 6 (CEFR B2 level) in all four skills would make the Australian citizenship language requirements tougher than any European equivalent (McNamara et al., 2019, p. 192).

The main body of the book (Chapters 3 to 5) contains a descriptive introduction to four main Rasch models: the basic Rasch model, the Andrich rating scale model, the partial credit model and the many-facets Rasch model. The authors not only explain statistical concepts and measurements for each model, but also provide a step-by-step illustration of how the different types of Rasch models are implemented in the supplementary material, which is available on accompanying websites. The supplementary material also contains the exercise files on which the book’s examples are based. While the basic Rasch model is used for dichotomous categorical data, such as incorrect/correct questions, the Andrich rating scale and the partial credit models are generally used for polytomous data. The Andrich rating scale model is mainly used for the assessment of performative skills (e.g. longer sections of speech or writing), which requires the use of Likert or semantic differential scales (e.g. scales ranging from strongly agree to strongly disagree). Partial credit models, on the other hand, are used for shorter responses to comprehension tasks (i.e. listening or reading), which could be scored as 0, 1, and 2, for instance. What is essential, however, is that none of these three Rasch models takes the rater into account as a potentially influential factor. To assess and capture the potential influence of the rater on test scores, the many-facets Rasch model can be used.
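To make the contrast between the basic and the many-facets model concrete, the following is a minimal sketch in Python. It is not taken from the book or its supplementary material. The function names are hypothetical, all values are on the logit scale, and the many-facets version is simplified to a dichotomous outcome with a single rater severity term.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the basic (dichotomous) Rasch model.

    The probability depends only on the difference between the test taker's
    ability and the item's difficulty, both expressed in logits.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def many_facets_probability(ability: float, difficulty: float,
                            rater_severity: float) -> float:
    """Simplified dichotomous many-facets sketch (an assumption for illustration).

    Rater severity enters as an additional facet that lowers the chance of
    success in the same way as a harder item would.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty - rater_severity)))


if __name__ == "__main__":
    # A test taker whose ability matches the item difficulty.
    print(rasch_probability(ability=0.0, difficulty=0.0))
    # The same test taker and item, but scored by a severe rater.
    print(many_facets_probability(ability=0.0, difficulty=0.0, rater_severity=1.0))
```

In this sketch a severe rater has the same effect on the predicted outcome as a more difficult item, which is why ignoring the rater facet can distort inferences about a test taker's ability.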

In Chapter 6, to illustrate the use of Rasch models in the field of language testing, McNamara et al. present an overview of studies applying this method to investigate fairness in language assessment. By drawing on a similar survey looking at the period from 1984 to 2009 (McNamara and Knoch, 2012), the authors suggest a growing popularity of Rasch analysis. Chapters 7 and 8 are theoretical in nature and provide more background information on the development of the different types of Rasch models, a further distinction between these types, as well as a section on criticism directed towards Rasch methods. This criticism centres on the fact that Rasch analysis focuses mainly on item difficulty, whereas other types of Item Response Theory (IRT) analysis also consider parameters such as the guessing behaviour of test takers.
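The criticism can be illustrated with a short sketch of the three-parameter logistic (3PL) IRT model, which adds a discrimination and a guessing parameter to item difficulty. This is a standard textbook formulation rather than code from the book. The function and parameter names are hypothetical, and the Rasch model appears here as the special case with discrimination fixed at 1 and guessing fixed at 0.

```python
import math


def three_pl_probability(ability: float, difficulty: float,
                         discrimination: float = 1.0,
                         guessing: float = 0.0) -> float:
    """Probability of a correct response under the 3PL IRT model.

    With the default discrimination (1.0) and guessing (0.0) the expression
    reduces to the basic Rasch model, in which item difficulty is the only
    item parameter.
    """
    logistic = 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))
    return guessing + (1.0 - guessing) * logistic


if __name__ == "__main__":
    # Rasch special case: only item difficulty is modelled.
    print(three_pl_probability(ability=-2.0, difficulty=0.0))
    # Four-option multiple-choice item: a guessing floor of 0.25 is assumed.
    print(three_pl_probability(ability=-2.0, difficulty=0.0, guessing=0.25))
```

For a low-ability test taker, the guessing parameter keeps the predicted probability of success from approaching zero, a behaviour the Rasch model does not represent.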

Besides summarising the two main aims of the book in the conclusion, McNamara et al. included a discussion of the issue of justice in language assessment for which they made use of a few illustrative examples of inappropriate and inadequate, hence unjust, uses of language tests. The proposal to use IELTS as a requirement for a successful Australian citizenship application mentioned above is one of these examples.


McNamara et al.’s Fairness, Justice and Language Assessment (2019) constitutes a solid introduction to Rasch analysis. The book contains a good amount of theoretical and historical background to be able to contextualise this method in the field of language assessment and to recognise the advantages of applying Rasch models. Fulfilling one of the book’s main aims, the authors provide a convincing argument for the use of Rasch measurements to explore and increase test fairness.

Written in a straightforward and instructive manner, the book is accessible to students and scholars who wish to gain an elementary understanding of how Rasch models can be used to address issues of fairness. What makes this book particularly useful are the excellent supplementary materials, through which the reader obtains guided hands-on experience with the different Rasch models. Unfortunately, these materials were not incorporated in the book and can only be accessed online. While the authors cite space limitations as the reason for separating the hands-on exercises from the theory and the explanations, one comprehensive guide to Rasch analysis would have facilitated a more natural and structured processing of, at times, complex statistical procedures. In addition, novices to Rasch analysis could have further benefited from the inclusion of a glossary.

Fairness, Justice and Language Assessment is generally well organised and particularly well written. While McNamara et al. have included a good amount of theoretical and historical background regarding the evolution of different Rasch models, Chapters 7 and 8 contain some general background information which would have been better placed at the beginning. For instance, these two chapters contain a good overview of the family of Rasch models and its relatives which could have been placed in one of the introductory chapters, as this would have helped the reader to contextualise the Rasch models better. It is, however, commendable that the authors also included a brief section on two alternatives to Rasch models, Generalizability theory, also known as G-theory, and Structural Equation Modelling in Chapter 8. The descriptions of these alternative approaches and their comparisons to Rasch models are very well written, albeit brief.

Overall, McNamara, Knoch and Fan have very competently illustrated how Rasch models can be used to address fairness as a test’s internal quality. Using relatable and well-explained language assessment examples, the authors describe the key output of the software programs (e.g. Winsteps and FACETS) in detail, both in the book and in the supplementary materials. Thus, the reader gets a clear explanation and demonstration of how to interpret the software’s output, such as Wright maps, item tables and category probability curves.

While fairness is covered extensively by the authors, the issue of justice in language assessment could have been elaborated on in more depth. Drawing on Messick (1989), the authors make clear that “language testing is a thoroughly social, even political activity” (McNamara et al., 2019, p. 197), which requires taking the social and political contexts of language testing into account. With language being a social medium, the importance of the social dimension of language testing has already been mentioned by McNamara and Roever (2006). The aforementioned case of proposed changes to the Australian citizenship application serves as an excellent example which illustrates the social and political characteristics of language assessment. Whether and how Rasch methods could be used to address injustice in language assessment constitute interesting questions which require a more in-depth explanation and discussion.


Dr Carmen Ebner is a sociolinguist currently working for Cambridge Assessment English as a Projects Assistant. Her PhD examined attitudes towards stigmatised and disputed usage features in British English. Carmen’s research interests include language variation and change, historical sociolinguistics, corpus linguistics and language and identity.

