Review of Testing and Assessment in Translation and Interpreting Studies

Reviewer: Wu Zhiwei
Book Title: Testing and Assessment in Translation and Interpreting Studies
Book Author: Claudia V. Angelelli; Holly E. Jacobson
Publisher: John Benjamins
Linguistic Field(s): Applied Linguistics
Discipline of Linguistics
Issue Number: 21.2757

EDITORS: Claudia V. Angelelli; Holly E. Jacobson
TITLE: Testing and Assessment in Translation and Interpreting Studies
SUBTITLE: A call for dialogue between research and practice
SERIES TITLE: American Translators Association Scholarly Monograph Series XIV
PUBLISHER: John Benjamins
YEAR: 2009

Wu Zhiwei, Faculty of English Language and Culture, Guangdong University of
Foreign Studies


Despite the importance of the quality issue in translation and interpreting,
testing and assessment research in these two fields has not been well supported
by the academic community. ''Of these four fields [theory, practice, pedagogy and
evaluation], however, translation evaluation has remained the least developed…''
(Arango-Keeth and Koby 2003:117) and ''there has been little recognition in
translation and interpreting circles that educational measurement as a broader
field has its own tradition of scholarship'' (Campbell & Hale 2003:205). Against
this backdrop, the present volume gives a strong push to testing and assessment
research and encourages, as the subtitle puts it, ''a dialogue between research
and practice''. The volume, part of the American Translators Association
Scholarly Monograph Series, consists of eleven chapters plus an introduction;
five of the chapters address translation, while six address interpreting
(including signed language interpreting). The research methodologies in these
chapters are varied, ranging from theoretical inquiry to empirical studies, from
descriptive case studies to corpus-based research. Yet they all share one
central theme: testing and assessment in its own right.


In the introduction, Angelelli and Jacobson explain that testing and assessment
are important instruments in education programs, professional certification and
research endeavors. They review some of the theoretical terms and models
surrounding quality in translation, point out the insufficiencies of assessment
constructs in interpreting and discuss the lack of empirical studies in both
translation and interpreting quality. Based on this brief background, they
introduce the gist of each paper collected in this volume.

Part 1 Theoretical Applications
Part 1 focuses on theory of assessment in translation and (healthcare)
interpreting. It begins with Angelelli's introduction to language testing
concepts and their links with translation assessment. She then points out
limitations in the current American Translators Association (ATA) certification
and proposes a 5-point-scale scoring rubric, a more inclusive testing tool, for
assessing translators' ability. Her rubric comprises five categories, namely
source text meaning, target text style and cohesion, situational
appropriateness, grammar and mechanics, and translation skill (pp. 42-43). She
argues that this is an encompassing rubric for testing the linguistic, textual,
pragmatic and strategic competence of a translator in summative tests.

Holly E. Jacobson's chapter goes beyond linguistic competence and draws on
interactional sociolinguistics and conversation analysis to approach
performance-based assessment in (healthcare) interpreting. She argues that the
development of an assessment rubric should involve three important steps: (1)
identifying theoretically grounded competences to be measured; (2)
operationalizing the sub-components of each competence; and (3) making the
assessment authentic or near-authentic. On this basis, she demonstrates how
interactive competences such as ''contextualization cues'' and ''discourse
management'' are assessed by means of a four-level analytical rubric.

Part 2 Empirical Approaches
Part 2 adopts empirical approaches to the quality issue in translation and
interpreting assessment. June Eyckmans, Philippe Anckaert and Winibert Segers
compare three assessment methods: the intuitive-impressionistic (holistic)
method, assessment grids (the analytical method), and Calibration of
Dichotomous Items (CDI). Their empirical study, involving 113 participants at
B.A. and M.A. level, concludes that the holistic and analytical methods fall
short in reliability and discriminating power, though the latter seemed to lead
to better inter-subjective agreement. The CDI method, by contrast, performs
best of the three in terms of reliability and consistency. Because the method
requires constant monitoring of its implementation, the authors also argue that
CDI is ''only to be promoted for use in summative context'' (p.87).

Elisabet Tiselius revisits Carroll's scales, which were intended for measuring
the intelligibility and informativeness of machine-translated text, and applies
them to the assessment of interpreting. Using the adapted scales, she explores
whether non-professionals can validly serve as graders in interpretation
assessment. Interpreters at three levels of experience (none, short, and long)
and two groups of graders (professional and non-professional) were involved in
the study. Because the two groups of graders showed similar tendencies, Tiselius
argues that non-interpreter graders can grade interpreters' performance in
achievement tests and research contexts with the help of the revised Carroll's
scales.

Mira Kim applies systemic functional linguistics to the meaning-oriented
assessment of translation, in order to improve the existing error deduction
method adopted by the National Accreditation Authority for Translators and
Interpreters (NAATI). She categorizes translation errors into major and minor
ones. The major ones consist of four sub-categories: experiential, logical,
interpersonal and textual, each of which affects the accuracy and naturalness
of the translation. These errors are judged at the levels of lexis, clause and
text, with deductions of 1-2, 1-3 and 3-5 points, corresponding with each
sub-category. She then illustrates each sub-category with detailed examples of
English-to-Korean translation and explains the magnitude of the error and the
corresponding points deducted. To explore the pedagogical implications, she
conducts a survey to solicit students' opinions on the application of this
meaning-based assessment in translation classes and in formative assessment.
Results show that a large proportion of students regard this method as
appropriate for enhancing their critical thinking about translation and their
translation competence and skills.

Brian James Baer and Tatyana Bystrova-McIntyre contend that corpora can be used
to reduce the subjectivity and randomness of translation assessment and to
achieve better pedagogical results. With the help of bilingual corpora, they
cite examples of differences in punctuation, sentencing and paragraphing between
English and Russian. In the comparative analysis of punctuation they study the
average punctuation use per 1,000 words and find that commas, colons,
dashes/em-dashes and parentheses are significantly more frequently used in
Russian. Thus, translators should not copy graphic features without adaptation.
It is justified, however, for translators to preserve the emphatic use of
punctuation. In the analysis of sentencing and paragraphing, they compare
sentence length and paragraph length in three text types: editorials,
literature, and international news, and find that the average number of words
per sentence in English is significantly higher. Based on these statistical
findings, they propose a framework for classifying errors in punctuation use
for the formative and summative assessment of translations.
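
The per-1,000-words measure mentioned above can be illustrated with a minimal
sketch. This is not the authors' procedure: the function name, the set of marks
tracked, and whitespace-based word counting are all assumptions made for
illustration only.

```python
from collections import Counter

# Marks tracked here are an illustrative assumption: comma, colon,
# em-dash, and opening parenthesis (one per parenthetical).
PUNCTUATION = {",": "comma", ":": "colon", "\u2014": "dash", "(": "parenthesis"}


def punctuation_per_thousand_words(text):
    """Return the frequency of each tracked mark per 1,000 words.

    Words are counted by splitting on whitespace, a simplifying assumption.
    """
    words = text.split()
    if not words:
        return {}
    counts = Counter(ch for ch in text if ch in PUNCTUATION)
    return {
        name: 1000 * counts[mark] / len(words)
        for mark, name in PUNCTUATION.items()
    }
```

Applying the same function to comparable English and Russian samples would give
normalized rates that can be compared directly, whatever the sample sizes.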

Keiran Dunne considers the lesser-known topic of assessing software
localization. Because the absence of a standardized definition of localization
undermines any discussion of its assessment, evaluation or improvement, Dunne
first puts forward his definition of localization and reviews the software
development process and software quality measurement. He then contends that
localized software quality should be tested for linguistic, cosmetic, and
functional characteristics. He elaborates on the types of defects in linguistic
testing and introduces two metrics to assess translation quality in
localization projects. He argues, however, that these two metrics do not lead
to objective assessment, despite the accuracy, equivalence and consistency they
claim to ensure. He continues his discussion with causes of non-objective
assessment by citing the inherent problems of comprehensibility of the
software. After that, he also touches upon cosmetic and functional testing.
Finally, he views quality assessment from the perspectives of vendors and
clients respectively and concludes that quality management in localization
projects should take into account customers' perceptions and expectations.

Part 3 Case Studies
Part 3 contains chapters dealing with different cases in different testing
settings. Šárka Timarová and Harry Ungoed-Thomas discuss the important issue of
the predictive power of admission tests for interpreter training offered by
institutes in Europe. They relate interpreting admission tests to foreign
language aptitude testing, and reiterate the need to test the candidates'
capacity for acquiring consecutive interpreting (CI) and simultaneous
interpreting (SI) skills, instead of testing CI or SI skills per se. They then
review two lines of research in this respect: developing new aptitude tests and
validating the current admission tests. Against this background, their study
uses regression models to examine how well admission tests (written and oral)
predict final exam results at one university. By tracking the records of 184
students who sat the admission test (possible components of which are a written
test, an oral summary, and an oral presentation), they compare the admission
scores with the final exam results and find that the admission tests taken as a
whole only weakly predict final exam performance. Based on this, they urge
interpreter training schools to identify the proper latent constructs in order
to establish robust and reliable aptitude tests.

On a related topic, Karen Bontempo and Jemina Napier study the efficacy of
admission tests for sign language interpreters in Australia, a topic that has
never been researched before. Given the ad hoc and non-standardized measures
used to screen sign interpreters, they pose three research questions: the
adequacy of the existing programs in training interpreters; the extent to which
admission tests in spoken interpreting can be adapted and applied to sign
interpreting; and the predictive power of such tests. In order to answer these
questions, they conducted two empirical studies: a survey and an admission test.
The survey, which involved 110 NAATI-accredited practitioners, was conducted to
identify predictors for admission tests from the skills gaps noted by the
practitioners, and to solicit their opinions on existing training programs. The
survey found that training programs did not adequately prepare them to enter
the profession. The authors then incorporated the significant skills gaps
identified, presumably the key components, into the screening test used in the
second study. They devised six admission test components, which they explain at
length, and tested 18 applicants. Of the 11 students admitted, 55% passed the
final examination at the end of the course, which points to the weak predictive
power of the admission test.

In their case study, Hildegard Vermeiren, Jan Van Gucht and Leentje De
Bontridder describe the certification test taken by social interpreters in
Flanders, Belgium, who are yet to be accredited. To ensure objectivity, the
test adopts a criterion-referenced, grid-based approach and consists of four
parts: a language proficiency test (Dutch and other language), reproduction (in
source language), transfer (sight translation) and role play (the candidate
acts as interpreter for a Dutch expert and the expert in the other language).
The exam board fills out a summary evaluation grid to determine whether the
candidate is granted certification. The authors give a detailed explanation and
description of these four tests. To support the test's legitimacy and
consequential validity, they also address the issues of subjectivity,
triangulation, graders, and more. Finally, they call for further research to
develop the evaluation grid.

In the final chapter, Debra Russell and Karen Malcolm explore the national
certification test for signed language interpreters in Canada. They first
review the establishment of the Association of Visual Language Interpreters in
Canada (AVLIC) and its initial test, which consisted of a written test and a
test of interpretation. The test is rated in three domains: English, American
Sign Language (ASL), and Message Equivalency (ME). Only the candidates who pass
both the English and ASL tests are graded by the ME rater. This initial testing
methodology is problematic in that ME raters may know the candidates and pass
unqualified candidates. The authors then review Australian and American
practices in similar signed language interpretation accreditation contexts and
compare them with Canadian practices. They present a new four-step testing
model: a written test of knowledge, preparation workshops for the test of
interpretation, a test of interpretation (which allows candidates pre-test
on-line access to the speakers' presentation on a different topic) and
certificate maintenance. As to the rating process, the English language domain
was eliminated, while the ASL domain was retained pending re-evaluation. This
model, they point out, is subject to ongoing adjustments as deemed necessary.


This is a comprehensive volume dealing with issues of quality assessment and
testing in translation and interpretation, including software localization and
signed language interpretation, which are relatively under-researched. Readers
are presented with theory and its application, empirical studies and case
studies.

As McAlester (2000:231) points out, there have been relatively few empirical
studies on assessment within university level translation programs. In this
sense, this volume is a valuable and enlightening contribution to empirical
research on university translation and interpreting programs, because quite a
few chapters approach the assessment and testing issue empirically and present
findings based on data collected from university students (cf. Eyckmans et al.,
Kim, Timarová & Ungoed-Thomas, Bontempo & Napier).

The findings in this volume shed light on the status quo in certification tests
and admission tests, and on how evaluation methods can be improved and new
testing tools developed for quality assessment. This volume is a stepping stone
for researchers, practitioners, test designers, course instructors and other
stakeholders to rethink their test constructs and test methodologies in
summative, formative or diagnostic assessments in translation and interpretation.

Whilst readers appreciate the editors' intention to make this volume inclusive,
they may find some domain-specific jargon difficult to understand. For example,
Dunne's chapter is densely packed with computing terminology, which lay readers
may find difficult to follow. As far as topic coverage is
concerned, to use the classification by Martínez Melis and Hurtado Albir (2001),
this volume covers two of the ''three areas of evaluation'', namely ''the
evaluation in professional translation practice'' and ''Evaluation in Translation
Teaching'', but falls short of ''the Evaluation of Published Translations''.

Central to the discussion of testing and assessment are the notions of
reliability and validity. These two notions are the underlying theme constantly
scrutinized and examined in this volume. In the discussion of topics presented
in each chapter, these two concepts are also yardsticks to assess the degree to
which the discussion is sound and justified.

As regards the reliability issue, since the repertoire of interpreting skills
differs from that of translation (McKay 2006:32, Colin & Morris 1996), the idea
of borrowing an assessment method from (machine) translation for interpretation
is weakly grounded. In Tiselius' study, interpretations were transcribed before
being graded. In a real situation, can we transcribe all interpretations before
they are graded? This would be time-consuming. Would the reliability of
Tiselius' scale hold up if graders assessed a recorded interpreting performance
instead of a transcription?

In addition, on page 105, in the explanation of the standard of ''long
experience'', the description says ''more than 20 years'', but in the table on the
same page, it increases to ''>25''. This is potentially confusing to readers.
Finally, the reasons for defining interpreters with two years' experience as
having ''short experience'' are not clear, as there is a huge gap of around 20
years between interpreters with short experience and those with long experience.

Next is the validity issue. As two chapters deal with the topic of aptitude
tests, the legitimacy of aptitude tests is scrutinized. In their conclusions,
both point to the weak predictive power of aptitude tests, but the root causes
of the mismatch between the entrance exam and the final one remain
inconclusive. For example, Bontempo and Napier invalidate the admission test by
citing the pass rate of the final exam, developed by Technical and Further
Education (TAFE). Before they come to this conclusion, though, they should
explain what potential abilities the admission test measures, what interpreting
skills/abilities or (to use the language testing term) constructs the TAFE exam
measures, and the validity thereof. Put another way, even if the
admission test is valid in measuring potential abilities, if the TAFE test is
not valid in measuring interpreting skills, the result would still be a weak

Before the test tools and components proposed in the case studies are applied
universally, larger-scale empirical studies involving more subjects should be
carried out to establish their reliability and validity. This is the path for
future researchers to take in order to build on the implications of these
studies.


Arango-Keeth, Fanny and Geoffrey S. Koby. 2003. Translation training evaluation
and the needs of industry quality assessment. In ''Beyond the ivory tower:
rethinking translation pedagogy'', Brian James Baer & Geoffrey S. Koby (eds.).
Amsterdam/Philadelphia: John Benjamins Publishing Company.

Campbell, Stuart and Sandra Hale. 2003. Translation and Interpreting Assessment
in the Context of Educational Measurement. In ''Translation Today: Trends and
Perspectives'', Gunilla M. Anderman & Margaret Rogers (eds.). Multilingual Matters.

Colin, Joan and Ruth Morris. 1996. Interpreters and the Legal Process.
Winchester: Waterside Press.

Martínez Melis, Nicole and Amparo Hurtado Albir. 2001. ''Assessment in
Translation Studies: Research Needs'', Meta: Translators' Journal, vol. 46, no. 2,
pp. 272-287.

McAlester, G. 2000. The evaluation of translation into a foreign language. In
Developing Translation Competence. C. Schäffner & B. Adab (eds). Benjamins
Translation Library 38. Amsterdam: Benjamins, 229-241.

McKay, Corinne. 2006. How to Succeed as a Freelance Translator. Lulu Press.

WU Zhiwei is currently an Assistant Lecturer in the Faculty of English Language and Culture, Guangdong University of Foreign Studies. He is a chapter contributor to and co-author of two interpreting course books and also a practicing conference interpreter, accredited by the China Accreditation Test for Translators and Interpreters (CATTI). His research interests include quality assessment in interpreting, interpreters' role and interpreting pedagogy.