Review of Corpus-Based Translation Studies

Reviewer: Ulrike Stange-Hundsdörfer
Book Title: Corpus-Based Translation Studies
Book Author: Alet Kruger, Kim Wallmach, Jeremy Munday
Publisher: Bloomsbury Publishing (formerly The Continuum International Publishing Group)
Linguistic Field(s): Text/Corpus Linguistics
Issue Number: 25.4633

Review's Editor: Helen Aristar-Dry


This book is a collection of twelve papers by different authors. As the title suggests, the common denominator of all contributions is a concern with the relatively new branch of corpus-based translation studies, viz. research in the field of translation (and interpreting) based on corpora of translation (and interpreting) data collected on different occasions. In particular, the work focuses on the past and potential benefits of corpus-based approaches in investigating the “key questions of translation and interpreting” (1). The volume is divided into three main parts, moving from concepts and tools to methods to specific studies. The first part, comprising five chapters, accounts for half of the book. The second part is the shortest and includes three papers. The final part contains the remaining four chapters. The papers average about 20 pages in length, with chapter 2 being the prominent exception (some 40 pages). The individual chapters are briefly summarised below.

Part I: Core Concepts and Tools

Chapter 1: “Corpus-Based Translation Studies: Where Does It Come From? Where Is It Going?” by Sara Laviosa (13-32)

The first chapter explores the past, the present and the potential future of corpus-based studies in descriptive and applied translation studies. The author stresses the effect that the use of corpora had for a variety of linguistic disciplines and identifies the strong links established between corpus linguistics (CL) and Descriptive Translation Studies (DTS) in the early 1990s as “the key to the success story” of Corpus-Based Translation Studies (CTS) (14). Laviosa also briefly sketches a number of related studies, including their major findings and the implications they have for a variety of theoretical approaches. Discussing the important issue of corpora and representativeness, the author comes to the conclusion that “there is always a trade-off between balance and comparability, on the one hand, and representativeness on the other” (19). The paper also offers a brief survey of corpus-based cross-linguistic and cross-cultural research, again sketching relevant studies and their findings.

Chapter 2: “Corpus-Based Interpreting Studies (CIS): Overview and Prospects” by Robin Setton (33-75)

Chapter 2 outlines the history of corpus-based interpreting studies (CIS) and is rather long in comparison to the other chapters. The author briefly contrasts CL, DTS and CIS and discusses the additional benefits that interpreting corpora (the data consist of recordings and/or transcriptions of interpretations) might have, such as “test[ing] the predictions of cognitive and pragmatic models of speech communication” (35). The chapter includes a short overview of the different types of interpreting, highlighting their particularities and the challenges interpreters face in terms of cognitive and linguistic demands in the different interpreting situations. Setton lists the corpus-based interpreting studies that have been conducted so far, providing details on languages considered, corpus size, availability of transcriptions, foci of analysis, etc. These studies are only commented upon selectively, however, and details pertaining to results are very scarce. The author also discusses the problems of collecting, encoding and analysing interpreting data, problems that are further complicated by the fact that CIS also needs to consider the dimensions of orality, multilingualism and synchronicity. Another issue pertaining to CIS corpora involves the question of how to transcribe the data (priority given to details or readability) and which features to include in the tagging procedure. Evidently, the type of features a researcher will want to tag depends on the research question. Ideally, plenty of details are provided in the transcript to be shown or hidden as needed using software tools. Compiling CIS corpora is still difficult and very time-consuming, however, because of the limitations of automatic transcription, alignment and segmentation software.
Setton suggests encoding all CIS data such that it includes (1) digitized audio and/or video tracks, (2) plain (orthographic) transcription, (3) fine-grained time-coding, (4) prosodic profile, and (5) syntactic profile (61). Regarding data analysis, it is essential that contemporary research norms be met and that researchers conduct both qualitative and quantitative analyses. CIS corpora encoded as suggested can then be used to “tease out the most elusive aspects of interpreting” (66), provided hypothesis-testing and data analysis are carried out within a sound theoretical framework. In his conclusion, the author stresses that CIS offers the benefit of gaining further insight into interpreting despite the challenges that encoding and analysing CIS data pose.
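
To make the five layers listed above more concrete, the following is a minimal sketch in Python of how one interpreted segment might be stored so that annotation layers can be shown or hidden on demand. It is not taken from the chapter: the class names, field names, tag labels and file name are all hypothetical, and a part-of-speech tag merely stands in for the "syntactic profile".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Token:
    """One transcribed word with optional annotation layers (hypothetical schema)."""
    text: str                      # layer 2: plain orthographic transcription
    start: float                   # layer 3: time-coding, onset in seconds
    end: float                     # layer 3: time-coding, offset in seconds
    prosody: Optional[str] = None  # layer 4: prosodic label (assumed label set)
    pos: Optional[str] = None      # layer 5: POS tag as a stand-in for syntax


@dataclass
class Segment:
    """A stretch of interpreted speech tied to its recording."""
    media_file: str                # layer 1: pointer to the audio/video track
    speaker: str
    tokens: List[Token] = field(default_factory=list)

    def render(self, show_time=False, show_prosody=False, show_pos=False):
        """Return the transcript, adding only the layers that were requested."""
        rendered = []
        for tok in self.tokens:
            item = tok.text
            if show_pos and tok.pos:
                item += "/" + tok.pos
            if show_prosody and tok.prosody:
                item += "{" + tok.prosody + "}"
            if show_time:
                item += f"[{tok.start:.2f}]"
            rendered.append(item)
        return " ".join(rendered)


# Invented toy data, for illustration only.
segment = Segment(
    media_file="session01.wav",
    speaker="interpreter_A",
    tokens=[
        Token("thank", 0.00, 0.31, pos="VERB"),
        Token("you", 0.31, 0.52, prosody="fall", pos="PRON"),
    ],
)

print(segment.render())               # readable version
print(segment.render(show_pos=True))  # same data with the syntactic layer shown
```

Keeping every layer in the stored record and filtering only at display time is one way to reconcile the detail-versus-readability trade-off mentioned above.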

Chapter 3: “Translation Units and Corpora” by Dorothy Kenny (76-102)

After considering various approaches to translation units in turn (including comparative stylistics, process- and product-oriented translation studies and natural language processing), this chapter explores the potential insights to be gained by studying translation units in CTS corpora. To date, this phenomenon has attracted little attention within CTS. Kenny views translation units as “mutually defining source/target text segments” (87), which is the view also adopted in product-oriented studies. These units are identified by a manual search in one text first, after which the source text (ST) and target text (TT) are examined in turn in the process of analysing the data. Toury’s ‘no leftovers’ method is adopted and changed slightly to isolate problem/solution pairs. The key question is whether extended units of meaning (in the sense of phrases) coincide with translation units. Using the German-English Parallel Corpus of Literary Texts (GEPCOLT), the author shows that the translation of German “mit aller Kraft/Gewalt/Macht/Wucht” requires knowledge of the agent of the action in question because it is this piece of information that must be made explicit in the English version (e.g. “as hard as THEY could”). The second example she quotes from GEPCOLT is that of “mit weit aufgerissenen Augen”. Showing through English translations of the ST that “weit” is partly delexicalised, the author argues that parallel corpora can be used to identify “instances of delexicalisation in the source language” (97).

Chapter 4: “Hardwiring Corpus-Based Translation Studies: Corpus Encoding” by Federico Zanettin (103-123)

After a brief introduction to corpora in general, this chapter discusses three issues: (1) why DTS corpora need encoding, (2) what should be encoded, and (3) how the data should be encoded. As regards the question of how to encode data, the adoption of a standard is desirable for obvious reasons. Zanettin suggests an XML (eXtensible Markup Language)/TEI (Text Encoding Initiative) encoding framework because it aids in the construction of “stable, flexible and accessible corpus resources for translation research and for corpus-based descriptive translation studies in particular” (110). The author briefly explains the characteristics of XML, a text-based annotation system, which is compatible with TEI documents. He provides the very basics of XML/TEI, stressing its advantages for compiling CTS corpora. Zanettin goes into more detail regarding encoding in the discussion of the CEXI (Centralised External Input) project (which aimed at compiling a parallel bilingual and bidirectional English-Italian corpus) and provides examples of how the XML/TEI texts may be used in corpus research.
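
For readers unfamiliar with this kind of encoding, the sketch below illustrates in Python how sentence-aligned parallel text marked up in a TEI-like fashion can be queried. It is an illustrative simplification and not the scheme used in the CEXI project: the element and attribute names (`s`, `xml:id`, `xml:lang`, `linkGrp`, `link`, `target`) follow TEI conventions, but the `corpus` wrapper element, the omission of the TEI namespace and header, and the sample sentences are assumptions made for brevity.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id and xml:lang under the predefined XML namespace.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

# Invented toy corpus: two English sentences aligned with two Italian ones.
SAMPLE = """
<corpus>
  <text xml:lang="en">
    <s xml:id="en1">The committee approved the report.</s>
    <s xml:id="en2">It will be published next week.</s>
  </text>
  <text xml:lang="it">
    <s xml:id="it1">La commissione ha approvato la relazione.</s>
    <s xml:id="it2">Sarà pubblicata la prossima settimana.</s>
  </text>
  <linkGrp type="alignment">
    <link target="#en1 #it1"/>
    <link target="#en2 #it2"/>
  </linkGrp>
</corpus>
""".strip()


def aligned_pairs(xml_text):
    """Return (source, target) sentence pairs by following the link elements."""
    root = ET.fromstring(xml_text)
    # Index every sentence by its xml:id so that links can be resolved.
    sentences = {s.get(XML_NS + "id"): s.text for s in root.iter("s")}
    pairs = []
    for link in root.iter("link"):
        # Each target holds two pointers of the form "#id", separated by a space.
        src_id, tgt_id = (ref.lstrip("#") for ref in link.get("target").split())
        pairs.append((sentences[src_id], sentences[tgt_id]))
    return pairs


pairs = aligned_pairs(SAMPLE)
for source, target in pairs:
    print(source, "|", target)
```

Because the alignment is stored separately from the texts, the same sentences can be re-aligned or given further annotation without touching the transcription itself, which is part of the appeal of a standardised encoding.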

Chapter 5: “Web-Based Corpus Software” by Saturnino Luz (124-149)

Luz provides an overview of recent developments in tools, technologies and standards that could contribute to “an infrastructure for creating and sharing distributed, dynamic and widely accessible corpora” (125). In this line, the chapter offers an introductory description of mark-up languages (XML, DTD, CSS), indexing techniques and client-server architecture. These technologies were used for text storage and retrieval in the Translational English Corpus (TEC) project, and the author offers a tutorial introduction to the TEC browser. The chapter includes a presentation of what web-based corpus software could look like, pinpointing problems and challenges involved in a range of scenarios.

Part II: Methods for the Qualitative Analysis of Contrastive Patterns in Large Corpora

Chapter 6: “Lexical Priming and Translation” by Michael Hoey (155-168)

Hoey discusses the notion of lexical priming, showing that the traditional distinction between lexis and grammar is rather problematic when it comes to collocations. He argues that the acquisition of a lexical item is the result of priming and that the lexical item “in turn becomes primed for collocation, grammatical category, semantic association, colligation and textual colligation” (157). The author shows that even cognates might be primed differently, thus potentially becoming “a new class of false friends” (158). A lexical comparison of an English ST and a Portuguese TT suggests “subtle shifts of colligation, semantic association and textual colligation” (164), a finding which needs to be considered in terms of its possible impact on translations. Three options then arise for the translator: (1) keep the source language priming, thus rendering the TT more alien, (2) import the priming of the target language, making the TT sound more natural, or (3) a mixture of both.

Chapter 7: “Looming Large: A Cross-Linguistic Analysis of Semantic Prosodies in Comparable Reference Corpora” by Jeremy Munday (169-186)

After introducing the concept of semantic prosody, Munday’s paper discusses its implications for translations. Translators might not be aware of differences in semantic prosody and thus cause semantic prosody shifts in the translations. These in turn could affect how the reader responds to a given text. A study of LOOM LARGE and Spanish CERNERSE revealed that the two lexical items both have generally negative semantic prosodies but differ in their collocates and the syntactic structures in which they occur. Importantly, semantic prosody of certain lexical items may be genre-specific. Combining quantitative and qualitative analyses of semantic prosody and colligation patterns will prove beneficial for a better understanding of the translation process.

Chapter 8: “Using Translation and Parallel Text Corpora to Investigate the Influence of Global English on Textual Norms in Other Languages” by Juliane House (187-208)

Drawing on the notions of overt and covert translation, House investigates whether English is so influential these days that covert translations are discontinued in German target texts. She presents the Hamburg ‘Covert Translation’ project, identifying working hypotheses, outlining the features of the corpus and the methodology applied. Although the findings did not confirm that covert translations are discontinued due to English influence, translations into German seem to change with respect to the interpersonal function in popular science and economic texts. Based on these findings, House studied the domain of subjectivity using corpus data. She considered a range of interacting phenomena, including modality (in particular modal verbs and modal particles), speaker-hearer deixis, sentence adverbials and composite deictics. The results failed to support the hypothesis that the interpersonal function in German texts gained more weight through English influence in the translation process, but three explanatory models are offered to account for these findings.

Part III: Studies in Specific Sub-Fields

Chapter 9: “Off the Record and On the Fly: Examining the Impact of Corpora on Terminographic Practice in the Context of Translation” by Lynne Bowker (211-236)

This chapter evaluates the significance of corpus-based research and tools for terminological research and for terminographic practices as carried out by terminologists and translators. It outlines the discipline of terminology and considers how technology has been integrated into this field, especially with regard to the creation of term banks. Importantly, in their creation of term banks terminologists fail to “pass the benefits of their corpus-based research on to the translators in any appreciable form” (215) by limiting the amount of information that is available for a given lexical item. Bowker contrasts how terminologists and translators engage in thematic research, differences which are motivated by diverging professional needs. The author presents integrated tool suites, translation memory systems and term extractors, pinpointing both their advantages and their shortcomings. All in all, translators have seemingly changed how they work with term banks. They no longer trust them blindly but consult both term banks and corpora to achieve optimum results, especially considering that the information contained in term banks tends to date more quickly these days. Accordingly, the author suggests that terminology courses for translators be adapted to match their professional needs and provides a detailed description of what this modified curriculum could look like.

Chapter 10: “Style of Translation: The Use of Foreign Words in Translations by Margaret Jull Costa and Peter Bush” by Gabriela Saldanha (237-258)

Using target texts by two different translators, Saldanha shows and discusses how the translators may leave their individual stylistic footprints in English translations from Spanish and Portuguese. The author defines the notion of translator style and presents the two corpora used for the present study. She investigates how Jull Costa and Bush treat both highlighted and non-highlighted source culture lexical items in their English translations. The individual strategies and preferences thus identified are matched against a third, comparative corpus to test whether they are indeed instances of translator style. Saldanha also explores the communicative function of source culture items in translation, distinguishing between cases of self-referentiality and culture-specific items. Again, Jull Costa and Bush display different strategies in their translation, which are discussed in an interview with the two translators. Factors accounting for the differences in translator style include how the two translators conceptualise both their readership and their role as intercultural mediators.

Chapter 11: “A Link between Simplification and Explicitation in English-Xhosa Parallel Texts: Do the Morphological Complexities of Xhosa Have an Influence?” by Koliswa Moropa (259-281)

This paper is one of the first attempts to apply corpus-based translation studies to an African language, in this case Xhosa. In the absence of an acceptable standard in translation, parallel corpora have come into use to aid standardisation of terminology. Since Xhosa is an agglutinating language with a concordial system, its translation into English poses a number of challenges. The strategies usually involved in translating Xhosa into English are simplification (of syntax, style and lexis) and explicitation (insertion of explicit demonstratives, use of lexical repetition, and adding explanatory information), which are explained in detail.

Chapter 12: “Disfluencies in Simultaneous Interpreting: A Corpus-Based Analysis” by Claudio Bendazzoli, Annalisa Sandrelli and Mariachiara Russo (282-306)

The final chapter analyses two types of disfluencies in simultaneous interpreting, viz. mispronounced words and truncated (unfinished) words. The data were drawn from the European Parliament Interpreting Corpus (EPIC), and both the source and the target language speakers were subject to investigation. The languages considered were English, Spanish and Italian, both as source and as target languages. The theoretical background comprises a general description of disfluencies and spoken language production, zooms in on language production and disfluencies in simultaneous interpreting, and introduces operational definitions of mispronounced and truncated words to be applied in the study. The practical part presents the methodology used and discusses the results, taking into consideration not only performance-specific differences (original text vs. interpretation) but also language-specific ones (Germanic vs. Romance, Romance vs. Romance). For instance, it was found that mispronounced and truncated words were more frequent in the TTs than in their STs, and that interpreters produced fewer disfluencies when working from Spanish or Italian into English than when working between the two Romance languages.


In their introduction the editors announce that the book focuses mainly on the benefits of corpus-based studies “in the investigation of key questions of translation and interpreting” (1). Without exception, all the contributions have highlighted the advantages of working with corpora in translation studies, referencing relevant studies. Shortcomings and challenges were also discussed and potential solutions offered.

This book will be of interest to anyone concerned with translation studies, especially those still reluctant to consider corpus-based approaches. This volume convincingly and repeatedly shows how beneficial corpus-based studies are in this discipline. For those already in favour of applying this method, it offers stimulating input regarding the range of research possibilities that are opened up by corpus-based studies. All in all, this volume is also highly accessible to non-specialists and very interesting to read.

Unfortunately, the volume does not cohere fully. Its title “Corpus-Based Translation Studies. Research and Applications” suggests that it solely addresses CTS, presenting a number of relevant studies. In fact, it includes two chapters (2 and 12) on CIS, with one of them even being a prominent part of the book because of its length (chapter 2). Another two chapters (1 and 3) are theory-oriented. Furthermore, the collection of papers for the first part entitled “Core Concepts and Tools” has a somewhat random feel to it. Chapter 1 presents an overview of the present, past and potential future of CTS, chapter 2 covers CIS, chapter 3 discusses the concept of translation units, and chapters 4 and 5 then introduce the reader to the basics of mark-up languages and related tools. Incidentally, I found these two chapters a bit problematic in that they are too technical and complex for novices but offer too little information for those already at least slightly familiar with XML, DTD and CSS. However, helpful references for further reading are provided. Thus, chapters 4 and 5 allow readers to form an idea of the benefits of the XML encoding framework (also in combination with TEI, DTD and CSS), but in order to produce XML texts themselves, they will have to do a lot more reading and spend a considerable amount of time learning the mark-up language.

Part II, which consists of three papers, is coherent in that all the papers present methods for the qualitative analysis of contrastive patterns (in this case pragmatic and discourse features) in large corpora. Part III presents studies in specific subfields, such as terminology studies, stylistics, translation universals and simultaneous interpreting, so naturally, and excusably, it reads like a mix of papers lined up. Again, the chapter on interpreting stands out because I would not have expected it to be part of the volume.

All in all, although the titles of the parts fit the chapters they contain, it is the content of the individual papers that leaves the impression that the parts, and the volume as such, are not fully coherent. This impression is not attenuated even though the introduction includes a brief description of the individual chapters, attempting to make the connections between them. Nonetheless, all of the contributions are of a very high quality and offer stimulating input, not least because they point to new avenues of research, both empirically and theoretically, and/or in terms of combining different disciplines or methods.

Ulrike Stange holds an M.A. in English Linguistics and is a research assistant at the Department of English and Linguistics at Mainz University in Germany. Her research interests include emotive interjections (PhD thesis to be published soon), translation studies and dialectal variation in British English.

Format: Paperback
ISBN-13: 9781623563189
Pages: 320
Prices: U.K. £29.99

Format: Hardback
ISBN-13: 9781441115812
Pages: 320
Prices: U.K. £85.00

Format: Electronic
ISBN-13: 9781441125217
Pages: 320
Prices: U.K. £29.99

Format: Electronic
ISBN-13: 9781441189196
Pages: 320
Prices: U.K. £27.99