Review of Breaking Ground in Corpus-based Interpreting Studies

Reviewer: Mauro Costantino
Book Title: Breaking Ground in Corpus-based Interpreting Studies
Book Author: Francesco Straniero Sergio, Caterina Falbo
Publisher: Peter Lang AG
Linguistic Field(s): Text/Corpus Linguistics
EDITORS: Straniero Sergio, Francesco and Falbo Caterina
TITLE: Breaking Ground in Corpus-based Interpreting Studies
SERIES TITLE: Linguistic Insights - Volume 147
YEAR: 2012

Mauro Costantino, Language and Linguistics Department, Universidad Mayor de San Andrés, La Paz, Bolivia


“Breaking Ground in Corpus-based Interpreting Studies,” edited by F. Straniero Sergio and C. Falbo, is a collection of seven papers united by a common goal: to begin filling the current gap in corpus-based research on interpreting. The papers range from introductory presentations to case studies, so the book covers both new projects under development and applications of corpora that are already usable.

Introduction: “Studying interpreting through corpora. An introduction.” (Francesco Straniero Sergio and Caterina Falbo)

This chapter offers a fairly extensive introduction to the work, touching on all the topics needed to give a complete reference even to the reader who is not acquainted with corpus-based studies or with corpus building and analysis. Starting from the basics of corpus design (Tognini-Bonelli, 2001) and representativeness (Barbera et al., 2007b), it moves on to more specific points such as the theoretical and methodological issues of translation and interpretation corpora. Finally, the introduction addresses spoken and speech corpora in relation to transcription and research matters, and it sketches the individual approaches of the corpora presented later in the book, thus guiding the reader through a well-organized presentation of the entire work.

Chapter 1: “The European Parliament Interpreting Corpus (EPIC): implementation and developments”. (Mariachiara Russo, Claudio Bendazzoli, Annalisa Sandrelli and Nicoletta Spinolo)

This chapter presents the implementation and development of the EPIC project clearly, detailing every step of the corpus planning and building process. The methodology sections cover these steps exhaustively: data collection, digitization, transcription (at both the linguistic and the paralinguistic level) and, finally, the extra-linguistic aspects of metadata and corpus annotation. The analysis carried out in the second part of the paper is well structured and clearly presented, which makes the chapter suitable also as introductory material for students approaching corpus-based studies, whether or not they deal with interpreting corpora.

Chapter 2: “From international conferences to machine-readable corpora and back: an ethnographic approach to simultaneous interpreter-mediated communicative events.” (Claudio Bendazzoli)

The second chapter stands a little apart from the rest of the book, since it deals mainly with the taxonomic issue of classifying interpreter-mediated data. Unlike the other papers in the collection, it does not take the analysis of corpus data as its objective. It is, instead, a sound study of the intricate theoretical and technical problems that led to the development of the header of DIRSI-C (Directionality in Simultaneous Interpreting Corpus). After presenting the data collection issues of the DIRSI corpus and multimedia archive, the paper focuses on the methodological issue of building a corpus of communicative interaction, comparing the methodology with that of the EPIC corpus described in the previous chapter. After discussing the theoretical bases for implementing a set of metadata that allows one to distinguish, and therefore query, the various speech events and the participants’ roles, it proposes a full taxonomy for the DIRSI header. The chapter offers better insight into the planning and building process that a sound corpus requires, and it leaves the field open to further research and development.

Chapter 3: “Introducing FOOTIE (Football in Europe): simultaneous interpreting in football press conferences.” (Annalisa Sandrelli)

The third chapter introduces the FOOTIE corpus with a clear and well-contextualized description of the building process, the methodology and the resulting structure of the corpus itself. The paper starts by presenting the aims of the project, giving a clear idea of the parameters that shape the corpus structure. The second section gives a brief but exhaustive contextualization, useful for the reader who might not be acquainted with the landscape of football translation and interpreting. Data collection and transcription issues are briefly sketched in the following section, which refers the reader to the first chapter for more information and thus avoids unnecessary repetition. The last and main section of the paper focuses on press conferences as a communicative situation and discusses the need for special treatment in the building of an interpreting corpus; a fair number of examples support the author's presentation and discussion of the theme.

Chapter 4: “CorIT (Italian Television Interpreting Corpus): classification criteria.” (Caterina Falbo)

The fourth chapter presents the ongoing work on CorIT (Italian Television Interpreting Corpus), ranging from the classification criteria to transcription and, finally, interrogation features. The presentation of the classification criteria is complete and clearly articulated (even though it refers to previous data for a thorough discussion of the criteria), giving the reader a complete overview of what is involved in defining such a central element of corpus building.

Chapter 5: “Topical coherence in television interpreting: question/answer rendition.” (Eugenia Dal Fovo)

This chapter presents ongoing doctoral research based on a sub-corpus of CorIT (Italian Television Interpreting Corpus). The main idea was developed from a previous MA thesis that the present article uses as a launching pad in order to develop better criteria for studying the question/answer rendition in television interpreting. The question and methodology sections present the work in a complete and well developed manner, clearly stating the research questions and detailing the corpus structure and content data.

Chapter 6: “Using corpus evidence to discover style in interpreters' performances” (Francesco Straniero Sergio)

The sixth chapter presents an innovative study of 'style' in interpreters' performances. The work is well presented and discussed, opening the field to aspects of interpreting corpora that have so far received less attention, such as 'style' and 'recognizability', or 'modus interpretandi' as the author calls it. This relatively short paper, supported by ample use of examples, gives a clear idea of the potential of the tools used for data retrieval (CorIT, Italian Television Interpreting Corpus). The chapter achieves its goal by setting a good starting point for corpus-based interpreting research on style and stylistic features.

Chapter 7: “Data collection in the courtroom: challenges and perspectives for the researcher.” (Marta Biagini)

The last chapter of the collection presents a new project for an interpreting corpus based on courtroom recordings. Since the project itself is still in its preliminary phase of data collection, the author presents the theoretical and methodological issues that characterize the planning stage of such a complicated project. She details these issues with clarity and with good contextualization of the reality of the Italian court system. The research questions and the procedures for data collection are well presented and discussed. In the end, the paper presents the project in its preliminary phase and offers some interesting hypotheses for future development in the conclusions, thus achieving its goal.


“Breaking Ground in Corpus-based Interpreting Studies” is a well-structured and, above all, innovative work. As the title suggests, the current landscape of Corpus-based Interpreting Studies is fairly limited, and the work attempts to fill this gap.

The work fully achieves the dual goal of discussing ongoing research and of presenting future perspectives and developments. The theoretical and methodological discussions are sound and helpful even for the researcher who has only recently begun the corpus-based study of interpreting interactions. It may lack some in-depth analysis of the full potential of the searching and indexing methods, a flaw that is largely offset by the good battery of examples and data presented.

The small downsides, some of them mainly editorial and some due to the specific presentation of data or results, do not detract from the results in any way; they only require the reader to spend more time analyzing them.

On the downside, corpus linguists who are not acquainted with technical interpreting-related vocabulary should be warned that they may need to do a little research of their own if they wish to explore this particular aspect of the matter. This minor shortcoming of the introduction is sometimes shared by the other sections of the book, owing to the perspective of the work, but it is quickly overcome: halfway through the book, the reader will already be acquainted with the technical terms. In conclusion, data and objectives are clearly expressed, and the chapters serve well the role of connecting the subsequent papers.

As far as the corpus linguist reader is concerned, a small shortcoming is the lack of complete data in Chapter 1; including type and token counts would have given a clearer picture of the corpus. Also, a few more lines could have been spent describing the numerous possibilities of the CWB (Corpus Workbench) and its CQP (Corpus Query Processor) corpus query system (Christ 1994), in order to better explain the possible reach of the whole corpus. Nevertheless, it must be noted that all this information can easily be retrieved through the references.
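To make concrete what type and token counts would add, here is a minimal Python sketch. It is not taken from the book or from the EPIC tools: the function name `type_token_counts` and the simple lowercase, letters-only tokenization are assumptions made for illustration, and a real corpus would apply its own transcription and tokenization conventions.

```python
import re
from collections import Counter


def type_token_counts(text):
    """Return (tokens, types, type/token ratio) for a transcript string.

    Assumption: a token is a run of letters, optionally joined by
    apostrophes (e.g. "don't"), compared case-insensitively.
    """
    tokens = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text.lower())
    counts = Counter(tokens)          # one entry per distinct word form
    n_tokens = len(tokens)            # running words
    n_types = len(counts)             # distinct word forms
    ratio = n_types / n_tokens if n_tokens else 0.0
    return n_tokens, n_types, ratio


if __name__ == "__main__":
    sample = "The president thanks the interpreters and the speakers"
    print(type_token_counts(sample))
```

Reported per sub-corpus (for instance, per language and per source or target text), such figures let the reader judge the size and lexical variety of the material at a glance.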

Only one shortcoming is present in the analysis developed in Chapter 1: while the structure is sound and well supported by data, a discussion of 'trends' and 'statistical significance' leads one to expect the underlying data, the test statistics and the 'p' values to be reported in the text.

Similarly, the analysis in Chapter 3 would have benefited greatly from details of the corpus (even if partial or estimated), such as the duration of the recordings in minutes and the word count. Considering the introductory nature of the research, this can be seen as a minor downside, but one that still limits the reader's ability to gauge the scope of the research.

As for the analysis presented in Chapter 4, the only shortcoming is the presentation of the figures. Since they are simple screenshots, they are difficult to read and do not add any substantial information to the text. On the other hand, this might be considered an editorial shortcoming rather than a content issue.

The second part of the paper presents some controversial concepts related to interpreting modes, interaction types and genres of spoken discourse, which offer a detailed overview of the fine-grained distinctions among the texts that form CorIT and thus give a better understanding of the full potential of the corpus. A little more space could have been dedicated to transcription and interrogation, in order to promote more research ideas for future study and development.

In Chapter 5, the only inconvenience is found in the results section, where the author presents many similar graphs and tables that may confuse the reader rather than help. To compare omissions and substitutions, a more synthetic presentation showing just one table with percentages would probably have helped. The pie chart presenting the frequency (oddly given in percentages instead of raw numbers) does not add to the information presented, and a simple table could have compared the frequencies just as well. As for the figures, four separate figures, each with its own table, presenting the percentages of satisfactory, medium and unsatisfactory degrees of coherence do not aid understanding; giving each figure a different scale makes the comparison even more time-consuming. One figure with four bars (wh-question, Yes/No question, Leading question, Declarative question), each divided into its three possible results (satisfactory, medium and unsatisfactory), would probably have eased comparison and improved clarity. So even though all the needed information is actually present, the use of many tables and graphs becomes something of a burden to the reader. As for the editorial side, the Excel tables presenting the questions and answers under examination could have been converted into more reader-friendly examples. These matters of form do not affect the quality of the content, but neither do they support it. Nonetheless, the chapter opens the field to some interesting studies on interaction, conversation analysis and topical coherence through interpreting corpora, which is the aim of the entire work.
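The single-table presentation suggested here can be sketched in a few lines of Python. This is only an illustration of the layout, not the author's method: the function names are hypothetical, and the counts in the usage example are invented placeholders, not figures from the chapter.

```python
LEVELS = ["satisfactory", "medium", "unsatisfactory"]


def row_percentages(row):
    """Convert raw counts for one question type into percentages."""
    total = sum(row.get(level, 0) for level in LEVELS)
    return {
        level: (100.0 * row.get(level, 0) / total if total else 0.0)
        for level in LEVELS
    }


def coherence_table(counts):
    """Render one plain-text table: one row per question type,
    one column per degree of coherence, all on the same 0-100 scale."""
    header = "{:<22}".format("question type")
    header += "".join("{:>16}".format(level) for level in LEVELS)
    lines = [header]
    for question_type, row in counts.items():
        pct = row_percentages(row)
        cells = "".join("{:>15.1f}%".format(pct[level]) for level in LEVELS)
        lines.append("{:<22}".format(question_type) + cells)
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder counts for illustration only; NOT data from the book.
    placeholder = {
        "wh-question": {"satisfactory": 5, "medium": 3, "unsatisfactory": 2},
        "Yes/No question": {"satisfactory": 4, "medium": 4, "unsatisfactory": 2},
        "Leading question": {"satisfactory": 6, "medium": 2, "unsatisfactory": 2},
        "Declarative question": {"satisfactory": 3, "medium": 5, "unsatisfactory": 2},
    }
    print(coherence_table(placeholder))
```

Because every row is normalized to the same scale, the four question types can be compared directly, which is the point of the suggested consolidation.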

Finally, given the very innovative idea of an interpreting corpus of courtroom recordings, a more detailed explanation of the entire project presented in Chapter 7 would have greatly added to the paper. Will the corpus be indexed? Will it be POS-tagged (Part-of-Speech)? Will it be made available online? A little more information would have stimulated possible support and fruitful discussion in the academic community.

In the end, it is a well-structured work that gives a clear view of ongoing research in corpus-based interpreting studies and stimulates many ideas for further development and research.


Barbera, Manuel, Elisa Corino & Cristina Onesti (eds.). 2007a. Corpora e linguistica in rete. Torino: Guerra Edizioni.

Barbera, Manuel, Cristina Onesti & Elisa Corino. 2007b. “Cosa è un corpus? Per una definizione più rigorosa di corpus, token, markup”, in Barbera et al., 2007a. pp. 25-88.

Christ, Oli. 1994. “A modular and flexible architecture for an integrated corpus query system”, COMPLEX '94. Budapest.

Tognini-Bonelli, Elena (ed.). 2001. Corpus Linguistics at Work. Amsterdam/Philadelphia: John Benjamins.

Mauro Costantino is an invited professor at the Universidad Mayor de San Andrés (UMSA) of La Paz, Bolivia. His main interests range from Second Language Acquisition, comparing the acquisition of the Italian verb system by speakers of different languages, to Translation Studies and corpus linguistics (focusing on learner corpora). He teaches Italian, a translation seminar, and an introduction to computational and corpus linguistics at UMSA, and actively participates in the VALICO and VALERE projects of the University of Torino, Italy.

