Review: Computational Linguistics; Discourse Analysis; Text/Corpus Linguistics: Callies, Levin (2019)

Date: 02-Feb-2020
From: Shuyi Sun
Subject: Corpus Approaches to the Language of Sports
EDITOR: Marcus Callies
EDITOR: Magnus Levin
TITLE: Corpus Approaches to the Language of Sports
SUBTITLE: Texts, Media, Modalities
SERIES TITLE: Corpus and Discourse
PUBLISHER: Bloomsbury Publishing (formerly The Continuum International Publishing Group)
YEAR: 2019

REVIEWER: Shuyi Amelia Sun, The University of Queensland


In recent decades, the world of sports has undergone fundamental changes: a diversification of sports types and events, the increasing commercialization and globalization of major spectator sports, and ever-growing public attention and intensive coverage across various media and modalities. Despite this growing popularity, studies of the language and discourse of sports are not only relatively heterogeneous in nature but also scattered across different academic disciplines. In applied linguistics, there is to date no specialized journal devoted to sports discourse, although several journals in the broader field of social studies serve as outlets for such research. In addition, the emergence of new sports genres in the age of computer-mediated communication (CMC; Herring, 1996) has opened up new ways of studying sports discourse by means of large online electronic resources. At the same time, linguistic research has benefited greatly from corpus-based and corpus-driven investigations of real-world language, thanks to the compilation and accessibility of computer corpora and software tools.

Accordingly, this timely volume, Corpus Approaches to the Language of Sports: Texts, Media, Modalities, edited by Marcus Callies and Magnus Levin, brings together innovative empirical studies that adopt a usage-based perspective and use corpus data and corpus-linguistic methods to examine language in a variety of genres and pragmatic contexts across different types of sports. The editors seek to extend the scope of applied linguistic research on sports beyond football/soccer, which has so far been at the center of attention. Furthermore, they aim to advance corpus linguistic research more generally by throwing light on both the potential and the necessity of examining sports language together with its accompanying audio-visual modes of communication from a multimodal perspective. Given this scope, the volume should be of great interest to a broad readership, including researchers working on (sports) discourse analysis and corpus linguistics, as well as those in the larger fields of applied linguistics and social studies.

Structurally, the volume comprises an introductory chapter and ten empirical studies. The introductory chapter (Chapter 1) lays out the theoretical and methodological context required to appreciate the subsequent studies, while the ten empirical studies (Chapters 2-11) are divided into three parts. Part one, ‘Texts. Contrastive and Comparative Aspects of the Phraseology of Football Match Reports’ (Chapters 2-4), adopts a comparative/contrastive approach to explore the phraseology of football reporting across different text types and languages. Part two, ‘Media. Expanding the Scope of Research to New Contexts of Use’ (Chapters 5-8), extends existing research to new media and to relatively neglected sports discourse beyond football. The final part, ‘Modalities. Multimodal Studies’ (Chapters 9-11), examines sports language from a multimodal perspective, which has rarely been applied to the language of sports.

The volume opens with the introduction (Chapter 1, written by the editors), which lays the foundation for the following parts by briefly reviewing sports-related linguistic research with a focus on its main topics and recent trends, presenting new research initiatives and resources, and introducing the remaining chapters. Over the last few decades, football/soccer has largely dominated the research agenda, with an extensive focus on sports reporting covering such aspects as structural linguistics, linguistic borrowing, metaphor, and diachronic studies tracing the history of reporting genres. Outside of sports reporting, several studies have analyzed the sociolinguistic aspects and significance of football chants. Although this research appears to have become more diverse and interdisciplinary, it has mostly been limited both in approach (monomodal corpus analysis drawing only on the textual level) and in scope (football/soccer). The editors then present useful resources, notably the Innsbruck Football Research Group and Simon Meier’s electronic corpora of cross-linguistic football reporting (Meier, 2017), and introduce a recently initiated research network, ‘Applied Linguistics in Sport’. Finally, the editors introduce each of the following chapters and explain how they are arranged into the three broader parts of the volume.

Turning to the first part, Chapter 2 (Simon Meier) studies the strategies authors use to produce online football coverage while meeting the challenge of reconciling linguistic routines and emotional involvement under high time pressure. Meier adopts data-driven corpus-linguistic methods to investigate two types of formulaicity, namely recurrent schematic constructions and idioms. A large corpus of over 12 million tokens was built from German and English data drawn from two online football-related genres: live text commentaries (LTC) and match reports (MR). The results suggest that the production of online football coverage oscillates between preconstructed patterns and word-for-word combinations. More specifically, syntactic patterns and idioms serve as text routines that present texts in a cross-cultural, register-specific way tied to the communicative and social needs of sports coverage, while still leaving enough open choices to modify these ‘templates’ so as to demonstrate creativity and make narratives more appealing. As the author argues, these findings provide evidence for what Sinclair (1991) described as the alternation between the ‘idiom principle’ (preconstructed multi-word combinations) and the ‘open choice principle’ (word-for-word combinations) in the production of texts (Erman & Warren, 2009).

Chapter 3 (Signe Oksefjell Ebeling) explores the English-Norwegian Match Report Corpus (ENMaRC) in order to fill a gap: little contrastive research on English and Norwegian has addressed specific, homogeneous non-fictional text types such as football match reports. Drawing on ENMaRC, whose Premier League (PL) match reports amount to more than 500,000 tokens and whose ‘Eliteserie’ (ES) match reports amount to roughly 155,000 tokens, the author applies corpus-driven extraction methods in the form of word lists, n-gram lists and keyword lists. The results show that, on the one hand, post-match reports in the two languages resemble other text types in their use of time and space expressions; on the other hand, there are cross-linguistic differences in how victories and defeats are reported. As a set of pioneering observations based on ENMaRC, the study is impressive in its exploratory nature, offering rich food for thought and several avenues for future research. At the same time, because ENMaRC is still under construction, the current state of the corpus may have prevented the author from offering more in-depth analyses of specific linguistic tendencies.

Chapter 4 (Rita Juknevičienė and Paulius Viluckas) compares human-mediated and computer-mediated football reports in a bid to fill a niche, since the (dis)similarities between human- and computer-mediated football language remain unknown. The two modes (computer- vs. human-mediated reports) are represented by ‘Football Manager 2017’ (FM) reports and BBC online football reports respectively, and two corresponding corpora were compiled, each containing 200 texts and around 20,000 tokens. The researchers then use a corpus-driven approach to analyze the keywords that distinguish one mode from the other and the relationships between the lexical bundles in the two corpora. The results reveal a number of prominent differences in terms of both individual lexemes (particularly function words) and four-word lexical bundles: only 11 of the 200 most frequent lexical bundles are shared by both corpora. In addition, the study shows a limited use of conjunctions and other cohesive and linking devices in the computer-generated football reports, which may explain a major cross-mode difference related to text cohesion. However, the findings are not without shortcomings, since the corpora analyzed are relatively small, with only around 20,000 tokens in each sub-corpus, which may limit the generalizability of the results.

Part two, ‘Media. Expanding the Scope of Research to New Contexts of Use’, begins with Chapter 5 (Turo Hiltunen), which explores framing in news media accounts of cycling crashes. The chapter is motivated by the fact that the language of cycling crash reports remains largely unexplored, even though such reports reflect, and can shape, the public’s and policy makers’ understanding of, attitudes towards, and behavior regarding cycling (Rissel et al., 2010). Accordingly, the chapter aims to identify the structure and functions of such reports, describe the ways in which different social actors are represented, and investigate what is identified as the cause of a crash and whether that cause is expressed neutrally. To achieve these goals, Hiltunen applies a corpus-based approach to framing in a 79,000-word corpus of 230 English reports collected from the Internet, and identifies and discusses their main textual functions and lexico-grammatical patterns from a discourse-analytical perspective. His findings suggest that similar textual strategies are employed for framing crashes and representing social actors, although individual texts still vary in the specific aspects they stress. The findings also highlight clear differences in the representation of cyclists and of drivers of motor vehicles involved in crashes, which could be interpreted as evidence of media bias against cycling.

Chapter 6 (Jukka Tyrkkö and Hanna Limatius) studies race radio interactions between drivers and their teams (i.e. race engineers) in Formula One, addressing the fact that no previous linguistic research has examined race radio interactions or any similar context of language use. Based on a newly compiled corpus of 5,432 individual messages (63,183 tokens) from the 2016 and 2017 Formula One seasons, the authors apply corpus-based quantitative and qualitative methods to investigate the structure and complexity of dialogic turns, and present a detailed breakdown of stylometric markers in both driver and team broadcasts. The findings largely support the authors’ hypothesis: the effects of stress are indeed observable in most of the selected linguistic markers, although there are significant differences among individual drivers and race engineers during a race. While the sampling method used to compile the corpus may be regarded as ‘opportunistic’ (p. 117), the study takes a first step into the language of Formula One and offers heuristic implications for further research.

Chapter 7 (Isabel Balteiro) investigates the use of the English swearword f-expressions (‘WTF’, ‘fucking’, and ‘fuck’) by Spanish football followers in spontaneous, synchronous comments in online chats. The corpus consists of over 390,500 authentic online messages and/or comments produced by Spanish football followers between 2007 and 2018, manually compiled from the comments sections and chatroom messages of the online version of the Spanish sports newspaper Marca. However, the expletives turn out to be unexpectedly rare: there are only 144 examples in total (28 of fuck, 16 of fucking and 100 of wtf), produced by 139 different users. The findings show that the f-expressions used as code-switches by Spanish football followers in chats have by and large lost their taboo status. Instead, they are primarily used to help organize the sequentiality of discourse (Li & Milroy, 1995) and to communicate speakers’ attitudes and their mostly negative emotions. In addition, the distribution patterns of the f-expressions indicate that Spanish users replicate or imitate native uses of these words, probably motivated by context and/or chat (in-group) norms. As the author states, the main interest of the study lies in the functions of these expletives as pragmatic markers and in their interactional significance, position, and distribution in football-related discourse. The study thus offers interesting insights into both the language of football followers and/or online communities and the cross-linguistic pragmatic role of expletives.

The last chapter in this part (Chapter 8, by Miguel Ángel Campos-Pardillos) addresses sports-related legal discourse by examining the presence of metaphor in descriptions of sports fraud (e.g. bribery) and of the fight against it. The author manually extracted 203 metaphors associated with sports fraud and the measures taken against it from nine academic studies (72,809 words in total), comprising four journal papers and five book chapters, all of which deal with the topic from a legal or law enforcement perspective. Following a qualitative approach, the author analyzes how legal scholars use metaphorical language to justify the fight against sports fraud, and how this metaphorical discourse is influenced by the identification of sports fraud with other criminal activities. The analysis shows that, on the one hand, some metaphors have a clear and objective ontological basis, usually pertaining to the world of concrete objects; on the other hand, process and event scenarios are employed to justify measures and actions that initially seem strict or controversial but are eventually accepted as inevitable within the ‘war against fraud’ (p. 176). The chapter concludes by stressing the fundamental role of metaphors in constructing discourse, and hence the importance of being aware of the metaphors used both in sports law contexts and in general discourse on sports.

In the final part, ‘Modalities. Multimodal Studies’, Chapter 9 (Valentin Werner) explores the multimodal nature of football live text (FLT) and the role of audience participation in this electronic medium, with the aim of expanding the description of sports/football discourse. Based on a corpus of 68 FLT reports (around 160,000 tokens in total), Werner takes a multimodal approach to FLT as an artefact connecting offline and online practices. Taking into account the combined linear and non-linear nature of live text commentaries, he finds that the genre increasingly taps into elements from various external sources (e.g. information from a commercial statistics provider, and images) that also enable audience participation; FLT thus emerges as a hybrid and complex multimodal ensemble characterized by media convergence. Calls for further study of (F)LT as an under-researched form of journalism have repeatedly been voiced from the perspectives of both linguistics (Hauser, 2008) and media research (McEnnis, 2016). Given the inherently multimodal nature of the artefact, a multimodal framework is undoubtedly an appropriate choice for analyzing this type of communication, and it also facilitates the exploration of issues such as media convergence.

Chapter 10 (Peter Crosthwaite and Joyce Cheung) studies multimodal discourse practices on 4chan, an anonymous online community where users post images and texts on a wide range of topics. The corpus analyzed consists of eleven full threads (comprising 35,850 posts and 1,169 image posts) relating to the Ultimate Fighting Championship’s (UFC) 2017-2018 New Year’s Eve flagship event, UFC 219 – Cyborg vs. Holm. The authors apply sentiment analysis to the text and images of this multimodal corpus, focusing on positive and negative appraisals of the action as the sports event unfolds in real time, as well as on reaction images expressing posters’ personal responses to the event or to other users’ reactions. Their primary goal is to quantify and characterize the intermodality of 4chan posters’ juxtaposition of text, images and videos as they communicate their reactions to the event itself and to other posters, in order to reveal how meanings made in one mode are interwoven with those made in another and co-operate during the event. In addition, the authors examine how hyperlinks serve to direct the sentiment of text and images at specific other posters on an anonymous message board. According to the results, the overall sentiment conveyed in the 4chan discussion is almost entirely negative, with a strong sense of both fear and disgust expressed multimodally through texts and images. This negativity is directed at the fighters, at other posters, and even at the posters’ own self-identity, and is accompanied by fantasies of white male dominance over other races and genders. As such, the study sheds light on discourse practices in a typically shady corner of the internet (as occupied by 4chan users), and contributes to a greater understanding of online sports discourse as mediated by thousands of users in (semi-)real time.

Finally, Chapter 11, contributed by the volume’s editors (Marcus Callies and Magnus Levin), presents a comparative study of dislocation in live (i.e. play-by-play) TV football commentary. The study starts from the assumption that live TV sports commentary is a specialized register characterized by largely unplanned discourse and shaped both by the time-critical nature of the action unfolding on and off the field and by what is visible on the TV screen. To test this assumption, the authors conduct a trilingual comparative corpus-based study of a 14,726-word corpus comprising English, German and Swedish transcripts of live TV commentaries of the 2014 FIFA men’s World Cup final between Germany and Argentina. The findings support the authors’ initial argument that right dislocation can be considered a register-specific, functionally motivated discourse feature of live TV sports commentary. Moreover, since there are no major cross-linguistic differences in the frequency of dislocation or in the distribution of its discourse functions, the authors suggest that future research will need to determine to what extent these cross-linguistic similarities hold true in general. Notably, the study highlights both the potential and the necessity of examining language use in association with accompanying modes of communication and visualization from a multimodal perspective.


On the whole, the volume makes a strong contribution to corpus linguistics and to the application of corpus-linguistic methods to the language of sports and related contexts. As the only volume dealing with sports in the Corpus and Discourse series, it offers innovative empirical studies that use new corpus resources to showcase the structural-linguistic and discourse aspects of a wide range of sports (e.g. football, cycling, motor racing), genres (e.g. live commentary, post-match reports, legal texts) and contexts of use (e.g. sports media, in-team communication). Given the pioneering investigations in each chapter, the volume is especially impressive in its exploratory nature and rich implications for future research. In addition, the detailed descriptions of corpus-linguistic methods in each chapter make it easy for both experienced corpus linguists and newcomers to apply these approaches in their own corpus-based/driven (sports) discourse analysis. Newcomers in particular will benefit from the thorough literature review of sports discourse research, which can serve as a theoretical foundation for their work.

Despite these strengths, the volume is not without limitations, the main one concerning the corpora analyzed. As mentioned, the amount of corpus data in some chapters (e.g. Chapter 4) is relatively limited. Similarly, because ENMaRC (Chapter 3) is still under construction, the current state of the corpus may have prevented the author from offering more in-depth analyses of specific linguistic tendencies. Even where such chapters do provide evidence, they remain limited in sample size and thus in the generalizability of their results. Nevertheless, this shortcoming does not fundamentally detract from the strength of the volume. Indeed, each chapter, while filling one gap in the literature on sports discourse analysis, simultaneously opens up another avenue for research on the application of corpus-based/driven methods to a wider range of real-world language contexts. As such, the volume represents an indispensable step for future research, which, if based on larger corpora of sports language, may corroborate the present observations and offer new insights into the specificity of computer-mediated sports language.


Erman, Britt & Beatrice Warren. 2009. The idiom principle and the open choice principle. Text 20(1). 29-62.

Hauser, Stefan. 2008. Live-Ticker: ein neues Medienangebot zwischen medienspezifischen Innovationen und stilistischem Trägheitsprinzip. kommunikation gesellschaft 9(1). 1-10.

Herring, Susan (ed.). 1996. Computer-mediated communication: Linguistic, social and cross-cultural perspectives. Amsterdam: John Benjamins.

Li, Wei & Lesley Milroy. 1995. Conversational codeswitching in a Chinese community in Britain: A sequential analysis. Journal of Pragmatics 23. 281-299.

McEnnis, Simon. 2016. Following the action: How live bloggers are reimagining the professional ideology of sports journalism. Journalism Practice 10(8). 967-982.

Meier, Simon. 2017. Korpora zur Fußballlinguistik – eine mehrsprachige Forschungsressource zur Sprache der Fußballberichterstattung. Zeitschrift für germanistische Linguistik 45(2). 345-349.

Sinclair, John. 1991. Corpus, concordance, collocation. Oxford: Oxford University Press.


Shuyi Amelia Sun is a postgraduate student in the Applied Linguistics-TESOL program at the University of Queensland. Her research interests include (learner) corpus linguistics, English for Academic Purposes (EAP), and quantitative text/data analysis using R. Her previous research experience includes managing a project under the Student Research Training Program for Colleges and Universities in China, publishing in academic journals and conference proceedings, and serving as a reviewer for a linguistics journal. She aspires to pursue a Ph.D. in corpus linguistics.

