Review of English Corpus Linguistics

Reviewer: Kim Ebensgaard Jensen
Book Title: English Corpus Linguistics
Book Author: Gisle Andersen, Kristin Bech
Publisher: Rodopi
Linguistic Field(s): Sociolinguistics; Text/Corpus Linguistics
Subject Language(s): English
Issue Number: 25.341

In “English Corpus Linguistics: Variation in Time, Space and Genre” (henceforth ECL), editors Gisle Andersen and Kristin Bech present a selection of papers, originally delivered in 2011 at ICAME 32 (ECL is itself the 77th volume in Rodopi’s Language and Computers series). In their introductory chapter, the editors address the apparent gap between corpus linguistics and variationist linguistics. The editors point to recent technological and methodological developments in corpus linguistics, arguing that corpus linguistics can indeed be used in the study of linguistic variation. The papers collected in ECL have in common that they, by applying corpus-based methods in the study of variation, bridge this gap between variationist linguistics and corpus linguistics, which according to the editors is among the major purposes of the volume.

ECL contains eleven papers (twelve, if we count the editors’ introductory chapter) divided into three main sections: ‘Variation in time’, ‘Variation in space’, and ‘Variation in genre’. The editors’ introductory chapter, in addition to describing the agenda of the entire volume, also offers a useful summarizing overview of the individual papers.

The first paper of part one is Christian Mair’s ‘Writing the corpus-based history of spoken English: The elusive past of a cleft construction’, in which Mair documents uses of a variant of the pseudo-cleft construction dating back as early as the 17th century. Because the pseudo-cleft is primarily a phenomenon in spoken English, Mair argues that data based on real-time spoken English are necessary. He also points out that there are critical limitations in this respect when it comes to diachronic corpora, as there are no real-time data available prior to the 20th century. In the absence of real-time spoken data, Mair draws on a number of corpora that provide speech-by-proxy in speech-based written genres for pre-20th century instances of the construction. Mair finds that finite clause complements in the construction in question are not merely reduced versions of ‘that’-clauses, but rather innovations in spoken syntax that contribute to the focusing function of the pseudo-cleft.

In the following paper, ‘Discourse communities and their writing styles: A case study of Robert Boyle’, Lilo Moessner offers a diachronic comparison of 17th and 18th century medical and scientific texts. Operating with a corpus of such texts as well as Boyle’s writings in both fields, Moessner’s study, which draws on multidimensional analysis (Biber 1988), reveals three important findings. Firstly, Moessner shows that, contrary to what is commonly held, medicine and science constituted separate discourse communities (Swales 1990). Secondly, Robert Boyle, who was an influential figure in both fields in the 17th century, is revealed to follow the discursive conventions of neither medical nor scientific discourse. While the medical and scientific texts are characterized by non-narrativity and situation-dependent reference, Boyle’s writings in both fields are characterized by narrativity and explicit reference. Moessner takes this as evidence that Boyle had taken on the authorial identity (Hyland 2009) of an expert member of both discourse communities, enabling him to pursue a more idiosyncratic writing style. Thirdly, Boyle’s writings are more similar to the 18th century texts than to the 17th century ones, suggesting that Boyle’s writing style influenced the medical and scientific discourse communities of the 18th century.

Gjertrud F. Stenbrenden addresses the relation between pronunciation and spelling in ‘The diphthongisation of ME ū: The spelling evidence’, suggesting that spellings in Middle English texts may help determine the starting point of the Great Vowel Shift (generally held to have taken place ca. 1400-1750). Investigating three corpora of Middle English which together cover the period 1150-1450, Stenbrenden finds evidence that the Great Vowel Shift may have started as early as the mid-to-late thirteenth century. She argues that certain irregular spellings, if one assesses these in relation to the entire orthographic system (thus transcending their Anglo-Norman influence), may actually indicate a diphthongization of ‘ū’.

The next five papers constitute part two of the volume. The first paper of this section is Christopher Koch and Tobias Bernaisch’s ‘Verb complementation in South Asian English(es): The range and frequency of “new” ditransitives’. Koch and Bernaisch investigate a range of new ditransitives (NDTs) -- that is, verbs that display ditransitive behavior in South Asian varieties of English but not in their historical input variety, British English. Using the South Asian Varieties of English newspaper corpus, they find that, apart from a small number of NDTs that span several South Asian English varieties, each variety seems to have its own set of NDTs. Having established that the three NDTs with the widest cross-varietal range are ‘extend’, ‘submit’, and ‘supply’, Koch and Bernaisch employ the Google Advanced Search Tool (GAST) to investigate the use of these three NDTs in a broader range of contexts. The GAST search shows that all three NDTs are attested in South Asian Englishes in contexts that are not specific to the domain of newspapers. This is indicative of the specific South Asian nature of NDTs.

‘Functional variation in the English present perfect: A cross-varietal study’ by Xinyue Yao and Peter Collins addresses the use of the present perfect in the British, Australian, and New Zealand components of ICE (International Corpus of English) and in a collection of written American Frown (Freiburg-Brown) corpora in combination with the Santa Barbara Corpus of Spoken American English. More specifically, they measure the distribution of McCawley’s (1971) three readings of the perfect: the continuative, experiential, and resultative uses. In terms of cross-variety and cross-domain distribution, Yao and Collins find that, in the domains of news and fiction, the use of the present perfect in Australian English is actually more similar to its use in American English than in British English, which points in the direction of American news discourse perhaps being more influential on its Australian counterpart than on its New Zealand counterpart. The study addresses the fact that a number of instances are characterized by semantic indeterminacy. This is an issue that has generally been neglected, but Yao and Collins’ study allows them to observe some patterns of use here, too. The article also discusses atypical uses of the present perfect, such as the present perfect being used to actually express temporal progression, pointing out that it may have a vividness function in narrative discourse.

In ‘Gender, culture and language: Evidence from language corpora about the development of cultural differences between English-speaking countries’, Johan Elsness follows up on a statement by Leech & Fallon (1992: 44-45), that American culture tends to be “masculine to the point of machismo, militaristic, dynamic, and actuated by high ideals, driven by technology, activity and enterprise”, while British culture is “more given to temporizing and talking, to benefitting from wealth rather than creating it, and to family and emotional life”. This statement is based on a study by Leech & Fallon (1992) of American and British English in 1961 as attested in the LOB (Lancaster-Oslo-Bergen) and Brown corpora. With the availability of much larger corpora nowadays, Elsness sets out to revisit the cultural differences between Britain and the United States. Drawing on a diachronic dimension, the dimension of media, and a geographical dimension which also brings into the picture Australia and New Zealand for comparative purposes, Elsness finds an overall process of convergence in progress, as the cultural differences, as expressed linguistically, seem to be steadily diminishing. For instance, a frequency study of personal pronouns shows that, while American English still tends toward a preference for masculine pronouns, the ratio between masculine and feminine pronouns is smaller in recent times than in 1961. Moreover, Elsness finds, by investigating the TIME Magazine corpus, that the divergence in cultural terms that was symptomatic in 1961 is less prevalent in recent times, with a process of evening out going on (in which the frequency of specifically American cultural terms is dropping while the frequency of specifically British ones is increasing).
One interesting aspect of Elsness’ study is that, by adding the diachronic perspective, he is actually able to explain language use with reference to socio-cultural events; for instance, the decrease of the masculine-feminine ratio starts in the 1970s, which coincides with the activities of the Women’s Liberation movement and, more broadly, reflects the cultural revolution that took place in the West in that era.

Kathrin Luckmann de Lopez’ investigation of the functions of vocative ‘man’ in ‘Clause-final “man” in Tyneside English’ is an enlightening contribution to the body of work on the Tyneside variety of English. In the absence of corpora of Tyneside English in which vocatives are likely to occur, Luckmann de Lopez draws on data from a TV series called “Auf Wiedersehen, Pet,” a drama from the 1980s which follows a group of fictional Tyneside residents as they travel to Düsseldorf in Germany to find employment. Luckmann de Lopez also draws on a more recent data source, the reality show “Geordie Shore.” The dialogue in reality shows is supposedly unscripted and may thus be taken to be naturally occurring language characterized by a certain degree of spontaneity. Drawing on these data sources, Luckmann de Lopez finds, among other things, that clause-final ‘man’ in Tyneside is not limited to a vocative function. In fact she finds that, in addition to its vocative function, it has a considerable range of other functions -- namely, focus marking, signaling end of turn, expressing solidarity, expressing politeness and mock-politeness, and hedging/softening. Furthermore, she finds that the vocative function is actually among the least frequent ones, suggesting that it is not even the primary use of clause-final ‘man’ in Tyneside English.

In the final paper of part two, ‘“They have published a new cultural policy that just come out”: Competing forms in spoken and written New Englishes’, Christina Suárez-Gómez and Elena Seoane investigate the expression of perfect meaning in spoken and written East and South-East Asian Englishes (AsE) as documented in ICE. Using the British component of ICE as a benchmark corpus, Suárez-Gómez and Seoane find that in AsE the present perfect has a lower percentage of occurrence in the context of adverbs like ‘never’, ‘ever’, ‘yet’, and ‘just’. They also find that in AsE the preterite is used with reference to recent past situations in conjunction with ‘just’, whereas in British English the preterite is used with ‘never’ and ‘ever’ to express experiential meaning. Suárez-Gómez and Seoane find that their corpus investigation verifies Schneider’s (2007) hypothesis that in AsE the formal expression of perfect meaning depends on adverb use.

The first paper of part three is Daniel Lees Fryer’s ‘Exploring the dialogism of academic discourse: Heteroglossic Engagement in medical research articles’. Drawing on Appraisal Theory from Systemic Functional Linguistics (Martin & White 2005), Fryer investigates heteroglossic engagement (essentially the critical interaction of voices in a given field) in medical research articles. Fryer finds that, while different engagement resources are used in challenging other voices in medical research articles, the feature called ‘entertain’ (covering hedging and modality, among other strategies) is the most frequent one. Interestingly, Fryer also finds that there seems to be variation, in terms of frequency, in the use of heteroglossic features across sections in medical research articles.

Matteo Fuoli tackles the language of corporate responsibility in his paper ‘Texturing a responsible corporate identity: A comparative analysis of Appraisal in BP’s and IKEA’s 2009 corporate social reports’. Like Fryer in the previous paper, Fuoli operates within the framework of Martin & White’s (2005) Appraisal Theory, investigating the appraisal systems of engagement and attitude in sustainability reports by BP and IKEA. Fuoli finds that IKEA’s use of engagement and attitude resources constructs an image of IKEA as a progressive and caring company, while BP comes across as a trustworthy authoritative expert.

Finally, in ‘How specific is English for Academic Purposes? A look at verbs in business, linguistics and medical research articles’, Natassia Schutz studies the verb inventories of business, linguistics and medical science in the Louvain Corpus of Research Articles. The purpose of Schutz’ study is to address the question of whether training in English for Academic Purposes should be based on general academic vocabulary or discipline-specific vocabulary. She deploys both keyness analysis and relative frequency analysis. She finds that, while medical research articles share few academic verbs with research articles in the two other fields, articles within linguistics and business share patterns of use of academic verbs. This, Schutz points out, suggests that some academic disciplines could be grouped together for teaching purposes.
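
For readers unfamiliar with the two measures, the following is a minimal sketch in Python of how they might be computed for a single verb. It assumes the log-likelihood statistic as the keyness measure, which is widely used in corpus linguistics but is not confirmed here as the one Schutz employs. The function names and all counts are hypothetical and serve only as illustration.

```python
import math


def relative_frequency(count, corpus_size, per=1_000_000):
    """Normalized frequency: occurrences per `per` tokens."""
    return count / corpus_size * per


def log_likelihood(count_a, size_a, count_b, size_b):
    """Log-likelihood keyness of one word in corpus A versus corpus B.

    Compares the observed counts with the counts expected if the word
    were spread evenly across both corpora in proportion to their size.
    """
    total = count_a + count_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    score = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score


# Invented counts: one verb in two subcorpora of a million tokens each.
medical = relative_frequency(120, 1_000_000)
linguistics = relative_frequency(40, 1_000_000)
keyness = log_likelihood(120, 1_000_000, 40, 1_000_000)
print(medical, linguistics, keyness)
```

Relative frequency indicates how common a verb is within each discipline, whereas the keyness score indicates whether the difference between two disciplines is larger than chance would lead one to expect; a score above roughly 3.84 corresponds to p < 0.05 with one degree of freedom.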


Each individual paper is an important contribution to the present endeavor among corpus linguists and variationist linguists to bridge the gap that unnecessarily separates them. For instance, Elsness’ study accomplishes this in two ways: 1) while not perhaps variationist in the traditional Labovian sense, Elsness’ study shows that corpus linguistics allows for large scale studies of language variation, and 2) due to the availability of corpora that span several decades of the 20th and 21st centuries, we are now able to explain, in an empirically feasible way, language change against the backdrop of socio-cultural factors.

Moreover, several papers also contribute importantly to the relevant body of research they address. For instance, Mair suggests that WWI Phonographische Kommission recordings be given more attention by corpus linguists, a valuable hint for linguists who are looking for corpora of recorded spoken English. Given the findings in their study, Koch and Bernaisch’s paper is a particularly valuable contribution to the study of South Asian Englishes, as it not only answers questions by attesting many aspects of usage of the NDTs in South Asian Englishes, but also gives rise to a number of new ones, thus laying a path for further research in that particular area. Yao and Collins’ study of the present perfect is a good example of how the corpus linguist’s commitment to addressing all documented patterns of use -- even the atypical and indeterminate ones -- may result in new knowledge and give pointers to subsequent research. By investigating the atypical uses of the present perfect, Yao and Collins suggest that, in narrative discourse, the present perfect may actually have a vividness function. While Mair points out one potentially important source of data for research into spoken English, Luckmann de Lopez points to another perhaps more readily available one: reality TV, which features (supposedly) unscripted and naturally occurring language. The most important observation in her study, however, is her discovery that clause-final ‘man’ in Tyneside English is not merely a vocative form, but that it has a set of communicative functions with its purely vocative function being among the least frequent. Finally, Schutz’ study of academic verbs in research articles in linguistics, medical science, and business is methodologically important, because her use of both keyness analysis and relative frequency analysis shows that the two analytical methods have great complementary potential.

Each paper is well structured, and all contributors make their respective cases in compelling and convincing fashion, leaving only a very few loose ends. Each paper is accompanied by its own bibliography and its own set of endnotes. As a reader, I prefer footnotes to endnotes, because footnotes do not force me to leaf back and forth in the tome. Still, the endnote format adopted in this volume is much preferable to the format in which all footnotes are collected at the end of the volume. There is no index in this volume, however, which is a shame. Quickly locating those papers or paragraphs that cover topics that fall within one’s research interests is much easier with the help of an index.

The main agenda of ECL is, as the editors state in their introductory chapter, to address the apparent gulf between variationist linguistics and corpus linguistics. The volume successfully bridges this gap, and each of the collected papers is an important contribution to the growing body of research into language variation using corpus-based methods. Providing a snapshot of the present state of inquiry in corpus-based research into language variation and, on more than one occasion, offering an outlook into the potential future of this enterprise, this volume not only presents interesting research; it also shows that there is indeed a bright future for variationist corpus linguistics.


An associate professor of English linguistics at Aalborg University, Kim Ebensgaard Jensen is interested in the intersection of language, cognition, and discourse. He operates within the frameworks of cognitive linguistics, construction grammar, and corpus linguistics. His research interests include grammatical constructions, construal operations, and usage-based descriptions of linguistic phenomena.

