LINGUIST List 36.1796 Reviews: Contrastive Corpus Linguistics: Taieb (2025)

Tue Jun 10 2025

Editor for this issue: Joel Jenkins <joellinguistlist.org>

Date: 09-Jun-2025
From: Almontassar Bellah Taieb <almontassar.taiebgmail.com>
Subject: Computational Linguistics, Discourse Analysis, General Linguistics: Taieb (2025)

Book announced at https://linguistlist.org/issues/35-2700

Title: Contrastive Corpus Linguistics
Subtitle: Patterns in Lexicogrammar and Discourse
Publication Year: 2024

Publisher: Bloomsbury Publishing
http://www.bloomsbury.com/uk/
Book URL: https://www.bloomsbury.com/contrastive-corpus-linguistics-9781350385931/

Editor(s): Anna Cermakova, Hilde Hasselgård, Markéta Malá, Denisa Šebestová

Reviewer: Almontassar Bellah Taieb

SUMMARY

The technological boom and the proliferation of large multilingual corpora have greatly expanded the breadth and depth of current linguistic inquiry. These developments have, in turn, made it possible to investigate language use across typologically and culturally diverse languages. Contrastive Corpus Linguistics: Patterns in Lexicogrammar and Discourse capitalises on this methodological momentum by bringing together cutting-edge work that spans lexicogrammatical, pragmatic, and discourse-analytical dimensions of cross-linguistic variation. Its ability to situate findings in the broader theoretical discussions of contrastive research sets this volume apart. This work contains eleven chapters organized into two sections: Lexicogrammar in Contrast (I) and Discourse in Contrast (II). The organisation undergirds the assumption that current contrastive analyses need to address both the structural (lexicogrammatical) components of language and the higher-level discourse and pragmatic phenomena that govern language use in context. In combining a comprehensive scope with analytical precision, this exciting collection broadens its appeal to a wider readership and provides promising research avenues for future contrastive research. In this review, I examine the thematic components of each main part before offering an appraisal of the strengths and weaknesses of the volume, thereby offering readers a balanced perspective on its contributions to the field.

In their introduction, the editors—Hilde Hasselgård, Anna Cermakova, Markéta Malá, and Denisa Šebestová—celebrate thirty years of contrastive corpus linguistics and react to some parallel changes coinciding with the development of this line of research. Against this backdrop, they recognise the critical influence of Karin Aijmer and Bengt Altenberg in laying the groundwork for subsequent scholarship. It is therefore not surprising that the editors had Aijmer inaugurate this volume herself. In Chapter 1, Aijmer introduces what she calls the “new contrastive corpus linguistics”—an era characterised by leveraging hard corpus evidence to identify linguistic similarities and differences across languages. She highlights the increasing convergence between contrastive corpus linguistics and pragmatics in the study of various pragmatic phenomena across languages. While this trend may seem a necessary concomitant to recent developments in corpus methods, she points out the relevance of new-fangled types of parallel corpora in contrastive corpus pragmatics, thus expanding the field’s scope. Although still nascent, multimodal corpora are becoming increasingly important in contrastive research on pragmatics and genre analysis. Aijmer collates her discussion with useful literature to reflect on current trends while offering insights that the field can capitalise on for further developments.

Part I. Lexicogrammar in Contrast

Part I of the volume discusses lexicogrammatical phenomena—the building blocks of language that combine lexical items with grammatical structures to create meaning. The chapters here analyse cross-linguistic patterns in several text types and languages, with a particular emphasis on how they are rendered differently across linguistic systems. In Chapter 2, Signe Oksefjell Ebeling presents the results of a contrastive analysis of the cognates see (English) and se (Norwegian). The purpose of her study is twofold: (a) to analyse the specific lexicogrammatical behaviour of these perception verbs in several registers, such as fiction dialogue, fiction narrative, and football match reports; (b) to evaluate, in a further step, the extent to which the behaviour of such a cognate pair is language- and/or register-dependent. The data reveal that the proportional distribution of their semantic and syntactic categories fluctuates within and across corpus materials. This exploration not only reinforces the idea that form-meaning pairings are influenced by context but also demonstrates the methodological rigour of using corpora to tease apart subtle relationships.

In Chapter 3, Hilde Hasselgård undertakes a cross-linguistic comparison of English and Norwegian periphrastic genitives (expressed via a postmodifying prepositional phrase, with English employing the of-genitive and Norwegian using the til-genitive) in fictional and non-fictional texts. She demonstrates that the periphrastic genitive is far more prevalent in English than in Norwegian. Furthermore, the study provides compelling evidence that the choice of periphrastic genitives can be conditioned by certain possessive relations (e.g., body, feature, and kinship) and register-specific tendencies. The author discusses how the animacy of the possessor varies across the languages in question: while both human and animate possessors typically favour til-genitives, periphrastic of-genitives are typically associated with inanimate possessors to express a wide repository of meaning relations. Hasselgård attributes the high occurrence of divergent translations of periphrastic genitives to differences in the animacy of the possessor and the nature of possessive-like relations.

Chapter 4 presents a third corpus-based contrastive study by Thomas Egan. Similar to Ebeling's approach, the author examines four pairs of ditransitive verbs (“send/sende”, “bring/bringe”, “lend/låne”, and “sell/selge”) in the English-Norwegian Parallel Corpus (ENPC). These verb pairs are found to encode acts of physical transfer while permitting two alternating constructions: the ditransitive (double object) and the prepositional dative. Egan reveals a close resemblance in the use of ditransitive and prepositional dative constructions across the ENPC. More striking differences, however, emerge when considering the degree of congruence in the direction of the translations. Of the four pairs of ditransitive verbs, the “sell” pair exhibits near-total correspondence in both directions, whereas the remaining cognate verbs occur with significant variations. To provide further insights, Egan expounds on two key factors: (a) the shared syntactic environment in both languages, which, in this case, increases translation convergence; and (b) the constraints inherent in the semantic field of the receiving language. This research provides valuable insights into cross-linguistic patterns of ditransitive verb constructions while also highlighting key challenges inherent in their translation.

Chapter 5 shifts attention to the formulaic nature of newspaper reports. In their study, Denisa Šebestová and Markéta Malá investigate frequent prepositional patterns in English and Czech, particularly those embedded in recurrent word combinations (also known as n-grams) of varying lengths. The authors compile a lemmatised list of 3–5-grams featuring “in/v” patterns in both languages and organise their findings along two major axes: (a) classifying the prepositional n-grams as phrase-based or clause-based; (b) examining their text-organising functions in tandem with their recurrent patterns. The authors illustrate that newspaper reports have a rich repository of n-grams whose function is not only to mark spatio-temporal meanings but also to convey communication patterns, express event-related relationships, and reflect varying degrees of idiomaticity. A sizeable portion of the chapter, however, explores the textual functions of multi-word prepositions, revealing their semantic preferences and evaluative prosodies per language. Overall, the chapter demonstrates that even seemingly closed-class words, like prepositions, may in fact be well suited for n-gram analysis while acknowledging the challenge of comparing the phraseology of typologically distinct languages using the n-gram method.
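
For readers less familiar with the n-gram method, the extraction step described above can be sketched roughly as follows. This is my own minimal illustration in Python, not the authors' procedure: the function name `ngrams_with`, its parameters, and the toy token list are all hypothetical, and the input is assumed to be already tokenised and lemmatised.

```python
from collections import Counter


def ngrams_with(tokens, target, n_min=3, n_max=5):
    """Count every n-gram (n_min to n_max tokens long) that contains `target`."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            if target in gram:
                counts[gram] += 1
    return counts


# Toy, already-lemmatised input (an assumption for illustration, not chapter data).
tokens = "the fire start in the early hour of the morning in the city centre".split()
counts = ngrams_with(tokens, "in")

# List the five most frequent 3-5-grams containing the preposition.
for gram, freq in counts.most_common(5):
    print(" ".join(gram), freq)
```

A real study would run the same loop over each language's corpus (here, "in" for English and "v" for Czech) and then apply a frequency threshold before any functional classification.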

Although it may seem distinct in its treatment, Chapter 6 builds on a previous discussion in this volume by emphasising the role of corpus linguistics in advancing the scope of phraseology. The authors (Jiajin Xu, Guying Zhou, Xinlu Liu, Yuanyuan Wei, Ruchen Yu, and Suhua Zhang) undertake a large-scale comparison of five typologically distinct languages: Arabic, Chinese, English, Malay, and Swahili. Adopting a bottom-up corpus-driven approach, they identify p-frames (i.e., non-contiguous multi-word sequences with a variable slot) of the most frequently occurring 3- to 5-word sequences in news texts. These sequences are subjected to a systematic comparison in terms of variability, predictability, and discourse function across the five languages in question. The analysis reveals that Arabic and Swahili exhibit an inverse relationship, with statistically differing levels of variability and predictability in p-frames compared to the other languages. Moreover, the functional distribution of p-frames indicates that referential expressions dominate across all five corpora, followed by stance markers and discourse-structuring expressions. The authors, however, note that the use of stance markers appears to be a central dividing line, with an overwhelming preponderance in English. Overall, this chapter raises an interesting discussion on the application of the p-frame approach to characterise genre-specific phraseology.
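
To make the notions of p-frame, variability, and predictability more concrete, here is a rough sketch of how they are often operationalised. It rests on my own assumptions rather than on the chapter's exact procedure: variability is taken as the ratio of distinct slot fillers to frame occurrences, and predictability as the share of the most frequent filler. All function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def p_frames(tokens, n=4):
    """Map each frame (an n-gram with one internal slot '*') to its filler counts."""
    frames = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        gram = tokens[i:i + n]
        for slot in range(1, n - 1):  # internal positions only
            frame = tuple(gram[:slot] + ["*"] + gram[slot + 1:])
            frames[frame][gram[slot]] += 1
    return frames


def variability(fillers):
    """Distinct fillers divided by total occurrences (assumed type-token ratio)."""
    return len(fillers) / sum(fillers.values())


def predictability(fillers):
    """Share of occurrences taken by the most frequent filler (assumed measure)."""
    return max(fillers.values()) / sum(fillers.values())


# Toy input for illustration only, not data from the chapter.
tokens = ("at the end of the day at the start of the day "
          "at the end of the week").split()
frames = p_frames(tokens)
fillers = frames[("at", "the", "*", "of")]
print(dict(fillers), variability(fillers), predictability(fillers))
```

On these definitions, a frame with many different fillers scores high on variability and low on predictability, which is the kind of trade-off the cross-linguistic comparison turns on.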

The concluding chapter in Part I, co-authored by Camino Gutiérrez-Lanza and Rosa Rabadán, presents a cross-linguistic analysis of dubbing as a relatively occluded genre in contrastive research. The authors underscore the inherent challenges of audio-visual customisation, which go beyond creating discourse to also require linguistically and culturally appropriate adaptations of the target language. Notably, the tension between isochrony, lip-syncing, and prefabricated orality led to the rise of what the authors term “dubbing-lect features” in the English-Spanish audiovisual industry. Using data from a novel type of parallel corpus, the study investigates key features of the English modals “can/could” and subject pronoun rendering in Spanish dubbing. The authors extract instances of “can/could” translated as “poder” and aggregate them in accordance with their respective functions. Cases where “can/could” modals are rendered using alternatives other than the redundant “poder” elucidate the range of semiotic resources available in non-translated Spanish (e.g., “saber”). On the other hand, the discussion addresses some problematic transfer of certain dubbing features, which may inadvertently create meanings and patterns different from those in non-translated Spanish. Towards the end, the authors endorse the use of some subject pronouns for their adjusting role, even though at times unwarranted, the result of which is evident in the distinct functions they serve.

Part II. Discourse in Contrast

Following the exploration of lexicogrammar, Part II turns to Discourse in Contrast. It covers a wide range of topics, from expressing politeness (English and Norwegian) and coherence relations (English and French) to speech reporting (English, Czech, and Finnish) and punctuation stylistics (English, Swedish, and German).

In Chapter 8, readers revisit the comparison between English and Norwegian, albeit this time from a social-functional perspective. Stine Hulleberg Johansen and Kristin Rygg analyse the English request marker “please” in comparison with its Norwegian counterparts in the ENPC, identifying three primary functions: as a ritual frame indicating expression, a politeness marker softening requests, or a request marker strengthening the directive force. The frequency analysis shows that their distribution varies depending on the interaction types (i.e., interpersonal versus communal) and the situation types (i.e., standard versus non-standard). The authors go further to reveal that Norwegian, indeed, possesses a rich repository of about twelve request markers (e.g., “er du/de snill”, “vennligst”, and “vær så god”) that show different patterns of frequency across situation types. Notably, “please” can appear in various positions within a sentence and often corresponds to “vær så snill” or is simply omitted in Norwegian translations. Furthermore, specific to Norwegian, the translation equivalent “er du snill” is unique, typically appearing in a unit-final position but, equally importantly, carrying a stronger illocutionary force than “please” in the same position. This finding illustrates that even a single lexical item can take on multiple sociopragmatic functions that may not be directly transferable across linguistically and culturally distinct systems. Overall, the chapter provides a nuanced account of how “please” functions within English and how its Norwegian equivalents are not entirely isomorphic in their functions, enriching our current understanding of cross-linguistic politeness.

Chapter 9 takes a step further into the study of coherence marking, focusing on the analysis of a spoken genre in two languages. This contribution can be said to complement earlier analyses from Chapters 2 and 4 in showing that formal similarity does not necessarily equate to pragmatic or stylistic equivalence (see also Chapter 10). In extending the focus from core lexical and syntactic resemblances to the discourse-pragmatic level, this shift underscores that even cognates or functionally similar forms can diverge significantly across languages based on their distribution and rhetorical roles. With this in mind, Diana Lewis scrutinises the use of connectives in a comparable corpus of French and English journalistic interviews. Connectives, as useful anchoring tools for maintaining coherence in text, are examined based on whether certain relation types are marked more often than others. Central to Lewis’s hypothesis is that the perceived compatibility between ideas lies on a cline, which influences the frequency and distribution of coherence marking. The classification procedure shows three major functional types: causative, contrastive, and additive. Initial findings on the type-token distribution indicate that French is slightly more connective-heavy than English. Importantly, while the continuous relations category is generally assumed to require less explicit marking, it occurred significantly more frequently in the French dataset, indicating a more ‘aesthetic preference for formal variation’ in political interviews. The comparison of the English connective “then” and its French counterparts “alors” and “puis” elucidates a peculiar semantic shift: connectives expressing temporal meanings may undergo a differential process of grammaticalization that renders them resultative- and additive-like. The remainder of the chapter sheds light on their shared temporal and resultative meanings and dissects their divergent ‘weaker’ senses. The author concludes with an analysis of the functional distribution to accentuate how their discourse-organisational uses are shaped according to language-specific preferences and genre conventions. Prospective readers are encouraged to read the full text for an in-depth exploration of these finer points.

Chapter 10 follows suit with Ebeling’s analysis in this volume, yet also offers a unique perspective through its dual focus on characterising a subset of reporting verbs in prose and exploring the effects of translation when rendered into different languages. The authors, Anna Cermakova and Lenka Fárová, begin by examining the lexicogrammatical patterning of the English reporting verb “said” in its past tense form, followed by an analysis of its translated equivalents in Czech and Finnish. The study distinguishes between two types of occurrences, in which “said” is either modified by a specific class of non-finite clauses or remains unmodified. In English, the sheer frequency of “said” assumes a reporting function and exhibits notable idiosyncrasies in patterning. Meanwhile, the results purport to demonstrate that, irrespective of the language, this reporting device often occurs in modification patterns encoding meaning beyond its neutral semantic sense. The second part of their discussion provides a detailed account of the patterns “with/without-PP” in Czech and Finnish to underline key contrasts in the translation options available in the target language. While Czech translators prefer to avoid near-synonyms (“řekl/řekla”) for the English verb “said” and opt for lexical blends to foreground the meaning of the PP, Finnish translators have a greater inclination towards the pragmatically neutral verb “sanoi”. Furthermore, the authors investigate the ways in which the “with/without-PP” patterns are mapped in the target texts. Overall, the chapter offers a nuanced understanding of how language and authors’ stylistic preferences are likely to influence the selection and interpretation of reporting verbs.

The concluding chapter is co-authored by Jenny Ström Herold and Magnus Levin, who initiate an interesting discussion on the use of the dash as a meaning-bearing device in nonfiction across English, German, and Swedish. Aligning with the volume’s overall pursuit in mapping cross-linguistic variation, Ström Herold and Levin seek to determine the function, form and positioning of dash-introduced segments in both original and translated texts. The other aim is to verify whether translators deploy language-specific conventions for rendering dashes into the target language, or whether the influence of the source language persists after translation. The preliminary analysis reveals that the original texts differ in their level of dash use, with German exhibiting the highest frequency of dashes, followed by Swedish and English. They attribute the German predilection for dashes to the grammatical role they serve in marking subordinate clauses. Importantly, the classification of the dash-introduced segments underscores their multifunctionality and highlights some general trends. Regarding their forms and positioning, a more complex picture emerges, with English favouring sentence-medial and sentence-final positions vis-à-vis German and Swedish tendencies for sentence-final positions. The authors isolate three types of strategies in dash-translated texts (retention, omission, and insertion) and expound on the general principle of balancing the preservation of the source text’s typical style with adaptation to target-language punctuation norms. Evidence from this work underlines the ongoing exploration into the ways punctuation markers and certain stylistic features vary across languages.

EVALUATION

Taken together, the topics, languages, and approaches covered in this volume run the gamut. The individual contributors have utilised state-of-the-art corpus techniques to empirically substantiate their analyses. One of the most commendable strengths of Contrastive Corpus Linguistics: Patterns in Lexicogrammar and Discourse is the due emphasis on establishing a tertium comparationis, that is, a common basis for aligning observations across languages. Such a basis enables researchers to systematically compare linguistic patterns or phenomena across languages; without it, there would be no way to ensure that the items or structures in question are truly comparable.

Another strength lies in the book’s comprehensive scope. As shown in the previous sections, the juxtaposition of micro-level grammatical analysis with macro-level analysis of meaning and use adds to establishing the intricate connection between structure and function. This dual focus is particularly important in a field increasingly characterised by embracing an integrated approach. The treatment of several typologically distinct systems deserves close attention. By including a wide array of languages from closely related Indo-European pairs (English and German) to more distant systems (Arabic and Swahili), this collection offers an insightful description of the complexity of cross-linguistic patterns and structural variation. The breadth of sampling not only expands the research foci of current contrastive studies but also prompts researchers to examine how language-specific factors interact and shape our methodological practices.

It cannot be emphasised enough, however, that current work in contrastive corpus linguistics should take a moment to reflect on the field’s rapid growth and the diversity of its approaches. In keeping with this perspective, the volume opens up a space to pay homage to the historical foundations of the field while acknowledging the contributions of key figures, like Karin Aijmer and Bengt Altenberg. These important episodes help steer readers towards the points where the editors deftly navigate emerging trends, including multimodal corpora and genre-based analyses. The picture this volume paints is unique, blending tradition and innovation as well as ensuring that it is both a retrospective account and a blueprint for future research.

Despite its focus and rigour, there are a number of limitations that need to be addressed. One potential weakness lies in the uneven depth of analysis across chapters. While some studies provide a highly detailed account of their subject matter, others, particularly those spanning multiple languages or large corpora, may at times sound more descriptive than analytical. This variability warrants attention as it may pose a challenge for readers seeking a uniform treatment of all the topics covered.

Another limitation relates to the accessibility of some of the methodological discussions. Given the advanced statistical techniques and specialised corpus tools employed in several chapters, readers who are not well-versed in corpus linguistics may find certain sections difficult to follow. At times, numerical data embedded without explicit tabulation were nearly impossible to verify, thereby placing a greater cognitive load on the reader. Some figures were not reader-friendly due to their congested layout and tight line spacing, and because they were reproduced in grayscale, even line-type variations (dashed, dotted, solid) fell short in providing sufficient contrast. This presents an additional hurdle since visual presentations are expected to steer readers towards key trends. There are also several typographical errors and inconsistencies. It would be impractical to enumerate all of them, but a few notable examples are worth highlighting: in the opening chapter, the editors inadvertently mistake the chronology of the contrastive workshops, citing ICAME 43 instead of ICAME 33; in Chapter 8, a slight typographical error appears where “face-treat” is used instead of the intended term “face-threat”.

Perhaps the most significant weakness is that the volume does not sufficiently chart profitable directions for future research. Although the volume marks thirty years of sustained research, it does not convincingly articulate what the upcoming steps might be. Nor does it seem to engage with recent advances in computational techniques, such as machine learning or large-scale automatic annotation. This shortcoming could have been mitigated by including a final note offering a forward-looking perspective and calling for more concerted efforts to address current challenges in the field. On another note, while cross-linguistic comparisons appear to have useful applications in contrastive corpus research, they are not without limitations. The issue of translation bias is mentioned in several chapters, but could have benefited from a more in-depth discussion. A more critical examination of how translation practices might influence the empirical findings would have added an extra layer of nuance to the analyses presented in this volume.

In sum, Contrastive Corpus Linguistics: Patterns in Lexicogrammar and Discourse stands as a tribute to the power of corpus-based methods in unravelling the complexities of language use. The hybrid nature of scholarly discussions across the chapters makes the volume an essential resource for would-be scholars interested in the intersection of linguistics, translation studies, and discourse analysis. By addressing the minutiae of lexicogrammatical constructions and the broader organisation of discourse, the volume offers a balanced and multifaceted perspective, both empirically rigorous and theoretically insightful.

ABOUT THE REVIEWER

Almontassar Bellah Taieb is a PhD student at the Doctoral School of Linguistics, Pázmány Péter Catholic University. He is particularly interested in L2 vocabulary studies, language learning strategies, academic discourse and phraseology. In addition to his research focus, Almontassar is a university lecturer where he teaches courses in English language and academic skills.

Page Updated: 09-Jun-2025
