Review of Language, Corpora and Cognition

Reviewer: Franka Kermer
Book Title: Language, Corpora and Cognition
Book Author: Piotr Pęzik, Jacek Tadeusz Waliński
Publisher: Peter Lang AG
Linguistic Field(s): Discourse Analysis
Text/Corpus Linguistics
Cognitive Science
Issue Number: 28.5267

REVIEWS EDITOR: Helen Aristar-Dry


“Language, Corpora and Cognition”, edited by Piotr Pęzik and Jacek Tadeusz Waliński, appears in Peter Lang’s prolific series “Łódź Studies in Language”, edited by Barbara Lewandowska-Tomaszczyk and Łukasz Bogucki. It is the 51st publication in the series and contains a collection of fifteen papers written by different authors. Most of the contributions were presented at the 9th international conference on “Practical Applications of Language Corpora” (PALC 2014), held at the University of Łódź, Poland, 20–22 November 2014; three further papers (Chapters 13–15) come from doctoral students who were invited to contribute at a later stage. The volume’s main focus is to explore how theoretical predictions about the relationship between language structure and cognition correspond to findings from empirical linguistic data. The contributions evaluate various aspects of linguistic structure, ranging from syntax, semantics, and morphology to phraseology, with the tools and methodologies of corpus linguistics.

Chapter 1: Gradience in cognitive scanning: participle modifiers in Polish and English (Barbara Lewandowska-Tomaszczyk)

In the opening article, Lewandowska-Tomaszczyk attempts to explain the regular patterns of pre- and post-nominal modifying participial constructions in Polish and English on the basis of Langacker’s account of cognitive scanning processes. She proposes that the occurrence of these regular patterns is linked to the aspectual systems of the two languages, and particularly to differences in the nature of the cognitive scanning process, which in Polish is of a more partial, gradient nature, whereas English maintains a more rigid distinction between sequential and summary scanning. Taking Langacker’s theoretical account of dynamicity and construal as the point of departure for analysis and interpretation, Lewandowska-Tomaszczyk discusses Polish and English speakers’ conceptualization of events described in terms of atemporal relationships. The linguistic data extracted from two reference corpora, the British National Corpus and the National Corpus of Polish, reveal that postnominal present participle modifiers involve saliently marked sequential scanning, whereas in the case of prenominal past participle modifiers the scanning process possesses a lower gradient character.

Chapter 2: Experimental applications of dependency-based phraseology extraction (Piotr Pęzik)

This chapter reports on a study that tested the benefit of using a dependency-based method of extracting phraseological units from large corpus data. It is well attested that prefabricated, conventionalised language chunks play a central role in language reception and production. To explore the nature and types of linguistic prefabrication, new techniques are needed to extract phraseological units from large, naturally occurring data sets. In his article, Pęzik aims to ascertain the benefit of “dependency-based phraseology extraction”; he assesses the method’s usefulness by extracting and aggregating phraseological units, analysing data from large reference corpora, and building Automatic Combinational Dictionaries from large corpus data. The main goal of his study is to show that in order to detect lexico-grammatical variability in prefabricated linguistic data, one first needs to detect recurrent phraseological units in large corpora. The novelty of this approach lies in the extraction of recurrent subtrees consisting of more than two lexical items of a sentence dependency tree.

Chapter 3: Computational distributional semantics and free associations: a comparison of two word-similarity models in a study of synonyms and lexical variants (Marcin Tatjewski, Mirosław Bańko, Adrianna Kucińska and Joanna Rączaszek-Leonardi)

Tatjewski, Bańko, Kucińska and Rączaszek-Leonardi’s research focuses on the comparison of two methods for measuring and evaluating word meaning similarity. One method, which is at the heart of distributional semantics, is the Correlated Occurrence Analogue to Lexical Semantics (COALS); the other, commonly used in psychology, relies on free association data provided by informants. The main goal of this research was to evaluate the semantic proximity of word pairs, specifically of lexical loans and native synonyms, in two languages, Polish and Czech. The results confirmed the authors’ hypothesis: both methods yielded comparable results, and their outcomes were correlated at a statistically significant level. This outcome implies that computational semantic analyses performed on large corpus data and experimental techniques are equally suited for exploring the organisation of lexical semantic representations at the cognitive level.

Chapter 4: Grammars or corpora? Who should we trust? Empirical analysis of morphological doubletism in Croatian (Dario Lečić)

In this chapter, Lečić builds on previous research that investigates the status quo and present-day usage of morphological doublets in Croatian. Slavonic languages, such as Croatian, abound in examples of morphological doubletism. Using data from three different sources (corpora, native speakers’ questionnaires and grammar reference works), Lečić explores whether two morphological variants in word stems and word endings exhibit the same degree of conventionality, i.e. whether two competing forms have the same status in the speaker’s mental grammar. Doubletism of stems in Croatian appears in possessive pronouns and verbs, while doubletism of endings encompasses singular forms of masculine nouns, the genitive plural of feminine nouns, and adjectives. The comparison between corpus data and native speakers’ questionnaire material showed a positive correlation between a form’s frequency in the corpus and its acceptability rating by native speakers. The analysis of grammar reference works shows that their explanations cannot fully account for the richness of competing variants in word stems and endings.

Chapter 5: Figurative dimensions of health: a corpus-illustrated study (Adamina Korwin-Szymanowska and Jacek Tadeusz Waliński)

Korwin-Szymanowska and Waliński report on a study which aims at mapping conceptual metaphors of health, taking into account that our conceptions of health tend to be discussed figuratively in terms of our embodied physical experience. Based on works on metaphorical representation in thought and language, this study explores the figurative dimensions of health through the lens of cognitive linguistics and corpus linguistics. Their research employs two different reference corpora for English, the British National Corpus and the Corpus of Contemporary American English, which were searched for all lexical items that specify the state of health. They found that the dimensions along the UP-DOWN and STRONG-WEAK scales appear to be the prevailing conceptual domains in the conceptualisation of health, which suggests that people’s conceptual representations of health arise from embodied experience.

Chapter 6: “Justice with an attitude?” – towards a corpus-based description of evaluative phraseology in judicial discourse (Stanisław Goźdź-Roszkowski)

Goźdź-Roszkowski’s paper investigates the applicability of a corpus-based phraseology perspective to identify and examine evaluative meanings in judicial discourse. Specifically, this study brings together the descriptive framework of local grammar with the methodological workbench of corpus linguistics in order to explore the role of grammatical patterns in expressions of attitudinal meanings. According to Goźdź-Roszkowski, studying opinions and attitudes in expressions from the perspective of local grammar is particularly fruitful for patterning and identifying words which share evaluative meanings. The material employed for this study is drawn from the highly domain-specific genre of United States Supreme Court opinions. The analysis revealed that judges tend to employ certain linguistic cues to signal their evaluation of arguments put forward by other legal interactants. Furthermore, two grammatical patterns, the v-link + ADJ + that pattern (example: The court is correct that many mental diseases…) and the v-link + ADJ + to-infinitive pattern (example: It is quite wrong to invite state court judges to discount…), were found to be useful diagnostics for identifying this prototypical evaluative function.

Chapter 7: Using time to express remoteness in space: A corpus-based study of distance representations for motion medium in the National Corpus of Polish (Jacek Tadeusz Waliński)

In Chapter Seven of this volume, Waliński examines the conceptions of space-time relations in the semantic context of motion events from the perspective of data obtained from the National Corpus of Polish. The author’s assumption that the perception of space is inextricably connected to the perception of time is tested by verifying how frequently spatial distance is expressed in temporal terms. The domain of motion events, and particularly the motion medium as a semantic attribute of motion, is well suited for exploring the interplay between temporal and spatial representations. Results indicate that motion-framed distance is expressed in both spatial and temporal terms by Polish speakers, with spatial representations being used more frequently. This outcome lends further support to previous work on the mutual relationship between mental conceptions of space and time.

Chapter 8: Avenues for Research on Informal Spoken Czech Based on Available Corpora (Petra Klimešová, Zuzana Komrsková, Marie Kopřivová and David Lukeš)

In their study, Klimešová, Komrsková, Kopřivová and Lukeš attempt to show how spoken corpora can be utilised for addressing a broad range of research topics and revising prior findings based primarily on written discourse. To this end, Klimešová et al. explore linguistic cues typical of spontaneous spoken language on the one hand, and compare these distinct features with features used in other discourse types, namely formal spoken and written discourse, on the other. The data used to demonstrate these features are from corpora on casual spoken communication in Czech. The results confirm the authors’ hypothesis in that certain lexical fillers, phonetic variants and grammatical phenomena are inherent to casual spoken language. Furthermore, their results confirm the relevance of employing corpora of informal spoken language as a source of data as they facilitate the systematic study of a wide range of discursive, sociolinguistic and linguistic features.

Chapter 9: Introducing a corpus of non-native Czech with automatic annotation (Alexandr Rosen)

Rosen discusses the need for, and use of, automated annotation tools originally developed for native Czech for annotating a corpus of texts written by non-native learners of Czech. The growing number of learner corpora has led to a shift from annotating corpora manually to developing automated annotation methods and tools targeting non-native language. Common annotation tools for native language include taggers, lemmatizers, and spelling and grammar checkers. The author introduces a corpus consisting of a collection of transcribed essays written by students of Czech between 2009 and 2011. The analysis shows, for example, that the results achieved by the spelling and grammar checker Korektor were sufficiently good to justify the use of this tool in the annotation of non-native corpus data. The author concludes that automated annotation tools and manual annotation of non-standard language can complement each other in achieving the best results.

Chapter 10: Corpus-based Analysis of Czech Units Expressing Mental States and Their Polish Equivalents. Identification of Meaning and Establishing Polish Equivalents Referring to Different Theories (Elżbieta Kaczmarska)

Kaczmarska’s study focuses on polysemous mental state verbs in Czech and the extent to which different linguistic theories can predict equivalents of these verbs in Polish. The main objective of this study is to build an effective algorithm for the selection of equivalents by applying methods of various linguistic approaches. The pairs of equivalents are drawn from the parallel corpus InterCorp. The research showed that case grammar and cognitive grammar do not offer effective tools to predict pairs of equivalents in the proposed algorithm. However, the frameworks of pattern grammar and valence analysis provided powerful and promising methods for analysing word combinations and equivalents. The author’s overarching objective is to show that the proposed algorithm can be utilised in machine translation tools in the future.

Chapter 11: Problem solving in English and Polish: A cognitive corpus-based study of selected metaphorical conceptualizations (Marcin Trojszczak)

In Chapter 11 of this edited volume, Trojszczak examines selected aspects of metaphorical conceptualisations of problem solving shared between English and Polish speakers. Specifically, this study sets out to address how aspects of speakers’ linguistic expressions of problem solving give insight into the ways in which problem solving as a mental activity is metaphorically conceptualised in different languages. The material employed for this study is obtained from the British National Corpus and the National Corpus of Polish. The author approaches expressions related to the activity of problem solving from the perspective of cognitive corpus-based semantics, which combines the theoretical perspective of conceptual metaphor theory and the methodological tools related to corpus linguistics. The analysis suggests that speakers of English and Polish employ common underlying metaphorical expressions when describing the activity of problem solving. Among the shared conceptual metaphors of problem solving are ABSTRACT OBJECTS ARE PHYSICAL OBJECTS and MENTAL ACTIVITY IS A PHYSICAL ACTIVITY. As the author rightly observes, the results obtained in this study pave the way for researching parallelism in metaphorical representations across other languages.

Chapter 12: Corpus Linguistics for Critical Discourse Analysis. What can we do better? (Victoria Kamasa)

In her paper, Kamasa critically reviews thirty research papers published between 2002 and 2013 in which techniques related to corpus linguistics were used for some form of critical discourse analysis. The main goal of this review was to analyse the methods employed in the studies and to pinpoint some of the shortcomings associated with corpus-assisted critical discourse analysis, so as to help researchers avoid methodological pitfalls. Corpus linguistics was expected to address key points of criticism of critical discourse analysis, such as the decontextualisation of analysed texts or the pivotal role of the researcher’s intuition. The review shows that some of these problems can be tackled by paying more attention to research design and statistical analysis. For example, larger sets of texts and rigorous statistical tools may contribute to addressing the decontextualisation problem, while the use of frequency and statistical scores can prevent bias in corpus-supported critical discourse analysis.

Chapter 13: Towards quantitative and qualitative characterisation of various types of dialogue: interviews vs. Panel Discussions (Dorota Pierścińska)

Pierścińska’s doctoral research aims at exploring and characterising quantitative parameters of two types of dialogue, interviews and panel discussions, to specify the underlying perception of these two genres. Pierścińska’s research employs two different reference corpora, which were examined for frequent lexemes, keywords and 4-word clusters. The patterns selected for the analysis were expected to serve particular functions in the interviews and panel discussions, which in turn would serve as the basis of a more general characterisation of the two genres. The results are in line with the prediction put forward by the author demonstrating that interviews and panel discussions are unlike each other in that interviews are more verbal and spontaneous, while panel discussions are more grammaticalised, structured and well-organised.

Chapter 14: Standardisation in safety data sheets? A corpus-assisted study into the problems of translating safety documents (Aleksandra Beata Makowska)

Makowska’s paper analyses material safety data sheets to pinpoint possible shortcomings and challenges related to the translation process and the terminology used in these materials. The author collected ninety-three safety sheets containing 720,000 words published in three languages, English, Polish and German. The overall purpose of this doctoral research is to provide a basis for a higher level of standardisation in the process of translating safety data instructions, as well as to put forward general descriptions of the actual translation process that would aid the translator. The author interprets the results of the study as suggesting that experienced and qualified translators should be involved in the translation process to avoid translation problems with terminology, general language and the meaning of the communicated message.

Chapter 15: Lexical bundles in English medical texts (Monika Betyna)

In the last chapter, Betyna’s doctoral research attempts to describe the discourse function and use of frequently used word combinations, i.e., lexical bundles, in a topic-oriented corpus of medical texts. The main objective of the investigation is to create an inventory of the most frequent lexical bundles in medical texts and uncover their structural and functional properties. The research material comprises a corpus of one hundred online articles concerning a highly specific topic in medicine. The analysis revealed that words such as ‘ulcers’ and ‘diabetic’ occurred frequently in the texts. Furthermore, the lexical bundle ‘oxygen therapy’ made up 18% of the most frequent lexical bundles and phraseological units in the corpus. The author rightly states that the usage of such vocabulary indicates that these texts are written for a very specific group of experts who are familiar with the use of a highly specific register.


“Language, Corpora and Cognition” is a straightforward account of current practices and approaches to studying the link between linguistic phenomena and conceptual representation with the help of corpus linguistics methodology. The edited volume brings together a variety of papers that apply diverse corpus linguistic methodologies to various aspects of cognitive science and cognitive linguistics. As the introduction states, it is a valuable contribution and step forward in the understanding of how the use of empirical data can inform theoretical predictions about the relationship between language and cognition. The original studies, the varied topics and the rich quantitative data discussed will capture the interest of university professionals and students alike.

The volume is well-organised and well-edited, and almost every chapter provides goals, theoretical context, methods, and results in a straightforward manner. The articles deal with data from corpora on spoken and written English, Polish, Czech, Croatian and German. The articles are well-written and concise, with the focus on the most central aspects of the respective topic.

The book has, however, some weaknesses. The first minor criticism concerns the introductory chapter written by the editors, which could have outlined and justified the limits of the book, referred to other contributions to the field and provided more context. Another minor drawback concerns the background sections of some articles, which present effective if rather densely written overviews of the respective topics and theoretical frameworks (e.g., Chapter One). While these accounts show the expertise of the authors, some readers, especially novice researchers, may find these passages slightly heavy. Lastly, it should be noted that nine out of fifteen contributions deal with data from corpora of Slavic languages. As it is the broad spectrum of linguistic phenomena discussed in this volume which makes this book a valuable contribution, contributions discussing corpus data from other languages would have been appreciated. These limitations, however, do not diminish the relevance, validity and value of the book.

Franka Kermer received her Ph.D. in English Language and Culture from the University of Eastern Finland in 2015 with a thesis entitled “A Cognitive Grammar Approach to Tense and Aspect Teaching in the L2 Context”. Her research interests are primarily concerned with cognitive linguistics, particularly cognitive grammar, and second language acquisition. Her current post-doctoral research focuses on cross-linguistic differences and influence from the perspective of cognitive grammar and cognitive sociolinguistics.

Format: Hardback
ISBN-13: 9783631663363
Pages: 296
Prices: U.S. $ 67.95
U.K. £ 46.00