LINGUIST List 28.3517

Thu Aug 24 2017

Review: Computational Linguistics; General Linguistics; Text/Corpus Linguistics: Prieto (2017)

Editor for this issue: Clare Harshey

Date: 26-May-2017
From: Adriana Picoral
Subject: Text linguistics for the contrastive study of online customer comments

AUTHOR: Raul Sanchez Prieto
TITLE: Text linguistics for the contrastive study of online customer comments
SUBTITLE: Text-linguistic patterns in German, Dutch, Spanish and French hotel comments and reviews
SERIES TITLE: Studien zur kontrastiven deutsch-iberoromanischen Sprachwissenschaft
PUBLISHER: Narr Francke Attempto Verlag GmbH + Co. KG
YEAR: 2017

REVIEWER: Adriana Picoral, University of Arizona

REVIEWS EDITOR: Helen Aristar-Dry


The explicitly stated target audience of this volume includes both linguists and hotel businesses. The first group is to be served by a practical demonstration of text linguistics tools through a contrastive analysis of the linguistic patterns of hotel customer comments and reviews found online. The second group is to benefit from the findings of the analysis.

The book comprises four chapters. The first chapter is very brief; its main purpose is to give a quick introduction to the field of text linguistics and to describe what the study at hand encompasses. The second chapter defines the textual genre being analyzed, i.e., online comments written by hotel customers, divided into three subcategories according to where they were retrieved from: 1) online travel and hotel booking websites (i.e., Expedia and TripAdvisor); 2) a social networking website (i.e., Facebook hotel pages); and 3) video-sharing (i.e., YouTube) and Wiki discussion pages. For each of these subcategories, the author describes the text actions, situationality, external structure, and wording patterns.

Text actions are actions realized through the text, “according to or related to the conventions of a speech community's members” (Sandig, 1990, p. 91). Examples of text actions the author gleaned from the corpus are: describing the room and the hotel premises; assessing the cleanliness; and commenting on the performance of hotel staff. These actions can assume two types of text functions: informative (i.e., the main purpose is to describe something or provide information) and appellative (i.e., the main purpose is to influence the reader in a given way through the expression of personal preferences).

Somewhat similar to Corpus Linguistics’ situational characteristics, where situation of use is described (Biber & Conrad, 2009), Prieto details the situationality of each text type by specifying three aspects of the situation: 1) the channel and the communicative form (i.e., whether these are private, half-private or universally accessible texts); 2) the superficial text structure (e.g., font design, size and color; background color); and 3) the visual text structure (e.g., presence of an avatar; use of country flags; visual representation of the score awarded to the hotel by the reviewer).

External structure, which according to Prieto differs from text structure, is also addressed for each of the three text types. Titles, information about the reviewer, information about the hotel stay, and score are examples of features that define the external structure of the text.

Finally, wording patterns are grammatical, syntactic, morphological and lexical features. Here, the author offers a broad overview of structures present in each text type, such as pronominal forms used (e.g., the first person and anaphoric references are more common in booking websites), lexical items (e.g., the words “hotel” and “room” are most often repeated), deixis (e.g., absolute spatial deictic structures are common), and morphological structures, among others. For each text type, a general wording pattern comparison among six languages (i.e., Dutch, French, German, Italian, Portuguese and Spanish) is also provided.

Chapter 2 closes with a detailed description of the two corpora analyzed in this volume. A smaller corpus of the six languages already mentioned, containing 1,800 comments in total, is used in a more qualitative approach, to illustrate examples and concepts throughout the book. A second, larger corpus in 4 of the 6 languages (i.e., Dutch, French, German and Spanish), containing a total of 2,000 online comments, is used for the quantitative analysis presented in the third chapter.

By far the longest chapter in the volume, Chapter 3 presents the mainly quantitative analysis of the online comments from a contrastive point of view. The analysis is divided into two main parts: 1) analysis of the communicative macrostructure, focusing on the text actions and text functions discussed in Chapter 2 and offering some expansion on text functions; and 2) analysis of the text-grammatical structures, which were also introduced in Chapter 2 but are expanded and discussed in detail in Chapter 3. As was the case in the previous chapter, each text action, text function, and text-grammatical structure is appropriately illustrated with a comment retrieved from the corpus. Counts and percentages of each feature are presented in summary tables, and comparisons are drawn in prose based on similarities and differences in percentages among the 4 languages in the corpus.

The last chapter in the book, Chapter 4, presents a short conclusion that summarizes the findings presented and discussed in Chapter 3. Although Prieto reiterates that relevant differences among the four languages could not be found for text actions and functions, the author claims the German and Dutch corpora contain more instances of “describing breakfast choices” and “commenting of quietness and privacy”, while the French and Dutch corpora present more cases of “recommending or discouraging a stay at a given hotel”, and the Spanish corpus contains a larger number of occurrences of “indicating parking availability or commenting on parking-related problems.” Regarding text-grammatical structures, the conclusion is that the Spanish corpus presents a higher occurrence of referential impersonal pronouns and a lower incidence of first person pronouns. The author also claims that, when writing comments on Facebook, French users still opt for the formal manner of address, while users in the other three languages choose other deictic devices for person. Finally, conjunctions are said to be more common as a connective element in the Spanish corpus, and subordinating conjunctions are more frequently used by both German and Spanish customers. Limitations are not discussed.


This manuscript is, in general, well organized and clearly written. Regarding target audience, linguists familiar with the field would not have problems following the examples and the analysis, if they also possess some reading skills in all 6 languages that comprise the corpora analyzed. It would be a more difficult read for hotel owners and managers. Many linguistic terms such as anaphoric and cataphoric are not immediately defined. Working definitions for some less well-established terms such as verbal style are also not provided. In addition, the author uses terms that are not frequently deployed in the field. For example, when describing the situationality of the text types, the term perigraphemic is used with no definition provided. A citation is given (i.e., Schütte, 2004, p. 94), but upon a quick search, this is the only other reference (besides the present volume) where this term is used. Providing explicit clear definitions for terms used would not only help readers, but would encourage future researchers to maintain consistency in the field by making use of the same term-definition dyads.

Comments retrieved from the corpora to illustrate concepts and terms are always shown in the original language with no translation into English. While the body of the text is in English, citations from sources in German pepper the manuscript, with no translation or rephrasing provided. While this type of multilingual approach does not affect global understanding, more nuanced perspectives are lost for those who do not read German (or any of the other 5 languages used).

Regarding the quantitative analysis itself, which takes up most of the book, only descriptive statistics are used, summarizing count data as percentages. Prieto justifies this approach by stating this is an “efficient statistical method that can be applied to assess the differences in the multilingual use of text actions” (p. 50). This claim is not completely accurate, since without inferential statistical methods no assertions can be made about whether the observed differences are statistically significant.
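
To make this point concrete, the following minimal Python sketch shows the kind of inferential check that could accompany such percentages: a Pearson chi-square test of homogeneity on a language-by-text-action contingency table. The counts are hypothetical placeholders, not figures from the book; the even split of 500 comments per language is an assumption; and the function name `chi_square_statistic` is purely illustrative.

```python
def chi_square_statistic(table):
    """Return (Pearson chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under the hypothesis that all rows share one distribution.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# HYPOTHETICAL counts: [comments containing a given text action, comments without it].
# These numbers are invented for illustration and do not come from the reviewed book.
counts = {
    "Dutch": [60, 440],
    "French": [45, 455],
    "German": [65, 435],
    "Spanish": [50, 450],
}

statistic, dof = chi_square_statistic(list(counts.values()))

# Critical value of the chi-square distribution for 3 degrees of freedom at alpha = 0.05.
CRITICAL_VALUE_DF3_ALPHA_05 = 7.815
significant = statistic > CRITICAL_VALUE_DF3_ALPHA_05

print(f"chi-square = {statistic:.3f}, df = {dof}, significant at 0.05: {significant}")
```

With these invented counts the percentages range from 9% to 13%, yet the statistic falls below the critical value, so the differences would not be judged significant at the 0.05 level. Differences in percentages alone, in other words, do not settle the question.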

Some questions also remain regarding the methods used. It is not clear whether the data were coded only by the author or whether there were other coders, in which case agreement rates and a discussion of any disagreements among coders would be expected. Prieto also fails to address the representativeness of the corpus compiled, that is, whether the sample collected is an adequate representation of the text types and the language varieties being analyzed (McEnery, Xiao & Tono, 2006). In fact, the author justifies the suitability of the booking website corpus by explaining that “the first ten positive comments from the five highest-rated hotels […] and the first ten negative comments from the five lowest-rated hotels” (p. 37) were collected. There is no discussion of whether findings from this type of sample would also hold for all online reviews on this type of website, or whether simple or stratified random sampling (McEnery, Xiao & Tono, 2006) would be more appropriate.
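
One common way to report agreement between two coders is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below is a minimal illustration only: the labels are hypothetical, the category names are invented stand-ins for text actions, and nothing here reflects how the data in the book were actually coded.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labeled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both coders must label the same non-empty set of items.")
    n = len(labels_a)
    # Proportion of items on which the two coders chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    # Agreement expected by chance, from each coder's marginal label frequencies.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a.keys() | freq_b.keys()) / n ** 2
    if expected == 1:
        return 1.0  # Both coders used a single identical label throughout.
    return (observed - expected) / (1 - expected)


# HYPOTHETICAL annotations of ten comments by two coders (invented for illustration).
coder_a = ["room", "room", "staff", "clean", "staff", "room", "clean", "staff", "room", "clean"]
coder_b = ["room", "staff", "staff", "clean", "staff", "room", "room", "staff", "room", "clean"]

kappa = cohen_kappa(coder_a, coder_b)
print(f"Cohen's kappa = {kappa:.3f}")
```

For these invented labels the coders agree on 8 of 10 items, and kappa comes out at roughly 0.70 once chance agreement is discounted.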

All in all, I would agree that this volume is a good reference on text linguistic research and practice. The text linguistic tools are clearly explained and the analysis is well grounded in theory, but the description of the methods, with respect to both corpus compilation and statistical analysis, is lacking. Nevertheless, some of the findings are indeed compelling and extremely relevant to hotel owners and managers.


Biber, D., & Conrad, S. (2009). Register, genre, and style. Cambridge University Press.

McEnery, T., Xiao, R., & Tono, Y. (2006). Corpus-based language studies: An advanced resource book. Taylor & Francis.

Sandig, B. (1990). Holistic linguistics as a perspective for the nineties. Text-Interdisciplinary Journal for the Study of Discourse, 10(1-2), 91-96.


Adriana Picoral is a PhD student in the Second Language Acquisition and Teaching program at the University of Arizona. Her research interests include corpus linguistics, computational linguistics, and technology-enhanced language teaching. Her work includes research on pedagogical practices built based on corpus analysis and learner analytics, such as open-learner models.

Page Updated: 24-Aug-2017