LINGUIST List 26.3129

Thu Jul 02 2015

Review: Socioling; Text/Corpus Ling: Cougnon, Fairon (2014)

Editor for this issue: Sara Couture

Date: 04-Feb-2015
From: Michelle Johnson
Subject: SMS Communication


EDITOR: Louise-Amélie Cougnon
EDITOR: Cédrick Fairon
TITLE: SMS Communication
SUBTITLE: A linguistic approach
SERIES TITLE: Benjamins Current Topics 61
PUBLISHER: John Benjamins
YEAR: 2014

REVIEWER: Michelle Anne Johnson, CUNY Graduate Center

Review's Editor: Helen Aristar-Dry


SMS Communication, edited by Louise-Amélie Cougnon and Cédrick Fairon, presents a collection of scientifically grounded linguistic research on short message service (SMS) communication, highlighting the multiple approaches to understanding this emergent language form, as well as corpora and processing tools for working with it. This edited volume brings together a range of voices and perspectives through its twelve chapters to create a comprehensive picture of the linguistic, social, and technological evolution of SMS communication. As Crystal points out in the introduction, this volume is timely and necessary: SMS communication is changing before our eyes, yet too much research to this point has focused on whether it hurts or helps students’ educational development, and too little on analyzing and documenting this language form. This volume covers these areas, inviting the reader to explore more dimensions of linguistic research on this emergent language form while also providing theoretical background and support for future corpus development.


In the introduction, Cougnon and Fairon contextualize this collection of works with their insights into the global shift in communication patterns occurring in response to digital convergence. They then introduce the volume’s three primary objectives: to present SMS research, to introduce the reader to available corpora and processing tools, and to showcase the variety of approaches to SMS study. Overall, this volume does an outstanding job of achieving all of these goals. The only way that this work could be improved is by grouping the papers by their subdomain, further illustrating how each paper is related to and builds on the others.

In “Seek and Hide: Anonymizing a French SMS corpus using natural language processing techniques”, Pierre Accorsi, Namrata Patel, Cédric Lopez, Rachel Panckhurst, and Mathieu Roche describe Seek&Hide, a much-needed and efficient tool for anonymizing a large corpus. Seek&Hide was developed for the sud4science project. Aimed at corpus researchers, the chapter goes into detail about the intricacies of anonymization and the types of personal information that appear in such data sets. This discussion itself is invaluable, as anonymization tends to be a complicated, highly technical, yet crucial aspect of corpus work. It is impossible and unethical to do this work without the anonymization process; yet it tends to be extremely time-consuming to do by hand and complicated to do computationally without losing large amounts of data. This chapter would provide an excellent introduction to the process for a novice researcher or a corpus research course.

Josie Bernicot, Olga Volckaert-Legrier, Antonine Goumi, and Alain Bert-Erboul, in their chapter “SMS experience and textisms in young adolescents: Presentation of a longitudinally collected corpus”, present an ingeniously gathered corpus of SMS messages. They provided a group of novice texters with phones in order to watch the evolution of SMS use within a single group. The participants’ messages were then sent to the researchers over the course of one year. This created a corpus illustrating how a texting community evolves its conventions and norms and thereby essentially creates a dialect by interacting with an open conversational community. Not only does this corpus invite questions of language evolution, but it also invites research on how textisms appear and get adopted within a speech community. Overall, this corpus presents a very exciting prospect for understanding SMS communication and language evolution on a micro scale.

In “Automatic or controlled writing? The effect of a dual task on SMS writing in novice and expert adolescents”, Céline Combes, Olga Volckaert-Legrier, and Pierre Largy use a dual task design to determine whether there is a significant cognitive cost to writing and reading SMS messages, and whether that cost is lessened with more practice. Not unexpectedly, they found that the cognitive costs lessened with increased expertise. Drawing a parallel to research on the cognitive cost of learning to write, they hypothesize that the lower performance by novice texters is a result of the working memory demands associated with communicating in an unfamiliar medium. They further analyzed the results by looking at the experts’ texting features (i.e., substitutions, reductions, lexical and grammatical spelling, etc.) across the sections of a message (opening, body, and closing), and in the single task and dual task cases. They found that participants made more modifications in the single task, indicating that standard writing may require less working memory than SMS writing for some participants. They further found that closings had the most modifications, more than the body of the message, and more than the opening. This is not an unexpected result: since closings (and openings) are fairly standard and routinized as compared to the body of a message, they lend themselves to modification and play, as the actual content is understood. This is not addressed in this paper, though I suspect it could be investigated through use of tools such as VARD2, presented later in this volume (available from Lancaster University).

Úrsula Kirsten Torrado, in her chapter “Development of SMS language from 2000 to 2010: A comparison of two corpora”, analyzes the evolution of this medium, emphasizing just how much it has changed as it evolved from a relatively quick and efficient way to communicate into a complex way to perform identity. Her analysis of the changes is insightful and very detailed, drawing on the different types of spelling reformations seen in text messages from 2010 versus those from 2000. Here and there she reminds the reader that the spelling is not bad, but only different. This is exactly the type of article I would assign to an undergraduate class to introduce research on SMS communication.

In their chapter, “Texto4Science”, Philippe Langlais and Patrick Drouin describe Texto4Science, the corpus they developed in conjunction with the sms4science project. Not only do they detail exactly how they collected the data and their annotation system, but they also provide a general analysis of what is present in the corpus. The details of how they organized, annotated, and analyzed the data will be invaluable to other researchers considering embarking on similar projects. Similarly, they describe the questions in their survey and their relevance to the project, creating a very useful guide for the technical aspects of corpus development.

In “SMS communication as plurilingual communication: Hybrid language use as a challenge for classical code-switching categories”, Etienne Morel, Claudine Bucher, Simona Pekarek Doehler, and Beat Siebenhaar identify why traditional notions of code switching do not always fit the language used by bilinguals and plurilinguals in SMS messages. By focusing on a corpus derived from a plurilingual community in Switzerland, they show how complicated the idea of code switching becomes when it is both in a written form and performed between multilingual interlocutors. Code switching is notoriously difficult to quantify, and the SMS modality complicates it even further by introducing homographs and other semiotic systems. This chapter provides a critically needed framework to categorize and classify code switching on digital platforms.

Rachel Panckhurst and Claudine Moise describe the data collection process and initial analysis of a large French language SMS corpus in their chapter, “French text messages: From SMS data collection to preliminary analysis”. One of the most useful aspects of this chapter is the careful discussion of how each choice they made affected the resulting data and therefore the corpus they created. They give a candid account of the effect of choices such as the listing of languages and the annotation decisions. Most importantly, they use this corpus to begin doing conversational analysis, which has been notoriously difficult with SMS data because message donations generally focus on one user. The schema they develop for analyzing these messages is sure to become a foundation for future research.

In “A sociolinguistic analysis of transnational SMS practices: Non-elite multilingualism, grassroots literacy and social agency among migrant populations in Barcelona”, Maria Sabaté i Dalmau takes a linguistically positive approach to the language practices of transnational migrant workers and their communities. She deftly shows how one group of multilingual migrant workers is able to claim its autonomy from an oppressive social environment, and to maintain its culture and identity, through language choice on SMS and ICT.

Elisabeth Stark takes the most formal linguistic approach in her chapter, “Negation marking in French text messages”. By looking at a corpus of over 4,000 messages, she discovers the cause of clitic ‘ne’ negation in French text messages. Surprisingly, it is not sociolinguistically driven, but rather motivated by a completely language-internal trigger: subject type. While this discovery itself is very interesting, its implications for the study of SMS communication are even greater. Conventional wisdom in CMC research says that much of the way text messages are written is motivated by identity performance. Stark shows that this is not the entire story, and that linguists should start looking at the larger patterns if we wish to understand what motivates SMS as its own language form.

In “‘i didn’t spel that wrong did i. Oops’: Analysis and normalisation of SMS spelling variation”, Caroline Tagg, Alistair Baron, and Paul Rayson describe the use of an automated spelling normalization tool, VARD2, as applied to the CorTxt corpus of over 11,000 text messages. This tool is designed to take a wide variety of spelling variants and translate them into normalized forms. It is well accepted in CMC research that respellings are a significant component of what makes SMS a unique language form. However, the variations themselves are often ascribed to personal choice. This tool allows researchers to identify the patterns in respellings and look for language-internal rules or tendencies. In line with Stark’s research in the previous chapter, this allows researchers to better understand this language form as a rule-governed transformation of standard English. Just as English departments are evolving into a study of distant reading, through tools such as this, linguists are able to do distant analysis and find the patterns that are fueling the evolution of SMS communication.

Deniz Uygur-Distexhe takes an innovative approach to studying initialisms in “Lol, mdr and ptdr: An inclusive and gradual approach to discourse markers”. These three initialisms (lol, mdr, and ptdr) are by far the most common in French text messages; they all technically translate to laughter or humor, though they are not well understood throughout the literature. By analyzing each of them through a syntactic and semantic lens with a focus on frequency and collocations, Uygur-Distexhe is able to conclude that they function as discourse markers, and that while they are not meant to be interpreted literally, they are not tied to specific speech acts, since they occur in a range of contexts.

Overall, this is a much-needed and timely volume exploring and documenting the language found on digital communication technologies. Taking a scientific approach to documenting and analyzing this emerging language form, the pieces in this collection come together to create a sense of urgency for studying SMS communication as it is evolving. In a veritable call to action, David Crystal’s introduction sums up the essence of this volume perfectly: witnessing SMS communication emerge is a time-sensitive opportunity that researchers must take advantage of. This collection is a first step towards documenting and analyzing this language form as it is changing. The only improvement I can suggest is to group the chapters into research versus corpus development and maintenance. This would allow researchers focused on one or the other to easily identify the relevant chapters, and would give the volume a structure that relates the pieces to each other more concretely.


Michelle Johnson is a PhD candidate in Linguistics at the CUNY Graduate Center, and her research focuses on bilingual texting and literacy practices. She is interested in digital communication as a newly emerging language form and in how the advent of digital communication both expands traditional notions of discourse and provides researchers with the opportunity to witness language evolve.

Page Updated: 02-Jul-2015