Review of Automated Grammatical Error Detection for Language Learners

Reviewer: Cornelia I. Tschichold
Book Title: Automated Grammatical Error Detection for Language Learners
Book Author: Claudia Leacock, Martin Chodorow, Michael Gamon, Joel Tetreault
Publisher: Morgan & Claypool Publishers
Linguistic Field(s): Computational Linguistics
Subject Language(s): English
Issue Number: 26.913

Review's Editor: Helen Aristar-Dry


This slim book is an updated version of the 2010 edition of the same title. The authors justify this update with the substantial expansion of the field and the fact that error detection technology has become mainstream. Language learners, especially learners of English, constitute a huge potential market for tools that promise to help them improve the quality of their writing by detecting grammatical errors. Such tools are now becoming more widespread and are increasingly being used by educational institutions.

In the introduction, the authors define ‘grammatical errors’ as including not only grammar errors, but also usage and punctuation errors. These are the kinds of errors that tools such as the grammar checker in MS Word can deal with. The second chapter gives a short overview of the field of grammar checking tools, starting with the early commercial tools (‘CorrecText’, ‘Grammatik’, ‘Epistle’, ‘Critique’) of the 1980s with their varying amounts of string-matching and parsing. The 1990s then saw the rise of statistical methods, which were combined with the earlier error-tolerant grammars or error rules. Today most systems use a hybrid approach combining a statistical element that relies on a large training corpus and some rules written to detect specific error types, e.g. errors that are known to be typical for certain learner groups.

Chapter 3 gives a short overview of the types of errors that grammar checkers for language learners can be expected to deal with today. While learners will have problems with different areas of the English language, partly depending on their first language, a number of error types are quite frequent for almost all learners of English, and much work has concentrated on these areas. The article system of the English language is one such area; prepositions and collocations constitute two further sources of errors commonly found in texts written by learners of English.

With Chapter 4, the book becomes slightly more technical. For the evaluation and comparison of different error detection systems, researchers use the measures ‘precision’ and ‘recall’ to quantify the results that grammar checkers achieve for a given text. Both of these figures are calculated on the basis of the numbers of so-called true positives (errors correctly detected), false positives (errors detected where there are none), and false negatives (errors not detected). Precision and recall can also be combined into an overall F-score. The measures are not perfect, as they tend to show an improvement of the system simply when there are more errors in the text. A further issue is the precise definition of “error”. What counts as an error, and how the error is best corrected, is not always easy to determine. In order to address these issues, so-called Shared Tasks have been developed, mainly with a view to making a better comparison between systems possible. In these tasks, true and false positives and false negatives are defined, and every occurrence in the corpus is identified before the evaluation starts. In contrast to these Shared Tasks, learner corpora contain naturally occurring data, without ready error annotation. Often, more than one correction is possible, and sometimes it is not possible to say with certainty what the learner was trying to express.
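
As a brief illustration of the measures described above, the following Python sketch computes precision, recall and an F-score from counts of true positives, false positives and false negatives. The function name and the `beta` weighting parameter are assumptions made for this example; the review does not say which F-score variant the book uses.

```python
def precision_recall_f(tp: int, fp: int, fn: int, beta: float = 1.0):
    """Return (precision, recall, F_beta) from raw error-detection counts.

    tp: errors correctly detected
    fp: errors flagged where there are none
    fn: real errors that were not detected
    beta: hypothetical weighting; beta < 1 favours precision
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f_score = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_score


# Invented counts: 30 errors found, 10 false alarms, 20 errors missed.
p, r, f1 = precision_recall_f(tp=30, fp=10, fn=20)
```

With these invented counts, precision is 0.75 and recall is 0.6. Choosing `beta` below 1 would weight the combined score towards precision.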

Chapters 5 and 6 focus on data-driven approaches to article and preposition errors, and on errors relating to collocations, respectively. Data-driven systems look at the context around each token to determine the typical context for that particular token, but the number of words inspected on each side can vary. If the corpus is tagged or even parsed, the syntactic context can further enrich the information gained from the surrounding words and tags. If deemed necessary, semantic information can be gained from electronic dictionaries or sources such as WordNet. This contextual information is then used to train the system. The training data can either be derived from a corpus of texts that show correct usage only (typical for the earlier systems), from a corpus of correct usage with artificially introduced errors (e.g. random substitutions, deletions, etc.), or from a corpus containing both correct usage and real errors. Once the training is finished, the system can check any new text against its model. Features that show an unusual pattern, i.e. one that deviates too much from the model of correct usage, will be flagged up as potential errors. It is then up to the user to decide whether the feature really does constitute an error or not. To help the user make this decision, a number of typical usage examples can be displayed.
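
The general idea of such a data-driven check can be sketched in a few lines of Python. This is a toy illustration under strong assumptions, not the method of any system discussed in the book: it trains only on correct usage, uses a window of one word on each side, handles only the articles "a", "an" and "the", and applies an arbitrary threshold. All names are hypothetical.

```python
from collections import Counter, defaultdict

ARTICLES = {"a", "an", "the"}


def context(tokens, i, window):
    """Words to the left and right of position i, as a hashable key."""
    left = tuple(tokens[max(0, i - window):i])
    right = tuple(tokens[i + 1:i + 1 + window])
    return left, right


def train(sentences, window=1):
    """Count which article occurs in which context in correct text."""
    model = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token in ARTICLES:
                model[context(tokens, i, window)][token] += 1
    return model


def flag(sentence, model, window=1, threshold=0.2):
    """Return (position, article used, most frequent article) for
    articles that are rare in their context according to the model."""
    tokens = sentence.lower().split()
    flags = []
    for i, token in enumerate(tokens):
        if token not in ARTICLES:
            continue
        counts = model.get(context(tokens, i, window))
        if not counts:
            continue  # unseen context: no evidence either way
        if counts[token] / sum(counts.values()) < threshold:
            flags.append((i, token, counts.most_common(1)[0][0]))
    return flags


# Tiny invented training corpus of correct usage.
corpus = ["she plays the piano", "he plays the piano well"]
model = train(corpus)
suspects = flag("he plays a piano", model)
```

Here the article in "plays a piano" is flagged because the model has only seen "the" between "plays" and "piano". A realistic system would use far larger corpora and richer features such as part-of-speech tags or parse information, and would leave the final decision to the user.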

Like article and preposition errors, some collocation errors can be found by comparing the learner’s text with native-speaker texts. Among the best systems for the detection of collocation errors is one for Chinese learners that will find typical mis-collocations such as “eat medicine” and suggest more appropriate word combinations. Transfer errors caused by the learner’s first language are very common in this area, so a good system needs to take this into consideration.

Chapter 7 considers the final group of errors, those concerning spelling, punctuation and verb forms. Statistical approaches may be less well suited to errors where the local context is very relevant. A verb form such as *writed can be corrected without recourse to heuristics; the word form can just be looked up in a list of over-regularized verb forms. After a list of potential errors has been created (manually), a set of rules can be written for their detection, and finally some filters added to prevent over-flagging. To illustrate this approach, the grammar checker ‘Criterion’ has a (bigram-derived) rule that the sequence of the article a followed by a plural noun is usually wrong, but a filter then applies when this occurs in the sequence “a systems analyst”. The ‘ESL Assistant’ has rules for the use of modals and similar verb-related errors, prepositions, and other word combinations that often lead to errors. Other systems for various languages also use such error rules to identify potential problems.
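
The rule-plus-filter approach described above might look roughly like the following Python sketch. It is an illustration only and does not reproduce how ‘Criterion’ or the ‘ESL Assistant’ are implemented: the word lists are invented, a small set of plural nouns stands in for a part-of-speech tagger, and tokenization is by whitespace.

```python
# Hypothetical lookup list of over-regularized verb forms.
OVERREGULARIZED = {"writed": "wrote", "goed": "went", "eated": "ate"}

# Toy stand-in for a tagger that would identify plural nouns.
PLURAL_NOUNS = {"systems", "books", "students"}

# Filter: word sequences in which the "a + plural noun" rule must not fire.
FILTERS = {("a", "systems", "analyst")}


def check(sentence):
    """Return a list of (position, flagged text, suggestion or None)."""
    tokens = sentence.lower().split()
    issues = []
    for i, token in enumerate(tokens):
        # Lookup rule: known over-regularized verb form.
        if token in OVERREGULARIZED:
            issues.append((i, token, OVERREGULARIZED[token]))
        # Error rule: the article "a" followed by a plural noun.
        if token == "a" and i + 1 < len(tokens) and tokens[i + 1] in PLURAL_NOUNS:
            if tuple(tokens[i:i + 3]) in FILTERS:
                continue  # filter prevents over-flagging
            issues.append((i, "a " + tokens[i + 1], None))
    return issues


flagged = check("he bought a books")
not_flagged = check("she is a systems analyst")
```

The first sentence triggers the article rule, while the second is exempted by the filter even though "a systems" matches the same pattern.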

One of the most problematic aspects in the detection of verb form errors is the fact that a wrong word class (part-of-speech) is often attached to a verb during the tagging process precisely because there is an error in the text, i.e. one resulting in another possible word. Some of the Shared Tasks now include this type of error, so more attention may be paid to these problems in the future.

Other punctuation and spelling errors are typically treated using the same methods as in spell checkers aimed at native speakers, partly because we do not know whether the errors non-native writers make are significantly different in quality from those that native speaker writers make.

In Chapter 8, the authors take a closer look at the issues surrounding the annotation of learner corpora for errors. To create a gold standard, a large manually annotated corpus is needed in order to make objective evaluation across different systems a realistic possibility. The ideal annotation would be multi-layered, allowing for more than one correction, and arrived at by agreement among several annotators. Annotator agreement can be low, however, especially if a correction needs to be supplied as well. In the context of automatic error detection systems, relatively simple annotation systems are typically used, e.g. annotation schemes that are limited to the types of errors the system can actually correct. Crowdsourcing is now opening up the possibility of making more comprehensive error annotation more affordable both in terms of time and money. The exploitation of online revision logs of wiki texts or on language learning websites is another possibility mentioned that is worth exploring in this context.

In the last chapter the authors take a look at some interesting developments in the field since the first edition of this book came out. The first of these topics is the Shared Tasks that have become available in the field of automatic grammatical error correction. The first Shared Task used in the 2011 competition was restricted to a set of 13 errors concerning articles and preposition use in a relatively small corpus. Since then the scope of errors has widened, and more and more teams are taking part in the competitions. Major issues in these Shared Task competitions are wrong annotations and errors that are detected but were not originally flagged, which makes the whole procedure quite labour-intensive. How to treat multiple errors is another unresolved issue for Shared Tasks.

Progress in machine translation systems is the authors’ second reason for devoting a chapter to new developments in the field. Some language pairs are problematic for article generation in the target language text, and much work has gone into improving the post-editing process for such pairs, i.e. determining definiteness from the source language, so that the appropriate article can be generated in the target language. Two methods developed for machine translation systems show potential for re-use in error correction systems. The noisy channel model treats error correction as a translation task from English with errors into English without errors. This may be particularly useful if there are multiple errors in a single sentence. As with any data-driven method, a large corpus is required to train the system. The round trip model involves translating the text into the writer’s native language first, then translating this back into the target language. One experiment used this to correct preposition errors by French speakers in English. Using multiple pivot languages can further improve the result.

The third potentially interesting development is the use of crowdsourcing for error correction, divided up into identification, correction and verification. The response time and the quality can be very good, so this is an area with much promise for automatic error detection.
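
The noisy channel model mentioned above is usually written in the following standard form. The notation is an assumption made for this illustration and is not taken from the book: e stands for the learner's sentence as written, c for a candidate corrected sentence, and ĉ for the correction the system chooses.

```latex
\hat{c} \;=\; \arg\max_{c} P(c \mid e) \;=\; \arg\max_{c} P(e \mid c)\, P(c)
```

In this formulation P(c) is a language model of correct English and P(e | c) models how correct sentences are turned into learner sentences. The second equality follows from Bayes' rule, since P(e) is the same for every candidate c.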

The chapter ends on the assumption that feedback is beneficial for writing quality, but this is not universally accepted. The majority of linguists probably agree that good feedback is beneficial. With automated systems, the feedback quality is lower than it is for humans, but there is still evidence of improved writing quality. The important point seems to be that writers are happy to accept feedback from automated systems as long as it is largely reliable. Systems should therefore be optimized for precision and avoid false positives, even if this means some errors remain undetected.

The conclusion the authors draw is that automatic error correction has been most successful with data-driven approaches. Many error types have not received much attention yet, and as the systems become more widely used, the complexity of the problem becomes more and more obvious. A number of avenues remain to be fully explored, including taking into consideration the writer’s first language, techniques used in machine translation, and findings from the field of second language acquisition research.


The intended readership of this book includes researchers in the field and more generally people with an interest in the area of computer-assisted language learning and NLP (Natural Language Processing). Apart from this group of people, the book could also be recommended for anyone with an interest in how these automated error detection tools work. Given that more and more exam texts written in English, whether by first or second language writers, will be evaluated by tools such as those described in this book, it would be highly advisable for many educators to familiarize themselves with the basics of how these tools work. Despite the occasional slightly more technical passage, this book is very readable even for readers without any background in computational linguistics.

In hindsight, the order of topics as they appear in the chapters seems almost in reverse and probably makes more sense from the point of view of the authors than for people outside the field. The chapter on the history of the field is logically positioned early on, but issues such as the definition of what constitutes an error, given in the introduction, become much easier to understand once the reader has learnt something about the difficulties the tools described in this book actually face and the techniques used by them. From this definition, the authors work their way ‘downwards’ all the way to the list of learner corpora given in the appendix, thus dissecting the problem layer by layer.

Four years ago, when the first edition of this book appeared, the lack of Shared Tasks made a comparison between different (commercial) systems practically impossible. In this edition, the authors clearly explain the need for and the value of these Shared Tasks. To the reader, the advances described here also reveal that we are still at the beginning of the development of grammar checkers for language learners and non-native writers. The brief look into the three areas that may provide some key impetus to the field is one of the most interesting aspects of the book and can offer a glimpse of what the future may hold for automatic essay evaluation and error detection.

To conclude, I can highly recommend this book to any reader who is interested in a short, readable overview of the state of the art in automated grammatical error detection, written by some of the most influential researchers in this field.

Cornelia Tschichold wrote her MA thesis on grammar checking for non-native speakers, before going on to work on English computational phraseology for her PhD. She now works at Swansea University, where she teaches courses on English linguistics. Her research interests include the acquisition of English vocabulary and phraseology, and computer-assisted language learning.

Format: Paperback
ISBN-13: 9781627050135
Prices: U.S. $ 45
Format: Electronic
ISBN-13: 9781627050142
Prices: U.S. $ 45