Dissertation Information

Title: Automatic Detection of Grammar Errors in Primary School Children's Texts. A Finite State Approach.
Author: Sylvana Sofkova Hashemi
Institution: Göteborg University, Department of Computer Science
Completed in: 2003
Linguistic Subfield(s): Computational Linguistics; Syntax; Language Acquisition
Subject Language(s): Swedish
Director(s): Robin Cooper

Abstract: This thesis concerns the analysis of grammar errors in Swedish texts written by primary school children and the development of a finite state system for finding such errors. Grammar errors are more frequent for this group of writers than for adults and the distribution of the error types is different in children's texts. In addition, other writing errors above word-level are discussed here, including punctuation and spelling errors resulting in existing words.

The method used in the implemented tool FiniteCheck involves subtraction of finite state automata that represent grammars with varying degrees of detail, creating a machine that classifies phrases in a text that contain certain kinds of errors. The current version of the system handles errors concerning agreement in noun phrases and the selection of finite and non-finite verb forms. At the lexical level, we attach all lexical tags to words and do not use a tagger, since a tagger could eliminate information in incorrect text that might be needed later to find the error. At higher levels, structural ambiguity is treated by parsing order, grammar extension and some other heuristics.

The simple finite state technique of subtraction has the advantage that the grammars one needs to write to find errors are always positive: they describe the valid rules of Swedish rather than the structure of errors. The rule sets remain quite small and practically no prediction of errors is necessary.
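As a rough illustration of the subtraction idea, the following Python sketch builds two toy automata over part-of-speech tags and subtracts one from the other. Everything in it is an assumption made for illustration: the tag names (DET_UTR, N_NEU, and so on), the two miniature grammars, and the `DFA` and `subtract` helpers are hypothetical and are not taken from FiniteCheck. The broad grammar accepts any determiner, adjective and noun sequence; the narrow grammar accepts only sequences that agree in gender; the difference accepts exactly the noun phrases with an agreement error.

```python
class DFA:
    """A deterministic automaton with a partial transition table.

    A missing (state, symbol) entry means the input is rejected.
    """

    def __init__(self, start, accepting, transitions):
        self.start = start
        self.accepting = set(accepting)
        self.transitions = dict(transitions)  # (state, symbol) -> state

    def accepts(self, symbols):
        state = self.start
        for sym in symbols:
            state = self.transitions.get((state, sym))
            if state is None:
                return False
        return state in self.accepting


def subtract(a, b, alphabet):
    """Return a DFA for L(a) minus L(b), built by product construction."""
    start = (a.start, b.start)
    transitions, accepting = {}, set()
    todo, seen = [start], {start}
    while todo:
        sa, sb = todo.pop()
        # Accept when a accepts and b does not (sb is None once b is dead).
        if sa in a.accepting and sb not in b.accepting:
            accepting.add((sa, sb))
        for sym in alphabet:
            na = a.transitions.get((sa, sym))
            if na is None:
                continue  # a rejects here, so the difference rejects too
            nb = b.transitions.get((sb, sym))  # None models b's dead state
            nxt = (na, nb)
            transitions[((sa, sb), sym)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return DFA(start, accepting, transitions)


# Hypothetical tag set: common (UTR) and neuter (NEU) gender.
TAGS = ["DET_UTR", "DET_NEU", "ADJ_UTR", "ADJ_NEU", "N_UTR", "N_NEU"]

# Broad grammar: determiner, any number of adjectives, noun; no agreement check.
broad = DFA(0, {2}, {
    **{(0, t): 1 for t in TAGS if t.startswith("DET")},
    **{(1, t): 1 for t in TAGS if t.startswith("ADJ")},
    **{(1, t): 2 for t in TAGS if t.startswith("N_")},
})

# Narrow grammar: the same shape, but every element must share one gender.
narrow = DFA("start", {"ok"}, {
    ("start", "DET_UTR"): "utr", ("utr", "ADJ_UTR"): "utr", ("utr", "N_UTR"): "ok",
    ("start", "DET_NEU"): "neu", ("neu", "ADJ_NEU"): "neu", ("neu", "N_NEU"): "ok",
})

# Both grammars are positive; the error detector falls out of the subtraction.
errors = subtract(broad, narrow, TAGS)

print(errors.accepts(["DET_UTR", "N_NEU"]))             # e.g. "en hus"
print(errors.accepts(["DET_NEU", "ADJ_NEU", "N_NEU"]))  # e.g. "ett stort hus"
```

Note that neither toy grammar describes an error: the error language is obtained only as the difference between the two. Lexical ambiguity, where a word carries several tags at once, is not modelled in this sketch.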

The linguistic performance of the system is promising: for the error types implemented, it shows results comparable to those of other Swedish grammar checking tools when tested on a small adult text not previously analyzed by the system. The performance of the other Swedish tools was also tested on the children's data collected for this study, revealing quite low recall rates. This motivates adapting grammar checking techniques to children, whose errors differ from those of adult writers and pose a greater challenge to current grammar checkers, which are oriented towards texts written by adults.

The robustness and modularity of FiniteCheck make it possible to perform both error detection and diagnostics. Moreover, the grammars can in principle be reused for other applications that do not necessarily have anything to do with error detection, such as extracting information from a given text or even parsing.

Key Words: grammar errors, spelling errors, punctuation, children's writing, Swedish, language checking, light parsing, finite state technology