LINGUIST List 13.928

Thu Apr 4 2002

Review: Computational Ling, Morphology: Kiraz (2001)

Editor for this issue: Terence Langendoen <terry@linguistlist.org>

What follows is another discussion note contributed to our Book Discussion Forum. We expect these discussions to be informal and interactive; and the author of the book discussed is cordially invited to join in.

If you are interested in leading a book discussion, look for books announced on LINGUIST as "available for discussion." (This means that the publisher has sent us a review copy.) Then contact Simin Karimi at simin@linguistlist.org or Terry Langendoen at terry@linguistlist.org.

Directory

Pius ten Hacken, Kiraz (2001) Computational Nonlinear Morphology

Message 1: Kiraz (2001) Computational Nonlinear Morphology

Date: Fri, 29 Mar 2002 14:13:55 +0100
From: Pius ten Hacken <pius.tenhacken@unibas.ch>
Subject: Kiraz (2001) Computational Nonlinear Morphology

Kiraz, George Anton (2001) Computational Nonlinear Morphology. Cambridge University Press, xxi+171pp, hardback ISBN 0-521-63196-3, US$59.95. Announced at http://linguistlist.org/issues/12/12-1986.html#2

Pius ten Hacken, Universität Basel

The morphology of Semitic languages is marked by an unusual process whereby a sequence of consonants which represents the root is distributed over a template. In Arabic, for example, there is a root "ktb" ('write') which appears as "katab" in the perfective active, "kutib" in the perfective passive, "aktub" in the imperfective active, and "uktab" in the imperfective passive. It is not surprising that morphologists have been fascinated and sometimes uneasy with this phenomenon, which does not fit in with more common categories such as suffixation and prefixation. The problem is all the more pertinent in computational morphology, where finite-state approaches have been predominant since the 1980s. In this book, based on the author's Ph.D. dissertation, Kiraz describes how finite-state mechanisms for computational morphology can be extended so that they can also cover templatic morphology in Semitic languages.

SYNOPSIS

The book consists of eight chapters, which can be roughly divided into a description of the background and earlier work on the topic in Chapters 1-4 and a description of the author's own approach in Chapters 5-8.

Chapter 1 introduces basic notions from morphology, formal languages, finite-state automata, and Semitic morphology, with the aim of making the book accessible to scholars from different backgrounds.

Chapter 2 provides a survey of the most prominent approaches to the morphology of Semitic languages in linguistic theory. Autosegmental morphology, introduced by McCarthy (1981), proposes that a form such as "katab" is the result of linearization of information represented at three different tiers, one for the root "ktb", one for the vocalism "a", and one for the CV pattern CVCVC. Other approaches discussed are based on this one. One particularly challenging process found in Semitic languages is the formation of the so-called broken plural, e.g. "salaatiin" from "sultaan". Apart from templatic morphology, the morphology of Semitic languages also involves prefixation, suffixation, and circumfixation.
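As an illustration of the three-tier idea, the linearization of "katab" can be sketched in a few lines of Python. This is only a reader's sketch, not McCarthy's formalism or anything taken from the book: the function name `linearize`, the left-to-right filling of slots, the spreading of the last vowel, and the vocalism "ui" for the passive are all assumptions made for the example.

```python
def linearize(root, vocalism, cv_pattern):
    """Fill C slots from the root tier and V slots from the vocalism tier.

    Assumption for this sketch: slots are filled left to right, and a
    vocalism shorter than the number of V slots spreads its last vowel.
    """
    consonants = iter(root)
    vowels = list(vocalism)
    v_index = 0
    out = []
    for slot in cv_pattern:
        if slot == "C":
            out.append(next(consonants))
        elif slot == "V":
            # Reuse the last vowel once the vocalism tier is exhausted.
            out.append(vowels[min(v_index, len(vowels) - 1)])
            v_index += 1
        else:
            raise ValueError(f"unexpected slot symbol: {slot!r}")
    return "".join(out)


# Root "ktb", vocalism "a", pattern CVCVC, as in the description above.
perfective_active = linearize("ktb", "a", "CVCVC")
# The vocalism "ui" for the perfective passive is assumed here.
perfective_passive = linearize("ktb", "ui", "CVCVC")
```

With these assumptions the single vowel "a" fills both V slots, which is the kind of one-to-many association that a purely concatenative lexicon cannot state directly.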

Chapter 3 gives an historical overview of finite-state morphology. In its most popular form it is based on Koskenniemi's (1983) two-level formalism, which consists of a lexicon system for the concatenation of formatives and a set of two-level rules mediating between the lexical form resulting from the concatenation and the surface form found in actual text. A number of developments are briefly summarized, with the emphasis on those relevant to the morphology of Semitic languages.

Chapter 4 is a survey of the discussion of the morphology of Semitic languages in the context of computational linguistics. The proposals range from the extension of finite-state morphology with additional tapes to the abandonment of finite-state constraints.

In Chapter 5 the author describes his own approach, based on proposals by Kay (1987) and Pulman & Hepple (1993). Instead of the lexical form as used by Koskenniemi (1983), there are three tapes corresponding to McCarthy's (1981) tiers. Rewrite rules relate a triple from the tapes of the lexical form to a symbol of the surface form. Affixes are represented on the same tape as the CV pattern and put in the correct position by the morphotactic component, which consists of regular or context-free rules.
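The idea of rules that map a triple of lexical symbols to one surface symbol can be made concrete with a small Python sketch. It is a loose illustration under stated assumptions, not Kiraz's rule formalism: the names `rewrite` and `surface`, the use of `None` for an empty tape position, and the segmentation of "aktub" into a prefix "a" plus the pattern CCVC are all hypothetical choices made for the example.

```python
def rewrite(triple):
    """Map one (pattern, root, vocalism) triple to a surface symbol.

    Assumed rules for this sketch:
      (C, consonant, -) -> consonant
      (V, -, vowel)     -> vowel
      (affix, -, -)     -> affix symbol unchanged
    """
    pattern_sym, root_sym, vowel_sym = triple
    if pattern_sym == "C" and root_sym is not None:
        return root_sym
    if pattern_sym == "V" and vowel_sym is not None:
        return vowel_sym
    return pattern_sym


def surface(pattern_tape, root_tape, vocalism_tape):
    """Read the three lexical tapes in step and emit the surface form."""
    root = iter(root_tape)
    vocalism = iter(vocalism_tape)
    out = []
    for sym in pattern_tape:
        if sym == "C":
            triple = (sym, next(root), None)
        elif sym == "V":
            triple = (sym, None, next(vocalism))
        else:
            # Affix material sits on the pattern tape; the other
            # two tapes do not advance.
            triple = (sym, None, None)
        out.append(rewrite(triple))
    return "".join(out)


# Hypothetical segmentation: prefix "a" or "u" followed by pattern CCVC.
imperfective_active = surface("aCCVC", "ktb", "u")
imperfective_passive = surface("uCCVC", "ktb", "a")
```

In a real finite-state implementation each such rule would be compiled into a multi-tape transducer rather than applied by a loop, but the sketch shows why placing affixes on the pattern tape lets a single pass handle both templatic and concatenative material.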

Chapter 6 shows how each of the approaches discussed in Chapter 2 can be modelled in terms of the system described in Chapter 5. In the treatment of the broken plural, an extension of the model is proposed. Rather than relating a plural form such as "salaatiin" directly to the root "sltn", a vocalism, and a CV pattern, the formation of this form is based on the singular "sultaan". More generally, experience of writing actual mapping rules has shown that it is not practical to require that they always refer to the tiers. Therefore an intermediate level between lexical tiers and surface form is introduced, the linearized lexical form.

Chapter 7 describes the compilation process. It starts with a number of formal definitions and then describes separately how the lexicon component, the rewrite rules component, and the morphotactic component are compiled into finite-state transducers.

Chapter 8 is a brief conclusion with some thoughts on the use of the mechanisms developed here for other phenomena, the remaining problems and the general outlook.

DISCUSSION

The start of Chapter 1 immediately raises the question of the audience for this book: "This book might have a wide audience: computational linguists, theoretical and applied linguists, Semitists, and - who knows - maybe Biblical scholars with interest in Semitic." The claim that the intended audience includes all these categories is hedged by "might", but the stated purpose of Chapter 1 is to make the book accessible to all of them. This would have been a remarkable achievement for a Ph.D. dissertation in such a technical field. In my opinion, however, this goal is not achieved.

Dissertations are of course a difficult genre as far as the target audience is concerned. In writing a dissertation, the primary audience is the supervisor. This means that in order not to look naive, authors are often inclined to reduce the level of explicitness in explanation and argumentation to what the supervisor still understands. Compared to the intended audience for a book, however, Ph.D. supervisors have a rather atypical level of background knowledge, so that the arguments in a Ph.D. dissertation are often quite difficult for outsiders to understand.

As I have not read the original Ph.D. dissertation I can only guess which parts were reworked for publication as a book, but the results are very uneven in terms of clarity of presentation. Some parts, in particular most of Chapter 7, are a model of clear, user-friendly presentation. Each definition is immediately illustrated by an example in such a way that it can be understood with only basic knowledge of formal language theory. Other sections, for instance 5.5, remain somewhat opaque because formal concepts are introduced without explaining or showing how they are used.

In the survey chapters (2-4), an excellent opportunity to make the book useful to a wider audience is missed. The descriptions of different linguistic treatments of Semitic morphology in Chapter 2 are juxtaposed without showing how they are related. The approaches are in a partially temporal order and there is an overlap in the people involved, so that it may be assumed that there are rational considerations as to why a particular new approach emerged. The reader has to guess them, however. In Chapter 3 and to an even greater extent in Chapter 4, one gets the impression that the reader is supposed to know all the articles referred to. The descriptions are just elaborate enough to recall the main features to someone who knows the underlying system, but to others they are a real challenge.

Another property of dissertations which turns into a disadvantage, at least when they are published as a book, is the tendency to simplify the presentation of areas not central to the topic at hand. A rather innocent example is the statement in the preface that there are only two (named) earlier monographs on computational morphology preceding the book under review. Such a claim is of course quite hazardous and in fact unlikely to be true.

The casual treatment of some central morphological questions gives more cause for concern. Ignoring the debate on the status of morphemes, section 1.1.1.1 introduces them as if there were a general consensus about them. The author assumes that morphemes are the basic entities of the lexicon and that morphological rules combine them, selecting allomorphs and adapting the form where appropriate. Although Matthews (1974) is referred to in this section, neither here nor elsewhere in the book did I come across any mention of the distinction between inflection and word formation. Even a brief footnote explaining the choice of the set of assumptions and mentioning its consequences would have been much better than completely ignoring these central issues in morphological theory. The fact that Aronoff's (1994) influential analysis of the template system of Semitic languages is entirely ignored must be due to this set of assumptions. Aronoff's book is quoted for a general historical remark, which means that the author must be aware of it. It looks as if the incompatibility of theoretical frameworks resulted in the author finding it too difficult to relate his own analysis to Aronoff's.

Another point which may be taken as a matter of simplification concerns the use of "phonological" notation. Throughout the book, strings appear which are enclosed in slashes and are called phonological, but which do not represent any common phonological notation. Some examples are /move#ing/ (where # stands for beta, used as a boundary symbol) on p. 22, /unsuccessful/ on p. 80, and /can/ on p. 151. In practicing computational morphology one can choose to take a phonological perspective, as in Cahill & Gazdar (1999), or an orthographic one, as in ten Hacken & Domenig (1996), but a mixed perspective is confusing. Of course this confusion has a long tradition, as illustrated by the term "weak letters" (p. 46) with a reference to the Arabic grammatical tradition.

I will not mention the minor errors I found in some of the figures, because the author announces a periodically updated errata sheet at http://www.bethmardutho.org/gkiraz. A password is required for access to this site.

The problems noted here do not strike at the heart of the research presented in the book under review. They mainly concern its presentation beyond a small circle of specialists. The second half of the book is definitely the better part and the system described is of great interest. For those sufficiently knowledgeable about the computational treatment of Semitic morphology this is definitely a valuable book. For others the value depends on their eagerness to collect and read the original articles referred to or their tolerance for taking in a great deal of information without seeing the relationships or understanding the details.

REFERENCES

Aronoff, Mark H. (1994), Morphology by Itself: Stems and Inflectional Classes, Cambridge (Mass.): MIT Press.

Cahill, Lynne & Gazdar, Gerald (1999), 'German noun inflection', Journal of Linguistics 35:1-42.

ten Hacken, Pius & Domenig, Marc (1996), 'Reusable Dictionaries for NLP: The Word Manager Approach', Lexicology 2:232-255.

Kay, Martin (1987), 'Nonconcatenative Finite-State Morphology', in Proceedings of the Third Conference of the European Chapter of the Association for Computational Linguistics, 1-3 April 1987, University of Copenhagen, pp. 2-10.

Koskenniemi, Kimmo (1983), Two-Level Morphology: A General Computational Model for Word-Form Recognition and Production, University of Helsinki, Department of General Linguistics Publications No. 11.

McCarthy, John J. (1981), 'A Prosodic Theory of Nonconcatenative Morphology', Linguistic Inquiry 12:373-418.

Pulman, Stephen G. & Hepple, M. (1993), 'A Feature-Based Formalism for Two-Level Phonology: A Description and Implementation', Computer Speech and Language 7:333-358.

ABOUT THE REVIEWER

Pius ten Hacken is Privatdozent for General Linguistics at the Universität Basel. His research specializations include theoretical and computational morphology. He has worked in the Word Manager project for reusable morphological dictionaries since 1991 and coauthored a monograph on this system.