LINGUIST List 13.928

Thu Apr 4 2002

Review: Computational Ling, Morphology: Kiraz (2001)

Editor for this issue: Terence Langendoen <terry@linguistlist.org>


What follows is another discussion note contributed to our Book Discussion Forum. We expect these discussions to be informal and interactive; and the author of the book discussed is cordially invited to join in. If you are interested in leading a book discussion, look for books announced on LINGUIST as "available for discussion." (This means that the publisher has sent us a review copy.) Then contact Simin Karimi at simin@linguistlist.org or Terry Langendoen at terry@linguistlist.org.

Directory

  1. Pius ten Hacken, Kiraz (2001) Computational Nonlinear Morphology

Message 1: Kiraz (2001) Computational Nonlinear Morphology

Date: Fri, 29 Mar 2002 14:13:55 +0100
From: Pius ten Hacken <pius.tenhacken@unibas.ch>
Subject: Kiraz (2001) Computational Nonlinear Morphology

Kiraz, George Anton (2001) Computational Nonlinear Morphology.
Cambridge University Press, xxi+171pp, hardback ISBN 0-521-63196-3,
US$59.95
Announced at http://linguistlist.org/issues/12/12-1986.html#2

Pius ten Hacken, Universität Basel

The morphology of Semitic languages is marked by an unusual process
whereby a sequence of consonants which represents the root is
distributed over a template. In Arabic, for example, there is a root
"ktb" ('write') which appears as "katab" in the perfective active,
"kutib" in the perfective passive, "aktub" in the imperfective
active, and "uktab" in the imperfective passive. It is not surprising
that morphologists have been fascinated by, and sometimes uneasy
with, this phenomenon, which does not fit in with more common categories
such as suffixation and prefixation. The problem is all the more
pertinent in computational morphology, where finite-state approaches
have been predominant since the 1980s. In this book, based on the
author's Ph.D. dissertation, Kiraz describes how finite-state
mechanisms for computational morphology can be extended so that they
can also cover templatic morphology in Semitic languages.

SYNOPSIS
The book consists of eight chapters, which can be roughly divided
into a description of the background and earlier work on the topic in
Chapters 1-4 and a description of the author's own approach in
Chapters 5-8.

Chapter 1 introduces basic notions from morphology, formal languages,
finite-state automata, and Semitic morphology, with the aim of making
the book accessible to scholars from different backgrounds.

Chapter 2 provides a survey of the most prominent approaches to the
morphology of Semitic languages in linguistic theory. Autosegmental
morphology, introduced by McCarthy (1981), proposes that a form such
as "katab" is the result of linearization of information represented
at three different tiers, one for the root "ktb", one for the
vocalism "a", and one for the CV pattern CVCVC. Other approaches
discussed are based on this one. One particularly challenging process
found in Semitic languages is the formation of the so-called broken
plural, e.g. "salaatiin" from "sultaan". Apart from templatic
morphology, the morphology of Semitic languages also involves
prefixation, suffixation, and circumfixation.
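
The three-tier idea can be made concrete with a small sketch. The
following is only an illustration under simplifying assumptions, not
McCarthy's formal association conventions: consonants are linked to C
slots from left to right, and a vocalism with fewer vowels than there
are V slots spreads its last vowel. The function names are
hypothetical, and the vocalism "ui" for "kutib" is an assumption made
for the example.

```python
def associate(root, vocalism, template):
    """Link each slot of the CV template to a segment from the root or
    vocalism tier, scanning left to right (simplified association)."""
    links = []
    c = v = 0
    for slot in template:
        if slot == "C":
            links.append((slot, root[c]))
            c += 1
        elif slot == "V":
            # A vocalism shorter than the number of V slots spreads
            # its last vowel over the remaining slots.
            links.append((slot, vocalism[min(v, len(vocalism) - 1)]))
            v += 1
        else:
            raise ValueError(f"unexpected template symbol: {slot!r}")
    return links


def linearize(links):
    """Read the linked segments off in template order."""
    return "".join(segment for _, segment in links)


print(linearize(associate("ktb", "a", "CVCVC")))   # katab
print(linearize(associate("ktb", "ui", "CVCVC")))  # kutib
```

The sketch keeps the association step separate from linearization so
that the links between tiers remain inspectable.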

Chapter 3 gives a historical overview of finite-state morphology. In
its most popular form it is based on Koskenniemi's (1983) two-level
formalism, which consists of a lexicon system for the concatenation
of formatives and a set of two-level rules mediating between the
lexical form resulting from the concatenation and the surface form
found in actual text. A number of developments are briefly
summarized, with the emphasis on those relevant to the morphology of
Semitic languages.
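
To give an impression of what a two-level rule does, here is a toy
check of one rule against an aligned lexical/surface pair. It is a
sketch under stated assumptions and not Koskenniemi's implementation:
the English e-deletion rule, the "+" boundary symbol, the "0" null
symbol, and the function name are all chosen for illustration, and
real systems compile such rules into finite-state transducers instead
of scanning pairs in a loop.

```python
VOWELS = set("aeiou")


def e_deletion_ok(pairs):
    """Check the illustrative rule  e:0 <=> _ +:0 V  on a list of
    (lexical, surface) symbol pairs: lexical 'e' surfaces as null
    exactly when followed by a deleted boundary and a vowel."""
    for i, (lex, surf) in enumerate(pairs):
        if lex != "e":
            continue
        in_context = (
            i + 2 < len(pairs)
            and pairs[i + 1] == ("+", "0")
            and pairs[i + 2][0] in VOWELS
        )
        deleted = surf == "0"
        # The double arrow requires deletion in the context and
        # forbids it everywhere else.
        if deleted != in_context:
            return False
    return True


# Lexical "move+ing" aligned with surface "mov00ing" (0 = null).
print(e_deletion_ok(list(zip("move+ing", "mov00ing"))))  # True
print(e_deletion_ok(list(zip("move+ing", "move0ing"))))  # False
print(e_deletion_ok(list(zip("move+s", "move0s"))))      # True
```

The point of the example is that the rule constrains the pairing of
the two levels; it does not rewrite one string into another.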

Chapter 4 is a survey of the discussion of the morphology of Semitic
languages in the context of computational linguistics. The proposals
range from the extension of finite-state morphology with additional
tapes to the abandonment of finite-state constraints.

In Chapter 5 the author describes his own approach, based on
proposals by Kay (1987) and Pulman & Hepple (1993). Instead of the
lexical form as used by Koskenniemi (1983), there are three tapes
corresponding to McCarthy's (1981) tiers. Rewrite rules relate a
triple from the tapes of the lexical form to a symbol of the surface
form. Affixes are represented on the same tape as the CV pattern and
put in the correct position by the morphotactic component, which
consists of regular or context-free rules.
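
A rough sketch of the multi-tape idea follows. It is a toy under
stated assumptions and does not reproduce Kiraz's rule formalism or
his compilation into transducers: each step reads one symbol from the
pattern tape and at most one symbol from the root or vocalism tape,
and emits one surface symbol. Treating the initial vowel of "aktub"
and "uktab" as affix material written on the pattern tape, and
spreading the last vowel when the vocalism tape is exhausted, are
assumptions made purely for illustration; the function name is
hypothetical.

```python
def surface_form(pattern_tape, root_tape, vocalism_tape):
    """Map triples read from three lexical tapes to surface symbols."""
    root = list(root_tape)
    vocalism = list(vocalism_tape)
    last_vowel = None
    surface = []
    for p in pattern_tape:
        if p == "C":
            # <C, X, -> => X : copy the next root consonant.
            if not root:
                raise ValueError("root tape exhausted")
            surface.append(root.pop(0))
        elif p == "V":
            # <V, -, a> => a : copy the next vowel if one is left,
            # otherwise repeat the last vowel read (spreading).
            if vocalism:
                last_vowel = vocalism.pop(0)
            if last_vowel is None:
                raise ValueError("no vowel available")
            surface.append(last_vowel)
        else:
            # <x, -, -> => x : affix material on the pattern tape.
            surface.append(p)
    if root or vocalism:
        raise ValueError("unread symbols left on a lexical tape")
    return "".join(surface)


print(surface_form("CVCVC", "ktb", "a"))  # katab
print(surface_form("aCCVC", "ktb", "u"))  # aktub
print(surface_form("uCCVC", "ktb", "a"))  # uktab
```

The final check that all lexical tapes have been read completely
mirrors the requirement that a surface form accounts for every
lexical symbol.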

Chapter 6 shows how each of the approaches discussed in Chapter 2 can
be modelled in terms of the system described in Chapter 5. In the
treatment of the broken plural, an extension of the model is
proposed. Rather than relating a plural form such as "salaatiin"
directly to the root "sltn", a vocalism, and a CV pattern, the
formation of this form is based on the singular "sultaan". More
generally, experience of writing actual mapping rules has shown that
it is not practical to require that they always refer to the tiers.
Therefore an intermediate level between lexical tiers and surface
form is introduced, the linearized lexical form.

Chapter 7 describes the compilation process. It starts with a number
of formal definitions and then describes separately how the lexicon
component, the rewrite rules component, and the morphotactic
component are compiled into finite-state transducers.

Chapter 8 is a brief conclusion with some thoughts on the use of the
mechanisms developed here for other phenomena, the remaining problems
and the general outlook.

DISCUSSION
The start of Chapter 1 immediately raises the question of the
audience for this book: "This book might have a wide audience:
computational linguists, theoretical and applied linguists,
Semitists, and - who knows - maybe Biblical scholars with interest in
Semitic." The claim that the intended audience includes all these
categories is hedged by "might", but the stated purpose of Chapter 1
is to make the book accessible to all of them. This would have been a
remarkable achievement for a Ph.D. dissertation in such a technical
field. In my opinion, however, this goal is not achieved.

Dissertations are of course a difficult genre as far as the target
audience is concerned. In writing a dissertation, the primary
audience is the supervisor. This means that in order not to look
naive, authors are often inclined to reduce the level of explicitness
in explanation and argumentation to what the supervisor still
understands. Compared to the intended audience for a book, however,
Ph.D. supervisors have a rather atypical level of background
knowledge, so that the arguments in a Ph.D. dissertation are often
quite difficult for outsiders to understand.

As I have not read the original Ph.D. dissertation I can only guess
which parts were reworked for publication as a book, but the results
are very unequal in terms of clarity of presentation. Some parts, in
particular most of Chapter 7, are a model of clear, user-friendly
presentation. Each definition is immediately illustrated by an
example in such a way that they can be understood with only basic
knowledge of formal language theory. Other sections, for instance
5.5, remain somewhat opaque because formal concepts are introduced
without explaining or showing how they are used.

In the survey chapters (2-4), an excellent opportunity to make the
book useful to a wider audience is missed. The descriptions of
different linguistic treatments of Semitic morphology in Chapter 2
are juxtaposed without showing how they are related. The approaches
are in a partially chronological order and there is an overlap in the
people involved, so that it may be assumed that there are rational
considerations as to why a particular new approach emerged. The
reader has to guess them, however. In Chapter 3 and to an even
greater extent in Chapter 4, one gets the impression that the reader
is supposed to know all the articles referred to. The descriptions
are just elaborate enough to recall the main features to someone who
knows the underlying system, but to others they are a real challenge.

Another property of dissertations which turns into a disadvantage, at
least when they are published as a book, is the tendency to simplify
the presentation of areas not central to the topic at hand. A rather
innocent example is the statement in the preface that there are only
two (named) earlier monographs on computational morphology preceding
the book under review. Such a claim is of course quite hazardous and
in fact unlikely to be true.

The casual treatment of some central morphological questions gives
more cause for concern. Ignoring the debate on the status of
morphemes, section 1.1.1.1 introduces them as if there were a general
consensus about them. The author assumes that morphemes are the basic
entities of the lexicon and morphological rules combine them,
selecting allomorphs and adapting the form where appropriate.
Although Matthews (1974) is referred to in this section, neither here
nor elsewhere in the book did I come across any mention of the
distinction between inflection and word formation. Even a brief
footnote explaining the choice of the set of assumptions and
mentioning its consequences would have been much better than
completely ignoring these central issues in morphological theory. The
fact that Aronoff's (1994) influential analysis of the template
system of Semitic languages is entirely ignored must be due to this
set of assumptions. Aronoff's book is quoted for a general historical
remark, which means that the author must be aware of it. It looks as
if the incompatibility of theoretical frameworks resulted in the
author finding it too difficult to relate his own analysis to
Aronoff's.

Another point which may be taken as a matter of simplification
concerns the use of "phonological" notation. Throughout the book,
strings appear which are enclosed in slashes and are called
phonological, but which do not represent any common phonological
notation. Some examples are /move#ing/ (where # stands for beta, used
as a boundary symbol) on p. 22, /unsuccessful/ on p. 80, and /can/ on
p. 151. In practicing computational morphology one can choose to take
a phonological perspective, as in Cahill & Gazdar (1999), or an
orthographic one, as in ten Hacken & Domenig (1996), but a mixed
perspective is confusing. Of course this confusion has a long
tradition, as illustrated by the term "weak letters" (p. 46) with a
reference to the Arabic grammatical tradition.

I will not mention the minor errors I found in some of the figures,
because the author announces a periodically updated errata sheet at
http://www.bethmardutho.org/gkiraz. A password is required for access
to this site.

The problems noted here do not strike at the heart of the research
presented in the book under review. They mainly concern its
presentation beyond a small circle of specialists. The second half of
the book is definitely the better part and the system described is of
great interest. For those sufficiently knowledgeable about the
computational treatment of Semitic morphology this is definitely a
valuable book. For others the value depends on their eagerness to
collect and read the original articles referred to or their tolerance
for taking in a great deal of information without seeing the
relationships or understanding the details.


REFERENCES
Aronoff, Mark H. (1994), Morphology by Itself: Stems and Inflectional
Classes, Cambridge (Mass.): MIT Press.

Cahill, Lynne & Gazdar, Gerald (1999), 'German noun inflection',
Journal of Linguistics 35:1-42.

ten Hacken, Pius & Domenig, Marc (1996), 'Reusable Dictionaries for
NLP: The Word Manager Approach', Lexicology 2:232-255.

Kay, Martin (1987), 'Nonconcatenative Finite-State Morphology', in
Proceedings of the Third Conference of the European Chapter of the
Association for Computational Linguistics, 1-3 April 1987, University
of Copenhagen, pp. 2-10.

Koskenniemi, Kimmo (1983), Two-Level Morphology: A General
Computational Model for Word-Form Recognition and Production,
University of Helsinki, Department of General Linguistics
Publications No. 11.

McCarthy, John J. (1981), 'A Prosodic Theory of Nonconcatenative
Morphology', Linguistic Inquiry 12:373-418.

Pulman, Stephen G. & Hepple, M. (1993), 'A Feature-Based Formalism
for Two-Level Phonology: A Description and Implementation', Computer
Speech and Language 7:333-358.

ABOUT THE REVIEWER
Pius ten Hacken is Privatdozent for General Linguistics at the
Universität Basel. His research specializations include theoretical
and computational morphology. He has worked in the Word Manager
project for reusable morphological dictionaries since 1991 and
coauthored a monograph on this system.