Review of  Implementing Typed Feature Structure Grammars


Reviewer: Michael B. Maxwell
Book Title: Implementing Typed Feature Structure Grammars
Book Author: Ann Copestake
Publisher: CSLI Publications
Linguistic Field(s): Computational Linguistics
Book Announcement: 14.2409

Review:


Date: Thu, 4 Sep 2003 16:22:16 -0400
From: Mike Maxwell <maxwell@ldc.upenn.edu>
Subject: Implementing Typed Feature Structure Grammars

Copestake, Ann. 2002. Implementing Typed Feature Structure Grammars.
CSLI Publications, xi+233 pp., hardback ISBN 1575862611, USD 62.00;
paperback ISBN 1575862603, USD 22.00.

Announced in Linguist List 13.2529, modified announcement 13.2589.

Reviewed by Mike Maxwell, Linguistic Data Consortium at the University of
Pennsylvania.

(Reviewer's note: While announced as a book, this is more properly
described as software, the book being the tutorial and user manual.
The software can be freely downloaded or obtained as a CD.)

Generative grammar was originally a theory of context-free phrase
structure rules, augmented with a notion of transformations. But
embedded in 'Aspects of the Theory of Syntax' was a nascent theory
of syntactic features. Originally viewed as a way to augment a
context-free grammar of atomic categories (noun, verb, NP, VP...),
these features were in fact capable of completely replacing the atomic
categories.

It took many years for syntactic theories to catch up to the
possibilities opened up by the introduction of syntactic features.
Generalized Phrase Structure Grammar, for example, is a half-breed: an
over-generating system of rules based on atomic categories is filtered
by principles governing the percolation of those features. But in
Head-Driven Phrase Structure Grammar (HPSG), and related theories, the
grammar uses only features. While such a grammar is still context-free,
it can be considerably more complex to parse. One way of looking at
this is that there are orders of magnitude more categories: (potentially)
as many as there are combinations of feature values.
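
To make the point about category counts concrete, here is a minimal sketch in Python. It is only an illustration: the feature inventory is a made-up toy, and the flat, untyped dictionaries stand in for the typed feature structures (with inheritance and reentrancy) that LKB actually uses. The function names are hypothetical and are not part of LKB.

```python
from itertools import product

# Hypothetical toy feature inventory (an assumption, not taken from the book).
FEATURES = {
    "pos": ["noun", "verb", "adj", "prep"],
    "number": ["sg", "pl"],
    "person": ["1", "2", "3"],
    "case": ["nom", "acc"],
}


def all_categories(features):
    """Enumerate every fully specified category as a dict of feature values."""
    names = list(features)
    return [dict(zip(names, values)) for values in product(*features.values())]


def unify(a, b):
    """Unify two flat feature structures; return None if any value clashes."""
    result = dict(a)
    for name, value in b.items():
        if name in result and result[name] != value:
            return None  # incompatible values for the same feature
        result[name] = value
    return result


categories = all_categories(FEATURES)
print(len(categories))  # 4 * 2 * 3 * 2 = 48 distinct categories

# An underspecified structure is compatible with many full categories.
print(unify({"pos": "noun"}, {"number": "sg"}))
print(unify({"number": "sg"}, {"number": "pl"}))  # clash, so None
```

Even four small features yield 48 categories where an atomic grammar might have four, and a parser must check compatibility by unification rather than by simple symbol equality.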

Thus, while there have been many more or less efficient parsers for
context-free phrase structure grammars using atomic categories
(sometimes enhanced with features), there have been very few parsers
suitable for theories like HPSG.

The tool described in this book is such a parser: the Linguistic
Knowledge Building system (LKB). (The LKB also includes a generator,
although it is not as well developed as the parser.) The book is a
tutorial (and, to some extent, a reference) for LKB, as well as
describing (briefly) the linguistic theory which the software models.
It does presuppose some knowledge of syntax, and readers who come
from a different theoretical background should probably begin by reading
an introductory text on HPSG. Prospective users of this program should
also be comfortable with a programmer's editor, such as emacs.

Ann Copestake is the book's author, but she credits many others with
helping design and build LKB. The LKB can be downloaded in executable
form from the book's website. (Actually, as I write this, it is
downloadable from http://www-csli.stanford.edu/~aac/lkb.html; the
download link at the URL given in the book is broken.) It runs under
Windows (95 or later), RedHat Linux, and Solaris operating systems. The
source code is written in Common Lisp (with the gui running under a
particular variety of Common Lisp), and has been open-sourced.
Reportedly, this allows it to run on an Apple Macintosh, provided you
purchase a Lisp interpreter.

I successfully downloaded and ran the Windows version (which I ran under
MS-Windows 2000). Users of this version should however be warned that
some Windows conventions are not followed (probably because they are not
standards under the other operating systems). For instance,
installation does not automatically create an icon on your desktop, nor an
entry in your 'Start' menu. If you create your own shortcut for starting LKB,
and you prefer to keep your grammar files in some location other than a
subdirectory of the LKB directory (perhaps under your "My Documents"
directory), you'll want to edit the shortcut to start LKB in that
directory. Also, clicking on the 'x' in the upper right-hand corner of
the running LKB "Top" window shuts down LKB, but leaves a Lisp window
running. (Using the 'Quit' menu item from LKB does shut down the Lisp
process as well.) Another oddity: while I was able to change the font
sizes for most aspects of the display, I was unable to change the
minuscule font size used for displaying parses. However, clicking on
the parse tree brings up a menu, and one of the menu choices then displays
a larger version of the parse tree, in which one can browse individual
nodes to view the feature structures which those nodes abbreviate.

I also successfully downloaded and ran the Solaris version (which uses a
similarly minuscule and apparently unchangeable font for displaying
parses).

Finally, I downloaded several of the Linux versions, but I was unable to
make lkb run under my RedHat v.9 Linux installation. It was unclear
which version of lkb I was supposed to use for this latest version of Linux,
but the 'unstable' version dated 2003-09-03 seemed to come the closest
to running. (For those interested in the gory details, lkb complained
about a library error. The LKB website FAQ suggested that this was a
Motif problem, and that it would only run under an earlier version (2.1)
of Motif than what I had on my Linux machine (2.2--the same version
downloadable by default from the Motif website). I suspect that I could
have obtained this older version from the Motif web site, and replaced
the installed version of Motif--but this was getting beyond my meager
Linux skills, and I gave up.)

Once running, LKB includes a windowing interface to the grammar loader,
parser and generator, various display tools including a type hierarchy
browser, and some debugging tools. All these tools are basically
read-only, that is, browsers rather than editors; editing is done with
your favorite programming editor, which you may think of as a blessing
(if you are a veteran emacs user, since emacs has special hooks into the
LKB) or a curse (otherwise).

The book consists of two parts, a tutorial and a user manual. The
tutorial section begins with a very simple grammar (using atomic
categories, and no features), and progressively adds refinements,
working up to a grammar that handles long-distance dependencies.
Interspersed with the instructions on how to write and debug grammars
are discussions of the theory which the grammars presuppose. Exercises,
both thought problems and hands-on (programming) exercises, help ensure
the reader's understanding. (Simple answers to the exercises are supplied
in the book, while longer answers are contained in the data files downloadable
from the LKB website.)

The user manual section of the book is essentially a reference manual,
although there are some useful explanations as well. It is the nearest
thing to a 'help' file for the various commands and displays (and parts
of it could usefully be turned into an on-line help file).

While the book is generally well written, I found it hard going at
times. I also developed an increased appreciation for the joy of hyperlinked
text: since the text was merely paper, I found myself penciling in the page
numbers of figures next to the in-text references to them. (I was not
helped by the figure numbering systems, of which
there are two: one numbered on a per-chapter basis, and one running
sequentially through the book.)

There are few typos, and most do not interfere with the reading. A
couple of the more important ones are the substitution of "latter" for "former"
(pg. 76, just above figure 3.56); and two distinct versions of the same
rule (first rule on pg. 128 and rule at the bottom of pg. 124; the
latter is correct).

In addition to the text and the website, there is an LKB mailing list
(archived since June at Linguist List). It does not appear to have
received much traffic, but would be a useful resource particularly to
readers working through the LKB system on their own.

The LKB program has (of course) certain limitations, perhaps the most
important of which is that the order of argument phrases in sentences
must match their order in the argument lists of lexical items (apart
from wh-movement; dative movement and other such "transformations" are of
course treated lexically). The implication is that it will be difficult
to treat 'free word order' languages. (I hasten to add that this is a
fundamental problem which virtually all parsing programs face.)

Another limitation lies in the simplistic implementation of morphology
in LKB. Actually, although some might see this as a limitation, I look at
this as the right way to build software. For whatever you may think of
the theoretical relationship between morphology and syntax, when it
comes to computational implementations, they are quite different.
Therefore computational implementations should treat syntax and
morphology as separate modules. And indeed, it should not be difficult
to attach to LKB a more sophisticated morphological parser/generator
(such as the Xerox finite state morphological transducer, recently published
through CSLI/University of Chicago Press; see Linguist List 14.2028).
The same might hold for efficient lexical access; while lexical access is
implemented internally to LKB, it may be more efficient with large lexicons
to hook up a specialized lexical lookup engine. (The LKB also supplies 'hooks'
to allow for an alternative semantic "back end".)

The LKB is also somewhat limited in its treatment of multi-word lexical
items, which can consist of a list of words in fixed order, with one of
those words being specified as a possible (inflectional) affixation site.
As readers of the Linguist List may well appreciate, there is an entire
spectrum of constructions which lie somewhere near the border between
syntax and morphology, including compounds (which may or may not be
written solid, as in English 'doghouse' and 'dog house'), proper names
and place names, and ranging through idioms, fixed expressions, and
perhaps even verb-particle constructions; not all of these fit such a
limited approach. Neither linguistic theory nor computational
linguistics has caught up with the variety of such constructions (see e.g.
http://www.cl.cam.ac.uk/users/alk23/mwe/mwe.html for a recent conference
on this topic), so it is understandable that the LKB would not handle
them all well.
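
The fixed-order, single-affixation-site treatment described above can be sketched as follows. This is a rough Python illustration and not LKB's own representation: the class name, the suffix list, and the string-concatenation "morphology" are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical suffix set; a real system would consult a morphological analyser.
SUFFIXES = ("", "s", "ed", "ing")


@dataclass(frozen=True)
class MultiWordEntry:
    words: tuple     # the words in fixed order, in citation form
    affix_site: int  # index of the one word that may carry inflection

    def matches(self, tokens):
        """True if tokens are this entry's words, in order, inflected only at the site."""
        if len(tokens) != len(self.words):
            return False
        for i, (word, token) in enumerate(zip(self.words, tokens)):
            if i == self.affix_site:
                if not any(token == word + suffix for suffix in SUFFIXES):
                    return False
            elif token != word:
                return False
        return True


entry = MultiWordEntry(("kick", "the", "bucket"), affix_site=0)
print(entry.matches(["kicked", "the", "bucket"]))   # inflection at the site
print(entry.matches(["kick", "the", "buckets"]))    # inflection elsewhere
print(entry.matches(["the", "bucket", "kicked"]))   # wrong order
```

The sketch accepts only the first of the three inputs, which shows why compounds written two ways, or idioms that admit intervening material, do not fit this kind of entry.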

I did not try to use LKB with anything except vanilla ASCII characters.
The LKB web site has a place-holder for this issue (as well as for
on-going work on multi-word constructions). Since there does not appear
to be any obvious way to change the actual font used to display parses
(short of editing the Lisp code), it is presumably not possible at
present to use alternative fonts inside LKB itself. However, there is a
downloadable Japanese grammar (reachable from the LKB website) that is
said to run under the LKB; the web documentation recommends running it
from inside emacs for purposes of Japanese text input. Since I am
proficient neither in emacs nor in Japanese, I did not try this.

Finally, LKB is not intended to serve as a production environment.
Rather, it is intended as a tool for learning about parsing and
generating with HPSG-like grammars.

My principal criticism of the LKB is not so much a criticism as a
concern with what the intended audience is. While my sympathies are
entirely with hand-crafted grammars, this is not a prominent methodology
in computational linguistics in recent years, where the emphasis has
instead been on statistical approaches. Nor, outside of HPSG, will many
linguists find the twin barriers of theory and computational tools easy to cross.
So the main audience seems to be those linguists who already know HPSG
(or a related theory), and who are already comfortable enough with
computers and programming to build and debug grammars. And this
seems--unfortunately--a rather small audience.

There are several things that could make this grammar development system
more accessible to linguists (or to linguistics students). One such aid
would be to hide more of the computational details, such as the
distinction (discussed in section 4.3 and elsewhere) between (ordinary)
lists and 'difference lists'. This distinction is purely computational,
not linguistic, and should be invisible to users.
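
For readers unfamiliar with the distinction, the following Python sketch shows the general idea behind a difference list: a list that keeps a handle on its own open end, so that two lists can be joined by filling that end rather than by copying. It is an analogy only. In LKB the same effect is obtained by unifying feature structures, and the class and attribute names here are hypothetical.

```python
class Node:
    """A list cell; while `is_open` is True it is a placeholder for an unknown tail."""

    def __init__(self):
        self.is_open = True
        self.first = None
        self.rest = None


class DiffList:
    """A list that remembers both its first cell and its open end."""

    def __init__(self, items=()):
        self.head = Node()     # first cell of the list
        self.last = self.head  # the open end, still unfilled
        for item in items:
            self._fill_last(item, Node())

    def _fill_last(self, first, rest):
        """Fill the open end with a value and move the open end to `rest`."""
        self.last.is_open = False
        self.last.first = first
        self.last.rest = rest
        self.last = rest

    def append(self, other):
        """Join `other` onto this list in constant time, without copying items."""
        if other.head is other.last:  # other is empty, nothing to add
            return
        hole = self.last
        hole.is_open = False
        hole.first = other.head.first
        hole.rest = other.head.rest
        self.last = other.last

    def to_list(self):
        """Walk from the head up to (but not including) the open end."""
        items, node = [], self.head
        while node is not self.last:
            items.append(node.first)
            node = node.rest
        return items


a = DiffList(["the", "dog"])
b = DiffList(["barks"])
a.append(b)
print(a.to_list())  # ['the', 'dog', 'barks']
```

With an ordinary list, appending requires walking or copying the first list; here only the open end is touched. That difference is a matter of implementation, which is why it seems a candidate for hiding from the grammar writer.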

It would also be helpful to provide as part of the development
environment structured editors, which would make typographic errors
(spelling, unbalanced brackets) less likely. In my experience, tracking
down such errors (not to mention learning the special notation in the
first place) can chew up a lot of time.

Another change which would make the system much easier to use concerns
the scripting commands. As it stands, in addition to learning the
grammar notation, the advanced user must learn a Lisp-like notation in order
to tell the system which files to load, and for defining certain system
parameters. While the learning curve for this portion of the application
is not long (scripting commands tend not to be complex, and often what
one needs for a new grammar can simply be adapted from an existing
grammar), the learning curve could have been eliminated entirely by
providing access to the necessary commands and variable settings through
the gui (and then linking these gui widgets to the appropriate
sections of the user manual section of the book, converted
into on-line help documents).

But perhaps the most valuable step toward making LKB more useful and interesting
to linguists is one that its users could best take: building up a set of
sample grammars. As a start towards this, the LKB website makes available
for download two good-sized grammars, one of English and one of Japanese.
I googled several more LKB grammars (including a categorial grammar
implementation), but it would be a service to the NLP community and
particularly to users of LKB to gather the URLs for such systems in one
location (and make them accessible from OLAC as well).

In summary, I can recommend this book and its software for those who
wish to try their hand at parsing using an HPSG-like approach. I suspect it
would work well in a classroom setting, although I think students would
need some prior experience in program editing and debugging. (As I
mentioned, a structured editor would do much to ease the frustration for
naïve users that inevitably comes with using a 'dumb' editor for
programming.) There are of course other parsing systems (Copestake
provides a long list of freely available systems at the end of chapter
five), but it is beyond the scope of this review to compare them with
LKB. Suffice it to say that if you want to experiment with parsing using
HPSG or similar formalisms, then LKB is probably the system to use.





 
ABOUT THE REVIEWER:
Mike Maxwell works for the Linguistic Data Consortium at the University of Pennsylvania. He holds a Ph.D. in linguistics from the University of Washington. In between these two places, he developed a broad coverage syntactic grammar of English, consulted on minority languages in Colombia, and built a morphological parser/generator.
