Copestake, Ann. 2002. Implementing Typed Feature Structure Grammars. CSLI Publications, xi+233 pp., hardback ISBN 1575862611, USD 62.00; paperback ISBN 1575862603, USD 22.00.
Announced in Linguist List 13.2529, modified announcement 13.2589.
Reviewed by Mike Maxwell, Linguistic Data Consortium at the University of Pennsylvania.
(Reviewer's note: While announced as a book, this is more properly described as software, the book being the tutorial and user manual. The software can be freely downloaded or obtained as a CD.)
Generative grammar was originally a theory of context-free phrase structure rules, augmented with a notion of transformations. But embedded in 'Aspects of the Theory of Syntax' was a nascent theory of syntactic features. Originally viewed as a way to augment a context-free grammar of atomic categories (noun, verb, NP, VP...), these features were in fact capable of completely replacing the atomic categories.
It took many years for syntactic theories to catch up to the possibilities opened up by the introduction of syntactic features. Generalized Phrase Structure Grammar, for example, is a hybrid: an over-generating system of rules based on atomic categories is filtered by principles governing the percolation of those features. But in Head-Driven Phrase Structure Grammar (HPSG), and related theories, the grammar uses only features. While such a grammar is still context-free, it can be considerably more complex to parse. One way of looking at this is that there are orders of magnitude more categories: (potentially) as many as there are combinations of feature values.
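To make the arithmetic behind "as many as there are combinations of feature values" concrete, here is a minimal sketch in Python. The feature names and values are my own toy inventory, not taken from the book or from any LKB grammar, and real HPSG feature structures are nested and typed, so the true count grows far faster than this flat example suggests.

```python
from itertools import product

# Hypothetical, flat feature inventory (an assumption for illustration only).
FEATURES = {
    "HEAD": ["noun", "verb", "adj", "prep"],
    "NUM": ["sg", "pl"],
    "PER": ["1", "2", "3"],
    "CASE": ["nom", "acc"],
}


def count_categories(features):
    """Return the number of distinct fully specified feature bundles."""
    total = 1
    for values in features.values():
        total *= len(values)
    return total


def enumerate_categories(features):
    """Yield every fully specified bundle as a dict of feature -> value."""
    names = list(features)
    for combo in product(*(features[name] for name in names)):
        yield dict(zip(names, combo))


# Four atomic head categories become 4 * 2 * 3 * 2 = 48 feature bundles.
print(count_categories(FEATURES))
```

Each additional feature multiplies the total, which is why a parser that works well over a few dozen atomic categories cannot simply enumerate the categories of a feature-only grammar.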
Thus, while there have been many more or less efficient parsers for context-free phrase structure grammars using atomic categories (sometimes enhanced with features), there have been very few parsers suitable for theories like HPSG.
The tool described in this book is such a parser: the Linguistic Knowledge Building system (LKB). (The LKB also includes a generator, although it is not as well developed as the parser.) The book is a tutorial (and, to some extent, a reference) for LKB, as well as describing (briefly) the linguistic theory which the software models. It does presuppose some knowledge of syntax, and readers who come from a different theoretical background should probably begin by reading an introductory text on HPSG. Prospective users of this program should also be comfortable with a programmer's editor, such as emacs.
Ann Copestake is the book's author, but she credits many others with helping design and build LKB. The LKB can be downloaded in executable form from the book's website. (Actually, as I write this, it is downloadable from http://www-csli.stanford.edu/~aac/lkb.html; the download link at the URL given in the book is broken.) It runs under Windows (95 or later), RedHat Linux, and Solaris operating systems. The source code is written in Common Lisp (with the gui running under a particular variety of Common Lisp), and has been open-sourced. Reportedly, this allows it to run on an Apple Macintosh, provided you purchase a Lisp interpreter.
I successfully downloaded and ran the Windows version (which I ran under MS-Windows 2000). Users of this version should however be warned that some Windows conventions are not followed (probably because they are not standards under the other operating systems). For instance, installation does not automatically create an icon on your desktop, nor an entry in your 'Start' menu. If you create your own shortcut for starting LKB, and you prefer to keep your grammar files in some location other than a subdirectory of the LKB directory (perhaps under your "My Documents" directory), you'll want to edit the shortcut to start LKB in that directory. Also, clicking on the 'x' in the upper right-hand corner of the running LKB "Top" window shuts down LKB, but leaves a Lisp window running. (Using the 'Quit' menu item from LKB does shut down the Lisp process as well.) Another oddity: while I was able to change the font sizes for most aspects of the display, I was unable to change the minuscule font size used for displaying parses. However, clicking on the parse tree brings up a menu, and one of the menu choices then displays a larger version of the parse tree, in which one can browse individual nodes to view the feature structures which those nodes abbreviate.
I also successfully downloaded and ran the Solaris version (which uses a similarly minuscule and apparently unchangeable font for displaying parses).
Finally, I downloaded several of the Linux versions, but I was unable to make lkb run under my RedHat v.9 Linux installation. It was unclear which version of lkb I was supposed to use for this latest version of Linux, but the 'unstable' version dated 2003-09-03 seemed to come the closest to running. (For those interested in the gory details, lkb complained about a library error. The LKB website FAQ suggested that this was a Motif problem, and that it would only run under an earlier version (2.1) of Motif than what I had on my Linux machine (2.2--the same version downloadable by default from the Motif website). I suspect that I could have obtained this older version from the Motif web site, and replaced the installed version of Motif--but this was getting beyond my meager Linux skills, and I gave up.)
Once running, LKB includes a windowing interface to the grammar loader, parser and generator, various display tools including a type hierarchy browser, and some debugging tools. All these tools are basically read-only, that is, browsers rather than editors; editing is done with your favorite programming editor, which you may think of as a blessing (if you are a veteran emacs user, since emacs has special hooks into the LKB) or a curse (otherwise).
The book consists of two parts, a tutorial and a user manual. The tutorial section begins with a very simple grammar (using atomic categories, and no features), and progressively adds refinements, working up to a grammar that handles long-distance dependencies. Interspersed with the instructions on how to write and debug grammars are discussions of the theory which the grammars presuppose. Exercises, both thought problems and hands-on (programming) exercises, help ensure the reader's understanding. (Simple answers to the exercises are supplied in the book, while longer answers are contained in the data files downloadable from the LKB website.)
The user manual section of the book is essentially a reference manual, although there are some useful explanations as well. It is the nearest thing to a 'help' file for the various commands and displays (and parts of it could usefully be turned into an on-line help file).
While the book is generally well written, I found it hard going at times. I also developed an increased appreciation for the joy of hyperlinked text--since the text was merely paper, I found myself penciling in, next to in-text references to figures, the page numbers on which those figures appeared. (I was not helped by the figure numbering systems, of which there are two: one numbered on a per-chapter basis, and one running sequentially through the book.)
There are few typos, and most do not interfere with the reading. A couple of the more important ones are the substitution of "latter" for "former" (pg. 76, just above figure 3.56); and two distinct versions of the same rule (first rule on pg. 128 and rule at the bottom of pg. 124; the latter is correct).
In addition to the text and the website, there is an LKB mailing list (archived since June at Linguist List). It does not appear to have received much traffic, but would be a useful resource particularly to readers working through the LKB system on their own.
The LKB program has (of course) certain limitations, perhaps the most important of which is that the order of argument phrases in sentences must match their order in the argument lists of lexical items (apart from wh-movement; dative movement and other such "transformations" are of course treated lexically). The implication is that it will be difficult to treat 'free word order' languages. (I hasten to add that this is a fundamental problem which virtually all parsing programs face.)
Another limitation lies in the simplistic implementation of morphology in LKB. Actually, although some might see this as a limitation, I look at this as the right way to build software. For whatever you may think of the theoretical relationship between morphology and syntax, when it comes to computational implementations, they are quite different. Therefore computational implementations should treat syntax and morphology as separate modules. And indeed, it should not be difficult to attach to LKB a more sophisticated morphological parser/generator (such as the Xerox finite state morphological transducer, recently published through CSLI/University of Chicago Press; see Linguist List 14.2028). The same might hold for efficient lexical access; while lexical access is implemented internally to LKB, it may be more efficient with large lexicons to hook up a specialized lexical lookup engine. (The LKB also supplies 'hooks' to allow for an alternative semantic "back end".)
The LKB is also somewhat limited in its treatment of multi-word lexical items, which can consist of a list of words in fixed order, with one of those words being specified as a possible (inflectional) affixation site. As readers of the Linguist List may well appreciate, there is an entire spectrum of constructions which lie somewhere near the border between syntax and morphology, including compounds (which may or may not be written solid, as in English 'doghouse' and 'dog house'), proper names and place names, and ranging through idioms, fixed expressions, and perhaps even verb-particle constructions; not all of these fit such a limited approach. Neither linguistic theory nor computational linguistics has caught up with the variety of such constructions (see e.g. http://www.cl.cam.ac.uk/users/alk23/mwe/mwe.html for a recent conference on this topic), so it is understandable that the LKB would not handle them all well.
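The shape of such a multi-word entry (a fixed sequence of words with a single affixation site) can be sketched in a few lines of Python. This is only my illustration of the data structure as described above: the class name, the field names and the toy inflection table are hypothetical, and none of it reflects how LKB actually encodes its lexicon.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MultiWordEntry:
    """A fixed-order word sequence with one position that may be inflected."""
    words: tuple      # the component words, in their fixed order
    affix_site: int   # index of the single word that may carry inflection

    def matches(self, tokens, inflected_forms):
        """True if tokens spell out this entry, allowing an inflected
        form only at the affixation site."""
        if len(tokens) != len(self.words):
            return False
        for i, (token, word) in enumerate(zip(tokens, self.words)):
            if token == word:
                continue
            if i == self.affix_site and token in inflected_forms.get(word, ()):
                continue
            return False
        return True


# Toy inflection table (hypothetical; a real system would use a morphology module).
INFLECTED = {"kick": {"kicks", "kicked", "kicking"}}

entry = MultiWordEntry(words=("kick", "the", "bucket"), affix_site=0)

print(entry.matches(["kicked", "the", "bucket"], INFLECTED))
print(entry.matches(["kicked", "the", "proverbial", "bucket"], INFLECTED))
```

The second call fails because the sequence is no longer contiguous, which is the kind of case (an idiom with inserted material, or a separated verb and particle) that a fixed-order list cannot capture.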
I did not try to use LKB with anything except vanilla ASCII characters. The LKB web site has a place-holder for this issue (as well as for on-going work on multi-word constructions). Since there does not appear to be any obvious way to change the actual font used to display parses (short of editing the Lisp code), it is presumably not possible at present to use alternative fonts inside LKB itself. However, there is a downloadable Japanese grammar (reachable from the LKB website) that is said to run under the LKB; the web documentation recommends running it from inside emacs for purposes of Japanese text input. Since I am proficient neither in emacs nor in Japanese, I did not try this.
Finally, LKB is not intended to serve as a production environment. Rather, it is intended as a tool for learning about parsing and generating with HPSG-like grammars.
My principal criticism of the LKB is not so much a criticism as a concern with what the intended audience is. While my sympathies are entirely with hand-crafted grammars, this is not a prominent methodology in computational linguistics in recent years, where the emphasis has instead been on statistical approaches. Nor, outside of HPSG, will many linguists find the twin barriers of theory and computational tools easy to cross. So the main audience seems to be those linguists who already know HPSG (or a related theory), and who are already comfortable enough with computers and programming to build and debug grammars. And this seems--unfortunately--a rather small audience.
There are several things that could make this grammar development system more accessible to linguists (or to linguistics students). One such aid would be to hide more of the computational details, such as the distinction (discussed in section 4.3 and elsewhere) between (ordinary) lists and 'difference lists'. This distinction is purely computational, not linguistic, and should be invisible to users.
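For readers who have not met the distinction, the following Python sketch shows why it is computational rather than linguistic. An ordinary linked list must be walked to its end before anything can be appended, whereas a difference list also keeps a handle on its open tail, so appending is a single step. This is only an analogy under my own assumptions: the class names and the `list`/`last` attribute names are mine, and in LKB the effect is achieved by unifying feature structures, not by assigning to Python objects.

```python
class Node:
    """A list position. An unfilled node is an open tail (a 'hole')."""

    def __init__(self):
        self.filled = False
        self.first = None
        self.rest = None

    def fill(self, value):
        """Store a value here and return the fresh open tail that follows."""
        assert not self.filled
        self.filled = True
        self.first = value
        self.rest = Node()
        return self.rest


class DiffList:
    """A list that remembers both its start (`list`) and its open end (`last`)."""

    def __init__(self, items=()):
        self.list = Node()
        self.last = self.list
        for item in items:
            self.last = self.last.fill(item)

    def to_python(self):
        """Collect the items between `list` and `last` into a Python list."""
        out = []
        node = self.list
        while node is not self.last:
            out.append(node.first)
            node = node.rest
        return out


def append(a, b):
    """Append b to a without walking a: plug b's first cell into a's open tail."""
    result = DiffList()
    if b.list is b.last:  # b is empty, so the result is just a
        result.list, result.last = a.list, a.last
        return result
    hole = a.last
    hole.filled, hole.first, hole.rest = True, b.list.first, b.list.rest
    result.list, result.last = a.list, b.last
    return result


print(append(DiffList(["the", "dog"]), DiffList(["barks"])).to_python())
```

Nothing in this bookkeeping says anything about language, which is why I would prefer that a grammar writer never had to see it.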
It would also be helpful to provide as part of the development environment structured editors, which would make typographic errors (spelling, unbalanced brackets) less likely. In my experience, tracking down such errors (not to mention learning the special notation in the first place) can chew up a lot of time.
Another change which would make the system much easier to use concerns the scripting commands. As it stands, in addition to learning the grammar notation, the advanced user must learn a Lisp-like notation in order to tell the system which files to load, and for defining certain system parameters. While the learning curve for this portion of the application is not long (scripting commands tend not to be complex, and often what one needs for a new grammar can simply be adapted from an existing grammar), the learning curve could have been eliminated entirely by providing access to the necessary commands and variable settings through the gui (and then attaching to these gui widgets, pointers to the appropriate sections of the user manual section of the book, converted into on-line help documents).
But perhaps the most valuable step toward making LKB more useful and interesting to linguists is something its users could best do: build up a set of sample grammars. As a start towards this, the LKB website makes available for download two good-sized grammars, one of English and one of Japanese. I googled several more LKB grammars (including a categorial grammar implementation), but it would be a service to the NLP community and particularly to users of LKB to gather the URLs for such systems in one location (and make them accessible from OLAC as well).
In summary, I can recommend this book and its software for those who wish to try their hand at parsing using an HPSG-like approach. I suspect it would work well in a classroom setting, although I think students would need some prior experience in program editing and debugging. (As I mentioned, a structured editor would do much to ease the frustration for naïve users that inevitably comes with using a 'dumb' editor for programming.) There are of course other parsing systems (Copestake provides a long list of freely available systems at the end of chapter five), but it is beyond the scope of this review to compare them with LKB. Suffice it to say that if you want to experiment with parsing using HPSG or similar formalisms, then LKB is probably the system to use.
ABOUT THE REVIEWER:
Mike Maxwell works for the Linguistic Data Consortium at the University of Pennsylvania. He holds a Ph.D. in linguistics from the University of Washington. In between these two places, he developed a broad coverage syntactic grammar of English, consulted on minority languages in Colombia, and built a morphological parser/generator.