EDITORS: Srinivas Bangalore and Aravind K. Joshi TITLE: Supertagging SUBTITLE: Using Complex Lexical Descriptions in Natural Language Processing PUBLISHER: MIT Press YEAR: 2010
William Corvey, Department of Linguistics, University of Colorado at Boulder
INTRODUCTION As the editors note in the Introduction, an encoding of linguistic information can be accomplished using either simple or complex primitives. Simple primitives have the advantage of being straightforward to annotate, but require more complex operations to compose larger structures; more complicated primitives require more advanced local annotation of linguistic features but require only general operations for composition. The approach taken in this text is the latter, to “complicate locally, simplify globally (CLSG)” (2).
Supertagging is an approach of “almost parsing” (Joshi and Srinivas, 1994; Srinivas, 1996; Srinivas and Joshi, 1998), whereby statistical techniques from part-of-speech disambiguation are applied to the elementary trees associated with lexical items in Lexicalized Tree Adjoining Grammars (LTAGs) (e.g. Joshi, 1985); this preprocessing allows for more efficient complete parsing. LTAGs differ from context-free grammars (CFGs) in several important ways. First, LTAGs and CFGs contain distinct primitive elements. Whereas the domain of locality of a CFG is expressed as a rule, an LTAG instead encodes elementary trees, which associate a verb with its arguments. This is done in order to localize all dependencies within a single domain; the editors show that constraint specification is often spread over several domains of locality in CFGs, which is counter to the CLSG approach. Second, LTAGs and CFGs accomplish phrase composition differently. CFGs derive phrases via the application of rules; to form larger LTAG trees, elementary trees are composed by operations, particularly adjoining (9). In LTAG, the resulting parse trees are called “derived trees,” as they are the result of attaching several elementary trees together. LTAG “derivation trees” provide a record of the combination operations performed to produce a derived tree (35). Therefore, while CFGs are defined by a set of primitives and context-free rules, an LTAG specification instead contains elementary trees and derivation trees.
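The distinction between elementary, derived, and derivation trees can be made concrete with a minimal Python sketch. Everything in it is an illustrative assumption rather than material from the book: the `Node` class, the three toy elementary trees, and the flat list standing in for a derivation tree (a real derivation tree also records tree addresses). Substitution, the second standard LTAG operation alongside adjoining, is included so that the verb's argument slot can be filled.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional
import copy

@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None   # lexical anchor carried by a leaf
    subst: bool = False          # substitution site, usually written NP↓
    foot: bool = False           # foot node of an auxiliary tree, written VP*

def find(node: Node, pred: Callable[[Node], bool]) -> Optional[Node]:
    """Return the first node in preorder that satisfies pred, or None."""
    if pred(node):
        return node
    for child in node.children:
        hit = find(child, pred)
        if hit is not None:
            return hit
    return None

def substitute(tree: Node, initial: Node) -> None:
    """Fill the first substitution site whose label matches initial's root."""
    site = find(tree, lambda n: n.subst and n.label == initial.label)
    if site is None:
        raise ValueError("no matching substitution site")
    new = copy.deepcopy(initial)
    site.children, site.word, site.subst = new.children, new.word, False

def adjoin(tree: Node, aux: Node) -> None:
    """Splice aux into tree at the first internal node labeled like aux's root."""
    target = find(tree, lambda n: n.label == aux.label and n.children
                  and not n.subst and not n.foot)
    if target is None:
        raise ValueError("no adjunction site")
    new = copy.deepcopy(aux)
    foot = find(new, lambda n: n.foot)
    if foot is None:
        raise ValueError("auxiliary tree has no foot node")
    # The foot node inherits the material that used to hang under the target.
    foot.children, foot.word, foot.foot = target.children, target.word, False
    target.children, target.word = new.children, new.word

def yield_of(node: Node) -> List[str]:
    """Read the words off the leaves, left to right."""
    if node.word is not None:
        return [node.word]
    return [w for child in node.children for w in yield_of(child)]

# Toy elementary trees (hypothetical): the verb's tree contains its argument slot.
sleeps = Node("S", [Node("NP", subst=True), Node("VP", [Node("V", word="sleeps")])])
john = Node("NP", word="John")
often = Node("VP", [Node("Adv", word="often"), Node("VP", foot=True)])

derived = copy.deepcopy(sleeps)   # the derived tree is built up in place
derivation = []                   # simplified stand-in for a derivation tree
substitute(derived, john)
derivation.append(("substitute", "John", "NP"))
adjoin(derived, often)
derivation.append(("adjoin", "often", "VP"))

print(" ".join(yield_of(derived)))   # John often sleeps
print(derivation)
```

In this sketch, a supertagger's job would be to pick the right elementary tree for each word before any of the combining operations run.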
The book contains nineteen chapters, including the Introduction, divided into five sections. I will first summarize the five parts of the book and then turn to a discussion of the text.
SUMMARY Part One describes supertag creation and the arrangement of supertags in an inventory, focusing on the development of tree adjoining grammars (TAGs). Because TAGs are difficult to construct manually, automatic extraction of grammars from large-scale lexical resources is an appealing alternative where feasible. Part One describes two systems for the extraction of TAGs from pre-existing resources.
Chapter 2 (“From Treebanks to Tree-Adjoining Grammars,” Fei Xia and Martha Palmer) describes LexTract, a system that produces both CFGs and LTAGs from treebanks. Xia and Palmer first describe the three user-supplied input tables required by the system; these tables provide the information to construct elementary trees, which mark heads and distinguish between arguments and adjuncts. The chapter then proceeds with a description of a three-step extraction algorithm, which (1) converts a Treebank tree into an LTAG derived tree; (2) decomposes this derived tree into elementary trees; and (3) creates derivation trees showing how elementary trees are combined to produce the derived tree. The resulting set of derivation trees is then used to train statistical LTAG parsers. The authors then compare the system to other extraction algorithms for CFG and LTAG and describe a variety of applications using output from LexTract.
In Chapter 3 (“Developing Tree-Adjoining Grammars with Lexical Descriptions,” Fei Xia, Martha Palmer, and K. Vijay-Shanker), the authors present LexOrg, a system that produces LTAG grammars from abstract specifications (73). This system is similar to LexTract in that it employs user-supplied information in order to implement parsing. However, in the case of LexOrg, this information comes in the form of abstract specifications encoding specific linguistic information. The chapter describes a method of eliciting this linguistic information from the user whereby users are required to enter feature equations, rather than tree templates. This process ensures consistency across all templates in the XTAG grammar (e.g. XTAG-Group, 1998) while requiring less effort on the part of the user. These abstract specifications inform key modules within LexOrg. The chapter also includes experimental results and comparison to related systems.
Part Two describes implementations of supertag parsers and the use of supertagging in other parsing applications.
Chapter 4 (“Complexity of Parsing for Some Lexicalized Formalisms,” Giorgio Satta) describes an innovative parsing algorithm for processing LTAGs more efficiently. Satta presents LTAG parsing in the broader context of parsing for lexicalized formalisms. The work extends an algorithm developed for lexicalized context-free grammars to LTAGs, resulting in a significant increase in the efficiency of an LTAG parsing algorithm.
In Chapter 5 (“Combining Supertagging and Lexicalized Tree-Adjoining Grammar Parsing,” Anoop Sarkar), the author explores two factors that impact TAG parser efficiency: syntactic lexical ambiguity and sentence complexity. Supertagging is shown to reduce lexical ambiguity and thus improve parser efficiency. The chapter also details a co-training design using a supertagging parser and a statistically-trained LTAG parser.
Chapter 6 (“Discriminative Learning of Supertagging,” Libin Shen) gives an overview of a system for supertagging based on discriminative learning, which overcomes the problem of noise generated by a trigram supertagger. Using NP chunking as an evaluation task, Shen shows that supertagging, correctly implemented, can yield an improvement in NP chunking accuracy (where experiments using a trigram supertagger showed lower performance). Shen demonstrates techniques for overcoming problems of sparse data, taking advantage of the rich feature sets provided by supertags, and forcing the learning algorithm to zero in on the most difficult classification cases.
Chapter 7 (“A Nonstatistical Parsing-Based Approach to Supertagging,” Pierre Boullier) proposes a disambiguation model using structural constraints as opposed to statistical modeling. The type of model Boullier outlines is automatically deduced from an LTAG and does not require additional training data. In contrast to statistical systems choosing one or n-best tags during parsing, the system outlined here ensures that all supertags needed to parse a sentence will be available for processing; this approach therefore gives 100% recall. The author provides comparison among several supertaggers constructed using this paradigm and provides evaluation.
Chapter 8 (“Nonlexical Chart Parsing for TAG,” Alexis Nasr and Owen Rambow) describes a Generative Dependency Grammar (GDG) parser for supertags. GDG is a type of nonlexicalized chart parser that produces dependency parse output from supertagged input. Nasr and Rambow provide evaluation on the Penn Treebank data and describe parser efficiency. The chapter also details distinctions between this approach, previous work in the area of dependency parsing, and the Lightweight Dependency Analyzer (LDA; Bangalore, 2000).
Part Three gives an overview of supertags and supertag utilization in alternative formalisms.
Chapter 9 (“Supertagging for Efficient Wide-Coverage CCG Parsing,” Stephen Clark and James R. Curran) describes a supertagging implementation in Combinatory Categorial Grammar (CCG; Steedman, 2000). The authors present a system incorporating supertagging into a CCG parser. The chapter also illustrates a model for CCG supertag disambiguation that yields multiple supertags per word; this allows the system to find more supertags for a given span if the parser fails to cover the entire span with the supertags currently under consideration. The authors conclude by presenting parser evaluation and discussing the role of supertagging in CCG parsing.
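The interaction between a multi-tagger and a parser described above can be sketched as follows. This is a hedged illustration, not the chapter's implementation: the relative-probability cutoff `beta`, its schedule of values, the toy tag distributions, and the `toy_parse` stand-in for a real CCG parser are all assumptions introduced here.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

TagDist = Dict[str, float]   # supertag -> probability for a single word

def multitag(dists: Sequence[TagDist], beta: float) -> List[List[str]]:
    """Keep, for each word, every supertag within a factor beta of the best one."""
    kept = []
    for dist in dists:
        best = max(dist.values())
        kept.append(sorted(t for t, p in dist.items() if p >= beta * best))
    return kept

def parse_with_backoff(
    dists: Sequence[TagDist],
    parse: Callable[[List[List[str]]], Optional[str]],
    betas: Sequence[float] = (0.1, 0.01, 0.001),   # assumed schedule
) -> Optional[Tuple[float, str]]:
    """Start with few supertags per word; widen the choice only if parsing fails."""
    for beta in betas:
        result = parse(multitag(dists, beta))
        if result is not None:
            return beta, result
    return None

def toy_parse(tags: List[List[str]]) -> Optional[str]:
    # Hypothetical parser stub: succeeds only if word 2 may be a transitive verb.
    return "parsed" if "(S\\NP)/NP" in tags[1] else None

# Hypothetical supertagger output for a three-word sentence.
dists = [
    {"NP": 0.95, "N": 0.05},
    {"N": 0.90, "(S\\NP)/NP": 0.05, "S\\NP": 0.05},
    {"NP": 1.0},
]

print(multitag(dists, 0.1)[1])               # ['N']
print(parse_with_backoff(dists, toy_parse))  # (0.01, 'parsed')
```

The design point is that the parser usually sees a small number of supertags per word, and the larger, more ambiguous sets are consulted only for the sentences that need them.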
Chapter 10 (“Constraint Dependency Grammars: SuperARVs, Language Modeling, and Parsing,” Mary P. Harper and Wen Wang) describes supertagging with constraint dependency grammars (Maruyama, 1990). Parsing is viewed as a constraint satisfaction problem. Harper and Wang introduce super abstract role values (SuperARVs) to encode morphosyntactic information (which is analogous to the supertag encoding of syntactic information). SuperARVs are gleaned from the Penn Treebank, and the authors test two parsing methods. The first method performs SuperARV disambiguation prior to dependency parsing and the second method performs disambiguation and linking in conjunction with one another. The parsers are evaluated using the Wall Street Journal and a speech recognition task.
In Chapter 11 (“Guiding a Constraint Dependency Parser with Supertags,” Kilian Foth, Tomas By, and Wolfgang Menzel), the authors describe a constraint dependency parser incorporating supertags. Supertags are formed from dependency trees taken from the German NEGRA and TIGER corpora. These supertags are designed to be especially information-rich, which increases the size of the supertag vocabulary. The authors note that this increase in vocabulary size does not cause a proportional increase in supertagger error, and this motivates an exploration of methods of feature encoding to maximize the performance of a rule-based weighted constraint dependency parser using rich supertags.
Chapter 12 (“Extraction of Type-Logical Supertags from the Spoken Dutch Corpus,” Richard Moot) describes supertags in type-logical grammars (e.g. Lambek, 1958). Moot first provides an introduction to type-logical grammars, which parse via theorem proving. The chapter then proceeds by detailing extraction of a type-logical treebank from the Spoken Dutch Corpus; the resulting lexicon forms the basis of a supertagging vocabulary. The system is trained on a filtered version of this dataset and evaluation of a supertagging system is presented.
Chapter 13 (“Extracting Supertags from HPSG-Based Treebanks,” Günter Neumann and Berthold Crysmann) describes supertagging in the context of Head-Driven Phrase Structure Grammar (HPSG; Pollard and Sag, 1994). The authors extract a Lexicalized Tree Insertion Grammar (LTIG) from a German Treebank included in Verbmobil. Trees from this grammar form supertags in an LTIG parser. The authors present parser evaluation on both the Verbmobil and NEGRA corpora.
Chapter 14 (“Probabilistic Context-Free Grammars with Latent Annotations,” Takuya Matsuzaki, Yusuke Miyao, and Jun’ichi Tsujii) gives a method for extending localization in context-free grammars. The authors augment the nonterminals in a CFG with latent annotations, each taking a value from a fixed set. These latent variables allow some dependence in the CFG. The chapter presents evaluation results and a discussion of variations in the values associated with each latent variable.
Chapter 15 (“Computational Paninian Grammar Framework,” Akshar Bharati and Rajeev Sangal) describes a supertag implementation in Paninian Grammar. First developed for Sanskrit, computational implementations of Paninian Grammar have found utility for describing a variety of Indian languages (e.g. Bharati et al., 1995; Narayana, 1994), as well as other languages (Bharati et al., 1996, for English; Pedersen et al., 2004, for Arabic). The authors discuss a parser implementation using Computational Paninian Grammar and compare this implementation and its performance with those of LTAG.
Part Four is dedicated to exploring linguistic and psycholinguistic issues related to supertagging.
Chapter 16 (“Lexicalized Syntax and Phonological Merge,” Robert Frank) argues that elementary trees are sufficient for encoding all the syntactic features of a lexical item. The chapter tests this Syntactic Lexicalization Hypothesis (373) in the phonological domain. The author illustrates the effectiveness of elementary trees in describing the total syntactic representations for a variety of linguistic phenomena.
Chapter 17 (“Constraining the Form of Supertags with the Strong Connectivity Hypothesis,” Alessandro Mazzei, Vincenzo Lombardo, and Patrick Sturt) provides a model of incremental sentence processing using supertags. The authors introduce a Dynamic Version of TAG (DVTAG) equipped to handle predicted heads. The chapter illustrates an extraction of a DVTAG from the Penn Treebank, and the authors discuss the resultant number of predicted tags and the plausibility of an extracted model as a proxy for human language processing.
Part Five describes several applications of supertagging.
Chapter 18 (“Semantic Labeling and Parsing via Tree-Adjoining Grammars,” John Chen) details semantic role labeling (SRL) systems utilizing deep linguistic features and compares performance to a model using surface features only. The target labels of the SRL system are PropBank roles, annotated over the Penn Treebank. Chen extracts LTAGs from the treebank to build supertaggers and an LDA, which uses PropBank information expressed as part of the syntactic constituent label. The chapter presents evaluation indicating that a supertagging approach can improve SRL performance.
Chapter 19 (“Applications of HMM-Based Supertagging,” Karin Harbusch, Jens Bäcker, and Saša Hasan) describes two applications of Hidden Markov Model (HMM)-based supertagging: a dialog system using supertags and a system for disambiguation on small device keyboards. Both applications use a combination of a supertagger with an LDA. The HMM approach to supertagging is motivated by the efficiency of decoding algorithms and by an improvement in performance for both German and English data. The authors present implementation details of the system components and use the applications for evaluation.
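The efficiency argument for HMM-based supertagging rests on dynamic-programming decoders. The sketch below shows bigram Viterbi decoding over a toy supertag set. It is an assumption-laden illustration: the model order, the tag names `NP`, `V_trans`, and `V_intrans` (standing in for elementary trees), and every probability are invented here and are not taken from the chapter.

```python
import math
from typing import Dict, List, Sequence

Probs = Dict[str, Dict[str, float]]

def log(p: float) -> float:
    """Log-probability, mapping zero to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")

def viterbi(words: Sequence[str], tags: Sequence[str],
            start: Dict[str, float], trans: Probs, emit: Probs) -> List[str]:
    """Return the most probable supertag sequence under a bigram HMM."""
    # best[i][t] = (score of the best path ending in tag t at word i, backpointer)
    best = [{t: (log(start.get(t, 0)) + log(emit[t].get(words[0], 0)), None)
             for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p][0] + log(trans[p].get(t, 0))) for p in tags),
                key=lambda pair: pair[1],
            )
            column[t] = (score + log(emit[t].get(words[i], 0)), prev)
        best.append(column)
    # Follow backpointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]

# Hypothetical model: "eats" is equally likely under either verbal supertag.
tags = ["NP", "V_trans", "V_intrans"]
start = {"NP": 0.8, "V_trans": 0.1, "V_intrans": 0.1}
trans = {
    "NP": {"NP": 0.1, "V_trans": 0.4, "V_intrans": 0.5},
    "V_trans": {"NP": 0.9, "V_trans": 0.05, "V_intrans": 0.05},
    "V_intrans": {"NP": 0.2, "V_trans": 0.4, "V_intrans": 0.4},
}
emit = {
    "NP": {"John": 0.5, "apples": 0.5},
    "V_trans": {"eats": 0.5, "sees": 0.5},
    "V_intrans": {"eats": 0.5, "sleeps": 0.5},
}

print(viterbi(["John", "eats"], tags, start, trans, emit))
# ['NP', 'V_intrans']
print(viterbi(["John", "eats", "apples"], tags, start, trans, emit))
# ['NP', 'V_trans', 'NP']
```

The second call shows the point of sequence decoding: a following noun phrase changes the supertag chosen for the verb, at a cost that grows only linearly with sentence length.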
EVALUATION The text reviewed here provides a detailed overview of the theory, implementation, and applications of supertagging in a variety of domains within computational linguistics and related disciplines. Each of the five sections provides knowledge critical to the reader’s ability to understand and use supertags. Chapters in Part One describe methods for building LTAG grammars, which are precursors to many supertagging systems, from treebanked data or user-entered specifications. Papers in Part Two provide an outline of how to implement supertag parsers in a variety of formats. Part Three illustrates supertag implementations in a variety of linguistic formalisms to suit the needs of many systems. Finally, Parts Four and Five provide empirical and application-based justification for the supertagging approach. The text remains coherent, despite covering a wide range of topics.
While the text provides a detailed introduction to supertagging and Tree Adjoining Grammars, the editors presuppose some reader knowledge of many linguistic subfields and formalisms. While each paper includes an introduction to the formalism or methods included, the book might not be readily accessible to an audience outside of the computational linguistics community or to those unfamiliar with the intricacies of parsing tasks. However, most chapters provide references to introductory materials for the motivated reader.
In general, the text provides an empirical justification of the efficacy of supertags for a variety of tasks and provides implementation examples that inspire future work in this area. The book would be a valuable resource for linguists interested in computational grammars and parsing, and for machine learning researchers interested in linguistic formalisms and the design of complex syntactic features.
REFERENCES Bangalore, S. (2000). A lightweight dependency analyzer for partial parsing. Journal of Natural Language Engineering: 6(2):113-138.
Bharati, A., Bhatia, M., Chaitanya, V., and Sangal, R. (1996). Paninian Grammar Framework Applied to English. Technical Report TRCS-96-238, CSE, IIT Kanpur.
Bharati, A., Chaitanya, V., and Sangal, R. (1995). Natural Language Processing: A Paninian Perspective. New Delhi: Prentice Hall of India.
Joshi, A. K. (1985). Tree Adjoining Grammars: How Much Context-Sensitivity Is Required to Provide Reasonable Structural Descriptions? Natural Language Parsing: Psychological, Computational and Theoretical Perspectives, pp. 206-250.
Joshi, A.K. and Srinivas, B. (1994). Disambiguation of super parts of speech (supertags): Almost parsing. In Proceedings of the 1994 International Conference on Computational Linguistics (COLING), Kyoto, Japan. pp. 154-160.
Lambek, J. (1958). The mathematics of sentence structure. American Mathematical Monthly, 65:154-170.
Maruyama, H. (1990). Constraint Dependency Grammar. Technical Report #RT0044, IBM, Tokyo, Japan.
Narayana, V.N. (1994). Anusarak: A Device to Overcome the Language Barrier. PhD thesis, Department of CSE, IIT Kanpur, January, 1994.
Pedersen, M. J., Eades, D., Amin, S.K. and Prakash, L. (2004). Parsing Arabic relative clauses: A Paninian dependency grammar approach. In: S. Shah and S. Hussain, Proceedings of the Eighth International Multitopic Conference (INMIC 2004), Lahore, Pakistan, 24-26 December, 2004, pp. 573-578.
Pollard, C. and Sag, I. (1994). Head-Driven Phrase Structure Grammar. Chicago: University of Chicago Press.
Srinivas, B. (1996). “Almost Parsing” Technique for Language Modeling. In Proceedings of ICSLP96 Conference, Philadelphia, PA, pp. 1173-1176.
Srinivas, B. and Joshi, A.K. (1998). Supertagging: An approach to almost parsing. Computational Linguistics: 22:1-29.
Steedman, M. J. (2000). The Syntactic Process. Cambridge, MA: The MIT Press.
XTAG-Group. (1998). A Lexicalized Tree Adjoining Grammar for English. Technical Report IRCS 98-18, University of Pennsylvania.
ABOUT THE REVIEWER
William Corvey is a PhD student in the Department of Linguistics and the
Institute of Cognitive Science at the University of Colorado at Boulder.
His main research interests are in discourse processing, computational
applications of Conversation Analysis, and the construction and use of
large-scale lexical resources, particularly VerbNet.