Marcu, Daniel (2000). The Theory and Practice of Discourse Parsing and Summarization. Cambridge, MA: MIT Press. xix, 248 pp.
David Parkinson, Natural Language Group, Microsoft Corporation.
Marcu's book (henceforth TPDPS) presents the theoretical background to and practical results of his investigation of the automatic derivation of discourse structure in natural language texts. Working primarily within the theoretical framework provided by Rhetorical Structure Theory (RST; Mann & Thompson 1988), and using the task of text summarization as his primary empirical test-bed, M develops a formal theory of text structure and proposes two distinct implementations. This work will be of interest and use both to linguists concerned with text structure and discourse organization and to computational linguists or computer scientists familiar with other problem domains in natural language processing.
The structure of TPDPS is as follows:
Section I concerns the theoretical background necessary for the presentation of M's experiments in parsing and summarization.
Chapter 1 introduces the goals and outline of the book; M provides an RST-style roadmap of the rhetorical structure of the book.
Chapter 2 presents some general background common to many theories of discourse structure, and gives a very brief overview of RST as M's preferred theory of discourse structure. M argues that RST is insufficient as a computational theory, since it lacks strict well-formedness criteria for its data structures and because it lacks a provably complete algorithm for deriving all possible rhetorical structure trees for a given discourse (or discourse fragment). To begin to alleviate these shortcomings, M proposes two flavors of well-formedness in the form of constraints on the relation between spans and their sub-spans. Once the data structures (discourse representations in the form of RST trees) are formally specified, the algorithms follow, as is standard practice.
Chapter 3 develops the formal and combinatorial properties of the data structures that M will implement, proceeding from the axiomatization of valid text structures (a mathematical description of well-formedness), and on to the proof-theoretic account of the derivation of valid text structures. This material is potentially slow going for the less computationally inclined reader, but M does a very good job of making it clear and relevant to the problems discussed previously.
Chapter 4 takes the results of the previous chapter and briefly discusses two approaches to implementing these results in a computational system aimed at producing complete sets of well-formed RST data structures for a given input.
Chapter 5 summarizes the results of the section, and raises some very interesting questions about the sets of assumptions underlying the material presented in Chapters 3 and 4. With respect to the general issues surrounding the definition of well-formed rhetorical structures, M signals open issues about other types of information that might well be taken into account in competing implementations. And with respect to the problem of implementing an efficient algorithm for constructing the complete set (or some optimally useful subset) of rhetorical structures, M discusses other search methods that might be employed while parsing discourse structure, in order to constrain the vast numbers of structures that might otherwise be produced in an unconstrained search of the parse space. Although a deep investigation of the computationally optimal and psycholinguistically most plausible algorithms for discourse parsing lies outside the scope of M's stated intentions, this is an interesting and provocative chapter, in spite of its brevity.
Section II presents the two approaches taken by M to parsing discourse structure, with discussion devoted to contrasting these two approaches.
Chapter 6 presents a cue-phrase-based rhetorical parser, which uses the appearance of relevant discourse-functional markers in text to indicate both the boundaries of discourse units and the rhetorical relations that hold between them. Because this technique depends on the manual extraction of relevant markers and the way that they are used to delimit spans of text related by signaled RST relations, M first presents the results of a corpus evaluation he performed to determine the characteristics of more than 450 discourse markers. These were analyzed in 2100 text fragments from the Brown corpus, and collapsed into 54 RST or RST-like relations. M then presents the precision and recall results of the algorithms for identifying discourse markers and clause-like units, and moves on to the main results concerning the overall accuracy of this method of segmenting discourse and hypothesizing the discourse relations holding between its component parts.
Chapter 7 presents a contrasting approach to parsing discourse structure, using machine-learning approaches to deduce rhetorical relations from a training corpus of hand-tagged data. Again, the problem of hypothesizing complete parses is broken into a segmentation problem and a labeling problem. In the segmentation phase, the learning algorithm is sensitized to features such as POS tags within a window 5 tokens wide and punctuation marks. In the labeling phase, where the aim is to produce well-formed RST trees whose nodes correctly represent the span, hierarchy, and discourse relation of each subtree, the learning algorithm is sensitized to a variety of features, including some that are more semantically sophisticated (e.g., WordNet-based similarity of hypothesized spans).
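To make the segmentation features concrete, here is a minimal sketch of how such a feature vector might be assembled. It is not M's implementation: the function name window_features, the centered window, the "PAD" filler, and the punctuation set are all assumptions introduced for illustration; the only details taken from the description above are the 5-token window of POS tags and the use of punctuation marks.

```python
from typing import Dict, List, Tuple

# Hypothetical punctuation inventory; the book's actual set may differ.
PUNCTUATION = {",", ";", ":", ".", "?", "!", "(", ")"}


def window_features(
    tagged: List[Tuple[str, str]], i: int, width: int = 5
) -> Dict[str, object]:
    """Build features for the token at position i.

    `tagged` is a list of (token, POS tag) pairs. The window is assumed
    to be centered on i, so width=5 covers offsets -2 .. +2.
    """
    half = width // 2
    feats: Dict[str, object] = {}
    for offset in range(-half, half + 1):
        j = i + offset
        # Positions outside the sentence get a padding tag.
        tag = tagged[j][1] if 0 <= j < len(tagged) else "PAD"
        feats[f"pos[{offset:+d}]"] = tag
    # Flag whether the focus token itself is a punctuation mark.
    feats["is_punct"] = tagged[i][0] in PUNCTUATION
    return feats


sentence = [
    ("Although", "IN"), ("it", "PRP"), ("rained", "VBD"),
    (",", ","), ("we", "PRP"), ("left", "VBD"),
]
comma_features = window_features(sentence, 3)
```

A learner would then be trained on such vectors, labeled by whether a unit boundary occurs at the focus token in the hand-tagged corpus.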
Chapter 8 provides an overview of previous empirical research on discourse parsing, and again concludes with a brief but useful discussion of some open issues, especially additional information that could be used to inform and improve discourse parsing in future.
Section III is dedicated to the application of the computational approaches developed in Section II to a real-world problem: the summarization of text by extraction of the most relevant units.
Chapter 9 presents the results of an experiment in which M contrasts the results obtained for extract-directed summarization by (i) human judges asked to assign importance scores to text units of varying degrees of importance; (ii) human analysts who hand-parsed the texts according to RST rhetorical relations; (iii) the cue-phrase-based rhetorical parser discussed in Chapter 6. These results are contrasted against three baselines: (iv) the Microsoft Office 97 summarizer; (v) selection of the first N important units in the text; and (vi) random selection of N units in the text (where N equals the number of units that human judges chose as important). Overall, the results indicate that the cue-phrase-based rhetorical parser, despite its weaknesses in recognition of the full set of rhetorical relations in a given text, comes close to human performance when compared against the results obtained by the human analysts.
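The two simplest baselines, and one plausible way of scoring any extract against the judges' selections, can be sketched as follows. This is an illustration only: the function names, the reading of baseline (v) as the first N units of the text, the fixed random seed, and the use of precision and recall over unit indices are assumptions, not a reproduction of M's evaluation code.

```python
import random
from typing import Sequence, Set, Tuple


def lead_baseline(units: Sequence[str], n: int) -> Set[int]:
    """Select the indices of the first n units of the text."""
    return set(range(min(n, len(units))))


def random_baseline(units: Sequence[str], n: int, seed: int = 0) -> Set[int]:
    """Select n unit indices uniformly at random (seeded for repeatability)."""
    rng = random.Random(seed)
    return set(rng.sample(range(len(units)), min(n, len(units))))


def precision_recall(selected: Set[int], gold: Set[int]) -> Tuple[float, float]:
    """Score a selection against the units the judges marked as important."""
    hits = len(selected & gold)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


units = ["u0", "u1", "u2", "u3", "u4", "u5"]
gold = {0, 2, 5}  # hypothetical judge selections, so N = 3
lead = lead_baseline(units, len(gold))
rand = random_baseline(units, len(gold))
lead_p, lead_r = precision_recall(lead, gold)
```

Because every system here selects the same number N of units as the judges did, precision and recall coincide for each baseline; they diverge only when a summarizer is free to choose a different extract length.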
In Chapter 10, M takes the results obtained in the summarization task and shows how performance can be boosted by taking into account a variety of heuristic measures aimed at driving down ambiguity of parses produced. The approach is a sensible one: when more than one well-formed rhetorical structure is produced for some text, an efficient system is one which is able to successfully choose between competing parses and assign a higher likelihood to some subset. Among the metrics that M employs are: the presence of explicit discourse markers, rightward skew to trees produced, and lexical similarity to the title. Again, some of these metrics may be particularly relevant to the summarization task; still others might be found to be useful. But M is more concerned with laying out the general approach than conclusively determining the "correct" set of heuristics, which seems like the right approach to take in a book such as this. The chapter concludes with an algorithm designed to find the optimal weighting of the seven heuristics used by M, and discussion of the improvements obtained over the untuned rhetorical parser.
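The general idea of choosing among competing parses by a weighted combination of heuristics can be sketched in a few lines. Everything specific here is assumed for illustration: the linear combination, the function names, the three heuristic labels (loosely named after the metrics mentioned above), and all of the numbers. The weights are fixed by hand and do not reproduce M's tuning algorithm or his full set of seven heuristics.

```python
from typing import Dict, List


def combined_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of per-heuristic scores; unknown heuristics get weight 0."""
    return sum(weights.get(name, 0.0) * value for name, value in scores.items())


def best_parse(candidates: List[Dict[str, float]], weights: Dict[str, float]) -> int:
    """Return the index of the candidate parse with the highest combined score."""
    return max(
        range(len(candidates)),
        key=lambda k: combined_score(candidates[k], weights),
    )


# Hypothetical heuristic scores for two competing parses of the same text.
candidates = [
    {"markers": 0.2, "right_skew": 0.9, "title_similarity": 0.1},
    {"markers": 0.8, "right_skew": 0.3, "title_similarity": 0.5},
]
# Hypothetical hand-set weights; a tuning procedure would search over these.
weights = {"markers": 0.5, "right_skew": 0.2, "title_similarity": 0.3}

chosen = best_parse(candidates, weights)
```

Under these weights the second parse wins; a tuning step of the kind the chapter describes would adjust the weights to maximize summarization quality on held-out texts.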
Chapter 11 summarizes the results obtained in Chapters 9 and 10, and concludes with some future directions, as well as issuing some promissory notes for the usefulness of rhetorical parsing in other problem domains, such as natural language generation, machine translation, and information retrieval.
The brevity and general succinctness of TPDPS is a bit of a two-edged sword: M does an admirable job of presenting the linguistic background and theoretical assumptions of RST from a very high-level perspective, and of developing the formal properties of the data structures and algorithms he uses. Still, the reviewer feels that the former may not supply quite enough information to convince the more computationally oriented reader that RST is the best theoretical foundation; and the interest of the more linguistically oriented reader may flag somewhat during the chapters devoted to formal proof of the soundness of the computational mechanisms employed. These are minor quibbles, however, and Marcu is a kind enough author to understand and accommodate the varying needs of the audience he hopes to attract. Above all, he provides pointers throughout the book to open issues, alternatives, possibilities left uninvestigated, and future directions -- which is only fair in a field (rhetorical parsing) in its infancy. This is overall a very useful and highly readable introduction to a synthesis of theoretical and computational approaches to discourse structure, suitable for use by anyone from undergraduate to researcher.
References:
Mann, William C. & Sandra A. Thompson. 1988. Rhetorical structure theory: Toward a functional theory of text organization. Text 8(3): 243-281.
David Parkinson is a computational syntactician in the Natural Language Group at the Microsoft Corporation.