LINGUIST List 27.2005

Mon May 02 2016

Review: Computational Ling; Phonology; Syntax; Text/Corpus Ling: van Zaanen, Heinz, de la Higuera (2015)

Editor for this issue: Sara Couture <saralinguistlist.org>


Date: 20-Feb-2016
From: Andrew Lamont <alamontindiana.edu>
Subject: Grammatical Inference for Computational Linguistics
E-mail this message to a friend

Discuss this message


Book announced at http://linguistlist.org/issues/26/26-4897.html

AUTHOR: Jeffrey N. Heinz
AUTHOR: Colin de la Higuera
AUTHOR: Menno M. van Zaanen
TITLE: Grammatical Inference for Computational Linguistics
SERIES TITLE: Synthesis Lectures on Human Language Technologies
PUBLISHER: Morgan & Claypool Publishers
YEAR: 2015

REVIEWER: Andrew Lamont, Indiana University

Co-reviewed by Andrew Lamont and Lwin Moe

Reviews Editor: Helen Aristar-Dry

SUMMARY

Grammatical inference is the task of learning the structure of a formally defined system. Early on, the authors state that the “purpose of this book is to introduce computational linguists to the major results of this field and to its way of thinking” (2). That is, the book overviews what grammatical inference is and how it is done for an audience without assuming a background in formal language theory. Chapter 1 presents overview of grammar and inference, Chapter 2 describes formal learning, Chapters 3 and 4 discuss empirical learning of regular and non-regular languages respectively, and Chapter 5 summarizes and enumerates open questions.

Chapter 1 presents a definition of grammar as it is used throughout the book and outlines the task of inference. The structure of the chapter parallels and lays out the rest of the book. Inference is broken down into formal and empirical inference. With the discussion of formal inference, formal language theory as it relates to natural language is overviewed. The evaluation of grammatical inference systems is summarized with empirical inference. The chapter ends with a summary and a section that formally defines important concepts ranging from alphabets and strings to the Chomsky Hierarchy.

Having defined grammar in Chapter 1, Chapter 2 addresses what inference looks like. The chapter begins by describing the relationship between the learner, i.e. the system tasked with inferring the grammar, and the oracle, i.e. the source of the grammar, and how much information the learner has access to at any given time. Then the chapter defines fundamental concepts of learning and major results relating to them from the field. The bulk of the chapter describes formal grammar formalisms relevant to natural language - finite-state automata, Hidden Markov models, context-free grammars, and others. The chapter ends by questioning the relation of grammatical inference to machine learning in general before a brief summary.

Chapter 3 covers the empirical learning of regular languages. Regular languages are separated from non-regular languages in the text paralleling the split between phonology and syntax for linguists. The chapter justifies this by characterizing the formal split in terms of the learning bias for the class of regular languages. The chapter defines regular grammar and state-merging, the main algorithm discussed, and grounds these abstract concepts with examples drawing from Pintupi and Kwakwala stress patterns. The application of state-merging for the inference of regular grammars and their relation to linguistic typology is justified as a learning bias. The chapter also covers a learning scenario in which negative evidence is available and learning transducers. The last section before the summary covers learning stochastic languages, a task familiar to most computational linguists as n-gram modeling. However, this section highlights regular patterns such as sibilant harmony in Samala which are not string-local and which therefore require an alternate inference strategy.

Chapter 4 examines the learning of non-regular languages, specifically languages that are modeled by context-free grammars such as much of natural language syntax. The chapter begins by discussing substitutability, the principle underlying the learning algorithms given in the chapter, as exemplified by constituent structure in natural language syntax. Substitutability is broken into expanding and reducing the hypothesis grammar which relates to state-merging and state-splitting from Chapter 3. The chapter then overviews several inference systems concisely, detailing their learning process grouped by whether the systems are expanding or reducing approaches and their assumed input. Then, several approaches to evaluating the output of inference are presented with discussion of the advantages and disadvantages of each approach. Finally before summarizing, there is a discussion of whether context-free grammars are powerful enough to capture natural language syntax and if not the difficulty of learning context-sensitive grammars.

Chapter 5 provides a concise summary of the book, discusses the various ways to approach and understand the problem of learning, and then overviews current open areas of research along with resources for further study.

EVALUATION

Overall, we feel this book provides a strong introduction to grammatical inference and is written well for computational linguists and as such accomplishes exactly what it sets out to do. The book is self-contained, introducing and defining terms and concepts that readers need to be familiar with in order to understand the larger points. The book is approachably short, just over 120 dense pages with references; a determined reader should find it takes at most a weekend to read. As such, it makes a good read cover-to-cover. The book is also organized very clearly with respect to laying out visually various subsections and formalisms. For example, the grammar formalisms in Chapter 2 are all accompanied by example figures and tables giving information on the time complexity of parsing, how powerful they are, and their use and implementation in learning. This organization makes the book a tremendously useful reference tool both for people who have read the text and want to quickly return to major points and for browsers. Further, the text makes extensive references both to source materials and other summaries of the field. We both come from a computational linguistics background with only minimal experience in grammatical inference (n-gram modeling primarily). The book gave a thorough and understandable overview of the field and is a very useful resource to return to in the future.

The presentation of theoretical concepts at times is fast and may be overwhelming to readers who do not have a strong background in formal language theory or its applications, e.g. finite-state automata. However, a slower pace may recast the book as a dry, several hundred page textbook. The presentation of theoretical concepts is always accompanied by a task-based justification in the text so the reader never feels that they are working through formalisms for no reason. For example, in the first chapter before the initial presentation of formal language theory, the authors devote half a page motivating the utility of such a formalism in discussing grammatical inference and justifying to the reader the importance of formally defining the task and its aspects. Another strength in the presentation of abstract material is that it is consistently tied to the task at hand. For example, the discussion of grammar formalisms in Chapter 2 is accompanied by a summary of each formalism’s ability to parse, model, and learn within the text. This keeps the abstract material grounded in the task of learning. While the theoretical material may at times be cumbersome, its presentation is also well motivated and anchored in the concrete task at hand.

Another strength of the book is that it keeps its audience in mind. Just as the concrete task is embedded within the more abstract discussion, applications in linguistics are brought up throughout the book. The presentation of learning regular grammars in Chapter 3 is given with the example of stress patterns in Pintupi and Kwakwala. This grounds the formal material very nicely for computational linguists who are more comfortable with data. The dissimilar stress patterns provide solid, concrete examples for finite-state automata and prefix/suffix trees. The sibilant harmony pattern in Samala likewise provides a concrete application of stochastic strictly k-local language which may otherwise be opaque. Chapter 4 similarly delves into syntactic topics ranging from constituency tests to treebanks and dependency parsing. Throughout the text, references to phonology, machine learning, machine translation, cognitive science, and other potential applications of grammatical inference are made, constantly reasserting the relation to computational linguistics. While regular and non-regular languages are split in Chapters 3 and 4, the link between the approaches given therein is expressed clearly at the outset of Chapter 3. This establishes that researchers primarily interested in syntax (a non-regular system) should nonetheless engage the material on inferring regular systems. All in all, the book reasserts its relation to specific areas of linguistics as well as linguistics as a whole.


ABOUT THE REVIEWER

Andrew Lamont has a masters degree in Linguistics from Eastern Michigan University and is currently finishing a masters degree in Computational Linguistics from Indiana University. His research primarily concerns phonological typology and computational aspects of Optimality Theory.

Lwin Moe has a masters degree in Computer Science from the Asian Institute of Technology in Bangkok and a masters in Computational Linguistics from Indiana University. His research interests include computational issues in Southeast Asian languages.

Page Updated: 02-May-2016