AUTHOR: Kornai, Andras TITLE: Mathematical Linguistics SERIES TITLE: Information and Knowledge Processing PUBLISHER: Springer YEAR: 2007
Alexandre Allauzen, Department of Informatic, University Paris 11 (Orsay), and LIMSICNRS.
SUMMARY As mentioned in the forewords, this book requires a sufficient general mathematical maturity, but no prior knowledge of linguistic or languages. And it's true that the book is rather an introduction to linguistics through a mathematical point of view, than an attempt of mathematical formalization for linguists.
The structure of this book follows the standard introductory courses to linguistics, but the emphasis will sometimes be on some advanced notions. Even for ground knowledge, it might be useful to add some readings to fully understand the subject. The author warns you in the introduction: this book is intended to complement Manning and Schütze (1999) and Jelinek (1997). Thus, to first understand the main concepts in this book, the reader should be familiar with the content of Manning and Schütze (1999).
Chapter 2: ''Elements'' A primary concern of mathematical linguistics is to define useful and relevant elements and their associated sets. Two subtasks can be then addressed: generation and evaluation. The generation approach is a definitional method which aims to enumerate all the elements of a given set. Because techniques used in generation are not always useful, an equally important concern is to solve the membership problem. Therefore a variety of grammars are introduced with emphasis on generating grammar, stringrewriting rules and context sensitivegrammars. This chapter sets up many notions and notations without introducing their interest and their scope. A good strategy for readers would be to manually create their own glossary of notations when reading this chapter if they do not want to return to it periodically to find the meaning of notation found in the further chapters.
Chapter 3: ''Phonology'' First this chapter formalizes the wellknown definition of phonemes as minimum concatenative units of sound. The mathematical tool of choice for dealing with most of segmental phonology is stringrewriting by means of contextsensitive rules. To go beyond segments, the example of final devoicing in Russian leads to the definition of the natural classes: sets of segments that frequently appear together in rules. As the phonemic alphabet is the result of an algorithm, the set of natural classes is also forced by the phonological patterning of languages. By the means of postulate, a set of distinctive features can be defined (like voiced/unvoiced for example). The author can therefore express another definition of natural classes as sets of segments that can be decomposed by fewer features than their individual members. Then suprasegmental theory and autosegments are described using a multitiered representation. A tier is mathematically defined with the corresponding association rules. Then a large part of the chapter is dedicated to computational phonology with Finite State Transducers and Biautomatons.
Chapter 4: ''Morphology'' Morphology is maybe the most vexing part of linguistics from a mathematical point of view because of radical typological differences, flexible boundaries and near truth. These three aspects are developed in the first sections of the chapter: the prosodic hierarchy, word formation, optimality. The radical typological differences can be simply illustrated: no amount of expert knowledge of Modern English is sufficient to describe the case system of Modern Russian. In this part, the author presents a prosodic hierarchy describing syllables, moras, feet, cola and a typology for words and stress.
To describe the word formation process, the author starts with the definition of morpheme, its categorization and the possible ways of their association. Once the clitics is defined and can be stripped away, six criteria are used to define the ''wordhood''. Given a set of morphemes, these criteria delineate a formal language of words. The categorization of morphemes gives rise to a similar categorization of the processes that operate on them. And this yields to a typology of languages (agglutinating, inflecting, ...). Then, the lexeme is defined as a maximal set of paradigmatically related word. The paradigmatic forms can be generated by inflection or derivation. The language of words can be infinite, some variants of the generative apparatus described in the chapter 2 are needed to bear the task of enumerating all words given a finite set of morphemes.
Finally an aspect of characterizing a set of words is to describe their frequency in corpora, or more interestingly in the populations of which the corpora are samples. In this chapter, the basic Zipf's law is introduced.
Chapter 5: ''Syntax'' The theory of syntax addresses three strongly interconnected ranges of facts: the combinatorical possibilities of words, the internal structure of sentences, the fit between what is being said and what is seen in the world. Much of modern syntax is an attempt to do away with separate mechanisms in accounting for these three ranges of facts.
The first part is about combinatorical approaches that put the emphasis on the way words combine. Reducing the vocabulary complexity can be carried out by defining classes of words by distributional equivalence. And given a reduced set of word classes, categorial grammars are introduced as a mean of characterizing the set of wellformed expressions. Categorial grammar offers a flexible and intuitive method for sorting out complex syntactic situations. Furthermore, the syntax is tied to semantics in a desired manner. Then the phrase structure and the Context Free Grammar are introduced and discussed. The second part is about the grammatical approaches which emphasizes the notion of grammatical primitives: dependency, linking, and valency and in the third part the semanticsdriven theories of syntax are discussed, in particular the issues of frame semantics and knowledge representation.
Then a large part of the chapter is dedicated to weighted models of syntax, which extend the reach of the theory to another range of facts: the weight a given string of words has. A weighted language is defined as a mapping from strings to real values. The overall question is how a weighted language successfully approximates another, and this introduces the mathematical theory of density. A special case of interest is when the density is zero. Furthermore the authors describes the general notion of weighted rules and their interpretation in sociolinguistics. Afterward, weighted regular languages are discussed with the main devices to generate them, weighted finite state automaton/transducers and hidden markov models. The chapter ends with a discussion of external evidence.
Chapter 6: ''Semantics'' The basic issue is how a mathematical logic can help to characterize the relationship between grammatical expressions and their meaning. At least, an appropriate theory must characterize the set of expressions and the set of meanings. For the first goal, an answer can be found in the previous chapters with a generation approach based on phonology, morphology and syntax, taken together. To characterize the set of meanings, specific requirements and criteria can be derived by inspecting the Liar, opacity and the Berry paradox. In modeltheoretic semantics, the meaning representations are formulas, and they can be seen as just technical devices used for disambiguating expressions with multiple meanings.
The standard formal theory meets the essential criteria. This theory starts with the description of Montague grammar. Here the author provides only a survey of the key techniques and ideas, and highlights some of the inadequacies of Montague grammar such as the use of an English like sublanguage. Then the author tackles the development of formal theory based on Montague grammar. The main concern is to replace Tarskistyle induction on subformulas by a strict lefttoright method of building the semantic analysis.
Chapter 7: ''Complexity'' The evaluation of grammars is not obvious, typically we can assess if the discrepancies between the observables and the prediction of a model consists in under or overgeneration. Therefore, it could be necessary to provide a measure of simplicity to select the most ''simple'' hypothesis given some facts. After a brief reminder on information theory, the author introduces the theory of simplicity leaning on the basic idea of the Kolmogoroff complexity. The chapter ends with the more general problem of inductive learning from the perspective of complexity, introducing notions such as the minimum description length.
Chapter 8 and 9: ''Linguistic pattern recognition'' and ''Speech and handwriting'' Chapter 8 summarizes the area of linguistic pattern recognition, and two wellknown applications are described in chapter 9: speech and handwriting. In general, pattern recognition can be defined as a task where an infinite and continuous set of inputs must be associated with a finite set of outputs. When this set of outputs is linguistically structured, we talk about linguistic pattern recognition. Despite of this distinction, chapter 8 addresses the main issues of pattern recognition such as quantization, document classification and Markov processes.
Chapter 9 is dedicated to automatic speech recognition (ASR) and handwriting recognition (OCR). To start with ASR, the author gathers the main mathematical tools of lowlevel signal processing: from the Ztransform to the Discrete Fourier Transform. This aims to describe how to encode a speech signal for a relevant representation of information. Then, the use of phonemes as hidden units is described and discussed with a central question: how the discrete and ''meaningless'' units used in phonology can be recognized from continuous acoustic signals and what are the related issues ?
In the second part dedicated to handwriting recognition, the input of the problem turns from audio signals to discrete image. Thus, other mathematical tools have to be proposed to deal with for example size normalization (rescaling), stroke width or font weights. These aspects are addressed by mathematical morphology and more precisely by the operations of reflection, translation, dilatation, erosion, opening, closing and boundary. These operations aim to obtain the skeleton of an image and the vertex chain coding.
EVALUATION In my opinion, this is a very interesting book which provides an introduction to linguistics through a mathematical point of view. This book is maybe more intended for mathematicians than linguists. One of its goals is to bridge the gap between a mathematical formalization and the philosophical or pragmatic expectations of linguistic theories.
Chapters 4 to 6 are for me of a great interest. The proposed descriptions of morphology, syntax and semantics are really enriching. Moreover, the concise work of bibliography completes the book and gives to the reader a broad overview of these domains. The large part dedicated to the weighted model of syntax illustrates the purpose of this book. I only encourage the reader to first learn the basic notions of the formal language theory. As for all the subjects addressed by the author, this is not a cookbook. More basic and pedagogical readings are necessary if the reader wants to understand or to build some NLP systems. Here, the purpose is more about the mathematical foundation of such systems and tries to describe their limits. In these three chapters, the author really complements the wellknown books of computational linguistic, such as Manning and Schütze (1999) and Jelinek (1997).
As previously mentioned, this book requires a sufficient general mathematical maturity. However, I doubt reading this book needs no prior knowledge of linguistics or languages is needed. This book is not a ''standalone'' one. For example, let's consider the chapter 3 on phonology. While the generation process of the set of phonemes is clearly described and highlights the necessity of a mathematical formalization, the rest of the chapter is more obscure to me. A previous reading of the work of Goldsmith on the autosegmental theory is really helpful to understand the motivations and goals of such theory. Moreover, the rest of the chapter consists in a long mathematical development of computational morphology. Whereas this development is quite understandable, motivations and goals are not obvious. At this point, some readers may wonder: what is the contribution in terms of knowledge or in terms of a deeper understanding of phonology? Thus, this chapter is not easy to read, but what is more important, is a bit disappointing.
This can be partially explained by what is for me a lack of this book: a lot of mathematical notations are required and there is no index or glossary associated. The reader is therefore quickly lost among the several and often briefly introduced mathematical notations. A lot of time could be spent to understand some important distinction between notations among chapters. This lack does not help the rich purpose of this book.
REFERENCES Manning, Chris and Hinrich Schütze. 1999. _Foundations of Statistical Natural Language Processing_. Cambridge, MA: MIT Press.
Jelinek, F. 1997. _Statistical Methods for Speech Recognition_. Cambridge, MA: MIT Press.
ABOUT THE REVIEWER A. Allauzen is associate professor in the informatic department of the University Paris 11 (Orsay), and at the LIMSICNRS in the Spoken Language Processing group. Current research activities concern automatic speech recognition and statistical machine translation.
