AUTHOR: Kornai, Andras
TITLE: Mathematical Linguistics
SERIES TITLE: Information and Knowledge Processing
Alexandre Allauzen, Department of Informatic, University Paris 11 (Orsay), and
As mentioned in the forewords, this book requires a sufficient general
mathematical maturity, but no prior knowledge of linguistic or languages. And
it's true that the book is rather an introduction to linguistics through a
mathematical point of view, than an attempt of mathematical formalization for
The structure of this book follows the standard introductory courses to
linguistics, but the emphasis will sometimes be on some advanced notions. Even
for ground knowledge, it might be useful to add some readings to fully
understand the subject. The author warns you in the introduction: this book is
intended to complement Manning and Schütze (1999) and Jelinek (1997). Thus, to
first understand the main concepts in this book, the reader should be familiar
with the content of Manning and Schütze (1999).
Chapter 2: ''Elements''
A primary concern of mathematical linguistics is to define useful and relevant
elements and their associated sets. Two sub-tasks can be then addressed:
generation and evaluation. The generation approach is a definitional method
which aims to enumerate all the elements of a given set. Because techniques used
in generation are not always useful, an equally important concern is to solve
the membership problem. Therefore a variety of grammars are introduced with
emphasis on generating grammar, string-rewriting rules and context
sensitive-grammars. This chapter sets up many notions and notations without
introducing their interest and their scope. A good strategy for readers would be
to manually create their own glossary of notations when reading this chapter if
they do not want to return to it periodically to find the meaning of notation
found in the further chapters.
Chapter 3: ''Phonology''
First this chapter formalizes the well-known definition of phonemes as minimum
concatenative units of sound. The mathematical tool of choice for dealing with
most of segmental phonology is string-rewriting by means of context-sensitive
rules. To go beyond segments, the example of final devoicing in Russian leads to
the definition of the natural classes: sets of segments that frequently appear
together in rules. As the phonemic alphabet is the result of an algorithm, the
set of natural classes is also forced by the phonological patterning of
languages. By the means of postulate, a set of distinctive features can be
defined (like voiced/unvoiced for example). The author can therefore express
another definition of natural classes as sets of segments that can be decomposed
by fewer features than their individual members. Then supra-segmental theory and
auto-segments are described using a multitiered representation. A tier is
mathematically defined with the corresponding association rules. Then a large
part of the chapter is dedicated to computational phonology with Finite State
Transducers and Biautomatons.
Chapter 4: ''Morphology''
Morphology is maybe the most vexing part of linguistics from a mathematical
point of view because of radical typological differences, flexible boundaries
and near truth. These three aspects are developed in the first sections of the
chapter: the prosodic hierarchy, word formation, optimality. The radical
typological differences can be simply illustrated: no amount of expert knowledge
of Modern English is sufficient to describe the case system of Modern Russian.
In this part, the author presents a prosodic hierarchy describing syllables,
moras, feet, cola and a typology for words and stress.
To describe the word formation process, the author starts with the definition of
morpheme, its categorization and the possible ways of their association. Once
the clitics is defined and can be stripped away, six criteria are used to define
the ''wordhood''. Given a set of morphemes, these criteria delineate a formal
language of words. The categorization of morphemes gives rise to a similar
categorization of the processes that operate on them. And this yields to a
typology of languages (agglutinating, inflecting, ...). Then, the lexeme is
defined as a maximal set of paradigmatically related word. The paradigmatic
forms can be generated by inflection or derivation. The language of words can be
infinite, some variants of the generative apparatus described in the chapter 2
are needed to bear the task of enumerating all words given a finite set of
Finally an aspect of characterizing a set of words is to describe their
frequency in corpora, or more interestingly in the populations of which the
corpora are samples. In this chapter, the basic Zipf's law is introduced.
Chapter 5: ''Syntax''
The theory of syntax addresses three strongly interconnected ranges of facts:
the combinatorical possibilities of words, the internal structure of sentences,
the fit between what is being said and what is seen in the world. Much of modern
syntax is an attempt to do away with separate mechanisms in accounting for these
three ranges of facts.
The first part is about combinatorical approaches that put the emphasis on the
way words combine. Reducing the vocabulary complexity can be carried out by
defining classes of words by distributional equivalence. And given a reduced set
of word classes, categorial grammars are introduced as a mean of characterizing
the set of well-formed expressions. Categorial grammar offers a flexible and
intuitive method for sorting out complex syntactic situations. Furthermore, the
syntax is tied to semantics in a desired manner. Then the phrase structure and
the Context Free Grammar are introduced and discussed. The second part is about
the grammatical approaches which emphasizes the notion of grammatical
primitives: dependency, linking, and valency and in the third part the
semantics-driven theories of syntax are discussed, in particular the issues of
frame semantics and knowledge representation.
Then a large part of the chapter is dedicated to weighted models of syntax,
which extend the reach of the theory to another range of facts: the weight a
given string of words has. A weighted language is defined as a mapping from
strings to real values. The overall question is how a weighted language
successfully approximates another, and this introduces the mathematical theory
of density. A special case of interest is when the density is zero. Furthermore
the authors describes the general notion of weighted rules and their
interpretation in sociolinguistics. Afterward, weighted regular languages are
discussed with the main devices to generate them, weighted finite state
automaton/transducers and hidden markov models. The chapter ends with a
discussion of external evidence.
Chapter 6: ''Semantics''
The basic issue is how a mathematical logic can help to characterize the
relationship between grammatical expressions and their meaning. At least, an
appropriate theory must characterize the set of expressions and the set of
meanings. For the first goal, an answer can be found in the previous chapters
with a generation approach based on phonology, morphology and syntax, taken
together. To characterize the set of meanings, specific requirements and
criteria can be derived by inspecting the Liar, opacity and the Berry paradox.
In model-theoretic semantics, the meaning representations are formulas, and they
can be seen as just technical devices used for disambiguating expressions with
The standard formal theory meets the essential criteria. This theory starts with
the description of Montague grammar. Here the author provides only a survey of
the key techniques and ideas, and highlights some of the inadequacies of
Montague grammar such as the use of an English like sub-language. Then the
author tackles the development of formal theory based on Montague grammar. The
main concern is to replace Tarski-style induction on sub-formulas by a strict
left-to-right method of building the semantic analysis.
Chapter 7: ''Complexity''
The evaluation of grammars is not obvious, typically we can assess if the
discrepancies between the observables and the prediction of a model consists in
under or over-generation. Therefore, it could be necessary to provide a measure
of simplicity to select the most ''simple'' hypothesis given some facts. After a
brief reminder on information theory, the author introduces the theory of
simplicity leaning on the basic idea of the Kolmogoroff complexity. The chapter
ends with the more general problem of inductive learning from the perspective of
complexity, introducing notions such as the minimum description length.
Chapter 8 and 9: ''Linguistic pattern recognition'' and ''Speech and handwriting''
Chapter 8 summarizes the area of linguistic pattern recognition, and two
well-known applications are described in chapter 9: speech and handwriting. In
general, pattern recognition can be defined as a task where an infinite and
continuous set of inputs must be associated with a finite set of outputs. When
this set of outputs is linguistically structured, we talk about linguistic
pattern recognition. Despite of this distinction, chapter 8 addresses the main
issues of pattern recognition such as quantization, document classification and
Chapter 9 is dedicated to automatic speech recognition (ASR) and handwriting
recognition (OCR). To start with ASR, the author gathers the main mathematical
tools of low-level signal processing: from the Z-transform to the Discrete
Fourier Transform. This aims to describe how to encode a speech signal for a
relevant representation of information. Then, the use of phonemes as hidden
units is described and discussed with a central question: how the discrete and
''meaningless'' units used in phonology can be recognized from continuous acoustic
signals and what are the related issues ?
In the second part dedicated to handwriting recognition, the input of the
problem turns from audio signals to discrete image. Thus, other mathematical
tools have to be proposed to deal with for example size normalization
(rescaling), stroke width or font weights. These aspects are addressed by
mathematical morphology and more precisely by the operations of reflection,
translation, dilatation, erosion, opening, closing and boundary. These
operations aim to obtain the skeleton of an image and the vertex chain coding.
In my opinion, this is a very interesting book which provides an introduction to
linguistics through a mathematical point of view. This book is maybe more
intended for mathematicians than linguists. One of its goals is to bridge the
gap between a mathematical formalization and the philosophical or pragmatic
expectations of linguistic theories.
Chapters 4 to 6 are for me of a great interest. The proposed descriptions of
morphology, syntax and semantics are really enriching. Moreover, the concise
work of bibliography completes the book and gives to the reader a broad overview
of these domains. The large part dedicated to the weighted model of syntax
illustrates the purpose of this book. I only encourage the reader to first learn
the basic notions of the formal language theory. As for all the subjects
addressed by the author, this is not a cookbook. More basic and pedagogical
readings are necessary if the reader wants to understand or to build some NLP
systems. Here, the purpose is more about the mathematical foundation of such
systems and tries to describe their limits. In these three chapters, the author
really complements the well-known books of computational linguistic, such as
Manning and Schütze (1999) and Jelinek (1997).
As previously mentioned, this book requires a sufficient general mathematical
maturity. However, I doubt reading this book needs no prior knowledge of
linguistics or languages is needed. This book is not a ''stand-alone'' one. For
example, let's consider the chapter 3 on phonology. While the generation process
of the set of phonemes is clearly described and highlights the necessity of a
mathematical formalization, the rest of the chapter is more obscure to me. A
previous reading of the work of Goldsmith on the autosegmental theory is really
helpful to understand the motivations and goals of such theory. Moreover, the
rest of the chapter consists in a long mathematical development of computational
morphology. Whereas this development is quite understandable, motivations and
goals are not obvious. At this point, some readers may wonder: what is the
contribution in terms of knowledge or in terms of a deeper understanding of
phonology? Thus, this chapter is not easy to read, but what is more important,
is a bit disappointing.
This can be partially explained by what is for me a lack of this book: a lot of
mathematical notations are required and there is no index or glossary
associated. The reader is therefore quickly lost among the several and often
briefly introduced mathematical notations. A lot of time could be spent to
understand some important distinction between notations among chapters. This
lack does not help the rich purpose of this book.
Manning, Chris and Hinrich Schütze. 1999. _Foundations of Statistical Natural
Language Processing_. Cambridge, MA: MIT Press.
Jelinek, F. 1997. _Statistical Methods for Speech Recognition_. Cambridge, MA:
ABOUT THE REVIEWER
A. Allauzen is associate professor in the informatic department of the
University Paris 11 (Orsay), and at the LIMSI-CNRS in the Spoken Language
Processing group. Current research activities concern automatic speech
recognition and statistical machine translation.