Review of Probabilistic Linguistics


Reviewer: Azra N. Ali
Book Title: Probabilistic Linguistics
Book Author: Rens Bod, Jennifer B. Hay, Stefanie Jannedy
Publisher: MIT Press
Linguistic Field(s): General Linguistics
Book Announcement: 14.2722

Review:
Date: Wed, 8 Oct 2003 15:13:55 +0100
From: Azra Nahid Ali <a.n.ali@hud.ac.uk>
Subject: Probabilistic Linguistics

Bod, Rens, Jennifer Hay and Stefanie Jannedy (2003) Probabilistic
Linguistics, MIT Press, A Bradford book.

Azra N. Ali, School of Computing and Engineering, University of
Huddersfield, England.

OVERVIEW
The study of language has traditionally been categorical. However, we
are now in an era where we cannot ignore the fact that language shows
probabilistic properties, and the book deals with this clearly. The
book is all about probabilistic linguistics, and each chapter covers
probabilistic modeling from a different theoretical linguistic view,
from sociolinguistics to phonology. The book begins with introductory
chapters on probabilistic linguistics and probability theory before
delving deep into probabilistic linguistics. Each theoretical chapter
is written by a specialist in the field. The book also has a
well-documented glossary, and if you didn't know what 'hypothesis'
meant, well, you do now.

Chapter 1: Introduction (by Rens Bod, Jennifer Hay, and Stefanie
Jannedy) The chapter provides an overview of probabilistic linguistics
and of the role probability plays in linguistics, showing with examples
that language is not entirely categorical; in fact, it is gradient and
shows probabilistic properties. You can quickly grasp how frequency
and probability fit together in linguistic theories and approaches
before reading any further.

Chapter 2: Elementary Probability Theory (by Rens Bod) This is an
introductory chapter on probability theory. The chapter starts off
with simple linguistic examples (general probability calculations,
joint and conditional probability) that all linguistic readers should
be able to understand, before moving on to more complex examples -
probabilistic grammars (probabilistic context-free grammars and
data-oriented parsing models) - that are still easy to follow.
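
As a rough illustration of the kind of calculation a probabilistic
context-free grammar involves (a toy sketch of my own, not an example
taken from the book): each rule carries a probability, and the
probability of a derivation is the product of the probabilities of the
rules it uses. The grammar, the numbers and the function name below
are all invented for illustration.

```python
from math import prod

# Toy PCFG: the probabilities of the rules sharing a left-hand side sum to 1.
# Rules and numbers are invented for illustration only.
RULES = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("she",)): 0.4,
    ("NP", ("linguists",)): 0.6,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("V",)): 0.3,
    ("V", ("reads",)): 0.5,
    ("V", ("likes",)): 0.5,
}


def derivation_probability(rules_used):
    """Multiply the probabilities of the rules used in one derivation."""
    return prod(RULES[rule] for rule in rules_used)


# One derivation of "she likes linguists".
derivation = [
    ("S", ("NP", "VP")),
    ("NP", ("she",)),
    ("VP", ("V", "NP")),
    ("V", ("likes",)),
    ("NP", ("linguists",)),
]
p = derivation_probability(derivation)  # 1.0 * 0.4 * 0.7 * 0.5 * 0.6
```

A sentence with several possible derivations would receive the sum of
their probabilities; the sketch covers a single derivation only.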

Chapter 3 - Probabilistic Modeling in Psycholinguistics (by Dan
Jurafsky) While we may fail to see probabilistic properties in
linguistics, Jurafsky clearly highlights where they can be found and
supports his points well with references to the literature.

Jurafsky introduces the chapter by talking about frequency, showing
that cognitive processing time is considerably shorter for high
frequency words than for low frequency words. He explains that high
frequency words have a shorter duration and that the final coda of a
word is often unstable, with deletion of /d/ and /t/ being apparent.
He then moves on to neighbouring words in a sentence, whose
probability is an important factor in speech comprehension and
production, followed by a different form of frequency: the 'Syntactic
Subcategorization Frequencies' of verbs. In this section, conditional
probability is discussed at some length.
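
To make the idea concrete, the probability of a subcategorization
frame given a verb can be estimated as a relative frequency:
count(verb, frame) divided by count(verb). The sketch below is only an
illustration of that estimate; the verbs, frames, counts and function
name are invented and do not come from the chapter.

```python
from collections import Counter

# Hypothetical (verb, frame) observations; the counts are invented.
observations = (
    [("remember", "NP")] * 6
    + [("remember", "S")] * 3
    + [("remember", "none")] * 1
    + [("suspect", "NP")] * 2
    + [("suspect", "S")] * 8
)

pair_counts = Counter(observations)
verb_counts = Counter(verb for verb, _ in observations)


def frame_given_verb(frame, verb):
    """Relative-frequency estimate of P(frame | verb)."""
    return pair_counts[(verb, frame)] / verb_counts[verb]


p_s_given_suspect = frame_given_verb("S", "suspect")      # 8 / 10
p_np_given_remember = frame_given_verb("NP", "remember")  # 6 / 10
```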

The latter half of the chapter discusses 'Probabilistic Architectures
and Models'. The section details different types of probabilistic
models for sentence processing, for example, constraint-based models,
the competition model, Markov models, stochastic context-free
grammars, and Bayesian belief networks. Each model is described in
detail with examples, and the weaknesses of the models are also
highlighted.

Chapter 4 - Probabilistic Sociolinguistics (by Norma Mendoza-Denton,
Jennifer Hay, and Stefanie Jannedy) The chapter provides a good
introduction to sociolinguistic variation and points out how existing
statistical techniques are poorly suited to analysing sociolinguistic
data. Traditional statistical methods cannot be used by the
sociolinguistics researcher because techniques like Analysis of
Variance (ANOVA) require controlled data. The chapter discusses the
need for more advanced multivariate probabilistic methods and shows
how such techniques can be used to analyse sociolinguistic variation
data.

The probabilistic techniques that are discussed are related to one
particular case of language variation - the monophthongization of /ay/
which is apparent in African-American speakers in the southern states
of the U.S. Data are analysed first by using the traditional frequency
approaches, then by moving on to the VARBRUL program and
Classification and Regression Trees (CART).

The VARBRUL program is a form of logistic regression model, and the
authors detail the framework of VARBRUL and discuss how the program
compares with commercial applications like SPSS and SAS. The latter
half of the chapter illustrates how the VARBRUL program can be used to
collect and analyse data - the monophthongization of /ay/ in Oprah
Winfrey's speech. Oprah Winfrey is an African-American talk-show host,
and the program is used to analyse the considerable style shifting
that is apparent in her speech. In the final section, the CART
approach is used to investigate patterns in the data.
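
For readers unfamiliar with this kind of model, VARBRUL-style
analyses are usually described as combining an input probability with
one factor weight per contextual factor on the log-odds scale, where
weights above 0.5 favour the variant and weights below 0.5 disfavour
it. The following is a minimal sketch under that assumption; the
function names and numbers are hypothetical, and this is neither
VARBRUL's own code nor an example from the book.

```python
from math import exp, log


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return log(p / (1 - p))


def inverse_logit(x):
    """Map log-odds back to a probability."""
    return 1 / (1 + exp(-x))


def application_probability(input_probability, factor_weights):
    """Add the log-odds of the input probability and of each factor weight."""
    total = logit(input_probability) + sum(logit(w) for w in factor_weights)
    return inverse_logit(total)


# Invented numbers: an overall rate of 0.4 and two favouring factors.
p = application_probability(0.4, [0.7, 0.6])  # about 0.70
```

A weight of exactly 0.5 contributes log-odds of zero and so leaves the
prediction unchanged, which is why 0.5 serves as the neutral point.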

Chapter 5 - Probability in Language Change (by Kie Zuraw) The chapter
looks at the role that probability plays in addressing the issue of
language change. Language changes over time, and this is apparent in
the changes in observed probabilities over time. Zuraw shows that
applying probabilistic approaches to language change enables one to
pinpoint the factors that cause a language to change.

Chapter 6 - Probabilistic Phonology: Discrimination and Robustness (by
Janet B. Pierrehumbert) Pierrehumbert discusses a number of studies and
presents evidence to show that probability can be found at all levels
of representation, first illustrated through "probability
distribution over the phonetic space" (p.182). What is more important,
and is the focus of the chapter, is that speech perception, production
and well-formedness are affected by frequency and are both gradient
and predictable. Infants acquire words before phonemes, and phonemes
are built up gradually. In adults, well-formedness judgments for novel
words are affected by frequency, lexical neighbours and the
phonotactics of existing words. Finally, Pierrehumbert highlights the
fact that phonetic learning requires continuous updating of
probability distributions.

Chapter 7 - Probabilistic Approaches to Morphology (by R. Harald
Baayen) The opening pages of Baayen's chapter should really have been
at the beginning of the book, as he encapsulates so nicely how
probabilistic linguistics has come about. This has been due to the
development and ready availability of statistical software that can
analyse large amounts of data in a fraction of the time needed for
manual processing. At the same time, technology has made it possible
to collect and store large amounts of data, for instance the British
National Corpus (BNC), a corpus consisting of 100 million words. With
these two technologies at one's disposal, it is not surprising that we
can now see probabilistic properties in linguistics.

The chapter concentrates on morphological productivity: why people use
certain types of affixes in English and Dutch more than others. Baayen
shows that frequency-based approaches are not an appropriate way to
measure productivity, as they do not tell you the degree to which
certain affixes are productive. This is illustrated with the simple
English morphological examples -th and -ness, using a subcorpus of the
British National Corpus. Baayen therefore turns to probabilistic
approaches to determine the factors that contribute to the degree of
productivity. The final section of the chapter discusses the
morphological segmentation problem, illustrated by computational
models using the Matcheck program.

Chapter 8 - Probabilistic Syntax (by Christopher D. Manning) Manning
highlights that little attention has been devoted to the area of
probabilistic syntax. He therefore examines 'probabilistic models for
explaining language structure' (page 291) because there are a number of
phenomena in syntax that categorical approaches cannot adequately
explain. In fact, he emphasizes that probabilistic models should be
used in addition to categorical approaches to obtain a full
understanding of language structure.

Manning shows throughout his chapter that probabilities can also be
found in syntax, contrary to the statements made by Chomsky and others.
This is demonstrated quickly to the reader by showing that the
ungrammatical structure 'as least as' (first noticed by Manning in a
2001 book by Russo) does not appear to be a typographical error or
speech error, as first thought. A search through corpora found several
instances of this ungrammatical structure in the New York Times
newswire, and more instances on the web. The remainder of the first
part of the chapter looks at verbal clausal subcategorization frames,
to which probabilistic syntax models are applied. The final section
gives an overview of Optimality Theory and Analysis, followed by
Stochastic Optimality Theory, loglinear models and generalized linear
models.

Chapter 9 - Probabilistic Approaches to Semantics (by Ariel Cohen) The
final chapter discusses probabilistic techniques in semantics. Cohen
opens the chapter by discussing 'probability': what the figures
actually mean and what they tell us when it comes to semantics. The
chapter addresses these issues for generics and frequency adverbs,
using ratio theories to show their extensibility.

EVALUATION
The uniqueness of this book is that it starts off with introductory
chapters on probabilistic linguistics and probability theory before
delving deep into probabilistic linguistics. Credit must be given to
the authors for producing a single book that covers the probabilistic
properties that can be found in all areas of linguistics (phonology,
syntax, sociolinguistics, etc.) and for showing how traditional
statistical techniques may no longer be appropriate for complex data
analysis work. My only concern is that, although the book is supposed
to be an introductory book on probabilistic linguistics, it is far
from that. Some of the chapters contain complex probabilistic
mathematics which may overwhelm a linguistics student with limited
mathematical experience.

Although I have not been able to provide a detailed account of the
chapters that are outside my main area of research, it has nevertheless
been interesting to read these chapters and to learn how probabilistic
techniques can be applied to other fields of linguistics too. I would
therefore advise a linguistic reader to read the book selectively:
start with the introductory chapter and the probability theory
chapter, which is a must for anyone with a limited background in
probabilistic mathematics, and then read the chapters of interest to
their field of research.
 
ABOUT THE REVIEWER:
Azra Ali is a PhD student in the ARTFORM (Centre for Artificial Intelligence and Formal Methods) research group in the School of Computing and Engineering at the University of Huddersfield, England. Her research areas are audiovisual speech errors and phonology, and she is currently expanding her knowledge in the area of probabilistic linguistics.
