Date: Tue, 21 Jan 2003 10:32:08 -0500
From: ZhongDong Zhang
Subject: Jackson & Moulinier (2002) NLP for Online Applications
Jackson, Peter and Isabelle Moulinier (2002) Natural Language Processing
for Online Applications: Text Retrieval, Extraction and Categorization.
John Benjamins Publishing Company, x+226pp, paperback ISBN 1-58811-250-0,
$29.95, Natural Language Processing series.
Zhongdong Zhang, Novator Systems Ltd., Toronto
The growth of online applications and the World Wide Web has caused
intense interest in Natural Language Processing. More and more Natural
Language Processing techniques have been applied in commercial systems.
The book provides a theoretical and practical introduction to several
Natural Language Processing related technologies: document retrieval,
information extraction, text categorization, named entity extraction,
text summarization, and topic detection. It gives a clear introduction
to, and explanation of, the various approaches to each technique.
General principles and best practices as well as in-depth discussions
are given based both on current research results and on the authors' own
experience with these technologies. Every chapter ends with an
evaluation of the techniques discussed. The authors succeed in providing
a good, concise reference book for technology practitioners in the
Internet space. The explanations of most techniques are clear enough
that readers can implement the techniques directly from them.
Furthermore, the book provides a comprehensive bibliography for the
techniques it covers.
Throughout the book readers will find two things very useful: sidebars
and pointers. Sidebars provide a clear explanation or demonstration of
the techniques being discussed, making them easy to understand.
Pointers provide supplementary bibliographical
Unlike some well-known books on Natural Language Processing techniques
(e.g., Allen 1995, Manning & Schuetze 1999, Cole et al 1997) which deal
with core theories, approaches and techniques as well as general
applications of Natural Language Processing, this book focuses on
several selected technologies which are identified by the authors as
main tasks and super-tasks of language processing applications on the
Web (page 8). The book does not pay much attention to the relationship
between NLP and these tasks, as discussed in Allan 2000 and Voorhees 1999;
rather, it focuses mainly on the technical aspects of the selected
As the authors emphasize in the abstract, the book is neither a vendor guide
nor a recipe for building applications, although it does deal with
general principles and practical issues of building applications with
the selected techniques. Issues such as the architecture, design and
implementation of robust, efficient, and scalable systems that embed
Natural Language Processing techniques (e.g., Kowalski 1997, Basili et
al 1999) are not discussed explicitly in this book. Some general
discussion of these issues is provided, however, and some software and
toolkits are mentioned throughout the book.
Chapter 1. Natural Language Processing. In this chapter the authors give
an overview of Natural Language Processing. Key theories and techniques
of Natural Language Processing, such as tokenization, tagging, grammars,
parsing, and named entity recognition are introduced. The advantages,
drawbacks and pitfalls of the various approaches and options are
discussed, which helps readers choose appropriate techniques. This
chapter also gives a relatively complete guide to theoretical and
practical resources. Many useful software tools are discussed or
referred to as well. This introduction and resource guide not only serve
as a foundation for the rest of the book, but also allow readers and
developers to get started quickly on constructing natural language
processing systems.
Chapter 2. Document Retrieval. In chapter 2 the authors focus on
document retrieval. After introducing indexing technology and
different query processing techniques such as Boolean search, the vector
space model, probabilistic retrieval and language modeling, they give an
in-depth discussion of search engines and Web search. The application of
natural language processing techniques to document retrieval is
also discussed, and some useful thoughts are offered.
Chapter 3. Information Extraction. In this chapter the authors present
many results from the Message Understanding Conferences, which have been
dedicated to information extraction. Different approaches and systems, and
the theories behind them, are introduced and reviewed. An evaluation of
current information extraction technology is given. The authors
conclude in the chapter that information extraction technology has come
Chapter 4. Text Categorization. In this chapter the authors first give a
general analysis of applications and tasks as well as key issues
regarding text categorization technology. Various methods of text
categorization, from handcrafted rule-based methods and statistical
methods to combinations of multiple classifiers, are then explained and
discussed. The chapter also gives a detailed introduction to the
evaluation of text categorization systems.
Chapter 5. Towards Text Mining. In this chapter the authors describe
several promising applications of Natural Language Processing, namely,
named entity recognition, reference resolution, automatic text
summarization and topic detection. Various approaches for these tasks
are introduced and evaluated. The authors finish this chapter, as well
as the book, by giving their thoughts on future prospects of Natural
As mentioned above, the book covers document retrieval, information
extraction, text categorization, named entity extraction, text
summarization, and topic detection - the techniques identified as the
main tasks and super-tasks of language processing applications on the
Web. Question Answering (QA) and natural language conversation are thus
not discussed. However, since demand for Question Answering and
conversational systems is growing very quickly across different
channels such as the Web, email and phone/voice, a discussion of these
topics would have been very interesting (e.g., Voorhees 2001, Allen
et al 2001).
In addition, the book uses endnotes instead of footnotes and a
bibliography. Although plenty of valuable thoughts can be found in the
endnotes of each chapter, it might be more convenient for readers
to have a separate list of bibliographical references.
In general, the book is a very good, concise reference book filled with
many theoretical principles and practical guidelines. I recommend this
book to anyone who wants to build applications related to text
retrieval, information extraction and categorization.
J. Allan (2000) Natural Language Processing for Information Retrieval.
Tutorial presented at the NAACL/ANLP Language Technology Joint Conference
in Seattle, Washington, April 29, 2000.
J. Allen (1995) Natural Language Understanding, 2nd ed. Benjamin/Cummings.
J. F. Allen, D. K. Byron, M. Dzikovska, G. Ferguson & L. Galescu (2001)
Toward Conversational Human Computer Interaction. AI Magazine, Winter 2001,
R. Basili, M. Di Nanni & M. T. Pazienza (1999) Engineering of IE
Systems: An Object-Oriented Approach. In M. T. Pazienza, ed.,
Information Extraction: Towards Scalable, Adaptable Systems, pp. 134-164. Springer.
R. Cole, J. Mariani, H. Uszkoreit, A. Zaenen & V. Zue (1997) Survey of
the State of the Art in Human Language Technology. Cambridge University Press.
G. Kowalski (1997) Information Retrieval Systems: Theory and Implementation.
Kluwer Academic Publishers.
C. D. Manning & H. Schuetze (1999) Foundations of Statistical Natural
Language Processing. MIT Press.
E. M. Voorhees (1999) Natural Language Processing and Information Retrieval.
SCIE, pp. 32-48
E. M. Voorhees (2001) Overview of the TREC 2001 Question Answering Track.