LINGUIST List 14.226

Wed Jan 22 2003

Review: Computational Ling: Jackson & Moulinier (2002)

Editor for this issue: Naomi Ogasawara <naomi@linguistlist.org>


What follows is a review or discussion note contributed to our Book Discussion Forum. We expect discussions to be informal and interactive; and the author of the book discussed is cordially invited to join in.

If you are interested in leading a book discussion, look for books announced on LINGUIST as "available for review." Then contact Simin Karimi at simin@linguistlist.org.


Directory

  • ZhongDong Zhang, Jackson & Moulinier (2002) NLP for Online Applications

    Message 1: Jackson & Moulinier (2002) NLP for Online Applications

    Date: Tue, 21 Jan 2003 23:43:37 +0000
    From: ZhongDong Zhang <zhang@novator.com>
    Subject: Jackson & Moulinier (2002) NLP for Online Applications


    Jackson, Peter and Isabelle Moulinier (2002) Natural Language Processing for Online Applications: Text Retrieval, Extraction and Categorization. John Benjamins Publishing Company, x+226pp, paperback ISBN 1-58811-250-0, $29.95, Natural Language Processing series.

    Book Announcement on Linguist: http://linguistlist.org/get-book.html?BookID=4059 http://linguistlist.org/issues/13/13-2579.html

    Zhongdong Zhang, Novator Systems Ltd., Toronto

    SYNOPSIS

    The growth of online applications and the World Wide Web has generated intense interest in Natural Language Processing (NLP), and more and more NLP techniques are being applied in commercial systems. The book provides a theoretical and practical introduction to several NLP-related technologies: document retrieval, information extraction, text categorization, named entity extraction, text summarization, and topic detection. It gives a clear introduction to, and explanation of, the various approaches to each of these techniques. General principles and best practices, as well as in-depth discussions, are presented based both on current research results and on the authors' own experience with these technologies. Every chapter ends with an evaluation of the techniques discussed. The authors succeed in providing a good, concise reference book for technology practitioners in the Internet space. The explanations of most techniques are clear enough that readers can implement the techniques directly from them. Furthermore, the book provides a comprehensive bibliography for the techniques it covers.

    Throughout the book, readers will find two features very useful: sidebars and pointers. Sidebars give clear explanations or demonstrations of the techniques being discussed, which makes those techniques easy to understand. Pointers provide supplementary bibliographical resources.

    Unlike some well-known books on Natural Language Processing (e.g., Allen 1995, Manning & Schuetze 1999, Cole et al. 1997), which deal with core theories, approaches and techniques as well as general applications of NLP, this book focuses on several selected technologies that the authors identify as the main tasks and super-tasks of language processing applications on the Web (page 8). The book does not pay much attention to the relationship between NLP and these tasks, as discussed in Allan 2000 and Voorhees 1999; rather, it focuses mainly on their technical aspects.

    As the authors emphasize in the abstract, the book is neither a vendor guide nor a recipe for building applications, although it does deal with general principles and practical issues of building applications with the selected techniques. Issues such as the architecture, design and implementation of robust, efficient and scalable systems with embedded NLP techniques (e.g., Kowalski 1997, Basili et al. 1999) are not discussed explicitly in this book. Some general discussion of these issues is provided, however, and some software and toolkits are mentioned throughout the book.

    Chapter 1. Natural Language Processing. In this chapter the authors give an overview of Natural Language Processing. Key theories and techniques, such as tokenization, tagging, grammars, parsing and named entity recognition, are introduced, together with discussions of the advantages, drawbacks and pitfalls of various approaches and options. These discussions help readers decide which techniques are appropriate for their needs. The chapter also provides a relatively complete theoretical and practical resource guide, and many useful software tools are discussed or referred to. This introduction and resource guide not only serve as a foundation for the rest of the book, but also allow readers and developers to get started quickly on building natural language processing systems.
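    To make the first of these steps concrete, here is a minimal tokenizer sketch in Python. The regular expression is my own assumption for illustration, not the authors' method; production tokenizers handle abbreviations, clitics, hyphenation and non-English scripts far more carefully.

```python
import re

# Assumed pattern, for illustration only:
#   1. words, allowing internal apostrophes or hyphens ("doesn't", "rule-based")
#   2. numbers with an optional decimal part ("29.95")
#   3. any other single non-space character (punctuation, currency symbols)
TOKEN_PATTERN = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*|\d+(?:\.\d+)?|\S")


def tokenize(text: str) -> list[str]:
    """Split text into word, number and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


print(tokenize("The book costs $29.95, doesn't it?"))
```

    Keeping "29.95" and "doesn't" as single tokens while splitting off "$" and "," is one design choice among several; a tagger or parser downstream may expect a different convention (for example, splitting "doesn't" into "does" and "n't").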

    Chapter 2. Document Retrieval. This chapter focuses on document retrieval. After introducing indexing technology and different query processing techniques, such as Boolean search, the Vector Space model, probabilistic retrieval and language modeling, the authors give an in-depth discussion of search engines and Web search. The application of natural language processing techniques to document retrieval is also discussed, and some useful thoughts are offered.
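    For readers new to the Vector Space model, the following Python sketch ranks documents by the cosine similarity between TF-IDF weighted term vectors. The weighting scheme (raw term frequency times log inverse document frequency) and the toy collection are assumptions chosen for illustration; the book and the wider literature discuss many variants.

```python
import math
from collections import Counter


def tf_idf_vectors(docs: list[list[str]]) -> tuple[list[dict[str, float]], dict[str, float]]:
    """Weight each term by raw frequency times log(N / document frequency)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query: list[str], docs: list[list[str]]) -> list[tuple[int, float]]:
    """Return (document index, score) pairs, best match first."""
    vectors, idf = tf_idf_vectors(docs)
    # Query terms that never occur in the collection get zero weight.
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query).items()}
    scores = [(i, cosine(q, v)) for i, v in enumerate(vectors)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy collection (hypothetical), already tokenized by whitespace.
docs = [
    "text retrieval with the vector space model".split(),
    "information extraction from news text".split(),
    "text categorization with statistical classifiers".split(),
]
print(rank("vector space retrieval".split(), docs))
```

    A term that occurs in every document (here "text") gets an IDF of zero and so contributes nothing to any score, which is the intended effect of IDF weighting: terms that do not discriminate between documents should not influence the ranking.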

    Chapter 3. Information Extraction. In this chapter the authors present many results from the Message Understanding Conferences, which were dedicated to information extraction. Different approaches and systems, and the theories behind them, are introduced and reviewed. An evaluation of current information extraction technology is given, and the authors conclude that information extraction technology has come of age.

    Chapter 4. Text Categorization. In this chapter the authors first give a general analysis of the applications, tasks and key issues of text categorization technology. Various methods of text categorization, ranging from handcrafted rule-based methods and statistical methods to combinations of multiple classifiers, are then explained and discussed. The chapter also gives a detailed introduction to the evaluation of text categorization systems.
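    As a reminder of what such an evaluation involves, the sketch below computes precision, recall and F1 from per-category counts and contrasts micro-averaging (pool the counts, then score) with macro-averaging (score each category, then average). The category names and counts are hypothetical, and this is a generic sketch of the standard measures rather than the book's own presentation.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int  # documents correctly assigned to the category
    fp: int  # documents wrongly assigned to the category
    fn: int  # documents of the category that the system missed


def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall and F1 (harmonic mean), guarding against division by zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_f1(per_category: dict[str, Counts]) -> float:
    """Average of per-category F1: every category counts equally."""
    scores = [prf(c.tp, c.fp, c.fn)[2] for c in per_category.values()]
    return sum(scores) / len(scores)


def micro_f1(per_category: dict[str, Counts]) -> float:
    """F1 over pooled counts: frequent categories dominate."""
    tp = sum(c.tp for c in per_category.values())
    fp = sum(c.fp for c in per_category.values())
    fn = sum(c.fn for c in per_category.values())
    return prf(tp, fp, fn)[2]


# Hypothetical results: one frequent category and one rare category.
results = {
    "sports": Counts(tp=90, fp=10, fn=10),
    "finance": Counts(tp=1, fp=1, fn=3),
}
print(f"micro F1 = {micro_f1(results):.3f}, macro F1 = {macro_f1(results):.3f}")
```

    With these counts the rare category pulls the macro average down to about 0.617, while the micro average (about 0.883) stays close to the frequent category's score; which average to report depends on whether performance on rare categories matters as much as overall accuracy.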

    Chapter 5. Towards Text Mining. In this chapter the authors describe several promising applications of Natural Language Processing, namely named entity recognition, reference resolution, automatic text summarization and topic detection. Various approaches to these tasks are introduced and evaluated. The authors finish the chapter, and the book, with their thoughts on the future prospects of Natural Language Processing.

    DISCUSSION

    As mentioned above, the book covers document retrieval, information extraction, text categorization, named entity extraction, text summarization and topic detection, the techniques identified by the authors as the main tasks and super-tasks of language processing applications on the Web. Question Answering (QA) and natural language conversation are therefore not discussed. However, since demand for question answering and conversational systems is growing very quickly across channels such as the Web, email and phone/voice, a discussion of these topics would have been very interesting (e.g., Voorhees 2001, Allen et al. 2001).

    In addition, the book uses chapter endnotes rather than footnotes and a separate bibliography. Although plenty of valuable material can be found in the endnotes of each chapter, it would be more convenient for readers to have a consolidated list of bibliographical references.

    In general, the book is a very good, concise reference book filled with many theoretical principles and practical guidelines. I recommend this book to anyone who wants to build applications related to text retrieval, information extraction and categorization.

    REFERENCES

    J. Allan (2000) Natural Language Processing for Information Retrieval. Tutorial presented at the NAACL/ANLP Language Technology Joint Conference in Seattle, Washington, April 29, 2000.

    J. Allen (1995) Natural Language Understanding, 2nd ed. Benjamin/Cummings.

    J. F. Allen, D. K. Byron, M. Dzikovska, G. Ferguson & L. Galescu (2001) Toward Conversational Human Computer Interaction. AI Magazine, Winter 2001, pp. 27-37.

    R. Basili, M. Di Nanni & M. T. Pazienza (1999) Engineering of IE Systems: An Object-Oriented Approach. In M. T. Pazienza, ed., Information Extraction: Towards Scalable, Adaptable Systems, pp. 134-164. Springer.

    R. Cole, J. Mariani, H. Uszkoreit, A. Zaenen & V. Zue (1997) Survey of the State of the Art in Human Language Technology. Cambridge University Press.

    G. Kowalski (1997) Information Retrieval Systems: Theory and Implementation. Kluwer Academic Publishers.

    C. D. Manning & H. Schuetze (1999) Foundations of Statistical Natural Language Processing. MIT Press.

    E. M. Voorhees (1999) Natural Language Processing and Information Retrieval. SCIE, pp. 32-48.

    E. M. Voorhees (2001) Overview of the TREC 2001 Question Answering Track.

    ABOUT THE REVIEWER

    Zhongdong Zhang is a Senior Software Developer at Novator Systems Ltd. in Toronto. He is currently working on automated customer service solutions (Question Answering and Natural Language Conversation) using various techniques of natural language understanding, information retrieval and text categorization.