Description:
This text covers the emerging technologies of document retrieval,
information extraction, and text categorization in a way that highlights
what they have in common, both in general principles and in practical
issues. It is written for technology practitioners in the Internet space who
face difficult decisions and need to know what research has been done and
what the best practices are. It is not intended as a vendor guide (such
guides quickly go out of date) or as a recipe for building applications
(such recipes are highly context-dependent). Instead, it identifies the key
technologies, the issues involved, and the strengths and weaknesses of the
various approaches. Every chapter also places strong emphasis on evaluation,
covering both methodology (how to evaluate) and what controlled
experimentation and industrial experience have to tell us.
Table of Contents
Preface
1. Natural language processing
2. Document retrieval
3. Information extraction
4. Text categorization
5. Towards text mining
Index