Title: Automatic Text Summarization as Applied to Information Retrieval: Using indicative and informative
Author: Min-Yen Kan
Homepage: http://www.comp.nus.edu.sg/~kanmy
Degree Awarded: Columbia University, Natural Language Group, Department of Computer Science
Degree Date: 2002
Linguistic Subfield(s): Computational Linguistics
Director(s): Judith Klavans, Kathleen McKeown

Abstract:
I identify weaknesses in the standard 'ranked list of documents' information retrieval user interface by examining the search process as performed in the traditional library by professional librarians and catalogers. I distill these processes into a list of core strategies that can be effectively fulfilled by multidocument summaries, which assist in both searching and browsing. This thesis implements such automatic text summarization components to create an alternative method of presenting search results coming from IR frameworks.
As a post-processor of results coming from a search framework, Centrifuser implements these principles by producing both informative and indicative summaries that aid the user in information seeking tasks. Centrifuser uses novel techniques in analyzing source articles as a nested tree of topics, which allows the system to compare and contrast discussions of common topics across documents, and to identify rare topics. Documents similar in topic distribution are grouped together to enable faster and more accurate relevance judgment.
A novel contribution in Centrifuser is the focus on generating indicative summaries. I analyze two sources of indicative summaries -- online public access catalog summaries as well as annotated bibliography entries -- by examining guidelines for writing such summaries and by cataloging types of information used in actual summary corpora. The study reveals that metadata, such as the purpose or audience of a resource, are important inclusions in indicative summaries. By using the study's results, I derive an algorithm that enables Centrifuser to author indicative summaries that both utilize and include metadata, a novel contribution in the summarization field.
To enhance the quality and variety of the summaries produced, I have employed novel techniques in natural language generation. The system analyzes documents using a two-part method: high-level content planning deduces what semantic predicates to include and where to place them, and a low-level realization model computes the most appropriate phrasing for each predicate using both local and global context.
Page Updated: 29-Nov-2009
