LINGUIST List (Eastern Michigan University and Wayne State University)
Title: Automatic Text Summarization as Applied to Information Retrieval: Using indicative and informative
Author: Min-Yen Kan
Homepage: http://www.comp.nus.edu.sg/~kanmy
Degree Awarded: Columbia University, Natural Language Group, Department of Computer Science
Degree Date: 2002
Linguistic Subfield(s): Computational Linguistics
Director(s): Judith Klavans, Kathleen McKeown

Abstract:

I identify weaknesses in the standard 'ranked list of documents' user interface for information retrieval (IR) by examining the search process as performed in the traditional library by professional librarians and catalogers. I distill these processes into a list of core strategies that can be fulfilled effectively by multidocument summaries, which assist in both searching and browsing. This thesis implements such automatic text summarization components to create an alternative method of presenting the search results returned by IR frameworks.

As a post-processor of results from a search framework, Centrifuser implements these principles by producing both informative and indicative summaries that aid the user in information-seeking tasks. Centrifuser uses novel techniques for analyzing source articles as nested trees of topics, which allow the system to compare and contrast discussions of common topics across documents and to identify rare topics. Documents with similar topic distributions are grouped together to enable faster and more accurate relevance judgments.
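The grouping and rare-topic identification described above can be illustrated with a minimal sketch. It assumes each document has already been flattened from its topic tree into a mapping from topic label to weight; the function names, the cosine similarity measure, the greedy single-pass grouping, and the threshold values are all hypothetical choices for illustration, not Centrifuser's actual method.

```python
from math import sqrt
from typing import Dict, List, Set

# Hypothetical representation: a document's flattened topic tree,
# mapping each topic label to a weight.
TopicVector = Dict[str, float]


def cosine(a: TopicVector, b: TopicVector) -> float:
    """Cosine similarity between two sparse topic-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def group_documents(docs: Dict[str, TopicVector], threshold: float = 0.5) -> List[List[str]]:
    """Greedy grouping: a document joins the first group whose seed document
    is at least `threshold` similar to it; otherwise it starts a new group."""
    groups: List[List[str]] = []
    for name, topics in docs.items():
        for group in groups:
            if cosine(docs[group[0]], topics) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


def rare_topics(docs: Dict[str, TopicVector], max_docs: int = 1) -> Set[str]:
    """Topics discussed in at most `max_docs` of the documents."""
    counts: Dict[str, int] = {}
    for topics in docs.values():
        for t in topics:
            counts[t] = counts.get(t, 0) + 1
    return {t for t, c in counts.items() if c <= max_docs}
```

With three hypothetical documents, two of which share the same topics, `group_documents` places those two together and `rare_topics` returns the topics that only the third one discusses. Seeding each group with its first document keeps the sketch short; a real system would more likely use a proper clustering algorithm.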

A novel contribution of Centrifuser is its focus on generating indicative summaries. I analyze two sources of indicative summaries, online public access catalog summaries and annotated bibliography entries, by examining the guidelines for writing such summaries and by cataloging the types of information used in actual summary corpora. The study reveals that metadata, such as the purpose or audience of a resource, are important inclusions in indicative summaries. Using the study's results, I derive an algorithm that enables Centrifuser to author indicative summaries that both utilize and include metadata, which is itself new to the summarization field.
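To make the idea of metadata-bearing indicative summaries concrete, here is a minimal template-based sketch. The `ResourceMetadata` record, its field names, and the sentence template are hypothetical; only the choice of purpose and audience as content follows the study's finding. This is not the algorithm derived in the thesis.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ResourceMetadata:
    # Hypothetical record: purpose and audience are the kinds of metadata the
    # study found important; the field names themselves are illustrative.
    title: str
    purpose: Optional[str] = None
    audience: Optional[str] = None
    topics: List[str] = field(default_factory=list)


def join_list(items: List[str]) -> str:
    """Join ["a", "b", "c"] as "a, b and c"."""
    if len(items) <= 1:
        return "".join(items)
    return ", ".join(items[:-1]) + " and " + items[-1]


def indicative_summary(meta: ResourceMetadata) -> str:
    """Build a one-sentence indicative summary from metadata alone."""
    if meta.purpose:
        # Naive article choice based only on the first letter.
        article = "an" if meta.purpose[0].lower() in "aeiou" else "a"
        sentence = f'"{meta.title}" is {article} {meta.purpose}'
    else:
        sentence = f'"{meta.title}" is a resource'
    if meta.audience:
        sentence += f" for {meta.audience}"
    if meta.topics:
        sentence += f" that covers {join_list(meta.topics)}"
    return sentence + "."
```

For example, a record with title "Living with Asthma", purpose "guide", audience "parents of young children", and three topics yields `"Living with Asthma" is a guide for parents of young children that covers triggers, inhaler use and emergency care.` Note that the summary tells the reader what the resource is and whom it is for, rather than restating its content, which is the distinction between indicative and informative summaries.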

To enhance the quality and variety of the summaries produced, I employ novel techniques in natural language generation. The system generates summaries using a two-part method: high-level content planning determines which semantic predicates to include and where to place them, and a low-level realization model computes the most appropriate phrasing for each predicate using both local and global context.
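The two-part generation method can be sketched as a planner followed by a realizer. Everything here is a hypothetical simplification: the `Predicate` type, the salience threshold, the fixed ordering schema, and the phrasing templates are invented for illustration. "Local context" is modeled as pronominalizing a subject repeated from the previous sentence, and "global context" as avoiding a verb already used anywhere in the summary; the thesis's realization model is not claimed to work this way.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Predicate:
    name: str        # predicate type, e.g. "purpose", "audience", "topic"
    subject: str
    value: str
    salience: float  # hypothetical importance score in [0, 1]


# Hypothetical ordering schema used by the content planner.
SCHEMA = ["purpose", "audience", "topic"]

# Hypothetical phrasing alternatives; the second word of each is its verb.
PHRASINGS: Dict[str, List[str]] = {
    "purpose": ["{s} is {v}", "{s} serves as {v}"],
    "audience": ["{s} is written for {v}", "{s} targets {v}"],
    "topic": ["{s} discusses {v}", "{s} covers {v}"],
}


def plan(predicates: List[Predicate], min_salience: float = 0.3) -> List[Predicate]:
    """Content planning: decide which predicates to include and in what order."""
    chosen = [p for p in predicates if p.name in SCHEMA and p.salience >= min_salience]
    # sorted() is stable, so predicates of the same type keep their input order.
    return sorted(chosen, key=lambda p: SCHEMA.index(p.name))


def realize(planned: List[Predicate]) -> str:
    """Realization: phrase each predicate using local and global context."""
    sentences: List[str] = []
    used_verbs: Set[str] = set()          # global context: verbs used so far
    prev_subject: Optional[str] = None    # local context: previous sentence's subject
    for p in planned:
        # Local context: pronominalize a subject repeated from the previous sentence.
        subject = "It" if p.subject == prev_subject else p.subject
        options = PHRASINGS[p.name]
        # Global context: prefer a phrasing whose verb has not been used yet,
        # falling back to the last option if every verb is taken.
        template = next((t for t in options if t.split()[1] not in used_verbs), options[-1])
        used_verbs.add(template.split()[1])
        sentences.append(template.format(s=subject, v=p.value) + ".")
        prev_subject = p.subject
    return " ".join(sentences)
```

Given four predicates about "The guide", one with low salience, the planner drops the low-salience one and orders the rest as purpose, audience, topic. The realizer then produces "The guide is a self-care reference. It targets parents. It discusses inhaler technique." Here "targets" is chosen over "is written for" because "is" was already used. Keeping planning and realization separate means the phrasing choices can change without altering what is said or in what order.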
Page Updated: 29-Nov-2009
