Title: On the Theory of Language Information Processing and Key Technology in Automatic Abstracting
Author: Guo Xianghao
Degree Awarded: Beijing University of Posts and Telecommunications, Information Engineering
Degree Date: 1997
Linguistic Subfield(s): Computational Linguistics
Subject Language(s): Chinese, Mandarin
Director(s): Zhong Yixin

Abstract:

Natural language processing is of great importance in the information age. This thesis focuses on key problems in language information processing, mainly discourse modeling and automatic abstracting. The thesis is organized as follows:

Chapter 1: Introduces the history and recent developments of natural language processing research.

Chapter 2: Proposes a new algorithm for automatic segmentation of Chinese text.

Chapter 3: Introduces a Chinese sentence parser, one of the most important components of the application systems described in the thesis.

Chapter 4: Since the main problem addressed in this thesis is automatic abstracting, this chapter gives a brief review of the field: its history, existing systems, and the different methods used.

Chapter 5: To develop an automatic abstracting system based on natural language understanding, it is critically important to study discourse structure. This chapter proposes a discourse structure model based on intention structure, called NIT (Narrate Intention Tree). The nature and construction of the NIT are studied in detail, and a classification of existing automatic abstracting systems based on NIT theory is also introduced.

Chapter 6: This chapter presents a practical automatic abstracting system, designed by the author, that handles foreign affairs news. The system can understand the sentences in a news item and summarize its main topic, and then generates an abstract based on that understanding. The purpose of the system is to test and verify the NIT model, and its implementation is discussed in detail.

Chapter 7: There are two approaches to automatic abstracting: the traditional approach, which does not consider the meaning of the text but uses statistical and heuristic measures to obtain the abstract, and the approach that uses natural language understanding technology. The quality of the former is not satisfactory, while the latter is always limited to a narrow domain and is not practical. To solve this problem, this chapter introduces corpus-based and machine learning methods into automatic abstracting. The key idea is that a human first reads the text and then selects the sentences that, according to his or her understanding, should form the abstract of the text. The computer learns from the selected sentences, discovers the rules that characterize them, and uses these rules to extract potential abstract sentences from texts in the same field. This method is especially useful for texts that are large in volume and change dynamically. Two machine learning methods are employed in this chapter: the ID3 decision tree and abstract sentence match patterns (ASMP). ASMP, proposed by the author, has two parts: a trigger and a sentence structure pattern. The trigger is always the main verb of the sentence; it activates the ASMP and acts as the pattern's conceptual anchor point. The sentence structure pattern characterizes the structure of the sentences in the abstract.
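
The learning step in Chapter 7 can be illustrated with the attribute-selection criterion that ID3 uses, information gain. The following is only a minimal sketch under stated assumptions, not the thesis's implementation: the feature names (`position`, `has_trigger_verb`), the toy training rows, and the function names are all hypothetical, and the abstract does not say which sentence features the author actually used.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(rows, labels, feature):
    """Reduction in label entropy obtained by splitting on one feature."""
    total = len(rows)
    remainder = 0.0
    for value in set(row[feature] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def best_feature(rows, labels, features):
    """ID3 splits on the feature with the highest information gain."""
    return max(features, key=lambda f: information_gain(rows, labels, f))


# Hypothetical training data: each row describes one sentence, and the label
# records whether a human annotator selected it for the abstract.
rows = [
    {"position": "first", "has_trigger_verb": True},
    {"position": "middle", "has_trigger_verb": True},
    {"position": "middle", "has_trigger_verb": False},
    {"position": "last", "has_trigger_verb": False},
]
labels = [True, True, False, False]

root = best_feature(rows, labels, ["position", "has_trigger_verb"])
```

A full ID3 learner would apply `best_feature` recursively to each subset until the labels in a branch are uniform; the resulting tree then plays the role of the learned rules for picking candidate abstract sentences.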

Chapter 8: Summarizes the whole thesis.
Page Updated: 26-Nov-2009
