LINGUIST List 20.2692
Wed Aug 05 2009
Diss: Comp Ling: Orasan: 'Comparative Evaluation of Modular...'
Editor for this issue: Di Wdzenczny
<di linguistlist.org>
Directory
1. Constantin Orasan, Comparative Evaluation of Modular Automatic Summarisation Systems Using CAST
Message 1: Comparative Evaluation of Modular Automatic Summarisation Systems Using CAST
Date: 05-Aug-2009
From: Constantin Orasan <C.Orasan wlv.ac.uk>
Subject: Comparative Evaluation of Modular Automatic Summarisation Systems Using CAST
Institution: University of Wolverhampton
Program: School of Humanities, Languages and Social Sciences
Dissertation Status: Completed
Degree Date: 2006
Author: Constantin Orasan
Dissertation Title: Comparative Evaluation of Modular Automatic Summarisation Systems Using CAST
Dissertation URL: http://clg.wlv.ac.uk/papers/orasan-thesis.php
Linguistic Field(s): Computational Linguistics
Dissertation Director:
Chris Paice
Ruslan Mitkov
Dissertation Abstract:
The information overload faced by today's society poses great challenges to researchers who want to find a relevant piece of information. Automatic summarisation is a field of computational linguistics which can help humans to deal with this information overload by automatically extracting the gist of documents. This thesis attempts to gain insights into the automatic summarisation field from several different angles.
First, it performs qualitative, quantitative and comparative evaluations of different automatic summarisation methods. These summarisation methods are built around a term-based summariser which is then augmented with additional linguistic information which includes lexical, semantic and discourse information. On the basis of these evaluations, it was noticed that the choice of modules which provide low-level linguistic information (e.g. morphological processors) does not influence the results significantly, but higher-level linguistic information, such as anaphora resolution and shallow information about discourse structure, leads to significant improvements of the summaries.
In order to have a comprehensive view of how good summaries produced by a given method are, the evaluation performed in this thesis measures both the informativeness of the summaries produced and the quality of their discourse structure. Moreover, a method which determines the upper limit for informativeness is proposed to demonstrate the limits of extraction techniques. Comparison between the informativeness and the quality of discourse reveals no correlation between them.
A third direction pursued in this research is to replace conventional iterative extraction methods, which extract one sentence at a time without considering the rest of the sentences in the summary, with more holistic ones, where the decision to extract a sentence is determined not only by the content of a sentence, but also by the rest of the sentences extracted. To this end, a genetic algorithm which encodes the whole summary is implemented and is shown to produce better summaries than its iterative equivalent.