LINGUIST List 18.2645
Tue Sep 11 2007
Diss: Computational Ling/Text & Corpus Ling: Hasler: 'From Extracts...'
Editor for this issue: Hunter Lockwood
<hunter linguistlist.org>
Directory
1. Laura Hasler, From Extracts to Abstracts: Human summary production operations for computer-aided summarisation
Message 1: From Extracts to Abstracts: Human summary production operations for computer-aided summarisation
Date: 10-Sep-2007
From: Laura Hasler <L.Hasler wlv.ac.uk>
Subject: From Extracts to Abstracts: Human summary production operations for computer-aided summarisation
Institution: University of Wolverhampton
Program: School of Humanities, Languages and Social Sciences
Dissertation Status: Completed
Degree Date: 2007
Author: Laura Hasler
Dissertation Title: From Extracts to Abstracts: Human summary production operations for computer-aided summarisation
Dissertation URL: http://clg.wlv.ac.uk/papers/hasler-thesis.pdf
Linguistic Field(s):
Computational Linguistics
Text/Corpus Linguistics
Dissertation Director:
Michael Hoey
Ruslan Mitkov
Constantin Orasan
Dissertation Abstract:
This thesis is concerned with the field of computer-aided summarisation, which has emerged at the confluence of the separate but related fields of human and automatic summarisation. Because automatically produced extracts have poor readability and coherence, computer-aided summarisation (CAS) is a viable working alternative to fully automatic summarisation. CAS allows a human summariser to post-edit automatically produced extracts to improve their readability and coherence. To make the best use of computer-aided summarisation, reliable ways of improving the coherence and readability of extracts when transforming them into abstracts must be established. To achieve this, a corpus-based analysis of the operations a human summariser applies to extracts to transform them into abstracts is presented. The corpus developed here consists of pairs: news texts annotated for important information (i.e., human-produced extracts) and the human-produced abstracts corresponding to these extracts. The creation of this corpus simulates the computer-aided summarisation process to enable a reliable investigation into the operations used. A detailed classification of human summary production operations is proposed, with examples which highlight the common linguistic realisations and functions of the operations identified in the corpus. The classification is then used as a basis for guidelines which can be given to users of computer-aided summarisation systems to ensure that the summaries they produce are of a consistently high quality. The human summary production operations are applied to extracts using the guidelines in order to evaluate them. Evaluation is performed using a metric developed for Centering Theory, a discourse theory of local coherence and salience; the use of this metric constitutes a new evaluation method. This is appropriate because existing methods of evaluating summaries are unsuitable.
A set of both automatically produced and human-produced extracts and their corresponding abstracts is evaluated, and a comparison is made with evaluations given by a human judge. The evaluation shows that when the operations are applied to extracts using the guidelines, there is an improvement in the readability and coherence of the resulting abstracts.