LINGUIST List 18.2637
Mon Sep 10 2007
Diss: Computational Ling: Conway: 'Approaches to Automatic Biograph...'
Editor for this issue: Hunter Lockwood
<hunter linguistlist.org>
Directory
1. Mike Conway, Approaches to Automatic Biographical Sentence Classification: An empirical study
Message 1: Approaches to Automatic Biographical Sentence Classification: An empirical study
Date: 09-Sep-2007
From: Mike Conway <mike nii.ac.jp>
Subject: Approaches to Automatic Biographical Sentence Classification: An empirical study
Institution: University of Sheffield
Program: Department of Computer Science
Dissertation Status: Completed
Degree Date: 2007
Author: Mike Conway
Dissertation Title: Approaches to Automatic Biographical Sentence Classification: An empirical study
Linguistic Field(s): Computational Linguistics
Dissertation Director: Robert Gaizauskas
Dissertation Abstract:
This thesis addresses the problem of reliably identifying biographical sentences, an important subtask in several natural language processing application areas (for example, biographical multi-document summarisation and biographical information extraction). The biographical sentence classification task is placed within the framework of genre classification, rather than traditional topic-based text classification.
Before exploring methods for doing this task computationally, we need to establish whether, and with what degree of success, humans can identify biographical sentences without the aid of discourse or document structure. To this end, a biographical annotation scheme and corpus were developed and assessed in a human study. The study showed that participants were able to identify biographical sentences with a good level of agreement.
The main body of the thesis presents a series of experiments designed to find, from a range of alternatives, the best sentence representations for the automatic identification of biographical sentences. In contrast to previous work, which has centred on the use of single terms (that is, unigrams) for biographical sentence representations, the current work derives unigram, bigram and trigram features from a large corpus of biographical text (including the British Dictionary of National Biography). In addition to the use of corpus-derived n-grams, a novel characteristic of the current approach is the use of biographically relevant syntactic features, identified both intuitively and through empirical methods. The experimental work shows that a combination of n-gram features derived from the Dictionary of National Biography and biographically orientated syntactic features yields a performance that surpasses that gained using n-gram features alone. Additionally, in accordance with the view of biographical sentence classification as a genre classification task, stylistic features (for example, topic-neutral function words) are shown to be important for recognising biographical sentences.
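As a rough illustration of the kind of sentence representation the abstract describes, the following Python sketch derives unigram, bigram and trigram counts from a single sentence and adds one simple stylistic feature based on function words. It is not the thesis's implementation: the tokeniser, the feature names, and the tiny FUNCTION_WORDS list are all assumptions made for this example, and the syntactic features mentioned in the abstract are not covered.

```python
from collections import Counter

# Hypothetical, deliberately tiny function-word list (an assumption for
# illustration; the thesis's actual feature set is not given in the abstract).
FUNCTION_WORDS = {"he", "she", "was", "in", "the", "of", "and", "at", "his", "her"}

PUNCTUATION = ".,;:!?\"'()"


def tokenize(sentence):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = []
    for raw in sentence.split():
        token = raw.strip(PUNCTUATION).lower()
        if token:
            tokens.append(token)
    return tokens


def ngrams(tokens, n):
    """Return all contiguous n-grams of the token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def sentence_features(sentence, max_n=3):
    """Build a bag of 1..max_n-gram counts plus a function-word count."""
    tokens = tokenize(sentence)
    features = Counter()
    for n in range(1, max_n + 1):
        for gram in ngrams(tokens, n):
            features[f"{n}gram={gram}"] += 1
    # Simple stylistic feature: how many tokens are topic-neutral function words.
    features["function_word_count"] = sum(1 for t in tokens if t in FUNCTION_WORDS)
    return features


if __name__ == "__main__":
    feats = sentence_features("He was born in Sheffield in 1950.")
    for name, value in sorted(feats.items()):
        print(name, value)
```

A feature dictionary of this shape could then be passed to any standard classifier; which learner and which feature selection method to use is left open here, since the abstract does not specify them.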