LINGUIST List 23.4752
Wed Nov 14 2012
Diss: Comp Ling/ Discourse Analysis/ Pragmatics/ Text/Corpus Ling/ English: Leidner: 'Toponym Resolution in Text...'
Editor for this issue: Lili Xia
<lxia linguistlist.org>
Date: 12-Nov-2012
From: Jochen Leidner <leidner acm.org>
Subject: Toponym Resolution in Text: Annotation, evaluation and applications of spatial grounding of place names
Institution: University of Edinburgh
Program: School of Informatics
Dissertation Status: Completed
Degree Date: 2007
Author: Jochen L. Leidner
Dissertation Title: Toponym Resolution in Text: Annotation, evaluation and applications of spatial grounding of place names
Dissertation URL: http://www.era.lib.ed.ac.uk/handle/1842/1849
Linguistic Field(s):
Computational Linguistics
Discourse Analysis
Pragmatics
Text/Corpus Linguistics
Subject Language(s): English (eng)
Dissertation Director:
Bonnie Webber
Claire Grover
Dissertation Abstract:
Background: Spatial and temporal expressions refer to events in space-time, and the grounding of events is a precondition for reasoning. Thus, automatic grounding can improve many applications such as automatic map drawing and question answering (e.g., for questions like 'How far is London from Edinburgh?'). Whereas temporal grounding has received considerable attention, robust spatial grounding has long been neglected. I define the task of automatic Toponym Resolution (TR) as computing the mapping from instances of names for places as found in a text to a representation of the extensional semantics of the location referred to, such as a geographic latitude/longitude footprint. The mapping between names and locations is referentially ambiguous: London can refer to the capital of the UK, to London, Ontario, Canada, or to other Londons on earth.
Objective: I investigate how referentially ambiguous spatial named entities can be grounded, or resolved, robustly with respect to an extensional coordinate model on open-domain news text.
Method: While a small number of previous attempts have been made to solve the toponym resolution problem, these were either not evaluated, or evaluation was done by manual inspection of system output instead of curating a reusable reference corpus. Since the relevant literature is scattered across several fields (libraries, information retrieval, natural language processing) and descriptions of algorithms are mostly given in informal prose, I attempt to describe them systematically and aim at a reconstruction in a uniform, semi-formal pseudo-code notation for easier re-implementation. A systematic comparison leads to an inventory of heuristics and other sources of evidence. In order to carry out a comparative evaluation procedure, an evaluation resource is required. Unfortunately, to date no gold standard has been curated in the research community.
To this end, a reference gazetteer and an associated novel reference corpus with human-labeled referent annotation are created. These are subsequently used to benchmark a selection of the reconstructed algorithms and a novel re-combination of the heuristics cataloged in the inventory. I then compare the performance of the same TR algorithms under three different conditions, namely applying them to the output of human named entity annotation, of automatic annotation using an existing Maximum Entropy sequence tagging model, and of a naive toponym lookup procedure in a gazetteer.
Evaluation: The algorithms implemented in this thesis are evaluated in an intrinsic or component evaluation. To this end, I define a task-specific matching criterion to be used with traditional Precision and Recall evaluation metrics.
Main Contributions: The major contributions of this thesis are as follows:
- a new reference corpus in which instances of location named entities have been manually annotated with spatial grounding information for populated places;
- a new method and implemented system to resolve toponyms that is capable of robustly processing unseen text (open-domain online newswire text) and grounding toponym instances in an extensional model using longitude and latitude coordinates and hierarchical path descriptions, and a comparison between a replicated method as described in the literature, which functions as a baseline, and a novel algorithm based on minimality heuristics; and
- an empirical analysis of the relative utility of various heuristic biases and other sources of evidence.
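Editorial illustration (not part of the dissertation): the task and the evaluation setup described in the abstract can be made concrete with a small Python sketch. Everything in it is an assumption made for illustration only. The toy gazetteer, its approximate coordinates and rough population figures, the function names, the largest-population heuristic, and the distance-tolerance matching criterion are all hypothetical stand-ins. They are not the thesis's gazetteer, its minimality-based algorithm, or its task-specific matching criterion.

```python
import math

# Hypothetical toy gazetteer: toponym -> candidate referents as
# (label, latitude, longitude, rough population). Values are approximate.
GAZETTEER = {
    "London": [
        ("London, England, UK", 51.5074, -0.1278, 8_000_000),
        ("London, Ontario, Canada", 42.9849, -81.2453, 400_000),
    ],
    "Edinburgh": [
        ("Edinburgh, Scotland, UK", 55.9533, -3.1883, 500_000),
    ],
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def resolve_by_population(name):
    """Naive resolver (assumed heuristic): pick the most populous candidate.

    Returns (lat, lon), or None if the name is not in the gazetteer.
    """
    candidates = GAZETTEER.get(name)
    if not candidates:
        return None
    _, lat, lon, _ = max(candidates, key=lambda c: c[3])
    return (lat, lon)


def precision_recall(predicted, gold, tolerance_km=10.0):
    """Precision and recall over toponym instances.

    Assumed matching criterion: a prediction is correct if it lies within
    tolerance_km of the gold coordinates for the same instance id.
    """
    attempted = {k: v for k, v in predicted.items() if v is not None}
    correct = sum(
        1
        for k, v in attempted.items()
        if k in gold and haversine_km(*v, *gold[k]) <= tolerance_km
    )
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Four toponym instances from an imaginary text, with hand-assigned gold referents.
instances = {0: "London", 1: "London", 2: "Edinburgh", 3: "Springfield"}
gold = {
    0: (51.5074, -0.1278),   # London, UK
    1: (42.9849, -81.2453),  # London, Ontario
    2: (55.9533, -3.1883),   # Edinburgh
    3: (39.7817, -89.6501),  # a Springfield that is missing from the toy gazetteer
}
predicted = {i: resolve_by_population(name) for i, name in instances.items()}
precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy run the resolver attempts three of the four instances and gets two of them right, so precision is 2/3 and recall is 1/2. The second "London" is wrongly grounded to the UK, which is the kind of referential ambiguity the abstract describes. Once both names are grounded, a question like 'How far is London from Edinburgh?' reduces to a call to `haversine_km`.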
Page Updated: 14-Nov-2012