LINGUIST List 23.4752

Wed Nov 14 2012

Diss: Comp Ling/ Discourse Analysis/ Pragmatics/ Text/Corpus Ling/ English: Leidner: 'Toponym Resolution in Text...'

Editor for this issue: Lili Xia <lxialinguistlist.org>



Date: 12-Nov-2012
From: Jochen Leidner <leidneracm.org>
Subject: Toponym Resolution in Text: Annotation, evaluation and applications of spatial grounding of place names

Institution: University of Edinburgh
Program: School of Informatics
Dissertation Status: Completed
Degree Date: 2007

Author: Jochen L. Leidner

Dissertation Title: Toponym Resolution in Text: Annotation, evaluation and applications of spatial grounding of place names

Dissertation URL: http://www.era.lib.ed.ac.uk/handle/1842/1849

Linguistic Field(s): Computational Linguistics, Discourse Analysis, Pragmatics, Text/Corpus Linguistics
Subject Language(s): English (eng)
Dissertation Director:
Bonnie Webber
Claire Grover
Dissertation Abstract:

Background: Spatial and temporal expressions refer to events in space-time, and the grounding of events is a precondition for reasoning. Thus, automatic grounding can improve many applications such as automatic map drawing and question answering (e.g., for questions like 'How far is London from Edinburgh?'). Whereas temporal grounding has received considerable attention, robust spatial grounding has long been neglected. I define the task of automatic Toponym Resolution as computing the mapping from instances of names for places as found in a text to a representation of the extensional semantics of the location referred to, such as a geographic latitude/longitude footprint. The mapping between names and locations is referentially ambiguous: London can refer to the capital of the UK, to London, Ontario, Canada, or to other Londons on earth.
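The ambiguity described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the toy gazetteer, the names GAZETTEER, candidates and haversine_km, and the approximate city-centre coordinates. The toy gazetteer also omits other places that share these names.

```python
import math

# Toy gazetteer (hypothetical, illustrative only): each name maps to a list
# of candidate referents with a hierarchical path and approximate coordinates.
GAZETTEER = {
    "London": [
        {"path": "London > England > United Kingdom", "lat": 51.5074, "lon": -0.1278},
        {"path": "London > Ontario > Canada", "lat": 42.9849, "lon": -81.2453},
    ],
    "Edinburgh": [
        {"path": "Edinburgh > Scotland > United Kingdom", "lat": 55.9533, "lon": -3.1883},
    ],
}


def candidates(name):
    """Return all candidate referents for a place name (empty list if unknown)."""
    return GAZETTEER.get(name, [])


def haversine_km(a, b):
    """Great-circle distance in kilometres between two referents."""
    lat1, lon1, lat2, lon2 = map(
        math.radians, (a["lat"], a["lon"], b["lat"], b["lon"])
    )
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * 6371.0 * math.asin(math.sqrt(h))  # 6371 km: mean Earth radius


# "How far is London from Edinburgh?" has one answer per candidate referent.
edinburgh = candidates("Edinburgh")[0]
distances = {c["path"]: haversine_km(c, edinburgh) for c in candidates("London")}
```

In this sketch the answer to the distance question depends entirely on which London is chosen, which is why the name has to be resolved to one referent before any spatial reasoning can take place.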

Objective: I investigate how referentially ambiguous spatial named entities can be grounded, or resolved, with respect to an extensional coordinate model robustly on open-domain news text.

Method: While a small number of previous attempts have been made to solve the toponym resolution problem, these were either not evaluated, or evaluation was done by manual inspection of system output instead of curating a reusable reference corpus. Since the relevant literature is scattered across several disciplines (libraries, information retrieval, natural language processing) and descriptions of algorithms are mostly given in informal prose, I attempt to systematically describe them and aim at a reconstruction in a uniform, semi-formal pseudo-code notation for easier re-implementation. A systematic comparison leads to an inventory of heuristics and other sources of evidence. In order to carry out a comparative evaluation procedure, an evaluation resource is required. Unfortunately, to date no gold standard has been curated in the research community. To this end, a reference gazetteer and an associated novel reference corpus with human-labeled referent annotation are created. These are subsequently used to benchmark a selection of the reconstructed algorithms and a novel re-combination of the heuristics cataloged in the inventory. I then compare the performance of the same toponym resolution algorithms under three different conditions, namely applying them to the output of human named entity annotation, automatic annotation using an existing Maximum Entropy sequence tagging model, and a naive toponym lookup procedure in a gazetteer.

Evaluation: The algorithms implemented in this thesis are evaluated in an intrinsic or component evaluation. To this end, I define a task-specific matching criterion to be used with traditional Precision and Recall evaluation metrics.
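As a rough illustration of how Precision and Recall can be computed once a matching criterion is fixed, the following Python sketch may help. The criterion used here (a predicted coordinate counts as correct if it lies within tolerance_km of the gold coordinate) is an assumption chosen for the example; the abstract does not state the criterion actually defined in the thesis. The names precision_recall, haversine_km and tolerance_km, and the sample data, are hypothetical.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def precision_recall(gold, predicted, tolerance_km=1.0):
    """Score predicted groundings against gold annotations.

    gold and predicted map a toponym instance id to a (lat, lon) pair.
    Assumed matching criterion: a prediction is correct if it is within
    tolerance_km of the gold coordinate for the same instance.
    """
    correct = sum(
        1
        for key, coord in predicted.items()
        if key in gold and haversine_km(coord, gold[key]) <= tolerance_km
    )
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical example: three annotated instances, two system predictions,
# one of which picks the wrong London.
gold = {
    "t1": (51.5074, -0.1278),   # London, UK
    "t2": (55.9533, -3.1883),   # Edinburgh
    "t3": (51.5074, -0.1278),   # London, UK
}
predicted = {
    "t1": (51.5074, -0.1278),   # correct
    "t3": (42.9849, -81.2453),  # wrong: London, Ontario
}
precision, recall = precision_recall(gold, predicted)
```

With this sample data one of two predictions is correct and one of three gold instances is recovered, so precision is 1/2 and recall is 1/3.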

Main Contributions: The major contributions of this thesis are as follows:
- a new reference corpus in which instances of location named entities have been manually annotated with spatial grounding information for populated places;
- a new method and implemented system to resolve toponyms that is capable of robustly processing unseen text (open-domain online newswire text) and grounding toponym instances in an extensional model using longitude and latitude coordinates and hierarchical path descriptions, and a comparison between a replicated method as described in the literature, which functions as a baseline, and a novel algorithm based on minimality heuristics; and
- an empirical analysis of the relative utility of various heuristic biases and other sources of evidence.



Page Updated: 14-Nov-2012