LINGUIST List 18.2213
Mon Jul 23 2007
Diss: Comp Ling: Schaefer: 'Integrating Deep and Shallow Natural La...'
Editor for this issue: Hunter Lockwood
<hunter linguistlist.org>
Directory
1. Ulrich Schaefer, Integrating Deep and Shallow Natural Language Processing Components: Representations and hybrid architectures
Message 1: Integrating Deep and Shallow Natural Language Processing Components: Representations and hybrid architectures
Date: 23-Jul-2007
From: Ulrich Schaefer <ulrich.schaefer dfki.de>
Subject: Integrating Deep and Shallow Natural Language Processing Components: Representations and hybrid architectures
Institution: Saarland University
Program: Department of Computer Science
Dissertation Status: Completed
Degree Date: 2006
Author: Ulrich Schaefer
Dissertation Title: Integrating Deep and Shallow Natural Language Processing Components: Representations and hybrid architectures
Linguistic Field(s):
Computational Linguistics
Dissertation Director:
Hans Uszkoreit
Wolfgang Wahlster
Dissertation Abstract:
We describe basic concepts and software architectures for the integration of shallow and deep (linguistics-based, semantics-oriented) natural language processing (NLP) components. The main goal of this novel, hybrid integration paradigm is to improve the robustness of deep processing. After an introduction to constraint-based natural language parsing, we give an overview of typical shallow processing tasks. We introduce XML standoff markup as an additional abstraction layer that eases the integration of NLP components, and propose the use of XSLT as a standardized and efficient transformation language for online NLP integration. In the main part of the thesis, we describe our contributions to three hybrid architecture frameworks that make use of these fundamentals. SProUT is a shallow system that uses elements of deep constraint-based processing, namely a type hierarchy and typed feature structures. Whiteboard is the first hybrid architecture to integrate not only part-of-speech tagging, but also named entity recognition and topological parsing, with deep parsing. Finally, we present Heart of Gold, a middleware architecture that generalizes Whiteboard along several dimensions, such as configurability, multilinguality, and flexible processing strategies. We describe various applications implemented using the hybrid frameworks, such as structured named entity recognition, information extraction, creative document authoring support, and deep question analysis, and we report evaluations. In Whiteboard, for example, shallow pre-processing was shown to increase both the coverage and the efficiency of deep parsing by a factor of more than two. Heart of Gold not only forms the basis for applications that utilize semantics-oriented natural language analysis, but also constitutes a complex research instrument for experimenting with novel processing strategies that combine deep and shallow methods, and it eases replication and comparability of results.
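The idea of XML standoff markup mentioned in the abstract can be illustrated with a minimal sketch. It is not the format used by SProUT, Whiteboard, or Heart of Gold; the element names (layer, span), the attribute names, the example sentence, and the tag labels are all assumptions made for illustration. The thesis proposes XSLT for combining such annotations; the sketch does the combining step in plain Python instead, only to show why shared character offsets make independently produced annotations easy to integrate.

```python
import re
import xml.etree.ElementTree as ET

# Illustrative input; the source text is never modified by any annotation.
TEXT = "Ulrich Schaefer works in Saarbruecken."


def standoff_layer(name, spans):
    """Build one standoff layer: each span points into the text by offsets."""
    layer = ET.Element("layer", {"name": name})
    for start, end, label in spans:
        ET.SubElement(
            layer, "span",
            {"start": str(start), "end": str(end), "label": label},
        )
    return layer


def locate(text, phrase):
    """Return (start, end) character offsets of the first match of phrase."""
    start = text.index(phrase)
    return start, start + len(phrase)


def covered_text(text, span):
    """Recover the string a standoff span refers to."""
    return text[int(span.get("start")):int(span.get("end"))]


def tokens_inside(text, outer, token_layer):
    """List the tokens of token_layer lying fully inside the outer span."""
    o_start, o_end = int(outer.get("start")), int(outer.get("end"))
    return [
        covered_text(text, tok)
        for tok in token_layer.iter("span")
        if o_start <= int(tok.get("start")) and int(tok.get("end")) <= o_end
    ]


# Hypothetical output of a part-of-speech tagger (labels are made up here).
matches = list(re.finditer(r"\w+|[^\w\s]", TEXT))
tags = ["NNP", "NNP", "VBZ", "IN", "NNP", "."]
pos = standoff_layer(
    "pos", [(m.start(), m.end(), tag) for m, tag in zip(matches, tags)]
)

# Hypothetical output of a named entity recognizer over the same text.
ne = standoff_layer("ne", [
    (*locate(TEXT, "Ulrich Schaefer"), "PERSON"),
    (*locate(TEXT, "Saarbruecken"), "LOCATION"),
])

if __name__ == "__main__":
    print(ET.tostring(ne, encoding="unicode"))
    for entity in ne.iter("span"):
        print(entity.get("label"), tokens_inside(TEXT, entity, pos))
```

Because both layers refer to the same text only through offsets, neither component needs to know about the other's output format, and further layers could be added without touching the existing ones.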