Title:
|
Example-Based Parsing for Resource-Deficient Languages
|
|
Author:
|
Shailly Goyal
|
|
Degree Awarded:
|
Indian Institute of Technology, Delhi, Ph.D.
|
|
Degree Date:
|
2007
|
|
Linguistic Subfield(s):
|
Computational Linguistics
Morphology
Syntax
Text/Corpus Linguistics
|
|
Subject Language(s):
|
English
Hindi
|
|
Director(s):
|
Niladri Chatterjee
|
|
|
Abstract:
|
|
The aim of the present research is to develop parsing schemes for natural language sentences. A parsed corpus is essential for various natural language processing (NLP) activities, but its availability cannot be guaranteed for most languages. Furthermore, developing a parsed corpus or a parser is not an easy task with the traditional approaches, viz. rule-based and statistical, because their success almost invariably demands a huge amount of computational resources that are typically not available for most natural languages. We feel that example-based (EB) approaches can serve as suitable alternatives at this juncture.
The major advantage of these approaches is that their demand on computational resources is much lower than that of the traditional approaches, yet they are useful in developing robust techniques, as is envisaged in many areas of artificial intelligence, NLP in particular. In this work, we have pursued the following two aspects of example-based parsing:
Bilingual Parsing: In this methodology, a sentence is parsed using the parse of its parallel sentence. While projecting the syntactic relations from one language to the other, we have considered the similarities as well as the dissimilarities between the two languages. Hence, we are able to develop generalized schemes that can work on a wide variety of source-target language pairs. We have developed parsing schemes for simple as well as complex sentences.
Monolingual Parsing: In this scheme, a sentence is parsed using parse knowledge acquired from a parsed example base of the same language. We have devised ways to deal effectively with various problems, such as unknown words, free word order, and morphological variations.
We have developed both these schemes in a generalized way, with minimal dependence on linguistic knowledge, so that they can be used across a wide spectrum of languages. In this work we have done a thorough case study on Hindi. For the finer details of the parsing schemes, where linguistic details are unavoidable, we have taken English and Hindi as the source and target languages, respectively. Furthermore, we have chosen link grammar as the underlying grammar for representing the parse of sentences. One fundamental requirement, therefore, is a link grammar for each language under consideration. Since no such grammar exists for Hindi, we have developed one, again following an example-based approach.
Development of Hindi Link Grammar: Instead of developing the link grammar for Hindi from scratch, we have made appropriate modifications to the English link grammar to suit the requirements of Hindi grammar. We have shown how English links can be adapted for Hindi by taking care of the various grammatical nuances (e.g. free word order, noun and verb morphology, influence of subject/object on verb morphology) that make Hindi grammar distinctly different from English.
The parsing schemes developed in this work have been implemented and tested on a reasonably sized example base, and even so we have been able to demonstrate their efficacy clearly. We feel that our research will pave the way for quick development of parsers for other languages.
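As an illustration only, the bilingual idea of parsing a sentence from the parse of its parallel sentence can be sketched as projecting links through a word alignment. The sketch below is not the scheme developed in the thesis: the function name project_links, the (left, right, label) link representation, the hand-written alignment, and the reuse of the English link labels S (subject-verb) and O (verb-object) unchanged on the Hindi side are all assumptions made for the example. It also ignores the dissimilarities between the two languages that the abstract says the actual schemes account for.

```python
from typing import Dict, List, Tuple

# A link joins two word positions and carries a label (hypothetical representation).
Link = Tuple[int, int, str]


def project_links(source_links: List[Link], alignment: Dict[int, int]) -> List[Link]:
    """Project source-language links onto the target sentence.

    alignment maps a source word index to the index of its target counterpart.
    Links with an unaligned endpoint, or whose endpoints collapse onto the
    same target word, are dropped in this simplified sketch.
    """
    projected = set()
    for left, right, label in source_links:
        if left not in alignment or right not in alignment:
            continue  # no counterpart in the target sentence
        a, b = alignment[left], alignment[right]
        if a == b:
            continue  # both ends map to one word, so no link can be drawn
        # Word order may differ between languages, so re-order the endpoints.
        projected.add((min(a, b), max(a, b), label))
    return sorted(projected)


# English: "Ram eats mangoes"  ->  0:Ram 1:eats 2:mangoes
english_links = [(0, 1, "S"), (1, 2, "O")]

# Hindi: "राम आम खाता है"  ->  0:राम 1:आम 2:खाता 3:है
# Assumed word alignment: Ram->राम, eats->खाता, mangoes->आम
alignment = {0: 0, 1: 2, 2: 1}

hindi_links = project_links(english_links, alignment)
print(hindi_links)  # [(0, 2, 'S'), (1, 2, 'O')]
```

Note that the auxiliary है receives no link here because it has no aligned English word; handling such cases is exactly where language-specific knowledge, such as a Hindi link grammar, would be needed.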
|
|
|
|
|
Page Updated: 26-Nov-2009