Date: 12-Mar-2008
From: Shailly Goyal <shaillygoyal gmail.com>
Subject: Example-Based Parsing for Resource-Deficient Languages
Institution: Indian Institute of Technology, Delhi
Program: Ph.D.
Dissertation Status: Completed
Degree Date: 2007
Author: Shailly Goyal
Dissertation Title: Example-Based Parsing for Resource-Deficient Languages
Linguistic Field(s): Computational Linguistics; Morphology; Syntax; Text/Corpus Linguistics
Subject Language(s): English (eng); Hindi (hin)
Dissertation Director(s): Niladri Chatterjee

Dissertation Abstract:

The aim of the present research is to develop parsing schemes for natural language sentences. A parsed corpus is essential for various natural language processing (NLP) activities, but its availability cannot be guaranteed for most languages. Furthermore, developing a parsed corpus or a parser is not an easy task using the traditional approaches, viz. rule-based and statistical. This is because the success of these approaches almost invariably demands a huge amount of computational resources, which are typically not available for most natural languages. We feel that example-based (EB) approaches can serve as suitable alternatives at this juncture. The major advantage of these approaches is that their demand on computational resources is much lower than that of the traditional approaches, yet EB approaches are useful in developing robust techniques, as is envisaged in many areas of artificial intelligence, and NLP in particular.

In this work, we have pursued the following two aspects of example-based parsing:

Bilingual Parsing: In this methodology a sentence is parsed using the parse of its parallel sentence in another language. While projecting the syntactic relations from one language to the other, we have considered the similarities as well as the dissimilarities between the two languages. Hence, we are able to develop generalized schemes that can work on a wide variety of source-target language pairs. We have developed parsing schemes for simple as well as complex sentences.
Monolingual Parsing: In this scheme a sentence is parsed by acquiring appropriate knowledge from a parsed example base of the same language. We have devised ways to deal effectively with various problems, such as unknown words, free word order, and morphological variation.

We have developed both these schemes in a generalized way, with minimal dependence on linguistic knowledge, so that they can be used across a wide spectrum of languages. In this work we have done a thorough case study on Hindi. For the finer details of the parsing schemes, where linguistic details are inevitable, we have considered English and Hindi as the source and target language, respectively. Furthermore, we have chosen link grammar as the underlying grammar for representing the parse of sentences. One fundamental requirement, therefore, is a link grammar for the languages under consideration. Since no such grammar exists for Hindi, we have developed one. For this task, too, we follow an example-based approach.

Development of Hindi Link Grammar: Instead of developing the link grammar for Hindi from scratch, we have made appropriate modifications to the English link grammar to suit the requirements of Hindi grammar. We have shown how English links can be adapted for Hindi by taking care of the grammatical nuances (e.g. free word order, noun and verb morphology, influence of subject/object on verb morphology) that make Hindi grammar distinctly different from English.

The parsing schemes developed in this work have been implemented and tested on a reasonably sized example base, and even with an example base of this size we have been able to demonstrate clearly the efficacy of these schemes. We feel that our research will pave the way for the quick development of parsers for other languages.
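
As a purely illustrative sketch of the bilingual idea described in the abstract, the Python fragment below projects link-grammar-style links from a parsed source sentence onto its parallel target sentence through a word alignment. It is not the dissertation's actual scheme: the function name project_links, the one-to-one alignment dictionary, the toy English/Hindi sentence pair, and the decision to simply drop links whose words are unaligned are all assumptions made for the example. The labels S (subject-verb) and O (verb-object) are borrowed from the English link grammar.

```python
from typing import Dict, List, Tuple

# A link joins two word positions and carries a label: (left, right, label).
Link = Tuple[int, int, str]


def project_links(source_links: List[Link],
                  alignment: Dict[int, int]) -> List[Link]:
    """Project source-sentence links onto the target sentence.

    `alignment` maps a source word index to a target word index
    (hypothetical one-to-one alignment; real alignments are messier).
    Links with an unaligned endpoint are dropped in this sketch.
    """
    projected = []
    for left, right, label in source_links:
        if left not in alignment or right not in alignment:
            continue  # no counterpart in the target sentence
        a, b = alignment[left], alignment[right]
        if a == b:
            continue  # both words map to one target word: no link to draw
        # Word order may differ between languages, so reorder the endpoints
        # to keep the left index smaller. This discards which endpoint was
        # originally on the left; a fuller scheme would record it.
        projected.append((min(a, b), max(a, b), label))
    return sorted(projected)


# Toy example (assumed data): English "Ram eats mangoes" (S-V-O order)
# and Hindi "raam aam khaataa" (S-O-V order).
english_links = [(0, 1, "S"), (1, 2, "O")]
alignment = {0: 0, 1: 2, 2: 1}  # Ram->raam, eats->khaataa, mangoes->aam
hindi_links = project_links(english_links, alignment)
print(hindi_links)
```

The sketch only handles the similarities between the two languages; the dissimilarities mentioned in the abstract (for instance words with no counterpart, or morphology that changes which links are valid) would need language-specific handling that is not shown here.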