LINGUIST List 14.2786

Wed Oct 15 2003

Diss: Computational Ling: Ramaswamy: 'A...'

Editor for this issue: Takako Matsui <takolinguistlist.org>


Directory

  1. myholyword, A Morphological Analyzer for Tamil

Message 1: A Morphological Analyzer for Tamil

Date: Tue, 14 Oct 2003 05:50:38 +0000
From: myholyword <myholywordhotmail.com>
Subject: A Morphological Analyzer for Tamil

Institution: University of Hyderabad
Program: Centre for Applied Linguistics and Translation Studies
Dissertation Status: Completed
Degree Date: 2003

Author: Vaishnavi Ramaswamy 

Dissertation Title: A Morphological Analyzer for Tamil

Linguistic Field: Computational Linguistics

Dissertation Director 1: G. Uma Maheshwara Rao

Dissertation Abstract: 

This thesis deals with the design and implementation of a
morphological analyzer for the Tamil language. It also includes a
comparative study of certain other models of morphological processing,
analyzing the advantages of each in terms of its suitability for
adaptation to a language like Tamil. The primary aim is to construct
a complete morphological module for Tamil that could be used in any
NLP application, such as a spell checker, POS tagger, or parser.

Aspects of designing a computational model for morphological analysis
include:

1) Deciding on a model based on psycholinguistic factors.
2) Designing formal methods/techniques that enable theoretical
descriptions to be converted into computational models.

The analyzer under consideration relies on a theoretical blend of the
Item-and-Arrangement (IA) and Item-and-Process (IP) approaches to
morphological decomposition. Where automatic phonological rules
largely apply, IP is used. Where complex but non-automatic
morphophonemics (sandhi) is involved, IA is the choice.

Qualitative and quantitative methods in corpus linguistics were
employed to extract frequency counts and collocations of words. All
possible contexts of occurrence and usage of a word were studied. For
every grammatical category of the language, a list of the minimum
number of word-forms required for sufficient coverage was extracted.
Based on such attributes, and considering the coverage and efficiency
required of a morphological analyzer, an essential set of
morphological paradigms for each word class in Tamil was established.
This served as a database comprising tables of the inflectional forms
of a word, for all the words in the language.
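
As a rough illustration of how paradigm tables of this kind can drive
analysis, the following is a minimal sketch only, not the thesis
implementation. The names PARADIGMS, LEXICON, and analyze, the
paradigm label "noun_am", and the romanized forms of "maram" (tree)
are all assumptions introduced here for illustration.

```python
# Hypothetical paradigm table: ending -> feature label for one noun class.
# Illustrative romanized forms only; not taken from the thesis.
PARADIGMS = {
    "noun_am": {  # nouns whose citation form ends in -am, e.g. "maram"
        "am": "nominative",
        "attai": "accusative",
        "attukku": "dative",
        "attil": "locative",
    },
}

# Hypothetical lexicon: stem (citation form minus its ending) -> (lemma, paradigm).
LEXICON = {"mar": ("maram", "noun_am")}


def analyze(word):
    """Return every (lemma, feature) pair found by trying each stem/ending split."""
    analyses = []
    for i in range(1, len(word)):
        stem, ending = word[:i], word[i:]
        entry = LEXICON.get(stem)
        if entry is None:
            continue  # this split does not start with a known stem
        lemma, paradigm = entry
        feature = PARADIGMS[paradigm].get(ending)
        if feature is not None:
            analyses.append((lemma, feature))
    return analyses
```

Calling analyze("marattukku") with these toy tables yields a single
analysis pairing the lemma "maram" with the dative label. A word with
no matching stem and ending yields an empty list.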

Two other well-established models of morphological analysis, AMPLE
and KIMMO, were also analyzed for the purpose of comparison. Both have
served as good platforms for implementing morphological analyzers in
various languages. Implementations based on them were compared with
the Tamil Morph developed here, taking into consideration factors
such as the cost of implementation in terms of effort and time,
coverage, and efficiency.