LINGUIST List 18.1455
Mon May 14 2007
Diss: Computational Ling/Text&Corpus Ling/Translation: Chandra: 'Ma...'
Editor for this issue: Hunter Lockwood
<hunter linguistlist.org>
To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.html.
Directory
1. Subhash Chandra, Machine Recognition and Morphological Analysis of Subanta-Padas
Message 1: Machine Recognition and Morphological Analysis of Subanta-Padas
Date: 11-May-2007
From: Subhash Chandra <subhash.jnu gmail.com>
Subject: Machine Recognition and Morphological Analysis of Subanta-Padas
Institution: Jawaharlal Nehru University, New Delhi
Program: Special Centre for Sanskrit Studies (SCSS)
Dissertation Status: Completed
Degree Date: 2006

Author: Subhash Chandra

Dissertation Title: Machine Recognition and Morphological Analysis of Subanta-Padas

Linguistic Field(s): Computational Linguistics; Text/Corpus Linguistics; Translation

Subject Language(s): Sanskrit (san)

Dissertation Director(s): Girish Nath Jha

Dissertation Abstract:

The Indian Heritage Group of the Centre for Development of Advanced Computing (CDAC) has developed a system called DESIKA, which claims to process all the words of Sanskrit and covers both generation and analysis (parsing). The Rashtriya Sanskrit Vidyapeeth, Tirupathi, under the leadership of Prof. K. V. Ramakrishnamacharyulu (currently Vice Chancellor of Rajasthan Sanskrit University), has done commendable work on the Sansk-net project. Prof. Vineet Chaitanya and Amba Kulkarni are visiting the institution and are currently guiding several Sanskrit R&D initiatives with far-reaching consequences. The Academy of Sanskrit Research, Melkote, Mysore has been actively involved in bringing scholars doing technology R&D for Sanskrit and shAstras onto a single platform. The Special Centre for Sanskrit Studies, Jawaharlal Nehru University, New Delhi is currently engaged in the following R&D: a kAraka analyzer, a sandhi splitter and analyzer, a verb analyzer, NP gender agreement, POS tagging of Sanskrit, an online multilingual amarakosha, a search engine for Panini's AshTAdhyAyI, and online MahAbhArata indexing. In addition, Jha (2006) presented a model of a Sanskrit Analysis System (SAS). The RCILTS project under Prof. G. V. Singh at the School of Computer and Systems Sciences has prepared useful linguistic resources for Sanskrit.
Morphological analyzers for Sanskrit, Telugu, Hindi, Marathi, Kannada and Punjabi have been developed by the Akshar Bharati group at the Indian Institute of Technology, Kanpur, and the University of Hyderabad, with funding from the Ministry of Information Technology. The project claims 95% coverage for Telugu (arbitrary text in modern standard Telugu) and 88% coverage for Hindi. The system is available for download as well as online at:
http://www.iiit.net/ltrc/morph/index.htm
Anusaaraka (developed by the Akshar Bharati group, IIIT, Hyderabad) is a software system that renders text from one Indian language into another, a sort of machine translation. Its output is comprehensible to the reader, although at times it may not be grammatical. The system is available at the IIIT Hyderabad site.

How is this work different?

The work differs from existing research in the following ways:

1. To date, no online RDBMS-based recognizer-analyzer has been available that accepts input and displays results in Unicode Devanagari script; this system takes Unicode Devanagari text as input and displays its results in Devanagari.
2. It takes Devanagari UTF-8 text as input and delivers Devanagari UTF-8 output, using Java servlet, Apache Tomcat, JDBC and RDBMS technology.
3. It gives a comprehensive computational analysis of the subanta-padas in a Sanskrit text, and also does basic tagging of verbs and avyayas.
4. It uses a hybrid approach to process input text: it works on the morphological nature of bases and applies vibhakti information during processing.
5. It can be used for larger-scale processing of Sanskrit, such as text simplification and machine translation.

Summary of chapters

Chapter I discusses morphological analyzers, the current status of R&D in this field, the structure and organization of the AshTAdhyAyI (AD), and Panini's treatment of subanta.
Chapter II discusses Panini's subanta formalism and mechanisms for recognizing verbs, avyayas and subantas in Sanskrit text.
Chapter III discusses the analysis of subanta-padas.
Chapter IV discusses implementation aspects: the front end, Java objects, databases, and linguistic resources (corpus, rule bases and example bases); how these components work; the basic requirements of the system; and how sandhi and subanta rules are applied wherever necessary.
The Conclusion discusses future R&D, the limitations of the system and an analysis of the results.
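The recognition step summarized for Chapter II (separating verbs, avyayas and subantas) can be illustrated with a minimal Python sketch. It is not the dissertation's method: the avyaya list, the small set of verb endings, the lookup order (avyaya list first, then verb endings, otherwise subanta) and the names `tokenize`, `tag` and `tag_sentence` are all hypothetical choices made for illustration.

```python
import re

# Hypothetical closed list of avyayas (indeclinables); a real system would
# draw on a much larger lexical resource.
AVYAYAS = {"च", "न", "अपि", "इति", "एव", "तत्र", "सह"}

# Hypothetical verb cue: a few present-tense parasmaipada endings
# (3rd pl -nti, 3rd sg -ti, 1st sg -mi). This is a heuristic, not a verb
# analyzer, and it can mis-tag nominal forms that happen to end this way.
VERB_ENDINGS = ("न्ति", "ति", "मि")


def tokenize(text: str) -> list:
    """Split Devanagari text on whitespace and danda punctuation (। and ॥)."""
    return [tok for tok in re.split(r"[\s।॥]+", text) if tok]


def tag(token: str) -> str:
    """Tag one token. The avyaya lookup runs first, so इति is not read as a verb."""
    if token in AVYAYAS:
        return "avyaya"
    if token.endswith(VERB_ENDINGS):
        return "verb"
    # Everything else is passed on as a candidate subanta-pada.
    return "subanta"


def tag_sentence(text: str) -> list:
    """Return (token, tag) pairs for a sentence."""
    return [(tok, tag(tok)) for tok in tokenize(text)]


if __name__ == "__main__":
    print(tag_sentence("बालाः ग्रामं गच्छन्ति ।"))
    # [('बालाः', 'subanta'), ('ग्रामं', 'subanta'), ('गच्छन्ति', 'verb')]
```

Checking the closed avyaya list before any suffix test keeps frequent indeclinables such as इति out of the verb bucket; tokens tagged `subanta` would then be handed to a nominal analyzer.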
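The analysis of subanta-padas against a database of bases and vibhakti endings (point 4 and the Chapter III and IV summaries) can be sketched as follows. This sketch assumes a great deal. The original system uses Java servlets and JDBC, and Python's built-in `sqlite3` module stands in for that RDBMS here. The table layout, the three example bases and the helper names `build_db` and `analyze_subanta` are hypothetical. The ending table covers only the masculine a-stem (deva) paradigm, omits the vocative, and ignores n-retroflexion (रामेण) and sandhi.

```python
import sqlite3
import unicodedata

# Hypothetical rule base: case endings of masculine a-stem nouns following
# the deva paradigm, stored as (suffix, vibhakti 1-7, vacana).
# Vocative is omitted and n-retroflexion (e.g. रामेण) is not modelled.
ENDINGS = [
    ("ः", 1, "eka"), ("ौ", 1, "dvi"), ("ाः", 1, "bahu"),
    ("म्", 2, "eka"), ("ौ", 2, "dvi"), ("ान्", 2, "bahu"),
    ("ेन", 3, "eka"), ("ाभ्याम्", 3, "dvi"), ("ैः", 3, "bahu"),
    ("ाय", 4, "eka"), ("ाभ्याम्", 4, "dvi"), ("ेभ्यः", 4, "bahu"),
    ("ात्", 5, "eka"), ("ाभ्याम्", 5, "dvi"), ("ेभ्यः", 5, "bahu"),
    ("स्य", 6, "eka"), ("योः", 6, "dvi"), ("ानाम्", 6, "bahu"),
    ("े", 7, "eka"), ("योः", 7, "dvi"), ("ेषु", 7, "bahu"),
]

# Hypothetical base lexicon of masculine a-stem nouns.
BASES = ["देव", "बाल", "जन"]


def build_db() -> sqlite3.Connection:
    """Load the endings and bases into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE endings (suffix TEXT, vibhakti INTEGER, vacana TEXT)")
    conn.execute("CREATE TABLE bases (base TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO endings VALUES (?, ?, ?)", ENDINGS)
    conn.executemany("INSERT INTO bases VALUES (?)", [(b,) for b in BASES])
    return conn


def analyze_subanta(conn: sqlite3.Connection, word: str) -> list:
    """Return every (base, vibhakti, vacana) whose base + ending spells `word`."""
    # NFC normalization keeps Devanagari input in one canonical code-point form.
    word = unicodedata.normalize("NFC", word)
    return conn.execute(
        """
        SELECT b.base, e.vibhakti, e.vacana
        FROM bases AS b
        JOIN endings AS e ON b.base || e.suffix = ?
        ORDER BY e.vibhakti
        """,
        (word,),
    ).fetchall()


if __name__ == "__main__":
    conn = build_db()
    raw = "देवाभ्याम्".encode("utf-8")  # e.g. UTF-8 bytes received from a web form
    print(analyze_subanta(conn, raw.decode("utf-8")))
    # [('देव', 3, 'dvi'), ('देव', 4, 'dvi'), ('देव', 5, 'dvi')]
```

Returning every matching row, rather than stopping at the first match, keeps syncretic forms such as देवौ (prathamA or dvitIyA dual) visibly ambiguous. Choosing among them would need context, for example kAraka information.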