Academic Paper

Title: Computational Identification and Analysis of Complicated Sanskrit
Author: Subhash Chandra
Institution: Centre for Development of Advanced Computing
Linguistic Field: Computational Linguistics; Morphology; Syntax; Translation
Subject Language: Sanskrit
Abstract: This paper presents a model for the computational identification and analysis of complicated Sanskrit noun phrases (NPs; nominal morphology, or Sanskrit subanta-padas) in Sanskrit text. Simple forms, which are strictly rule-governed and fall into patterns, are not very difficult to analyze. However, there are several complicated and ambiguous forms which pose a challenge for analyzers. The purpose of this paper is to put forth a strategy and algorithm which can enable any Sanskrit parser to recognize and analyze these complicated NPs. Identification includes separating the NPs from verb phrases (VPs; tinanta) by a strategy of isolating verbs and indeclinables. Analysis includes splitting each NP into its sub-constituents: the base (PDK; praatipadik, any meaningful form of a word which is neither a root nor a suffix) and the case-number markers (KVV; karaka-vacana-vibhakti). Sanskrit is a heavily inflected language and depends on serial inflections on nouns and verbs for the communication of meaning. A fully inflected unit is called a pada (usable word); padas are either NPs or VPs. Identifying and analyzing these inflections is therefore critical to any further processing of Sanskrit.

According to Paanini, there are 21 nominal inflectional suffixes (seven vibhaktis and three numbers, 7 × 3 = 21) which are attached to the PDK according to the category, gender, number, and end-character of the base. Some forms of Sanskrit NPs can be very complicated for computational identification and analysis. For example, ramaah, bhavati, gacCati, etc. can each be either a nominal or a verbal construction. The pronominal forms pose another challenge, as in most of them the inflected forms cannot easily be related to their bases morphologically. We may have to posit ad-hoc rules and processing to handle them. For example, ‘aham’ (first person singular), ‘tvam’ (second person singular), and ‘sah’ (third person singular pronoun) are NPs formed from the bases ‘asmad’, ‘yusmad’, and ‘tad’, respectively, by inflecting for the nominative singular, and ‘amu’ is formed from the base ‘adas’ by inflecting for the nominative dual.

To identify NPs in Sanskrit text, the system first identifies punctuation, avyayas (indeclinables), and verbs, that is, the non-NPs. After these words have been identified, the system recognizes all remaining words as NPs and sends them on for analysis. Avyayas (AV) and VPs are identified with the help of the AV and VP databases. We have stored around 524 AV forms commonly used in modern Sanskrit, and about 500 commonly used verb roots with their forms for verb recognition, which gives around 90,000 verb forms stored in UTF-8 Unicode Devanagari script. Thus the NPs in Sanskrit text are identified by a process of exclusion: after the verbs and avyayas are identified by lexical pattern-matching search, the remaining words in the text are labeled NPs.

The system also has some basic requirements for use: (1) Java installed to support the Java web server; (2) the Apache Tomcat 4.0 web server installed; (3) Baraha or any other software for UTF-8 Unicode Devanagari input. A user whose machine lacks any of these cannot use the system.

The present work is an attempt to process Sanskrit NP inflections by way of Paanini’s rule system, an appropriate database, and an example base. The system developed is an online system run on the Apache Tomcat platform using Java servlets, with MS SQL Server 2005 as the back end and JDBC for connectivity. The goal is to simplify Sanskrit text for self-reading and understanding, and also for any Machine (Aided) Translation (MAT) from Sanskrit to other languages.
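The identification-by-exclusion step and the table-based handling of irregular pronominal forms described in the abstract can be illustrated with a minimal sketch. This is not the author's implementation (the system itself is built on Java servlets with a database back end); it is written in Python for brevity. The function names `label_tokens` and `analyze_pronominal`, the tiny in-memory lexicons, and the assumption that tokens (including the danda) are separated by whitespace are all hypothetical choices made for illustration.

```python
# Illustrative sketch only. The miniature lexicons below are hypothetical
# stand-ins for the avyaya and verb-form databases mentioned in the abstract.
AVYAYAS = {"च", "अपि", "एव", "न"}
VERB_FORMS = {"गच्छति", "पठति", "भवति"}
PUNCTUATION = {"।", "॥", ",", "?", "!"}

# Irregular pronominal forms mapped to (base, case, number).
# The entries follow the examples given in the abstract.
PRONOMINAL_FORMS = {
    "अहम्": ("अस्मद्", "nominative", "singular"),
    "त्वम्": ("युष्मद्", "nominative", "singular"),
    "सः": ("तद्", "nominative", "singular"),
}


def label_tokens(text):
    """Label each whitespace-separated token as PUNCT, AV, VP, or NP.

    NP is assigned by exclusion: any token that is not punctuation,
    an avyaya, or a stored verb form is treated as a noun phrase.
    """
    labels = []
    for token in text.split():
        if token in PUNCTUATION:
            tag = "PUNCT"
        elif token in AVYAYAS:
            tag = "AV"
        elif token in VERB_FORMS:
            tag = "VP"
        else:
            tag = "NP"
        labels.append((token, tag))
    return labels


def analyze_pronominal(token):
    """Return (base, case, number) for a listed irregular form, else None."""
    return PRONOMINAL_FORMS.get(token)


labels = label_tokens("सः वनं गच्छति ।")
noun_phrases = [token for token, tag in labels if tag == "NP"]
analyses = {token: analyze_pronominal(token) for token in noun_phrases}
```

Note that a plain lookup of this kind always labels a form such as ‘bhavati’ as a VP, so it does not by itself resolve the nominal/verbal ambiguity that the abstract identifies as a challenge; that would need additional context-sensitive processing.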
Type: Individual Paper
Status: Completed
Venue: Allahabad, India
Publication Info: ICCS, Allahabad, Proceedings
