

Academic Paper


Title: Enhancing Effectiveness of Sentence Alignment in Parallel Corpora: Using MT & heuristics
Paper URL: http://researchweb.iiit.ac.in/~kirankumar/pub/icon08.pdf
Author: Kiran Pala
Institution: International Institute of Information Technology, Hyderabad
Linguistic Field: Computational Linguistics
Subject Language: English, Hindi
Abstract: India is a multilingual, linguistically dense and diverse country with rich information resources. Parallel corpora play a major role in multilingual natural language processing, computational linguistics, speech, and information retrieval. This paper describes a system for aligning English-Hindi texts in the GyanNidhi corpus at the sentence level. The alignment criteria combine linguistic information, statistical information, and simple heuristics. We use a multi-feature approach in which Anusaaraka (a machine translation system), a Hindi shallow parser, and Hindi WordNet lookup serve as the primary technique, drawing on target-language resources to increase alignment accuracy. Other features, such as named entities, linguistic information, and notation converters, are used to match words across one-to-many bilingual sentences. Our experiments are based on the GyanNidhi corpus. We obtained 92.06% accuracy for English-to-Hindi sentence alignment, with 95.68% precision and 88.09% recall for one-to-many sentence alignment. The study also suggests procedures for aligning parallel translated corpora using a machine translation system.
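Illustrative sketch (not the paper's system): the idea of matching words across one-to-many sentence pairs can be shown with a small Python example. Everything here is an assumption made for illustration: the toy romanized lexicon stands in for resources such as a machine translation system or WordNet lookup, the greedy monotonic search and the `max_span` parameter are hypothetical choices, and the precision/recall helper uses a simple exact-match definition over alignment links that may differ from the evaluation used in the paper.

```python
from typing import Dict, List, Set, Tuple

Link = Tuple[int, Tuple[int, ...]]  # (source index, aligned target indices)


def overlap_score(src: List[str], tgt: List[str],
                  lexicon: Dict[str, Set[str]]) -> float:
    """Fraction of source tokens with a lexicon translation in the target."""
    if not src:
        return 0.0
    tgt_set = set(tgt)
    hits = sum(1 for tok in src if lexicon.get(tok, set()) & tgt_set)
    return hits / len(src)


def align(src_sents: List[List[str]], tgt_sents: List[List[str]],
          lexicon: Dict[str, Set[str]], max_span: int = 2) -> List[Link]:
    """Greedy monotonic aligner: each source sentence takes the next
    1..max_span target sentences, whichever span scores highest."""
    links: List[Link] = []
    cursor = 0
    for i, src in enumerate(src_sents):
        if cursor >= len(tgt_sents):
            break
        best_span, best_score = 1, -1.0
        for span in range(1, max_span + 1):
            if cursor + span > len(tgt_sents):
                break
            merged = [t for sent in tgt_sents[cursor:cursor + span] for t in sent]
            score = overlap_score(src, merged, lexicon)
            if score > best_score:  # strict '>' prefers the shorter span on ties
                best_span, best_score = span, score
        links.append((i, tuple(range(cursor, cursor + best_span))))
        cursor += best_span
    return links


def precision_recall(predicted: List[Link], gold: List[Link]) -> Tuple[float, float]:
    """Exact-match precision and recall over alignment links."""
    pred, ref = set(predicted), set(gold)
    correct = len(pred & ref)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(ref) if ref else 0.0
    return precision, recall


# Toy data (hypothetical, romanized for readability).
lexicon = {
    "ram": {"raam"}, "went": {"gaya"}, "home": {"ghar"}, "slept": {"so"},
    "sita": {"sita"}, "reads": {"padhti"}, "book": {"kitaab"},
}
src_sents = [["ram", "went", "home", "and", "slept"], ["sita", "reads", "book"]]
tgt_sents = [["raam", "ghar", "gaya"], ["vah", "so", "gaya"],
             ["sita", "kitaab", "padhti", "hai"]]
gold = [(0, (0, 1)), (1, (2,))]

links = align(src_sents, tgt_sents, lexicon)
precision, recall = precision_recall(links, gold)
```

In this toy case the first English sentence covers two Hindi sentences, so the two-sentence span scores higher than the single-sentence span and is chosen as a one-to-many link.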
Type: Collection
Status: Completed
Venue: Pune, India
Publication Info: In Proceedings of the Sixth International Conference on Natural Language Processing (ICON).