Dissertation Information


Title: Towards the Development of an Automatic Diacritizer for the Persian Orthography based on the Xerox Finite State Transducer
Author: Peyman Nojoumian
Homepage: http://dornsife.usc.edu/cf/faculty-and-staff/faculty.cfm?pid=1038534
Institution: University of Ottawa, Department of Linguistics
Completed in: 2011
Linguistic Subfield(s): Computational Linguistics
Director(s): Paul Hirschbühler; Diana Inkpen

Abstract: Because Persian orthography omits short vowels (diacritics), many
Natural Language Processing applications for the language, including
information retrieval, machine translation, text-to-speech, and automatic
speech recognition, must disambiguate their input before any further
processing. In machine translation, for example, the whole text must first
be correctly diacritized so that the correct words, parts of speech, and
meanings are matched and retrieved from the lexicon. The core engine of any
Persian language processor should therefore include a diacritizer and a
lexical disambiguator. This dissertation describes the design and
implementation of an automatic diacritizer for Persian based on the
state-of-the-art Finite State Transducer technology developed at Xerox
(Beesley & Karttunen 2003). Results of morphological analysis and
generation on a test corpus are reported, including the insertion of
diacritics. The study also examines the phonological and semantic
ambiguities that arise because the writing system omits short vowels. It
proposes a hybrid (rule-based and inductive) model, inspired by
psycholinguistic experiments on the human mental lexicon, that
disambiguates heterophonic homographs in Persian using frequency and
collocation information. A syntactic parser could be built on the proposed
model to detect Ezafe (the linking short vowel /e/ within a noun phrase) or
to disambiguate homographs; its implementation is left for future work.
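
The idea of choosing among heterophonic homographs by frequency and collocation can be illustrated with a minimal Python sketch. It is not the dissertation's implementation, which is built on the Xerox finite-state tools. The function name, the Latin transliterations, and all counts below are invented for illustration. The sketch assumes an undiacritized token maps to a small set of candidate readings, prefers the reading with the strongest collocation evidence in the context, and falls back to unigram frequency.

```python
from collections import Counter

# Toy data, invented for illustration only (not taken from the dissertation).
# Keys are undiacritized (vowel-less) transliterations; values are candidate readings.
CANDIDATES = {"krm": ["kerm", "kerem", "karam"]}  # worm, cream, generosity

# Hypothetical unigram counts for each diacritized reading.
FREQUENCY = Counter({"kerm": 5, "kerem": 8, "karam": 3})

# Hypothetical co-occurrence counts of (reading, undiacritized context word).
COLLOCATIONS = Counter({("kerem", "dst"): 4, ("kerm", "xak"): 6})


def disambiguate(token, context):
    """Pick a reading for `token` given the other words in its context.

    Collocation evidence is compared first; unigram frequency breaks ties.
    Tokens with no known candidates are returned unchanged.
    """
    candidates = CANDIDATES.get(token)
    if not candidates:
        return token

    def score(form):
        colloc = sum(COLLOCATIONS[(form, word)] for word in context)
        return (colloc, FREQUENCY[form])

    return max(candidates, key=score)
```

With these toy counts, calling `disambiguate("krm", ["xak"])` selects the reading supported by the collocation, while an empty context falls back to the most frequent reading. A real system would estimate both tables from a diacritized corpus.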