|Title:||Computational Morphosyntactic Analysis of Modern Greek|
|Author:||Giorgos Orphanos|
|Institution:||University of Patras, Department of Computer Engineering and Informatics|
|Linguistic Subfield(s):||Computational Linguistics|
|Abstract:||This dissertation addresses the computational morphosyntactic analysis of Modern Greek, an inflected natural language. Morphosyntactic analysis is a cognitive process that forms an intermediate layer between morphological and syntactic analysis and aims to assign unambiguous morphosyntactic information to the words of a text. By morphosyntactic information we mean the morphological origin and the morphosyntactic properties of a word (e.g. the word ανθρώπου is the genitive singular form of the masculine noun [άνθρωπος]). The primary concern of morphosyntactic analysis is to resolve the morphosyntactic ambiguity introduced by morphological analysis (e.g. the word απαντήσεις is either a form of the verb [απαντώ] or a form of the noun [απάντηση]), so as to ease the already difficult task of syntactic analysis.
After an overview of the models that have been applied to the morphosyntactic disambiguation of various natural languages, we propose and implement a new model for Modern Greek. Our model comprises two layers. The first layer follows the machine learning approach. It resolves a significant part of the ambiguity, and the most difficult one, with the aid of automatically induced decision trees. Decision tree induction is performed with three different algorithms; all three are variants of the standard ID3 algorithm adapted to the linguistic nature of the training datasets. The second layer follows the linguistic approach. It resolves the remainder of the ambiguity with the aid of handcrafted syntactic rules. The syntactic rules are described in the definite-clause grammar formalism. To evaluate our model we used a manually disambiguated corpus of running Greek text. The evaluation results confirm the success of our approach.
The practical outcome of the research presented herein was the development of a morphosyntactic tagger (better known as a part-of-speech tagger). The major characteristic of this tagger is its robustness, i.e. its capability to process any text written in Greek. Apart from its utility as a standalone text-analysis tool, the morphosyntactic tagger plays a key role in almost all natural language processing applications: corpus annotation, syntactic analysis, grammar checking, word sense disambiguation, information retrieval, information extraction, summarization, text classification, etc.|
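
The abstract states that the first layer induces decision trees with variants of the standard ID3 algorithm. As a rough illustration of the core ID3 step only (choosing the feature with the highest information gain), the sketch below uses a tiny invented training set for a verb/noun ambiguity such as the one mentioned for απαντήσεις. The feature names `prev` and `next`, the tag values, and the examples themselves are assumptions made for illustration; they are not taken from the dissertation, and the adapted ID3 variants it describes are not reproduced here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(examples, feature):
    """Entropy reduction obtained by splitting `examples` on `feature`.

    This is the selection criterion of standard ID3.
    """
    labels = [label for _, label in examples]
    partitions = {}
    for features, label in examples:
        partitions.setdefault(features[feature], []).append(label)
    remainder = sum(len(part) / len(examples) * entropy(part)
                    for part in partitions.values())
    return entropy(labels) - remainder


def best_feature(examples, features):
    """Return the feature ID3 would test at the root of the tree."""
    return max(features, key=lambda f: information_gain(examples, f))


# Hypothetical training set: each example pairs the (invented) tags of the
# neighbouring words with the correct reading of the ambiguous word.
EXAMPLES = [
    ({"prev": "article", "next": "noun"}, "noun"),
    ({"prev": "article", "next": "verb"}, "noun"),
    ({"prev": "pronoun", "next": "noun"}, "verb"),
    ({"prev": "pronoun", "next": "verb"}, "verb"),
]

if __name__ == "__main__":
    for name in ("prev", "next"):
        print(name, round(information_gain(EXAMPLES, name), 3))
    print("root test:", best_feature(EXAMPLES, ["prev", "next"]))
```

In this toy data the preceding tag separates the two readings perfectly, so ID3 would test it first; a full induction procedure would then recurse on each partition until the labels are pure or no features remain.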