LINGUIST List 18.2657
Wed Sep 12 2007
Diss: Comp Ling/Morphology: Xanthos: 'Apprentissage automatique de ...'
Editor for this issue: Hunter Lockwood
<hunter@linguistlist.org>
Directory
1. Aris Xanthos, Apprentissage automatique de la morphologie: le cas des structures racine-schème
Message 1: Apprentissage automatique de la morphologie: le cas des structures racine-schème
Date: 12-Sep-2007
From: Aris Xanthos <Aris.Xanthos@unil.ch>
Subject: Apprentissage automatique de la morphologie: le cas des structures racine-schème
Institution: University of Lausanne
Program: Department of Linguistics
Dissertation Status: Completed
Degree Date: 2007
Author: Aris Xanthos
Dissertation Title: Apprentissage automatique de la morphologie: le cas des structures racine-schème (Machine Learning of Morphology: The Case of Root-and-Pattern Structures)
Linguistic Field(s):
Computational Linguistics
Morphology
Subject Language(s): Arabic, Standard (arb)
Dissertation Director:
François Bavaud
John A. Goldsmith
Remi J. Jolivet
Dissertation Abstract:
This dissertation is concerned with the development of algorithmic methods for the unsupervised learning of natural language morphology from a symbolically transcribed wordlist. It focuses on languages approaching the introflectional type, such as Arabic or Hebrew. The morphology of such languages is traditionally described in terms of discontinuous units: consonantal roots and vocalic patterns. Inferring this kind of structure is a challenging task for current unsupervised learning systems, which generally operate with continuous units. In this study, the problem of learning root-and-pattern morphology is divided into a phonological and a morphological subproblem. The phonological component of the analysis seeks to partition the symbols of a corpus (phonemes, letters) into two subsets that correspond well with the phonetic definition of consonants and vowels; building on this result, the morphological component attempts to establish the list of roots and patterns in the corpus and to infer the rules that govern their combinations. We assess the extent to which this can be done on the basis of two hypotheses: (i) the distinction between consonants and vowels can be learned by observing their tendency to alternate in speech; (ii) roots and patterns can be identified as sequences of the previously discovered consonants and vowels, respectively. The proposed algorithm uses a purely distributional method for partitioning symbols. It then applies analogical principles to identify a preliminary set of reliable roots and patterns, and gradually enlarges this set. This extension process is guided by an evaluation procedure based on the minimum description length principle, in line with the approach to morphological learning embodied in Linguistica (Goldsmith, 2001). The algorithm is implemented as a computer program named Arabica, which is evaluated with regard to its ability to account for the system of plural formation in a corpus of Arabic nouns.
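The first two steps described above (a distributional consonant/vowel split, then reading roots and patterns off that split) can be illustrated with a short Python sketch. The abstract does not name the distributional method Arabica uses, so the sketch substitutes Sukhotin's algorithm, a classic purely distributional vowel-identification procedure that exploits the same tendency of consonants and vowels to alternate. The word list is a toy sample of Arabic singular/plural pairs in a rough Latin transliteration (long vowels written as doubled letters), not the dissertation's corpus. The helper names (`sukhotin_vowels`, `decompose`, `group_by_root`, `recurrent_patterns`) and the `min_roots` threshold are hypothetical, and neither Arabica's analogical procedure nor its MDL evaluation is reproduced here.

```python
from collections import defaultdict

# Hypothetical helpers illustrating the idea; this is not Arabica's code.


def sukhotin_vowels(words):
    """Return the set of symbols that Sukhotin's algorithm labels as vowels.

    Only adjacencies inside each word are counted; a symbol next to an
    identical symbol (e.g. a doubled long vowel) is ignored, which
    corresponds to the algorithm's zero diagonal.
    """
    counts = defaultdict(lambda: defaultdict(int))
    symbols = set()
    for word in words:
        symbols.update(word)
        for left, right in zip(word, word[1:]):
            if left != right:
                counts[left][right] += 1
                counts[right][left] += 1

    # Every symbol starts as a consonant; its score is its total adjacency count.
    scores = {s: sum(counts[s].values()) for s in symbols}
    consonants = set(symbols)
    vowels = set()
    while consonants:
        # Sorting makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(consonants), key=lambda s: scores[s])
        if scores[best] <= 0:
            break
        consonants.remove(best)
        vowels.add(best)
        # Frequent neighbours of a new vowel become less likely to be vowels.
        for s in consonants:
            scores[s] -= 2 * counts[s][best]
    return vowels


def decompose(word, vowels, slot="_"):
    """Split a word into a consonantal root and a vocalic pattern."""
    root = "".join(ch for ch in word if ch not in vowels)
    pattern = "".join(ch if ch in vowels else slot for ch in word)
    return root, pattern


def group_by_root(words, vowels):
    """Map each root to the patterns it occurs with, in corpus order."""
    families = defaultdict(list)
    for word in words:
        root, pattern = decompose(word, vowels)
        families[root].append(pattern)
    return dict(families)


def recurrent_patterns(families, min_roots=2):
    """Keep patterns attested with at least `min_roots` distinct roots
    (a crude, hypothetical reliability filter)."""
    roots_per_pattern = defaultdict(set)
    for root, patterns in families.items():
        for pattern in patterns:
            roots_per_pattern[pattern].add(root)
    return {p for p, roots in roots_per_pattern.items() if len(roots) >= min_roots}


if __name__ == "__main__":
    # Toy singular/plural pairs: book, pen, boy, mountain, man.
    words = ["kitaab", "kutub", "qalam", "aqlaam", "walad",
             "awlaad", "jabal", "jibaal", "rajul", "rijaal"]
    vowels = sukhotin_vowels(words)
    print(sorted(vowels))               # ['a', 'i', 'u']
    print(decompose("kutub", vowels))   # ('ktb', '_u_u_')
    families = group_by_root(words, vowels)
    print(sorted(recurrent_patterns(families)))  # ['_a_a_', '_i_aa_', 'a__aa_']
```

On this toy list the vowel set {a, i, u} emerges without any prior phonetic knowledge, and a singular/plural pair such as kitaab/kutub falls out as one root (ktb) combined with two patterns. Keeping only patterns shared by several roots is one simple way to seed a set of "reliable" units; the dissertation's actual analogical criteria and the MDL-guided extension step are more elaborate than this.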
This thesis shows that complex linguistic structures can be discovered without recourse to a rich set of a priori hypotheses about the phenomena under consideration. It illustrates the possible synergy between learning mechanisms operating at distinct levels of linguistic description, and attempts to determine where and why such cooperation fails. It concludes that the tension between the universality of the consonant-vowel distinction and the specificity of root-and-pattern structure is crucial for understanding the advantages and weaknesses of this approach.