LINGUIST List, Eastern Michigan University, Wayne State University
Title: Apprentissage automatique de la morphologie: le cas des structures racine-schème [Machine learning of morphology: the case of root-and-pattern structures]
Author: Aris Xanthos
Homepage: http://www.unil.ch/imm/page15560.html
Degree Awarded: University of Lausanne, Department of Linguistics
Degree Date: 2007
Linguistic Subfield(s): Computational Linguistics; Morphology
Subject Language(s): Arabic, Standard
Director(s): Remi Jolivet; John Goldsmith; François Bavaud

Abstract:

This dissertation is concerned with the development of algorithmic methods
for the unsupervised learning of natural language morphology, using a
symbolically transcribed wordlist. It focuses on the case of languages
approaching the introflectional type, such as Arabic or Hebrew. The
morphology of such languages is traditionally described in terms of
discontinuous units: consonantal roots and vocalic patterns. Inferring this
kind of structure is a challenging task for current unsupervised learning
systems, which generally operate with continuous units.

In this study, the problem of learning root-and-pattern morphology is
divided into a phonological and a morphological subproblem. The
phonological component of the analysis seeks to partition the symbols of a
corpus (phonemes, letters) into two subsets that correspond well with the
phonetic definition of consonants and vowels; building on this result,
the morphological component attempts to establish the list of roots and
patterns in the corpus, and to infer the rules that govern their
combinations. We assess the extent to which this can be done on the basis
of two hypotheses: (i) the distinction between consonants and vowels can be
learned by observing their tendency to alternate in speech; (ii) roots and
patterns can be identified as sequences of the previously discovered
consonants and vowels respectively.
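
As an illustration of the two hypotheses, here is a minimal Python sketch. For hypothesis (i) it uses Sukhotin's algorithm, a classical distributional heuristic that exploits the tendency of consonants and vowels to alternate; it is a stand-in chosen for brevity and is not claimed to be the partitioning method used in the dissertation. For hypothesis (ii) it splits a word into the sequence of its consonants (a candidate root) and a vocalic template (a candidate pattern). The function names, the `_` slot marker, and the toy transliterated words are assumptions made for this example.

```python
from collections import defaultdict


def sukhotin_vowels(words):
    """Guess the vowel set of a wordlist with Sukhotin's algorithm.

    Illustrative stand-in for a distributional consonant/vowel partition.
    """
    # Symmetric counts of adjacent, distinct symbols.
    adjacency = defaultdict(lambda: defaultdict(int))
    symbols = set()
    for word in words:
        symbols.update(word)
        for left, right in zip(word, word[1:]):
            if left != right:
                adjacency[left][right] += 1
                adjacency[right][left] += 1

    # Every symbol starts as a consonant; its score is its row sum.
    sums = {s: sum(adjacency[s].values()) for s in symbols}
    vowels = set()
    while True:
        candidates = sorted(s for s in symbols if s not in vowels)
        if not candidates:
            break
        best = max(candidates, key=lambda s: sums[s])
        if sums[best] <= 0:
            break  # no remaining symbol behaves like a vowel
        vowels.add(best)
        # Symbols adjacent to the new vowel look more consonant-like.
        for s in candidates:
            if s != best:
                sums[s] -= 2 * adjacency[best][s]
    return vowels


def split_root_and_pattern(word, vowels):
    """Split a word into a consonantal root and a vocalic pattern.

    The pattern keeps vowels in place and marks consonant slots with '_'
    (a notation assumed for this sketch).
    """
    root = "".join(ch for ch in word if ch not in vowels)
    pattern = "".join(ch if ch in vowels else "_" for ch in word)
    return root, pattern


# Toy transliterated forms, hypothetical data for illustration only.
toy_words = ["kataba", "kitab", "kutub"]
toy_vowels = sukhotin_vowels(toy_words)
print(sorted(toy_vowels))                            # ['a', 'i', 'u']
print(split_root_and_pattern("kataba", toy_vowels))  # ('ktb', '_a_a_a')
```

On real data the partition is noisier than on this toy list, and a naive split of this kind also treats affixal consonants as part of the root, which is one reason a further morphological component is needed.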

The proposed algorithm uses a purely distributional method for partitioning
symbols. Then it applies analogical principles to identify a preliminary
set of reliable roots and patterns, and gradually enlarge it. This
extension process is guided by an evaluation procedure based on the minimum
description length principle, in line with the approach to morphological
learning embodied in Linguistica (Goldsmith, 2001). The algorithm is
implemented as a computer program named Arabica; it is evaluated with
regard to its ability to account for the system of plural formation in a
corpus of Arabic nouns.
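
The role of the minimum description length principle can be illustrated with a toy two-part code. The Python sketch below compares the cost of listing every word in full with the cost of listing roots and patterns once and encoding each word as a pair of pointers. The cost model (a fixed number of bits per symbol, uniform pointer costs) and all names are assumptions made for this example; it is not the description length formula used in Linguistica or Arabica.

```python
import math


def split_word(word, vowels):
    """Return (consonantal root, vocalic pattern with '_' consonant slots)."""
    root = "".join(ch for ch in word if ch not in vowels)
    pattern = "".join(ch if ch in vowels else "_" for ch in word)
    return root, pattern


def listing_cost(entries, symbol_bits):
    """Bits needed to spell out each entry plus one end-of-entry marker."""
    return sum((len(entry) + 1) * symbol_bits for entry in entries)


def description_lengths(words, vowels):
    """Return (flat_bits, factored_bits) for a list of distinct words.

    Assumes a non-empty wordlist and a known vowel set.
    """
    alphabet = set("".join(words))
    # Two extra codes: the '_' slot marker and the end-of-entry marker.
    symbol_bits = math.log2(len(alphabet) + 2)

    # Model 1: every word is listed in full.
    flat_bits = listing_cost(words, symbol_bits)

    # Model 2: roots and patterns are listed once; each word is then a
    # pointer to one root plus a pointer to one pattern.
    analyses = [split_word(word, vowels) for word in words]
    roots = {root for root, _ in analyses}
    patterns = {pattern for _, pattern in analyses}
    pointer_bits = math.log2(len(roots)) + math.log2(len(patterns))
    factored_bits = (
        listing_cost(roots, symbol_bits)
        + listing_cost(patterns, symbol_bits)
        + len(words) * pointer_bits
    )
    return flat_bits, factored_bits


# Toy transliterated forms, hypothetical data for illustration only.
toy_words = ["kataba", "kitab", "kutub", "darasa", "diras", "durus"]
flat, factored = description_lengths(toy_words, {"a", "i", "u"})
print(f"flat: {flat:.1f} bits, factored: {factored:.1f} bits")
```

When roots and patterns recur across words, the factored description is shorter than the flat one, which is the kind of signal an MDL-guided procedure can use to accept or reject a proposed extension of the root and pattern inventories.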

This thesis shows that complex linguistic structures can be discovered
without recourse to a rich set of a priori hypotheses about the phenomena
under consideration. It illustrates the possible synergy between learning
mechanisms operating at distinct levels of linguistic description, and
attempts to determine where and why such cooperation fails. It concludes
that the tension between the universality of the consonant-vowel
distinction and the specificity of root-and-pattern structure is crucial
for understanding the advantages and weaknesses of this approach.
Page Updated: 26-Nov-2009
