Query Details


Query Subject:   Index of synthesis data
Author:   Hugo Cesar de Castro Carneiro

Linguistic Field(s):   Morphology, Syntax

Query:   My M.Sc. thesis is titled "The function of the index of synthesis
of the languages in part-of-speech tagging with weightless artificial
neural networks".

My motivation in this thesis is the contrast between English "like" and
Portuguese "gostam" ("they like"). The part of speech of "like" is
ambiguous: it can be a preposition, a conjunction, a verb, or yet another
part of speech, so the reader needs an adjacent word such as "they" to
know that it is a verb in this context. Portuguese "gostam", on the other
hand, is always a verb, because the "-am" suffix marks it as one.

I am therefore testing a system I have developed on five languages:
Mandarin Chinese, English, Portuguese, German and Turkish (ordered from the
most isolating to the most synthetic). Once I have the information I need
from these five languages, I will test the system on four others: Thai
(more synthetic than Mandarin Chinese and more isolating than English),
Japanese (more synthetic than English and more isolating than Portuguese),
Italian (more synthetic than Portuguese and more isolating than German) and
Russian (more synthetic than German and more isolating than Turkish).

But I have one problem: the indices of synthesis of these languages are
only my own estimates, and even their relative order may be wrong (is
Portuguese or German the more synthetic?).

Could someone help me find published indices of synthesis for these
languages? Alternatively, where can I get a text in each of these
languages in which every word is segmented into its morphemes?
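
To make the request concrete: the index of synthesis is usually taken to be
the average number of morphemes per word, so a morpheme-segmented text is
enough to compute it. The sketch below assumes a hypothetical input format
in which words are separated by whitespace and morphemes within a word by
hyphens; the function name and the format are illustrative assumptions,
not an existing tool or corpus convention.

```python
def index_of_synthesis(segmented_text: str, sep: str = "-") -> float:
    """Average number of morphemes per word in a segmented text.

    Assumption: words are whitespace-separated and the morphemes of a
    word are joined by `sep` (for example "gost-am").
    """
    words = segmented_text.split()
    if not words:
        raise ValueError("text contains no words")
    # Count non-empty pieces so a stray leading or trailing separator
    # does not inflate the morpheme count.
    morphemes = sum(len([m for m in w.split(sep) if m]) for w in words)
    return morphemes / len(words)


# English "they like": two words, one morpheme each.
english = index_of_synthesis("they like")
# Portuguese "gostam": one word, stem plus the "-am" suffix.
portuguese = index_of_synthesis("gost-am")
```

With a real corpus the same ratio would be computed over a sample of
running text in each language, and the languages could then be ordered by
the resulting values instead of by estimate.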

I am finishing my master's studies this year, but I need to submit a paper
to a journal before I receive my M.Sc. degree in Computer Science.
LL Issue: 22.4036
Date posted: 15-Oct-2011