LINGUIST List 22.4036
Sat Oct 15 2011
Qs: Index of synthesis data
Editor for this issue: Zac Smith
1. Hugo Cesar de Castro Carneiro, Index of synthesis data
Message 1: Index of synthesis data
From: Hugo Cesar de Castro Carneiro <hcesarcastrogmail.com>
Subject: Index of synthesis data
My M.Sc. thesis is called "The function of the index of synthesis of the
languages in part-of-speech tagging with weightless artificial neural
networks".
In this thesis my motivation is based on the "like vs. gostam" paradigm
("gostam" is Portuguese for "they like"). "Like" has an ambiguous part of
speech: it can be a preposition, a conjunction, a verb or yet another part
of speech, so it needs an adjacent word such as "they" to tell readers that
it is a verb in this context. On the other hand, "gostam" in Portuguese is
always a verb, as the "-am" suffix tells the reader.
So, I am testing a system I've developed on 5 languages: Mandarin Chinese,
English, Portuguese, German and Turkish (ordered from the most isolating to
the most synthetic). Once I have the information I need from these 5
languages, I will test the system on 4 others: Thai (more synthetic than
Mandarin Chinese and more isolating than English), Japanese (more synthetic
than English and more isolating than Portuguese), Italian (more synthetic
than Portuguese and more isolating than German) and Russian (more synthetic
than German and more isolating than Turkish).
But I have one problem: the indices of synthesis of these languages are
only my own estimates, and even their order may be somewhat wrong (is
Portuguese or German the more synthetic?).
Can someone help me find an index of synthesis for each of these
languages? Alternatively, where can I get a text in each of these languages
in which every word is segmented into its morphemes?
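For reference, the index of synthesis is conventionally taken as the
average number of morphemes per word, so a morpheme-segmented text would be
enough to compute it. A minimal Python sketch follows, assuming words are
separated by whitespace and morphemes by hyphens. The function name
index_of_synthesis and the two sample segmentations are illustrative
assumptions, not data from the thesis.

```python
def index_of_synthesis(segmented_text, morpheme_sep="-"):
    """Return the average number of morphemes per word.

    Assumes whitespace-separated words whose morphemes are joined by
    `morpheme_sep` (a hypothetical annotation convention).
    """
    words = segmented_text.split()
    if not words:
        raise ValueError("text contains no words")
    # Count non-empty pieces so stray separators do not inflate the total.
    morpheme_count = sum(
        len([m for m in word.split(morpheme_sep) if m]) for word in words
    )
    return morpheme_count / len(words)


# Hand-made segmentations, for illustration only.
english_sample = "they like book-s"
portuguese_sample = "gost-am de livro-s"

print(index_of_synthesis(english_sample))
print(index_of_synthesis(portuguese_sample))
```

With real corpora the result depends heavily on the segmentation scheme, so
values are only comparable across languages if the same annotation
conventions are used.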
I am concluding my master's studies this year, but I need to submit a paper
to a journal before I receive my M.Sc. degree in Computer Science.
Page Updated: 15-Oct-2011