LINGUIST List 21.74

Thu Jan 07 2010

Qs: Japanese and English Corpora Research

Editor for this issue: Elyssa Winzeler <elyssalinguistlist.org>


We'd like to remind readers that the responses to queries are usually best posted to the individual asking the question. That individual is then strongly encouraged to post a summary to the list. This policy was instituted to help control the huge volume of mail on LINGUIST; so we would appreciate your cooperating with it whenever it seems appropriate.

In addition to posting a summary, we'd like to remind people that it is usually a good idea to personally thank those individuals who have taken the trouble to respond to the query.

To post to LINGUIST, use our convenient web form at http://linguistlist.org/LL/posttolinguist.html.
Directory
        1.    Barry Kavanagh, Japanese and English Corpora Research

Message 1: Japanese and English Corpora Research
Date: 04-Jan-2010
From: Barry Kavanagh <b_kavanaghauhw.ac.jp>
Subject: Japanese and English Corpora Research

I have a question regarding corpora, if I may. At the moment I am looking at
non-verbal representations of language, such as emoticons, in
computer-mediated discourse, and have compiled a fairly large Japanese and
English corpus. As I am counting these non-verbal or paralinguistic cues
within these corpora, the corpora need to be of the same size; otherwise my
data and findings may be deemed void. For example, if the Japanese corpus is
much bigger than the English one, then these non-verbal representations are
more likely to appear in it. I have tried making the number of sentences the
same within each corpus (very time consuming; also, defining what a sentence
is in online communication can be difficult). I am also trying to find
similar studies that have compared English and Japanese corpora (no luck
yet), and to see if there are any reliable equivalences stating, for
example, that 400 kanji are equal to 1000 English words.

Any ideas or advice would be fantastic.
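
For illustration, the size concern can be sketched as a relative-frequency
calculation: instead of trimming the corpora to the same size, each raw count
is divided by the size of its corpus and scaled to a common base. This is only
a minimal sketch. The function name, the base of 10,000 units, and the sample
figures are assumptions made for the example and are not taken from the
corpora described above.

```python
def per_unit_rate(count, corpus_size, base=10_000):
    """Return the number of occurrences per `base` units of corpus.

    `corpus_size` must be measured in one consistent unit (characters,
    words, or messages); which unit is fair across Japanese and English
    is left open here.
    """
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return count * base / corpus_size


# Hypothetical figures, for illustration only.
japanese_rate = per_unit_rate(count=300, corpus_size=150_000)
english_rate = per_unit_rate(count=120, corpus_size=40_000)

print(f"Japanese: {japanese_rate:.1f} per 10,000 units")
print(f"English:  {english_rate:.1f} per 10,000 units")
```

With rates like these the larger corpus no longer has a built-in advantage,
but the comparison is only as good as the unit chosen, so the question of how
kanji counts relate to English word counts still stands.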

Linguistic Field(s): Computational Linguistics
                            Text/Corpus Linguistics

Subject Language(s): English (eng)
                            Japanese (jpn)
