LINGUIST List 7.1767

Fri Dec 13 1996

Sum: Asian Word Frequency

Editor for this issue: T. Daniel Seely <seelylinguistlist.org>


Directory

  1. Peter.Ross, Asian Word Frequency

Message 1: Asian Word Frequency

Date: Fri, 13 Dec 1996 09:12:01 -0700
From: Peter.Ross <Peter.Rossanu.edu.au>
Subject: Asian Word Frequency


Recently I posted a query for information on Word Frequency for
East/Southeast Asian languages. Following is a summary of responses
received (on Chinese) as well as some additional information from enquiries
off the list (on Thai). There is still a lot of work to be done here. Word
segmentation has been mentioned as a key issue for both Chinese and Thai.

MANDARIN CHINESE:

1. Beijing Yuyan Xueyuan Yuyan Jiaoxue Yanjiu Suo. 1986. Xiandai Hanyu
Pinlu Cidian [Modern Chinese Frequency Dictionary]. Beijing: Beijing Yuyan
Xueyuan Chubanshe [Beijing Institute of Language Press].

2. "Dictionary of Usage Frequency of Modern Chinese Words", 1990, Beijing
University of Aeronautics and Astronautics Press.

3. Linguistic Data Consortium (http://www.ldc.upenn.edu) Corpus of Mandarin
conversational speech. 100 30-minute telephone conversations, with 10 minutes
of each conversation transcribed. Soon to be published on CD-ROM. In principle,
this corpus enables estimation of word frequencies for contemporary
conversational Mandarin.

4. There are several automatically or semi-automatically segmented
Mandarin text corpora available, e.g. Academia Sinica, Taiwan.

THAI:

1. Yuen Poovoravan, Kasetsart University, has done some preliminary research
on Thai word frequency.

2. Some basic statistics (grammatical categories):
<http://www.links.nectec.or.th>;
<http://tanaka-www.cs.titech.ac.jp/~virach/profile.html>

3. Virach Sornlertlamvanich <virachcs.titech.ac.jp> is involved in NLP
research for Thai. Results are not yet available.

Responses provided by:
Xiaolin Zhou <zhoupsychology.bbk.ac.uk>
Phillip Elliot <FSKD94Aprodigy.com>
Mark Liberman <mylunagi.cis.upenn.edu>
Hugh Thaweesak Koanantakool <htknectec.or.th>
Thatsanee Charoenporn <thatscnwg.nectec.or.th>
Virach Sornlertlamvanich <virachcs.titech.ac.jp>

Peter Ross
Thai/Linguistics
Australian National University