Editor for this issue: T. Daniel Seely <seely@linguistlist.org>
Recently I posted a query for information on word frequency for East/Southeast Asian languages. The following is a summary of the responses received (on Chinese), together with some additional information from enquiries off the list (on Thai). There is still a lot of work to be done here. Word segmentation has been mentioned as a key issue for both Chinese and Thai.

MANDARIN CHINESE:

1. Beijing Yuyan Xueyuan Yuyan Jiaoxue Yanjiu Suo. 1986. Xiandai Hanyu Pinlu Cidian [Modern Chinese Frequency Dictionary]. Beijing: Beijing Yuyan Xueyuan Chubanshe [Beijing Institute of Language Press].

2. "Dictionary of Usage Frequency of Modern Chinese Words", 1990, Beijing University of Aeronautics and Astronautics Press.

3. Linguistic Data Consortium (http://www.ldc.upenn.edu): Corpus of Mandarin conversational speech. 100 30-minute telephone conversations, 10 minutes of each conversation transcribed. Soon to be published on CD-ROM. In principle, this corpus enables estimation of word frequencies for contemporary conversational Mandarin.

4. There are several automatically or semi-automatically segmented Mandarin text corpora around, e.g. at Academia Sinica, Taiwan.

THAI:

1. Yuen Poovoravan, Kasetsart University, has done some preliminary research on Thai word frequency.

2. Some basic statistics (grammatical categories):
<http://www.links.nectec.or.th>
<http://tanaka-www.cs.titech.ac.jp/~virach/profile.html>

3. Virach Sornlertlamvanich <virach@cs.titech.ac.jp> is involved in NLP research for Thai. Results are not yet available.

Responses provided by:

Xiaolin Zhou <zhou@psychology.bbk.ac.uk>
Phillip Elliot <FSKD94A@prodigy.com>
Mark Liberman <myl@unagi.cis.upenn.edu>
Hugh Thaweesak Koanantakool <htk@nectec.or.th>
Thatsanee Charoenporn <thatsc@nwg.nectec.or.th>
Virach Sornlertlamvanich <virach@cs.titech.ac.jp>

Peter Ross
Thai/Linguistics
Australian National University