LINGUIST List 25.2588
Tue, Jun 17 2014
FYI: Chinese Spoken Wordlist Database
Editor for this issue: Uliana Kazagasheva <uliana@linguistlist.org>
Date: 15-Jun-2014
From: Shu-Chuan Tseng <tsengsc@gate.sinica.edu.tw>
Subject: Chinese Spoken Wordlist Database
The ''Chinese Spoken Wordlist'' was derived
from the transcripts of 85 Taiwan Mandarin
conversations collected and processed at
Academia Sinica, comprising a total of 42 hours
of recorded speech. The recordings were made
between 2001 and 2003, and the speakers' ages
ranged from 14 to 63. The transcripts were
automatically processed by the CKIP word
segmentation and POS tagging system. The
resulting word segmentation, POS tags, and
character-to-Pinyin conversion, as well as the
treatment of homographs, were then manually
corrected and edited. The final wordlist
contains 16,683 word types and 405,435 word
tokens, corresponding to 607,016 syllables.
To access the ''Chinese Spoken Wordlist'', please see:
http://mmc.sinica.edu.tw/resources_e_02.html
Linguistic Field(s): Computational Linguistics; Language Acquisition
Subject Language(s): Chinese, Mandarin (cmn)
Page Updated: 17-Jun-2014