LINGUIST List 23.2763
Tue Jun 19 2012
FYI: Taiwan Mandarin Spoken Wordlist
Editor for this issue: Brent Miller
<brentlinguistlist.org>
Date: 19-Jun-2012
From: Shu-Chuan Tseng <tsengsc gate.sinica.edu.tw>
Subject: Taiwan Mandarin Spoken Wordlist
The "Taiwan Mandarin Spoken Wordlist" was derived from the transcripts of 85 Taiwan Mandarin conversations collected and processed at Academia Sinica, with a total of 42 hours of speech recordings. The recordings were made from 2001 to 2003, and the speakers' ages ranged from 14 to 63. The transcripts were automatically processed by the CKIP word segmentation and POS tagging system. The results of word segmentation, POS tagging, and character-Pinyin conversion, as well as homographs, were then manually corrected and edited. As a result, the wordlist consists of 16,683 word types and 405,435 word tokens, equivalent to 607,016 syllables.
The Wordlist can be downloaded at
http://mmc.sinica.edu.tw/resources_e_01.htm
Linguistic Field(s): Text/Corpus Linguistics
Page Updated: 19-Jun-2012