LINGUIST List 23.2763
Tue Jun 19 2012
FYI: Taiwan Mandarin Spoken Wordlist
Editor for this issue: Brent Miller
<brent linguistlist.org>
Date: 19-Jun-2012
From: Shu-Chuan Tseng <tsengsc gate.sinica.edu.tw>
Subject: Taiwan Mandarin Spoken Wordlist
The "Taiwan Mandarin Spoken Wordlist" was derived from the transcripts of 85 Taiwan Mandarin conversations collected and processed at Academia Sinica, totaling 42 hours of recorded speech. The recordings were made from 2001 to 2003, and the speakers' ages ranged from 14 to 63. The transcripts were automatically processed by the CKIP word segmentation and POS tagging system. The results of word segmentation, POS tagging, and character-Pinyin conversion, as well as homographs, were then manually corrected and edited. The resulting wordlist consists of 16,683 word types and 405,435 word tokens, equivalent to 607,016 syllables. The Wordlist can be downloaded at http://mmc.sinica.edu.tw/resources_e_01.htm
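For readers less familiar with the type/token distinction used in the figures above, the following is a minimal sketch and not the procedure used to build the Wordlist. The sample utterance is invented for illustration and does not come from the corpus; only the three totals are taken from the announcement.

```python
from collections import Counter

# Toy, made-up segmented utterance (NOT from the corpus): one list item
# per word token, roughly what a word segmenter might output.
tokens = ["我", "知道", "你", "知道", "我", "的", "意思"]

counts = Counter(tokens)
n_tokens = sum(counts.values())  # running words, repeats included
n_types = len(counts)            # distinct word forms

# Totals as stated in the announcement.
WORD_TYPES = 16_683
WORD_TOKENS = 405_435
SYLLABLES = 607_016

# Simple ratios derived from the stated totals.
tokens_per_type = WORD_TOKENS / WORD_TYPES
syllables_per_token = SYLLABLES / WORD_TOKENS

print(n_tokens, n_types)
print(round(tokens_per_type, 1), round(syllables_per_token, 2))
```

On the stated totals, each word type occurs about 24 times on average, and a word token is about 1.5 syllables long.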
Linguistic Field(s): Text/Corpus Linguistics
Page Updated: 19-Jun-2012