

LINGUIST List 23.2763

Tue Jun 19 2012

FYI: Taiwan Mandarin Spoken Wordlist

Editor for this issue: Brent Miller <brentlinguistlist.org>

Date: 19-Jun-2012
From: Shu-Chuan Tseng <tsengscgate.sinica.edu.tw>
Subject: Taiwan Mandarin Spoken Wordlist

The "Taiwan Mandarin Spoken Wordlist" was derived from the
transcripts of 85 Taiwan Mandarin conversations collected and
processed at Academia Sinica, totaling 42 hours of recorded speech.
The recordings were made from 2001 to 2003, and the speakers' ages
ranged from 14 to 63. The transcripts were first processed
automatically by the CKIP word segmentation and POS tagging system.
The results of word segmentation, POS tagging, and character-to-Pinyin
conversion, as well as homographs, were then manually corrected and
edited. The resulting wordlist consists of 16,683 word types and
405,435 word tokens, equivalent to 607,016 syllables.
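
As a rough illustration of how the reported totals relate to one
another, the following Python sketch derives a few summary ratios from
the figures above. The constant and function names are illustrative
assumptions and are not part of the wordlist distribution; only the
four input numbers come from this announcement.

```python
# Totals as reported in the announcement; all names here are illustrative.
WORD_TYPES = 16_683
WORD_TOKENS = 405_435
SYLLABLES = 607_016
HOURS = 42


def summary_ratios(types, tokens, syllables, hours):
    """Return simple ratios derived from corpus-level totals."""
    return {
        # Distinct word types per running word token.
        "type_token_ratio": types / tokens,
        # Mean number of syllables per word token.
        "syllables_per_token": syllables / tokens,
        # Mean number of word tokens per hour of recording.
        "tokens_per_hour": tokens / hours,
    }


if __name__ == "__main__":
    ratios = summary_ratios(WORD_TYPES, WORD_TOKENS, SYLLABLES, HOURS)
    for name, value in ratios.items():
        print(f"{name}: {value:.3f}")
```

On these figures, a word token averages roughly 1.5 syllables.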

The Wordlist can be downloaded at

http://mmc.sinica.edu.tw/resources_e_01.htm

Linguistic Field(s): Text/Corpus Linguistics




