The LINGUIST List is dedicated to providing information on language and language analysis, and to providing the discipline of linguistics with the infrastructure necessary to function in the digital world. LINGUIST is a free resource, run by linguistics students and faculty.
FYI: Taiwan Mandarin Spoken Wordlist
The "Taiwan Mandarin Spoken Wordlist" was derived from the
transcripts of 85 Taiwan Mandarin conversations collected and
processed at Academia Sinica, totaling 42 hours of recorded
speech. The recordings were made from 2001 to 2003, and the
speakers' ages ranged from 14 to 63. The transcripts were automatically
processed by the CKIP word segmentation and POS tagging system.
The results of word segmentation, POS tagging, and character-to-Pinyin
conversion, as well as homographs, were then manually corrected and
edited. The resulting wordlist consists of 16,683 word types and
405,435 word tokens, equivalent to 607,016 syllables.
The wordlist can be downloaded at