Software Details
| Title: | New Resources from the LDC |
|---|---|
| Submitter: | Linguistic Data Consortium |
| Description: | New Publications from the LDC The CSLU: Spelled and Spoken Words corpus consists of spelled and spoken words. 3647 callers were prompted to to say and spell their first and last names, to say what city they grew up in and what city they were calling from, and to answer two yes/no questions. In order to collect sufficient instances of each letter, 1371 callers also recited the English alphabet with pauses between the letters. Each call was transcribed by two people, and all differences were resolved. In addition, a subset of 2648 calls has been phonetically labeled. Korean Propbank is a semantic annotation of the Korean English Treebank Annotations and Korean Treebank version 2.0. Each verb and adjective occurring in the Treebank has been treated as a semantic predicate and the surrounding text has been annotated for arguments and adjuncts of the predicate. The verbs and adjectives have also been tagged with coarse grained senses. There are two basic components to Korean Propbank: 1. The Verb Lexicon. A frames file, consisting of one or more frame sets, has been created for each predicate occurring in the Treebank. These files serve as a reference for the annotators and for users of the data. 2,749 such files have been created. 2. The Annotation. There are two annotation files. The virginia-verbs.pb file has 9,588 annotated predicate tokens. These predicate tokens include all those occurring in over 54 thousand words of the Korean English Treebank Annotations, totaling ~791 KB of uncompressed data. The newswire-verbs.pb file has 23,707 annotated predicate tokens. These predicate tokens include all those occurring in over 131 thousand words of the Korean Treebank version 2.0. The Speech Controlled Computing corpus was designed to support the development of small footprint, embedded ASR applications in the domain of voice control for the home. It consists of the recordings of 125 speakers of American English from four regions, three age groups and two gender groups, pronouncing isolated words. The recordings were conducted in a sound-attenuated room, and a high-quality microphone was used. Each speaker read a randomized word list consisting of 2100 words (100 distinct words appearing 21 times each). NOTE: Nonmembers may obtain a commercial rights license to Speech Controlled Computing for US$7000 by signing the LDC User License Agreement for Speech Controlled Computing. For-Profit Membership to the LDC is not required. If you need further information, or would like to inquire about membership to the LDC, please email ldc@ldc.upenn.edu or call +1 215 573 1275. |
| Linguistic Field(s): |
Computational Linguistics Text/Corpus Linguistics |
| LL Issue: | 17.1053 |
| Date Posted: | 07-Apr-2006 |


