Editor for this issue: Martin Jacobsen <marty@linguistlist.org>
Announcing a NEW RELEASE from the LINGUISTIC DATA CONSORTIUM

Boston University Radio Speech Corpus

The Boston University Radio Speech Corpus was collected by Mari Ostendorf of Boston University, primarily to support research in text-to-speech synthesis, particularly generation of prosodic patterns. The corpus consists of professionally read radio news data, including speech and accompanying annotations, suitable for speech and language research.

The corpus includes speech from seven (4 male, 3 female) FM radio news announcers associated with WBUR, a public radio station. The main radio news portion of the corpus consists of over seven hours of news stories recorded in the WBUR radio studio during broadcasts over a two-year period.

The announcers were also recorded in a laboratory at Boston University. In this lab news portion, the announcers read a total of 24 stories from the radio news portion. They were first asked to read the stories in their non-radio style and then, 30 minutes later, to read the same stories in their radio style.

Each story read by an announcer was digitized in paragraph-sized units, which typically include several sentences. The files were digitized at a 16 kHz sample rate using a 16-bit A/D converter.

The paragraphs were annotated with the orthographic transcription, phonetic alignments, part-of-speech tags and prosodic markers. The orthographic transcripts were generated by hand and indicate where the speaker took a breath. The phonetic alignments and part-of-speech tags were generated automatically and corrected by hand. The prosodic labels were marked by hand and are available only for a subset of the corpus.

Institutions that have membership in the LDC for either the 1996 or 1997 Membership Year will be able to receive the BU Radio Corpus at no additional charge, in the same manner as all other speech corpora published by the LDC.

Nonmembers can receive a copy of this corpus for research purposes only for a fee of US$400.

If you would like to order a copy of this corpus, please email your request to ldc@unagi.cis.upenn.edu. If you need additional information before placing your order, or would like to inquire about membership in the LDC, please send email or call (215) 898-0464.

Further information about the LDC and its available corpora can be accessed on the Linguistic Data Consortium WWW Home Page at URL http://www.ldc.upenn.edu/. Information is also available via ftp at ftp.cis.upenn.edu under pub/ldc; for ftp access, please use "anonymous" as your login name, and give your email address when asked for a password.

Nick Caffrey asked

>Does anyone have details of online Spanish corpora?

There is a corpus of written Argentine and Chilean Spanish, and transcribed spoken Peninsular Spanish, online at http://lola.lllf.uam.es

Because of recent technical difficulties, it may be temporarily inaccessible, but keep trying.

------------------------------------------------------------------
Lee Hartman
Dept. of Foreign Languages
Southern Illinois University
Carbondale, IL 62901-4521 U.S.A.