LINGUIST List 32.3625

Tue Nov 16 2021

FYI: November 2021 Newsletter - LDC

Editor for this issue: Everett Green <everett@linguistlist.org>



Date: 15-Nov-2021
From: Membership Coordinator <ldc@ldc.upenn.edu>
Subject: November 2021 Newsletter - LDC

In this newsletter:
Join LDC for Membership Year 2022
Spring 2022 Data Scholarship Application Deadline

New Publications:
BOLT Egyptian Arabic PropBank and Sense – Discussion Forum, SMS/Chat, and Conversational Telephone Speech
Second DIHARD Challenge Development – Eleven Sources
Second DIHARD Challenge Development – SEEDLingS

--
Join LDC for Membership Year 2022
Membership Year 2022 (MY2022) is open, and discounts are available for those who keep their membership current and join early. Current MY2021 members who renew their LDC membership before March 1, 2022 will receive a 10% discount on the membership fee. New or returning organizations will receive a 5% discount when joining by March 1, 2022.

Visit the Join LDC page for details on membership, user accounts, and payment.

Spring 2022 Data Scholarship Application Deadline
Applications are now being accepted through January 15, 2022 for the Spring 2022 LDC Data Scholarship program, which provides university students with no-cost access to LDC data. Consult the LDC Data Scholarship page for more information about program rules and submission requirements.
--

New Publications:
(1) BOLT Egyptian Arabic PropBank and Sense – Discussion Forum, SMS/Chat, and Conversational Telephone Speech was developed by the University of Colorado Boulder - CLEAR (Computational Language and Education Research) for the DARPA BOLT program. It consists of PropBank annotation of Egyptian Arabic informal text and telephone speech.

BOLT Egyptian Arabic PropBank and Sense – Discussion Forum, SMS/Chat, and Conversational Telephone Speech is distributed via web download.

2021 Subscription Members will automatically receive copies of this corpus. 2021 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for a fee.

*

(2) Second DIHARD Challenge Development – Eleven Sources was developed by LDC and contains approximately 22 hours of English and Chinese speech data along with corresponding annotations used in support of the Second DIHARD Challenge.

Second DIHARD Challenge Development – Eleven Sources is distributed via web download.

2021 Subscription Members will automatically receive copies of this corpus. 2021 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for a fee.

*

(3) Second DIHARD Challenge Development – SEEDLingS was developed by Duke University and LDC and contains approximately two hours of English child language recordings along with corresponding annotations used in support of the Second DIHARD Challenge. The DIHARD Challenges are a set of shared tasks focusing on "hard" diarization; that is, speech diarization for challenging corpora on which existing state-of-the-art systems were expected to fare poorly.

Source data is from the SEEDLingS (The Study of Environmental Effects on Developing Linguistic Skills) corpus, designed to investigate how infants' early linguistic and environmental input plays a role in their learning. Recordings were generated in the home environment of infants in the Rochester, New York area. A subset of that data was annotated by LDC for use in the first and second DIHARD Challenges.

The data in this release consists of files provided in the Second DIHARD Challenge as well as subsequently updated annotation files that were not provided to Second DIHARD Challenge participants.

Second DIHARD Challenge Development – SEEDLingS is distributed via web download.

2021 Subscription Members will automatically receive copies of this corpus, provided they have submitted a completed copy of the special license agreement. 2021 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for a fee.


Membership Coordinator
Linguistic Data Consortium
University of Pennsylvania
T: +1-215-573-1275
E: ldc@ldc.upenn.edu


Linguistic Field(s): Computational Linguistics


Page Updated: 16-Nov-2021