LINGUIST List 32.2408

Fri Jul 16 2021

FYI: July 2021 Newsletter - LDC

Editor for this issue: Everett Green <everett@linguistlist.org>



Date: 15-Jul-2021
From: Membership Coordinator <ldc@ldc.upenn.edu>
Subject: July 2021 Newsletter - LDC

In this newsletter:
LDC Submissions: a new platform for sharing data through LDC
Fall 2021 LDC Data Scholarship Program

New Publications:
Ethnobotanical Research and Language Documentation of Nahuatl
Chinese Abstract Meaning Representation 2.0
BOLT Egyptian Arabic Co-reference – Discussion Forum, SMS/Chat, and Conversational Telephone Speech

__
LDC Submissions: a new platform for sharing data through LDC
LDC is pleased to announce the launch of LDC Submissions, a platform that provides infrastructure and resources for sharing data through the Catalog. After registering for a user account, corpus submitters can create a submission, upload files, and communicate with LDC’s publications team during the review process. After all reviews are complete, the final, release-ready version of your data set is uploaded to the platform and enters the publications queue.

Sharing your corpus through LDC makes it accessible to the global research community and ensures the permanent preservation of your data according to best practices for archiving digital language resources. Get started and register for an LDC Submissions user account today.

Fall 2021 LDC Data Scholarship Program
Student applications for the Fall 2021 LDC Data Scholarship program are being accepted now through September 15, 2021. This program provides eligible students with no-cost access to LDC data. Students must complete an application consisting of a data use proposal and letter of support from their advisor.
__

New Publications:
(1) Ethnobotanical Research and Language Documentation of Nahuatl consists of approximately 190 hours of field recordings collected in the Sierra Nororiental and Sierra Norte regions of Puebla, Mexico. The corpus contains audio and video recordings of native Nahuatl speakers during the collection of particular plants; partial transcripts (Nahuatl and Spanish); a Highland Puebla Nahuat dictionary; botanical and ethnobotanical data; and speaker metadata.

Ethnobotanical Research and Language Documentation of Nahuatl is distributed via web download.

2021 Subscription Members will automatically receive copies of this corpus provided they have submitted a completed copy of the special license agreement. 2021 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for a fee.

*

(2) Chinese Abstract Meaning Representation 2.0 (CAMR 2.0) was developed by Brandeis University and Nanjing Normal University and consists of semantic representations of approximately 20,000 Chinese sentences from Chinese Treebank (CTB) 8.0 (LDC2013T21). CAMR 2.0 includes the content of Chinese Abstract Meaning Representation 1.0 (LDC2019T07), which covers CTB 8.0 weblog and discussion forum sentences, plus an additional 9,933 sentences from the newswire portion of CTB 8.0.

Chinese Abstract Meaning Representation 2.0 is distributed via web download.

2021 Subscription Members will automatically receive copies of this corpus. 2021 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for a fee.

*

(3) BOLT Egyptian Arabic Co-reference – Discussion Forum, SMS/Chat, and Conversational Telephone Speech was developed by Raytheon BBN Technologies. Co-reference annotation aims to fill in the connections between specific mentions in the text that refer to the same entities and events in the discourse context. BOLT co-reference annotation was performed on BOLT treebank annotation. It covers noun phrases (including proper nouns, nominals, pronouns and null arguments), possessives, proper noun pre-modifiers, and verbs.

BOLT Egyptian Arabic Co-reference is distributed via web download.

2021 Subscription Members will automatically receive copies of this corpus. 2021 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for a fee.

Linguistic Field(s): Computational Linguistics


Page Updated: 16-Jul-2021