LINGUIST List 31.2598

Tue Aug 18 2020

FYI: August 2020 Newsletter - LDC

Editor for this issue: Everett Green <everett@linguistlist.org>



Date: 17-Aug-2020
From: Membership Coordinator <ldc@ldc.upenn.edu>
Subject: August 2020 Newsletter - LDC

In this newsletter:
LDC adds DOI Identifier to its Language Resources
Fall 2020 LDC Data Scholarship Program

New Publications:
LORELEI Vietnamese Representative Language Pack
DEFT Chinese Light and Rich ERE Annotation
CALLFRIEND American English – Southern Dialect Second Edition

__

LDC adds DOI Identifier to its Language Resources
As of July 2020, LDC’s language resources include a Digital Object Identifier (DOI), an internationally recognized identification standard for online digital material. DOIs are alphanumeric strings that correspond to URLs and metadata for specified resources. They are expressed as links that resolve to the object’s online location. For example, the DOI for Penn Parsed Corpora of Historical English (LDC2020T16) is https://doi.org/10.35111/4hzx-5483, which leads users to the LDC catalog entry for this data set. To facilitate its assignment and administration of DOIs, LDC has joined DataCite, a global DOI provider for research data. (DOIs for resources released before July 2020 will be assigned through a process expected to be completed shortly.) LDC data sets now have four persistent identifiers: a unique LDC number, ISBN, ISLRN, and DOI. Adding DOIs is consistent with our aim to follow best practices for archiving and curating digital resources, as evidenced by the CoreTrustSeal certification, which recognizes the LDC Catalog as a trustworthy data repository.
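
For readers who handle DOIs in scripts, the relationship between a DOI string and its link can be sketched as follows. This is a minimal illustration, not an LDC or DataCite tool: the function names parse_doi and doi_to_link are hypothetical, and the only checks made are that the identifier has a prefix beginning with "10." and a non-empty suffix after the first slash.

```python
from typing import Tuple
from urllib.parse import urlparse

# Public DOI resolver, as used in the example link above.
DOI_RESOLVER = "https://doi.org/"


def parse_doi(link: str) -> Tuple[str, str]:
    """Split a DOI link (or a bare DOI) into (prefix, suffix).

    Hypothetical helper for illustration only.
    """
    parsed = urlparse(link.strip())
    # In a full link the DOI is the URL path; a bare DOI has no scheme.
    doi = parsed.path.lstrip("/") if parsed.scheme else link.strip()
    prefix, sep, suffix = doi.partition("/")
    if not (sep and prefix.startswith("10.") and suffix):
        raise ValueError("not a DOI: %r" % link)
    return prefix, suffix


def doi_to_link(prefix: str, suffix: str) -> str:
    """Rebuild the resolvable link from the two parts of a DOI."""
    return "%s%s/%s" % (DOI_RESOLVER, prefix, suffix)


if __name__ == "__main__":
    prefix, suffix = parse_doi("https://doi.org/10.35111/4hzx-5483")
    print(prefix, suffix)
    print(doi_to_link(prefix, suffix))
```

The sketch does no network access; following the rebuilt link in a browser is what actually resolves the DOI to the catalog entry.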

Fall 2020 LDC Data Scholarship Program
Student applications for the Fall 2020 LDC Data Scholarship program are being accepted now through September 15, 2020. This scholarship program provides eligible students with no-cost access to LDC data. Students must complete an application consisting of a data use proposal and a letter of support from their advisor.

For application requirements and program rules, visit the LDC Data Scholarship page.

__

New Publications:
(1) LORELEI Vietnamese Representative Language Pack consists of Vietnamese monolingual text, Vietnamese-English parallel text, annotations, supplemental resources, and related software tools developed by LDC for the DARPA LORELEI program.

Data was collected in the following genres: discussion forum, news, reference, social network, and weblogs. Approximately 75,000 words were annotated for named entities and up to 25,000 words contain additional annotation, including situation frames (identifying entities, needs, and issues) and entity linking and detection.

This corpus is distributed via web download. Non-members may license this data for a fee.

*

(2) DEFT Chinese Light and Rich ERE Annotation contains Chinese discussion forum web text annotated for entities, relations, and events (ERE) using the ERE Light and ERE Rich annotation schemas developed by LDC. Light ERE annotation labels entity mentions for the target set of ERE types between and among those entities, including coreference. Rich ERE annotation expands types and tagging for ERE annotation tasks and replaces event coreference with event hopper annotation. All files in this release (157) were annotated following Light ERE guidelines; a subset (149) were also labeled with Rich ERE annotation.

This corpus is distributed via web download. Non-members may license this data for a fee.

*

(3) CALLFRIEND American English – Southern Dialect Second Edition was developed by LDC and consists of approximately 26 hours of unscripted telephone conversations between native speakers of Southern dialects of American English. This second edition updates the audio files to WAV format, simplifies the directory structure, and adds documentation and metadata.

This corpus is distributed via web download. Non-members may license this data for a fee.

Membership Coordinator
Linguistic Data Consortium
University of Pennsylvania
T: +1-215-573-1275
E: ldc@ldc.upenn.edu
M: 3600 Market St. Suite 810
Philadelphia, PA 19104


Linguistic Field(s): Computational Linguistics


Page Updated: 18-Aug-2020