LINGUIST List 32.1361

Mon Apr 19 2021

FYI: March 2021 Newsletter - LDC

Editor for this issue: Everett Green <everettlinguistlist.org>



Date: 15-Apr-2021
From: Membership Coordinator <ldc@ldc.upenn.edu>
Subject: March 2021 Newsletter - LDC

In this newsletter:
New Publications:
X-SRL: Parallel Cross-lingual Semantic Role Labeling
TAC KBP English Sentiment Slot Filling – Comprehensive Training and Evaluation Data 2013-2014
________________________________________

New publications:
(1) X-SRL: Parallel Cross-lingual Semantic Role Labeling was developed by the Department of Computational Linguistics at Heidelberg University and the Leibniz Institute for the German Language (IDS). It consists of approximately three million words of German, French, and Spanish annotated for semantic role labeling. The texts are translations of the English portion of 2009 CoNLL Shared Task Part 2 (LDC2012T04). All sentences are annotated for verbal predicates, and all four languages (English, German, French, and Spanish) share the original English PropBank label set.

The 2009 CoNLL Shared Task developed syntactic dependency annotations, including semantic dependencies that model the roles of both verbal and nominal predicates. The following English data was used in the shared task:

- Treebank-2 (LDC95T7): over one million words of annotated English newswire and other text developed by the University of Pennsylvania
- Proposition Bank I (LDC2004T14): semantic annotation of newswire text from Treebank-2 developed by the University of Pennsylvania
- NomBank v 1.0 (LDC2008T23): argument structure for instances of common nouns in Treebank-2 and Treebank-3 (LDC99T42), developed by New York University

For X-SRL, the English source data was automatically translated using DeepL. Automatic tokenization, lemmatization, part-of-speech tagging, and syntactic parsing were then applied to the text. The data was divided into train, development, and test partitions. Semantic labels were transferred for the train and development sections, and the test sentences were validated for translation quality, alignment, label transfer, and filtering.
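
As a rough illustration of the label transfer step, the sketch below projects semantic role labels from English tokens onto translated tokens through a word alignment. The newsletter does not say how X-SRL performed the transfer, so the helper name `project_labels`, the alignment format, and the toy sentence are all assumptions made for illustration, not the corpus's actual procedure.

```python
from typing import List, Tuple


def project_labels(
    source_labels: List[str],
    alignment: List[Tuple[int, int]],
    target_length: int,
    empty: str = "_",
) -> List[str]:
    """Copy each labeled source token's label to its aligned target token.

    Hypothetical helper: alignment pairs are (source_index, target_index).
    Target tokens with no labeled aligned source token keep the `empty` label.
    If several labeled source tokens align to one target token, the pair that
    comes first in `alignment` wins.
    """
    target_labels = [empty] * target_length
    for src, tgt in alignment:
        label = source_labels[src]
        if label != empty and target_labels[tgt] == empty:
            target_labels[tgt] = label
    return target_labels


# Toy example (invented for illustration, not taken from the corpus).
english = ["She", "bought", "a", "book"]
english_labels = ["A0", "buy.01", "_", "A1"]
german = ["Sie", "kaufte", "ein", "Buch"]
alignment = [(0, 0), (1, 1), (2, 2), (3, 3)]

german_labels = project_labels(english_labels, alignment, len(german))
```

In this toy case the alignment is one-to-one, so the German tokens simply inherit the English labels. Real translations reorder and merge words, which is one reason a validated test set is useful.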

X-SRL: Parallel Cross-lingual Semantic Role Labeling is distributed via web download.

2021 Subscription Members will automatically receive copies of this corpus. 2021 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for a fee.

*

(2) TAC KBP English Sentiment Slot Filling – Comprehensive Training and Evaluation Data 2013-2014 was developed by LDC and contains training and evaluation data produced in support of the 2013 and 2014 TAC KBP Sentiment Slot Filling tracks. The data in this release includes queries, manual runs (human-produced query responses), and assessment results for human- and system-produced query responses. Source data was English news and web text.

The regular English Slot Filling track involved mining information about entities from text using a specified set of "slots", or attributes. The goal of the Sentiment Slot Filling task was to evaluate the quality of detectors for positive and negative sentiment.
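
To make the idea of slots and assessed responses concrete, here is a small sketch of how a query and its responses might be represented and compared. The slot name `pos-towards`, the `Query` class, and the `score` function are illustrative assumptions, not the release's file format or the track's official scoring.

```python
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass(frozen=True)
class Query:
    entity: str  # the entity the query asks about
    slot: str    # hypothetical slot name, e.g. "pos-towards"


def score(system: Set[str], correct: Set[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) of system fillers against correct fillers."""
    hits = len(system & correct)
    precision = hits / len(system) if system else 0.0
    recall = hits / len(correct) if correct else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Invented example: which things does "Company X" hold positive sentiment toward?
query = Query(entity="Company X", slot="pos-towards")
system_fillers = {"Product A", "Product B"}   # responses from a system run
correct_fillers = {"Product A", "Product C"}  # responses assessed as correct

precision, recall, f1 = score(system_fillers, correct_fillers)
```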

TAC KBP English Sentiment Slot Filling – Comprehensive Training and Evaluation Data 2013-2014 is distributed via web download.

2021 Subscription Members will automatically receive copies of this corpus. 2021 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for a fee.

Membership Coordinator
Linguistic Data Consortium
University of Pennsylvania
T: +1-215-573-1275
E: ldc@ldc.upenn.edu
M: 3600 Market St. Suite 810
Philadelphia, PA 19104

Linguistic Field(s): Computational Linguistics


Page Updated: 19-Apr-2021