LINGUIST List 29.913

Mon Feb 26 2018

FYI: Announcing the SFU Opinion and Comments Corpus

Editor for this issue: Kenneth Steimel

Date: 26-Feb-2018
From: Maite Taboada
Subject: Announcing the SFU Opinion and Comments Corpus

The Discourse Processing Lab at Simon Fraser University is pleased to announce the release of the SFU Opinion and Comments Corpus.

The SFU Opinion and Comments Corpus (SOCC) is a corpus for the analysis of online news comments. It contains both the comments and the articles on which they were posted. All of the articles are opinion pieces, not hard news articles. The corpus is larger than any other currently available comment corpus, and it was collected with attention to preserving reply structures and other metadata. In addition to the raw corpus, we also provide annotations for four phenomena: constructiveness, toxicity, negation and its scope, and appraisal.

Full details and the download link are available from our GitHub project page:

For more information about this work, please see the following papers:

Kolhatkar, V., H. Wu, L. Cavasso, E. Francis, K. Shukla and M. Taboada (2018) The SFU Opinion and Comments Corpus: A corpus for the analysis of online news comments. Journal paper under review.

Kolhatkar, V. and M. Taboada (2017) Using New York Times Picks to identify constructive comments. Proceedings of the Workshop: Natural Language Processing Meets Journalism, Conference on Empirical Methods in Natural Language Processing. Copenhagen. September 2017.

Kolhatkar, V. and M. Taboada (2017) Constructive language in news comments. Proceedings of the 1st Abusive Language Online Workshop, 55th Annual Meeting of the Association for Computational Linguistics. Vancouver. August 2017, pp. 11-17.


Varada Kolhatkar
Maite Taboada

Linguistic Field(s): Computational Linguistics; Discourse Analysis; Text/Corpus Linguistics

Subject Language(s): English (eng)

Page Updated: 26-Feb-2018