LINGUIST List 28.825

Mon Feb 13 2017

FYI: Over 95 Million Wikipedia Discussion Comments

Editor for this issue: Yue Chen <yue@linguistlist.org>


Date: 08-Feb-2017
From: Melody Kramer <mkramer@wikimedia.org>
Subject: Over 95 Million Wikipedia Discussion Comments

Yesterday, Wikipedia released two data sets: the largest annotated dataset of
online personal attacks, and a corpus of over 95 million user and article talk
comments made on Wikipedia between 2001 and 2015.

More information at:

https://blog.wikimedia.org/2017/02/07/scaling-understanding-of-harassment/

Both data sets are available on FigShare, a research repository where users
can share data, to support further research:

https://figshare.com/projects/Wikipedia_Talk/16731

If you’re interested in collaborating with the Wikimedia Foundation on
research in this area, you can find documentation on formal collaborations
here:

https://www.mediawiki.org/wiki/Wikimedia_Research/Formal_collaborations

You can also follow WikiResearch on Twitter for news and updates on research,
datasets and APIs from Wikimedia projects, or contact the Wikimedia Research
team at research-wmf@wikimedia.org.

Linguistic Field(s): Computational Linguistics; Text/Corpus Linguistics

Page Updated: 13-Feb-2017