LINGUIST List 28.825
Mon Feb 13 2017
FYI: Over 95 Million Wikipedia Discussion Comments
Editor for this issue: Yue Chen <yue@linguistlist.org>
Date: 08-Feb-2017
From: Melody Kramer <mkramer@wikimedia.org>
Subject: Over 95 Million Wikipedia Discussion Comments
Yesterday, Wikipedia released two datasets: a corpus of all 95 million user
and article talk comments made on Wikipedia between 2001 and 2015, and the
largest annotated dataset of online personal attacks.
More information is available at:
https://blog.wikimedia.org/2017/02/07/scaling-understanding-of-harassment/

To support further research, both datasets are available on FigShare, a
research repository where users can share data:
https://figshare.com/projects/Wikipedia_Talk/16731

If you're interested in collaborating with the Wikimedia Foundation on
research in this area, you can find documentation on formal collaborations
here:
https://www.mediawiki.org/wiki/Wikimedia_Research/Formal_collaborations

You can also follow WikiResearch on Twitter for news and updates on research,
datasets, and APIs from Wikimedia projects, or contact the Wikimedia Research
team at research-wmf@wikimedia.org.
Linguistic Field(s): Computational Linguistics; Text/Corpus Linguistics
Page Updated: 13-Feb-2017