LINGUIST List 31.112

Wed Jan 08 2020

Confs: Clinical Ling, Comp Ling, Text/Corpus Ling / Spain

Editor for this issue: Sarah Robinson

Date: 08-Jan-2020
From: Ari Klein
Subject: The 5th Social Media Mining for Health Applications Shared Task

The 5th Social Media Mining for Health Applications Shared Task
Short Title: #SMM4H 2020

Date: 13-Sep-2020 - 13-Sep-2020
Location: Barcelona, Spain
Contact: Ari Klein

Linguistic Field(s): Clinical Linguistics; Computational Linguistics; Text/Corpus Linguistics

Meeting Description:

The Social Media Mining for Health Applications (#SMM4H) Shared Task involves natural language processing (NLP) challenges of using social media data for health research, including informal, colloquial expressions and misspellings of clinical concepts, noise, data sparsity, ambiguity, and multilingual posts. For each of the five tasks below, participating teams will be provided with a set of annotated tweets for developing systems, followed by a three-day window during which they will run their systems on unlabeled test data. For additional details about the tasks and information about registration, data access, paper submissions, and presentations, go to

Task 1: Automatic classification of tweets that mention medications
This binary classification task involves distinguishing tweets that mention a medication or dietary supplement from those that do not.

Task 2: Automatic classification of multilingual tweets that report adverse effects
This binary classification task involves distinguishing tweets that report an adverse effect (AE) of a medication from those that do not, taking into account subtle linguistic variations between AEs and indications (i.e., the reason for using the medication). This task includes distinct sets of tweets posted in English, Spanish, French, and Russian.

Task 3: Automatic extraction and normalization of adverse effects in English tweets
This end-to-end task involves extracting the span of text containing an adverse effect (AE) of a medication from tweets that report an AE, and then mapping the extracted AE to a standard concept ID in the MedDRA vocabulary (preferred terms).
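
For illustration only, the two steps of Task 3 (span extraction, then normalization) might be sketched as below. This is a minimal lexicon-lookup sketch, not the organizers' baseline or the required output format: the names AE_LEXICON, AEMention, and extract_and_normalize are hypothetical, and the concept IDs are placeholders rather than real MedDRA codes.

```python
import re
from typing import Dict, List, NamedTuple

# Hypothetical mini-lexicon: colloquial phrase -> placeholder concept ID.
# The IDs below are illustrative placeholders, NOT real MedDRA codes.
AE_LEXICON: Dict[str, str] = {
    "headache": "PT_PLACEHOLDER_1",
    "can't sleep": "PT_PLACEHOLDER_2",
    "cant sleep": "PT_PLACEHOLDER_2",
    "nauseous": "PT_PLACEHOLDER_3",
}


class AEMention(NamedTuple):
    start: int       # character offset where the span begins
    end: int         # character offset just past the span
    text: str        # the span exactly as it appears in the tweet
    concept_id: str  # normalized (placeholder) concept ID


def extract_and_normalize(tweet: str) -> List[AEMention]:
    """Find lexicon phrases in a tweet and attach their concept IDs."""
    mentions = []
    for phrase, concept_id in AE_LEXICON.items():
        # Case-insensitive whole-phrase match on the original text,
        # so the offsets refer to the unmodified tweet.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for m in re.finditer(pattern, tweet, flags=re.IGNORECASE):
            mentions.append(AEMention(m.start(), m.end(), m.group(0), concept_id))
    # Tuples sort by start offset first, giving left-to-right order.
    return sorted(mentions)


if __name__ == "__main__":
    example = "This med gave me a HEADACHE and now I can't sleep"
    for mention in extract_and_normalize(example):
        print(mention)
```

A dictionary lookup like this would miss most informal or misspelled expressions, which is precisely the difficulty the task is meant to address; competitive systems would presumably rely on learned sequence labeling and concept ranking instead.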

Task 4: Automatic characterization of chatter related to prescription medication abuse in tweets
This multi-class classification task covers tweets that mention at least one prescription opioid, benzodiazepine, atypical antipsychotic, central nervous system stimulant, or GABA analogue. Systems must distinguish tweets that report potential abuse/misuse from those that report non-abuse/non-misuse consumption, merely mention the medication, or are unrelated.

Task 5: Automatic classification of tweets reporting a birth defect pregnancy outcome
This multi-class classification task involves distinguishing three classes of tweets that mention birth defects: “defect” tweets refer to the user’s child and indicate that he/she has the birth defect mentioned in the tweet; “possible defect” tweets are ambiguous about whether someone is the user’s child and/or has the birth defect mentioned in the tweet; “non-defect” tweets merely mention birth defects.

Call for Papers:

Paper Submission and Presentation Information:

Participating teams are required to submit a paper describing the system(s) they ran on the test data. The system description may consist of up to two pages, plus unlimited references. Sample system descriptions can be found on pages 89-136 of the #SMM4H 2019 proceedings. Accepted system descriptions will be included in the #SMM4H 2020 proceedings. We encourage, but do not require, at least one author of each accepted system description to register for the #SMM4H 2020 Workshop, co-located with COLING 2020, and present their system as a poster. Select participants, as determined by the program committee, will be invited to extend their system description to up to four pages, plus unlimited references, and present their system orally. All paper submissions must follow the COLING guidelines and be submitted as a PDF using the Softconf START Conference Manager.

Important Dates

Training data available: January 15, 2020 (may be sooner for some tasks)
Test data available: April 2, 2020
System predictions for test data due: April 5, 2020
System description paper submission deadline: May 5, 2020
Notification of acceptance of system description papers: June 10, 2020
Camera-ready papers due: June 30, 2020
Workshop: September 13, 2020

Page Updated: 08-Jan-2020