LINGUIST List 30.1580

Wed Apr 10 2019

Calls: Computational Linguistics, Typology/Bulgaria

Editor for this issue: Everett Green <everettlinguistlist.org>


Date: 04-Apr-2019
From: Harald Hammarström <harald.hammarstromlingfil.uu.se>
Subject: Grammar Data Mining: Extracting Linguistic Features From Grammatical Descriptions

Full Title: Grammar Data Mining: Extracting Linguistic Features From Grammatical Descriptions
Short Title: GDM

Date: 05-Sep-2019 - 06-Sep-2019
Location: Varna, Bulgaria
Contact Person: Harald Hammarström
Web Site: https://spraakbanken.gu.se/lsi/sharedtask/

Linguistic Field(s): Computational Linguistics; Typology

Call Deadline: 30-Jun-2019

Meeting Description:

The present Workshop/Shared Task seeks to transform a large set of digitized publications describing the grammars of the languages of the world into structured databases that will enable comparison of different languages at an unprecedented breadth and depth.

There are some 6,500 languages in the world, and information about their grammatical characteristics is available in book form for over 4,000 of them. Until recently, information has been extracted from grammars exclusively by hand. Manual collection is bounded by the limits of human capacity, so it can target only a relatively small number of languages and characteristics in a given time, and only at a substantial time investment.

We are now entering a phase where it is practical to use NLP tools for tasks of this kind. At a minimum, a computer may infer some characteristics of the language described simply by counting words used in a grammatical description; e.g., a high frequency of the term 'suffix' likely indicates that the language being described uses a lot of suffixes. Beyond that, there are less straightforward or more detailed characteristics traditionally of interest to linguists, such as where the verb is placed in the sentence (beginning, middle, end), the existence and use of participles, possessive constructions, evidentiality and so on. Any techniques from the NLP toolbox, such as tf-idf weighting, tagging, parsing and vector spaces, may be used in combination and as input to more sophisticated machine learning approaches.
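
As an illustration of the word-counting idea above, here is a minimal Python sketch. It is not part of the shared task materials: the function names, the `margin` threshold and the output labels are all hypothetical choices made for this example, and the labels are not WALS feature values.

```python
import re
from collections import Counter


def term_rates(text, terms):
    """Return occurrences of each term per 1000 tokens of the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    # Count the bare form together with simple plurals ("suffix", "suffixes").
    return {
        t: 1000 * (counts[t] + counts[t + "s"] + counts[t + "es"]) / total
        for t in terms
    }


def guess_affixation(text, margin=2.0):
    """Crude guess based only on how often 'suffix' and 'prefix' occur.

    `margin` is an assumed threshold: one term must be more than
    `margin` times as frequent as the other to count as dominant.
    """
    rates = term_rates(text, ["suffix", "prefix"])
    if rates["suffix"] > margin * rates["prefix"]:
        return "predominantly suffixing"
    if rates["prefix"] > margin * rates["suffix"]:
        return "predominantly prefixing"
    return "undecided"
```

Calling `guess_affixation` on the raw text of a grammatical description returns one of the three illustrative labels. A real system would likely need to handle negation ("there are no suffixes"), hyphenated affix glosses and OCR noise, none of which this sketch attempts.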

In this shared task we provide a subset of the World Atlas of Language Structures (WALS, http://wals.info) along with the digitized sources from which the features were drawn. Sources are provided in raw text form. The task is to infer WALS datapoints from the raw text data of the digitized grammatical descriptions.

Authors should submit a paper of up to 8 pages conforming to the RANLP style guidelines (see http://lml.bas.bg/ranlp2019/submissions.php) describing their technical solution to the specific task.

Workshop paper submission deadline: 30 June 2019
Workshop paper acceptance notification: 28 July 2019
Workshop paper camera-ready version: 20 August 2019
Workshop: 5-6 September 2019

Each submission will be evaluated against a test set of 1000 random datapoints drawn from the same origin as the training data set.
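
The call does not state which metric is used. Assuming, purely for illustration, that predictions are scored by plain accuracy over the test datapoints, a local check could look like the sketch below. The data layout (a dictionary keyed by a language and feature pair) and all identifiers are hypothetical.

```python
def accuracy(gold, predicted):
    """Fraction of gold datapoints whose predicted value matches.

    Both arguments map a (language, feature) pair to a feature value.
    Datapoints missing from `predicted` count as wrong.
    """
    if not gold:
        raise ValueError("gold set is empty")
    correct = sum(
        1 for key, value in gold.items() if predicted.get(key) == value
    )
    return correct / len(gold)


# Toy example with made-up language and feature identifiers.
gold = {
    ("lang_a", "feature_1"): "value_x",
    ("lang_a", "feature_2"): "value_y",
    ("lang_b", "feature_1"): "value_z",
    ("lang_b", "feature_2"): "value_y",
}
predicted = {
    ("lang_a", "feature_1"): "value_x",
    ("lang_a", "feature_2"): "value_z",
    ("lang_b", "feature_1"): "value_z",
}
score = accuracy(gold, predicted)
```

The official evaluation procedure is described on the shared task web site and may differ from this assumption.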

The workshop will be co-located with RANLP (http://lml.bas.bg/ranlp2019) in Bulgaria and will take place in Hotel "Cherno More", Varna, the main RANLP-2019 conference venue.


Call for Papers:

For training data, task, submission instructions, important dates, evaluation and venue, see:

https://spraakbanken.gu.se/lsi/sharedtask/

Programme Committee:

Guillaume Segerer (CNRS, LLACAN, France)
Harald Hammarström (Department of Linguistics and Philology, Uppsala University, Sweden)
Markus Forsberg (Språkbanken, University of Gothenburg, Sweden)
Søren Wichmann (Leiden University Centre for Linguistics, Netherlands)
Shafqat Mumtaz Virk (Språkbanken, University of Gothenburg, Sweden)
Zeljko Agic (IT University of Copenhagen, Denmark)
Erich Round (University of Queensland, Australia)
Sebastian Nordhoff (LangSci Press, Germany)




Page Updated: 10-Apr-2019