LINGUIST List 29.1295

Fri Mar 23 2018

Disc: Request for Comment: Cross-Linguistic Data Formats

Editor for this issue: Kenneth Steimel <kenlinguistlist.org>


Date: 23-Mar-2018
From: Harald Hammarström <haraldbombo.se>
Subject: Request for Comment: Cross-Linguistic Data Formats

RFC: Cross-Linguistic Data Formats (CLDF), version 1.0

Following discussions over several years, and prompted in
particular by work presented at the two workshops of the "Language
Comparison with Linguistic Databases" series [1,2], we would like to
request your comments on version 1.0 of CLDF, a specification for
Cross-Linguistic Data Formats (see http://cldf.clld.org).

The specification proposes a standard format for
- wordlists, including cognate judgments and phonetic alignments,
- grammatical structure datasets like WALS features and other typological surveys.

CLDF is built upon W3C's "Tabular Data and Metadata on the Web"
recommendation [3] and can be thought of as a domain-specific adaptation
of that recommendation for linguistics.
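
To give a rough idea of what building on the W3C recommendation means in practice, here is a minimal sketch of a metadata description for a single CSV table, written as a Python dictionary and serialized to JSON. The "@context" URL and the keys "tables", "url", "tableSchema", "columns", "name", "datatype" and "primaryKey" come from the W3C recommendation [3]; the file name values.csv and the column names are assumptions chosen for illustration, and a real CLDF dataset carries further CLDF-specific properties defined in the specification.

```python
import json

# Illustrative metadata for one table, following the W3C tabular metadata
# vocabulary. The file name and column names are assumptions, not taken
# from the announcement.
metadata = {
    "@context": "http://www.w3.org/ns/csvw",
    "tables": [
        {
            "url": "values.csv",
            "tableSchema": {
                "columns": [
                    {"name": "ID", "datatype": "string"},
                    {"name": "Language_ID", "datatype": "string"},
                    {"name": "Parameter_ID", "datatype": "string"},
                    {"name": "Value", "datatype": "string"},
                ],
                "primaryKey": "ID",
            },
        }
    ],
}


def column_names(meta, table_url):
    """Return the column names declared for the table with the given url."""
    for table in meta["tables"]:
        if table["url"] == table_url:
            return [col["name"] for col in table["tableSchema"]["columns"]]
    raise KeyError(table_url)


# Serialize the description to JSON, the format such metadata files use.
print(json.dumps(metadata, indent=2))
print(column_names(metadata, "values.csv"))
```

The point of such a description is that a generic tool can learn which columns a file contains, and how to interpret them, without any linguistics-specific code.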

Extensibility is built into CLDF, to allow support of evolving
standards for more complex types of linguistic data. As of version
1.0, modules for simple dictionary data and parallel-text corpora are
included for further experimentation.

CLDF datasets can be read and written using the Python programming
library pycldf (https://pypi.python.org/pypi/pycldf), but also using
off-the-shelf tools such as spreadsheet software or programming
environments such as R, because the data file format in CLDF is based on
comma-separated values (CSV).
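
Because the data files are plain CSV, they can be processed without any dedicated library. The following is a minimal sketch using only Python's standard csv module, not pycldf. The sample table, its column names (ID, Language_ID, Parameter_ID, Value) and the helper functions are hypothetical and serve only to illustrate the idea; please consult the specification for the columns a real dataset provides.

```python
import csv
import io

# Hypothetical sample standing in for a CSV data file of a dataset.
# Column names and rows are invented for illustration.
SAMPLE_VALUES_CSV = """ID,Language_ID,Parameter_ID,Value
1,lang_a,word_order,SOV
2,lang_b,word_order,SVO
3,lang_c,word_order,SOV
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def values_by_language(rows, parameter_id):
    """Map each language ID to its value for one parameter."""
    return {
        row["Language_ID"]: row["Value"]
        for row in rows
        if row["Parameter_ID"] == parameter_id
    }


rows = read_rows(SAMPLE_VALUES_CSV)
word_order = values_by_language(rows, "word_order")
print(word_order)
```

To read a file from disk instead, the same csv.DictReader can be given a file opened with open(path, newline="", encoding="utf-8").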

The CLDF specification is available at
https://github.com/cldf/cldf/blob/master/README.md

Examples of CLDF datasets and how to access CLDF data are provided at
- https://github.com/cldf/cldf/tree/master/examples and
- https://github.com/cldf/cookbook

We welcome all comments, either posted as a reply to this announcement or as
issues at https://github.com/cldf/cldf/issues


[1] http://www.mpi.nl/events/language-comparison-with-linguistic-databases-reflex-and-typological-databases
[2] http://www.eva.mpg.de/linguistics/conferences/2014-ws-lanclid2/index.html
[3] https://www.w3.org/TR/tabular-data-model/



Linguistic Field(s): Computational Linguistics
                            Genetic Classification
                            Historical Linguistics
                            Typology



Page Updated: 23-Mar-2018