LINGUIST List 29.1295
Fri Mar 23 2018
Disc: Request for Comment: Cross-Linguistic Data Formats
Editor for this issue: Kenneth Steimel <kenlinguistlist.org>
Date: 23-Mar-2018
From: Harald Hammarström <harald
bombo.se>
Subject: Request for Comment: Cross-Linguistic Data Formats
RFC: Cross-Linguistic Data Formats (CLDF), version 1.0
Following discussions over several years, and prompted in particular
by work presented at the two workshops of the "Language Comparison
with Linguistic Databases" series [1,2], we would like to request your
comments on version 1.0 of CLDF, a specification for Cross-Linguistic
Data Formats (see http://cldf.clld.org).
The specification proposes a standard format for
- wordlists, including cognate judgments and phonetic alignments,
- grammatical structure datasets, such as WALS features and other
  typological surveys.
CLDF is built upon W3C's "Tabular Data and Metadata on the Web"
recommendation [3] and can be thought of as a domain-specific
adaptation of that recommendation for linguistics.
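
As a rough illustration of what building on the W3C recommendation
means in practice, the sketch below assembles a JSON metadata
description for a single CSV table. The file name values.csv, the
column set, and the dc:conformsTo URIs are assumptions made for
illustration; the specification itself is the authority on the exact
terms.

```python
import json

# Hypothetical, minimal metadata description in the style of the W3C
# "Tabular Data and Metadata on the Web" recommendation. The table URL,
# the column names, and the conformsTo URIs are illustrative assumptions.
metadata = {
    "@context": "http://www.w3.org/ns/csvw",
    "dc:conformsTo": "http://cldf.clld.org/v1.0/terms.rdf#StructureDataset",
    "tables": [
        {
            "url": "values.csv",
            "dc:conformsTo": "http://cldf.clld.org/v1.0/terms.rdf#ValueTable",
            "tableSchema": {
                "columns": [
                    {"name": "ID", "datatype": "string"},
                    {"name": "Language_ID", "datatype": "string"},
                    {"name": "Parameter_ID", "datatype": "string"},
                    {"name": "Value", "datatype": "string"},
                ],
                "primaryKey": ["ID"],
            },
        }
    ],
}


def column_names(description, table_url):
    """Return the column names declared for the table with the given URL."""
    for table in description["tables"]:
        if table["url"] == table_url:
            return [col["name"] for col in table["tableSchema"]["columns"]]
    raise KeyError(table_url)


# Serialize the description as it might be stored next to the CSV file.
metadata_json = json.dumps(metadata, indent=2)
```

The point of the separate description file is that the CSV stays a
plain table, while column meanings and keys are declared alongside it.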
Extensibility is built into CLDF, to allow support for evolving
standards for more complex types of linguistic data. As of version
1.0, modules for simple dictionary data and parallel-text corpora are
included for further experimentation.
CLDF datasets can be read and written using the Python programming
library pycldf (https://pypi.python.org/pypi/pycldf), but also using
off-the-shelf tools such as spreadsheet software or programming
environments like R, because the data file format in CLDF is based on
comma-separated values (CSV).
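
Because the data files are plain CSV, even a standard library is
enough to get at the data. The following is a minimal sketch using
only Python's built-in csv module rather than pycldf. The column names
(ID, Language_ID, Parameter_ID, Value) are assumed to match the
dataset at hand, and the sample rows are invented for illustration.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows standing in for a values file of a structure
# dataset; the column names are an assumption about the file at hand.
SAMPLE_VALUES_CSV = """ID,Language_ID,Parameter_ID,Value
1,lang-a,word-order,SOV
2,lang-b,word-order,SVO
3,lang-a,tone,no
"""


def values_by_language(csv_file):
    """Group values as {language_id: {parameter_id: value}}."""
    grouped = defaultdict(dict)
    for row in csv.DictReader(csv_file):
        grouped[row["Language_ID"]][row["Parameter_ID"]] = row["Value"]
    return dict(grouped)


# io.StringIO stands in for an opened file here.
table = values_by_language(io.StringIO(SAMPLE_VALUES_CSV))
```

With a real dataset, the StringIO object would be replaced by a file
opened with open(path, newline="", encoding="utf-8").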
The CLDF specification is available at
https://github.com/cldf/cldf/blob/master/README.md
Examples of CLDF datasets and how to access CLDF data are provided at
- https://github.com/cldf/cldf/tree/master/examples
- https://github.com/cldf/cookbook
We welcome all comments, either posted as a reply to this announcement
or as issues at
https://github.com/cldf/cldf/issues
[1] http://www.mpi.nl/events/language-comparison-with-linguistic-databases-reflex-and-typological-databases
[2] http://www.eva.mpg.de/linguistics/conferences/2014-ws-lanclid2/index.html
[3] https://www.w3.org/TR/tabular-data-model/
Linguistic Field(s): Computational Linguistics
                     Genetic Classification
                     Historical Linguistics
                     Typology
Page Updated: 23-Mar-2018