LINGUIST List 28.2007

Mon May 01 2017

FYI: Languages of Indonesia - Data Deposit

Editor for this issue: Yue Chen

Date: 28-Apr-2017
From: Bradley Taylor
Subject: Languages of Indonesia - Data Deposit

Dear Colleagues,

We are pleased to announce that the Jakarta Field Station of the Max Planck
Institute for Evolutionary Anthropology (MPI-EVA), along with its
collaborating projects, has just finalised the deposit of its corpora in The
Language Archive (TLA) at the Max Planck Institute for Psycholinguistics:

The Jakarta Field Station was a major field project of the former Department
of Linguistics at MPI-EVA. Based in Jakarta, with field assistants working in
various locations across Indonesia, it operated between 1999 and 2015 with the
primary purpose of recording and documenting languages of the region. Together
with collaborating projects and scientists, it gathered over 2.3 million
transcribed utterances from primarily naturalistic language recordings. An
archive of the Field Station's website can be found here:

Most utterances are fully glossed into English and translated into either
English or Indonesian or both. All have session and speaker metadata and, in
the TLA, are in Toolbox format, with many in ELAN format as well. All data are
open-access, can be downloaded, and are free to use, with appropriate

Some rough tallies:

Transcribed sessions: 2,800
Text records (~utterances): 2.3 million
Words (tokens): 8.7 million
Recorded audio (WAV): 2,000 files, 1,100 hours
Recorded video (MPEG): 1,600 files, 1,150 hours

In addition to the above, CSV text files, one per entity type (texts,
sessions, speakers, etc.), can be downloaded here:

Bradley Taylor

David Gil

Linguistic Field(s): Language Documentation; Text/Corpus Linguistics

Language Family(ies): Austronesian

Page Updated: 01-May-2017