LINGUIST List 12.3069

Mon Dec 10 2001

Jobs: Text/Database Curator: The Rosetta Project, CA USA

Editor for this issue: Heather Taylor-Loring


  1. Jim Mason, Text/Database Curator: The Rosetta Project, San Francisco CA USA

Message 1: Text/Database Curator: The Rosetta Project, San Francisco CA USA

Date: Fri, 7 Dec 2001 18:45:43 -0800
From: Jim Mason
Subject: Text/Database Curator: The Rosetta Project, San Francisco CA USA

Text/Database Curator: The Rosetta Project
The Long Now Foundation
Contact: Director, Jim Mason at

The Rosetta Project is looking for an individual to work processing
scans of linguistic texts and help with general database management.

Tasks include selecting linguistic texts for our database; scanning,
formatting and image processing; language name reconciliation; minor
perl scripting and other database/network support work. These tasks
can range from the rather boring (e.g. hours of scanning) to the very
interesting and specialized work of curating materials for our global
language online archive. You have
to be willing to do both the interesting and boring parts to be
successful in this job.

The proper person for this job is a generalist with a wide range of
skills and a flexible character. The job requires both computational
skills and familiarity with, and interest in, descriptive linguistics.
You will be required to process a very large
amount of material on a daily basis, covering the range of world
languages and digital tools. You will have to work quickly and with
accuracy. It is critical that you be able to solve problems on your
own and not require hand-holding through technical challenges.
Familiarity with Photoshop, DeBabelizer, Adobe Capture, Linux, Win NT,
OCR systems, HTML, and general comfort in digital environments is a

This work is paid on a contract basis at $15 an hour. The job is in
the offices of The Long Now Foundation, located in the Presidio in San
Francisco. From your desk, you will look out over the Bay and watch
the ships sail under the Golden Gate . . . ;-)

More information on the project is included below. Letters of
interest with C.V.s should be sent to

Project Description:

The Rosetta Project is creating a broad corpus of language
descriptions, vernacular texts, analytic materials and audio files for
1,000+ languages in a publicly accessible, online archive. Our
intention is to create a meaningful survey and near permanent archive
of 1,000 languages as well as a unique platform for contemporary
comparative linguistic research and education. The text types we are
collecting for each language are explained in detail on the
site.

We are creating this broad language archive through an open
contribution, open review process, similar to the strategy that
created the Oxford English Dictionary, though in this case we hope
the Internet speeds the process a little bit . . . ;-) And to help the
process along, we are running collection efforts at Stanford,
Berkeley, Yale, SIL, and various linguistic organizations.

Most of the material in our database is excerpted from already
published materials, but we are also bringing some new material to
publication for the first time. In general, our interest is in
collecting, preserving, and making available the many riches of
descriptive linguistic work, work that is often difficult to access,
unorganized, or rotting away in file cabinets without a proper home.