LINGUIST List 12.1369

Fri May 18 2001

Support: Rosetta Project 1,000 Lang Archive: Researcher

Editor for this issue: Jody Huellmantel


  1. Jim Mason, Rosetta Project 1,000 Language Archive: Researcher/Intern, San Francisco, CA

Message 1: Rosetta Project 1,000 Language Archive: Researcher/Intern, San Francisco, CA

Date: Thu, 17 May 2001 17:22:28 -0700
From: Jim Mason
Subject: Rosetta Project 1,000 Language Archive: Researcher/Intern, San Francisco, CA

Researcher/Intern opportunity: The Rosetta Project 1,000 Language Archive

We are looking for linguistics students/professionals interested in
helping with archive research for The Rosetta Project 1,000 Language
Archive. The Rosetta Project is an attempt to create a broad corpus
of language descriptions, vernacular texts, analytic materials and
audio files for 1,000+ languages in a publicly accessible, online
archive. Our goal is to create a
meaningful survey and near-permanent archive of 1,000 languages as
well as a unique platform for contemporary comparative linguistic
research and education.

We are assembling a group of 5 researchers for the summer of 2001 to
help collect and assess a variety of materials to build the archive.
Most of this research will take place in the stacks at Stanford and
Berkeley, with scanning and image processing done in the offices of
the Long Now Foundation in San Francisco. Payment will be on a
"pay-per-text" basis. We pay $10 per text collected, which
should work out to a minimum of $15 an hour. If you get efficient at
the process, you can make significantly more.

Most of the materials in the Rosetta archive are excerpts of already
published texts, so the collection effort focuses on locating,
excerpting and formatting published materials in various archives and
personal libraries. We are excerpting and disseminating these
materials under Fair Use provisions where appropriate or with specific
permission when we are reproducing entire publications.

The texts we are collecting for each language are as follows:

- Genesis translations: We have collected Genesis Ch 1-3 translations
in 1,000 languages, most of which can be seen online. We invite more, but
this component is mostly completed.

- Glossed vernacular texts: A culturally specific counterpoint to the Genesis
text with an interlinear morphemic analysis. We will substitute other
vernacular texts if a glossed origin story is unavailable or culturally

- Orthographies: The writing system(s) of the language with a pronunciation
guide, ideally in IPA. Multiple, competing, or historic orthographies are
especially encouraged.

- Swadesh word lists: The Swadesh 100 word list.

- Inventories of phonemes.

- Morphology and Syntax: Short sketches of 7 pages or under. We do not
want full descriptive grammars.

- Audio files: Sample of spoken language with transcription and ideally a

- Detailed descriptions: Origin and current distribution of language, number
of speakers, family, typology, history, etc. Descriptions that extend past
the current Ethnologue description for a language.

Though we are primarily looking for people to work in and around our office
in San Francisco, proposals to work in other archives in the US or around
the world will also be considered. Off-site collection efforts will
likewise be paid on a "pay-per-text" basis and collections must focus on
materials needed for the Rosetta Archive.

For more information on the project, please see
and/or email

Thank you,

Jim Mason
Director, The Rosetta Project