LINGUIST List 2.367

Saturday, 27 July 1991

FYI: Establishment of Linguistic Data Consortium

Editor for this issue: <>


  1. Beth Preston, Establishment of Linguistic Data Consortium

Message 1: Establishment of Linguistic Data Consortium

Date: Fri, 26 Jul 91 10:12:50 EDT
From: Beth Preston <>
Subject: Establishment of Linguistic Data Consortium
The following is an announcement from the U.S. Government, published
in the Commerce Business Daily:
Commerce Business Daily
DUE 081991 POC Mr. Charles L. Wayne, DARPA/SISTO, (703)696-2259. The
Government recognizes that large amounts of linguistic data (e.g., speech,
text, lexicons, and grammars) are needed to produce effective speech and text
processing systems. DARPA intends to support and stimulate the establishment
of a consortium that will develop and distribute such data. Five million
dollars has been earmarked for this purpose from funds recently provided by
Congress to DARPA for Pre-Competitive Technology Development.

MOTIVATION -
Many groups are developing advanced technology for speech recognition,
understanding, and synthesis. Others are developing advanced technology for
text retrieval, understanding, generation, and translation. All are impeded by
the daunting costs of acquiring the data needed to produce truly robust,
powerful, scalable systems. The Linguistic Data Consortium will remedy this
situation by providing much larger amounts of data than any one group can
afford and by sharing those costs widely. The resulting data will be true
national assets, enabling a great deal of valuable research and product
development while simultaneously serving important Government needs.

DATA -
The data envisioned include large quantities of raw and annotated text and
speech (billions of words of text and thousands of hours of speech), a large
lexicon, and a broad coverage grammar of English. The data will also include
whatever additional materials (including foreign language materials) the
Consortium can obtain by exchange or on other reasonable terms.

PROCEDURES -
Where feasible, the Consortium will acquire existing data (such as naturally
occurring text) and put it in a standard format. The Consortium will produce
other data from scratch. The Consortium will also negotiate with foreign
entities to make even larger and more varied amounts of data available to
members. Although the Consortium does not need exclusive rights to donated
data, DARPA does intend to make its growing holdings available exclusively
through the Consortium.

PARTICIPANTS - Broad participation is desired.
Potential members include many companies and universities plus several
government agencies. General membership fees will be set at affordable levels,
and foreign members will be considered if access to foreign data can be
assured. Senior Members (i.e., organizations willing to contribute significant
sums of money) will have votes on the Consortium's governing board. The actual
work will be done by various organizations (companies and universities) under
contract to the Consortium.

FORMATION - The Consortium may be established as a
separate legal entity, such as a non-profit corporation, or other form of
association. Government funds will be released upon execution of a suitable
agreement.

RESPONSES SOUGHT - Organizations interested in becoming Senior
Members of the Consortium are urged to write to Mr. Charles L. Wayne,
DARPA/SISTO, Virginia 3701 N. Fairfax Drive, Arlington, VA 22203-1714.
Organizations (especially academic and non-profit institutions) interested in
hosting and aiding in the formation of the Consortium are asked to submit a
written description of their capabilities for doing so by 4:00 PM, August 19,
1991. Future announcements will deal with general membership issues and with
contracts for data production. (0200)
Defense Advanced Research Projects Agency (DARPA), Contracts Management (CMO),
3701 North Fairfax Drive, Arlington, VA 22203-1714