I am writing to seek input on a proposal that we
tentatively plan to submit to the NEH at the end of the summer. The
idea is fairly simple: we want to use the morphological
parser that we have been developing for the past eight years
or so to generate morphological analyses of every unique string
in the Thesaurus Linguae Graecae (TLG), the database of Greek texts
available on CD-ROM from UC Irvine. The TLG is large: 42 million
words at present, with a new version of 57 million words due out later
this year.

Greek is a highly inflected language -- not as bad as Georgian and some
others, but a verb can, with prefixes, have millions of different forms.
The TLG corpus extends over a thousand years and includes virtually all
literary Greek, and thus would support diachronic as well as synchronic
linguistic analysis.

I would like to know whether there is anything we could do that would
make this work on Greek useful for the linguistics community in general.
Classicists need this database, but it would be very exciting if it could
stimulate additional work.

The working summary of the project follows. The proposal outline is
fairly succinct (c. 7 pages) but it is full of Greek and does not lend
itself readily to transliteration. If you would like to see a copy, please
send me your US Mail address and we will send one to you. Casual reactions
to just this summary are, however, more than welcome. NOTE: REACTIONS NEED
NOT BE POSITIVE. If this does not seem a worthwhile thing to pursue,
I would love to know why.


Gregory Crane
Department of Classics
Boylston 319
Harvard University
Cambridge MA 02138

A Linguistic Database of Classical Greek

This project will extend an existing parser for classical
Greek, expanding its database of stems to cover the majority
of all words attested in the literary record, and will use
this expanded database to create a morphologically parsed
database of the more than 1,000,000 unique strings in the
TLG. In the end, we will publish the database of analyzed
strings, the databases of stems and endings that drive the
parser, and the parser itself.

The resulting databases are an essential piece of scholarly
infrastructure that will (1) revolutionize current searching
techniques for the TLG and other Greek databases, (2) make it
possible to apply more sophisticated retrieval and text
analysis to Greek texts, and (3) provide a
basic but crucial lookup tool that will aid non-specialists
in other fields (e.g., philosophy, political science,
religion) who seek to work directly with the Greek database.

Note: This document is a sketch for a possible proposal to
be submitted to the NEH at the end of August 1992. It is, in
effect, a proposal for a proposal and is thus open to
revision on any and all points.
