LINGUIST List 4.588

Thu 29 Jul 1993

FYI: Oxford Text Archive Update

Editor for this issue: <>


  1. Lou Burnard, Oxford Text Archive update

Message 1: Oxford Text Archive update

Date: Thu, 29 Jul 1993 16:39:24
From: Lou Burnard <>
Subject: Oxford Text Archive update


 * a new Short List of titles held at Oxford
 * 40 titles now available in TEI format for anonymous FTP
 * a new FTP service for licensed access via the Internet

It's been a long time since we posted any news of our activities to
this or other lists. It's not that we've been inactive -- quite the
opposite in fact.

* We have been converting texts to a standard TEI-compatible markup
 (with much appreciated help from Jeffrey Triggs at Bellcore, and
 John Price-Wilkin at Virginia).

* We have been experimenting with ways of saving time and money by
 using FTP, Gopher, WWW etc to deliver material rather than tapes and

* We have been scouring the networks for new material of all kinds

* We have been trying to find some additional and reliable sources of
 funding, but cannot report much progress. Any philanthropists out
 there, please form an orderly queue.

 ***** NEW ACCESSIONS ******

 Our latest catalogue lists 1336 titles, in 28 languages. We have
about 1.2 Gb of textual data, most of it freely available, some of it
restricted in one way or another. We want more. We're particularly
interested in scholarly minority-interest material which is not going
to turn up on CD-anything in the foreseeable future. We don't charge
fees to look after your material, and we keep track of what happens to
it. We do our best to make sure that whatever texts you deposit with us
are rendered as future-proof as we can make them but we don't change
the information you recorded. We're archivists, not evangelists, for
electronic text.

At the same time, now that some kind of standardization is at last
beginning to appear, we're eager to show that old wine can be put into
new bottles. So you'll find that quite a few texts are now available in
more than one form -- both the original, and a "TEI-compatible" form.
(When the original form is easily available elsewhere, and particularly
when the TEI form has more information in it, then we may well drop the
former from the catalogue. But don't worry: it's still in the

 *********** NEW FTP SERVICES *************

Our ftp address is: You can log on as anonymous,
quoting your e-mail address as a password.

If you don't know how to use FTP, ask someone at your local computer
centre. If someone there runs a Gopher, or WWW server, get them to
point the little critter at the following useful files, which you
can also download from the above address:

 ota/textarchive.list our current catalogue
 ota/ information file + order form
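
For anyone who would rather script the download than type FTP commands
by hand, here is a minimal Python sketch of the anonymous login described
above. It is an illustration only: the host name is a placeholder (the
real address must be substituted), and the function names are invented
for this example rather than part of any Archive software.

```python
from ftplib import FTP

# Placeholder only: substitute the archive's real FTP address.
HOST = "ftp.example.org"
# Catalogue path as quoted in the file list above.
CATALOGUE = "ota/textarchive.list"


def fetch_anonymous(ftp, email, remote_path):
    """Log in as 'anonymous', quoting an e-mail address as the password,
    and return the contents of one remote file as bytes."""
    chunks = []
    ftp.login(user="anonymous", passwd=email)
    ftp.retrbinary("RETR " + remote_path, chunks.append)
    return b"".join(chunks)


def download_catalogue(email, local_path="textarchive.list"):
    """Connect to HOST (a placeholder) and save the catalogue locally."""
    with FTP(HOST) as ftp:
        data = fetch_anonymous(ftp, email, CATALOGUE)
    with open(local_path, "wb") as out:
        out.write(data)
```

Calling download_catalogue with your own e-mail address would then leave
a copy of the catalogue in the current directory, assuming HOST has been
set to the real address.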

There are two classes of texts available from this FTP server:

(a) texts which are in TEI format and which we can make freely
 available (these all appear as category P texts in the Short List)

(b) texts which are available only under our standard conditions of
 use (these all appear as category U or A in the Short List)

[Just to confuse the issue, there are also texts which appear as
category P texts in the Short List, because they are freely available,
but which we have not yet checked or converted for TEI compatibility,
and which are therefore not available from our FTP server, though you
may well be able to get them from someone else's. We will distribute
them in the same way as (b) class texts if you insist.]

(A) CLASS TEXTS : (Freely available)

You can just download these without formality using standard FTP
commands. In some cases there are additional usage constraints,
specified in the TEI header. We also hope that you won't redistribute
these texts in a mutilated state or without acknowledgment of where you
got them from. We can't enforce any of these things, obviously. We
think that the Internet is successful because -- and as long as --
people trust each other.

To see what (a) class texts are available now, just take a look in the
directory ota. It's arranged, like the Short List, by language, and
within that by author. There are x texts in there today, and there will
be more. Each text has a conformant TEI header, and each text is a
legal TEI-compatible document, using a special document type definition
(dtd), which you can also download from the same directory (look in
ota/TEI). Eventually, there'll be some more introductory stuff on what
SGML is, why the TEI is a Good Thing etc etc. Just now, we're working
flat out getting the texts in there.

Here's the list of what was there when I prepared this note:

Anonymous: Gammer Gurton's Needle
Edgar Rice Burroughs: A Princess of Mars
Wilkie Collins: The Woman in White
Joseph Conrad: Lord Jim; Nigger of the Narcissus
Charles Darwin: Origin of Species
Arthur Conan Doyle: Adventures of Sherlock Holmes; Casebook of Sherlock
 Holmes; His Last Bow; Memoirs of Sherlock Holmes; Sign of Four; Valley
 of Fear; Hound of the Baskervilles; Return of Sherlock Holmes; A Study
 in Scarlet
Henry James: The Europeans; Roderick Hudson; The Watch
Jack London: Klondike Tales; The Sea-Wolf; The Call of the Wild; White Fang
Andrew Marvell: English Poems (1688)
Herman Melville: Moby Dick
John Milton: Paradise Lost
Lucy M. Montgomery: Anne of Avonlea
William Morris: News from Nowhere
Baroness Orczy: The Scarlet Pimpernel
Bram Stoker: Dracula
Anthony Trollope: Lady Anna; Ayala's Angel; The Eustace Diamonds; Can You
 Forgive Her?; Phineas Finn; Phineas Redux; Rachel Ray; Dr Wortle's School
Mark Twain: A Connecticut Yankee at the Court of King Arthur
H.G. Wells: The Invisible Man; The War of the Worlds; The Time Machine

(B) CLASS TEXTS : (Restricted access)

The majority of texts in the Archive are and always have been held in
trust for a Depositor. Rather than keep track of a zillion different
contracts with each Depositor, we worked out a single contract which is
the basis of our standard user declaration form. It has served to keep
us out of the law courts for the last twenty-five years, so it can't
have been all bad.

Because it's a contract, we have to have a signed paper copy of the
declaration in our hands before we can issue copies of the texts. Once
we have that declaration, we can send you copies of restricted texts, on
diskette, cartridge or magnetic tape, or even over the network.

Up till this week, the only way you could get copies of (b) class texts
over the network was to tell us an account and password on your
machine. We would then bash the files across to you, for free. This was
a rather unsatisfactory procedure in several ways: we think we now have
a better one. It's still free and it works like this:

- you send us a signed order form, as usual
- on the order form you specify the password of your choice
- we place copies of the files you ordered in a special directory under ota,
 access to which requires you to quote both a personal identifier (which we
 will give you) and the password (which you have told us)
- we send you e-mail giving details of how to access the directory
- you download copies of the files you ordered, using conventional ftp
- after a fixed period of time (usually about a week) your personal
 identifier is removed and the file copies deleted
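
The download step of this procedure can be sketched in Python as follows.
This is a hedged illustration, not the Archive's own software: the host,
identifier, password and directory name in the example are all
placeholders, the function name is invented, and the sketch assumes the
order directory contains only plain files.

```python
from ftplib import FTP


def collect_order(ftp, identifier, password, order_dir):
    """Log in with the personal identifier and the chosen password, then
    download every file found in the order directory.

    Returns a dict mapping each file name to its contents (bytes)."""
    ftp.login(user=identifier, passwd=password)
    ftp.cwd(order_dir)
    files = {}
    for name in ftp.nlst():
        chunks = []
        ftp.retrbinary("RETR " + name, chunks.append)
        files[name] = b"".join(chunks)
    return files


# Example use (every value here is a placeholder, not a real account):
# with FTP("ftp.example.org") as ftp:
#     texts = collect_order(ftp, "u1234", "my-password", "ota/u1234")
```

Since the identifier and the file copies are removed after about a week,
a script like this is best run as soon as the e-mail with the access
details arrives.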

 **********THE DOWN SIDE************

We save until the very end of this note the inevitable piece of bad
news. After 25 years, we've been told very firmly that we have to
increase our prices to something a bit nearer a realistic level. Not
only that, but within the European Community we must charge VAT at 17.5%
on every order. We've taken this opportunity to rethink slightly the way
in which we charge.

We charge only for material costs, postage and packing on orders for
texts sent on magnetic media of various kinds. We have abolished the
"per text" fee, and we are no longer insisting on payment in advance.
We are still charging over the odds for diskettes because they take us a
disproportionate amount of effort to produce.

The cost is worked out as follows:

 Magnetic tape           £50 ($80) each
 DC350 tape cartridge    £30 ($50) each
 Diskette                £20 ($35) each

 Invoicing charge        £10 ($20) payable if order is not prepaid
 Postage surcharge       £10 ($20) for orders outside EC
 Add VAT at 17.5% for orders within EC
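
As a worked illustration of the schedule above, the following Python
sketch totals an order in sterling. It rests on one assumption not
stated in this note: that VAT is applied to the whole order, including
any invoicing charge. The function and variable names are invented for
the example.

```python
from decimal import Decimal

# Sterling prices per item, taken from the schedule above.
PRICES = {
    "magnetic tape": Decimal("50"),
    "DC350 cartridge": Decimal("30"),
    "diskette": Decimal("20"),
}
INVOICING_CHARGE = Decimal("10")   # payable if the order is not prepaid
POSTAGE_SURCHARGE = Decimal("10")  # for orders outside the EC
VAT_RATE = Decimal("0.175")        # for orders within the EC


def order_cost(media, prepaid, within_ec):
    """Return the sterling cost of an order.

    media maps a medium name from PRICES to the number of items ordered.
    Assumption: VAT is charged on the full subtotal."""
    total = sum((PRICES[name] * qty for name, qty in media.items()),
                Decimal("0"))
    if not prepaid:
        total += INVOICING_CHARGE
    if within_ec:
        total += total * VAT_RATE
    else:
        total += POSTAGE_SURCHARGE
    return total.quantize(Decimal("0.01"))


# Two prepaid diskettes within the EC: 40 + 17.5% VAT
print(order_cost({"diskette": 2}, prepaid=True, within_ec=True))   # 47.00
# One tape, invoiced, outside the EC: 50 + 10 + 10
print(order_cost({"magnetic tape": 1}, prepaid=False, within_ec=False))  # 70.00
```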

We will continue to give an estimate for the cost of any order free of
charge. And, of course, if you use our new FTP service, then you don't
need to pay us a penny.

We look forward to hearing from you in the new academic year!

Lou Burnard and Alan Morrison
Oxford Text Archive                    email:
Oxford University Computing Services   tel: +44 865 273238
13 Banbury Road, Oxford OX2 6NN, UK    fax: +44 865 273275