LINGUIST List 2.821

Sat 23 Nov 1991

FYI: Brown and LOB Corpora

Editor for this issue: <>


Directory

  1. Henry Kucera, Re: 2.809 Queries: Brown Corpus, Circassian, Croat, Socio
  2. Steve Fligelstone, Re: 2.809 Queries: Brown Corpus, Circassian, Croat, Socio

Message 1: Re: 2.809 Queries: Brown Corpus, Circassian, Croat, Socio

Date: Thu, 21 Nov 91 09:49:23 EST
From: Henry Kucera <HENRYbrownvm.brown.edu>
Subject: Re: 2.809 Queries: Brown Corpus, Circassian, Croat, Socio
This concerns the query re the Brown and LOB corpora:
 The Brown corpus (American English) is available to
non-profit organizations (such as universities), essentially in two formats.
The text-only (so-called "untagged") version is available on tape or diskettes
from our friends at the Norwegian Centre for Humanistic Research, P.O. Box 54,
University of Bergen, Bergen, Norway. The cost varies depending on format and
the dollar exchange rate. It is in the range of $100-$200. E-mail (for Bitnet)
is: FAFSRVNOBERGEN. However, you would have to sign a written agreement (no
copying, no commercial use, etc.). The size varies depending on format but
the untagged uncompressed Brown corpus (without grammatical designators)
is about 8 MB.
 The "tagged" version of the corpus (which includes an annotation of every word
by an expanded grammatical class, 82 classes in all) is available from Text
Research, 196 Bowen Street, Providence, RI 02906. Because of its size, it
comes on mag. tape only (1600 or 6250 bpi, ASCII or EBCDIC) and its cost to
academic institutions is $1,000. The reason for the difference in price is that the
tagged corpus provides much more information and carries a separate copyright.
There are also some restrictions: no copying, no commercial use, etc. A written
agreement must be signed by a responsible official of the Department or
University Administration.
 Text Research has no connection with Brown University and has no e-mail
address. However, you can either send e-mail to me for transmission or a fax
to Text Research at 401-751-8958. The size of the tagged database is quite
large, about 53 MB. However, it can be fairly easily compressed by a skilled
programmer. A large manual, giving a detailed description of tags, etc. is
included.
 Incidentally, there are no discounts available for either the tagged or the
untagged version. These are fixed prices. Non-academic use is possible only
by obtaining a license from Text Research.
 As for the LOB corpus (British English): Both untagged and tagged versions
are available, but only to non-profit institutions, from the address in Bergen
given above. There are fairly severe restrictions on its use, as far as I
remember (because of British copyright laws). I can't cite the prices right
now, but the Bergen people are pretty good at answering e-mail.
Hope this helps. Henry Kucera.

Message 2: Re: 2.809 Queries: Brown Corpus, Circassian, Croat, Socio

Date: Thu, 21 Nov 91 16:48:41 GMT
From: Steve Fligelstone <eia002cent1.lancs.ac.uk>
Subject: Re: 2.809 Queries: Brown Corpus, Circassian, Croat, Socio
Mark Sanderson asks about availability of tagged versions of the Brown
and LOB (Lancaster/Oslo-Bergen) Corpora. The tagged LOB Corpus, along
with several other widely used corpora, can be obtained by writing to
ICAME (International Computer Archive of Modern English) at this address:
 Knut Hofland,
 ICAME
 Norwegian Computing Centre for the Humanities
 Harald Harfagresgt. 31
 Postboks 53
 Universitetet
 N-5027 Bergen
 NORWAY
email (earn/bitnet): fafkhnobergen
The Brown Corpus is also available from this source, but not in tagged
format. However, I understand that the tagged version may be obtained from:
 TEXT RESEARCH,
 186 Bowen St.,
 Providence RI 02906,
 U.S.A.
There is furthermore a grammatically analysed (parsed as opposed to merely
part-of-speech tagged) version of part of the Brown Corpus. This is
referred to as the Gothenburg Corpus. For details contact:
 Gudrun Magnusdottir
 Sprakdata
 Goteborgs Universitet
 S-412 98 Goteborg
 Sweden
Finally, here at Lancaster work is nearing completion (honestly!) on
a parsed version of part of the LOB Corpus. Write to me if you want
to be kept informed of its progress and availability.
 Steve Fligelstone
 UCREL
 Linguistics Department
 Bowland College
 Lancaster University
 GB-Lancaster LA1 4XZ
email: eia002uk.ac.lancaster
Steve Fligelstone