Thanks to all who responded to my query on Spanish corpora available online. Below is a summary of the responses I got.

Text begins=============================================================

********************* Text Corpora List: Addresses ***************************
CORPORA@NORA.HD.UIB.NO          for messages to the list
CORPORA-REQUEST@NORA.HD.UIB.NO  for messages to list administrator
FILESERV@NORA.HD.UIB.NO         for requests to file server (try sending HELP)
******************************************************************************

I'm looking for online Spanish corpora, preferably newspaper or magazine articles. I've heard there is a collection at the University of Miami, but I haven't been able to find it. Can anyone help me out? BTW, I already know what is available in the Oxford Text Archive.

----------------------------------------------------------------
Doug McKee                E-mail: mckeed@sra.com
SRA Corp.                 Phone: (703) 558-7820
2000 15th St. N           Fax: (703) 558-4723
Arlington, VA 22201 USA
----------------------------------------------------------------

========================================================================

I would like to mention the Catalogue of Projects in Electronic Text (CPET) at Georgetown University, Washington DC. This catalogue can be accessed via Telnet to: guvax3.georgetown.edu with username: CPET (you will need VT-100 keys). A manual can be fetched from our fileserver (FILESERV@NORA.HD.UIB.NO) by sending

    send info cpet.manual

either as the subject or the only line in the message. A list of Romance language projects (as of Feb. 1991, 64 KB) can be fetched from the file server with the line:

    send info roman.projects

For further information about CPET, contact Margaret Friedman (mfriedman@guvax.georgetown.edu)

==================================================================

There is a Swedish archive at Gothenburg University containing Spanish newspaper and magazine articles. Please contact: David Mighetto <mighetto@rom.gu.se>

===================================================================

Concerning English corpora, I'd like to mention that I wrote a survey of electronic corpora and related resources which will be published in the book "Talking Data: Transcription and coding in discourse research", Edwards & Lampert, Erlbaum Publishers, due out April 15. Other surveys are available through: the ICAME archive (anonymous ftp to nora.hd.uib.no), and CPET (cited in the preceding message). There is also the Oxford Text Archive, which specializes, however, in literature and Biblical texts: anonymous ftp to black.ox.ac.uk. Hope that helps.

=======================================================================

There are some literary works available electronically from Project Gutenberg. You can get them via anonymous ftp. Just ftp to 128.174.201.12 and, after logging in, "cd etext/etext92" or "etext91" or "etext93". Among their offerings are works like "Moby Dick" and "Through the Looking Glass". I think they even have Clinton's Inaugural address. I've also been looking for e-texts in Spanish, but without much luck. I have some newspaper articles, and some interviews that someone was kind enough to send me once. (I posted a query on Linguist about Spanish corpora a while back)

============================================================================

#12755) id <01GVVWTKTIJG8X144C@guvax.acc.georgetown.edu>; Tue, 16 Mar 1993 18:43 EST

There are zillions of e-texts! Here are a few sources.

1. The Oxford Text Archives: I can send you their catalogue and order form. They have *lots* of texts in several languages. They will FTP the texts to you free over the internet.

2. Georgetown Catalogue of Projects in Electronic Text (CPET): there was a posting on LINGUIST not too long ago ... if you have access to Gopher, you can find it under 'North America', 'Washington DC'.

3. Commercial: in catalogues such as MacWarehouse, you can find CD-ROMs of text like 'Front Page News'.

4. ACL/DCI: they have a CD-ROM with over a million words of Dow Jones or the Wall Street Journal (or both? I forget)

5. The Linguistic Data Consortium (LDC): lots of non-literary e-corpora, including transcriptions of spoken data

6. ICAME: they have a CD-ROM of famous e-corpora + tools (concordances and stuff) that goes for about $500, and includes the Brown corpus, the LOB corpus, the London-Lund corpus, the Helsinki Diachronic corpus (see 'corpus' and these entries in the Oxford Companion to the English Language)

7. The CHILDES database - caretaker and child language in several different languages

End of text============================================================

--
--------decio@mace.cc.purdue.edu----------------------------------------
|Gabriel A. Decio   | XX XXX XXX XXX XX |
|Dept. of English   | XX XX XX XX XX XX XX XX |
|Purdue University  | XX XX XX XX XX XX XX XX |
|West Lafayette, IN | XXX XXX XXX XXX |