LINGUIST List 17.1967

Thu Jul 06 2006

Software: Two New Corpora of Spoken and Written English

Editor for this issue: Svetlana Aksenova <svetlana@linguistlist.org>


To post to LINGUIST, use our convenient web form at http://linguistlist.org/LL/posttolinguist.html.
Directory
        1.    Christine Bowles, Two New Corpora of Spoken and Written English


Message 1: Two New Corpora of Spoken and Written English
Date: 06-Jul-2006
From: Christine Bowles <c.bowles@ucl.ac.uk>
Subject: Two New Corpora of Spoken and Written English


The Survey of English Usage at UCL is pleased to announce the publication of two
exciting new corpora supplied with search software that allows for the retrieval
of grammatical patterns and constructions.

THE DIACHRONIC CORPUS OF PRESENT-DAY SPOKEN ENGLISH (DCPSE)

This corpus contains a total of 800,000 words of grammatically analysed (tagged
and parsed) spontaneous spoken English from comparable categories in the
London-Lund Corpus (1960s/1970s) and the ICE-GB Corpus (1990s): 400,000 words
from each corpus in the form of tree diagrams. DCPSE is designed so that
changes in the grammatical features of spontaneous spoken English can be studied
over time. DCPSE is the largest single collection of tagged and parsed
orthographically transcribed spoken English in the world. The corpus will
provide linguists interested in recent linguistic change in English with a new,
innovative and searchable database. The corpus is supplied on CD, together with
the ICECUP 3.1 search software (see below) and a 'Getting Started' manual.

RELEASE 2 OF THE BRITISH COMPONENT OF THE INTERNATIONAL CORPUS OF ENGLISH
(ICE-GB)

ICE-GB contains one million words of grammatically analysed (tagged and parsed)
spoken and written present-day British English in the form of tree diagrams. The
material in Release 2 of the corpus has been synchronised with sound recordings
for the spoken part of the corpus (a total of around 75 hours), which can be
supplied separately. Together with Release 2 of ICE-GB we are pleased to
announce the publication of ICECUP 3.1, the dedicated search software for ICE-GB
and DCPSE (see above). New features in ICECUP 3.1 include a lexicon and a
grammaticon, which can provide an overview of distributions of words, tags, and
grammatical patterns. The Fuzzy Tree Fragment (FTF) facility, which allows
searches for grammatical patterns, has been extended and improved. There are
many other improvements to ICECUP in this release, e.g. a thoroughly revised
on-line help manual covering all the new features. A new ICECUP 'Getting
Started' manual is published with the corpus.

ICE-GB SOUND RECORDINGS

The sound recordings (75 hours) will be available in the form of a set of CDs
containing uncompressed 'wave' files for installation on a hard disk.

For further details, including prices and upgrades, please visit:

http://www.ucl.ac.uk/english-usage/resources/sales.htm

or contact Christine Bowles: c.bowles@ucl.ac.uk

We offer very low prices for students. Please allow 4-6 weeks for delivery.


Linguistic Field(s): Historical Linguistics; Syntax; Text/Corpus Linguistics

Subject Language(s): English (eng)