LINGUIST List 25.12
Tue Jan 07 2014
FYI: ARCHER Corpus Available Online
Editor for this issue:
Uliana Kazagasheva <ulianalinguistlist.org>
Date: 04-Jan-2014
From: David Denison <david.denison manchester.ac.uk>
Subject: ARCHER Corpus Available Online
We are delighted to announce that ARCHER, A
Representative Corpus of Historical English
Registers, can for the first time be searched
by registered users via the internet. The new
version 3.2 also incorporates many
improvements, including extensive
non-linguistic mark-up to modern standards
(TEI, XML), expansion of word-count by 84% to
3.3m words, and correction of existing texts
and bibliographic information.
The corpus runs from 1600 to 1999 and allows
comparison of British and American English over
a 250-year span; its multiple genres permit
subtle sociohistorical discrimination. The
CQPweb search engine is fast and easy to use
for simple searches, and it also offers more
complex searches and statistical
information.
A search engine for ARCHER 3.2 is hosted by
Lancaster University on its CQPweb server. The
version now made available for searches
comprises untagged, original-spelling files.
The planned VARDed and CLAWS-tagged version
will follow as soon as possible and will be
made available to registered users, as will an
additional online version hosted at the
University of Zurich, tagged with the Treebank
tagset and also chunked and parsed with a
dependency grammar. Further details (including
local access arrangements) are given on the
ARCHER project website
(www.manchester.ac.uk/archer).
For copyright reasons, the context available for
download is limited, though adequate for most purposes.
Users at one of the 14 consortium universities
have local access without limits on context and
can consult plain text and XML versions. All
versions have identical text and non-linguistic
mark-up.
The project is currently coordinated at the
University of Manchester. You are invited to
visit www.manchester.ac.uk/archer for further
details of the corpus and the
consortium. A User Agreement form can be
downloaded from the Documentation page of the
website. It must be completed and submitted
online.
David Denison and Nuria Yáñez-Bouza
On behalf of the ARCHER consortium
Linguistic Field(s): Historical Linguistics; Text/Corpus Linguistics
Subject Language(s): English (eng)
Page Updated: 07-Jan-2014