LINGUIST List 14.3465

Fri Dec 12 2003

FYI: New Website: Phrases in English

Editor for this issue: Anne Clarke <anne AT linguistlist.org>


Directory

  1. William Fletcher, New Website: Phrases in English

Message 1: New Website: Phrases in English

Date: Thu, 11 Dec 2003 11:45:10 -0500 (EST)
From: William Fletcher <fletcher AT usna.edu>
Subject: New Website: Phrases in English


A new website, ''Phrases in English'' (PIE), has been launched:
http://pie.usna.edu
While still under development, PIE already offers much to both
linguists and students, and additional features will increase its
scope in the future.

PIE incorporates a database of all 1-6-grams (phrases 1-6 ''words''
long) with part-of-speech (POS) codes occurring three or more times
in the 100-million-word British National Corpus (BNC). One can
explore English phraseology either through lists of forms and their
frequencies or by searching for specific forms or collocations,
e.g. 2-grams of the pattern ''ADJ work'', to find the most frequent
adjectives describing work.
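
As a rough illustration of the kind of data described above, the
following minimal sketch counts POS-tagged 1-6-grams in a toy token
list, keeps those occurring three or more times, and runs an
''ADJ work'' query. It is not PIE's implementation: the function
names, the simplified tags ADJ and NOUN, and the toy data are all
assumptions made for this example, and the n-grams here run over one
flat token list with no sentence boundaries.

```python
from collections import Counter

def count_ngrams(tagged_tokens, max_n=6):
    """Count every 1..max_n-gram in a list of (word, pos) pairs."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tagged_tokens) - n + 1):
            counts[tuple(tagged_tokens[i:i + n])] += 1
    return counts

def frequent(counts, min_freq=3):
    """Keep only n-grams occurring at least min_freq times."""
    return {gram: c for gram, c in counts.items() if c >= min_freq}

def adjectives_before(counts, word, adj_tag="ADJ"):
    """Return (adjective, frequency) for 2-grams 'ADJ word', most frequent first."""
    hits = [(gram[0][0], c) for gram, c in counts.items()
            if len(gram) == 2 and gram[0][1] == adj_tag and gram[1][0] == word]
    return sorted(hits, key=lambda h: (-h[1], h[0]))

# Invented toy data, not taken from the BNC.
toy = ([("hard", "ADJ"), ("work", "NOUN")] * 3
       + [("good", "ADJ"), ("work", "NOUN")] * 4
       + [("new", "ADJ"), ("work", "NOUN")])

result = adjectives_before(frequent(count_ngrams(toy)), "work")
print(result)  # [('good', 4), ('hard', 3)]
```

''new work'' occurs only once in the toy data, so the frequency
threshold removes it before the query runs.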

PIE also offers a phrase pattern discovery tool, ''phrase-frames'':
sets of variants of an n-gram identical except for one word (wildcard
symbol *). The most frequent and productive 4-frame is ''the * of
the'', with variants such as ''the end of the'', ''the rest of the'',
''the top of the'', ''the nature of the''.
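
The phrase-frame idea can be sketched in a few lines: replace each
position of an n-gram in turn with the wildcard and group the n-grams
that share the resulting frame. This is only an illustration under
stated assumptions, not the generator used by PIE: the function name
phrase_frames is hypothetical, and the frequencies below are invented
placeholders, not BNC figures.

```python
from collections import defaultdict

def phrase_frames(ngram_counts):
    """Map each frame (n-gram with one slot replaced by '*') to its
    variants, as a dict of {filler word: frequency of the full n-gram}."""
    frames = defaultdict(dict)
    for ngram, freq in ngram_counts.items():
        for i in range(len(ngram)):
            frame = ngram[:i] + ("*",) + ngram[i + 1:]
            frames[frame][ngram[i]] = freq
    return frames

# Invented frequencies for illustration only.
toy_counts = {
    ("the", "end", "of", "the"): 5,
    ("the", "rest", "of", "the"): 4,
    ("the", "top", "of", "the"): 3,
    ("the", "nature", "of", "the"): 3,
}

frames = phrase_frames(toy_counts)

# Rank frames by number of variants, then by total frequency.
best = max(frames, key=lambda f: (len(frames[f]), sum(frames[f].values())))
print(" ".join(best), frames[best])
```

Ranking by number of variants first is one possible reading of
''productive''; the criteria PIE actually uses are not stated here.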

Over the next year PIE will add:

-- Concordances from the BNC, displayed by clicking on an n-gram in
the query results

-- POS-grams and POS-frames for studying the relative productivity of
phrase structures

-- Filtering by text type (domain, genre, target audience) for
contrastive studies

-- Query by regular expression (currently only wildcards are
supported)

In addition, when POS-tagging of the Michigan Corpus of Academic
Spoken English (MICASE) http://www.hti.umich.edu/micase/ is complete,
a similar database will be created with those data. Finally, when a
substantial portion of the American National Corpus (ANC)
http://americannationalcorpus.org has been released, a third parallel
database will be built. Together these databases will permit
comparative studies of phraseology in the principal variants of
English.

Please note:

- ''Unfiltered'' queries which match very large datasets can take
several minutes to complete. Please be patient; read the tutorials
and FAQ to focus your queries.

- Users who cannot access the above site may use
http://kwicfinder.com/BNC/ (please let me know so we can investigate).


Acknowledgements

Above all I am grateful to Michael Stubbs of the University of Trier
for detailed suggestions and ongoing discussions that led to the
creation and refinement of this site; even the ''easy as pie'' to
remember acronym goes back to him. His research assistants contributed
as well: Isabel Barth implemented the original phrase-frame generator
and Katrin Ungeheuer offered valuable comments on organization and
user-interface for query by text-type. Finally, Lou Burnard of the BNC
Consortium and David Lee of MICASE granted essential permissions and
provided useful feedback on the site.

All user feedback will be received enthusiastically!

Bill Fletcher

fletcher AT usna.edu
fletcher AT kwicfinder.com

http://pie.usna.edu
http://kwicfinder.com

Subject-Language: English; Code: ENG 