LINGUIST List 16.1366

Fri Apr 29 2005

Sum: WebCorpus Counts

Editor for this issue: Jessica Boynton <jessicalinguistlist.org>

To post to LINGUIST, use our convenient web form at http://linguistlist.org/LL/posttolinguist.html.
        1.    Jerry Kurjian, WebCorpus Counts

Message 1: WebCorpus Counts
Date: 28-Apr-2005
From: Jerry Kurjian <jkurjianmail.sdsu.edu>
Subject: WebCorpus Counts

Regarding query: http://www.linguistlist.org/issues/16/16-1291.html#1

Below I summarize the comments of Andrew Kehoe and Antoinette Renouf
(5/27/2005), two of the creators of WebCorp, who kindly replied to my query
concerning WebCorp in thread 16.1291 and on the Corpora list (corpora AT uib.no):

Within a webpage, WebCorp gathers every KWIC (keyword-in-context) line it
finds, unless the ''one hit per page'' option is checked. Across webpages,
WebCorp gathers hits from at most 200 webpages. Getting fewer than 200
hits might mean that you have chosen to filter some features out, that
some of the 200 webpages were not accessible to WebCorp or had changed, or
that there are fewer than 200 pages that have the search term.
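
To make the counting behaviour concrete, here is a minimal sketch in Python.
It is not WebCorp's code: the function name collect_hits, the three-word
context window, and the use of None for an inaccessible page are all my own
assumptions for illustration. Only the 200-page cap and the ''one hit per
page'' switch come from the description above.

```python
from typing import List, Optional

MAX_PAGES = 200  # page cap taken from the summary; everything else is assumed


def collect_hits(pages: List[Optional[str]], term: str,
                 one_hit_per_page: bool = False,
                 max_pages: int = MAX_PAGES) -> List[str]:
    """Return KWIC lines for `term` from at most `max_pages` pages.

    `pages` stands in for the pages a search engine returned; None marks
    a page that could not be fetched (a hypothetical convention).
    """
    hits: List[str] = []
    for text in pages[:max_pages]:
        if text is None:
            continue  # inaccessible page: contributes no hits
        tokens = text.split()
        positions = [i for i, tok in enumerate(tokens)
                     if tok.lower() == term.lower()]
        if one_hit_per_page:
            positions = positions[:1]  # keep only the first hit on the page
        for i in positions:
            left = " ".join(tokens[max(0, i - 3):i])
            right = " ".join(tokens[i + 1:i + 4])
            hits.append(f"{left} [{tokens[i]}] {right}".strip())
    return hits
```

A page that was fetched but no longer contains the term simply yields no
hits, which is one of the ways a search can end up with fewer than 200.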

Finally, the authors say they are continuing to upgrade WebCorp, and in an
upcoming version plan to add frequency counts, type/token ratios,
collocation profiles, and ''other statistics.''
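
For readers unfamiliar with the first two statistics, the sketch below shows
what frequency counts and a type/token ratio look like on a toy sample. It
says nothing about how WebCorp will compute them; the whitespace
tokenization and the function name type_token_ratio are assumptions made
for the example.

```python
from collections import Counter
from typing import List


def type_token_ratio(tokens: List[str]) -> float:
    """Number of distinct word forms (types) divided by total tokens."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Toy sample; naive lowercasing and whitespace splitting (an assumption).
tokens = "the cat sat on the mat".lower().split()
freq = Counter(tokens)          # frequency count per word form
ttr = type_token_ratio(tokens)  # 5 types / 6 tokens
```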

Linguistic Field(s): Text/Corpus Linguistics
