LINGUIST List 23.1026
Wed Feb 29 2012
Qs: Looking for a Web Crawler for Corpus Analysis
Editor for this issue: Zac Smith
<zac linguistlist.org>
We'd like to remind readers that the responses to queries are usually best posted to the individual asking the question. That individual is then strongly encouraged to post a summary to the list. This policy was instituted to help control the huge volume of mail on LINGUIST; so we would appreciate your cooperating with it whenever it seems appropriate. In addition to posting a summary, we'd like to remind people that it is usually a good idea to personally thank those individuals who have taken the trouble to respond to the query. To post to LINGUIST, use our convenient web form at http://linguistlist.org/LL/posttolinguist.cfm.
Date: 23-Feb-2012
From: Ana Popescu <discusuri hotmail.com>
Subject: Looking for a Web Crawler for Corpus Analysis
Dear All,
I would like to know whether there is a web crawler that can download websites as plain text. I have a list of approx. 100 links from which I would like to collect the text (not the PDFs), and I then want to run the resulting .txt files through a concordancing programme (I already have WordSmith and another one). I want to be able to get word lists, but also to find the file in which a particular word originates, which is why I need the text files. I am interested in a crawler suitable for OS/Windows. I would also like the crawler to be able to download sites recursively when asked to do so. I have found various free programs of this kind on the Internet, but none of them does everything I need. Thanks.
Linguistic Field(s):
Computational Linguistics
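To make the request above concrete, here is a minimal Python sketch of the workflow described in the query: fetch each link, strip the HTML down to text, save one .txt file per page, and build a word list that records which file each word came from. It is only an illustration under stated assumptions, not a recommendation of any particular crawler. The names `TextExtractor`, `html_to_text`, `fetch`, `save_pages` and `word_index`, and the `page_NNN.txt` naming scheme, are hypothetical. Only the Python standard library is used, and recursive link-following is not covered.

```python
import re
from collections import Counter, defaultdict
from html.parser import HTMLParser
from pathlib import Path
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return the text of an HTML string with whitespace collapsed.

    Text chunks are joined with spaces, so a word split by inline
    markup (e.g. wor<b>ld</b>) would come out as two tokens.
    """
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def fetch(url):
    """Download one page and decode it (needs network; no recursion)."""
    with urlopen(url, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def save_pages(urls, out_dir):
    """Fetch each URL and write its text to out_dir/page_NNN.txt."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    texts = {}
    for i, url in enumerate(urls, start=1):
        name = f"page_{i:03d}.txt"  # hypothetical naming scheme
        texts[name] = html_to_text(fetch(url))
        (out / name).write_text(texts[name], encoding="utf-8")
    return texts


def word_index(texts):
    """Map file name -> text into (word counts, word -> set of file names)."""
    counts = Counter()
    sources = defaultdict(set)
    for name, text in texts.items():
        for word in re.findall(r"\w+", text.lower()):
            counts[word] += 1
            sources[word].add(name)
    return counts, sources
```

With a list of links in hand, `save_pages(urls, "corpus")` would write the text files, and `word_index` on its return value gives both the frequency list and, for any word, the set of files it occurs in. The saved .txt files could still be loaded into a concordancer afterwards.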
Page Updated: 29-Feb-2012