Looking for a Web Crawler for Corpus Analysis
I would like to know if there is a web crawler that can download websites
in text format. I have a list of approx. 100 links from which I would like
to collect the text (not the PDFs), and I would then run the resulting .txt
files through a concordancing programme (I already have WordSmith and another one).
I want to be able to get word lists, but also to be able to find the file
where a particular word originates, which is why I need the text files.
I am interested in a crawler suitable for OS/Windows. Also, I want the
crawler to be able to download the sites recursively, if asked to do so.
I have found several free programs of this kind on the Internet, but none
of them does everything I need.
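
To make the requirement concrete, here is a minimal sketch in Python (standard
library only) of the behaviour I am after. It is an illustration under
assumptions, not a recommendation of any existing tool: the names PageParser,
fetch and crawl are hypothetical, pages are assumed to be UTF-8 HTML, the
"recursive" option is taken to mean following links on the same host up to a
given depth, and there is no error handling or politeness delay.

```python
import re
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class PageParser(HTMLParser):
    """Collects visible text and outgoing http(s) links from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.chunks = []  # pieces of visible text, in document order
        self.links = []   # absolute URLs found in <a href="...">
        self._skip = 0    # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links and drop "#fragment" parts.
                link = urldefrag(urljoin(self.base_url, href)).url
                if link.startswith(("http://", "https://")):
                    self.links.append(link)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def fetch(url):
    """Download a page as text (assumption: UTF-8, bad bytes replaced)."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_urls, out_dir, depth=0, fetch=fetch):
    """Save each page as a .txt file; follow same-host links up to `depth`.

    Returns a dict mapping each visited URL to its .txt file name, so a word
    found in a file can be traced back to the page it came from.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    seen = set()
    saved = {}
    queue = [(url, 0) for url in start_urls]
    while queue:
        url, level = queue.pop(0)
        if url in seen or url.lower().endswith(".pdf"):
            continue  # skip repeats and PDFs
        seen.add(url)
        parser = PageParser(url)
        parser.feed(fetch(url))
        # Build a file name from the URL so the source stays recognisable.
        name = re.sub(r"[^A-Za-z0-9]+", "_", url).strip("_") + ".txt"
        (out / name).write_text("\n".join(parser.chunks), encoding="utf-8")
        saved[url] = name
        if level < depth:
            host = urlparse(url).netloc
            queue.extend(
                (link, level + 1)
                for link in parser.links
                if urlparse(link).netloc == host
            )
    return saved
```

With a list of links in a Python list called urls (hypothetical), calling
crawl(urls, "corpus", depth=0) would save one .txt file per link, and a larger
depth would switch on the recursive behaviour. The returned mapping from URL
to file name is what would let me find where a particular word originates.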