Query Details


Query Subject:   Looking for a Web Crawler for Corpus Analysis
Author:   Ana Popescu

Linguistic Field(s):  Computational Linguistics

Query:   Dear All,

I would like to know if there is a web crawler that can download websites
in text format. I have a list of approx. 100 links from which I would like
to collect the text (not the PDFs), and then run the resulting .txt files
through a concordancing programme (I already have WordSmith and another one).
I want to be able to get word lists, but also to find the file in which a
particular word occurs, which is why I need the text files.

I am interested in a crawler suitable for OS/Windows. Also, I want the
crawler to be able to download the sites recursively, if asked to do so.

I have found various free programs of this kind on the Internet, but none
of them does everything I need.

Thanks.
LL Issue: 23.1026
Date posted: 29-Feb-2012
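
The workflow described in the query (fetch each link, keep only the text, write one .txt file per page, optionally follow links recursively) can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not a recommendation of a particular tool: the names `parse_page`, `fetch`, and `crawl` are hypothetical, recursion is restricted to links on the same host, and the sketch ignores robots.txt, rate limiting, and JavaScript-rendered pages, which a real crawl would need to handle.

```python
import re
import urllib.request
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urldefrag, urljoin, urlparse


class _TextAndLinks(HTMLParser):
    """Collects visible text and href targets from one HTML page."""

    SKIP = {"script", "style"}  # content of these tags is not running text

    def __init__(self):
        super().__init__()
        self.chunks, self.links, self._skip_depth = [], [], 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def parse_page(html, base_url):
    """Return (plain_text, absolute_links) for one page."""
    parser = _TextAndLinks()
    parser.feed(html)
    parser.close()
    links = [urldefrag(urljoin(base_url, h)).url for h in parser.links]
    return "\n".join(parser.chunks), links


def fetch(url):
    """Download a URL; return HTML as str, or None for non-HTML (e.g. PDFs)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        if "html" not in resp.headers.get_content_type():
            return None
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def crawl(start_urls, out_dir, max_depth=0, fetch=fetch):
    """Save each page as a .txt file; follow same-host links up to max_depth.

    Returns a dict mapping each saved URL to its file name, so that a hit in
    a concordancer can be traced back to the page it came from.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    seen, queue, saved = set(), [(u, 0) for u in start_urls], {}
    while queue:
        url, depth = queue.pop(0)  # breadth-first order
        if url in seen:
            continue
        seen.add(url)
        try:
            html = fetch(url)
        except (OSError, ValueError):  # network errors, malformed URLs
            continue
        if html is None:
            continue
        text, links = parse_page(html, url)
        # Assumption: URLs are distinct enough that sanitised names do not collide.
        name = re.sub(r"[^A-Za-z0-9]+", "_", url).strip("_")[:150] + ".txt"
        (out / name).write_text(text, encoding="utf-8")
        saved[url] = name
        if depth < max_depth:
            host = urlparse(url).netloc
            queue.extend((l, depth + 1) for l in links if urlparse(l).netloc == host)
    return saved
```

Called as `crawl(list_of_links, "corpus", max_depth=0)`, this would write one text file per link into a `corpus` folder; a larger `max_depth` makes it follow links within each site. The returned URL-to-filename mapping addresses the need to find where a particular word originates.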


