Editor for this issue: <>
Hi there,

Firstly, I apologise for not compiling all the responses I got immediately. Here are some of the responses I got:

Data Mining:

From matthew@cs.williams.edu:

You can ftp four papers that I've worked on (only two are really data mining; the other two are cooperative database stuff) from cs.williams.edu (anonymous login). The papers are in pub/matthew.

Another data mining paper can be obtained from ftp.cwi.nl, in the directory pub.CWIreports/AA, in the file CS-R9406.ps.Z.

A part-of-speech tagger is available by anonymous ftp from lightning.lcs.mit.edu in Pub/BRILL/programs, with its documentation in pub/BRILL/Papers.

An online lexicon and a semantic concordance that goes along with it can be found in /pub on clarity.princeton.edu.

From CLAYKE@delphi.com:

NLP or NLU (using) is a big subject, and you may be able to find a special-purpose system to help you with your task, but in reality it will be a lexical analyser. Successful general-purpose NLP does not exist - yet. :-)

As for the general issue of indexing, I can recommend the book INDEXERS ON INDEXING by The Royal Society of Indexers (London). Sorry, I don't know the publisher's name or year of publication.

A PhD thesis on Automatic Terminology Extraction can be obtained from bea@ccv.fr; it is in French.

------------------------------------- (EDITED)
To: shuychon@mehta.anu.edu.au

I have received your question through Linguist List. I am doing some research concerning data extraction from text. I have a list of parameters that must be extracted from texts about car accidents. Those parameters are things like: weather, speed, seat belt fastened or not, driver drunk...

The aim is to provide a tool capable of doing this automatically, or able to help an operator to do it. The first solution would use NLP and the second would use information retrieval techniques.

Here is MY idea about the subject:

- The problem with NLP is that it does not seem to give enough precision in the interpretation of long texts.
- It seems to be easier to use information retrieval techniques, but they cannot extract data automatically, since they do not interpret what is said.

Maybe you would be interested in the proceedings of the Message Understanding Conferences (MUC).

Regards,
Thierry.

Thierry PERRON
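
As an illustration of the operator-assist option in Thierry's message, here is a minimal sketch in Python. It only spots keywords and returns the sentences that contain them, so an operator can read them and decide; it does not interpret the text. The parameter names, the keyword patterns and the function names are all assumptions made up for this example, not anything taken from the message.

```python
import re

# Hypothetical keyword patterns, one per parameter named in the message.
# They are illustrative only; a real list would need to be much richer.
PATTERNS = {
    "weather": r"\b(rain(?:ing|y)?|snow(?:ing|y)?|fog(?:gy)?|ic[ey]|sunny|clear)\b",
    "speed": r"\b\d{1,3}\s?(?:km/h|kph|mph)",
    "seat_belt": r"\bseat ?belts?\b",
    "driver_drunk": r"\b(?:drunk|intoxicated|alcohol)\b",
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_candidates(text):
    """Return, per parameter, a list of (matched keyword, sentence) pairs.

    No interpretation is attempted: a sentence that mentions a seat belt
    is reported whether the belt was fastened or not.
    """
    results = {name: [] for name in PATTERNS}
    for sentence in split_sentences(text):
        for name, pattern in PATTERNS.items():
            match = re.search(pattern, sentence, flags=re.IGNORECASE)
            if match:
                results[name].append((match.group(0), sentence))
    return results


if __name__ == "__main__":
    report = (
        "It was raining heavily. The car was travelling at 90 km/h. "
        "The driver was not wearing a seat belt."
    )
    for name, hits in find_candidates(report).items():
        print(name, hits)
```

This shows the limitation raised in the message: the sketch can point to the sentence about the seat belt, but deciding that the belt was not fastened is left to the operator.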