The Common Lisp source code for version 1.0 of the Xerox part-of-speech tagger is available for anonymous FTP from parcftp.xerox.com in the file pub/tagger/tagger-1-0.tar.Z.

This code has been tested in the following CL implementations:

  . Franz Allegro Common Lisp version 4.1 on SunOS 4.x;
  . CMU Common Lisp version 16e on SunOS 4.x; and
  . Macintosh Common Lisp 2.0p2.

Enjoy.

Doug Cutting <cutting@parc.xerox.com>, and
Jan Pedersen <pedersen@parc.xerox.com>

COMPUTERS - LINGUISTICS - COMMUNICATIONS
BIBLIOGRAPHICAL DATABASE

For the last 15 years, we have been compiling a bibliographical database on all aspects of computer processing of natural language communications. The bibliography, which now holds more than 67,000 references, is indexed with a thesaurus of over 3,400 keywords. More than 13,000 titles are related to artificial intelligence.

The references cover the period beginning with the inception of the computer to the present and include theses, research reports, books, articles from specialized periodicals, papers in conference proceedings, etc. The entries were obtained mostly by systematically scanning more than 400 periodicals and 800 conference proceedings.

Some of the thematic sections of the database are near completion and will be published in print in the coming months. Each thematic volume will have a two-level analytical index.

Many researchers collaborated by sending us their lists of publications. All others who are interested are invited to do so.

In the list that follows, the numbers refer to the approximate number of entries of some of the subsections of the database.

=====================================================================

NATURAL LANGUAGE INTERFACES (3000)
  Conversation, interfaces to database, to expert system, to robot, to operating system, to question answering system, etc.

TEXT UNDERSTANDING (3800)

PARSING (7000)
  Syntactic analysis, semantic analysis, semantic interpretation.

COMPUTATIONAL MORPHOLOGY (2000)
  Morphological analysis and generation, lemmatization.

TEXT GENERATION (2000)
  Generation from data or linguistic structure, explanation generation, paraphrasing, etc.

SPEECH ANALYSIS, CODING, AND SYNTHESIS (2800)
  Speech compression, encryption, transmission, speech to tactile display, phoneme identification, speaker identification, tone recognition, etc.

SPEECH RECOGNITION AND UNDERSTANDING (3000)
  Connected, continuous, isolated words, speaker dependent and independent, etc.

TEXT INFORMATION EXTRACTION (2000)
  Indexation (automatic and computer aided), text condensation, content analysis, etc.

INFORMATION RETRIEVAL (3000)
  Full text, conceptual.

COMPUTER TRANSLATION (7000)
  Bilingual, multilingual, aids to translation.

MATHEMATICAL AND FORMAL LINGUISTICS (3000)

COGNITIVE LINGUISTICS AND PSYCHOLINGUISTICS (1600)

LITERARY COMPUTING (3000)
  Concordances, author identification, style analysis, poetry analysis and production, text collation, literary criticism, etc.

QUANTITATIVE AND STATISTICAL LINGUISTICS (2400)
  Frequencies of characters, phonemes, words, grammatical categories, syntactic structures; lexical richness, word collocations, etc.

COMPUTER ASSISTED LANGUAGE TEACHING (5500)
  Teaching foreign languages, composition, writing, grammar, spelling, vocabulary, reading, translation, listening, speaking; text composition aids, etc.

ELECTRONIC DOCUMENT PROCESSING (2300)
  Document editing, formatting, typesetting, coding, storing, interchanging, etc.

COMPUTATIONAL LEXICOGRAPHY (3000)
  Dictionaries, thesauri, terminological databanks; parsing, transfer and generation dictionaries; lexical semantics, etc.

OPTICAL CHARACTER RECOGNITION (2900)
  Character preprocessing, feature extraction, isolation, segmentation, thinning; multi-font recognition, writer identification, etc.

CHARACTER PROCESSING (2200)
  Character coding (external and internal), input, output, synthesis, ordering, conversion, encryption, string matching, font design, etc.

COMMUNICATING THROUGH COMPUTERS (2100)
  E-Mail, computer conferencing, electronic publishing, hypermedia, hypertext, etc.

CORPUS LINGUISTICS AND DIALECT STUDY (1000)

=====================================================================

Conrad F. Sabourin
sabourco@ere.umontreal.ca
P.O. Box 187, Snowdon
Montreal, Qc, H3X 3T4
Canada

=====================================================================