LINGUIST List 4.280

Fri 16 Apr 1993

FYI: Tagger code available, bibliographical database



Directory

  1. Doug Cutting, Xerox part-of-speech tagger available
  2. Sabourin Conrad, Bibliographical database

Message 1: Xerox part-of-speech tagger available

Date: Wed, 14 Apr 1993 19:59:16
From: Doug Cutting <cutting@parc.xerox.com>
Subject: Xerox part-of-speech tagger available

The Common Lisp source code for version 1.0 of the Xerox
part-of-speech tagger is available for anonymous FTP from
parcftp.xerox.com in the file pub/tagger/tagger-1-0.tar.Z.

This code has been tested in the following CL implementations:
 . Franz Allegro Common Lisp version 4.1 on SunOS 4.x;
 . CMU Common Lisp version 16e on SunOS 4.x; and
 . Macintosh Common Lisp 2.0p2.

Enjoy.

 Doug Cutting <cutting@parc.xerox.com>, and
 Jan Pedersen <pedersen@parc.xerox.com>

Message 2: Bibliographical database

Date: Thu, 15 Apr 1993 17:05:20
From: Sabourin Conrad <sabourco@ere.umontreal.ca>
Subject: Bibliographical database


 COMPUTERS - LINGUISTICS - COMMUNICATIONS

 BIBLIOGRAPHICAL DATABASE

 For the last 15 years, we have been compiling a bibliographical
database on all aspects of computer processing of natural language
communications. The bibliography, which now holds more than
67,000 references, is indexed with a thesaurus of over 3,400 keywords.
More than 13,000 titles are related to artificial intelligence.

 The references cover the period from the inception
of the computer to the present and include theses, research reports,
books, articles from specialized periodicals, papers in conference
proceedings, etc. The entries were obtained mostly by systematically
scanning more than 400 periodicals and 800 conference proceedings.

 Some of the thematic sections of the database are near
completion and will be published in print in the coming months.
Each thematic volume will have a two-level analytical index.

 Many researchers have collaborated by sending us their lists of
publications. All others who are interested are invited to do so.

 In the list that follows, the numbers refer to the approximate
number of entries of some of the subsections of the database.

=====================================================================
NATURAL LANGUAGE INTERFACES (3000)
Conversation, interfaces to database, to expert system, to robot,
to operating system, to question answering system, etc.

TEXT UNDERSTANDING (3800)

PARSING (7000)
Syntactic analysis, semantic analysis, semantic interpretation.

COMPUTATIONAL MORPHOLOGY (2000)
Morphological analysis and generation, lemmatization.

TEXT GENERATION (2000)
Generation from data or linguistic structure, explanation generation,
paraphrasing, etc.

SPEECH ANALYSIS, CODING, AND SYNTHESIS (2800)
Speech compression, encryption, transmission, speech to tactile display,
phoneme identification, speaker identification, tone recognition, etc.

SPEECH RECOGNITION AND UNDERSTANDING (3000)
Connected, continuous, isolated words, speaker dependent and independent, etc.

TEXT INFORMATION EXTRACTION (2000)
Indexation (automatic and computer aided), text condensation, content
analysis, etc.

INFORMATION RETRIEVAL (3000)
Full text, conceptual.

COMPUTER TRANSLATION (7000)
Bilingual, multilingual, aids to translation.

MATHEMATICAL AND FORMAL LINGUISTICS (3000)

COGNITIVE LINGUISTICS AND PSYCHOLINGUISTICS (1600)

LITERARY COMPUTING (3000)
Concordances, author identification, style analysis, poetry analysis
and production, text collation, literary criticism, etc.

QUANTITATIVE AND STATISTICAL LINGUISTICS (2400)
Frequencies of characters, phonemes, words, grammatical categories,
syntactic structures; lexical richness, word collocations, etc.

COMPUTER ASSISTED LANGUAGE TEACHING (5500)
Teaching foreign languages, composition, writing, grammar, spelling,
vocabulary, reading, translation, listening, speaking; text composition
aids, etc.

ELECTRONIC DOCUMENT PROCESSING (2300)
Document editing, formatting, typesetting, coding, storing,
interchanging, etc.

COMPUTATIONAL LEXICOGRAPHY (3000)
Dictionaries, thesauri, terminological databanks; parsing, transfer
and generation dictionaries; lexical semantics, etc.

OPTICAL CHARACTER RECOGNITION (2900)
Character preprocessing, feature extraction, isolation, segmentation,
thinning; multi-font recognition, writer identification, etc.

CHARACTER PROCESSING (2200)
Character coding (external and internal), input, output, synthesis,
ordering, conversion, encryption, string matching, font design, etc.

COMMUNICATING THROUGH COMPUTERS (2100)
E-Mail, computer conferencing, electronic publishing, hypermedia,
hypertext, etc.

CORPUS LINGUISTICS AND DIALECT STUDY (1000)

=====================================================================
Conrad F. Sabourin sabourco@ere.umontreal.ca
P.O. Box 187, Snowdon
Montreal, Qc, H3X 3T4
Canada
=====================================================================
