THE ALVEY NATURAL LANGUAGE TOOLS (RELEASE 3)

BASIC DESCRIPTION AND DISTRIBUTION ARRANGEMENTS

A third release of the Alvey Natural Language Tools (ANLT) is now available. The UK Alvey Programme originally funded three projects at the Universities of Cambridge, Edinburgh and Lancaster to provide tools for use in natural language processing research. The DTI and SERC have funded their continued support and enhancement. The tools, a MORPHOLOGICAL ANALYSER, PARSERS and a GRAMMAR and LEXICON, are usable individually as well as together (integrated by a GRAMMAR DEVELOPMENT ENVIRONMENT), forming a complete system for the morphological, syntactic and semantic analysis of a considerable subset of English.

DISTRIBUTION

The ANLT system is available by anonymous FTP from Cambridge University, Computer Laboratory. The files containing grammars, lexicons and source code are encrypted; the reports describing the system, the specimen licence agreement and other information are not. If, after examining the documentation, you wish to purchase a licence for use of the system for research purposes, you should complete and sign the specimen agreement and return it, together with a cheque for the amount specified in the agreement (currently 500 ECU -- 100 ECU upgrade -- or local currency equivalent), to:

    Lynxvale WCIU Programs
    20 Trumpington St.
    Cambridge, CB2 1QA, UK
    Fax: +223 332797

On receipt, Lynxvale will send you (by letter) the key, which can be used in conjunction with the software provided to decrypt the remaining files.

DESCRIPTION

The MORPHOLOGICAL ANALYSER provides a set of mechanisms for the analysis of complex word forms. The analyser requires data files specifying a lexicon of base morphemes, rules governing spelling changes when concatenating morphemes, and rules describing valid combinations of morphemes in complex words. The tools include a description of English morphology in this form.
The analyser should be capable, though, when provided with the necessary linguistic analyses, of being used for most European languages and many others.

There are two alternative PARSERS. The main one is an optimized chart parser, incorporating a 'packing' mechanism (making it much more efficient when parsing sentences containing multiple local ambiguities). The other parser is a non-deterministic LALR(1) parser which seems, in most cases, to be even more efficient than the chart parser.

The GRAMMAR is a wide-coverage syntactic and semantic grammar of English, written in a metagrammatical formalism derived from Generalized Phrase Structure Grammar. Full coverage is provided of the following constructions and their combinations:

- all sentence types: declaratives, imperatives and questions (yes/no, tag and wh questions),
- all unbounded dependency types: topicalisation, relativisation, wh questions,
- a relatively exhaustive treatment of verb and adjective complement types,
- phrasal and prepositional verbs of many complement types,
- passivisation, verb phrase extraposition,
- sentence and verb phrase modification,
- noun phrase complements,
- noun phrase pre- and post-modification,
- partitives,
- coordination of all major category types,
- nominal and adjectival comparatives.

The LEXICON contains 40,000 homonyms (63,000 entries in total) in the form required by the morphological analyser.

The GRAMMAR DEVELOPMENT ENVIRONMENT gives access to all of the other components of the tools, allowing grammars to be input, edited, and browsed; it also compiles them into the base grammatical formalism used by the parsers, and provides extensive grammar debugging facilities.

A simple quantifier scoping and post-processing module is supplied as an example of how the result of parsing a sentence can be converted into a representation suitable for further semantic and pragmatic processing.
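
As a rough illustration of the three kinds of data the MORPHOLOGICAL ANALYSER is described as needing (a lexicon of base morphemes, spelling-change rules, and rules for valid morpheme combinations), here is a toy sketch in Python. It is not the ANLT analyser, which is written in Common Lisp and uses its own rule formalisms; every entry, rule and function name below is a hypothetical stand-in chosen only to show how the three data sources might interact.

```python
# Toy illustration only: NOT the ANLT analyser or its rule formalism.
# All lexicon entries, suffix rules and names here are hypothetical.

# Lexicon of base morphemes: form -> syntactic category.
LEXICON = {"move": "V", "carry": "V", "walk": "V", "cat": "N", "fly": "N"}

# Combination rules: suffix -> (category of stem it attaches to, feature).
SUFFIXES = {
    "ing": ("V", "present participle"),
    "ed": ("V", "past"),
    "s": ("N", "plural"),
    "es": ("N", "plural"),
}


def candidate_stems(surface_stem, suffix):
    """Undo two simple spelling changes at the stem/suffix boundary."""
    stems = [surface_stem]                      # plain concatenation
    if suffix[0] in "aeiou":
        stems.append(surface_stem + "e")        # e-deletion: move+ing -> moving
    if surface_stem.endswith("i") and suffix in ("es", "ed"):
        stems.append(surface_stem[:-1] + "y")   # y -> i: carry+ed -> carried
    return stems


def analyse(word):
    """Return a list of (stem, suffix, category, feature) analyses."""
    analyses = []
    if word in LEXICON:
        analyses.append((word, None, LEXICON[word], "base"))
    for suffix, (category, feature) in SUFFIXES.items():
        if word.endswith(suffix) and len(word) > len(suffix):
            for stem in candidate_stems(word[: -len(suffix)], suffix):
                # The combination rule must license this stem category.
                if LEXICON.get(stem) == category:
                    analyses.append((stem, suffix, category, feature))
    return analyses
```

Under these assumptions, analyse("moving") recovers the stem "move" by undoing e-deletion, and analyse("flies") recovers "fly" by undoing the y-to-i change; a word with no licensed analysis yields an empty list.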
All of the software components are written in Common Lisp and have been tested in several implementations on a wide range of machines.

Some published references to these projects can be found in:

Briscoe, E., C. Grover, B. Boguraev & J. Carroll, 'A Formalism and Environment for the Development of a Large Grammar of English', Proceedings of 10th International Joint Conference on Artificial Intelligence, Milan, 1987, pp. 703-708.

Ritchie, G., G. Russell, A. Black & S. Pulman, 'Computational Morphology: Practical Mechanisms for the English Lexicon', MIT Press, 1991.

Technical reports describing the system in detail are available via FTP as detailed in the file `instruct' (and Annex A of the licence agreement).

********************

ANLT distribution arrangements and instructions, and a machine-readable specimen licence agreement are available in files on the FTP server ftp.cl.cam.ac.uk (128.232.0.56). To fetch this information use anonymous FTP (login with user name anonymous, and password your e-mail address), go to the directory `nltools', and fetch the files

    licence     a machine-readable specimen licence agreement
    instruct    instructions on how to FTP technical reports and the ANLT itself

The following example shows how to fetch these files:

    $ ftp ftp.cl.cam.ac.uk
    Connected to swan.cl.cam.ac.uk.
    220- swan.cl.cam.ac.uk FTP server (Version 5.60+UA) ready.
    ...
    Name (ftp.cl.cam.ac.uk:jac): anonymous
    Password (ftp.cl.cam.ac.uk:anonymous): <type your e-mail address here>
    ...
    ftp> cd nltools
    250 CWD command successful.
    ftp> get licence
    ...
    ftp> get instruct
    ...
    ftp> quit
    221 Goodbye.

(The $ is the Unix shell command prompt). If the FTP command does not know about the address ftp.cl.cam.ac.uk, try giving the command the internet number (128.232.0.56) instead.
If you still have problems, or FTP is not available to you, then you can obtain the ANLT on magnetic tape by writing to Lynxvale WCIU Programs at the address given above (specifying the type of tape and format you require).
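
For readers who would rather script the anonymous FTP session described above than type it interactively, the same steps might be sketched with Python's standard ftplib module. This is only a sketch under assumptions: the function names and the placeholder e-mail address are hypothetical, and the server named in this announcement may not be reachable in this form.

```python
import os
from ftplib import FTP


def fetch_files(ftp, directory, filenames, dest_dir="."):
    """Change to `directory` on the server and download each named file.

    `ftp` is an already logged-in ftplib.FTP connection (or any object
    offering the same cwd/retrbinary methods).
    """
    ftp.cwd(directory)                                   # ftp> cd nltools
    for name in filenames:
        local_path = os.path.join(dest_dir, name)
        with open(local_path, "wb") as out:
            ftp.retrbinary("RETR " + name, out.write)    # ftp> get <name>


def download_anlt_info(email="you@example.org", host="ftp.cl.cam.ac.uk"):
    """Anonymous login, then fetch the two files named in the announcement.

    The e-mail address is a placeholder; anonymous FTP conventionally
    expects your own address as the password.
    """
    with FTP(host) as ftp:            # the connection is closed on exit
        ftp.login(user="anonymous", passwd=email)
        fetch_files(ftp, "nltools", ["licence", "instruct"])
```

Calling download_anlt_info() with your own e-mail address would attempt the same login, cd and get steps as the interactive session. If the host name does not resolve, the internet number can be passed as the host argument instead.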
On the 20th and 21st our local mainframe developed a problem, which resulted in 20-30 nameserver requests going unanswered. If the people who sent these requests could submit them again, I hope that they will now receive an answer. The experts claim that the problem is now solved.

Norval Smith