Lexa, a set of programs for lexical data processing written by Raymond Hickey, is now available from the Norwegian Computing Centre for the Humanities for about 100 USD. The programs run under MS-DOS and come on 4 diskettes with a manual of 750 pages in 3 volumes.

To get more information and an order form, send the following line to FILESERV@HD.UIB.NO:

send icame lexa.info

This file can also be fetched with FTP or Gopher from nora.hd.uib.no in the directory icame.

Knut Hofland
Norwegian Computing Centre for the Humanities,
Harald Haarfagres gt. 31, N-5007 Bergen, Norway
Phone: +47 5 212954/5/6, Fax: +47 5 322656
E-mail: knut@x400.hd.uib.no

Here is a short description of the programs written by the author.

*********************************************

Raymond Hickey, English Department, University of Munich, Germany

Lexical Data Processing

The present set of programmes is intended to offer a wide range of software which will carry out (i) the lexical analysis and (ii) the information retrieval tasks required by linguists involved in the investigation of text corpora. The suite has been particularly adapted for use with the corpus of historical English compiled at the University of Helsinki. The general nature of the software, however, permits its application to any set of texts, particularly those which are arranged in the so-called Cocoa format.

Lexical analysis.

The main programme, Lexa, puts at the disposal of the interested linguist the options he or she would require in order to process lexical data with a high degree of automation on a personal computer. The set is divided into several groups which perform typical functions. Of these the first, lexical analysis, will be of immediate concern.

Lexa allows one, via tagging, to lemmatise any text or series of texts with a minimum of effort. All that is required is that the user specify which (possible) word forms are to be assigned to which lemmas. The rest is taken care of by the programme. In addition, one can create frequency lists of the types and tokens occurring in any loaded text, make lexical density tables, and transfer textual data in a user-defined manner to a database environment, to mention just some of the procedures which are built into Lexa.

The results of all operations are stored as files and can be examined later, for instance with the text editor shipped with the package. Each item of information used by Lexa when manipulating texts can be specified in a setup file, which is loaded after calling Lexa and used to initialise the programme in the manner desired by the user.

Information retrieval.
The second main goal of the Lexa set is to offer flexible and efficient means of retrieving information from text corpora. The programme Lexa Pat allows one to specify a whole range of parameters for combing through text files. By determining these precisely the user can achieve a high level of correct returns which are of value when evaluating texts quantitatively. A further programme, Lexa DbPat, permits similar retrieval operations to be applied to databases, for instance those generated by Lexa from text files of a corpus.

Ascertaining the occurrence of syntactic contexts is catered for by the programme Lexa Context with which users can specify search strings, their position in a sentence, the number of intervening items and then comb through any set of texts in search of them.

By means of the utility Cocoa it is possible to group text files of a corpus on the grounds of shared parameters from the Cocoa-format header at the beginning of each file in many text collections, e.g. the Helsinki corpus. All information retrieval operations can then have as their scope those files grouped on the basis of their contents by the Cocoa utility.

In the design of the current suite of programmes, flexibility has been given highest priority. This is to be seen in the number of items, in nearly all programmes, which can be determined by the user. Furthermore, techniques have been employed which render the structure of each programme as user-friendly as possible (pull-down menus, window technology, mouse support, similarity of command structure between the 40-odd programmes of the set), permitting the linguist to concentrate on essentially linguistic matters.
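
For readers unfamiliar with the operations described above, the following is a minimal Python sketch of two of them: counting tokens per lemma from a user-supplied list of word forms, and searching for one word followed by another with a limited number of intervening items. It is not taken from Lexa, which is a set of MS-DOS programmes with its own interface and file formats; the function names, the lemma list, and the sample sentence are all hypothetical and serve only to illustrate the ideas.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def lemma_frequencies(tokens, lemma_map):
    """Count tokens per lemma; forms not in the map count as their own lemma."""
    return Counter(lemma_map.get(tok, tok) for tok in tokens)


def find_contexts(tokens, first, second, max_gap):
    """Return (i, j) index pairs where `first` is followed by `second`
    with at most `max_gap` intervening tokens."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok != first:
            continue
        # Look ahead only as far as the allowed gap permits.
        for j in range(i + 1, min(i + max_gap + 2, len(tokens))):
            if tokens[j] == second:
                hits.append((i, j))
    return hits


# Hypothetical user-supplied assignment of word forms to lemmas.
LEMMA_MAP = {"goth": "go", "wente": "go", "went": "go", "gon": "go"}

# Made-up sample sentence, for illustration only.
sample = "He goth to toun and she wente to the toun"

tokens = tokenize(sample)
freqs = lemma_frequencies(tokens, LEMMA_MAP)
hits = find_contexts(tokens, "to", "toun", max_gap=1)

print(len(tokens), "tokens,", len(set(tokens)), "types")
print(freqs.most_common(3))
print(hits)
```

In this sketch the lemma list plays the role of the user's specification of which forms belong to which lemma, and `max_gap` corresponds to the number of intervening items in a context search. A real corpus would also need decisions about spelling variants, punctuation and sentence boundaries, which are left out here.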