
Software Details

Title: Software for automatic text processing
Submitter: Slava Yatsko
Description: Dear Colleagues,
The Computational Linguistics Laboratory at Katanov State University of
Khakasia (CLL at KSU) is pleased to announce the release of Linguistic
Toolbox (LIT), a package of programs for automatic text processing. LIT is
a concordancer that differs from existing analogues in the following
respects.
- It has an integrated part-of-speech tagger, which allows users to create
their own annotated corpora. In-depth linguistic research is often based on
a specific text genre (e.g. fiction, scientific text), a linguistic
category (e.g. possession), or the works of a particular author (e.g.
Maugham). Publicly available annotated national corpora with evenly
distributed genres often fail to meet the demands of such research, and
LIT has been designed to fill this gap. With LIT, users can run various
searches on their own corpora and obtain statistics on the distribution of
words, patterns, and phrases.
- Union, subtraction, and intersection operations. In set theory, these
operations construct new sets from existing ones. Why not perform them on
texts, so as to construct new texts from existing ones? For example, using
the subtraction operation, the user can remove stopwords from a text, and
using the intersection operation, he/she can get a list of words that occur
in two or more texts, with raw counts assigned to each word. These
functions may be useful for computing distances between texts for the
purposes of text classification and
- LIT has an integrated spreadsheet. After obtaining statistical
information with LIT, the user can perform computations in LIT itself
without resorting to commercial products such as MS Excel.
- LIT has an integrated WordNet module, with which the user can search not
only for a given word but also for words semantically related to it.
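To illustrate what a concordancer does with a corpus, here is a minimal keyword-in-context (KWIC) sketch. It is not LIT's code: the regex tokenizer, the `kwic` function, and its `width` parameter are hypothetical choices made for this example, and real tools handle tokenization, case, and punctuation far more carefully.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (a deliberately simple regex tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for every occurrence of keyword.

    width is the number of tokens of context kept on each side (hypothetical parameter).
    """
    tokens = tokenize(text)
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


def frequency(text):
    """Raw frequency of every word type, the simplest distribution statistic."""
    return Counter(tokenize(text))


if __name__ == "__main__":
    sample = "The cat sat on the mat. The dog saw the cat."
    for left, kw, right in kwic(sample, "cat", width=2):
        print(f"{left:>12} [{kw}] {right}")
    print(frequency(sample).most_common(3))
```

Each hit lines up the keyword with its surrounding context, which is the classic concordance view; the `Counter` gives the raw distribution figures that a concordancer typically reports alongside the hits.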
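The set operations on texts can be approximated by treating each text as a bag of words. The sketch below is an assumption about how such operations might work, not a description of LIT's implementation. It assumes that union sums counts, that subtraction drops every word in a stopword list, and that intersection keeps words shared by all input texts with their counts summed. It also shows Jaccard distance over word types as one possible (swapped-in) text-distance measure built from intersection and union.

```python
import re
from collections import Counter


def word_counts(text):
    """Bag of words: raw count of each lowercase word token."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def union(*texts):
    """Assumed semantics: all words from all texts, with counts summed."""
    total = Counter()
    for t in texts:
        total += word_counts(t)
    return total


def subtract(text, stopwords):
    """Remove every word in the stopword set from the text's word counts."""
    counts = word_counts(text)
    return Counter({w: c for w, c in counts.items() if w not in stopwords})


def intersect(*texts):
    """Assumed semantics: words present in every text, with raw counts summed."""
    counts = [word_counts(t) for t in texts]
    shared = set(counts[0]).intersection(*counts[1:])
    return Counter({w: sum(c[w] for c in counts) for w in shared})


def jaccard_distance(a, b):
    """1 - |shared word types| / |all word types|; 0.0 for two empty texts."""
    types_a, types_b = set(word_counts(a)), set(word_counts(b))
    all_types = types_a | types_b
    if not all_types:
        return 0.0
    return 1 - len(types_a & types_b) / len(all_types)


if __name__ == "__main__":
    a = "the cat sat on the mat"
    b = "the dog sat on the log"
    print(intersect(a, b))
    print(subtract(a, {"the", "on"}))
    print(jaccard_distance(a, b))
```

For the two sample sentences, the shared words are "the", "sat", and "on" out of seven word types in total, so the Jaccard distance is 4/7. Other measures (e.g. cosine distance over the count vectors) would use the same bags of words.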

LIT is distributed as freeware and can be downloaded from the CLL's site at
The current version supports English and works on Windows machines.

V.Yatsko, Head of the CLL at KSU
Linguistic Field(s): Computational Linguistics
Text/Corpus Linguistics

LL Issue: 19.3048
Date Posted: 08-Oct-2008
