LINGUIST List 2.175

Sunday, 28 Apr 1991

FYI: Full-Text Retrieval System

Editor for this issue: <>


Directory

  1. Willard McCarty, What is TACT?

Message 1: What is TACT?

Date: Fri, 26 Apr 1991 10:26:18 -0400
From: Willard McCarty <mccarty@epas.utoronto.ca>
Subject: What is TACT?
[From Humanist Discussion Group, Vol. 4, No. 1288. Saturday, 27 Apr 1991.]

 TACT:
 an MS-DOS shareware program
 for interactive textual analysis
 (ver. 1.2, released June 1990)

TACT is an interactive full-text retrieval system for MS-DOS with
a number of analytical tools. Like others of its kind, TACT
retrieves segments of text according to specified word forms. In
addition, it can find words or character-strings that match
criteria the user specifies. TACT generates simple graphs to show
the distribution of forms throughout an entire text, or within
various structural divisions determined by the user. TACT also
allows retrieval by metatextual `categories'.

Use of TACT begins with a stable ASCII text. Although this text
need not be marked up for TACT, in most cases intelligent markup
is crucial to effective analysis.

For markup the researcher uses a wordprocessor to insert simple
codes according to the properties that he or she wishes to query.
In a play, for example, acts, scenes, and speeches are obvious
things to mark; in a novel, chapters; in a narrative poem, books
and stanzas; in a lexicon, subdivisions of the entry; and so
forth. The researcher may also, however, want to mark specific
entities, such as proper names (of people and places), names of
plants and animals, or episodes. In addition, through markup a
number of hypothetical structures can be simultaneously indicated
alongside those denoted by the author or editor, e.g. an
alternative division of a poem into thematic units.

Once the text is marked up, a TACT program known as MAKBAS
converts it into a database for efficient retrieval. MAKBAS
allows the user to define the collation sequence of the alphabet,
special characters, and the characteristics of the tags used for markup.
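
As a rough illustration of what a user-defined collation sequence
does, the Python sketch below sorts words by a custom alphabet. It is
not MAKBAS's own mechanism or syntax; the ALPHABET string, the
collation_key helper, and the sample words are assumptions made for
the example.

```python
# Hypothetical collation sequence: "æ" sorts right after "a",
# and "þ" right after "t" (an assumption for this example only).
ALPHABET = "aæbcdefghijklmnopqrstþuvwxyz"
RANK = {ch: i for i, ch in enumerate(ALPHABET)}

def collation_key(word):
    """Map a word to a list of ranks; unknown characters sort last."""
    return [RANK.get(ch, len(ALPHABET)) for ch in word.lower()]

words = ["thing", "æsc", "bread", "þorn", "apple", "tun"]
ordered = sorted(words, key=collation_key)
print(ordered)
```

A plain code-point sort would place "æsc" and "þorn" after every
unaccented word, which is why the sequence has to be stated
explicitly.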

Working with the database, TACT can present a complete list of
words from which a subset for retrieval may be selected, one word
at a time. Through what is called `regular expression'
capability, the user may also specify a selection rule according
to a pattern of characters, including "wildcards" (for example,
all words beginning with the letter "a" and ending with "ed" or
"ing"). Rules may also contain operators to indicate juxtaposed
words; specific words within a user-definable span, or all words
within such a span of an expression; or all words resembling the
chosen expression to varying degrees. Such rules may be kept in
one or more ASCII files external to the program, from which
specific rules may be selected; thus, for example, the user can
construct a lexicon of words and expressions.
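
The wildcard example above (words beginning with "a" and ending in
"ed" or "ing") can be sketched with Python's re module. TACT's own
rule syntax is not shown in this announcement, so the pattern below
uses Python notation, and the word list is a hypothetical stand-in
for a database's vocabulary.

```python
import re

# Hypothetical word list standing in for the vocabulary of a database.
words = ["asked", "asking", "answer", "added", "walked", "arming", "a", "aged"]

# Whole-word pattern: starts with "a", ends with "ed" or "ing".
pattern = re.compile(r"a\w*(?:ed|ing)")

# fullmatch requires the pattern to cover the entire word.
selected = [w for w in words if pattern.fullmatch(w)]
print(selected)
```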

Once a set of words has been selected by whatever means, it can
be saved within TACT as a "category". Categories can in turn be
combined to form other categories. Thus, for example, all words
and expressions the user regards as indicating "love" can be
saved as the category LOVE, and in addition be combined with a
similar category MAD to produce the category MADLOVE. Category
names can be included within rules as easily as words, so that,
for example, a user could ask to see all passages in which
LOVE-words occur within 2 lines of MAD-words. To take a slightly
different example, a user could ask to be shown all paragraphs in
which the category LOVE and the word "death" occur.
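
One way to picture categories and span queries is as named sets of
word forms checked against line positions. The sketch below is only
an illustration in Python, not TACT's implementation: the word sets,
the sample lines, and the helpers lines_with and within_span are all
hypothetical, and treating "combined" as a set union is an
assumption.

```python
# Hypothetical categories: named sets of word forms.
LOVE = {"love", "adore", "dear"}
MAD = {"mad", "frenzy", "rage"}
MADLOVE = LOVE | MAD  # assumed reading of "combined": set union

# Invented sample text, one string per line.
text_lines = [
    "her love was a fever",
    "no cure was found",
    "and he grew mad with unrest",
    "the sun was bright",
    "o dear heart",
]

def lines_with(category, lines):
    """Return indices of lines containing any word in the category."""
    return [i for i, line in enumerate(lines) if category & set(line.split())]

def within_span(cat_a, cat_b, lines, span=2):
    """Return (i, j) pairs: cat_a on line i, cat_b on line j, |i - j| <= span."""
    hits_a = lines_with(cat_a, lines)
    hits_b = lines_with(cat_b, lines)
    return [(i, j) for i in hits_a for j in hits_b if abs(i - j) <= span]

pairs = within_span(LOVE, MAD, text_lines, span=2)
print(pairs)
```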

In the creation of a category from a rule, the user can examine all
selected locations in the text and choose which to include or exclude.
This ability to choose by context is often essential. The word "heat",
for example, might be part of what is meant by "love" in some contexts but
not in others, or only the noun might be relevant, not the verb.

Various displays are available. Text can be displayed as KWIC
(keyword-in-context) segments; as simple distribution graphs,
showing how the occurrence of a set of locations is distributed
through the text, or among various structural divisions; or as an
index showing only a list of locations where the event occurred,
with a one-line context. A new display, added to version 1.2, can
show all collocates to the selected positions in the text -- with
collocates ordered by the Z-score.
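
For readers unfamiliar with the KWIC format, a minimal Python sketch
follows. It shows the general idea only; the kwic function, the
width parameter (context measured in words), and the sample sentence
are assumptions, and TACT's actual display is interactive and more
elaborate.

```python
def kwic(tokens, keyword, width=3):
    """Return one (left, keyword, right) tuple per occurrence of keyword."""
    rows = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows

# Invented sample text, split into words on whitespace.
tokens = "the heat of love and the heat of the day".split()
rows = kwic(tokens, "heat", width=2)

# Right-align the left context so the keywords line up in a column.
for left, key, right in rows:
    print(f"{left:>12} [{key}] {right}")
```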

Displays in TACT are linked so that, for example, the user can go directly
from a position in a distribution graph to the text it represents.

TACT is multilingual. In order to display foreign languages, it supports
the extended ASCII character set of the IBM PC, and with tools that
extend the characters the display can show, it can handle many other
languages, such as Greek and Old English. (Hebrew, Arabic, and
languages such as Chinese are beyond its present design, however.) It
supports multilingual analysis as well by allowing for proper
alphabetization, convenient keyboard entry, and printing on devices that
require special "escape codes" to produce non-ASCII characters -- even if
these sequences are different from those that would be used to
enter the character from the keyboard, or display it on screen.

In addition to MAKBAS and TACT, the TACT system includes a program to
construct databases from very large texts (MERGEBAS) and another to
search a database and find all phrases that occur more than a specified
number of times (COLLGEN).
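
The kind of search attributed to COLLGEN can be approximated by
counting fixed-length word sequences. The Python sketch below is an
assumption-laden illustration, not COLLGEN's algorithm: the function
repeated_phrases, its length and threshold parameters, and the
sample text are hypothetical.

```python
from collections import Counter

def repeated_phrases(tokens, length, threshold):
    """Return {phrase: count} for phrases of `length` words that occur
    more than `threshold` times."""
    counts = Counter(
        " ".join(tokens[i:i + length])
        for i in range(len(tokens) - length + 1)
    )
    return {p: c for p, c in counts.items() if c > threshold}

tokens = "to be or not to be that is to be".split()
result = repeated_phrases(tokens, length=2, threshold=1)
print(result)
```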

Developed by: John Bradley and Lidio Presutti
University of Toronto Computing Services (UTCS),
Room 201, 4 Bancroft Avenue,
Toronto, Ontario, M5S 1A1
Canada; fax: (416) 978-7159

John Bradley
voice: (416) 978-3995; e-mail: bradley@vm.utcs.utoronto.ca

Lidio Presutti
voice: (416) 978-5130; e-mail: lidio@vm.utcs.utoronto.ca

Distributed by:
 TACT Distribution
 Centre for Computing in the Humanities,
 Robarts Library, Room 14297A,
 University of Toronto,
 Toronto, Ontario
 Canada, M5S 1A5

The developers recognize the generous support of the Centre for
Computing in the Humanities, University of Toronto, and IBM
Canada through its former partnership with the university. The
developers are also indebted to John B. Smith's ARRAS program, by
which TACT has in part been inspired.

Hardware: requires standard MS-DOS platform with 640K RAM; fixed
disk; DOS 2.1 or above.

Cost: The CCH charges a distribution fee of $30 CDN for a copy of
the program and a printed, bound copy of the documentation. (GST
should be added for Canadian sales.) TACT is shareware. You are
welcome to distribute copies of TACT, subject to its license,
which basically permits distribution as long as it is not
distributed for profit.

Documentation: online help and a preprinted tutorial.

Support: the developers are glad to answer questions about the usage or
design of TACT and to receive suggestions for its improvement. Queries
can be sent directly to the developers or to TACT-L@vm.utcs.utoronto.ca,
the discussion group for users of TACT.

[End Linguist List, Vol. 2, No. 175]