Editor for this issue: <>
[From Humanist Discussion Group, Vol. 4, No. 1288. Saturday, 27 Apr 1991.]

TACT: an MS-DOS shareware program for interactive textual analysis (ver. 1.2, released June 1990)

TACT is an interactive full-text retrieval system for MS-DOS with a number of analytical tools. Like others of its kind, TACT retrieves segments of text according to specified word forms. In addition, it can find words or character strings that match criteria the user specifies. TACT generates simple graphs to show the distribution of forms throughout an entire text, or within various structural divisions determined by the user. TACT also allows retrieval by metatextual `categories'.

Use of TACT begins with a stable ASCII text. Although this text need not be marked up for TACT, in most cases intelligent markup is crucial to effective analysis. For markup the researcher uses a word processor to insert simple codes according to the properties that he or she wishes to query. In a play, for example, acts, scenes, and speeches are obvious things to mark; in a novel, chapters; in a narrative poem, books and stanzas; in a lexicon, subdivisions of the entry; and so forth. The researcher may also, however, want to mark specific entities, such as proper names (of people and places), names of plants and animals, or episodes. In addition, markup can indicate a number of hypothetical structures simultaneously, alongside those denoted by the author or editor, e.g. an alternative division of a poem into thematic units.

Once the text is marked up, a TACT program known as MAKBAS converts it into a database for efficient retrieval. MAKBAS allows the user to define the collation sequence of the alphabet, special characters, and the characteristics of the tags used for markup. Working with the database, TACT can present a complete list of words from which a subset for retrieval may be selected, one word at a time.
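The announcement does not describe MAKBAS's database format, so the Python sketch below only illustrates the two user-defined inputs mentioned above: a collation sequence for the alphabet, and markup tags that record the structural division each word occurs in. The tag syntax (`<act 1>`), the sample alphabet, and the names `collation_key`, `build_index`, and `word_list` are hypothetical, not TACT's own.

```python
import re
from collections import defaultdict

# Hypothetical user-defined collation sequence: "æ" sorts right after "a",
# and "þ" and "ð" after "z" (an illustrative choice, not TACT's default).
ALPHABET = "aæbcdefghijklmnopqrstuvwxyzþð"
RANK = {ch: i for i, ch in enumerate(ALPHABET)}

# Hypothetical markup: a tag such as <act 1> sets the current value of a
# structural division until the next tag with the same name.
TAG = re.compile(r"<(\w+)\s+([^>]*)>")
# A word is a run of alphabet letters; any other character acts as a separator.
WORD = re.compile("[" + ALPHABET + "]+")


def collation_key(word):
    """Sort key that follows ALPHABET order rather than code-point order."""
    return [RANK[ch] for ch in word]


def build_index(text):
    """Map each word form to its (line number, current divisions) occurrences.

    Tags are applied per line for simplicity: a tag anywhere on a line
    governs every word on that line.
    """
    index = defaultdict(list)
    divisions = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        for name, value in TAG.findall(line):
            divisions[name] = value.strip()
        body = TAG.sub(" ", line).lower()
        for word in WORD.findall(body):
            index[word].append((line_no, dict(divisions)))
    return index


def word_list(index):
    """The complete list of word forms, alphabetized by the collation."""
    return sorted(index, key=collation_key)


sample = "<act 1>\nApple and ash\n<act 2>\nÆsc bold ash"
idx = build_index(sample)
print(word_list(idx))
```

With this alphabet, "æsc" sorts between "ash" and "bold"; Python's default code-point ordering would place it after "bold", which is the kind of misplacement a user-defined collation sequence avoids.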
Through what is called `regular expression' capability, the user may also specify a selection rule according to a pattern of characters, including "wildcards" (for example, all words beginning with the letter "a" and ending with "ed" or "ing"). Rules may also contain operators to indicate juxtaposed words; specific words within a user-definable span, or all words within such a span of an expression; or all words resembling the chosen expression to varying degrees. Such rules may be kept in one or more ASCII files external to the program, from which specific rules may be selected; thus, for example, the user can construct a lexicon of words and expressions.

Once a set of words has been selected by whatever means, it can be saved within TACT as a "category". Categories can in turn be combined to form other categories. Thus, for example, all words and expressions the user regards as indicating "love" can be saved as the category LOVE, and then combined with a similar category MAD to produce the category MADLOVE. Category names can be included within rules as easily as words, so that, for example, a user could ask to see all passages in which LOVE-words occur within 2 lines of MAD-words. To take a slightly different example, a user could ask to be shown all paragraphs in which both the category LOVE and the word "death" occur.

In the creation of a category from a rule, the user can examine all selected locations in the text and choose which to include or exclude. This ability to choose by context is often essential. The word "heat", for example, might be part of what is meant by "love" in some contexts but not in others, or only the noun might be relevant, not the verb.

Various displays are available.
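TACT's own rule syntax is not given in this announcement, so the Python sketch below substitutes Python's `re` module for TACT's pattern language. It models a category as a set of (line, word) positions, so combining categories is set union, and rejecting a location after reviewing it in context is set difference; a minimal keyword-in-context line stands in for that review. The toy text and the names `select`, `near`, and `kwic` are hypothetical.

```python
import re

# Hypothetical toy text: each string stands for one line of a source text.
LINES = [
    "she was asking why he loved her",
    "he answered that love made him mad",
    "the heat of the day passed slowly",
    "madness and longing followed",
    "they walked on in silence",
    "her love was a fever and a heat",
]


def tokens(line):
    """Split a line into lowercase alphabetic word forms."""
    return re.findall(r"[a-z]+", line.lower())


def select(lines, pattern):
    """Positions (line, word index) of every word that fully matches pattern."""
    rx = re.compile(pattern)
    return {
        (i, j)
        for i, line in enumerate(lines)
        for j, word in enumerate(tokens(line))
        if rx.fullmatch(word)
    }


def near(cat_a, cat_b, span):
    """Positions in cat_a lying within `span` lines of any position in cat_b."""
    return {p for p in cat_a if any(abs(p[0] - q[0]) <= span for q in cat_b)}


def kwic(lines, pos, width=3):
    """A one-line keyword-in-context view of a position, for review."""
    i, j = pos
    words = tokens(lines[i])
    left = " ".join(words[max(0, j - width):j])
    right = " ".join(words[j + 1:j + 1 + width])
    return f"{left} [{words[j]}] {right}".strip()


# Wildcard rule: words beginning with "a" and ending with "ed" or "ing".
a_words = select(LINES, r"a[a-z]*(?:ed|ing)")

# A category is a set of positions. Build LOVE from two rules, then drop
# the weather sense of "heat" (line 2, word 1) after reviewing it in context.
love = select(LINES, r"lov(?:e|ed|es|ing)") | select(LINES, r"heat")
love -= {(2, 1)}
mad = select(LINES, r"mad(?:ness)?")

# Combining categories is a set union.
madlove = love | mad

for pos in sorted(near(love, mad, 2)):
    print(kwic(LINES, pos))
```

Here `near(love, mad, 2)` mirrors the "within 2 lines" query and, on this toy text, keeps all four LOVE positions; with a span of 1 only the two on lines 0 and 1 remain. Paragraph-level queries and the "resembling" operator are not modeled.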
Text can be displayed as KWIC (keyword-in-context) segments; as simple distribution graphs, showing how a set of locations is distributed through the text or among various structural divisions; or as an index listing only the locations where the selected forms occur, each with a one-line context. A new display, added in version 1.2, shows all collocates of the selected positions in the text, ordered by Z-score. Displays in TACT are linked so that, for example, the user can go directly from a position in a distribution graph to the text it represents.

TACT is multilingual. To display foreign languages, it supports the extended ASCII character set of the IBM PC, and with tools that extend the character-set displays, its capabilities can be extended to many other languages, such as Greek and Old English. (Hebrew, Arabic, and languages such as Chinese are, however, beyond its present design.) It also supports multilingual analysis by allowing for proper alphabetization, convenient keyboard entry, and printing on devices that require special "escape codes" to produce non-ASCII characters, even if these sequences differ from those used to enter the character from the keyboard or to display it on screen.

In addition to MAKBAS and TACT, the TACT system includes a program to construct databases from very large texts (MERGEBAS) and another to search a database and find all phrases that occur more than a specified number of times (COLLGEN).

Developed by: John Bradley and Lidio Presutti, University of Toronto Computing Services (UTCS), Room 201, 4 Bancroft Avenue, Toronto, Ontario, M5S 1A1 Canada; fax: (416) 978-7159.
John Bradley: voice: (416) 978-3995; e-mail: bradley@vm.utcs.utoronto.ca
Lidio Presutti: voice: (416) 978-5130; e-mail: lidio@vm.utcs.utoronto.ca

Distributed by: TACT Distribution, Centre for Computing in the Humanities (CCH), Robarts Library, Room 14297A, University of Toronto, Toronto, Ontario, Canada, M5S 1A5

The developers recognize the generous support of the Centre for Computing in the Humanities, University of Toronto, and IBM Canada through its former partnership with the university. The developers are also indebted to John B. Smith's ARRAS program, by which TACT has in part been inspired.

Hardware: requires a standard MS-DOS platform with 640K RAM, a fixed disk, and DOS 2.1 or above.

Cost: The CCH charges a distribution fee of $30 CDN for a copy of the program and a printed, bound copy of the documentation. (GST should be added for Canadian sales.) TACT is shareware. You are welcome to distribute copies of TACT, subject to its license, which basically permits distribution as long as it is not distributed for profit.

Documentation: online help and a preprinted tutorial.

Support: the developers are glad to answer questions about the usage or design of TACT and to receive suggestions for its improvement. Queries can be sent directly to the developers or to TACT-L@vm.utcs.utoronto.ca, the discussion group for users of TACT.

[End Linguist List, Vol. 2, No. 175]