LINGUIST List 8.1822

Sun Dec 21 1997

FYI: Software for Corpus Searches

Editor for this issue: Anita Huang <>


  1. Lee Hartman, Software for Corpus Searches

Message 1: Software for Corpus Searches

Date: Wed, 17 Dec 1997 09:30:11 -0600 (CST)
From: Lee Hartman <>
Subject: Software for Corpus Searches

Software for corpus searches

I'm announcing the release of a software program named
"Busca: A Searcher for word patterns in texts" (Version 3 -- December 1997).

Busca is a DOS-based program that searches a set of text
files for a specified pattern of words or for a string of
characters. When searching for a word pattern, Busca uses the
punctuation of the text to search sentence by sentence. The
word pattern is defined in terms of a focus word, with
possibilities for specifying the first, second, and/or third
neighboring word before and/or after it, as well as a "floating"
word located anywhere in the sentence. Words in the search
template can be defined in terms of their beginning (xxx-),
their ending (-xxx), a contained string (-xxx-), or their
entirety (xxx). Each word position in the template may contain
up to ten alternative forms.
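The template notation described above can be illustrated with a short Python sketch. This is a hypothetical reconstruction, not Busca's own code (Busca is a DOS program and its source is not part of this announcement). The function names, the case-insensitive comparison, the encoding of neighbor positions as offsets -3..-1 and 1..3, and the rule that the floating word may match any word in the sentence (including the focus word) are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

MAX_ALTERNATIVES = 10  # "up to ten alternative forms" per word position
MAX_OFFSET = 3         # first, second, or third neighbor before/after the focus


def word_matches(form: str, word: str) -> bool:
    """Match one word against a form written as xxx-, -xxx, -xxx-, or xxx.

    Comparison is case-insensitive (an assumption, not stated for Busca).
    """
    form, word = form.lower(), word.lower()
    if len(form) > 2 and form.startswith("-") and form.endswith("-"):
        return form[1:-1] in word            # contained string: -xxx-
    if form.endswith("-"):
        return word.startswith(form[:-1])    # beginning: xxx-
    if form.startswith("-"):
        return word.endswith(form[1:])       # ending: -xxx
    return word == form                      # entire word: xxx


def slot_matches(alternatives: List[str], word: str) -> bool:
    """True if the word matches any of one position's alternative forms."""
    if not 1 <= len(alternatives) <= MAX_ALTERNATIVES:
        raise ValueError("each position takes one to ten alternative forms")
    return any(word_matches(form, word) for form in alternatives)


def find_focus(
    words: List[str],
    focus: List[str],
    neighbors: Optional[Dict[int, List[str]]] = None,
    floating: Optional[List[str]] = None,
) -> List[int]:
    """Return the indices of focus-word hits in one sentence.

    neighbors maps an offset (-3..-1 before, 1..3 after) to alternative forms.
    floating must match some word anywhere in the sentence (hypothetically
    including the focus word itself).
    """
    neighbors = neighbors or {}
    for offset in neighbors:
        if offset == 0 or abs(offset) > MAX_OFFSET:
            raise ValueError("neighbor offsets must be -3..-1 or 1..3")
    if floating is not None and not any(slot_matches(floating, w) for w in words):
        return []
    hits = []
    for i, word in enumerate(words):
        if not slot_matches(focus, word):
            continue
        # Every specified neighbor must exist inside the sentence and match.
        if all(
            0 <= i + off < len(words) and slot_matches(alts, words[i + off])
            for off, alts in neighbors.items()
        ):
            hits.append(i)
    return hits
```

For example, `find_focus("el perro come la carne".split(), ["-rro"], {1: ["com-", "beb-"]}, ["carne"])` returns `[1]`: "perro" ends in -rro, the word after it begins with com-, and "carne" occurs somewhere in the sentence.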

Busca can search texts spread across a large number of files,
and these files may reside in different DOS directories.
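Gathering a corpus from several directories might look like the following Python sketch. It is only an illustration of the idea; the function name, the default `*.txt` file pattern, and the choice to scan only the top level of each directory are assumptions, not documented Busca behavior.

```python
from pathlib import Path
from typing import Iterable, Iterator


def iter_corpus_files(directories: Iterable[str], pattern: str = "*.txt") -> Iterator[Path]:
    """Yield the text files to search, one directory at a time.

    Only the top level of each directory is scanned (no recursion), and files
    are returned in sorted order within each directory.
    """
    for directory in directories:
        for path in sorted(Path(directory).glob(pattern)):
            if path.is_file():
                yield path
```

Replacing `glob` with `rglob` would also pick up files in subdirectories, if a corpus were organized that way.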

Busca was originally designed to be used with a corpus in
Spanish -- the Argentine and Chilean texts of the "Corpus de
Referencia de la Lengua Española Contemporánea" (CRLEC),
accessible at -- but it can be used with
any set of ASCII text files that use conventional sentence
punctuation ("." and "?" and "!"). The program is available
both in English ( and in Spanish (
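Sentence-by-sentence searching of plain text that uses "." and "?" and "!" can be sketched in Python as below. The regular expressions are assumptions for illustration: a word is taken to be a run of letters, and every run of sentence punctuation ends a sentence, so abbreviations such as "Sr." or decimal numbers would also split a sentence here. How Busca itself handles such cases is not described in this announcement.

```python
import re
from typing import List

SENTENCE_END = re.compile(r"[.?!]+")   # conventional sentence punctuation
WORD = re.compile(r"[^\W\d_]+")        # assumed: a word is a run of letters


def split_sentences(text: str) -> List[List[str]]:
    """Split text on '.', '?' and '!' and return each sentence as a word list."""
    sentences = []
    for chunk in SENTENCE_END.split(text):
        words = WORD.findall(chunk)
        if words:                      # skip empty chunks, e.g. after the final '.'
            sentences.append(words)
    return sentences
```

Each resulting word list is the unit within which a word template would be matched.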

Busca is intended for free, non-profit distribution. Users
are requested to acknowledge Busca in publication of any
research that benefits from use of the program.

Here is the address from which to download Busca:

--------------------------------------------------------------------
Lee Hartman
Dept. of Foreign Languages
Southern Illinois University
Carbondale, IL 62901-4521