Software Details
| Title: | Spanish corpus with integrated search functions |
|---|---|
| Submitter: | Craig Schulenberg |
| Description: | As an outgrowth of our efforts to develop a Parser/Tagger for Spanish, we have created a prototype program (Literature Assistant) that integrates a corpus (processed by our Parser) with a 'Reader' interface and some powerful search functions. This program is entirely self-contained and employs an extremely fast database of our own design. We have no intention of developing this program into a commercial product; rather, it is a research tool that is of great assistance to us in identifying the (many) weaknesses in our Parser and in our Dictionary. We would appreciate feedback on the design and features of this software approach, and would be interested in collaborative efforts on Parser/Tagger implementations and corpus search algorithms. The Literature Assistant runs in a DOS window on a PC. The corpus includes 700 works (mostly novels), and menu screens allow selecting an author, a work, and (finally) a chapter or bookmark. The user then sees a 'Reader' screen, which shows the complete text and allows rapid page up/down, top-of-text, and end-of-text positioning. When a word or phrase is highlighted (by moving the cursor), its definition is shown (drawn from our 48,000-word Dictionary). Conjugated verb forms are traced back to their infinitives and definitions (based on our database of 13,066 verbs). If a highlighted word is selected, a second screen immediately appears that shows 'all' sentences in the corpus that use the same word/verb. On this second screen, any of the cross-referenced works can then be 'jumped to' by selecting the corresponding sentence, which positions the user in the Reader screen for the newly selected work. In this way all of the texts may be traversed by following these links between the two screens. The second screen (Sentence Screen) also permits corpus searches. For example, the query 'gustar(se *)' will find all forms of the reflexive 'se' followed by any conjugated form of 'gustar'. All sentences that meet the search criteria are shown, along with their title and author. A special feature (Jot-a-Note) makes it easy to generate a textual commentary on any item observed on any screen. The resulting output file can then be processed in any text editor. It is immediately clear that our Parser/Tagger is only 90-95% accurate at this point, and that our Dictionary is too small to do proper justice to these kinds of texts. Nonetheless, we believe that this is an interesting approach not only to corpus linguistics, but also to making Spanish literature more accessible and interactive. |
| Linguistic Field(s): | Text/Corpus Linguistics |
| LL Issue: | 16.235 |
| Date Posted: | 25-Jan-2005 |

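The cross-referencing described above (conjugated forms resolved to an infinitive, then every sentence using that verb listed with its author and title) can be illustrated with a small sketch. This is not the Literature Assistant's actual database or query engine, which are not documented here. The lemma table, the three sample records, and the function names `tokenize`, `build_index`, and `find_reflexive` are all hypothetical, and the reflexive search assumes that 'se' must come immediately before the verb form.

```python
import re
from collections import defaultdict

# Hypothetical toy lemma table: conjugated form -> infinitive.
# A real system would draw this from a full verb database.
LEMMAS = {
    "gusta": "gustar",
    "gustan": "gustar",
    "gustaban": "gustar",
    "gustó": "gustar",
}

# Invented sample records: (author, title, sentence).
CORPUS = [
    ("Author A", "Work 1", "Le gusta el mar."),
    ("Author B", "Work 2", "No se gustaban nada."),
    ("Author B", "Work 2", "Se fue temprano."),
]


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"\w+", sentence.lower())


def build_index(corpus):
    """Map each lemma (or unlemmatized word) to the records that use it."""
    index = defaultdict(list)
    for record in corpus:
        # A set ensures each record is listed once per lemma.
        for lemma in {LEMMAS.get(tok, tok) for tok in tokenize(record[2])}:
            index[lemma].append(record)
    return index


def find_reflexive(corpus, infinitive):
    """Return records where 'se' directly precedes a form of `infinitive`."""
    hits = []
    for record in corpus:
        tokens = tokenize(record[2])
        for first, second in zip(tokens, tokens[1:]):
            if first == "se" and LEMMAS.get(second, second) == infinitive:
                hits.append(record)
                break
    return hits


index = build_index(CORPUS)
for author, title, sentence in index["gustar"]:
    print(f"{author}, {title}: {sentence}")
```

Here `index["gustar"]` plays the role of the Sentence Screen for a selected verb, and `find_reflexive(CORPUS, "gustar")` approximates a reflexive query under the adjacency assumption stated above.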

