Book Information

Title: Index Structures for the Exploration of Natural Language Corpora
Written By: Johannes Goller
URL: http://www.lincom-shop.eu/
Series Title: Linguistic Resources for Natural Language Processing 06
Description:

This study describes the development of a large-scale corpus query system: a specialized search engine for advanced pattern search, especially the kinds of patterns linguists look for when discovering syntactic phenomena in large corpora. It begins with a review of traditional search engine algorithms and then turns to its main focus, suffix arrays, a data structure that has been available since 1987 but, for various technical reasons, is not commonly used in large-scale search engines. The study takes recently developed algorithms as the starting point for a new attempt to reintroduce the suffix array as a data structure of practical value to corpus-linguistic research. One of its key findings is a technique that combines several suffix arrays using indexed bit vectors, making it possible to search layers of meta-information, such as part-of-speech information and semantic labels, in parallel with the text itself. A set of algorithms operating on this data structure is presented, enabling sophisticated pattern matching, such as gap-matching and gap-filling, as well as improved methods of concordance generation. The final chapters give practical examples of how the new system is used to make linguistically relevant discoveries in real corpora.
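
The description names the data structure but not the algorithms, so the following is only a minimal, generic sketch in Python of how a word-level suffix array supports corpus pattern search, gap-matching and concordance output. It assumes a small tokenized corpus held in memory; all function names are hypothetical, the naive sort-based construction is chosen for brevity, and the part-of-speech filter uses a single Python integer as a bit vector, a much simpler stand-in for (not a reconstruction of) the book's technique of combining several suffix arrays through indexed bit vectors.

```python
"""Illustrative word-level suffix array search (all helper names are hypothetical)."""


def build_suffix_array(tokens):
    # Naive construction: sort every start position by the token sequence
    # that follows it. Roughly O(n^2 log n); for illustration only.
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def find_occurrences(tokens, sa, pattern):
    """Return the sorted start positions where `pattern` (a token list) occurs."""
    m = len(pattern)

    def prefix(k):
        # First m tokens of the k-th smallest suffix.
        return tokens[sa[k]:sa[k] + m]

    # Lower bound: first suffix whose m-token prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose m-token prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    # All matching suffixes form one contiguous block sa[start:lo].
    return sorted(sa[start:lo])


def gap_match(tokens, sa, left, gap, right):
    """Positions where `left`, then exactly `gap` arbitrary tokens, then `right` occur."""
    offset = len(left) + gap
    return [p for p in find_occurrences(tokens, sa, left)
            if tokens[p + offset:p + offset + len(right)] == right]


def tag_bitvector(tags, wanted):
    # Bit i is set iff tags[i] == wanted; a Python int serves as the bit vector.
    bits = 0
    for i, tag in enumerate(tags):
        if tag == wanted:
            bits |= 1 << i
    return bits


def followed_by_tag(tokens, sa, pattern, bits):
    """Occurrences of `pattern` whose next token has its bit set in `bits`."""
    m = len(pattern)
    return [p for p in find_occurrences(tokens, sa, pattern) if (bits >> (p + m)) & 1]


def concordance(tokens, positions, m, width=2):
    # Keyword-in-context lines: up to `width` tokens either side of each match.
    lines = []
    for p in positions:
        left = " ".join(tokens[max(0, p - width):p])
        match = " ".join(tokens[p:p + m])
        right = " ".join(tokens[p + m:p + m + width])
        lines.append(f"{left} [{match}] {right}".strip())
    return lines


if __name__ == "__main__":
    tokens = "the cat sat on the old mat and the cat ran".split()
    tags = "DT NN VBD IN DT JJ NN CC DT NN VBD".split()
    sa = build_suffix_array(tokens)
    hits = find_occurrences(tokens, sa, ["the", "cat"])
    print(hits)                                           # [0, 8]
    print(concordance(tokens, hits, 2))
    print(gap_match(tokens, sa, ["the"], 1, ["mat"]))     # [4]
    print(followed_by_tag(tokens, sa, ["the"], tag_bitvector(tags, "NN")))  # [0, 8]
```

Because all suffixes that share a prefix sit next to each other in the sorted array, each lookup needs only O(log n) binary-search steps of up to m token comparisons each; the naive construction is the part a large-scale system would replace with a faster algorithm.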

Publication Year: 2013
Publisher: Lincom GmbH
Linguistic Field(s): Computational Linguistics; Text/Corpus Linguistics

Versions:
Format: Paperback
ISBN-13: 9783862884087
Pages: 140
Prices: Europe: EUR 64.80