

LINGUIST List 24.492

Mon Jan 28 2013

Books: Index Structures for the Exploration of Natural Language Corpora: Goller

Editor for this issue: Rebekah McClure <rebekah@linguistlist.org>

Date: 24-Jan-2013
From: Ulrich Lueders <lincom.europa@t-online.de>
Subject: Index Structures for the Exploration of Natural Language Corpora: Goller

Title: Index Structures for the Exploration of Natural Language Corpora
Series Title: Linguistic Resources for Natural Language Processing 06
Published: 2013
Publisher: Lincom GmbH
                http://www.lincom-shop.eu

Book URL: http://www.lincom-shop.eu/

Author: Johannes Goller
Paperback: ISBN: 9783862884087 Pages: 140 Price: Europe EURO 64.80
Abstract:

This study describes the development of a large-scale corpus query system – a specialized search engine used to perform advanced types of pattern search, especially for patterns used by linguists interested in discovering syntactic phenomena in large corpora.

The study begins with a review of traditional search engine algorithms. The main focus then shifts to suffix arrays, a data structure that has been available since 1987 but, for various technical reasons, is not commonly used in large-scale search engines.

Recently developed algorithms serve as the starting point for an attempt to reintroduce the suffix array as a data structure of practical value to corpus-linguistic research. One of the key findings is a technique that combines several suffix arrays using indexed bit vectors, which makes it possible to search layers of meta information, such as part-of-speech information and semantic labels, in parallel with the text itself. A set of algorithms operating on that data structure is presented, enabling sophisticated pattern matching, such as gap-matching and gap-filling, as well as improved methods of concordance generation. The final chapters present practical examples of how the new system is used to make linguistically relevant discoveries in real corpora.

Linguistic Field(s): Computational Linguistics
                            Text/Corpus Linguistics

Written In: English (eng)

See this book announcement on our website:
http://linguistlist.org/pubs/books/get-book.cfm?BookID=64039






Page Updated: 28-Jan-2013

Supported in part by the National Science Foundation.
While the LINGUIST List makes every effort to ensure the linguistic relevance of sites listed on its pages, it cannot vouch for their contents.