LINGUIST List 12.2303

Wed Sep 19 2001

FYI: Gsearch Corpus Query, German Treebank Sampler

Editor for this issue: Karen Milligan


  1. Frank Keller, Available for download: Gsearch Corpus Query System
  2. TIGER corpus team, German treebank sampler

Message 1: Available for download: Gsearch Corpus Query System

Date: Fri, 14 Sep 2001 12:15:42 +0200 (MET DST)
From: Frank Keller <kellerCoLi.Uni-SB.DE>
Subject: Available for download: Gsearch Corpus Query System


We are pleased to announce the immediate availability of Gsearch 2.06,
free of charge for research purposes.

The Gsearch corpus query system allows the selection of sentences by
syntactic criteria from text corpora, even when these corpora contain
no prior syntactic markup. This is achieved by means of a fast chart
parser, which takes as input a grammar and a search expression
specified by the user.
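
To illustrate the general idea of chart-based search over text that has only part-of-speech tags, here is a minimal sketch. It is not Gsearch code: the grammar, the rule format, the tag names, and the function names (build_chart, find_spans, search) are all assumptions made for this example, and the parser is a plain CKY-style recognizer rather than whatever Gsearch implements internally.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form, written over POS tags.
# Rule format and tag names are illustrative, not taken from Gsearch.
UNARY = {"DT": {"Det"}, "NN": {"N"}, "VB": {"V"}, "IN": {"P"}}
BINARY = {
    ("Det", "N"): {"NP"},
    ("P", "NP"): {"PP"},
    ("NP", "PP"): {"NP"},
    ("V", "NP"): {"VP"},
}


def build_chart(tags):
    """Fill a CKY chart mapping (start, end) spans to sets of categories."""
    n = len(tags)
    chart = defaultdict(set)
    for i, tag in enumerate(tags):
        chart[i, i + 1] |= UNARY.get(tag, set())
    for width in range(2, n + 1):
        for start in range(0, n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for left in chart[start, mid]:
                    for right in chart[mid, end]:
                        chart[start, end] |= BINARY.get((left, right), set())
    return chart


def find_spans(tags, category):
    """Return the sorted spans that the grammar can analyse as `category`."""
    chart = build_chart(tags)
    return sorted(span for span, cats in chart.items() if category in cats)


def search(corpus, category):
    """Select sentences containing at least one constituent of `category`."""
    return [s for s in corpus if find_spans([tag for _, tag in s], category)]


# A tiny tagged corpus: each sentence is a list of (word, tag) pairs.
corpus = [
    [("the", "DT"), ("dog", "NN"), ("sleeps", "VB")],
    [("run", "VB")],
    [("see", "VB"), ("the", "DT"), ("cat", "NN"),
     ("in", "IN"), ("the", "DT"), ("box", "NN")],
]

pp_hits = search(corpus, "PP")
np_spans = find_spans([tag for _, tag in corpus[2]], "NP")
```

In this sketch the search expression is reduced to a single category name; a real query language would presumably be richer. The corpus itself carries no syntactic markup, only tags, and the constituents are found at query time by the chart.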

Among the major features of Gsearch are:

* runs under Solaris, Linux, and MacOS X;

* simple to install, based on GNU automake/autoconf;

* supports standard corpora (including BNC, Brown, Susanne, WSJ,
 Frankfurter Rundschau, Negra);

* can be easily extended to new corpora;

* supports standard taggers (LT POS, TnT);

* interfaces with external linguistic resources such as WordNet;

* outputs syntax trees in SGML, but also interfaces with external
 visualization tools (Viewtree, Thistle);

* comes with a tool for random sampling of Gsearch output.

For more information about Gsearch, and to download the latest
version, please visit:

Bug reports and suggestions for enhancements should be sent to:


Gsearch Development Team
Martin Corley, University of Edinburgh
Frank Keller, Saarland University

Message 2: German treebank sampler

Date: Fri, 14 Sep 2001 08:51:02 +0200
From: TIGER corpus team
Subject: German treebank sampler

The TIGER German treebank sampler has been released!

A large syntactically annotated corpus of German newspaper text is
under construction in the TIGER project - with project partners in
Saarbruecken, Potsdam, and Stuttgart. In order to get feedback from the
research community, the TIGER project team has released a sampler of
the TIGER corpus:

The TIGER corpus is annotated with 'syntax graphs', a generalization of
syntax trees, in order to account for phenomena involving
discontinuous constituents. For example:
- long-distance dependencies are encoded by crossing edges
- coreference in coordination is represented by 'secondary edges'
More details of the annotation scheme are available online, where you can
also explore the TIGER corpus sampler interactively.
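
As a rough illustration of why crossing edges arise, the following sketch models a graph node whose terminals need not be adjacent. It is not the TIGER data format: the Node class, the helper functions, the example sentence, and the node and edge labels used here are assumptions chosen for this example only.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    label: str
    # Each child is stored as an (edge_label, Node) pair.
    children: list = field(default_factory=list)
    # Token position in the sentence; set only for terminal nodes.
    position: Optional[int] = None


def terminal_yield(node):
    """Return the set of token positions dominated by `node`."""
    if node.position is not None:
        return {node.position}
    positions = set()
    for _, child in node.children:
        positions |= terminal_yield(child)
    return positions


def is_discontinuous(node):
    """True if the node's terminals do not form one contiguous span."""
    positions = sorted(terminal_yield(node))
    return positions[-1] - positions[0] + 1 != len(positions)


# Illustrative sentence (not from the corpus): "Das Buch hat er gelesen".
# The fronted object "Das Buch" and the participle "gelesen" belong to the
# same VP, although "hat er" stands between them.
das = Node("ART", position=0)
buch = Node("NN", position=1)
hat = Node("VAFIN", position=2)
er = Node("PPER", position=3)
gelesen = Node("VVPP", position=4)

np = Node("NP", [("NK", das), ("NK", buch)])
vp = Node("VP", [("OA", np), ("HD", gelesen)])
s = Node("S", [("HD", hat), ("SB", er), ("OC", vp)])
```

Here the VP dominates positions 0, 1 and 4 but not 2 and 3, so drawing it over the sentence forces its edges to cross those of the S node. An ordinary tree with contiguous constituents could not represent that grouping directly.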

The TIGER project team.
Department of Computational Linguistics, Saarland University
Institut fuer Germanistik, University of Potsdam
Department of Natural Language Processing (IMS), University of Stuttgart