Editor for this issue: Karen Milligan <karen@linguistlist.org>
-------------------------
GSEARCH CORPUS QUERY SYSTEM
-------------------------

We are pleased to announce the immediate availability of Gsearch 2.06, free of charge for research purposes.

The Gsearch corpus query system allows the selection of sentences by syntactic criteria from text corpora, even when these corpora contain no prior syntactic markup. This is achieved by means of a fast chart parser, which takes as input a grammar and a search expression specified by the user.

Among the major features of Gsearch are:

* runs under Solaris, Linux, and MacOS X;
* simple to install, based on GNU automake/autoconf;
* supports standard corpora (including BNC, Brown, Susanne, WSJ, Frankfurter Rundschau, Negra);
* can be easily extended to new corpora;
* supports standard taggers (LT POS, TnT);
* interfaces with external linguistic resources such as WordNet;
* outputs syntax trees in SGML, but also interfaces with external visualization tools (Viewtree, Thistle);
* comes with a tool for random sampling of Gsearch output.

For more information about Gsearch, and to download the latest version, please visit:
http://www.hcrc.ed.ac.uk/gsearch/

Bug reports and suggestions for enhancements should be sent to:
gsearch-dev@cogsci.ed.ac.uk

Sincerely,
Gsearch Development Team
Martin Corley, University of Edinburgh
Frank Keller, Saarland University
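To illustrate the general idea of chart-based syntactic search over text that has no prior syntactic markup, here is a minimal Python sketch. It is not Gsearch's implementation, grammar format, or query language: it assumes a hypothetical toy grammar in Chomsky normal form over Penn Treebank-style POS tags, and uses a CKY-style bottom-up chart to return every span that the grammar labels with a requested category (for example NP or S).

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form (not Gsearch's grammar format).
# Binary rules map a pair of child categories to the parent categories they can form.
BINARY_RULES = {
    ("Det", "N"): {"NP"},
    ("Adj", "N"): {"N"},
    ("V", "NP"): {"VP"},
    ("NP", "VP"): {"S"},
}
# Lexical rules map POS tags (assumed Penn Treebank-style) to preterminal categories.
LEXICAL_RULES = {"DT": {"Det"}, "JJ": {"Adj"}, "NN": {"N"}, "VBD": {"V"}}


def build_chart(tags):
    """Fill a CKY chart: chart[(start, end)] holds all categories spanning tags[start:end]."""
    n = len(tags)
    chart = {}
    for i, tag in enumerate(tags):
        chart[(i, i + 1)] = set(LEXICAL_RULES.get(tag, set()))
    for width in range(2, n + 1):
        for start in range(0, n - width + 1):
            end = start + width
            cell = set()
            # Try every split point and combine the two sub-spans with binary rules.
            for mid in range(start + 1, end):
                for left, right in product(chart[(start, mid)], chart[(mid, end)]):
                    cell |= BINARY_RULES.get((left, right), set())
            chart[(start, end)] = cell
    return chart


def find_constituents(tagged, category):
    """Return the word strings of all spans the toy grammar labels with `category`."""
    words = [word for word, _ in tagged]
    chart = build_chart([tag for _, tag in tagged])
    return [" ".join(words[s:e]) for (s, e) in sorted(chart) if category in chart[(s, e)]]


sentence = [("the", "DT"), ("dog", "NN"), ("chased", "VBD"),
            ("the", "DT"), ("old", "JJ"), ("cat", "NN")]
print(find_constituents(sentence, "NP"))  # ['the dog', 'the old cat']
```

The chart stores every constituent found for every span, so a single parse can answer any query about categories and positions; a real query system would add a richer search-expression language on top of this.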
The TIGER German treebank sampler has been released!

A large syntactically annotated corpus of German newspaper text is under construction in the TIGER project, with project partners in Saarbruecken, Potsdam, and Stuttgart. In order to get feedback from the research community, the TIGER project team has released a sampler of the TIGER corpus:
http://www.ims.uni-stuttgart.de/projekte/TIGER/

The TIGER corpus is annotated with 'syntax graphs', a generalization of syntax trees, in order to account for phenomena involving discontinuous constituents. For example:
- long-distance dependencies are encoded by crossing edges;
- coreference in coordination is represented by 'secondary edges'.

More details of the annotation scheme are available online, where you can also explore the TIGER corpus sampler interactively.

---
The TIGER project team.
Department of Computational Linguistics, Saarland University
Institut fuer Germanistik, University of Potsdam
Department of Natural Language Processing (IMS), University of Stuttgart
email: tigercorpus@ims.uni-stuttgart.de
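As a rough illustration of why crossing edges are needed, the following Python sketch represents a constituent as a node whose children are token positions or other nodes, and flags it as discontinuous when its terminal yield is not a contiguous span. The data structure is a hypothetical simplification, not the TIGER annotation or export format, and the example clause and its bracketing are assumptions chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Set, Union


@dataclass
class Node:
    """Hypothetical constituent node: children are token indices (int) or sub-nodes."""
    label: str
    children: List[Union[int, "Node"]] = field(default_factory=list)


def terminal_yield(node: Node) -> Set[int]:
    """Collect the token positions dominated by `node` via primary edges."""
    result: Set[int] = set()
    for child in node.children:
        if isinstance(child, int):
            result.add(child)
        else:
            result |= terminal_yield(child)
    return result


def is_discontinuous(node: Node) -> bool:
    """True if the node's yield has gaps, i.e. it needs crossing edges in a graph."""
    positions = sorted(terminal_yield(node))
    if not positions:
        return False
    return positions[-1] - positions[0] + 1 != len(positions)


# Illustrative clause: "Darueber muss nachgedacht werden" (simplified bracketing).
tokens = ["Darueber", "muss", "nachgedacht", "werden"]
vp = Node("VP", [0, 2])        # "Darueber ... nachgedacht" skips token 1
s = Node("S", [vp, 1, 3])
print(is_discontinuous(vp), is_discontinuous(s))  # True False
```

A plain tree with ordered, non-crossing branches cannot give the VP above the yield {0, 2} while the S also dominates token 1 between them; allowing such nodes is what the syntax-graph generalization provides.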