
Software Details

Title: CLAIR Library version
Submitter: Bryan Gibson
Description: Clairlib, The Clair Library
version 1.03 now available
http://belobog.si.umich.edu/clair/clairlib

INTRODUCTION
The University of Michigan's CLAIR (Computational Linguistics And Information Retrieval) group is happy to present version 1.03 of clairlib, the Clair library. The Clair library is intended to simplify a number of generic tasks in Natural Language Processing (NLP), Information Retrieval (IR), and Network Analysis. Its architecture allows for external software to be plugged in with very little effort. Two distributions of the Clair library are available: Clairlib-core, with essential functionality and minimal dependence on external software, and Clairlib-ext, with extended functionality. Work is underway on Clairlib-bio and Clairlib-polisci, extensions that will be of interest to people working on Bioinformatics and Political Science.

FUNCTIONALITY
Native in Clairlib-core: Tokenization, Summarization, LexRank, Biased LexRank, Document Clustering, Document Indexing, PageRank, Biased PageRank, Web Graph Analysis, Network Generation, Power Law Distribution Analysis, Network Analysis (clustering coefficient, degree distribution plotting, average shortest path, diameter, triangles, shortest path matrices, connected components), Cosine Similarity, Random Walks on Graphs, Statistics (distributions, tests), Tf, Idf, Community Finding*, Phrase-Based Queries*, Fuzzy OR Queries*
Imported functionality into Clairlib-core: Stemming, Sentence Segmentation, Web Page Download, Web Crawling, XML Parsing, XML Tree Building, XML Writing

Clairlib-ext features: Sentence Segmentation using MxTerminator, Sentence Parsing using the Charniak Parser and Chunklink
* New and expanded functionality available in this latest release
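
For readers unfamiliar with the Tf, Idf, and Cosine Similarity items listed above, the following is a minimal Python sketch of the general technique. It is not Clairlib code and does not use Clairlib's Perl API; the function names (tokenize, idf_table, tfidf_vector, cosine), the whitespace tokenizer, and the log(N / df) weighting are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption for this sketch).
    return text.lower().split()


def idf_table(docs):
    # idf(t) = log(N / df(t)), where df(t) is the number of documents containing t.
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf_vector(tokens, idf):
    # Sparse vector mapping each term to (raw term frequency * idf).
    tf = Counter(tokens)
    return {term: count * idf.get(term, 0.0) for term, count in tf.items()}


def cosine(u, v):
    # Cosine of the angle between two sparse vectors stored as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


corpus = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "stock prices fell sharply",
]
docs = [tokenize(d) for d in corpus]
idf = idf_table(docs)
vectors = [tfidf_vector(d, idf) for d in docs]

print(cosine(vectors[0], vectors[1]))  # documents sharing "the" and "cat"
print(cosine(vectors[0], vectors[2]))  # documents with no terms in common
```

The first two documents share terms and get a positive similarity, while the third shares none with the first and scores zero. Clairlib's own weighting and tokenization may differ from this sketch.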

CHANGES
1.03 August 2007
* Added functionality to perform community finding within weighted, undirected networks
* Added util/chunk_document.pl, which breaks documents into smaller files by word count
* Added option to retain punctuation for idf and tf queries
* Added option to print out full lists of idf and tf values for corpus
* LexRank moved from Clair::Network to Clair::Network::Centrality::LexRank
* LexRank usage now follows the same pattern as the other centrality modules
1.02 July 2007
* Distribution reorganized in standard format
* Updated overall documentation
1.01 May 2007
* Added Phrase-based Retrieval and Fuzzy OR Queries
* Extended Clairlib-ext with interfaces from the Cluster and Document classes to the Weka machine learning toolkit
* Added LSI functionality
* Extended parsing of strings/files into Documents
* Added perceptron learning and classification
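
The 1.03 entry for util/chunk_document.pl mentions breaking documents into smaller pieces by number of words. As a rough illustration of that idea only, here is a short Python sketch; it is not the Perl script, the function name chunk_words is hypothetical, and it returns strings rather than writing files.

```python
def chunk_words(text, words_per_chunk):
    # Split text on whitespace and regroup it into chunks of at most
    # words_per_chunk words each; the last chunk may be shorter.
    if words_per_chunk < 1:
        raise ValueError("words_per_chunk must be at least 1")
    words = text.split()
    return [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]


chunks = chunk_words("one two three four five", 2)
print(chunks)
```

Each returned string could then be written to its own file; how the actual script names and stores its output files is not described in this announcement.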

DOWNLOAD
Visit http://belobog.si.umich.edu/clair/clairlib/ or write to radev@umich.edu to get a copy. Write to radev@umich.edu to receive beta versions of Clairlib-bio or Clairlib-polisci.

FUNDING
This work has been supported in part by National Institutes of Health grants R01 LM008106 "Representing and Acquiring Knowledge of Genome Regulation" and U54 DA021519 "National center for integrative bioinformatics", as well as by grants IDM 0329043 "Probabilistic and link-based Methods for Exploiting Very Large Textual Repositories," 0534323 "Collaborative Research: BlogoCenter - Infrastructure for Collecting, Mining and Accessing Blogs," and 0527513 "The Dynamics of Political Representation and Political Rhetoric," from the National Science Foundation.

ABOUT
The Clair Library is developed by the Clair group at the University of Michigan.
Project design: Dragomir R. Radev
Main implementers: Jonathan dePeri, Anthony Fader, Joshua Gerrish, Bryan Gibson, Mark Hodges, Mark Joseph, Dragomir Radev, and Mark Schaller
Additional code by: Timothy Allison, Michael Dagitses, Aaron Elkiss, Gunes Erkan, Scott Gifford, Patrick Jordan, Samuela Pollack, and Adam Winkel
Linguistic Field(s): Computational Linguistics

LL Issue: 18.2650
Date Posted: 12-Sep-2007
