Software Details

Title: CLAIRLIB 1.0 Release
Submitter: Mark Joseph
Description: Clairlib, the Clair Library, version 1.0 is now available

The University of Michigan's CLAIR (Computational Linguistics And
Information Retrieval) group is happy to present version 1.0 of
clairlib, the Clair library.

The Clair library is intended to simplify a number of generic tasks in
Natural Language Processing (NLP), Information Retrieval (IR), and
Network Analysis. Its architecture also allows for external software
to be plugged in with very little effort.

Two distributions of the Clair library are available: Clairlib-core,
with essential functionality and minimal dependence on external
software, and Clairlib-ext, with extended functionality that may be
of interest to a smaller audience. Work is underway on Clairlib-bio
and Clairlib-polisci, extensions that will be of interest to people
working on Bioinformatics and Political Science.


Native in Clairlib-core: Tokenization, Summarization, LexRank,
Biased LexRank, Document Clustering, Document Indexing, PageRank,
Biased PageRank, Web Graph Analysis, Network Generation*, Power
Law Distribution Analysis*, Network Analysis* (clustering
coefficient, degree distribution plotting, average shortest path,
diameter, triangles, shortest path matrices, connected components),
Cosine Similarity*, Random Walks on Graphs*, Statistics*
(distributions, tests), Tf, Idf

Imported functionality into Clairlib-core: Stemming, Sentence
Segmentation, Web Page Download, Web Crawling, XML Parsing*,
XML Tree Building*, XML Writing*

Clairlib-ext features: Sentence Segmentation using MxTerminator,
Sentence Parsing using the Charniak Parser and Chunklink

* New and expanded functionality available for the first time in this
latest release.
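
For readers unfamiliar with the PageRank and Biased PageRank entries in
the list above, the following is a minimal sketch of the general
technique, written in Python. It does not use Clairlib's own interface;
the function name `pagerank`, its parameters, and the toy graph are all
assumptions made for illustration only.

```python
def pagerank(graph, damping=0.85, bias=None, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on a dict mapping node -> list of successors.

    `bias` is an optional dict of non-negative teleport weights. When it is
    omitted, teleportation is uniform, which gives ordinary PageRank.
    (Illustrative sketch only; not Clairlib's API.)
    """
    nodes = sorted(set(graph) | {t for targets in graph.values() for t in targets})
    if bias is None:
        teleport = {n: 1.0 / len(nodes) for n in nodes}
    else:
        total = sum(bias.get(n, 0.0) for n in nodes)
        teleport = {n: bias.get(n, 0.0) / total for n in nodes}

    rank = dict(teleport)
    for _ in range(max_iter):
        # Rank held by nodes with no outgoing links is redistributed
        # according to the teleport distribution.
        dangling = sum(rank[n] for n in nodes if not graph.get(n))
        new_rank = {
            n: (1.0 - damping + damping * dangling) * teleport[n] for n in nodes
        }
        # Every other node splits its rank evenly among its successors.
        for n in nodes:
            targets = graph.get(n, [])
            for t in targets:
                new_rank[t] += damping * rank[n] / len(targets)
        change = sum(abs(new_rank[n] - rank[n]) for n in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical four-page web graph: "c" is linked from every other page.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}

uniform = pagerank(graph)                    # ordinary PageRank
biased = pagerank(graph, bias={"d": 1.0})    # teleport only to "d"
```

In this sketch the biased variant differs from the ordinary one only in
the teleport distribution: random jumps land on the favored nodes rather
than uniformly, so rank concentrates on those nodes and the pages they
reach.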


Visit http://tangra.si.umich.edu/clair/clairlib/ or write to
radev@umich.edu to get a copy. Researchers working on
Bioinformatics or Political Science can write to the same address
to receive beta versions of Clairlib-bio or Clairlib-polisci.


This work has been supported in part by National Institutes of Health
grants R01 LM008106 'Representing and Acquiring Knowledge of Genome
Regulation' and U54 DA021519 'National center for integrative
bioinformatics', as well as by grants IDM 0329043 'Probabilistic and
link-based Methods for Exploiting Very Large Textual Repositories,'
0534323 'Collaborative Research: BlogoCenter - Infrastructure
for Collecting, Mining and Accessing Blogs,' and DHB 0527513 'The
Dynamics of Political Representation and Political Rhetoric,' from
the National Science Foundation.


The Clair Library is developed by the Clair group at the University
of Michigan.

Project design: Dragomir R. Radev

Main implementers: Anthony Fader, Joshua Gerrish, Mark Hodges,
Dragomir Radev, and Mark Schaller

Additional code by: Timothy Allison, Michael Dagitses, Aaron Elkiss,
Gunes Erkan, Scott Gifford, Patrick Jordan, Mark Joseph, Samuela
Pollack, and Adam Winkel

Linguistic Field(s): Computational Linguistics

LL Issue: 18.1167
Date Posted: 17-Apr-2007
