Conference Information



Full Title: EACL 2014 Workshop on Web as Corpus

   
Short Title: WAC-9
Location: Gothenburg, Sweden
Start Date: 26-Apr-2014 - 26-Apr-2014
Contact: Felix Bildhauer
Meeting URL: http://www.sigwac.org.uk/wiki/WAC9
Meeting Description: The 9th Web as Corpus Workshop (WAC-9)
Endorsed by the Special Interest Group of the ACL on Web as Corpus
(http://www.sigwac.org.uk/)

The World Wide Web has become increasingly popular as a source of linguistic data, not only within the NLP communities, but also with theoretical linguists facing problems of data sparseness or data diversity. Accordingly, web corpora continue to gain importance, given their size and diversity in terms of genres/text types. However, the field is still new, and a number of issues in web corpus construction still need much research (fundamental and applied), ranging from questions of corpus design (e.g., corpus composition assessment, sampling strategies and their relation to crawling algorithms, handling of duplicated material) to more technical aspects (e.g., efficient implementation of individual post-processing steps in document cleansing and linguistic annotation, or large-scale parallelization to achieve web-scale corpus construction). Similarly, the systematic evaluation of web corpora, for example in the form of task-based comparisons to traditional corpora, has only recently come into focus.

For almost a decade, the ACL SIGWAC, and especially its Web as Corpus (WAC) workshops, have served as a platform for researchers interested in building and working with web-derived corpora. Past workshops have been co-located with major conferences on computational linguistics and/or corpus linguistics (such as EACL, LREC, WWW, Corpus Linguistics). As part of the workshop, we will have a panel discussion dedicated to the planning of a shared task for WAC10 (2015), including the nomination of organizers of the shared task. The tracks of the shared task will focus on the quality of web corpus creation tools, tools for linguistic annotation (at least lemmatization, possibly also POS tagging, etc.), and the quality of web corpora themselves.

Organizing Committee:

Felix Bildhauer, Freie Universität Berlin
Roland Schäfer, Freie Universität Berlin

Program Committee:

Organizing Committee, plus:

Adrien Barbaresi, École Normale Supérieure de Lyon
Silvia Bernardini, Università di Bologna
Chris Biemann, Technische Universität Darmstadt
Jesse Egbert, Northern Arizona University
Stefan Evert, Friedrich-Alexander Universität Erlangen-Nürnberg
Adriano Ferraresi, Università di Bologna
William Fletcher, United States Naval Academy
Dirk Goldhahn, Universität Leipzig
Adam Kilgarriff, Lexical Computing Ltd.
Anke Lüdeling, Humboldt-Universität zu Berlin
Alexander Mehler, Goethe-Universität Frankfurt am Main
Uwe Quasthoff, Universität Leipzig
Paul Rayson, Lancaster University
Serge Sharoff, University of Leeds
Sabine Schulte im Walde, Universität Stuttgart
Egon Stemle, European Academy of Bolzano
Yannick Versley, Universität Heidelberg
Torsten Zesch, Universität Darmstadt
Stephen Wattam, Lancaster University
Linguistic Subfield: Computational Linguistics; Text/Corpus Linguistics
LL Issue: 25.1203

This is a session of the following meeting:
14th Conference of the European Chapter of the Association for Computational Linguistics
