

Conference Information



Full Title: 8th Web as Corpus Workshop
Short Title: WAC8
Location: Lancaster, United Kingdom
Date: 22-Jul-2013
Contact: Stefan Evert
Meeting URL: http://sigwac.org.uk/wiki/WAC8
Meeting Description: 8th Web as Corpus Workshop (WAC-8)
Endorsed by ACL SIGWAC
Hosted by the Corpus Linguistics 2013 Conference
Monday, 22 July 2013 (Lancaster, UK)

Web corpora and other Web-derived data have become a gold mine for corpus linguistics and natural language processing. The Web is an easy source of unprecedented amounts of linguistic data from a broad range of registers and text types. However, a collection of Web pages is not immediately suitable for exploration in the same way a traditional corpus is.

Since the first Web as Corpus Workshop, organised at the Corpus Linguistics 2005 Conference, a highly successful series of yearly Web as Corpus workshops has provided a venue for interested researchers to meet, share ideas and discuss the problems and possibilities of compiling and using Web corpora. After a stronger focus on application-oriented natural language processing and Web technology in recent years - with workshops taking place at NAACL-HLT 2010, 2011 and WWW 2012 - the 8th Web as Corpus Workshop returns to its roots in the corpus linguistics community.

Accordingly, the leading theme of this workshop is the application of Web data in language research, including linguistic evaluation of Web-derived corpora as well as strategies and tools for high-quality automatic annotation of Web text. The workshop brings together presentations on all aspects of building, using and evaluating Web corpora, with a particular focus on the following topics:

- Applications of Web corpora and other Web-derived data sets for language research
- Automatic linguistic annotation of Web data, such as tokenisation, part-of-speech tagging, lemmatisation and semantic tagging (the accuracy of currently available off-the-shelf tools is still unsatisfactory for many types of Web data)
- Critical exploration of the characteristics of Web data from a linguistic perspective and its applicability to language research
- Presentation of Web corpus collection projects or software tools required for some part of this process (crawling, filtering, de-duplication, language identification, indexing, etc.)
Linguistic Subfield: Computational Linguistics; Text/Corpus Linguistics
LL Issue: 24.2464

