LINGUIST List 22.720
Fri Feb 11 2011
Calls: Semantics, Computational Ling/USA
Editor for this issue: Amy Brunett
Message 1: Distributional Semantics and Compositionality Workshop
From: Eugenie Giesbrecht <giesbrecht@fzi.de>
Subject: Distributional Semantics and Compositionality Workshop
Full Title: Distributional Semantics and Compositionality Workshop
Short Title: DiSCo'2011 ACL/HLT
Date: 24-Jun-2011 - 24-Jun-2011
Location: Portland, Oregon, USA
Contact Person: Eugenie Giesbrecht
Web Site: http://disco2011.fzi.de/
Linguistic Field(s): Computational Linguistics; Semantics
Call Deadline: 01-Apr-2011
ACL/HLT Workshop on Distributional Semantics and Compositionality (DiSCo'2011)
June 24, 2011, Portland, Oregon, USA
Any NLP system that does semantic processing relies on the assumption of semantic compositionality: the meaning of a phrase is determined by the meanings of its parts and their combination. However, this assumption does not hold for lexicalized phrases such as idiomatic expressions, which cause difficulties not only for semantic but also for syntactic processing (see Sag et al. 2001). In particular, while distributional methods in semantics have proved very effective in tackling a wide range of tasks in natural language processing, e.g., document retrieval, clustering and classification, question answering, query expansion, word similarity, synonym extraction, relation extraction, and textual advertisement matching in search engines (see Turney and Pantel 2010 for a detailed overview), they are still strongly limited by being inherently word-based. Dictionaries and other lexical resources do contain multiword entries, but these are expensive to obtain and not available for all languages to a sufficient extent; moreover, the definition of a multiword varies across resources, and non-compositional phrases are merely a subclass of multiwords.
The workshop brings together researchers who are interested in extracting non-compositional phrases from large corpora by applying distributional models that assign a graded compositionality score to a phrase, as well as researchers interested in expressing compositional meaning with such models. The score denotes the extent to which the compositionality assumption holds for a given expression and can be used, for example, to decide whether the phrase should be treated as a single unit in applications. We emphasize that the focus is on automatically acquiring semantic compositionality. Approaches that employ prefabricated lists of non-compositional phrases should consider a different venue.
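To make the idea of a graded compositionality score concrete, here is a minimal sketch of one possible approach: compare the context vector observed for the whole phrase with the sum of the context vectors of its parts, using cosine similarity. This is only an illustration, not a method prescribed by the workshop; the function names and the toy co-occurrence counts are invented for the example.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse count vectors given as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def compositionality_score(phrase_vec, part_vecs):
    """Similarity between the observed phrase vector and the sum of its parts.

    Additive composition is an assumption of this sketch; other
    composition functions could be substituted.
    """
    composed = Counter()
    for vec in part_vecs:
        composed.update(vec)  # adds the counts of each part
    return cosine(phrase_vec, composed)


# Toy co-occurrence counts, made up purely for illustration.
vectors = {
    "red": {"colour": 4, "paint": 3, "bright": 2},
    "car": {"drive": 5, "road": 4, "engine": 3},
    "herring": {"fish": 5, "sea": 3, "eat": 2},
    "red car": {"drive": 3, "paint": 2, "road": 2, "colour": 1},
    "red herring": {"clue": 4, "mislead": 3, "plot": 2},
}

literal = compositionality_score(
    vectors["red car"], [vectors["red"], vectors["car"]])
idiom = compositionality_score(
    vectors["red herring"], [vectors["red"], vectors["herring"]])
```

With these invented counts, the literal phrase receives a higher score than the idiom, whose contexts share nothing with those of its parts.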
This event consists of a main session and a shared task.
Sag, I. A., T. Baldwin, F. Bond, A. Copestake and D. Flickinger. (2001). Multiword Expressions: A Pain in the Neck for NLP. In Proceedings of the 3rd International Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2002), Mexico City, Mexico.
Turney, P. and P. Pantel. (2010). From Frequency to Meaning: Vector Space Models of Semantics. Journal of Artificial Intelligence Research, 37, 141-188.
2nd Call for Papers:
Test data release: March 31, 2011
Regular paper submission deadline: April 1, 2011
Test data submission and system description deadline: April 8, 2011
Notification of acceptance: April 25, 2011
Camera-ready deadline: May 6, 2011
For the main session, we invite submission of papers on the topic of automatically acquiring a model for semantic compositionality. This includes, but is not limited to:
- Models of distributional similarity
- Graph-based models over word spaces
- Vector-space models for distributional semantics
- Applications of semantic compositionality
- Evaluation of semantic compositionality
Authors are invited to submit papers on original, unpublished work in the topic area of this workshop. In addition to long papers presenting completed work, we also invite short papers and demos:
- Long papers should present completed work and should not exceed 8 pages plus 1 page of references.
- Short papers/demos can present work in progress or the description of a system, and should not exceed 4 pages plus 1 page of references.
As reviewing will be blind, please ensure that papers are anonymous. Papers should not include the authors' names and affiliations, or any references to web sites, project names, etc. that reveal the authors' identity.
Shared Task: Call for Participation
The organizers extracted candidate phrases from two large-scale, freely available web corpora, ukWaC and deWaC (cf. http://wacky.sslmit.unibo.it/), containing POS-tagged English and German text, respectively. These data have been manually evaluated for compositionality using Amazon Mechanical Turk. Workers were presented with a sentence containing a bolded target phrase and were asked to score how literal the phrase was on a scale from 0 to 10. For each phrase, 4-5 different, randomly sampled sentences from the WaCky corpora for UK English and German were presented to 4 workers each.
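The call does not say how the individual judgments are combined into a phrase-level score. The following minimal sketch assumes a plain average over all sentences and workers; the function name and the ratings are hypothetical and serve only to illustrate the annotation setup described above.

```python
from statistics import mean


def phrase_score(ratings_by_sentence):
    """Average all worker ratings (0-10) collected for one phrase.

    ratings_by_sentence holds one list of worker ratings per sampled
    sentence. Plain averaging is an assumption of this sketch, not the
    organizers' documented procedure.
    """
    all_ratings = [r for ratings in ratings_by_sentence for r in ratings]
    if not all_ratings:
        raise ValueError("no ratings given")
    if any(r < 0 or r > 10 for r in all_ratings):
        raise ValueError("ratings must lie between 0 and 10")
    return mean(all_ratings)


# Hypothetical judgments: 4 sentences for one phrase, 4 workers each.
example_ratings = [
    [8, 9, 7, 8],
    [9, 9, 8, 10],
    [7, 8, 8, 9],
    [10, 9, 9, 8],
]
score = phrase_score(example_ratings)
```

A high average suggests that workers read the phrase literally, i.e. compositionally, in most of the sampled contexts.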
Phrases consist of two lemmas and come in three grammatical relations:
- ADJ_NN: adjective modifying a noun
- V_SUBJ: noun as a subject of a verb
- V_OBJ: noun as an object of a verb
Phrases were extracted semi-automatically. The relations were assigned by patterns and manually checked for validity. Phrases were selected so as to balance the data set while controlling for frequency. The complete data set was split into 40% training, 10% validation and 50% test.
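For readers who want to reproduce a comparable partition on their own data, a 40/10/50 split can be sketched as follows. This is an assumption-laden illustration, not the organizers' procedure: it shuffles phrases within each grammatical relation using a fixed seed and makes no attempt to control for frequency. The function name and the placeholder phrases are hypothetical.

```python
import random
from collections import defaultdict


def split_by_relation(phrases, seed=0):
    """Split (relation, phrase) pairs 40/10/50 within each relation."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for relation, phrase in phrases:
        groups[relation].append(phrase)

    splits = {"train": [], "validation": [], "test": []}
    for relation in sorted(groups):
        items = groups[relation]
        rng.shuffle(items)
        n_train = round(0.4 * len(items))
        n_val = round(0.1 * len(items))
        splits["train"] += [(relation, p) for p in items[:n_train]]
        splits["validation"] += [
            (relation, p) for p in items[n_train:n_train + n_val]]
        # Everything left over goes to the test portion.
        splits["test"] += [(relation, p) for p in items[n_train + n_val:]]
    return splits


# Placeholder phrases: 10 per relation, named only for illustration.
data = [(rel, f"{rel.lower()}_phrase_{i}")
        for rel in ("ADJ_NN", "V_SUBJ", "V_OBJ")
        for i in range(10)]
parts = split_by_relation(data)
```

Splitting within each relation keeps the three relation types represented in the same proportions in every portion.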
More details on the data set as well as the download link to the training and validation data are available from the workshop's website (http://disco2011.fzi.de/).
Participants in the task are free to choose whatever method and data resources they use in their submission. Prefabricated lists of multiwords are not allowed. Since the data set is derived from the WaCky corpora, participants are strongly encouraged to use these freely available text collections to build their models of compositionality, thus ensuring the highest possible comparability of results. Furthermore, since the WaCky corpora are already POS-tagged and lemmatized, the workload on the participants' side is considerably reduced. Participants may or may not use this information (POS tags and lemmatization). If needed, additional linguistic annotations or processing may also be added to the corpora. To obtain the WaCky corpora, please email us (disco2011workshop@gmail.com) for instructions; this minimizes the load on the WaCky organizers. Of course, you can also contact the WaCky community directly at http://wacky.sslmit.unibo.it/doku.php?id=start.
Participants must also submit a 4-page system description for publication in the workshop volume.
Program Committee:
- Enrique Alfonseca, Google Research, Switzerland
- Tim Baldwin, University of Melbourne, Australia
- Marco Baroni, University of Trento, Italy
- Paul Buitelaar, National University of Ireland, Ireland
- Chris Brockett, Microsoft Research, Redmond, US
- Tim van de Cruys, INRIA, France
- Stefan Evert, University of Osnabrück, Germany
- Antske Fokkens, Saarland University, Germany
- Silvana Hartmann, TU Darmstadt, Germany
- Alfio Massimiliano Gliozzo, IBM, Hawthorne, NY, USA
- Mirella Lapata, University of Edinburgh, UK
- Ted Pedersen, University of Minnesota, Duluth, USA
- Yves Peirsman, Stanford University, USA
- Peter D. Turney, National Research Council Canada, Canada
- Magnus Sahlgren, Gavagai, Sweden
- Serge Sharoff, University of Leeds, UK
- Anders Søgaard, University of Copenhagen, Denmark
- Daniel Sonntag, German Research Center for AI, Germany
- Diana McCarthy, Lexical Computing Ltd., UK
- Dominic Widdows, Google, USA
Workshop Organizers:
- Chris Biemann, San Francisco, USA
- Eugenie Giesbrecht, FZI Research Center for Information Technology at the University of Karlsruhe, Germany
- Emiliano Guevara, Institute for Linguistics and Scandinavian Studies, University of Oslo, Norway
Contact email: disco2011workshop@gmail.com
Page Updated: 11-Feb-2011