FYI: Cross-Lingual Textual Entailment for Content Synchronization at SemEval 2013 - Task 8


Author: Danilo Giampiccolo

Linguistic Field(s): Computational Linguistics

Subject Language(s): English
French
German
Italian
Spanish

FYI Body: Apologies for cross-posting.
Please circulate to any potentially interested parties.

Second Call for Participation: Cross-Lingual Textual Entailment for Content Synchronization (CLTE)
(SemEval-2013 Task 8)

CLTE website: http://www.cs.york.ac.uk/semeval-2013/task8/
CLTE discussion group: http://groups.google.com/group/clte-semeval


Following up on the successful debut in 2012 [Negri et al., 2012], we are pleased to invite participants to the second round of the Cross-Lingual Textual Entailment task (CLTE) at SemEval 2013, co-located with the *SEM and NAACL-2013 conferences.

CLTE addresses textual entailment (TE) recognition under the dimension of cross-linguality, and within the challenging application scenario of content synchronization. The great potential of integrating monolingual TE recognition components into NLP architectures has been reported in several areas, including question answering, information retrieval, information extraction, and document summarization. However, mainly due to the absence of cross-lingual TE (CLTE) recognition components, similar improvements have not been achieved yet in any cross-lingual application. The CLTE task aims at prompting research to fill this gap.

Content synchronization represents an ideal application scenario to test the capabilities of advanced NLP systems. Given two documents about the same topic written in different languages (e.g. Wikipedia articles), the task consists of automatically detecting and resolving differences in the information they provide, in order to produce aligned, mutually enriched versions of the two documents. Towards this objective, a crucial requirement is to identify the information in one page that is equivalent or novel (more informative) with respect to the content of the other. The task can be naturally cast as an entailment-related problem, where bidirectional and unidirectional entailment judgments for two text fragments are respectively mapped into judgments about semantic equivalence and novelty. Alternatively, the task can be seen as a Machine Translation problem, where judgments about semantic equivalence and novelty depend on whether one text fragment can be fully or partially translated into the other.

Task Description:

Given a pair of topically related text fragments (T1 and T2) in different languages, the CLTE task consists of automatically annotating it with one of the following entailment judgments:

- Bidirectional (T1 -> T2 & T1 <- T2): the two fragments entail each other (semantic equivalence);
- Forward (T1 -> T2 & T1 !<- T2): unidirectional entailment from T1 to T2;
- Backward (T1 !-> T2 & T1 <- T2): unidirectional entailment from T2 to T1;
- No Entailment (T1 !-> T2 & T1 !<- T2): there is no entailment between T1 and T2.

In this task, both T1 and T2 are assumed to be TRUE statements; hence in the dataset there are no contradictory pairs.
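The four judgments listed above can be read as the combination of two directional entailment decisions. The following is a minimal sketch in Python under that reading; the function name, the boolean inputs, and the label strings are hypothetical choices made for illustration and are not prescribed by the task.

```python
def clte_label(t1_entails_t2: bool, t2_entails_t1: bool) -> str:
    """Combine two directional entailment decisions into one CLTE judgment.

    The label strings are illustrative assumptions, not an official format.
    """
    if t1_entails_t2 and t2_entails_t1:
        return "bidirectional"   # T1 -> T2 and T1 <- T2
    if t1_entails_t2:
        return "forward"         # T1 -> T2 only
    if t2_entails_t1:
        return "backward"        # T1 <- T2 only
    return "no_entailment"       # neither direction holds
```

A system built from two unidirectional classifiers could call such a function once per pair; a system that predicts the four classes directly would not need it.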

Examples:

- Bidirectional:
  T1: Mozart nació en la ciudad de Salzburgo
  T2: Mozart was born in Salzburg.

- Forward:
  T1: Mozart nació el 27 de enero de 1756 en Salzburgo
  T2: Mozart was born in 1756 in the city of Salzburg.

- Backward:
  T1: Mozart nació en la ciudad de Salzburgo
  T2: Mozart was born on 27th January 1756 in Salzburg.

- No Entailment:
  T1: Mozart nació el 27 de enero de 1756 en Salzburgo
  T2: Mozart was born to Leopold and Anna Maria Pertl Mozart.


Dataset:

The dataset consists of about 1,700 cross-lingual entailment pairs (1,000 for development, i.e. the CLTE 2012 development and test data, and about 700 for test), balanced with respect to the 4 entailment judgments (bidirectional, forward, backward, and no entailment).

Datasets will be available for the following language combinations:
- English/Spanish
- English/German
- English/French
- English/Italian

Evaluation:

System results will be compared to the human-annotated gold standard. The metric used to evaluate system performance will be accuracy, i.e. the proportion of correct judgments out of the total number of judgments returned by the systems.

Accuracy figures will be provided both for the whole test set and for each of the 4 entailment judgment categories taken separately.
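As an illustration of the metric described above, the following Python sketch computes overall accuracy and one accuracy figure per category. It is not the official scorer: the function name and the list-of-labels input format are hypothetical, and the per-category figure is computed here over the pairs whose gold label is that category, which is only one possible reading of "taken separately".

```python
from collections import Counter

def accuracy_report(gold, predicted):
    """Return (overall accuracy, {gold label: accuracy on pairs with that label}).

    gold and predicted are equal-length sequences of label strings
    (an assumed input format, not the task's submission format).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("at least one judgment is required")

    total = Counter()    # number of pairs per gold label
    correct = Counter()  # number of correctly judged pairs per gold label
    for g, p in zip(gold, predicted):
        total[g] += 1
        if g == p:
            correct[g] += 1

    overall = sum(correct.values()) / len(gold)
    per_category = {label: correct[label] / total[label] for label in total}
    return overall, per_category
```

Because the dataset is balanced across the four judgments, overall accuracy under this sketch equals the mean of the four per-category figures when every category has the same number of pairs.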

Schedule:

- November 1, 2012: Full Training Data available for participants
- February 15, 2013: Registration Deadline [for Task Participants]
- March 1, 2013: Test data release
- March 8, 2013: Task submissions deadline
- March 15, 2013: Release of individual results
- April 9, 2013: Paper submission deadline [TBC]
- April 23, 2013: Reviews due [TBC]
- May 4, 2013: Camera-ready due [TBC]
- June 13-14, 2013: SemEval 2013 Workshop (co-located with *SEM and NAACL, Atlanta, USA)

Task Organizers:

- Matteo Negri, FBK-irst, Trento, Italy, negri [at] fbk.eu (CONTACT)
- Yashar Mehdad, The University of British Columbia, ymahdad [at] gmail.com
- Luisa Bentivogli, FBK-irst, Trento, Italy, bentivo [at] fbk.eu
- Danilo Giampiccolo, CELCT, Italy, giampiccolo [at] celct.it
- Alessandro Marchetti, CELCT, Italy, amarchetti [at] celct.it

References:

M. Negri, A. Marchetti, Y. Mehdad, L. Bentivogli and D. Giampiccolo, 2012. SemEval-2012 Task 8: Cross-lingual Textual Entailment for Content Synchronization. In Proceedings of *SEM 2012 (.pdf: http://ixa2.si.ehu.es/starsem/proc/pdf/STARSEM-SEMEVAL053.pdf).

Links:

- CLTE mailing list: http://groups.google.com/group/clte-semeval
- CLTE task website: http://www.cs.york.ac.uk/semeval-2013/task8/
- SemEval 2013 website: http://www.cs.york.ac.uk/semeval-2013/
- SemEval discussion group: http://groups.google.com/group/semeval3
- NAACL 2013 website: http://naacl2013.naacl.org/
