Working Group Instructions

Note: Each group member should read the position papers of the other members of their working group. The position papers are linked below the names in the participant list.

The Working Groups:

Group 1: Tools interoperability and input/output formats

This group is asked to focus on audio and video annotation, in particular on the annotation formats of existing multimodal annotation tools such as ELAN, Eclipse (=TASX), Exmaralda, TableTrans, Transcriber, etc. Assess the challenges to making diverse annotation formalisms, like those exemplified by these tools, interoperable.

Members: Good (Co-Chair), Cochran (Co-Chair), Jacobson, Williams, Udoh, James, Miller, Avelino, Stephens

Advance reading:
Bird, Steven and Liberman, Mark. A Formal Framework for Linguistic Annotation. http://xxx.lanl.gov/pdf/cs/9903003
Rohlfing, K. et al. 2006. Comparison of multimodal annotation tools. http://www.gespraechsforschung-ozs.de/heft2006/tb-rohlfing.pdf
Good, Jeff. 2006. The Ecology of Documentary and Descriptive Work. http://www.emeld.org/workshop/2006/papers/ToolEcology-1.pdf
Report of the E-MELD 2003 Working Group on text annotation. http://www.emeld.org/workshop/2003/textannotation-summary.pdf
Report of the E-MELD 2006 Working Group on annotation tools. http://www.emeld.org/workshop/2006/wg/wg2-report.pdf

Tasks:

Day 1: Horizontal interoperability, e.g., interoperability among the formats of annotation tools. Discuss the question, "What features are required of an annotation format to allow for straightforward interoperability with other formats?" For example, assuming that some tools will allow for types of annotation (e.g., speaker metadata) that other tools will not allow for, what would it take to ensure that such annotations do not get lost if one converts back and forth between the formats used by such tools? Similarly, how can we ensure that annotations made on the same dataset using different tools can be straightforwardly integrated with each other?
Can a general set of desiderata for annotation formats which would promote interoperability be devised?

Day 2: Vertical interoperability. Often the linguist does not need to migrate data from one multimodal annotation tool to another, but rather from a given annotation tool, e.g., ELAN, to other tools, e.g., Microsoft Word, in order to create a presentation form of their data. What steps and tools are required to migrate data annotations from their original formats to presentation formats? Which of these steps are well supported and which are poorly supported? Can data migration of this type be designed to work in ways that facilitate interoperability? For example, is it possible to ensure that data given in presentation formats can always be trivially associated with the original data on which the presentation format was based? Can general desiderata be given for the features of a best-practice migration path to presentation formats?

Group 2: Lexicon schemas and related data models

There are many proposed lexicon schemas and data models but, so far, no standard, de facto or otherwise. Arguably, this is the most salient gap in discipline-specific standards and one that only linguists can fill.

Members: Trippel (Co-Chair), Maxwell (Co-Chair), Corbett, Prince, Manning, Grimes, Moran, Mittelbach

Advance reading:
Lexical Markup Framework of ISO. Draft standard. http://www.tc37sc4.org/new_doc/ISO_TC37_SC4_N330_LMF_rev13_ForCDBallot.pdf
Hosken, Martin. 2007. Lexicon Interchange Format (LIFT): A description. http://lift-standard.googlecode.com/files/lift_10.pdf See the examples, beginning on page 24.
McCormick, Susan. The Structure and Content of the Body of an OLIF v.2.0/2.1 File. http://www.olif.net/documents/NewOLIFstruct&content.pdf
Ide, N., Lenci, A., Calzolari, N. 2003. RDF Instantiation of ISLE/MILE Lexical Entries. Proceedings of ACL 03. http://www.cs.vassar.edu/~ide/papers/ACL03-ws-ISLE.pdf
Evans, Roger and Gazdar, Gerald. 1996.
DATR: A Language for Lexical Knowledge Representation. Computational http://acl.ldc.upenn.edu/J/J96/J96-2002.pdf

Tasks:

Day 1: Discuss the primary similarities and differences among the lexicon schemas. A useful exercise might be to attempt to represent the same lexical entry in each of the formats, or to transform one format into another. If they are interoperable, there ought to be a mapping.

Day 2: Do any of these schemas look like they could become the de facto standard we are looking for? What would it take to make your answer Yes?

Group 3: Ontologies

Attempts to reference terminology to a common set of concepts have been pursued through at least three strategies: unifying termsets (DatCats), constructing a top-down ontology (GOLD), and building bottom-up ontologies from data. Compare the approaches on the basis of their potential for promoting interoperability of language resources.

Members: Brown (Co-Chair), Witt (Co-Chair), Dimitriadis, Appleby, Bernth, Whalen, McCord, Dahl

Advance reading:
Farrar, S. O. & Lewis, W. D. (to appear). The GOLD Community of Practice: An Infrastructure for Linguistic Data. http://faculty.washington.edu/wlewis2/papers/FarLew06.pdf See also current community-based work on GOLD: http://wiki.linguistlist.org/ontowiki/
Wright, Sue Ellen. 2004. A Global Data Category Registry for Interoperable Language Resources.
Dimitriadis, Alexis, Adam Saulwick, and Menzo Windhouwer. 2005. Semantic relations in ontology mediated linguistic data integration. Paper presented at E-MELD 2005 workshop in Cambridge, MA. http://emeld.org/workshop/2005/papers/saulwick-paper.doc

Tasks:

Day 1: Identify lacunae and potential difficulties with each of the three approaches. Below is an excerpt from a paper about GOLD at E-MELD 2006:

Upon reflection, we believe that there are presently three significant barriers to the widespread adoption of GOLD and subsequent realization of the interoperation goals, vis:
Hughes & Simons, 2006. GOLD as a Standard for Linguistic Data Interoperation. http://linguistlist.org/emeld/workshop/2006/papers/SimonsHughes.doc

Do these difficulties apply to other approaches as well as GOLD?

Day 2: Sketch a proposal to address some or all of the problems identified.

Group 4: Unifying Corpus Annotation (may include use of ontologies)

Members: Cavar (Co-Chair), Pustejovsky (Co-Chair), Bird, Hardman, Palmer, Choukri, Shroeter, Loehr, Cieri

Advance reading:
Pustejovsky, James, Adam Meyers, Martha Palmer, Massimo Poesio. 2005. Merging PropBank, NomBank, http://acl.ldc.upenn.edu/W/W05/W05-0302.pdf
Schmidt, Thomas, Christian Chiarcos, Timm Lehmberg, Georg Rehm, Andreas Witt, Erhard Hinrichs. 2006. http://emeld.org/workshop/2006/papers/schmidt.html
Ide, N., Romary, L. 2004. International standard for a linguistic annotation framework. Journal of Natural Language Engineering, 10:3-4, 211-225. http://www.cs.vassar.edu/~ide/papers/JNLE-rev.pdf

Tasks:

Day 1: Compare existing attempts to integrate annotations from different corpora. How different are the outputs?

Day 2: What would be required to standardize corpus annotation or generalize the merger of annotation schemes? If possible, sketch a proposal.

Group 5: Web Services (and standards needed for services)

Members: Simons (chair), Sevigny (Co-Chair), Park, Kibort, Legg, Pyatt, Lowe, Cash Cash, Chang, Kendall

Advance reading:
O'Reilly, Tim. 2005. What Is Web 2.0: Design Patterns and Business Models for the Next Generation of Software. http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html
Lossau, Norbert. 2004. Search Engine Technology and Digital Libraries: Libraries Need to Discover the Academic http://www.dlib.org/dlib/june04/lossau/06lossau.html
Van de Sompel, H. et al. 2004. Rethinking Scholarly Communication: Building the System that Scholars Deserve.
http://www.dlib.org/dlib/september04/vandesompel/09vandesompel.html
Report of DTSL Working Group 5 on Web Services. http://linguistlist.org/tilr/2007/formatted/DTSL_WG5-1.pdf

Tasks:

Day 1: Discuss the deployment of existing services, including services being developed by digital libraries. What services seem the most urgent or important? What are the barriers to development and/or to uptake of existing services?

Day 2: Advance a proposal for new web services, including any standards that need to be developed in order to implement them.

Group 6: Standards and Data Models

Members: Thieberger (Co-Chair), Hinrichs (Co-Chair), Cysouw, Sloetjes, Yi, Veselinova, Langendoen, Beck, Anderson

Advance reading:
Rumble, John Jr., Bonnie Carroll, Gail Hodge, and Laura Bartolo. 2005. Developing and Using Standards for Data http://www.infointl.com/pdf/developing_using_standards.pdf
Ide, Nancy. 2006. Linguistic Annotation Framework. http://www.tc37sc4.org/new_doc/ISO_TC_37_SC4_N311_Linguistic%20Annotation%20Framework.pdf

Also, please scan the proceedings of E-MELD 2002 and 2003 looking for examples of linguistic standards:
http://linguistlist.org/emeld/workshop/2002/
http://linguistlist.org/emeld/workshop/2003/
For example: Bowe, Cathy, Steven Bird, & Baden Hughes. 2003. Toward a General Model of Interlinear Text.

Tasks:

Day 1: Discuss implementation of existing standards, including barriers to uptake. Create a table (like Table 2 in the Rumble article) of Data and Information Standards for Linguistics, noting where there is overlap or competition between standards. If possible, also classify them as formal, informal, implicit, etc.

Day 2: Suggest what the community needs to do in order to develop the standards it needs. In what areas are we missing standards altogether? In each area where we have competing standards or at least numerous candidates, what looks like the way forward? What do we need to do as a community to reach needed consensus?
If possible, come up with specific proposals.

Joint Working Group Session

As might be expected in a workshop on interoperability, the activities of one working group have implications for those of the other working groups. For that reason, we have planned one working group session on Day 2, in which two working groups will meet together and discuss the mutual impact of their conclusions. We list below some tentative pairings, but we realize that others are possible. For example, Ontologies (WG 3) seems relevant to Lexicons (WG 2), to Annotation (WG 4), and to Standards (WG 6). Consequently, these pairings may change, depending on the direction the working groups take and the feedback we receive.

Pairings:
- Working Group 3 (Ontologies) and 4 (Automatic annotation)
- Working Group 2 (Lexicon Schemas) and 6 (Data Standards)
- Working Group 1 (Annotation Tools) and 5 (Web Services)