WORKING GROUPS

Endangered Languages Information

and Infrastructure Workshop

Working Group Instructions

Background:   Working groups are being asked to envision an online database and catalogue which is designed to offer the most accurate information available on the world’s endangered languages (ELs).  The facility will provide information on number and location of speakers, genetic affiliation, and any unusual typological traits which have been identified in the language.  It will also list the existing dictionaries, grammars, texts, and other relevant documentation materials on each endangered language, as well as any documentation projects which have been launched or planned.  The goal is to provide a scholarly site which allows a user to assess each language’s degree of endangerment, the potential loss to science should the language remain undocumented, and the amount of information already available on the language. 

As envisioned by the workshop organizers, the site would be integrated with LINGUIST List’s MultiTree (http://linguistlist.org/multitree/) and LL-MAP (http://ll-map.org) projects, so that the EL information will be accessible via a map interface and a language family interface, as well as the project’s own interface.  As planned, the project would also have an API making the information flexibly retrievable by distant servers; and the bibliography would be automatically submitted to OLAC (Open Language Archives Community, http://www.language-archives.org).  However, all these ideas are open for discussion; and we welcome suggestions, not only about the contents of the online catalogue, but also about other ways to integrate the information into the discipline’s digital infrastructure.

Listed below are tentative working group assignments and an expanded explanation of each working group’s tasks.  Although some of the questions listed for each working group have been discussed before, we believe that it will be valuable to have the consensus of the distinguished scholars gathered at this workshop.  For that reason, we have requested that each working group leader submit a short (3-5 page), web-publishable report in text form, in addition to the PowerPoint created during the workshop.  These reports will be due 3 weeks after the close of the workshop.  As a way of focusing discussion, we have also suggested a concrete product for each working group to create, e.g., a ranked list of suggestions, and in some cases short advance readings (Working Groups 5 and 6).  Working group chairs and members should feel free to add other readings and/or to alter the suggested outcomes as they see fit.  Send suggestions to eliip@linguistlist.org for inclusion in the final Working Group Instructions.

Jump to a Working Group:

Group 1: Collecting information
Group 2: Maintaining and updating information
Group 3: Database content
Group 4: Web interface and database technology
Group 5: Interoperability
Group 6: Collaboration and outreach

Joint Working Group Session
Working Group Final Reports

Please refer to the Workshop Program for the Working Group room numbers and meeting times.


Group 1:  Collecting information

Members:  Rice (Chair), Dwyer, Goddard, Harris, Campbell, Brenzinger, Palmer

Questions:

a) With regard to collecting and using language information, the goals of communities may differ from those of academics.  How might the differing objectives of communities and academics be met?  Is it possible to take both into account?

b) How should data on number of speakers be gathered and presented? How nuanced should the category be? – that is, should we include information about number of semi-speakers, of speakers who learned the language as adults, of second-language speakers who do not have the particular language as their primary language? More broadly, what qualifies as “a speaker” of the language in question?

c) How can accuracy of the information be assessed?  For example, who is the authority if linguists, elders, or activist groups disagree? 

d) What definition of an “endangered language” should be utilized in order to determine and constrain the number of languages that will be treated? —i.e., is there a threshold number of speakers, and, if not, what factors should be taken into account?

e) How might significant typological features in individual endangered languages be identified and represented?  Should this be a key objective?  Why or why not?

f) An ideal ‘Documentation Index’ for endangered languages would provide information on all EL projects, all individuals working on ELs, and all documentation, published and unpublished.  What are the best strategies and/or sources for gathering such information? Which should be considered more essential, which more optional?

Outcomes: (1) Set of guidelines and metrics for (a) – (e).  (2) List of possible sources and/or procedures for (f).


Group 2:  Maintaining and updating information

Members:  Nash (Chair), Anderson, Welcher, Grenoble, Nathan, Grondona, Milin

Questions:

a) How should Internet information on endangered languages be maintained?  That is, what kind of updating process can be and should be instituted?

b) Is a scholarly community input scheme, e.g., a wiki-like tool or comments facility, a viable method of obtaining new and updated information?   

c) Could a society, or a committee (CELP?) be charged with requesting updates from scholars?  If so, which one(s)?

d) Would an automated reminder sent out by LINGUIST List be of help?  To whom should it be sent?

e) What other ideas might be tried?  (Please brainstorm!)

f) Should endangered language community members play a role in this maintenance?  If so, how can their participation be encouraged?

g) Should all updates be funnelled through an oversight authority, in order to ensure the reliability of the data?  If so, who should this be? 

h) What technical and human resources would be required to support the maintenance process and the sustainability envisioned?

Outcomes:  (1) Brief description of an ideal updating/maintenance process, given unlimited inter-project cooperation and unlimited technical and human resources. (2) Longer description of the best process deemed feasible now, the resources needed to maintain it, and the barriers to implementation.

Alternatively, since this group is being asked to ‘think outside the box’,  the group might list all maintenance/update procedures discussed and rank them as to desirability and feasibility.  For those deemed feasible, please add information on the resources required and the difficulties foreseen.


Group 3:  Database content

Members:  Bowern (Chair), Adelaar, Solnit, Woodbury, Chumakina, Joseph

Questions:  

a) What information on ELs and documentation should be contained in the database which will underlie the online catalogue—i.e. what database fields should be established? Possible fields include:
i) Number of speakers
ii) Source of that information
iii) Number of community members [?]
iv) Geographical location(s) where spoken
v) Genetic affiliation
vi) Typological features [free text field or constrained vocabulary?]
vii) Bibliography:  what are the categories of information, e.g., grammar, dictionary, texts, recordings; what information should be retained about each, e.g., hours estimate for recordings.
viii) Linguist/researcher involved in documentation
ix) Link to DELAMAN-member archives (?).  Other links?
b) What information has been collected by other projects?  For example, have other projects included other database fields than those listed above?
i) How would you rate the reliability of the information? 
ii) What uses has this information been put to? 
iii) What populations did it serve? 
iv) Was it adequate for these uses?

[Note:  We recognize that some of these judgments may be anecdotal; the objective is simply to call to mind any lessons that can be learned from other projects.]

Advance ‘homework’:  please send the following to Claire Bowern (clairebowern@gmail.com) by Nov 5th if possible:

1) A list of online archives and catalogues you know about and/or use regularly, apart from DELAMAN-member archives.

2) A brief description of any other catalogues that aren't online that you regularly use/know about.

3) Your informal evaluations of those materials using the questions under b).

Outcomes: 

(1) A brief summary of the purposes which this database is designed to serve, noting especially the types of searches the database should support.  

(2) A list of suggested database fields (field name, short description, and possible constraints or restrictions; these might be enforced by the interface, not the database schema).

Ex:          Name:  LangCode

Description:  3-letter code for a language variety

Constraint:  Users will be asked to choose the nearest ISO 639-3 code, but will also be allowed to enter other codes and their sources. 
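
As one illustration of how such a field description might be written down, the following Python sketch records the LangCode example as a name, a description, and a constraint, with the constraint checked at the interface level rather than in the database schema. The names `FieldSpec`, `check_lang_code`, and the `code`/`source` keys are hypothetical and not part of any ELIIP design. The check only tests that a code has the three-lowercase-letter shape of an ISO 639-3 identifier; it does not confirm that the code is actually assigned.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# ISO 639-3 identifiers are three lowercase ASCII letters. This pattern checks
# the format only, not whether the code is actually assigned to a language.
ISO_639_3_FORMAT = re.compile(r"[a-z]{3}")


@dataclass
class FieldSpec:
    """One suggested database field: name, short description, constraint."""
    name: str
    description: str
    constraint: str
    # Hypothetical interface-level check: returns an error message, or None if OK.
    validate: Optional[Callable[[dict], Optional[str]]] = None


def check_lang_code(entry: dict) -> Optional[str]:
    """Accept an ISO 639-3-shaped code, or any other code that names its source."""
    code = entry.get("code", "")
    if ISO_639_3_FORMAT.fullmatch(code):
        return None
    if code and entry.get("source"):
        return None  # non-ISO code is allowed because a source is given
    return "Enter a three-letter ISO 639-3 code, or another code with its source."


LANG_CODE = FieldSpec(
    name="LangCode",
    description="3-letter code for a language variety",
    constraint="Nearest ISO 639-3 code preferred; other codes allowed with a source.",
    validate=check_lang_code,
)
```

Keeping the check in a function attached to the field description, rather than in the schema, matches the suggestion above that constraints might be enforced by the interface.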


Group 4:  Web interface and database technology 

Members:  Whalen (Chair), Simpson, Maxwell, Bibiko, Beesley, Fox, Aristar

Questions: 

a) What are some purposes for which the data will be used, and how should the interface take these into account? 

b) What dissemination interface(s) should the project build or link with? 

c) Would it be useful to integrate the information with the MultiTree interface (http://multitree.linguistlist.org)? With the LL-MAP interface (http://llmap.org)?  With other projects?  Which ones?

d) As originally conceived, the project included an ELIIP API designed to allow automated querying of the database and “widgets” (like the code for the Google search box), which would allow formatted display of the information in other projects’ web pages.  Does this seem a useful functionality?

e) If so,  what uses for the data might there be on other sites?

f) What are the merits/drawbacks of RDF, XML, and relational databases for this application? 

Outcomes:  (1) A list of use cases for the information, both information retrieved on site and information retrieved through an API.  (2) A list of desiderata for the interface which is based on the use cases.  This might include suggestions for integration with other projects.
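
To make question (d) more concrete, here is a minimal Python sketch of what an automated query and a "widget" might look like. Everything in it is an assumption made for illustration: the function names `query_languages` and `render_widget`, the record fields, and the JSON layout are hypothetical, and the records are placeholders rather than real data (their codes lie in the qaa–qtz range that ISO 639-3 reserves for local use).

```python
import json
from html import escape
from typing import Optional

# Placeholder records, not real data. The codes fall in the ISO 639-3 range
# reserved for local use, so they name no actual language.
RECORDS = [
    {"code": "qaa", "name": "Example A", "family": "Family X", "speakers": 40},
    {"code": "qab", "name": "Example B", "family": "Family X", "speakers": 1200},
    {"code": "qac", "name": "Example C", "family": "Family Y", "speakers": 15},
]


def query_languages(records, family: Optional[str] = None,
                    max_speakers: Optional[int] = None) -> str:
    """Filter records and return a JSON document a remote site could consume."""
    hits = [
        r for r in records
        if (family is None or r["family"] == family)
        and (max_speakers is None or r["speakers"] <= max_speakers)
    ]
    hits.sort(key=lambda r: r["speakers"])  # fewest speakers first
    return json.dumps({"count": len(hits), "results": hits}, indent=2)


def render_widget(json_text: str) -> str:
    """Turn a query response into a small HTML list, as a 'widget' might."""
    data = json.loads(json_text)
    items = "".join(
        f"<li>{escape(r['name'])} ({escape(r['code'])}): {r['speakers']} speakers</li>"
        for r in data["results"]
    )
    return f"<ul>{items}</ul>"
```

A remote project could call something like `render_widget(query_languages(RECORDS, max_speakers=50))` to embed a formatted list in its own page. Use cases drawn up by the working group would determine which filters and output formats are actually worth supporting.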


Group 5:  Interoperability

Members:  Simons (Chair), Hinrichs, Sicard, Wittenburg, Thieberger, Aristar-Dry, Iannucci

Advance Reading: Gary Simons and Steven Bird.  2008. Toward a global infrastructure for the sustainability of language resources. Proceedings of the 22nd Pacific Asia Conference on Language, Information and Computation, 20–22 November 2008, Cebu City, Philippines. Pages 87-100.
Preprint:  http://www.sil.org/~simonsg/preprint/PACLIC22.pdf

Questions: 

a) What steps can be taken to ensure the interoperability of the kind of information discussed by Working Groups 1 and 3 above?   [Note that this is information about the language, not language data per se.]

b) What would be required to output it in forms that allow it to be used by existing infrastructure, e.g., the OAI?

c) Should making ELIIP information available to OLAC be a key goal? If so, what types of ELIIP information should be shared with OLAC, and what procedures should be instituted to ensure this? 

d) Should the project take part in the ‘Linked Data’ initiative (http://linkeddata.org/home)? If so, what types of information should be given URIs: bibliographical information only, language sketches, what?  And how should URIs be structured, i.e., at what granularity and with what initial strings?  What else would be required?

e) Are there other information networks which ELIIP should interoperate with?  For example, should ELIIP be designed in such a way that information on all ELIIP database updates is automatically sent to Ethnologue (see Group 6 below)? 

f) Should we encourage the use of ontologies and taxonomies such as GOLD (http://linguistics-ontology.org/) and ISOcat (http://www.isocat.org/) in referencing information on typological traits (see 3.a.vi above), or should these be confined to markup of actual language data (e.g., interlinear glossed text)?

Outcomes:  A list of suggestions for information sharing and interoperation, together with the pros and cons of each.
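
The URI questions in (d) can be made concrete with a small Python sketch. It assumes one possible answer, namely an initial string followed by a resource type and then one or more identifiers, purely as a basis for discussion. The base `http://example.org/eliip`, the function name `mint_uri`, and the three resource types are hypothetical.

```python
from urllib.parse import quote

# Hypothetical base; the choice of "initial string" is one of the open questions.
BASE = "http://example.org/eliip"

# Hypothetical granularity: one URI per language, bibliography item, or sketch.
RESOURCE_KINDS = {"language", "bibliography", "sketch"}


def mint_uri(kind: str, *parts: str) -> str:
    """Build BASE/kind/part1/part2..., percent-encoding each path part."""
    if kind not in RESOURCE_KINDS:
        raise ValueError(f"unknown resource kind: {kind}")
    if not parts or any(not p for p in parts):
        raise ValueError("each URI path part must be non-empty")
    path = "/".join(quote(p, safe="") for p in parts)
    return f"{BASE}/{kind}/{path}"
```

Under this scheme a language identified by a placeholder code `qaa` would be addressed as `mint_uri("language", "qaa")`, and a bibliography item for it as `mint_uri("bibliography", "qaa", "grammar 1")`. A coarser or finer granularity would change only the set of resource types and the number of path parts.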


Group 6:  Collaboration and outreach

Members:   Good (Chair), Genetti, Ostler, Obata, Moseley, Davis, Milin, di Paolo

Advance Reading (Suggested):  Simons, Gary.  2007.  Interoperation and the quest for the global riches of knowledge. http://linguistlist.org/tilr/papers/TILR%20Plenary.pdf

Questions:

a) Generally speaking, what kinds of groups and organizations (existent or non-existent) will have to participate in the creation of the ELIIP database in order for it to gather the data it needs? How would they interact with each other in an ideal world?

b) What specific existing groups match the general group categories required for ELIIP’s success? Should collaboration with any of these be prioritized over the others?

c) What “ideal” collaborators do not exist?

d) What is the minimum amount of collaboration required for ELIIP to be successful?

e) What can we learn from the experiences of comparable initiatives, e.g., Ethnologue?

f) How might prioritization of collaboration options shape the technological choices ELIIP might make?

g) What kinds of standards and communities is ELIIP likely to be dependent upon that it has no control over? What kinds of standards and communities is it likely to be dependent upon that it may have a voice in shaping?

h) What kinds of user groups might be interested in ELIIP? Should the needs of any of these be prioritized over those of others?

i) Should a key goal of ELIIP be to promote public awareness and research on endangered languages? If so, how will that affect the overall structure of the project?

j) Should ELIIP attempt to collaborate with Wikipedia or other such projects as a means of disseminating the information it collects?

Outcomes:  (1) an overall community model for ELIIP indicating existing opportunities for collaboration and crucial gaps where collaboration seems needed but the relevant organizations do not exist, (2) a prioritized list of possible collaborations, describing to the extent possible the information to be shared and the procedure for information-sharing, (3) a sketch of possible uses of ELIIP including discussion of how the project can support different kinds of user groups and a possible prioritization of groups to support.


Joint Working Group Session:   

Since some of the working group topics overlap, we intend to hold a final working group session in which two groups meet together to discuss whether their conclusions are compatible.  The tentative pairings are: 

Working Group 1 and Working Group 3

Working Group 2 and Working Group 6

Working Group 4 and Working Group 5

Following are a few questions that might be used to focus discussion in the joint sessions:

1) What are the areas of overlap in the two sets of topics assigned to the groups?

a) Do the conclusions reached about these topics coincide or differ?
b) If they differ, do they conflict? If so, can the conflict be traced to differing approaches or assumptions?

2) Do any of the recommendations of one group have implications for the other group? That is, could all the suggestions of one group be followed without impinging on the recommendations of the other?

3) Should either group's recommendations be modified or expanded in light of the joint discussion?

Working Group Final Reports

We have now collected all of the final reports from the working group chairs, and have posted them on the website. To get .pdf and .ppt copies of these reports, please see the Final Reports page on our website.