Maximizing the (re-)usability of language data

Proposer and chair:

Arvi Hurskainen
Institute for Asian and African Studies
Box 59
FIN-00014 University of Helsinki, Finland

The goals of the workshop are the following:

  1. to explore the possibilities of global cooperation in accumulating and sharing data for language studies and teaching, pertaining particularly to non-European and 'minor' languages.
  2. to discuss the format of such data, taking into account varying research and teaching needs, and different tools for manipulating data.
  3. to discuss the methods of observing copyright restrictions in making such data available.

The workshop is linked to the ACO*HUM working group on Computing for Non-European Languages (NEL).

Comprehensive accumulation of language data is necessary for developing advanced tools for language processing and teaching. The development of analysis tools and the accumulation of data go hand in hand, and both require a great deal of manpower. Although language material in computer form is increasingly available, e.g. through the Internet, it often does not meet research and teaching needs, and it is still available only for major languages. Much linguistic data is still accumulated through time-consuming and expensive fieldwork, and unfortunately it is usually 'lost' in the private archives of the researcher. It is important to stimulate discussion on how to share such data while safeguarding the rights of individual researchers.

It is also important to discuss the format of such data, so that they can be easily adapted to different environments and used by different research tools.

Core participants: Arvi Hurskainen, Jacques Souillot.

Read the paper by Arvi Hurskainen presented at the workshop.

