Joseph N. Bell
Middle Eastern Languages and Cultures
University of Bergen
Hans Tanksgt. 19
N-5020 Bergen, Norway
voice: (47) 55582860 / 55584771
fax: (47) 55589410
At the ACO*HUM Policy Symposium in Granada in 1997, the elaboration of guidelines for a European Master's program in Arabic, to parallel the one being prepared in African languages by CAMEEL, was discussed as a possible task of the NEL working group. In this context, the building of a small model corpus of texts in Arabic as well as Latin script was proposed. The corpus would serve both as a pedagogical tool and as a laboratory for fine-tuning search software in accordance with the needs of the humanistic disciplines. It has not yet been possible to set up the model corpus, but as one of those involved in the proposal, I would like to suggest that the electronic version of an already existing orientalist journal may be used in much the same way. The technical problems involved are not significantly different, because the goal is very much the same: the creation of a controlled and searchable body of multilingual, multiscript, and multimedia material with the aim of furthering teaching and research in the humanities.
The idea of creating a model corpus on which to test study and research techniques was a response to a concern shared by education in the humanities at all levels: the improvement of communication and fundamental research skills. Since the proportion of resources available to students and researchers in electronic form is rapidly increasing, educational programs must take this into account. Alongside the considerable resources in varying formats becoming available online or on CD, the need was felt for a corpus or database that would specifically confront the problems raised by languages using non-Roman scripts and by languages whose linguistic structures require adaptation of common stemming and searching techniques.
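One small instance of such an adaptation can be sketched in code. The example below is only an illustration under stated assumptions, not a description of any software used for the proposed corpus: the function names `fold_for_search` and `contains` are hypothetical, and the folding rule (drop all combining marks and the tatweel) is one possible choice among several. It shows how a search for an unvocalized Arabic word, or for a Latin-script transliteration typed without diacritics, might still match fully marked text.

```python
import unicodedata

# Assumption: tatweel (kashida) is treated as purely typographic and ignored.
TATWEEL = "\u0640"


def fold_for_search(text: str) -> str:
    """Reduce text to a simplified form for matching (hypothetical helper).

    Canonical decomposition (NFD) separates base letters from combining
    marks, so Arabic vowel signs and Latin transliteration diacritics can
    be dropped together. Note that this also folds hamza-carrying letters
    such as U+0623 onto their bare base letter, which may be too
    aggressive for some research questions.
    """
    decomposed = unicodedata.normalize("NFD", text)
    kept = [
        ch
        for ch in decomposed
        if ch != TATWEEL and unicodedata.combining(ch) == 0
    ]
    return "".join(kept).casefold()


def contains(query: str, text: str) -> bool:
    """Return True if the folded query occurs in the folded text."""
    return fold_for_search(query) in fold_for_search(text)


# Fully vocalized Arabic word (kaf, kasra, ta, fatha, alif, ba, dammatan)
vocalized = "\u0643\u0650\u062A\u064E\u0627\u0628\u064C"
# The same word as it is usually typed, without vowel signs
unvocalized = "\u0643\u062A\u0627\u0628"

found_arabic = contains(unvocalized, vocalized)
found_latin = contains("kitab", "Kit\u0101b al-Agh\u0101n\u012B")
```

Folding of this kind trades precision for recall. A corpus intended for philological work would probably need to offer both the folded and the exact form to the searcher, and stemming proper, which for Arabic involves roots and patterns rather than suffix stripping, is not attempted here.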
The orientalist journal has traditionally been a collection of both primary and secondary sources, and of both written and graphic material, gathered with the aim of making possible further study and analysis. In digital form, it must make all this material usable and searchable electronically. The problems are precisely those of a corpus of primary materials, with the added problems of rendering the many different European languages, the layout conventions (of mathematical equations, for example), and the critical apparatus found in secondary sources.
This paper attempts to demonstrate how problems common to the proposed model corpus and the orientalist journal are being dealt with in the publication of the digital files of the Journal of Arabic and Islamic Studies (http://www.uib.no/jais/jais.htm). It is further suggested that this journal with its supplements, or a collection of several such journals, could serve the same purpose as a model corpus.