Joseph Norment Bell
University of Bergen
University of Bologna
Koenraad de Smedt
University of Bergen
University of Graz
University of Bergen
The working group has attempted to provide a platform for identifying the theoretical foundation of computer applications in the humanities and its implications for humanities scholarship. The present chapter attempts to clarify the link between humanities disciplines and computing, and aims at recommendations which are compatible Europe-wide. It addresses general problems which all humanities disciplines face when they use computer technology. While some of these problems are specific, arising from the peculiarities of particular disciplines, others arise directly from the encounter between computational and humanistic approaches and methods.
It should be noted, in this connection, that computer science is a recognized theoretical discipline, distinct from computer technology. Computer science deals with methodological issues of a formal nature (formal languages and formal modelling of reality). The adoption of such methodologies in the traditionally informal subjects of humanities disciplines has far wider implications than does the simple adoption of computer technology. A university education which attempts to teach according to scholarly principles centred in the philosophy of science cannot avoid addressing issues of scientific methodology, which clearly go beyond a superficial and purely instrumental use of computer technology.
It is recognized that computer applications in the humanities have a theoretical foundation which deals with the interrelations between the methodology of the different branches of humanities and the principles of computer science. The automation of some, or many, of the procedures of humanities research raises problems of formalization of data and their treatment, which are different from the problems so far discussed in the traditional approach to the humanities. Moreover, the treatment of language and speech, the coding of textual material, or the construction of databases of historical or archaeological data, all pose methodological problems of their own, which cut across the boundaries of the particular disciplines in which computerized procedures are applied.
What is needed is the identification of best practice in the approach of humanities scholars to computing, and a clarification of the relationship between humanities disciplines and computing, with particular reference to the European dimensions of these issues.
The focus of this overview is on the teaching of humanities computing, rather than research activities in this area. The reader may therefore not find mention of some otherwise interesting centres. Whereas research in the field of humanities computing has a long history, beginning with projects in automatic translation as far back as 1947, its inclusion in official courses in humanities curricula is relatively recent.
Of the two levels of studies common in European universities (undergraduate vs. postgraduate; licence vs. maîtrise; etc.) we pay more attention to the former. The presence of humanities computing in the second level is more closely tied to research and raises fewer problems concerning educational methods and organization.
Furthermore, we have chosen in our present overview to concentrate on computing and to avoid the fields of information, communication, media and multimedia since these are generally considered as social sciences rather than as humanities. A few examples of such programmes in fields which are not further dealt with below are:
Finally, in our comments we shall sometimes refer to national peculiarities, which certainly are to be found in the curricula of the different countries. This does not preclude the existence of many exceptions to national trends.
Université Paris X - Nanterre. Université de Liège, Faculté de Philosophie et Lettres, Licence en information et communication. Universität des Saarlandes, Philosophische Fakultät, Fachbereich 5 (Grundlagen und Geschichtswissenschaften), Fachrichtung Informationswissenschaft, where "Journalistische Nachrichtenauswahl" is placed together with "Datenbankanwendungen" (but see also below). Berlin, Technische Universität, where humanities disciplines are included in "Fachbereich 1, Kommunikations- und Geschichtswissenschaften", whereas the Fachgebiet "Medienberater, Medienwissenschaft" has very few computing courses.
Roskilde Universitetscenter, "Datalogi for humanister" is one of the basic courses for humanists. University of Aarhus, Information Science, Information and Media Research are parts of the Faculty of Arts. Ruhr-Universität Bochum, Integrierte Sprach- und Bildverarbeitung, Design und Implementation eines Parsers für Wörterbucheinträge, SGML in der Sprachtechnologie: II - Präsentation, Einführung in objektrelationale Datenbanken in der Computerlinguistik. Universität Hamburg, Tutorien zur Einführung in die Computernutzung: "Benutzung des Rechnernetzes an der Universität, Grundlagen des Betriebssystems (Apple), Wissenschaftliche Textverarbeitung, Datenbanken (Aufbau und Funktion), Recherche im Netz (WWW, Bibliothekskataloge), E-Mail". Gerhard-Mercator-Universität-Gesamthochschule Duisburg, Einführungsveranstaltung Computerlinguistik, Informatik für ComputerlinguistInnen: "Logik und Datenverwaltung, Sprachorientierte KI-Forschung, Einführung in die Programmiersprache LISP, Computerlinguistik und Multimedia". Linguistik der romanischen Sprachen und Computer. Ruprecht-Karls-Universität Heidelberg, Computerlinguistik. Katholieke Universiteit Nijmegen, Taal, spraak & informatica (TSI). INALCO, Paris, Traitement automatique des langues. Universiteit Antwerpen, Computerlinguïstiek; Internet voor taal- en literatuurwetenschappers. Berlin, Humboldt. Bamberg has a sector on Linguistische Datenverarbeitung in the Fakultät Sprach- und Literaturwissenschaften, at the same level as Klassische Philologie, while the Fakultät Geschichts- und Geowissenschaften does not seem to offer computing teaching, even under Historische Hilfswissenschaften. Köln, Philosophische Fakultät, Institut für Sprachliche Informationsverarbeitung, with a great number of courses. Mainz, Institut für Allgemeine Sprach- und Kulturwissenschaft: Logikprogrammierung für sprachbezogene Anwendungen; Maschinelle Übersetzung.
Universität des Saarlandes, Fachbereich Neuere Sprach- und Literaturwissenschaften: Professur für Computerlinguistik; Professur für Maschinelles Übersetzen. Stuttgart, Institut für Maschinelle Sprachverarbeitung, Diplomstudiengang Computerlinguistik. It is interesting to note: "At the end of the training stands the Linguistics Diplom, a distinct rarity within the German university landscape, in which Diplom degrees are normally reserved for the natural and social sciences, while the humanities otherwise know only the Magister and the Staatsexamen." Université Paris 7 Denis-Diderot, Lettres et Sciences Humaines, Licence et Maîtrise de Linguistique: Linguistique informatique; Traitement automatique du langage naturel. Madrid, Univ. Autonoma, Fac. de Filosofia y Letras, Lingüística computacional. Roma, Univ. La Sapienza, Fac. di Lettere, Informatica applicata alle scienze umane. Roma, Univ. Tor Vergata, Fac. di Lettere: Elementi di informatica. Bologna, Fac. di Beni Culturali, Elementi di informatica, Informatica documentale. Venezia, Fac. di Beni Culturali, Elementi di informatica.
Of the different approaches, we favour those where humanities students are confronted systematically and thoroughly with the methodological implications of computing in the humanities. Although this may most successfully be achieved by clearly defined compulsory courses in humanities computing, such an approach may not be appropriate for all educational institutions in all countries. There must be room for different degree structures in different places, which means that some institutions may provide adequate courses without a unified programme, or without a single teaching unit providing them. Several institutions with a long tradition of computer methodology courses integrated into different disciplines, rather than offered as a separate programme, may be reluctant to restructure, even if a clearly defined humanities-wide computing programme would benefit a wider range of humanities students.
Apart from the obvious problem of how to provide students in the humanities with some competence in computing, there are several other issues in urgent need of attention. The first is teacher training. In the absence of many full-scale programmes in humanities computing, it is unclear where the teachers of humanities computing should be trained. A second problem is that in interdisciplinary programmes, it seems to be a common experience that computer scientists do not easily understand what the study of the humanities is really like. The consequences of depending on such teachers may be grave. Thirdly, university planners responsible for the introduction of computing courses often ignore the fact that an academic course should aim to discuss methods at least as much as to provide purely technical skills.
This dependence on fashionable trends, as reflected in popular media, has not always been healthy. The discussions about artificial intelligence in the eighties are a good example. At one time it was almost impossible to propose a project without asserting the necessity of an expert system. This was unhealthy, not so much because very little ever came of that fad, but because it fundamentally discredited the notion that expert systems could ever be of any relevance to the humanities at all. Today even to consider such a notion seems politically almost as unwise as it was unavoidable a decade or two ago.
The following section tries to clarify what we can actually say about the relationship between computer science and the humanities which remains valid while fashions change. We must exercise caution: traditionally, such discussions easily become unfocussed, because three roles of computers in the humanities are frequently intermingled. Computers can be used to gain scientific knowledge, to teach that knowledge and to disseminate it. These three sets of activities are of course related; but the challenges they pose and the problems they have to solve are fundamentally quite different.
In the following, we intentionally restrict ourselves mainly to the first of the three, although at the end of the chapter we will also address the second. We are dealing with methods, that is, the canon (or set of tools) needed to increase the knowledge agreed to be proper to a particular academic field. And we restrict ourselves to those methods that can profit from the use of computational tools and concepts. Since this approach invariably requires the ability to make enquiries according to formally defined specifications, we speak about formal methods. This is a deliberate restriction on the field of discussion; we are not attempting to discuss how to use computers to teach a traditional subject, nor how to produce books more cheaply. We do however discuss the extent to which new media make information available in such quantities that traditional information-handling methods have to change in order to cope.
Finally, we note that information technology itself changes the world in which we are living in many ways. The arts and social sciences reflect and interpret the world in which we live. They must therefore tackle information technology, just as they must tackle other changes in society. Since, however, the study of the humanities, in our understanding, is different from the production of art or the interpretation of social change, neither the artistic nor the sociological implications of a new generation of media are discussed here.
While we restrict our topic in one respect, in another we would like to see it as broad as possible. While databases are not our topic, their use in history does form part of our agenda. Similarly, while statistics is not our topic, the special techniques needed to make use of it in literature and other disciplines are certainly part of our agenda. Summing up, our subject consists of the specific methods for the application of tools and techniques to the humanities fields, in so far as this application improves our capability of acquiring new knowledge.
Computer science is a very wide ranging field. At one extreme, it is almost indistinguishable from mathematics and logic; at another, it is virtually the same as electrical engineering. This, of course, is a consequence of the genealogy of the field. The Turing machine is an interesting scientific construct, independently of whether a physical machine with these properties has been constructed. Conversely, the electronic components of computers can be studied at a physical level, independently of their computing purposes.
Having widely different ancestors itself, computer science in turn became parent to a very mixed crowd of offspring. Pseudo-disciplines such as medical computer science and juridical computer science have sprung up abundantly in recent years. Some of them, like forestry research computer science (Forstliche Biometrie und Informatik), for which a German university recently accepted a Habilitation, will probably continue to raise eyebrows for some time to come. Others, notably computational linguistics, have long been established as independent areas of research and self-contained academic disciplines quite beyond dispute.
The existence of this wide variety of disciplines, related to or spun off from computer science in general, implies two things. First, there must be a core of computer science methods, which can be applied to a variety of subjects. Second, for the application of this methodological core, a thorough understanding of the knowledge domain to which it is applied is necessary. The variety of area specific computer sciences is understandable from the need for specialized expertise in the knowledge domain of each application.
In some countries there is even a preference for a different term for computer science when applied to a knowledge domain. In the Netherlands, there is a tendency to speak not of documentaire informatica but of documentaire informatiekunde. Thus, informatica is reserved for the core of computer science which tries to understand abstract principles, while informatiekunde is always concerned with the application of computational methods in some particular area distinct from computer science proper.
As in many other cases, what does not constitute that self-contained, yet applicable core of computer science is more easily specified than what does. Engineering topics are not part of it, although, of course, the construction of sensors in the bio-sciences may require knowledge which the construction of sensors in thermal physics does not. The self-contained core should also be independent from the disciplines in which it is applied; although, of course, there are some fields to which (say) fuzzy systems and their accompanying theory are intrinsically more relevant than others.
Leaving aside these subtle shades, for the purpose of a short introduction, the core of all applied computer sciences is more than the sum of its intellectual ancestors, which may themselves be inextricably associated with particular knowledge domains. Instead, we will attempt to define the core in terms of the traditional combination of data structures and algorithms, applied to the requirements of a discipline:
If we accept the assumption that the successful application of computational methods, in the sense above, strongly depends on the domain of knowledge to which they are applied, then we also have to accept that applying computational methods without an understanding of that domain will be disastrous. To give an example, a German university in the early eighties introduced a study programme called Informatik für die Geisteswissenschaften, which required more course credits in numerical analysis than a computer science master's degree at many other universities. The same course did not, however, require the students to apply their knowledge to a single topic within the humanities. After spectacular student interest in the first year, the course had to be withdrawn in the second, as no students could be found to take it.
We conclude that it is pointless to teach computer science to humanities scholars or students unless it is directly related to their domain of expertise.
On the other hand, time and again, skills in computing are mistaken by humanities scholars for a qualification in computer science. A case in point is the plethora of word processing courses which arose at American universities during the early days of the personal computer. Few of these survived more than a few years, as students rapidly discovered that it was ultimately more convenient to learn their content at their own pace, from general-purpose manuals and introductions. Such courses, which are still taught at some European universities, rapidly become out of date, as each new generation of students arrives equipped with better practical skills than their teachers.
We conclude that humanities computing courses are likely to remain a transient phenomenon, unless they include an understanding of what computer science is all about.
Unfortunately, this problem is not restricted to individual courses, which might be taken simply as amusing consequences of uninformed enthusiasm. It can have more serious consequences. At another university, a Department for Computing for the Humanities was created in the eighties to provide computer literacy for every student of the arts faculty. Not too far into the nineties, at least one of the departments of that faculty threatened to train its students independently unless the Computing for the Humanities department brought its curriculum up to date, in step with current needs. More recently the department was closed down, since the arts faculty considered that it no longer provided anything of value for its students.
One might wonder whether it is the task of a university to teach basic computer literacy at all. Students never used to get academic credit for typewriting skills before the invention of word processing, nor for looking up a book in a catalogue before the advent of the Internet. Is it elitist to ask why they should get academic credit for acquiring skills in word processing and Internet information retrieval?
There are perhaps two important differences which suggest that they should. The more visible of the two is precisely to do with the rapid evanescence of information technology. Typewriting is a skill that remains stable between finishing secondary school and gaining a doctorate. By contrast, modern information technologies have a habit of changing so rapidly that what was almost arcane knowledge for a freshman can easily have become basic computer literacy by the time that student acquires a PhD. Moreover, the PhD student is expected to use tools which could hardly be imagined when he or she began the course of study. If we take seriously the notion of lifelong learning, we might well claim that computer literacy should concern arts faculties, not for its own sake, but to help students update their own knowledge, and to impress upon them the constant need to do so.
The other, less visible, difference is that one may fully master word processing, spreadsheets, simple databases and HTML authoring (all of which have recently been transformed from advanced knowledge into basic survival skills) and still be helpless when trying to apply them to a humanities discipline. Even today, many people who use word processors routinely will find it challenging to include Cyrillic characters in their texts, let alone Arabic or Vietnamese (we refer also to Chapter 5 in this respect). A person can routinely submit his tax returns with the help of a spreadsheet and still despair of doing meaningful computations with a medieval tax document. A student can have a brilliant homepage but still be unable to encode a literary text in a way likely to remain useful beyond the lifetime of current full-text retrieval packages. We conclude that even computer literacy should be taught in the humanities by concentrating on the specific problems posed by the disciplines. Word processing for literary disciplines should concentrate on the peculiarities of the specific languages or editorial styles; quantitative packages should be taught to historians in a way that enables them to handle fuzzy data, Roman numerals, and so on; markup for text-based disciplines (see Chapter 3) should focus on general principles, not the peculiarities of the current generation of browsers.
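To give a concrete flavour of what such discipline-specific teaching might cover, consider the Roman numerals just mentioned: before a historian can compute with figures from a medieval document, they must first be normalized. A minimal sketch (the function and the examples are our own illustration, not a prescribed curriculum):

```python
def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral, as found in e.g. a medieval tax roll, to an integer."""
    values = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}
    upper = numeral.upper()
    total = 0
    # pair each character with its successor (a space pads the end)
    for ch, nxt in zip(upper, upper[1:] + " "):
        v = values[ch]
        # subtractive notation: a smaller value before a larger one counts negatively
        total += -v if values.get(nxt, 0) > v else v
    return total

print(roman_to_int("MCMXCIV"))  # 1994
print(roman_to_int("xlii"))     # 42
```

A real course would of course go further, into damaged readings, scribal idiosyncrasies and genuinely fuzzy quantities; the point is only that even the simplest step already requires discipline-specific reflection.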
To fulfil both these requirements, Humanities Computer Literacy should be taught to humanities students only by teachers who are themselves fully trained in Humanities Computing. Furthermore, rather than relying on a fixed canon of skills, courses (particularly those at the most introductory level) must be revised year by year to keep them at the shifting edge between what students can be expected to learn by themselves and what they can not.
In a nutshell, nobody should attempt to teach computing skills to a humanities student without experience in computer-supported humanities research, preferably in a subject close to the one from which the student population of the course is recruited. Exceptions always exist; but it is the problems of communication between 'pure' technicians and content-interested humanities students which, time and again, tend to dominate any discussion of common problems at each of the (many) conferences on one aspect or another of humanities computing taking place every year.
Humanities Computing, the second of our three levels, constitutes the sum of all available methods which can enhance the scientific validity of research results or facilitate the pursuit of otherwise impossible research strategies. It starts with methods adapted from other fields of study; for example, the canon of analytical statistics, which has been developed for various fields. To apply this canon to authorship studies, traditional sampling techniques have to be augmented in specific ways. It continues with methods which, although they originated in other fields, have developed independently within specific humanities disciplines. In art history, for example, thesaurus-based systems were originally adapted from other disciplines, but have taken on a life of their own and started a discussion on the proper way to describe the content of images, which has no clear equivalent in any other field. Finally, there are computational methods which the humanities have not borrowed but which more or less originated within some field of the humanities. As an example, we mention the long and rich tradition of methods for identifying individuals in historical documents, despite variations in orthography, variable subsetting of name sets, property-based name shifts and other causes.
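As a purely illustrative sketch of the kind of building block on which such identification methods rest (the threshold, the heuristic and the sample names are our own assumptions, not a description of any actual record-linkage system), one might compare orthographic name variants by relative edit distance:

```python
def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            # deletion, insertion, or substitution (free if characters match)
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def same_person(name1: str, name2: str, threshold: float = 0.25) -> bool:
    """Crude linkage heuristic: relative edit distance below a threshold."""
    n1, n2 = name1.lower(), name2.lower()
    return edit_distance(n1, n2) / max(len(n1), len(n2)) <= threshold

print(same_person("Meyer", "Meier"))    # True
print(same_person("Meyer", "Schmidt"))  # False
```

Real historical record linkage is far richer than this, taking into account co-occurring attributes, name-set subsetting and known orthographic shift patterns; the sketch only shows where the computation begins.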
Humanities computing is most clearly in need of institutional stabilization. The tradition of the field is incredibly long. Many of today's perennial questions about the optimal representation of humanities information in a computer can already be found in such conference volumes as the Wartenstein conference of 1962 (Hymes 1965), which seems to have been one of the first attempts to survey the field. Indeed, among the major challenges for humanities computing is that few of its followers are sufficiently aware of its long and rich tradition. Every now and again, a fresh wave of discussion is ignited by authors or theoreticians who simply assume that they can ignore forty years of tradition and start from scratch.
This lack of perception is particularly unfortunate for the individual researcher, as it usually means that newcomers to the field have to painfully rediscover ancient solutions simply because these have not been adequately transmitted through the generations. It is unfortunate for the humanities as a whole, because it means that advances in methodology proceed much more slowly than they might. In most European countries, humanities computing is almost a label for a specific stage in the life of a scholar. The vast majority of practitioners are either at the stage of composing their PhD thesis or just after it. After working actively in the field for, say, five years, they either become computer specialists (which means that they leave academia for industry), or they fall back upon more traditional areas of their home disciplines. It is scarcely surprising, therefore, that few permanent positions for humanities computing specialists exist.
As long as we stay with our original definition, that humanities computing is the application of computational tools for the benefit of the various humanities disciplines, there is nothing wrong with this situation. Still, it means that many researchers all over Europe are constantly rediscovering some of the basics of humanities computing, while few, if any, possibilities exist to hand their discoveries on. To remedy that situation, we propose that, just as we required Humanities Computer Literacy to be taught by people with a Humanities Computing background, so Humanities Computing should in turn be taught by specialists in Humanities Computer Science.
This field of Humanities Computer Science is populated by persons who make the study and development of the possibilities of computer applications in the humanities their profession. With a solid background in one or more humanities fields, they understand the problems of these disciplines; with a strong background in computer science in general, they are able to contribute to the development of data structures and algorithms as defined initially. This is where the humanities have made some lasting contributions to computer science. For example, parsing algorithms which have been developed by computational linguists now form part of the canon of computer science methodology.
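One such contribution can be made concrete (the toy grammar and sentence below are our own illustration): the CYK algorithm, a recognizer for context-free grammars in Chomsky normal form that grew out of work on natural language parsing and is now a staple of computer science curricula:

```python
def cyk(words, lexical, binary):
    """CYK recognizer: does the grammar derive the word sequence from 'S'?"""
    n = len(words)
    # table[i][j] holds the nonterminals that derive the span words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        table[i][i + 1] = {A for A, ws in lexical.items() if w in ws}
    for span in range(2, n + 1):          # widen spans bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):     # try every split point
                for A, B, C in binary:    # rule A -> B C
                    if B in table[i][k] and C in table[k][j]:
                        table[i][j].add(A)
    return "S" in table[0][n]

# toy grammar in Chomsky normal form
lexical = {"Det": {"the"}, "N": {"cat", "mat"}, "V": {"sat"}, "P": {"on"}}
binary = [("S", "NP", "VP"), ("NP", "Det", "N"),
          ("VP", "V", "PP"), ("PP", "P", "NP")]

print(cyk("the cat sat on the mat".split(), lexical, binary))  # True
```

The dynamic-programming idea embodied here, developed with linguistic material in mind, is exactly the kind of method that has flowed back into the general canon.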
This field of Humanities Computer Science should be supra-national (in our case European) from the very start. The field itself profits from the strongest possible emphasis on internationalization. As with any other new discipline, it would otherwise be in danger of being influenced overly much by the idiosyncrasies and preferences of a few individuals from one national academic system.
Creating a European framework of reference has an added value. Very few institutions exist today offering training on a level which could be clearly identified as Humanities Computer Science in the terms defined above. There are many attempts, however, to offer humanities students introductory computational skills and appropriate background knowledge, bundled in a confusing plethora of degrees, add-on diplomas, sandwich courses, etc. This has two major drawbacks:
While significant variation exists between individual disciplines within the humanities, there is, broadly speaking, one major difference between them as a whole and other fields of study, particularly the hard sciences. That is, the humanities in general have very little influence on the creation of the information they process. The strength of a magnetic field is measured directly in units which can be analysed by computational equipment. The style of a painting is a property, which can be described by a trained observer with some degree of inter-subjective consensus among similarly trained individuals. However, the assumptions that underlie the assignment of that description are infinitely further removed from any meaningful way of processing the resulting keyword than (for example) the concept of continuous field strength is from the way in which floating point numbers are handled.
Systematically, we can speak of three types of information, for which we will use the following terms in this section:
While we described these categories in order of increasing distance from the original material on which the analysis is based, historically humanities computing has developed in almost exactly the opposite direction. While in earlier years the emphasis was on the use of computing to analyse the relationships and dependencies between coded properties of the objects of analysis, we are now moving towards more sophisticated attempts to analyse the raw information the humanities have to deal with. Examples from image and speech processing come to mind.
How we evaluate the significance of that development depends very much on the general methodological approach a researcher follows. One position is that the methodological quality of a scientific argument is centrally influenced by two factors among others: the ability to explain the largest possible amount of evidence, and the intersubjectivity of the chain of argumentation.
While not always explicit, these assumptions have been with us since the earliest days of computing in the humanities. In the field of history, the major argument for the introduction of computer usage has been the ability to use 'mass sources', where the information contained in huge numbers of individually meaningless events could be sensibly integrated into statistical arguments. And much of the opposition to it arose from a discussion of whether statistical argumentation actually increased intersubjectivity, as all the assumptions had to be made explicit, or whether on the contrary it damaged it, as statistical training was now needed to understand the argumentation.
These two methodological assumptions are always a useful starting point for a discussion of the significance of the processing of information within the humanities. Even more so, as the trend away from the analysis of coded information towards raw information has taken major steps forward in recent years. As a case in point, the ability to handle images digitally has many important effects. In line with the arguments given above, leading practitioners in art history are currently moving towards formalizations of concepts like 'style' or 'colour usage' which are based on a direct analysis of the image material.
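To convey, at its most basic, what a formalization of 'colour usage' grounded in the image material itself might look like (the quantization scheme and the synthetic pixels are our own assumptions, not a description of any actual art-historical system), one can reduce an image to a histogram over quantized colour values, which then becomes comparable across images:

```python
from collections import Counter

def colour_signature(pixels, levels=4):
    """Quantize each RGB channel into `levels` bins and count the bins:
    a crude, comparable 'colour usage' descriptor for an image."""
    step = 256 // levels
    return Counter(tuple(c // step for c in px) for px in pixels)

# synthetic 'image': mostly dark blue with a few red pixels
pixels = [(10, 10, 200)] * 90 + [(220, 30, 30)] * 10
sig = colour_signature(pixels)
print(sig.most_common(1))  # [((0, 0, 3), 90)]
```

Genuine art-historical formalization would of course work at a far higher level of abstraction; the sketch merely illustrates the shift from manually coded keywords to descriptors computed directly from raw material.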
From a more general perspective, the arrival of image handling capabilities has changed very general assumptions about the usage of computers in the humanities. A few years ago, it was obvious that computing in the humanities meant first and foremost the application of computers within research. The explosion of visually attractive presentation tools has changed this quite fundamentally. In many cases, nowadays, the usage of computers in the humanities seems to be focused more on the didactically well formed presentation of results than on their generation. Although visualization may provide a very effective didactic approach in teaching humanities methodology (see for example the visual representation tools discussed in Chapter 4), multimedia does not necessarily provide additional value in all cases. It is important to distinguish a gratuitous use of nice images from a functional one.
Until very recently, the use of digital resources (raw information in our terminology) in the world of images was centred on art history and art historical objects, while manuscripts were rather a side track. At the moment it looks as if that might be changing. One of the fastest growing sectors in digital resources for the humanities is currently the digitized collections of books and manuscripts created by libraries and archives. It is somewhat alarming that these resources are mainly created outside of humanities research, produced by institutions that have traditionally focussed more on the accessibility of materials than on their production. It seems particularly alarming that this should be the background against which one of the more fundamental changes created by information technologies is taking place, and that the humanities seem to be less aware of it than they should be.
One constant in all considerations of how to make sources available for all humanities subjects has always been the high cost of visual reproduction, specifically by comparison with the publication of transcriptions or descriptions. All humanities disciplines have therefore focussed on rules for selecting the relatively small numbers of sources which were sufficiently important or canonical to merit reproduction by transcription or description. Although cheaper than photographic reproductions, these were still very expensive. Many humanities disciplines and sub-disciplines are therefore based on a very intensive and detailed discussion of quite small numbers of canonical texts or corpora.
The tacit assumption behind that strategy no longer exists. The systematic reproduction of huge amounts of source material in digital form is clearly possible today, at vastly reduced cost. In principle, this makes accessible source corpora which are several orders of magnitude larger than hitherto. This improved accessibility must eventually benefit students by providing quantitatively and qualitatively better access to the materials they study.
For these reasons, all entities to be documented (historical sources and objects) are represented at several levels, following an iterative process of research: from facsimiles to transcripts and revised texts. All transformations from one level to the next should be automated as far as possible, at least guaranteeing that the potential user understands all editorial decisions and interpretations. In addition, external and internal features of the sources may be documented, as well as their tradition, together with representations of declarative and procedural knowledge. In this way, advanced database applications and knowledge representations are combined to form an integrated information system, which can be accessed and migrated in different ways for at least three clearly defined basic user groups. Casual users access a static Web server which provides a sophisticated interface using Java-based search facilities. An assistant-driven CGI interface will satisfy the requirements of more advanced users in the near future. The expert user can directly access released parts of the virtual archive individually.
At present, ICE methods have been applied in such projects as the Fontes Civitatis Ratisponensis, a bilateral project, carried out in co-operation between a research institute and a city archive to evaluate its scientific and strategic implications. This name stands for the exemplary edition of the medieval records of Ratisbon (Regensburg, Germany) as well as for a concept of synergetic co-operation between archivists, historians, exponents of the historical basic disciplines and information scientists for the preservation and documentation of the written cultural heritage. Local computers and networked systems are used as a carrier for the documentations and facsimile editions which were migrated from an expert system and prepared for the Web. To reach as many interested parties as possible, the system is offered on CD-ROM as well.
Today, courses in ICE methods are almost totally absent from humanities curricula. Many literature departments training students in traditional edition philology are reluctant to adopt the new technologies since they lack competence in this area, which is very unevenly spread across the European academic landscape.
Formal methods involve the definition of data structures and of algorithms which are capable of representing both the materials of textual scholarship and the processes typically carried out on them. Since the publication of a seminal article by Coombs et al. (1987), and more particularly since the publication of the Text Encoding Initiative's influential Guidelines for Electronic Text Encoding and Interchange in 1994, a clear consensus appears to have emerged within the scholarly community in favour of the use of descriptive markup languages as a means of representing textual data, and hence a corresponding focus on the effectiveness of such languages as formal representations for the data structures and algorithms relevant to those disciplines.
The fact that such descriptive languages and their associated theoretical assumptions have also come to dominate the world of commercial data processing, and in particular the Internet, does not of course imply that such languages and such methods are necessarily the best for academic purposes: only that they are those most likely to be encountered, and most likely to be well understood or supported by non-academic or non-humanities oriented data processing professionals.
What grounds are there for assuming that the encoding methods appropriate for electronic commerce and the worldwide distribution of pornography are also methods appropriate to humanities scholarship and textual research? To what extent do the formal methods underlying the Web facilitate the development of a better formal understanding of the business of textual scholarship? This section will argue that there is in fact a surprisingly close overlap, and that, far from being peripheral or in opposition to the humanistic endeavour, text encoding and markup are central to it.
We start with the observation that textual material is prepared and used in digital form for a very wide variety of purposes, and by users from quite different academic disciplines. Over-simplifying, we may identify at least the following broad groups of scholars likely to have an interest in formal methods for textual scholarship:
Text encoding theory, by providing a language for the representation of arbitrarily complex formal systems, can also offer something to those in the third group above. The historian interested in the interplay between a set of documents and his or her conception of the social system they make concrete has often had to choose between a focus on the primary source itself on the one hand, or on an abstraction (such as a relational database) derived from one reading of it, on the other. Descriptive markup languages offer a bridge between these two perceptually different worlds: the annotation or markup relating to the text can coexist with annotation relating to its referents. As to our final group of potential users, the system independence implicit in the uncoupling of process and data which typifies descriptive markup languages is surely the best hope for longevity in that most evanescent of cultural artefacts: the digital document.
It also seems clear that although originally developed for quite other purposes, declarative markup languages have also made interesting and important contributions to the evolution of textual scholarship itself, not just by providing us with hard cases and useful tools, but also by transforming the way we perceive text and textuality. Declarative markup languages are typically used to assert properties of the parts of a given document. The formal separation of assertions about the ontological status of document components from assertions about how they are to be processed is one of the key differences between descriptive markup languages and their predecessors. The separation has a number of pragmatic benefits (reusability of content, multiplicity of application, simplification of processing etc) but also marks a significant assertion about what text really is: in quite a traditional (non-post-modernist) way, textuality is now grounded in something exterior to the physical text. Just as the librarian distinguishes work and copy, so the textual scholar distinguishes text and reading. With the availability of a markup language, the textual scholar now has a tool to make the reading explicit, i.e. processable, within the text.
Descriptive markup languages make feasible the definition of textual grammars, that is the definition of meta-statements specifying how element types can meaningfully co-occur in documents, and in particular what dependency or other relations exist between them. A DTD (document type definition) thus defines not just that there is such a thing as a title, but also that titles should appear at the start of sections rather than in the middle of them, and that a title contains the same kind of other objects as a paragraph. In addition to hierarchic relationships of this kind, an SGML (standard generalized markup language) application can permit the definition of non-hierarchic relationships such as that between a heading and its entry in a table of contents.
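A textual grammar of this kind can also be checked mechanically outside a DTD validator. The following sketch, in Python, enforces one such rule (that a title opens each section); the element names (text, section, title, p) are hypothetical illustrations, not taken from any particular DTD:

```python
import xml.etree.ElementTree as ET

# A small illustrative document; the second section violates the
# grammar rule that a title must open a section.
doc = """<text>
  <section><title>On Markup</title><p>First paragraph.</p></section>
  <section><p>A section lacking its title.</p></section>
</text>"""

def sections_missing_initial_title(xml_string):
    """Return indices of <section> elements whose first child
    is not a <title>, i.e. violations of the grammar rule."""
    root = ET.fromstring(xml_string)
    bad = []
    for i, section in enumerate(root.findall("section")):
        children = list(section)
        if not children or children[0].tag != "title":
            bad.append(i)
    return bad

print(sections_missing_initial_title(doc))  # → [1]
```

A DTD expresses the same constraint declaratively in its content model; the procedural check above merely makes explicit what a validating parser does for us.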
We noted above that declarative markup languages have been very valuable in the computational analysis of linguistic materials. Their versatility enables scholars to represent in a uniform way such very different aspects of a text as its formal organization (as paragraphs, headings etc), the paratextual aspects associated with that or other organizations of it, analytic information concerning its interpretation, its linguistic or rhetorical structure and so forth, a point to which we return below. They also provide a formalism at least as powerful as any other for the representation of the complex abstractions typifying much work in computational linguistics and artificial intelligence.
It seems self evident that a text has at least three major axes along which we may attempt to analyse it, and thus implies the application of at least three interlocking semiotic systems. A text is simultaneously an image (which may be transferred from one physical instance to another, by various imaging techniques), a linguistic construct (which may equally be encoded using different modalities, as when a written text is performed), and an information structure (it has the important quality of "aboutness" and bears semantic content relating to a perception of the world at large). It may be noteworthy that these three dimensions seem also to be reflected in three different kinds of software: word processing software focussing on the appearance of text, language analysis software focusing on its linguistic components, and database systems focusing on its `meaning'.
Texts and their meanings are not however to be constrained by the capabilities of software. They remain defiantly both linguistic and physical objects; their formal organization may seem to be linear but is generally not, being characterized by multiple hierarchic structures and interlinked components. Moreover, as cultural objects, they are at once products of and defined by specific contexts. The scope and variety of the encoding systems we need to envisage in developing a unified account of textual hermeneutics may seem very large. The claim of this paper is however that a unified approach remains feasible.
The term markup covers a range of interpretive acts. We may use it to describe the process by which individual components of a writing or other scheme are represented, and for the simple reduction to linear form which digital recording requires. We can also use it for the more obvious acts of representing structure and appearance, whether original or intended. And markup is also able to represent characterizations such as analysis, interpretation, or the contexts in which a text was, or is to be, articulated.
By making explicit a theory about some aspect of a document, markup maps a (human) interpretation of the text into a set of codes on which computer processing can be performed. It thus enables us to record human interpretations in a mechanically shareable way. The availability of large language corpora enables us to improve on impressionistic intuition about the behaviour of language users with reference to something larger than individual experience. In rather the same way, the availability of encoded textual interpretations can make explicit, and thus shareable, a critical consensus about the status of any of the textual features discussed in the previous section for a given text or set of texts. It provides an interlingua for the sharing of interpretations, an accessible hermetic code.
If we see digitized and encoded texts as nothing less than the vehicle by which the scholarly tradition is to be maintained, questions of digital preservation take on a more than esoteric technical interest. And even here, in the world of archival stores and long term digital archiving, a consideration of hermeneutic theory is necessary. The continuity of comprehension on which scholarship depends implies, necessitates indeed, a continuity in the availability of digitally stored information. Digital media, however, are notoriously short lived, as anyone who has ever tried to rescue last year's floppy disk knows. To ensure that data stored on such media remains usable, it must be periodically `refreshed', that is, transferred from one medium to another. If this copying is done bit for bit, that is, with no intervening interpretation, the new copy will be indistinguishable from the original, and thus as usable as the original.
In that last phrase, however, there lurks a catch. Digital media suffer not only from physical decay, but also from technical obsolescence. The bits on a disk may have been preserved perfectly, but if a computer environment (software and hardware) no longer exists capable of processing them, they are so much noise. Computer environments have changed out of all recognition during the last few years, and show no sign of stabilizing at any point in the future. To ensure that digital data remains comprehensible for future generations, simple refreshment of its media is not enough. Where digital encoding techniques may perhaps have an advantage over other forms of encoding information is in their clear separation of markup and content. The markup of a printed or written text may be expressed using a whole range of conventions and expectations, often not even physically explicit, and therefore not preservable in it. By contrast, the markup of an electronic text may be carried out using a single semiotic system in which any aspect of its interpretation can be made explicit, and therefore preservable. If moreover this markup uses as metalanguage some scheme which is independent of any particular machine environment, for example by adhering to international standards such as SGML, XML, or ASN.1, the migration problem is reduced to preservation only of the metalanguage in which the markup is expressed, rather than of all its possible applications.
We conclude that text encoding provides us with a single semiotic system for expressing the huge variety of scholarly knowledge now at our disposal, through which, by means of which, and in spite of which, our cultural tradition persists. Text markup is currently the best tool at our disposal for ensuring that the hermeneutic circle continues to turn, that our cultural tradition endures. Clearly, this persistence requires that students of textual sciences are exposed not only to current generation tools for text markup, but also, and more importantly, to the fundamental principles behind text encoding and markup methods.
The familiar audio CD format with 16-bit encoding and a sampling rate of 44.1 kHz has become an important standard for digital sound encoding, but it has become widespread only on CD and CD-ROM. Because this format requires a substantial amount of storage space, it has hardly been used for on-line audio. Other formats such as AIFF, WAVE, and Sun Audio are frequently found on the Internet, usually for short sound clips. Newer formats promise to offer better quality by using more bits or better use of bandwidth by means of compression. Fraunhofer MPEG audio layer 3 (MP3) offers signal compression of around 10:1 and is becoming increasingly popular for transmission of medium-sized sound files on the Internet. It is expected to be followed by MP4 which will offer encryption and digital watermarks.
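The storage cost of this format follows directly from its parameters. A back-of-the-envelope calculation (assuming stereo, as on an audio CD) shows why uncompressed CD-quality audio is impractical for on-line use:

```python
# Storage cost of uncompressed CD-quality audio, and the effect
# of MP3-style compression at roughly 10:1.
SAMPLE_RATE = 44_100   # samples per second (44.1 kHz)
SAMPLE_BYTES = 2       # 16-bit encoding
CHANNELS = 2           # stereo

bytes_per_second = SAMPLE_RATE * SAMPLE_BYTES * CHANNELS
bytes_per_minute = bytes_per_second * 60

print(bytes_per_second)              # 176400
print(bytes_per_minute)              # 10584000, roughly 10 MB per minute
print(bytes_per_minute // 10)        # 1058400, about 1 MB per minute at 10:1
```

At roughly 10 MB per minute, a single song exceeds what most late-1990s Internet connections could transfer in reasonable time, which explains both the restriction of such formats to CD media and the appeal of MP3 compression.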
Besides the digital sampling of physical sound, there are different levels of encoding which are suitable for representing speech or musical notes. For music, MIDI (Musical Instrument Digital Interface) has become a widespread encoding situated at the level of key strokes, allowing the separate manipulation of pitch, rhythm, etc. For speech, a number of suitable encodings can be distinguished at different levels, including Linear Predictive Coding as an example of parameter-based description, and phonetic transcription at the symbolic level.
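The separation of symbolic pitch from physical sound that MIDI affords can be illustrated by the standard equal-temperament conversion from a MIDI key number to its sounding frequency (a sketch; MIDI itself transmits only the key number, leaving the realization to the synthesizer):

```python
# MIDI encodes a note as an integer key number (0-127), not as a
# waveform. Under equal temperament, with A4 (key 69) tuned to 440 Hz,
# the sounding frequency of any key number follows from:
def midi_to_frequency(note: int) -> float:
    return 440.0 * 2 ** ((note - 69) / 12)

print(midi_to_frequency(69))            # 440.0 (A4)
print(round(midi_to_frequency(60), 2))  # 261.63 (middle C)
```

Because pitch is symbolic, operations such as transposition reduce to integer arithmetic on key numbers, which is precisely the kind of separate manipulation mentioned above.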
The digitization of sound signals is in itself not especially interesting to humanities scholars, although practical issues, such as availability, quality and copyright, may be important considerations for the use of digital sounds in the classroom as well as in research. More interesting, from a methodological perspective, are the computational methods to encode sound at different levels and ultimately interpret sound as music or spoken language. Most curricula in music still have some way to go towards the incorporation of computational methods, but the phonetics field is rapidly adopting advanced digital speech processing methods. For more details on these developments, we refer to the volumes by Bloothooft et al. (1997-1999) on the work in the SOCRATES/ERASMUS thematic network project in Speech Communication Sciences.
Clearly, music and phonetics scholars can hardly be satisfied with textbooks. The use of computational techniques allows interactive and multimedia presentations, which are more useful than single-mode presentations. Consider, as a typical phonetics example, the McGurk effect, which refers to the observation that when a person hears "ba", while watching a face that says "ga", the combined signal is interpreted as "da". Obviously, the student can only fully appreciate this surprising effect when it is heard and seen, rather than when it is read from a book. Using an interactive video with a demonstration of the effect, students can experiment at will (including modes in which students either close their eyes or turn off the sound while observing the demo).
Although Java is not currently ideal for real-time sound processing due to shortcomings in its mathematical operations, other new sound handling systems are available which support powerful signal processing and modern interfaces. As an example we mention the MATLAB package. Coupled with computer sound cards, such systems allow an on-line demonstration of sound analysis and synthesis. In the speech and hearing community, efforts are undertaken to exploit these conditions for developing modern educational tools aimed at students in music, phonetics and related disciplines. Examples of useful interactive demonstrations include the segregation of interleaved melodies, the effects of music quantization, the perception and identification of concurrent vowels, etc.
The following sound demonstrations have been developed as pilots for educational materials in a series of projects sponsored by ELSNET:
However, computational authorship attribution and stylistic studies constitute fields which are so interdisciplinary that they may present difficulties in rigid educational structures. Scholars of literature are often reluctant to take on board statistical and linguistic methods despite their claimed advantages. At higher education institutions across Europe, the use of advanced computational methods in the literature curriculum is nearly absent. This suggests that efforts should be undertaken to break down discipline boundaries and stimulate methodological innovation in the text and literature fields.
In a digital edition, for instance, visual and textual information can easily be combined. This mere possibility changes the essential function of a diplomatic transcription. Diplomatic transcriptions were devised to substitute for the original documents, which can now be shown by reliable images. But the availability of an image does not supersede the usefulness of a transcription. It just changes its purposes and assigns it a different function. A transcript is not a replica of the original, but a means of extracting and processing information from it. Diacritics and markup are no longer seen as an aid to visualizing an absent document, but as a suitable means of modelling both its physical and textual properties for further processing. A transcription aims at representing a possible data model for graphical and textual information. The image itself is digital data and can be exploited as logical information available for formal processing, in order to test the authenticity of the document, or to improve its readability.
It is also apparent, on the same grounds, that a non-linear form of representation of textual information can be more suitable than the canonical printed one for editorial and exegetical purposes. A non-linear data model can bring together multiple layers of textual information, or different witnesses of a complex textual tradition, in a consistent and unique form of textual representation, whence they can all be individually separated and displayed, as well as mutually collated and compared. Likewise, a non-linear model of textual information can afford a connected and comprehensive description of distinct interpretational reconstructions, by assigning different explicit and processable forms to implicit and possibly conflicting structural properties of the same textual features.
The advantages of a processable form of text representation, such as can be provided by a scholarly digital edition, can hardly be overemphasized for humanities students. The deep methodological impact of such editions on the long-established practices of the traditional disciplines in the humanities should form an integral part of relevant humanities courses.
As we have in the past trained university students in the use of traditional paper sources for the writing of reports and articles, it is now necessary to give them experience in the use of electronic sources as well. This is the case not only for students who are preparing for advanced research but also for those who are planning careers that will involve the production of general reports, advertising copy, or the like. Standards for citation and preservation of cited material (given the ephemerality of much electronic material today) must also be developed, and students will have to be exposed to them.
Knowledge is stored electronically primarily in the form of text, images, or sound (the latter two in both static and dynamic forms), or as combinations of these media. Each medium presents particular problems for storage and retrieval which must be solved for the medium itself and for the integration of material in that medium into a useable corpus or database.
Of course a great number of eclectic, specialist, and encyclopedic corpora already exist in electronic format, especially in English, and there is a considerable variety of search engines for accessing desired information in them. The Internet can be seen as a collection of such corpora, and many beginning university students can today be expected to be at least as familiar with its use, particularly its multimedia aspects, as their teachers. The job of instructors at universities or other advanced institutions of learning will therefore for the most part consist of introducing students to the use of specialized, discipline specific corpora and databases (whether text only or multimedia).
In this connection we can distinguish three fundamental levels of corpora, according to the extent of human (or human assisted) encoding or tagging of the material (and thus the extent of risk of error):
For fields for which there does not exist a sufficient number of corpora or databases to cover most of the problems students will need to confront, model corpora or model databases must be created. These do not have to be large or even cover any problem in its entirety, but should exemplify typical problems students need to learn how to solve. Where they must be comprehensive is in the range of discipline specific technical problems they involve, because it is through training with these prototypical collections of material that students will learn to exploit or help to create the electronic sources of the near future in their fields.
One of the most obvious problems in the European context is the need to deal with multilingual, and indeed, multiscript corpora and databases. This presents many as yet unsolved or only partially solved problems in the areas of character encoding and display, conceptual searching, and textual input techniques. We will discuss some problems in these areas in somewhat more detail:
Here, however, it is necessary to distinguish between entering (or encoding) a character or symbol and displaying it on a computer screen or a paper print-out. In the not too distant future, adherence to a 16-bit encoding standard like Unicode (with over 65000 places) will solve both these problems by the use of a single encoding value and a single character that corresponds to it. At the present time we are regularly obliged to switch fonts to produce characters and symbols that our normal font does not have. This has the disadvantage of giving the new character or symbol the same ASCII value, and for most search engines the same searching value, as some other character we are using. Moreover, switching fonts burdens the underlying computer file with a great deal of additional information and increases its size considerably. To limit the number of fonts we have had recourse to multi-value codes and, often but not always, to corresponding composite characters. An a with a grave accent (à) may be a single image in a given font, or it may be a combination of two. In principle there is nothing wrong with a multi-value encoding system using composite characters as long as one is consistent. But consistency is precisely what has been lacking, both from one language tradition to another and from one computer platform to another.
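The ambiguity between a single image and a combination of two, and its resolution under Unicode, can be illustrated with Unicode's normalization forms (a minimal sketch in Python; the example uses the à discussed above):

```python
import unicodedata

# 'à' can be encoded as one precomposed character or as 'a' followed
# by a combining grave accent; the two render identically but compare
# as unequal strings.
precomposed = "\u00E0"   # à as a single code point
combined = "a\u0300"     # a + COMBINING GRAVE ACCENT

print(precomposed == combined)   # False

# Normalization enforces the consistency the text calls for:
# NFC composes to single code points, NFD decomposes to base + mark.
print(unicodedata.normalize("NFC", combined) == precomposed)   # True
print(unicodedata.normalize("NFD", precomposed) == combined)   # True
```

Consistent normalization of all stored text to one form (NFC or NFD) is exactly the kind of discipline whose absence across language traditions and platforms is lamented above.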
Many mistakes have been made. For HTML files, which are presently the standard on the Internet, the cross-platform special character features of Internet browsers like Netscape and Internet Explorer would seem to be a step in the right direction for languages using the Latin alphabet with diacritics. But in fact they complicate the rendering across platforms of symbols the programs have not included. Adobe Acrobat (PDF) files, also common on the Internet, can reproduce a wide range of non-Latin scripts, Latin with diacritics, and special symbols, but the Find function in the Acrobat Reader can seldom retrieve them. In essence this means that much work done converting scholarly texts and journals into PDF format lacks functionality and has to be done over again.
We do not advocate here, in the few years that remain before almost everyone will be using XML with Unicode or a similar encoding standard based on single, discrete values for many thousands of characters, that any further attempt should be made to standardize present encoding schemes. We do argue, however, that automated conversion to Unicode or a similar standard should be an important consideration in the adoption of encoding systems for the model corpora and databases on which we train students, as should ready conversion to a display font, since this is the easiest form to work with visually. The number of programs for processing Unicode data is increasing, so we are already in fact talking of a three part system, proceeding from 16-bit Unicode to 7-bit or 8-bit character encoding for use on most of today's personal computers and, over the Internet, to a high quality composite printing and display font. Automated conversion should be possible in both directions. An example of such a font is JAIS1, a freely available version of Times used in the online journal JAIS. The single font can represent almost all European languages as well as many specialist diacritics. It functions on both the PC and Macintosh platforms, and also over the Internet.
A successful information retrieval system for a given field will be the result of collaboration between computer specialists, linguists, and domain experts. The domain expert theoretically has little need to be familiar with computer language processing, but at this relatively early stage scholars in the humanities will have to attempt to find out what kinds of software are likely to give the most useful results and they will have to make their needs known to programming specialists. Attempts should be made by scholars training students in the use of real or model corpora to select one or more searching or information retrieval systems and to co-operate with the producers in fine tuning them for use with their material, the aim being to maximize the sophistication of the search engine so as to minimize the need for later indexing or manual tagging.
In this connection it is essential to recall that a great many European text corpora will, almost by definition, be multilingual. This poses special problems in the selection of search engines. Search engines that distinguish between languages, and especially those that in addition perform some kind of linguistic analysis, will presumably produce better results than those that do not. But their development may be considerably less cost-effective in a situation where many different languages are involved. It is important that students gain experience with information retrieval systems in the context of various kinds of investigations in the humanities. It is also important to provide feedback to the producers that will be useful for adapting their products to these specific needs. Examples of the two approaches are the EUROSPIDER system, which was developed for use in the multilingual Swiss context, and which is not dependent upon syntactic analysis, and the package of linguistic analysis dependent tools produced by the Multi-Lingual Theory and Technology team at Xerox in Grenoble.
As to textual input mechanisms, we leave aside the rapidly expanding field of speech recognition, which will have enormous consequences in the future, and particularly for the building of corpora representing natural, everyday spoken language. We refer to the volumes by Bloothooft (1997-1999). The discussion here will be limited to optical character recognition (OCR) and the development of mnemonic keyboards to simplify the manual input of hundreds or even thousands of individual characters and symbols which some natural languages need.
There are many OCR programs on the market today. Some of them are meant primarily for light office tasks performed on simple material with few problems or irregularities. Others, which in general are of more use in the humanities, especially in a multilingual context, can recognize a very wide range of characters and variants and can be trained to recognize a great many more. Some can scan right to left as well as from left to right, and some again can recognize joined letters and an array of special ligatures, making them able to read scripts like that of Arabic, and, with development, handwriting. OCR programs can potentially be trained to recognize almost any image, not simply letters, and the image can then be assigned a code which can be used to reproduce it, either by means of including the image in a font or by setting as the code a hypertext link to the image. This fact became apparent when an early version of Sakhr's Automatic Reader was used to scan Egyptian hieroglyphs. The program has an English language interface, and even students not interested in Arabic should be encouraged to experiment with the possibilities it offers.
One problem arising from the use of a great many composite signs in a display font is ensuring uniform encoding of these signs. For aesthetic reasons, to cite one example, the dot placed under certain characters in Arabic does not always look the same and consequently, at least under certain encoding schemes presently in use, it does not have the same code. Carelessness, or disagreement about which of the available dots looks best under a given letter, could lead to a composite sign being encoded in more than one way, which would interfere with searchability and convertibility. To avoid this, a keyboard mapper such as Keys (version 2 and above), created by Peter Szaszvari for the Windows platform, may be used to write keyboard macros which ensure that given composite signs are produced in only one way. In the keyboard macros produced for the JAIS1 font mentioned above, in order to make it easy to remember how to produce a given combination, all diacritics are produced by means of a mnemonic letter. If, when Unicode or a similar standard comes into general use, all the composite characters which can be produced by the font are available as single images, the mnemonic technique will still be of use for accessing these characters, and, indeed, many more. Today's keyboard macro programs are thus useful tools for training students to deal with the character resources that 16-bit encoding will make available.
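The mnemonic principle can be sketched in a few lines. The mnemonics and target signs below are illustrative assumptions, not those of the Keys or JAIS1 macros; the point is that each mnemonic sequence yields exactly one, normalized, encoding of the composite sign:

```python
import unicodedata

# Each mnemonic (letter plus '.') maps to a base letter with a
# combining dot below; NFC normalization guarantees that the sign is
# always stored in a single, uniform encoding.
MNEMONICS = {
    "d.": "d\u0323",   # d + COMBINING DOT BELOW -> ḍ
    "t.": "t\u0323",   # t + COMBINING DOT BELOW -> ṭ
    "s.": "s\u0323",   # s + COMBINING DOT BELOW -> ṣ
}

def expand(text: str) -> str:
    """Replace mnemonic sequences by uniformly encoded composite signs."""
    for mnemonic, sign in MNEMONICS.items():
        text = text.replace(mnemonic, unicodedata.normalize("NFC", sign))
    return text

print(expand("d."))      # ḍ (U+1E0D, a single code point)
print(expand("s.alat"))  # ṣalat
```

Because every occurrence passes through the same table and the same normalization, the inconsistency described above (the same dotted letter encoded in more than one way) cannot arise.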
The production of multimedia on CD-ROM and on the Web has led to the emergence of a new type of industry. New companies specialised in multimedia production have developed their activities at a fast pace. Traditional publishing companies, including educational publishers, have shown some inertia when confronted with the problem of integrating this new branch into their structures. Faced with the surge of new competitors they have felt threatened, but having understood that their health, or even their survival, could be at stake, they have reacted promptly and now rank among the major players in the multimedia production sector. It is therefore appropriate to reflect on quality control and accessibility of educational materials produced commercially with the new technologies.
Generally speaking, the multimedia products with the most clearly defined pedagogical strategy are those aimed at young children. Despite the tsunami effect of multimedia on society at large, up to now very few multimedia products have been available for the training of students at higher education levels, particularly in the humanities. This is unfortunate, because advanced formal methods can hardly be taught from books alone: their application requires computer resources such as electronic texts and advanced processing tools.
The lack of available products is not surprising, though, since humanities faculties are often dramatically under-equipped as far as new technologies are concerned (see e.g. chapter 5 for some studies). This is of course a deterrent for publishing companies who might otherwise target this market sector. Moreover, for these companies and their multimedia designers, the subjects taught in the humanities at university level may appear too fuzzy or ill-defined, may involve too many copyright issues, or may require too many experts or too much cooperation with university teachers. It follows that the evaluation of the return on humanities projects will generally suffer from a negative prognosis, and that the private sector will not invest much in the humanities under present conditions.
Furthermore, there have been few teachers with enough motivation to acquire advanced training in the new technologies. This seems due less to lack of interest than to the fact that keeping up with educational developments earns far less academic credit than research. If the take-up of multimedia by the humanities at university level is slower than in the rest of society, it is therefore not because of any reluctance on the part of teachers.
Today, the Internet has become a commonplace tool. University websites are part of the Web landscape, and academic staff look on the Web for information on a daily basis and do not hesitate to download data or software. The use of the Web is so ordinary that it has become a reflex for many. Indeed, among university staff there is at present a markedly higher interest in the development of Web courses than in courses for physical distribution via media such as CD-ROM. Apart from the gentle learning curve for teaching staff who start preparing materials on the Web, on-line materials also offer important advantages. Among these is the possibility of making modular, transparently linked courses, on which teaching staff at different places co-operate by developing and maintaining their own modules.
On-line courses also offer advantages for students. In addition to the general advantages of computer assisted teaching, such as adaptability to personal profiles, the use of multiple input modes to sustain concentration, and self-paced tutoring, the practical limitations of time and space become irrelevant, making on-line courses ideal for learning at a distance. The Internet is not just a medium for the transmission of materials, but also for interaction with teachers and fellow students through e-mail, chat rooms, structured discussion forums, MOOs and other forms of communication in the virtual classroom.
Nevertheless, off-line courses on CD-ROM may still be preferable to on-line courses in a number of cases. Bandwidth is still low, and the cost of long connections may be excessive for students working at home, which many must do because they have insufficient access to machines at their university. Finally, Web browsers are not as powerful as they might seem: incompatibilities and shortcomings in Java plague applications which rely on mathematical operations and signal processing. For the time being at least, the Internet therefore has its limits as an on-line vector for pedagogy. Even so, many experiments in Web teaching are being conducted in the humanities (for example the Oxford-based Virtual Seminars project; for other cases in point, we refer to the other chapters of this volume).
Moreover, teachers in the humanities are beginning to expect their students to use the Internet as much as libraries. The Internet is in many areas larger and more searchable than paper libraries. It is sometimes even the best source for material so recent that it has not yet appeared in print. Eventually, the Internet may reduce the need for large numbers of copies of books and documents, thus preserving university libraries from suffocation while offering a wider variety of publications. University libraries are indeed increasingly obtaining institution-wide subscriptions to Web-based reference works such as the Encyclopaedia Britannica. Copyright issues related to university uses of these publications will have to be addressed in order to protect authors' and publishers' rights, taking into account that we are heading towards the creation of virtual libraries. But equally important is quality assurance, which must be considered at various levels.
At the level of the user interface, multimedia products must meet high ergonomic standards in such matters as legibility of the screen, clarity and comfort of navigation, and rapidity of consultation. These requirements are met through the use of multimedia methods and representations relating to data format and organisation, screen design, scenarios of consultation and interactive processes. These are not easy matters, however: they require a considerable investment of effort and skill to be implemented satisfactorily, and they depend largely on the contents; hence the necessity of collaboration between many experts, including the domain expert.
The second type of requirement teachers express concerns the quality of the data. The main issues here are availability, validity, updating and persistence. These aspects of data quality may be contrasted with the situation for paper publications. Books are not always available, or can be very difficult to get hold of, whereas once a document is on the Web it is immediately available to millions of Internet users. With respect to validity, on the other hand, books tend to offer better guarantees through their publishers. Updating is a strong point of the Internet. Finally, persistence has proved problematic on the Internet, where links tend to move and become invalid; but books are also problematic, since once they are out of print it is difficult to obtain copies.
Faced with non-existent or immature infrastructures for using the new technologies in the academic landscape, European institutions of higher education should promote the technology-mediated circulation of knowledge through innovative proposals, stimulated by national and international measures. As regards electronic publishing, at least from an academic point of view, a number of issues need to be addressed urgently. Among these are the following:
Furthermore, traditional academic rhetorical discourse, which is essentially linear, has been confronted with the multimodal and non-linear rhetoric of hypermedia, which has encouraged teachers to re-appreciate how knowledge transmission can work best in various situations, and to take into account a broader range of parameters.
In the past, preparing a document to present knowledge consisted in reading, taking notes, drawing up a plan and writing (generally solitary work), then sending the result to the printer. Today, with multimedia, the steps towards the final document (a CD-ROM, for example) are more numerous. They also require more structuring, more accuracy, more trans-disciplinary competencies and more collaborative work with people from different fields.
The experience of multimedia and its production is certainly an influential factor in the redefinition or refinement of methodologies. One could say that new technologies are giving humanities experts new responsibilities, and even reshaping their identities. They must be discipline specialists, but knowledge engineers as well. In order to fulfil their new responsibilities as knowledge transmitters, humanities teachers will have to master the nuts and bolts of multimedia design, from its technological to its formal methodological aspects, in addition to their own discipline and didactic concerns. In this process, the multimedia aspects should not be overemphasized, but remain subsidiary to the pedagogical goals.
Indeed, an important side effect of the greater accessibility of the machine in teaching and learning situations is a greater awareness among humanities academics of the progressive renewal of their cognitive tools, in particular domain knowledge representation, formalization and modelling. Thanks to this accessibility, the modelling tools which result from advanced computing research can quickly be redirected to educational situations, where they train the next generation of scholars in the new formal methods.
The ease of use of the Internet has made it a welcome playground for today's humanities teachers. True to their universities' mission of producing and transmitting knowledge, they are eager to move from Gutenberg's invention to the new media. However, with the new opportunities now at hand, it is time to realize that choices in the delivery of teaching are not independent of the content to be taught. This holds especially for advanced computing in the humanities, where the methods to be taught depend critically on sound computational implementation. It will not be better multimedia, but better integration of computational content, that will distinguish quality teaching materials from the rest. Dedication to the quality of teaching is the best chance for universities to remain true to their missions and to prevent multimedia production companies from entirely replacing them in the third millennium.
We recommend that institutions of higher education across Europe explicitly recognize that advanced computing in the humanities is important for the future of humanities education, by making the area a focus in strategy plans, budget allocations and public relations.
We recommend that the further development of this field be supported by restructuring educational frameworks, including the provision of departments, chairs and support staff dedicated to the integration of advanced computational methods in humanities domains. The teaching of computational methods must be directly linked to the students' domain of study.
We recommend the definition of new interdisciplinary competencies for teaching staff in humanities computing and the recognition of teachers' qualifications according to this definition. The continuous training and retraining of teaching staff must be secured through sponsoring of summer schools and other activities.
We recommend the establishment of infrastructures for the creation, certification and Europe-wide free distribution of digital resources suited as study materials for humanities students. The institution of European centres of excellence in humanities computing would be one step towards this goal.
We recommend the creation of new degrees explicitly qualifying students in computing applied to the humanities or to specific humanities fields. Further curriculum development must be aimed toward the institution of European masters degrees with a focus on integrating computing in specific humanities fields, in order to achieve a recognizable level of qualification in a globalized world of information technologies.
We recommend that institutions give teaching staff credit for their work in advanced computing in humanities education. Institutions must recognize electronic publications as no less valid and valuable than paper publications, and must consider educational achievements as valuable as research achievements. Institutions must actively stimulate collaborative work not only between university staff (as in European projects), but with the private sector as well.
We recommend that institutions provide adequate equipment to students, including advanced tools which offer specific support for the study of humanities subjects at both undergraduate and graduate levels.
BLOOTHOOFT (ed.), The landscape of future education in speech communication sciences (Vol. 1-3), Utrecht, OTS Publications, 1997-1999.
BUSA, Roberto, Fondamenti di informatica linguistica (Trattati e Manuali), Milano, Vita e Pensiero, 1987.
Calcolatori e Scienze umane. Archeologia e Arte, Storia e Scienze Giuridiche e Sociali, Linguistica e Letteratura. Scritti del convegno organizzato dall'Accademia Nazionale dei Lincei e dalla Fondazione IBM Italia (Fondazione IBM Italia), Milano, Etas Libri, 1992.
COOMBS, James H., RENEAR, Allen H. and DEROSE, Steven J., `Markup Systems and the Future of Scholarly Text Processing', Communications of the ACM 30.11: 933-947 (1987).
GALLINO, Luciano (ed.), Informatica e scienze umane. Lo stato dell'arte (Collana 885/76), Milano, Franco Angeli, 1991.
GARDIN, Jean-Claude, Le calcul et la raison. Essais sur la formalisation du discours savant (Recherches d'histoire et de sciences sociales, 46), Paris, Éditions de l' École des Hautes Études en Sciences Sociales, 1991.
GENET, Jean-Philippe, ZAMPOLLI, Antonio (eds.), Computers and the Humanities (European Science Foundation), Hampshire, Dartmouth Publishing Company Limited, 1992.
HAMESSE, Jacqueline, Méthodologies informatiques et nouveaux horizons dans les recherches médiévales (Société Internationale pour l'Étude de la Philosophie Médiévale. Rencontres de Philosophie Médiévale), Turnhout, Brepols, 1992.
HOCKEY, Susan, A Guide to Computer Applications in the Humanities, Baltimore, The Johns Hopkins University Press, 1980.
HOCKEY, Susan, The Oxford Courses for Literary and Linguistic Computing, in: L. Cignoni - C. Peters (eds.), Computers in Literary and Linguistic Research, Pisa, Giardini, 1984 (p. 175-181).
HUGHES, Larry John Jr., Bits, Bytes & Biblical Studies. A Resource Guide for the Use of Computers in Biblical and Classical Studies, Grand Rapids (Michigan), Academie Books-Zondervan Publishing House, 1987.
HYMES, Dell (ed.), The Use of Computers in Anthropology, London, 1965.
IDE, Nancy and VÉRONIS, Jean (ed.), Text Encoding Initiative. Background and Context, Dordrecht, Kluwer Academic Publishers, 1995.
KATZEN, May, Scholarship and Technology in the Humanities (British Library Research), London, Bowker-Saur, 1991.
KENNY, Anthony, The Computation of Style. An Introduction to Statistics for Students of Literature and Humanities, Oxford, Pergamon Press, 1982.
LANCASHIRE, Ian, The Humanities Computing Yearbook 1988. Oxford, Clarendon Press, 1988.
LANCASHIRE, Ian, The Humanities Computing Yearbook 1989-90. A Comprehensive Guide to Software and other Resources, Oxford, Clarendon Press, 1991.
La Pratique des Ordinateurs dans la Critique des Textes, Paris, Centre National de la Recherche Scientifique, 1979.
MARCOS MARÍN, Francisco A., Informática y Humanidades, Madrid, Editorial Gredos, 1994.
MIALL, David S. (ed.), Humanities and the Computer. New Directions, Oxford, Clarendon Press, 1990.
OAKMAN, Robert Lee, Computer Methods for Literary Research, Athens (Georgia), The University of Georgia Press, 1984.
ORLANDI, Tito (ed.), Discipline umanistiche e informatica. Il problema della formalizzazione (Contributi del centro linceo interdisciplinare "Beniamino Segre" 96), Roma, Accademia Nazionale dei Lincei, 1997.
ORLANDI, Tito, Informatica Umanistica ("Studi Superiori NIS", n.78), Roma, La Nuova Italia Scientifica, 1990.
ORLANDI, Tito, Per l'informatica nella Facoltà di Lettere ("Informatica e discipline umanistiche", n.4), Roma, Bulzoni Editore, 1990.
PERILLI, Lorenzo, Filologia Computazionale, Roma, Accademia Nazionale dei Lincei, 1995.
RUDALL, Brian H. and CORNS, Thomas N., Computers and Literature. A Practical Guide, Cambridge, Abacus Press, 1987.
SOLOMON, Jon (ed.), Accessing Antiquity. The Computerization of Classical Studies, Arizona, The University of Arizona, 1993.
SPERBERG-MCQUEEN, C. Michael and BURNARD, Lou (eds.), Guidelines for Electronic Text Encoding and Interchange (Electronic Book Library Nr. 2), Providence, Electronic Book Technologies, 1994.
TURK, Christopher (ed.), Humanities research using computers, London-New York, Chapman & Hall, 1991.
VUILLEMIN, Alain, Informatique et Littérature. 1950-1990 (Travaux de Linguistique Quantitative n. 47. Publiés sous la direction de Charles Muller), Paris-Genève, Champion-Slatkine, 1990.
ICAME Journal. University of Bergen.
International journal of corpus linguistics. John Benjamins Publishing Company.
Literary and Linguistic Computing. Oxford University Press.
Research in Humanities Computing. Oxford University Press.
Revue Informatique et Statistique dans les Sciences Humaines. Université de Liège, Centre informatique de Philosophie et Lettres.
Association for Literary and Linguistic Computing (ALLC): http://www.allc.org/.
Encyclopaedia Britannica: http://www.eb.co.uk/.
Iniziative di informatica umanistica su Internet: http://RmCisadu.let.uniroma1.it/camplani/internet.html.
Interactive demonstrations in speech and hearing by Martin Cooke: http://www.dcs.shef.ac.uk/~martin.
Irish resources in the humanities: http://www.ucd.ie/irh/.
Linear Predictive Vocoder by Klaus Fellbaum: http://www.kt.tu-cottbus.de/speech-analysis/.
Models of Speech Perception by Cecile Fougeron and Francesco Cutugno: http://www.unige.ch/fapse/PSY/persons/frauenfelder/SP/Model_speech.html.
Virtual Seminars project (at Oxford): http://info.ox.ac.uk/jtap.