4.3 Curricula and courses in computational linguistics

4.3.1 Introduction

Computational linguistics is taught at all levels from undergraduate upwards, but the academic context varies widely from university to university.  In some cases there are three- or four-year degree programmes with a major in CL, but more commonly individual units of CL study are offered as part of a programme of study in either computer science or linguistics.  The field has developed rapidly in the past few decades.  This has brought about many different schools of thought, as well as a corresponding number of research specialisms.  While this variety is useful and should be encouraged, it makes it difficult to ensure that quality teaching across an up-to-date spectrum of CL continues to be offered.  Firstly, the number of CL staff at most universities is rather small.  Secondly, the development of good teaching materials lags well behind developments in research.  This may affect the quality of teaching in several ways.

All of this argues for increased co-operation on an international level, to develop and offer better teaching and training in CL.

However, it is not only CL which is affected by these developments, but also general linguistics.  Mainstream scholars of language are increasingly using tools which are produced by CL research.  One needs to distinguish here between the linguist, who is a user of linguistic tools, and the computational linguist, who is in addition a developer of such tools.  Consider parsing, for instance: a linguist who does not understand how a parser works can still use one as a practical tool for experimenting with different grammars.  This argues for the inclusion of tools courses not only in computational linguistics, but also in general linguistics curricula.  To give but one example, all first-year linguistics students at the University of Bergen use computer tools such as the LFG Grammar Writers' Workbench and Tarski's World.  The use of these tools requires no programming, but offers sophisticated support for learning grammar writing and logic, respectively.

Co-operation between linguists and computational linguists is needed in updating both CL and Linguistics curricula.  In this respect, ACO*HUM aims to bring advanced computing to as many humanities students as possible, rather than merely drawing a distinction between those who use advanced computing in the humanities and those who do not.

4.3.2 Curriculum development initiatives

Much of this chapter reports on curriculum development initiatives begun in the ERASMUS Inter-University Cooperation Programme (ICP) in Natural Language Processing (1993-1996).  This work aimed at developing a modular curriculum in NLP consisting of a basic module of courses in CL, to be implemented in each participating university, and several specialization modules, which differ from university to university in the ICP depending on local expertise and which have the basic module as their only entrance condition.  Students travelling in the network would then be able to begin their NLP studies locally and, depending on their area of interest, take any specialization module at any site.  Given this, our ultimate aim was to define good practice in offering commonly agreed curricula.

This was seen as a four-stage process.  In the first phase, the undergraduate curricula of Dublin City University, Universität des Saarlandes at Saarbrücken and Tilburg University were standardized.  Then in 1994-95, the contents of the basic module were defined and agreed upon by a subset of the partners in the ICP.  The proposal was discussed, adapted, and later agreed upon by a larger subset of the ICP, with the following results:

Core Course/Topic                          Contents

Introduction to & Basics in Linguistics    Phonetics/Phonology
                                           Introduction to Grammars
Computational Linguistics                  Parsing
                                           Lexical Knowledge
                                           Formal Semantics
Symbolic Computation                       Lisp and/or Prolog
                                           A procedural language
                                           Program Design
                                           Data Structures and Algorithms
Mathematics/Logic                          FOPC
                                           Set Theory
                                           Formal Languages and Automata
                                           Probability Theory
Grammar Formalisms                         Constraint Grammars
                                           Other state-of-the-art grammars
AI                                         Search
                                           Reasoning and Theorem Proving
                                           Knowledge Representation

Table 4.1: Core Courses in NLP

These proposals correspond remarkably well to the results of the survey of March 1999.  Regarding core topics, the responding institutions seemed to agree on parsing algorithms and formal grammars, introductory courses on linguistics, lexical knowledge, formal semantics, and mathematics and logic.

It was envisaged that these courses could be taken by students from any site at any other site.  As these courses could be embedded in different types of curricula (e.g. languages, linguistics, computer science, psychology etc.), the exact number of courses and the place in the curriculum where they are taught would differ from site to site.  It is interesting to note that if this work were to be attempted now, then in all probability (no pun intended!) a separate strand on Statistical NLP would be advocated.  However, at the time this core material was proposed, statistical NLP was still very much an emerging discipline; it was not clear whether it would stand the test of time, and it was therefore excluded.

Of course, as well as these core components, there are a number of specialization modules, the whole being summarized as follows (x indicates "is taught"; - indicates "is not taught"; o indicates "is optional"):

Course                              Sheffield  DCU   Saarbrücken  Tilburg   UMIST

Introduction to Linguistics         x          x     x            x         x
Prolog                              x          x     x            x         x
Math/Logic                          x          x     x            x         x
Procedural Programming              x          x     x            o         x
Logic/Semantics                     x          x     o            x         o
Software Engineering                x          x     -            Project   -
Algorithms/Data Structures          x          x     x            -         -
Grammar                             x          -     x            x         x
Introduction to CL                  x          x     x            x         x
LISP                                x          -     o            x         x
Artificial Intelligence             x          x     o            x         x
Parsing                             x          x     o            x         x
Grammar Formalisms                  x          x     x            -         x
Pragmatics                          x          -     x            -         x
Formal Semantics                    x          x     x            x         -
Vision/Robotics                     x          x     -            -         -
Complexity Theory                   x          x     x            -         -
Philosophy                          x          x     -            -         -
Corpus Analysis/Empirical Methods   x          x     x            o         x
Psycholinguistics                   x          x     o            x         x
Sociolinguistics                    x          -     -            x         x
Acoustics/Phonetics                 x          x     x            -         x
Speech                              x          x     x            -         x
Machine Translation                 -          x     x            -         x
Information Retrieval               -          -     -            x         -
Human Computer Interaction          x          -     -            x         -

Table 4.2: Specializations taught at selected sites teaching NLP

These specialization modules reflect the different expertise of the different sites, and under our schema would ultimately make a wider choice of specializations available to all students in the ICP.  Other modules which could be included are language-dependent NLP (French, Irish, Dutch, German NLP) and other application-specific NLP (dialogue systems, software localization etc.).

The properties of the programme can be summarized as follows:

  1. Full ECTS compatibility in the credit system.
  2. A European dimension in language-specific modules.
  3. Introduction of long-distance teaching techniques.
  4. Development of guidelines for sites starting NLP curricula.

In 1995-96, the contents of the basic module were adapted to run on the lines of ECTS.  Most of the additional material has also been worked out to this level of detail.  In our ICP, as elsewhere, there is a difference between institutions which offer full degrees in CL and institutions offering specialization studies with a focus on CL.  The compatibility with ECTS for those participants in the network having an undergraduate NLP programme (or, if no such programme, a sizeable number of components thereof) was estimated (under very general headings) as:
Topic Gøteborg DCU Saarbrücken Sheffield UMIST Trondheim

Table 4.3: Proportion of material taught at selected sites in terms of ECTS

The sums add to 120, as this core material is envisaged as taking the equivalent of two years' study, and ECTS allocates a maximum of 60 credits per year's study.  Even with such a general classification schema as this, one can see the very different emphasis placed on certain topics in different institutions.  There may, of course, be philosophical, geographical or historical issues underpinning such choices, but a much more compelling explanation is that they reflect the skills and interests of the staff at the sites concerned.

The final phase of this integration under the aegis of the ICP has already been completed by all partners.  Progress was hampered as before by the many bureaucratic and administrative problems encountered when trying to adapt the different curricula at the different institutions to incorporate the basic module.  While we remain committed in principle to developing a joint curriculum in NLP, getting each university to adopt our model will be a much greater obstacle to progress, particularly when it comes to the establishment of treaties between universities for mutual recognition.  However, such an understanding may be more readily achievable at postgraduate level, hence our involvement in the development of a European MSc in Language and Speech.

As for a European dimension in our language-specific modules, this can be taken almost as a given.  Parsing or speech processing is taught at (say) UMIST or DCU with English in mind, whilst in Saarbrücken the focus is on German.  Likewise, when machine translation is taught, the focus would be on translation into the native tongue.  Our efforts regarding Open and Distance Learning (ODL) techniques are documented elsewhere in this chapter.

Obviously we hoped that our work on curriculum development would be used by other universities setting up proposed NLP programmes, and indeed members of the ICP have been contacted by other parties interested in bringing in new courses in NLP, following publicity given to this via the WWW and papers at conferences.  In addition, with the work under the ICP having come to a close, we have continued this groundwork under the SOCRATES networks ACO*HUM and Speech Communication Sciences, or in further joint efforts with other interested groups.

One thing which we wanted to do was to broaden the user base so as to reflect more exactly the state of teaching in CL throughout Europe.  The ICP in NLP contained just 18 universities.  The number of institutions within ACO*HUM expressing an interest in CL is 52, so even within our network there is much to be learned from these new partners.  Nevertheless, there are a number of excellent CL sites which are not in ACO*HUM, whose views needed to be embraced.  This was one of the reasons the CL group within ACO*HUM conducted a survey of major universities with expertise in CL, some of the results of which are discussed elsewhere in this chapter.

4.3.3 Advantages of agreement on core topics

We see these advantages as manifold.  Firstly, efforts towards common curricula could take advantage of a well-documented state of the art.  There have not been many initiatives on this as far as we are aware.  Obviously, many NLP lecturers make their material available over the web, so individual comparisons between courses can be made.  This is facilitated by documents such as the Survey of Computational Linguistics Courses (Dorr, 1993) and the Directory of Graduate Programs in Computational Linguistics (Evens, 1992) published by the Association for Computational Linguistics.  The Socrates Thematic Network in the Teaching of Computing made some strides towards common curricula, including on-line syllabi for CL and Machine Translation.  In a similar vein, the Speech Communication Sciences network produced a curriculum for NLP.

Nevertheless, our efforts are genuinely oriented at the definition of best practice in defining commonly agreed curricula, at least at the level of common core material.  We feel that this work is innovative and hopefully of use to the wider NLP community, and that it will offer advantages for mobility.  For instance, institutions which become aware of omissions in their curricula by comparison with the courses offered elsewhere could modify those curricula, should they feel that the lack of certain knowledge hinders the progression of their students.  We feel that mobility of students and staff would be greatly facilitated by adopting a common starting point as entry for different specializations at different places.  Thus, a certain harmonization at basic levels will promote diversity at more advanced levels.

In addition, knowing what students can be expected to have learned before coming to one's university from abroad can help lecturers anticipate any particular problems, and can provide invaluable information when it comes to selecting an appropriate course of study abroad.  Also, we envisage that a core curriculum could serve as a baseline, or model of good practice, for comparison by other universities which are hoping to set up a course in CL, or which may want to bring their courses more into line with what is being taught at those universities at the forefront of NLP research (particularly in the Eastern European context).  Finally, an international agreement on common core elements in CL curricula could simplify the match between job profiles and competencies; this could be a significant advantage for the language industries when recruiting from many countries to cover a wide group of languages.

Notwithstanding these benefits, we want to head off right now one particular criticism which can be anticipated.  That is, we want to stress that adoption of our proposals in favour of a common core curriculum is entirely voluntary, and that they should in no way be considered an effort to standardize European education.  The adoption of good practice in commonly agreed curricula should spring from the university's desire to offer advantages to students and should not be imposed from above.  We hope it is clear that we want to maintain the specializations which exist; rather than homogenize all NLP teaching, we want to make such specializations available to more students, so as to widen the choice of material open to them, and in so doing enhance their learning experience.  If successful, this will provide still more diversity, lead to more student and staff mobility than currently exists, and make mobility more fruitful.

4.3.4 Proposals and recommendations for future curriculum development

We have attempted to provide the reasoning behind the work done on Curriculum Development, begun some time ago by the ERASMUS Inter-University Cooperation Programme in Natural Language Processing, and continued by the ACO*HUM working group on Computational Linguistics.  This reasoning seems to be supported by the results of the March 1999 survey.  This work is important for many reasons: NLP is an emerging discipline, and could do with a documented state of the art as it approaches maturity.  Furthermore, our work contains proposals for core curriculum components and integration of computing modules in traditional disciplines.  The work reflects a desire for pan-European courses in CL, hopefully at all levels, but perhaps more achievable at postgraduate level.

However, the field of CL is developing very rapidly, and therefore the definition of best practice in CL curricula is a moving target which will require continuing co-ordinated efforts in the future.  Most of all, the field is becoming even more interdisciplinary than before; increasingly, new programmes cut across faculty boundaries.  We refer in this respect to the programme Multimedia for Knowledge Transfer at the University of Leiden, The Netherlands, which offers an interdisciplinary major to students of Psychology, Computer Science, Linguistics, Art History and Education Sciences.  The programme covers the use of multimedia (graphic design, sound, text, video, animation) in an integrated and useful manner to present and register information and knowledge.  Natural Language Processing makes up 3 of the 14 minimal core credits in this nominally one-year programme, which includes a project.

Similarly, the Master's degree in Intelligent MultiMedia at Aalborg University in Denmark focusses not only on text and speech but also on vision and their mutual integration.  The goal is to implement a one-and-a-half-year Master's programme in which students will be required to spend at least three months at another institution in another European country.  The idea is also that students will be able to avail of expertise at another institution which may not exist at their own.  CL is represented in this Master's through three content descriptions: theoretical linguistics, natural language processing, and language engineering applications.  Group project work is an essential learning mode throughout the programme.

We support the work done under the CDA for developing the curriculum for a pan-European Master's degree course in language and speech, which will be taught from October 1999.  This project, which is detailed further in Bloothooft (1999b) and Bloothooft et al. (1998b), has a somewhat wider scope than CL.  Its aim of integrating speech processing with natural language processing seems entirely appropriate at this point, given developments in society, especially in the language industries.  Such integration is also supported by the results of the survey.

Nevertheless, we reiterate that any proposals here remain just that: we do not advocate a bland, homogeneous methodology throughout the continent; rather, we hope that, provided such common groundwork is in place, students will be able to avail of the rich range of specialization modules on offer in a more mobile structure than currently exists.  This will lead to a more diverse workforce, more cross-cultural communication, and greater understanding of our partners throughout Europe than is presently the case.  This can only be encouraged, and sought after ever more vigorously.  Finally, we feel that all courses should conform to the practice of ECTS to enable compatibility and facilitate comparisons between courses.

In sum, we recommend the following: