Paul Mc Kevitt
Center for PersonKommunikation (CPK)
Institute of Electronic Systems (IES)
Fredrik Bajers Vej 7-A5, DK-9220, Aalborg Ø, DENMARK
There is a major motivating force driving the Humanities and Sciences/Engineering towards each other in the area of integration of language and vision processing by machines: SuperinformationhighwayS. This force is the ability to have information in text, voice, sound, graphic and video forms available within minutes at local and remote sites through interfaces like Netscape¹ and search engines like AltaVista². People will be able to pose queries for retrieving information about, say, stocks and shares, good restaurants in a city, or their bank account by speaking the query to the machine. In turn, they will be able to direct the machine's graphical display of the information it presents in response. Visual information comes in many formats, from diagrams to videos, as does language information, both natural and formal. The Sciences/Engineering are more concerned with methods for transmitting, processing, representing and retrieving information across networks, while the Humanities are more concerned with the actual information itself. Slethei (1998) also makes this point about closing the gap between the two cultures, especially with respect to spoken dialogue systems (http://www.hd.uib.no/AcoHum/abs/Slethei.htm).
The area of MultiMedia is growing rapidly internationally, and it clearly has various meanings from various points of view. MultiMedia can be separated into at least two areas: (1) traditional MultiMedia and (2) Intelligent MultiMedia (IntelliMedia). The former is what people traditionally think of as MultiMedia, encompassing the presentation of text, voice, sound and video/graphics, possibly with touch and virtual reality linked in. However, the computer has little or no understanding of the meaning of what it is presenting. IntelliMedia involves the computer processing and understanding perceptual input from speech, text and visual images, and reacting to it. It is much more complex and involves technologies from the Engineering side, in terms of spoken language processing, natural language processing, image processing, Computer Science and Artificial Intelligence, and from the Humanities side, in terms of Linguistics, Cognitive Science, Psychology and studies of the mind (see Mc Kevitt 1994/95/96/97). This is the newest area of MultiMedia research, which has seen an upsurge over the last two years, and one where most universities internationally do not have all the necessary expertise locally. Traditional and Intelligent MultiMedia education and research are found in the Science/Engineering and Humanities/Humanistic Computing Departments at Aalborg University, Denmark.
2 IntelliMedia 2000+
The Institute for Electronic Systems at Aalborg University, Denmark has expertise in the area of IntelliMedia and has already established an initiative called IntelliMedia 2000+, funded by the Faculty of Science and Technology (FaST). IntelliMedia 2000+ coordinates both research, on the production of a number of real-time research demonstrators exhibiting examples of IntelliMedia applications, and education, in the form of a new Master's degree in IntelliMedia. An important emphasis is the integration of research and education in IntelliMedia. IntelliMedia 2000+ is coordinated from the Center for PersonKommunikation (CPK), which has a wealth of experience and expertise in spoken language processing, one of the central components of IntelliMedia, and also in radio communications, which would be useful for mobile applications (CPK Annual Report, 1998). More details on IntelliMedia 2000+ can be found on WWW: http://www.kom.auc.dk/CPK/MMUI/. IntelliMedia 2000+ involves four research groups from three Departments within the Institute for Electronic Systems: Computer Science (CS), Medical Informatics (MI), Laboratory of Image Analysis (LIA) and Center for PersonKommunikation (CPK), focusing on platforms for integration and learning, expert systems and decision taking, image/vision processing, and spoken language processing/sound localisation respectively. The first two groups provide a strong basis for methods of integrating semantics and conducting learning and decision taking, while the latter two groups focus on the two main input/output components of IntelliMedia, vision and speech/sound.
Teaching is a large part of IntelliMedia 2000+ and two new courses have been initiated: (1) MultiModal Human Computer Interaction, and (2) Readings in Advanced Intelligent MultiMedia. MultiModal HCI covers traditional HCI, teaching methods for developing optimal interfaces through the layout of buttons, menus and form filling on screens, but also covers advanced interfaces using spoken dialogue and gesture. The course on Readings in Advanced Intelligent MultiMedia is new and includes active learning, where student groups present state-of-the-art research papers and invited guest lecturers present their research from IntelliMedia 2000+. A new Master's Degree (M.Eng./M.Sc.) has been established and incorporates the courses just mentioned as core modules of a one-and-a-half-year course on IntelliMedia taught in English. Each semester has a theme associated with it and involves both project work and courses. Semester I focuses on Basic methods, Semester II on Advanced methods and Semester III on a Master's Thesis in Intelligent MultiMedia. Semester III has no courses. The Master's course is open to both non-Danish and Danish students. All courses are given in English and the thesis can be written in English or Danish. Each student is graded according to internationally recognised grading schemes. More details can be found on WWW: http://www.kom.auc.dk/ESN/masters.
The emphasis on group organised and project oriented education at Aalborg University (Kjaersdam and Enemark 1994) is an excellent framework in which IntelliMedia, an inherently interdisciplinary subject, can be taught. Most courses involve students carrying out project work in groups in the unique Aalborg style. Each semester the students work together in groups of three to four on self-chosen projects, and this has proven to give students better opportunities after their education. Approximately 50% of the courses have individual examinations and all courses can be examined as part of an oral examination based on the prepared project report. Groups can even design and implement a smaller part of a system which has been agreed upon between a number of groups. It is intended that there be a tight link between the education and research aspects of IntelliMedia 2000+, so that students can avail of the software demonstrators and platforms developed and can also become involved in developing them. The Master's course is now in its second year with over 20 students, half of whom are from abroad, and a number of student projects related to IntelliMedia 2000+ have already been completed (Bakman et al. 1997a, 1997b, Nielsen 1997, Tuns and Nielsen 1997). Currently five student groups are enrolled in the Master's course, conducting projects on multimodal interfaces, a pool-game trainer, a virtual steering wheel, audio-visual speech recognition, and face recognition. Occasionally, a Lifelong Learning course is given for returning students of Aalborg University who wish to continue their education. This course is a compression of the core IntelliMedia courses.
The results from the four research groups of IntelliMedia 2000+ have until now largely been developed within the groups themselves. Our goal was to establish collaboration among the groups in order to integrate their results into IntelliMedia demonstrator systems and applications. Some of the results would be integrated in the short term, as some of the technology-based modules are already available, and others in the longer term, as new results become available. The demonstrator would be a single platform called CHAMELEON, with a general architecture of communicating agent modules processing inputs and outputs from different modalities, each of which could be tailored to a number of application domains. CHAMELEON would demonstrate that existing platforms for distributed processing, decision taking, image processing, and spoken dialogue processing could be interfaced to the single platform and act as communicating agent modules within it. CHAMELEON would be independent of any particular application domain.
The first prototype of a CHAMELEON software and hardware platform has been developed. CHAMELEON demonstrates that existing software modules for (1) distributed processing and learning, (2) decision taking, (3) image processing, and (4) spoken dialogue processing can be interfaced to a single platform and act as communicating agent modules within it.
CHAMELEON is independent of any particular application domain and the various modules can be distributed over different machines. Most of the modules are programmed in C++ and C. CHAMELEON demonstrates that (1) it is possible for agent modules to receive inputs, particularly in the form of images and spoken dialogue, and respond with the required outputs, (2) individual agent modules can produce output in the form of semantic representations, (3) the semantic representations can be used for effective communication of information between different modules, and (4) various means of synchronising the communication between modules can be tested to produce optimal results. More details on CHAMELEON are found in Broendsted et al. (1998) and Mc Kevitt (1998) (http://www.hd.uib.no/AcoHum/abs/McKevitt-demo.htm).
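The idea of agent modules exchanging semantic representations through a single platform can be illustrated with a minimal sketch. It is written in Python for brevity, although most CHAMELEON modules are in C++ and C, and it is not the CHAMELEON implementation: the names Frame, Hub, DialogueManager and Display, the frame slots, and the rule of waiting for one spoken query and one pointing gesture are all assumptions made for illustration only.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Frame:
    """Hypothetical semantic representation passed between modules."""
    sender: str      # name of the module that produced the frame
    intention: str   # e.g. "query", "pointing" or "display"
    content: dict    # module-specific slots


class Hub:
    """Hypothetical platform that routes frames between registered modules."""

    def __init__(self):
        self._handlers = {}
        self.log = []  # every frame that passed through the platform

    def register(self, name, handler):
        self._handlers[name] = handler

    def post(self, frame):
        # Deliver the frame, and any frames produced in response,
        # to every module except the one that sent it.
        queue = deque([frame])
        while queue:
            current = queue.popleft()
            self.log.append(current)
            for name, handler in self._handlers.items():
                if name != current.sender:
                    queue.extend(handler(current))


class DialogueManager:
    """Fuses speech and vision input (assumed rule: wait for both)."""

    def __init__(self):
        self.query = None
        self.gesture = None

    def handle(self, frame):
        if frame.intention == "query":
            self.query = frame
        elif frame.intention == "pointing":
            self.gesture = frame
        if self.query is None or self.gesture is None:
            return []  # not yet synchronised: nothing to emit
        out = Frame("dialogue_manager", "display",
                    {"topic": self.query.content["topic"],
                     "at": self.gesture.content["xy"]})
        self.query = self.gesture = None
        return [out]


class Display:
    """Output module that records what it was asked to show."""

    def __init__(self):
        self.shown = []

    def handle(self, frame):
        if frame.intention == "display":
            self.shown.append(frame.content)
        return []


hub = Hub()
manager = DialogueManager()
display = Display()
hub.register("dialogue_manager", manager.handle)
hub.register("display", display.handle)

# Stand-ins for the output of a speech recogniser and an image processor.
hub.post(Frame("speech", "query", {"topic": "restaurants"}))
hub.post(Frame("vision", "pointing", {"xy": (120, 45)}))
print(display.shown)
```

In this sketch the synchronisation rule lives entirely in the hypothetical DialogueManager, so a different rule (for example, one based on timestamps) could be tried without changing the other modules.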
SuperinformationhighwayS are forcing the merging of the Humanities and Sciences/Engineering in terms of processing, integrating, representing and accessing information in multiple modalities including at least text, voice, sounds and images/videos (Intelligent MultiMedia). Information from many cultures will be input in the form of natural and formal speech and language, with images ranging from simple diagrams right up to videos. The Humanities will be concerned more with the content of the information being passed, while the Sciences/Engineering will be more concerned with processing, representation and transmission. As Horgan (1996) points out, much of the future of science for 2000+ will be in the integration and engineering of existing theories, models and systems with convergence. Aalborg University is well equipped in terms of research expertise and education to contribute to IntelliMedia 2000+, which will be important for the future of international computing and media development. An important emphasis is the integration of research and education in IntelliMedia. We believe IntelliMedia will also throw light on the numerous developments in Computer and Cognitive Science (CS) (O Nuallain 1995 and O Nuallain et al. 1997). IntelliMedia 2000+ (http://www.kom.auc.dk/CPK/MMUI/) will ensure the position of Denmark and Europe in the construction of the future of SuperinformationhighwayS.
We take this opportunity to acknowledge support from the Faculty of Science and Technology, Aalborg University, Denmark and from the European Union (EU) under the ESPRIT (OPEN-LTR) Project 24 493. Paul Mc Kevitt would also like to acknowledge the British Engineering and Physical Sciences Research Council (EPSRC) for their generous support under grant B/94/AF/1833 for the Integration of Natural Language, Speech and Vision Processing (Advanced Fellow), and LIMSI-CNRS, Orsay, France, where he was a Visiting Professor whilst completing this abstract.
1 Netscape is a trademark of Netscape Communications Corporation.
2 AltaVista is a trademark of Digital Equipment Corporation.
3 Paul Mc Kevitt is also a British Engineering and Physical Sciences Research Council (EPSRC) Advanced Fellow at the Department of Computer Science, University of Sheffield, for five years under grant B/94/AF/1833 for the Integration of Natural Language, Speech and Vision Processing.
Broendsted, T., P. Dalsgaard, L.B. Larsen, M. Manthey, P. Mc Kevitt, T.B. Moeslund, K.G. Olesen (1998) A platform for developing Intelligent MultiMedia applications. Technical Report R-98-1004, Center for PersonKommunikation (CPK), Institute for Electronic Systems (IES), Aalborg University, Denmark, May.
Bakman, Lau, Mads Blidegn, Thomas Dorf Nielsen, and Susana Carrasco Gonzalez (1997a) NIVICO - Natural Interface for VIdeo COnferencing. Project Report (8th Semester), Department of Communication Technology, Institute 8, Aalborg University, Denmark.
Bakman, Lau, Mads Blidegn, and Martin Wittrup (1997b) Improving human computer interaction by adding speech, gaze, tracking and agents to a WIMP based environment. Project Report (9th/10th Semester), Department of Communication Technology, Institute 8, Aalborg University, Denmark.
Baekgaard, Anders (1996) Dialogue management in a Generic Dialogue System. Proceedings of the Eleventh Twente Workshop on Language Technology (TWLT), Dialogue Management in Natural Language Systems, 123-132. Twente, The Netherlands.
Dalsgaard, Paul and A. Baekgaard (1994) Spoken language dialogue systems. In Prospects and Perspectives in Speech Technology: Proceedings in Artificial Intelligence, Chr. Freksa (Ed.), 178-191, September. Muenchen, Germany: Infix.
Horgan, John (1996) The end of science: facing the limits of knowledge in the twilight of the scientific age. Reading, Mass.: Addison-Wesley (Helix Books).
Mc Kevitt, Paul (1994) Visions for language. Proceedings of the Workshop on Integration of Natural Language and Vision processing. Twelfth American National Conference on Artificial Intelligence (AAAI-94), Seattle, Washington, USA, August, 47-57.
Mc Kevitt, Paul (Ed.) (1995/1996) Integration of Natural Language and Vision Processing (Vols. I-IV). Dordrecht, The Netherlands: Kluwer-Academic Publishers.
Mc Kevitt, Paul (1997) SuperinformationhighwayS. In "Sprog og Multimedier" (Speech and Multimedia), Tom Broendsted and Inger Lytje (Eds.), 166-183, April 1997. Aalborg, Denmark: Aalborg Universitetsforlag (Aalborg University Press).
Mc Kevitt, Paul (1998) CHAMELEON and the IntelliMedia WorkBench: integrating research from the humanities, science and engineering. In WWW and printed Proceedings of the International Conference on The Future of the Humanities in the Digital Age: problems and perspectives for humanities education and research. University of Bergen, Bergen, Norway, September (http://www.hd.uib.no/AcoHum/abs/McKevitt.htm).
Nielsen, Joergen (1997) Distributed applications communication system applied on IntelliMedia WorkBench. Project Report (8th Semester), Department of Medical Informatics and Image Analysis (MIBA), Institute 8, Aalborg University, Denmark.
O Nuallain, Sean (1995) The search for mind: a new foundation for cognitive science. Norwood, New Jersey: Ablex Publishing Corporation.
O Nuallain, Sean, Paul Mc Kevitt and Eoghan Mac Aogain (Eds.) (1997) Two sciences of mind: readings in cognitive science and consciousness. "Advances in Consciousness Research" (AiCR 9). USA: John Benjamins.
Slethei, Kolbjørn (1998) Can education bridge the gap between the two cultures? In WWW and printed Proceedings of the International Conference on The Future of the Humanities in the Digital Age: problems and perspectives for humanities education and research. University of Bergen, Bergen, Norway, September (http://www.hd.uib.no/AcoHum/abs/Slethei.htm).
Tuns, Nicolae G. and Thomas Dorf Nielsen (1998) Experimenting with phase web as AI support in the CHAMELEON system. Project Report (9th semester), Department of Computer Science, Institute 8, Aalborg University, Denmark.