Human Speech Production Using Interactive Modules and the Internet - a Tutorial for the Virtual University

Klaus Fellbaum, Brandenburg Technical University of Cottbus, Germany
Joerg Richter, Technical University of Berlin, Germany
http://www.prz.tu-berlin.de/~jri/
fellbaum@kt.tu-cottbus.de


We present a tutorial in which the principle of a linear predictive vocoder (LPC vocoder) is explained interactively. The user
speaks into a microphone; the voice is digitised and stored in the computer. The stored voice can be replayed whenever
appropriate, e.g. for comparison purposes. In addition, several speech samples are stored and can be replayed or processed,
so the program can also be used if no microphone is available. The voice components, namely the fundamental frequency and the
vocal tract parameters, are computed. These components are then fed into the synthesis part of the vocoder, which finally
generates a synthesised speech signal. The user can now replay this signal and compare it with the original speech signal. For
visual comparison, the original and the reconstructed speech signals are depicted in both the time and the
frequency domain. In addition, the fundamental frequency (pitch) contour is presented graphically.
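
As a minimal sketch of the analysis part, the following Java fragment computes the prediction coefficients of one windowed speech frame with the autocorrelation method and the Levinson-Durbin recursion. The class and method names are illustrative only and are not taken from the tutorial's source code.

    public final class LpcAnalysis {

        /** Returns a[0..order] with a[0] = 1; a[1..order] are the predictor
            weights, i.e. s[n] is approximated by sum_j a[j] * s[n-j]. */
        public static double[] levinsonDurbin(double[] frame, int order) {
            // Autocorrelation r[0..order] of the (already windowed) frame.
            double[] r = new double[order + 1];
            for (int lag = 0; lag <= order; lag++) {
                for (int n = lag; n < frame.length; n++) {
                    r[lag] += frame[n] * frame[n - lag];
                }
            }
            double[] a = new double[order + 1];
            a[0] = 1.0;
            double error = r[0];              // prediction error energy
            for (int i = 1; i <= order; i++) {
                // Reflection coefficient k of the i-th recursion step.
                double acc = r[i];
                for (int j = 1; j < i; j++) {
                    acc -= a[j] * r[i - j];
                }
                double k = acc / error;
                double[] prev = a.clone();
                a[i] = k;
                for (int j = 1; j < i; j++) {
                    a[j] = prev[j] - k * prev[i - j];
                }
                error *= (1.0 - k * k);       // error shrinks with each step
            }
            return a;
        }
    }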

The main advantage of the tutorial is its interactivity. The user can, for example, manipulate the fundamental frequency
contour, the number of prediction coefficients, or the signal energy, and can then hear the result of these manipulations.
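
As a small illustration of one such manipulation, the following fragment scales the extracted pitch contour before re-synthesis. The array name and the convention that a value of 0 marks an unvoiced frame are our assumptions, not the tutorial's actual interface.

    /** Scales all voiced frames of a pitch contour (one F0 value in Hz
        per analysis frame) by the given factor, e.g. 1.2 to raise the voice. */
    static double[] scalePitch(double[] pitchContour, double factor) {
        double[] scaled = new double[pitchContour.length];
        for (int i = 0; i < pitchContour.length; i++) {
            // Unvoiced frames (F0 = 0) stay unchanged.
            scaled[i] = (pitchContour[i] > 0) ? factor * pitchContour[i] : 0.0;
        }
        return scaled;
    }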

Although the LPC vocoder is primarily a coding scheme, it also serves as an excellent model of human speech production,
and this is the main purpose of the tutorial. The student readily recognises that human speech is composed of the vocal cord
signal (represented by the fundamental frequency signal) shaped by the resonance characteristics of the mouth and nose
cavities. It is not only instructive but also exciting to study both the visual and the audible variations of the fundamental
frequency for different speakers and emotions.
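
The following sketch illustrates this source-filter view in Java: a periodic impulse train, standing in for the vocal cord signal, is passed through the all-pole filter formed by the prediction coefficients. It is a deliberate simplification for voiced sounds (unvoiced sounds would use a noise excitation instead), and all names are again illustrative.

    public final class SourceFilterDemo {

        /** Synthesises a voiced sound: an impulse train at f0 Hz is filtered
            by the all-pole vocal tract filter given by a[1..order]. */
        public static double[] synthesize(double[] a, double f0,
                                          int sampleRate, int numSamples,
                                          double gain) {
            double[] out = new double[numSamples];
            int period = (int) Math.round(sampleRate / f0); // pitch period in samples
            for (int n = 0; n < numSamples; n++) {
                // Glottal excitation: one impulse per pitch period.
                double excitation = (n % period == 0) ? gain : 0.0;
                // All-pole (IIR) filter: out[n] = e[n] + sum_j a[j] * out[n-j].
                double acc = excitation;
                for (int j = 1; j < a.length && j <= n; j++) {
                    acc += a[j] * out[n - j];
                }
                out[n] = acc;
            }
            return out;
        }
    }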

The complete programme is written in Java, which makes it platform-independent and accessible via the WWW.

The tutorial is intended for students of various disciplines such as communication engineering, physics, linguistics, phonetics,
medicine, and speech therapy. It requires some basic knowledge of signal processing and digital filtering. The student should
know how to read a time signal and a spectrogram.

We believe that the best way to understand human speech production - within the scope of our tutorial - is to record one's own
voice and to start with a playful manipulation of the fundamental frequency. This gives a feeling for how stress, emotion, speech
dynamics and other characteristics are influenced by this frequency. Secondly, it is useful to vary the number of prediction
coefficients, which represent the formant frequencies (i.e. the resonance frequencies) of the vocal tract. For comparison,
the unprocessed stored voice is helpful.
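
As a usage illustration, building on the hypothetical helpers sketched above, re-analysing the same frame with different prediction orders shows how the spectral envelope gains or loses formant detail:

    // Too few coefficients smooth the formants away; an order of about
    // 10-12 is typical for speech sampled at 8 kHz.
    double[] coarse = LpcAnalysis.levinsonDurbin(frame, 4);
    double[] fine   = LpcAnalysis.levinsonDurbin(frame, 12);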

As mentioned, the tutorial is based on HTML pages and Java applets. The user downloads them from our WWW server with a
Netscape or Explorer browser. Voice recording requires special software: for audio input we use the shareware tool
SoundBite from the Scrawl company. This tool is based on the JNI (Java Native Interface) and requires the Netscape
browser 4.04 for Windows with the JDK 1.1 patch. For audio output we use sun.audio, which is part of common
browsers. If there is no need (or interest) to record one's own voice, no shareware is necessary.
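
For completeness, playback through sun.audio of that era looked roughly as follows. Note that sun.audio was a non-standard Sun package expecting 8 kHz, 8-bit mu-law .au data; the file name here is only an example.

    import java.io.FileInputStream;
    import sun.audio.AudioPlayer;
    import sun.audio.AudioStream;

    public class PlayBack {
        public static void main(String[] args) throws Exception {
            // Opens a mu-law coded .au file and plays it asynchronously.
            AudioStream stream = new AudioStream(new FileInputStream("voice.au"));
            AudioPlayer.player.start(stream);
        }
    }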