39.1 Producing Digital Talks for CD-ROM
We start by describing the development process for the DAGS'92 proceedings. The experience we gained there greatly influenced the design and development of later multimedia proceedings.
To acquire the video footage for the digital talks, a procedure was developed for DAGS'92 and refined for subsequent conference proceedings. During the conference, the speakers' presentations were videotaped, their overhead transparencies were copied, and their papers were collected. To make it easier to resynchronize transparencies and speech in the later editing step, we used two video cameras for videotaping: one focusing on the speaker and one on the projected transparencies (figure IV.22, step 1). Contrary to our initial expectations, the videotape focusing on the transparencies turned out to be far more useful. After collection, all these materials were converted to digital form for computer-based processing (figure IV.22, step 2). Due to the space constraints of the CD-ROM, we decided to deliver only the eight invited talks in full audio/visual format.
Figure IV.22 DAGS'92 development process
Despite the huge storage provided by CD technology, we knew that we could not fit the full eight one-and-a-half hour video tracks on a CD-ROM. However, most speakers stand next to the projector so that they can change transparencies, mark them, and point to them, and the audience gets most of the information by looking at the transparencies. We therefore decided to display a short video loop of each speaker, selecting a piece of the video with similar beginning and ending frames so that the loop transition would not distract the viewers. User feedback confirmed this decision: little valuable information was lost this way, and about two minutes of video turned out to be sufficient to convey a sense of a speaker's appearance and mannerisms. Some users even preferred turning off the speaker video loop altogether to concentrate fully on the transparencies. Given the small size of the video capture screen, one can see the movement of the speaker and his or her overall appearance, but not tiny details like the movement of the lips. So, somewhat surprisingly, we found that most users could not distinguish a loop of a few minutes of the speaker video from the whole video.
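To make the storage argument at the start of this paragraph concrete, the following back-of-the-envelope sketch compares the full video tracks with two-minute loops. The constants are assumptions for illustration, not figures from the project: a nominal capacity of 650 MB per disc, and video encoded at the nominal single-speed transfer rate of 150 KB/s, which is the most a single-speed drive could stream.

```python
# Rough storage budget. All constants are illustrative assumptions,
# not measurements from the DAGS'92 project.
CD_CAPACITY_MB = 650          # assumed nominal CD-ROM capacity
SINGLE_SPEED_KB_PER_S = 150   # nominal single-speed CD-ROM transfer rate


def video_size_mb(duration_s: float, rate_kb_per_s: float) -> float:
    """Size in MB of a video of the given duration at a constant data rate."""
    return duration_s * rate_kb_per_s / 1024


full_talks_s = 8 * 1.5 * 3600   # eight talks of one and a half hours each
loops_s = 8 * 2 * 60            # eight loops of about two minutes each

full_mb = video_size_mb(full_talks_s, SINGLE_SPEED_KB_PER_S)
loops_mb = video_size_mb(loops_s, SINGLE_SPEED_KB_PER_S)

print(f"full video tracks: {full_mb:.0f} MB")   # about 6328 MB
print(f"two-minute loops:  {loops_mb:.0f} MB")  # about 141 MB
print(f"discs needed for full video: {full_mb / CD_CAPACITY_MB:.1f}")
```

Under these assumptions the full video alone would need almost ten discs, while the loops leave most of the disc free for audio, transparencies, and papers. The actual data rates used in the project may well have been lower.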
The video loops were digitized with a low-end video capture board and compressed to provide efficient playback from a CD-ROM. The loops were kept small to enhance playback performance. Because of the low resolution of the 120 by 180 pixel QuickTime video window, the user has to look very closely to notice the missing synchronization between lip movement and the voice track.
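The idea of picking a loop whose beginning and ending frames look alike can be sketched as a small search. This is a minimal illustration, not the procedure used for the proceedings, where the selection may well have been done by eye. The frame representation (flattened grayscale pixel values) and the names `frame_distance` and `best_loop` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical frame representation: flattened grayscale pixel values.
Frame = Sequence[int]


def frame_distance(a: Frame, b: Frame) -> float:
    """Mean absolute pixel difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def best_loop(frames: List[Frame], min_len: int, max_len: int) -> Tuple[int, int]:
    """Return (start, end) frame indices whose frames look most alike,
    subject to min_len <= end - start <= max_len."""
    best = None
    best_cost = float("inf")
    for start in range(len(frames)):
        last = min(len(frames) - 1, start + max_len)
        for end in range(start + min_len, last + 1):
            cost = frame_distance(frames[start], frames[end])
            if cost < best_cost:
                best, best_cost = (start, end), cost
    if best is None:
        raise ValueError("clip is shorter than min_len frames")
    return best


# Tiny synthetic clip: frames 1 and 3 are nearly identical.
clip = [[0, 0], [10, 10], [50, 50], [11, 10], [90, 90], [0, 1]]
print(best_loop(clip, min_len=2, max_len=4))  # (1, 3)
```

The search is exhaustive over the allowed loop lengths, which is fine for a sketch. For real footage one would probably compare downsampled frames to keep the cost manageable.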
The audio track of each talk was digitized and then edited to remove pauses and noise words such as "umm"s and "ahh"s (figure IV.22, step 2). The edited talks were roughly half as long as the originals, and much more listenable. To improve the sound quality, we amplified most of the tracks using the SoundEdit(TM) sound processing application.
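The pause-removal part of this editing step can be approximated automatically. The sketch below is an assumption-laden illustration, not a description of how the talks were actually edited: it only shortens silent stretches and does nothing about noise words, which need a human editor or speech recognition. The function name `remove_pauses` and its parameters are hypothetical.

```python
from typing import List


def remove_pauses(samples: List[int], threshold: int, max_pause: int) -> List[int]:
    """Shorten every run of quiet samples (|s| <= threshold)
    to at most max_pause samples; louder samples are kept unchanged."""
    out: List[int] = []
    quiet_run = 0
    for s in samples:
        if abs(s) <= threshold:
            quiet_run += 1
            if quiet_run > max_pause:
                continue  # drop the excess part of a long pause
        else:
            quiet_run = 0
        out.append(s)
    return out


# A pause of four quiet samples is cut down to two.
track = [100, 0, 0, 0, 0, -120, 1, 90]
print(remove_pauses(track, threshold=2, max_pause=2))
# [100, 0, 0, -120, 1, 90]
```

A per-sample threshold is crude, because speech waveforms cross zero constantly. A practical tool would measure energy over short windows instead, but the principle of capping the length of quiet stretches is the same.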
The overhead transparencies were scanned and edited for clarity and contrast; they were also reduced in size to fit into the talks window of the user interface. Unfortunately, after reduction, poorly handwritten transparencies were almost unreadable and had to be retyped.
Then, using the original videotapes of the transparencies as guides, we synchronized the transparencies to the edited audio tracks using the Adobe Premiere(TM) video editing program. The resulting "movie" reproduced the most important features of a talk (the speaker's words and transparencies) and preserved their temporal connection. These "movies" were indexed to allow random access to a list of primary topics, and to allow more sensitive linking between the papers and the talks.
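The indexing described here amounts to a table of cue points on the edited audio track. The following sketch shows one plausible shape for such an index. It is an assumption for illustration only: the cue list, the topic names, and the functions `transparency_at` and `seek_time` are hypothetical and do not reflect the actual file format of the proceedings.

```python
import bisect
from typing import List, Tuple

# Hypothetical cue list, sorted by time:
# (start time in seconds on the edited audio track, transparency or topic id).
Cue = Tuple[float, str]
cues: List[Cue] = [(0.0, "title"), (42.5, "outline"), (180.0, "results")]


def transparency_at(cues: List[Cue], t: float) -> str:
    """Return the transparency that should be visible at time t."""
    starts = [start for start, _ in cues]
    i = bisect.bisect_right(starts, t) - 1
    return cues[max(i, 0)][1]


def seek_time(cues: List[Cue], topic: str) -> float:
    """Return the time to jump to for random access to a topic."""
    for start, name in cues:
        if name == topic:
            return start
    raise KeyError(topic)


print(transparency_at(cues, 100.0))  # outline
print(seek_time(cues, "results"))    # 180.0
```

The same table serves both directions: playback looks up the transparency for the current time, while a topic list or a link from a paper looks up the time for a given topic.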
Commercial CD-ROM drives vary widely in playback speed, and single-speed drives and slow CPUs have difficulty displaying both the transparency video and the speaker video loop on the screen at a comfortable speed. For that reason, we have given the user the option to stop the speaker video loop and replace it with a static color picture of the speaker.