30. Introduction

The advent of affordable multimedia has made possible new forms of communication and interaction. Just as film was originally used to present stage plays, digital multimedia was at first a new technology for handling older media with well-established techniques and methods. But just as film eventually spawned forms of storytelling worlds removed from the theater stage, we now see multimedia being put to previously unimagined uses. These uses take advantage of interactivity, networked communication, hypermedia, automatic media analysis, and other departures from the linear, start-to-finish, sender-to-receiver model of current video communication.

In the restricted domain of digital video manipulation, easy-to-use tools have been developed to enable what has been termed "virtual video editing": the editing of computer representations of video [Mac89]. Personal computer programs such as Adobe's Premiere [Pre93] allow users to edit video data in much the same way that popular word processors allow them to edit text. Video is represented by visual proxies, typically thumbnail images and graphic representations of audio waveforms. Users can click on desired movie clips with a mouse and drag them into place. VCR-style buttons control playback, and a collection of other metaphorical tools (e.g., scissors, magnifying glass, trash can) is available.

In one sense these tools take the video editor back to the time before videotape, when film editors held the media in their hands and edited it without the intermediate presence of video decks and time codes. But the new digital systems offer advantages beyond more direct interaction with the media. Thanks to random-access storage devices, these systems let the editor manipulate many sections of video simultaneously, with quick jumps to any point in the source material. There is no penalty for repeated editing, since the digital information does not degrade. The user interface can be tailored to use the common metaphors and standard commands of the computer platform in question. The result is an environment where casual experimentation is encouraged and beginners can quickly produce acceptable results.
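One common way to realize virtual video editing is to edit a list of references into the source material rather than the material itself. The Python sketch below assumes a hypothetical `Clip` record (a source name plus a frame range) and hypothetical `split_at`, `delete_clip`, and `move_clip` operations standing in for the scissors, trash-can, and drag-into-place tools; it is an illustration under those assumptions, not Premiere's internal data model. Because each operation returns a new timeline and never rewrites source frames, repeated edits cost nothing in quality.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Clip:
    """Hypothetical clip reference: frames [start, end) of a named source file."""
    source: str
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start


# A timeline is an immutable sequence of clip references, not of video frames.
Timeline = Tuple[Clip, ...]


def split_at(timeline: Timeline, frame: int) -> Timeline:
    """'Scissors': split the clip that spans timeline position `frame` into two."""
    out: List[Clip] = []
    pos = 0  # timeline position where the current clip begins
    for clip in timeline:
        if pos < frame < pos + clip.length:
            cut = clip.start + (frame - pos)  # same point, in source-frame units
            out.append(replace(clip, end=cut))
            out.append(replace(clip, start=cut))
        else:
            out.append(clip)
        pos += clip.length
    return tuple(out)


def delete_clip(timeline: Timeline, index: int) -> Timeline:
    """'Trash can': drop one clip reference; the source file is never touched."""
    return timeline[:index] + timeline[index + 1:]


def move_clip(timeline: Timeline, src: int, dst: int) -> Timeline:
    """'Drag into place': move the clip at index `src` to index `dst`."""
    clips = list(timeline)
    clips.insert(dst, clips.pop(src))
    return tuple(clips)
```

For example, splitting a timeline that starts with a 100-frame clip at frame 40 and then deleting the first piece leaves a 60-frame reference to frames 40 through 99, while the original timeline value, and the source file it points to, are unchanged.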

The direct-manipulation nature of these systems, however, also limits user options. Some repetitive or complex functions cannot be expressed with the provided tools. A user can visually select and delete a period of silence in an audio track, but a pure direct-manipulation interface offers no way to abstract that specific operation into a more general command ("if there is silence, delete audio data") that can be applied repeatedly. The ability to evaluate conditions (e.g., "is this audio data silent?", "is this a scene transition?") is left to human eyes and ears, even when the computer might do the job more quickly or accurately. And the user is limited to the operations the system designer considered important: an unusual function, or combination of functions, may be completely out of reach, and no designer can imagine or implement every function that might prove useful.
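To illustrate what abstracting the manual operation into the general command "if there is silence, delete audio data" might look like, the sketch below treats a mono track as a list of floating-point samples, cuts it into fixed-size windows, and deletes every window whose root-mean-square (RMS) amplitude falls below a threshold. The window size, the 0.01 threshold, and the function names (`is_silent`, `delete_where`, `remove_silence`) are illustrative assumptions, not VideoScheme's actual interface; the default window of four samples is deliberately tiny so the example is easy to check by hand.

```python
import math
from typing import Callable, List, Sequence

# Assumed representation: a mono track as floats in [-1.0, 1.0].
Window = List[float]


def split_windows(samples: Sequence[float], size: int) -> List[Window]:
    """Cut the track into consecutive windows of `size` samples (last may be shorter)."""
    return [list(samples[i:i + size]) for i in range(0, len(samples), size)]


def rms(window: Window) -> float:
    """Root-mean-square amplitude of one window (0.0 for an empty window)."""
    if not window:
        return 0.0
    return math.sqrt(sum(s * s for s in window) / len(window))


def is_silent(window: Window, threshold: float = 0.01) -> bool:
    """The condition: a window counts as silent if its RMS is below `threshold`."""
    return rms(window) < threshold


def delete_where(windows: List[Window],
                 condition: Callable[[Window], bool]) -> List[Window]:
    """The general command: delete every window for which `condition` is true."""
    return [w for w in windows if not condition(w)]


def remove_silence(samples: Sequence[float], size: int = 4,
                   threshold: float = 0.01) -> List[float]:
    """'If there is silence, delete audio data', applied across the whole track."""
    windows = split_windows(samples, size)
    kept = delete_where(windows, lambda w: is_silent(w, threshold))
    return [s for w in kept for s in w]  # re-join surviving windows in order
```

For instance, a 16-sample track made of a loud window, two all-zero windows, and another loud window comes back as just its 8 non-silent samples. Because `delete_where` receives its condition as an argument, the same command could be reused with any other test, such as a scene-transition detector, which is the kind of abstraction a pure direct-manipulation interface does not offer.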

Systems like Premiere exploit the high bandwidth of the visual computer interface to give their users a more concrete and approachable experience. But this emphasis on the concrete comes at the expense of tasks that are by their nature abstract: generalizing a single manual selection into a rule, automating repetitive tasks, or combining operations in ways the designer did not foresee.

We feel that the shortcomings of direct-manipulation systems are particularly costly in a field such as digital multimedia, where the range of interesting operations is expanding rapidly and no "canned" software package can anticipate every user need. We therefore advocate an approach that combines direct manipulation with computer programming, yielding systems flexible enough to tackle new tasks. We believe such systems can be particularly useful in multimedia authoring, research, and education. In this part we describe a prototype of such a system, called VideoScheme, and discuss a number of promising applications.