scene determination (Rui et al., 1999). The latter organizes the clip into scenes, groups, and shots. Thus, a video clip comprises several scenes, each containing several groups, and a group itself consists of all visually similar shots within a scene (Figure 5). As
a side effect of the scene determination algorithm,
shots can be ordered within a group according to
their group distance measure. This results in a mea-
sure for a group’s “best” shot, which the visual artist
will most likely use during performance. Further-
more, the media manager analyzes the motion of each
shot, which results in a camera movement classifica-
tion (pan/tilt/zoom/roll), and extracts the representa-
tive keyframes of each shot, which can be used for
browsing. If desired, the user can modify the auto-
matically generated video clip structure (e.g., by man-
ually changing shot boundaries) and add content descriptions. For storage, the clip's source file is never modified; all editing information is stored exclusively in the metadata. The original clip thus remains usable in all kinds of scenarios.
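The resulting clip/scene/group/shot hierarchy and its separation from the source file can be sketched as follows; all class and field names here are illustrative assumptions, not SOUNDIUM's actual data model:

```python
from dataclasses import dataclass, field

@dataclass
class Shot:
    start_frame: int
    end_frame: int
    group_distance: float          # distance measure within the group
    camera_motion: str = "static"  # e.g. "pan", "tilt", "zoom", "roll"
    keyframes: list = field(default_factory=list)

@dataclass
class Group:
    shots: list  # all visually similar shots within one scene

    def best_shot(self) -> Shot:
        # Shots are ordered by their group distance measure;
        # the smallest distance marks the group's "best" shot.
        return min(self.shots, key=lambda s: s.group_distance)

@dataclass
class Scene:
    groups: list

@dataclass
class ClipMetadata:
    source_file: str  # original clip, never modified
    scenes: list

clip = ClipMetadata(
    source_file="dancers.mpg",
    scenes=[Scene(groups=[Group(shots=[
        Shot(0, 120, 0.42), Shot(121, 250, 0.17, "pan")])])],
)
print(clip.scenes[0].groups[0].best_shot().start_frame)  # → 121
```

Because only the metadata object is edited, the original footage stays untouched and reusable.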
For browsing video clips, the user can choose between a temporal and a structural view. In particular, the structural view gives an intuitive overview of the video clip. If the user selects a shot and its destination
(i.e., a dedicated input port of a processing node),
the media manager streams the corresponding frames
to the rendering engine. In order to avoid long seek
times within a video clip, each clip is encoded with
keyframes at shot boundaries after the preparation
phase. If requested, the engine caches the frames,
which will remain available for random access later
on.
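The seek behaviour enabled by keyframes at shot boundaries can be illustrated with a small sketch (the function name and the list of boundary positions are hypothetical): seeking to any frame reduces to jumping to the nearest preceding boundary keyframe, so no long decoder scan is needed.

```python
import bisect

def seek_target(keyframe_positions, frame):
    """Return the latest keyframe at or before `frame`.

    `keyframe_positions` is a sorted list of frame indices where
    keyframes were placed (here: the detected shot boundaries).
    """
    i = bisect.bisect_right(keyframe_positions, frame) - 1
    return keyframe_positions[max(i, 0)]

boundaries = [0, 121, 251, 480]      # shot boundaries from the preparation phase
print(seek_target(boundaries, 300))  # → 251
```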
4.2 Interactive Non-Linear Editing
Besides better retrieval capabilities, the extracted
metadata of clips allows for a new form of video
footage utilization: Interactive NLE of video, i.e., the
(semi-)automatic rearrangement of a video clip’s in-
dividual shots in real-time. In order to align live visu-
als to music, our approach applies film music theory
in the reverse direction: The most popular film music
procedure is to conduct the music according to given
visual action points of a completely finished movie
(Gorbmann, 1987). A visual action point is usually
associated with a classical film cut, but it can also
be within a continuous shot (e.g., the beginning of
a pan) or refer to arbitrary types of dramatic events.
In our case, visual action points have to be created in real time for given “musical action points” resulting from audio analysis; for example, extracted bar or beat boundaries may enforce cuts. Following these rules, the (short) clips of the dancers in Figure 1 have been synchronized to the incoming beat and to extrapolated beat durations by non-linearly stretching the clips between two beat boundaries.
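The beat synchronization described above amounts to choosing a playback rate that makes a shot span exactly one (possibly extrapolated) beat interval, so that its cut lands on the next beat. The following sketch shows this computation; the function name and the sample values are assumptions for illustration:

```python
def playback_rate(shot_frames, fps, beat_interval):
    """Rate at which a shot must play so that its duration
    equals one beat interval (in seconds)."""
    shot_duration = shot_frames / fps
    return shot_duration / beat_interval

# A 75-frame shot at 25 fps lasts 3 s; at 120 BPM one beat lasts 0.5 s,
# so the shot must play 6x faster for its cut to land on the next beat.
print(playback_rate(75, 25.0, 0.5))  # → 6.0
```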
In SOUNDIUM, the generation of visual action
points is realized in terms of dedicated process-
ing nodes for computer-assisted NLE. During per-
formance, the user has interactive control over the
selection of video footage and the node’s configu-
ration parameters. For instance, the user can as-
sign a whole video scene or multiple groups to the
NLE node, or tune editing parameters such as “cuts
per second”. The node then analyzes the associated
metadata and, according to its configuration, decides which shots should finally be played, how fast, and for how long. SOUNDIUM includes NLE
processing nodes implementing different editing tech-
niques (Eisenstein, 1994) ranging from the function-
ality given above (simulating visual action points) to
completely audio-independent editing.
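A minimal sketch of such an NLE node's decision step follows, assuming shots are ranked by their group distance measure and the only tuning parameter is “cuts per second”; the function name, shot representation, and selection heuristic are all illustrative, not SOUNDIUM's actual node interface:

```python
import random

def schedule(shots, cuts_per_second, total_seconds, seed=0):
    """Pick which shots to play and for how long.

    `shots` is a list of (start, end, group_distance) tuples.
    Returns a playlist of shots plus the dwell time per shot.
    """
    rng = random.Random(seed)
    dwell = 1.0 / cuts_per_second           # seconds each shot stays on screen
    n_cuts = int(total_seconds * cuts_per_second)
    # Prefer each group's "best" shots: lower group distance ranks first.
    ranked = sorted(shots, key=lambda s: s[2])
    pool = ranked[: max(1, len(ranked) // 2)]
    return [rng.choice(pool) for _ in range(n_cuts)], dwell

shots = [(0, 120, 0.42), (121, 250, 0.17), (251, 480, 0.33)]
playlist, dwell = schedule(shots, cuts_per_second=2, total_seconds=4)
print(len(playlist), dwell)  # → 8 0.5
```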
5 THE DESIGN TREE
In our case, a design is a complete description of the
processing graph, including its nodes, value vectors,
and edges. On a more abstract level, a design directly
reflects the realization of an artistic idea. The artist’s
designs are stored in the design tree, a hierarchical
data structure, where each node contains information
about how the processing graph is to be modified in
order to realize a design. Changes to the system state
(by using the graphical management console) result in
modification of the processing graph and, if desired,
also in new nodes in the design tree.
5.1 Realization
In its simplest form, the design tree can be seen as a
multilevel undo/redo facility: All user actions manip-
ulating the system state are recorded and can be un-
done. These state manipulations are recorded as SL2
statements representing individual processing graph
changes. The user can decide to commit a design to
the design tree, where a new design node is created.
When a design is committed, the minimal sequence of
SL2 statements yielding the system state is computed
and called the normal form of a design node.
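Under the simplifying assumption that SL2 statements can be modeled as keyed assignments, where a later statement to the same target supersedes earlier ones, the normal-form computation can be sketched as follows (statement syntax and target names are invented for illustration):

```python
def normal_form(statements):
    """Reduce a recorded statement history to the minimal
    sequence that still yields the same system state."""
    latest = {}
    for target, stmt in statements:  # keep only the last statement per target
        latest[target] = stmt
    return list(latest.values())

history = [
    ("blur.radius", "set blur.radius 2"),
    ("mixer.alpha", "set mixer.alpha 0.5"),
    ("blur.radius", "set blur.radius 7"),  # supersedes the first statement
]
print(normal_form(history))
# → ['set blur.radius 7', 'set mixer.alpha 0.5']
```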
During the design process, several design nodes are
committed by the user in sequence with each node
representing a revision of a previous design (Figure 6-
b/c). This is similar to a file versioning system (Cederqvist, 1993) that stores the differences from one revision of a file to the next. As in a versioning system,
the user can go back to any previous design (Figure
6-d) and start a new branch (Figure 6-e), exploring a
variant of a design. Thus, branching transforms the
linear sequence of designs into a tree.
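The branching behaviour can be sketched with a minimal tree of delta-carrying nodes; this is an illustrative model, not the actual design-tree implementation. Realizing a design replays the statement deltas on the path from the root:

```python
class DesignNode:
    """A committed design, storing the statement delta to its parent."""
    def __init__(self, delta, parent=None):
        self.delta, self.parent, self.children = delta, parent, []
        if parent:
            parent.children.append(self)

    def realize(self):
        """Replay deltas from the root down to this node."""
        chain, node = [], self
        while node:
            chain.append(node.delta)
            node = node.parent
        return [stmt for delta in reversed(chain) for stmt in delta]

root = DesignNode(["add blur"])
r1 = DesignNode(["set blur.radius 2"], root)
r2 = DesignNode(["set blur.radius 7"], r1)  # revision of r1
branch = DesignNode(["add mixer"], r1)      # new branch from r1
print(branch.realize())  # → ['add blur', 'set blur.radius 2', 'add mixer']
```

Going back to an earlier node and committing a new child is exactly the branch step that turns the linear revision history into a tree.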
A natural ordering of nodes by time (revisions)
takes place during the design process. However, this
GRAPP 2006 - COMPUTER GRAPHICS THEORY AND APPLICATIONS