34.3 Cut Detection
One important use of VideoScheme is the implementation of an automatic system for dividing digital video into segments by detecting the "cuts" that divide one shot from another. There is no perfect definition of a "cut," but generally the term refers to a sharp discontinuity in a video stream, such as the break between two recording bursts in unprocessed video, or the point where two clips were concatenated during editing. Once cuts are found, the segments they define can be represented by a subset of each segment's data (e.g. the first and last frames), since the continuity of the segment ensures a great deal of information redundancy. This reduction of a potentially long segment to a few frames is a significant boon to applications such as indexing, logging, navigation, and editing, since those tasks can then be performed on a greatly reduced set of data.
A number of algorithms have been proposed for automatic cut detection, and one of the advantages of VideoScheme is that such algorithms can be implemented as compactly as their mathematical formulation. For example, a simple measure of visual continuity is a sum of pointwise differences in gray value or color. Such a test can be performed by the following fragment of VideoScheme code:
(adiff frame1 frame2 delta)   ; subtract arrays of gray values
(aabs delta)                  ; compute absolute difference values
(atotal delta)                ; sum differences

Scene changes trigger large pointwise differences, but this measure is also very sensitive to camera motion and zooming, which may change every pixel without introducing a new scene. Refinements of this algorithm, such as one that counts the differences that exceed a threshold, have a similar weakness. So there appears to be some value in a test that is not so spatially sensitive, such as the difference between summed gray values:
(- (atotal frame1) (atotal frame2))   ; subtract summed gray values

This measure is insensitive to camera pans and zooms, but it is also insensitive to actual cuts, since the average gray level may not change dramatically across the cut. A more reliable indicator is the gray-level or color histogram. Using VideoScheme's built-in color histogram function we can easily compute this measure:
(get-color-histogram64 frame1 histogram1)   ; compute 64-bucket color histograms
(get-color-histogram64 frame2 histogram2)
(adiff histogram1 histogram2 delta)         ; subtract histograms
(aabs delta)                                ; compute absolute differences
(atotal delta)                              ; sum differences

This test can be further refined by breaking each frame into a number of sub-frames and discarding the ones with above-median changes, or by counting the number of changes that exceed a threshold; either modification is quickly implemented in VideoScheme, and each makes the algorithm more robust against local phenomena such as object motion.
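Outside VideoScheme, the same histogram comparison can be sketched in plain Python. The quantisation scheme below (two bits per color channel, giving 4 × 4 × 4 = 64 buckets) is an assumption for illustration; the text does not specify how get-color-histogram64 assigns colors to its buckets.

```python
# Sketch of a 64-bucket color-histogram comparison (not VideoScheme).
# Assumption: each 8-bit R, G, B component is quantised to 2 bits.

def color_histogram64(pixels):
    """pixels: list of (r, g, b) tuples with 8-bit components."""
    hist = [0] * 64
    for r, g, b in pixels:
        bucket = (r >> 6) * 16 + (g >> 6) * 4 + (b >> 6)
        hist[bucket] += 1
    return hist

def histogram_abs_diff(frame1, frame2):
    # Summed absolute bucket differences, as in the fragment above.
    h1 = color_histogram64(frame1)
    h2 = color_histogram64(frame2)
    return sum(abs(a - b) for a, b in zip(h1, h2))

# Two frames with the same color content but different layout score
# zero, unlike a pointwise comparison -- the measure ignores where
# colors appear, only how much of each is present.
frame_a = [(255, 0, 0), (0, 255, 0)]
frame_b = [(0, 255, 0), (255, 0, 0)]
print(histogram_abs_diff(frame_a, frame_b))  # -> 0
```

The final example shows why this measure tolerates pans and object motion better than pointwise differencing.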
While histogram comparison is widely considered a robust solution to the detection of simple camera breaks, the general problem remains a fertile area for new approaches. In recent years novel algorithms have been proposed for detecting gradual transitions, and for using motion-sensitive measures in conjunction with a projection-detecting filter [Ued91] [Nag92] [Ots93].
Nagasaka and Tanaka have investigated automatic cut detection algorithms, obtaining the best results with a test that measures the differences in color distributions between adjacent frames [Nag92]. Following their algorithm we can write a function to compute the normalized difference between two histograms:
(define histogram-difference
  (lambda (hist1 hist2)
    (let ((hist-diff (cons-array 0 'long)))
      ; subtract the two histograms
      (adiff hist1 hist2 hist-diff)
      ; square the differences
      (atimes hist-diff hist-diff hist-diff)
      ; normalize by one of the histogram arrays
      (aquotient hist-diff hist1 hist-diff)
      ; sum the squared, normalized differences
      (atotal hist-diff))))

This function makes use of VideoScheme's built-in array functions to subtract, square, normalize, and sum the histogram differences. We can compute the histograms themselves using a built-in function, making it a simple matter to compute the visual continuity at any point:
(define full-frame-diff
  (lambda (movie trackno time1 time2)
    (let ((pixels (cons-array 0 'long))
          (hist1 (cons-array 64 'long))
          (hist2 (cons-array 64 'long)))
      ; get the histogram for one frame
      (get-video-frame movie trackno time1 pixels)
      (get-color-histogram64 pixels hist1)
      ; get the histogram for another
      (get-video-frame movie trackno time2 pixels)
      (get-color-histogram64 pixels hist2)
      ; compare the histograms
      (histogram-difference hist1 hist2))))

Nagasaka and Tanaka found this function to be sensitive to momentary image noise, which typically affected only parts of the image but created undesirable spikes in the color continuity. They eliminated this effect by dividing the frames into 16 subframes, comparing the subframe histograms, and discarding the 8 highest difference totals.
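The arithmetic performed by histogram-difference can be checked with a short Python sketch. Skipping zero-valued buckets is an assumption of this sketch: the text does not say how aquotient behaves when a bucket of hist1 is zero.

```python
def histogram_difference(hist1, hist2):
    # (h1 - h2)^2 / h1, summed over buckets -- mirrors the VideoScheme
    # histogram-difference above (subtract, square, normalize, sum).
    # Assumption: buckets where hist1 is zero are skipped to avoid
    # dividing by zero.
    total = 0.0
    for h1, h2 in zip(hist1, hist2):
        if h1 != 0:
            total += (h1 - h2) ** 2 / h1
    return total

print(histogram_difference([4, 4, 8], [4, 4, 8]))  # identical -> 0.0
print(histogram_difference([4, 4, 8], [8, 4, 4]))  # 4 + 0 + 2 -> 6.0
```

Normalizing each squared difference by the bucket count makes the measure respond to proportional changes in the color distribution rather than to the absolute pixel counts involved.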
Figure III.5 Comparison of subframes for the Nagasaka-Tanaka algorithm

We can implement this improved algorithm in VideoScheme:
(define nagasaka-tanaka-diff
  (lambda (movie trackno time1 time2)
    (let ((pixels1 (cons-array 0 'long))
          (pixels2 (cons-array 0 'long))
          (sub-pixels (cons-array 0 'long))
          (hist1 (cons-array 64 'long))
          (hist2 (cons-array 64 'long))
          (diffs (cons-array 16 'long))
          (frame1 nil)
          (frame2 nil)
          (index 0))
      ; get the two frames in question
      (set! frame1 (get-video-frame movie trackno time1 pixels1))
      (set! frame2 (get-video-frame movie trackno time2 pixels2))
      (set! index 0)
      (while (< index 16)
        ; histogram one 16th of frame1
        (get-sub-frame16 frame1 index sub-pixels)
        (get-color-histogram64 sub-pixels hist1)
        ; histogram one 16th of frame2
        (get-sub-frame16 frame2 index sub-pixels)
        (get-color-histogram64 sub-pixels hist2)
        ; remember the difference
        (aset diffs index (histogram-difference hist1 hist2))
        (set! index (+ index 1)))
      ; order the subframe differences and discard the 8 highest ones
      (asort diffs)
      (asetdim diffs 8)
      ; total the remaining differences
      (atotal diffs))))

A number of applications can be built using this measure of visual continuity. A simple function can search a movie for the beginning of the next cut:
(define next-cut
  (lambda (movie trackno time)
    (let ((diff 0))
      (while (and (< diff 10000)
                  (< time (get-movie-duration movie)))
        (set! diff (nagasaka-tanaka-diff movie trackno time (+ time 0.1)))
        (set! time (+ time 0.1)))
      time)))

Splitting movies by cut

We can modify the split-movie function presented earlier to split a movie on scene boundaries rather than at a fixed interval:
(define split-movie-by-cut
  (lambda (movie trackno)
    (let ((time 0.0)
          (cut 0.0))
      (while (< time (get-movie-duration movie))
        ; find the next cut
        (set! cut (next-cut movie trackno time))
        ; copy up to the next cut
        (copy-movie-clip movie time (- cut time))
        ; paste the segment into a new movie
        (paste-movie-clip (new-movie) 0.0 0.0)
        (set! time cut)))))

The results of executing the split-movie-by-cut function on a fifteen-second TV commercial are shown in figure III.3. The movie "Fast News 80 x 60" has been split into nine segments, seven of which are shown. In one case the cut-detection algorithm has performed better than the naked eye: the cut between segments "Untitled-7" and "Untitled-8" is almost undetectable when the movie is viewed at normal speed, but close examination and the Nagasaka-Tanaka algorithm reveal it.
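The robustness step at the heart of nagasaka-tanaka-diff, keeping only the 8 smallest of the 16 subframe differences, can be illustrated in a few lines of Python (the numbers below are invented for the sketch):

```python
def nagasaka_tanaka_total(subframe_diffs):
    # Sort the 16 subframe difference totals and keep the 8 smallest,
    # discarding the 8 highest -- as in the VideoScheme version above.
    return sum(sorted(subframe_diffs)[:8])

# A local disturbance (noise or object motion) that inflates a few
# subframes is discarded, so the total stays low...
quiet = [1] * 16
noisy = [1] * 12 + [500] * 4
assert nagasaka_tanaka_total(noisy) == nagasaka_tanaka_total(quiet) == 8

# ...whereas a genuine cut raises every subframe difference, so the
# surviving 8 subframes still produce a large total.
cut = [400] * 16
print(nagasaka_tanaka_total(cut))  # -> 3200
```

This is why the subframe variant suppresses spikes caused by momentary noise while still responding strongly to true scene changes.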
Figure III.3 Results of split-movie-by-cut

Using other knowledge of how video is sometimes structured, we can detect even higher-level boundaries, such as television commercials (which can be characterized by scene changes exactly 15, 30, or 60 seconds apart). We can also detect common editing idioms: the expression (nagasaka-tanaka-diff movie track time (next-cut movie track time)) evaluates the visual continuity between the frames that bracket a cut. A high degree of continuity suggests that the editor is cutting back and forth between two video segments, for example footage of two different characters speaking.
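The commercial-boundary heuristic above can be sketched in Python over a list of detected cut times. The function name, the example cut times, and the half-second tolerance are all invented for this sketch; the text only characterizes commercials by their 15-, 30-, or 60-second cut spacing.

```python
def commercial_boundaries(cut_times, tolerance=0.5):
    # Flag pairs of consecutive cuts whose spacing is close to 15, 30,
    # or 60 seconds -- the characterization of commercials given above.
    # The tolerance value is an arbitrary choice for this sketch.
    boundaries = []
    for t1, t2 in zip(cut_times, cut_times[1:]):
        gap = t2 - t1
        if any(abs(gap - n) <= tolerance for n in (15, 30, 60)):
            boundaries.append((t1, t2))
    return boundaries

cuts = [0.0, 2.3, 17.4, 32.5, 40.1]
print(commercial_boundaries(cuts))  # -> [(2.3, 17.4), (17.4, 32.5)]
```

A production version would also merge consecutive flagged pairs into whole commercial breaks, but the spacing test itself is the essential idea.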