Diverse Generation from a Single Video Made Possible

Supplementary Material


Ablation of auxiliary channel used for analogies

RGB only (no auxiliary channel) largely fails or produces major "smears" (see, e.g., the lava in the 5th row).
Optical-flow norm alone (magnitude, no quantization) works better, but fails to map the different dynamic types (see, e.g., the water type in the 2nd row).
Quantized normed optical flow (our results, last column) finds a good mapping.
Columns: Layout | Appearance | RGB Only (No Auxiliary) | RGB + OF Norm (No Quantization) | Full Results (OF + Quantization)
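For concreteness, here is a minimal sketch of how such a quantized flow-norm auxiliary channel could be computed. This is only an illustration: it assumes OpenCV's Farneback flow and uniform binning with a hypothetical `n_bins` parameter; the exact flow estimator and quantization levels used in our method may differ.

```python
import cv2
import numpy as np

def quantized_flow_norm(prev_gray, curr_gray, n_bins=4):
    # Dense optical flow between consecutive grayscale frames
    # (Farneback is assumed here for illustration).
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, curr_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    # Per-pixel flow magnitude (the "OF norm").
    mag = np.linalg.norm(flow, axis=2)
    # Quantize the magnitude into a few discrete levels, so that
    # different dynamic types map to distinct labels.
    edges = np.linspace(0.0, mag.max() + 1e-8, n_bins + 1)
    labels = np.digitize(mag, edges[1:-1])  # values in {0, ..., n_bins-1}
    # Normalize to [0, 1] so it can be stacked as a 4th channel with RGB.
    return labels.astype(np.float32) / (n_bins - 1)
```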

Comparison to Recycle-GAN [Bansal et al. 2018]

Note that Recycle-GAN (3rd column) either fails to converge to the correct appearance, or converges to the input appearance video.
(Runtimes: Recycle-GAN - 16 minutes per 2 outputs (50 epochs, 20 seconds per epoch); Ours - 1 minute per output.)
We trained Recycle-GAN with several learning rates and for many epochs, and chose what looked like the best results.
Columns: Layout | Appearance | Recycle-GAN | Ours

Sketch-to-Video Comparison:
(Runtimes: Recycle-GAN - 66 minutes (200 epochs, 20 seconds per epoch); Ours - 1 minute.)
Columns (two side-by-side grids): Layout | Ours | Recycle-GAN | Layout | Ours | Recycle-GAN

Recycle-GAN training evolution

Note the following behaviour: in early epochs, the spatio-temporal layout is that of the content video, but the appearance is still far from that of the style video, because it takes time for the model to converge to the correct appearance. As training evolves, Recycle-GAN tends to overfit not only the appearance of the style video, but also its spatio-temporal layout.
The overall result is therefore a video that is very similar to the style video, with only slight motions resembling the content video.
In addition, the overall resolution is worse than that of our output.
Columns: Layout | Epoch 1 | Epoch 25 | Epoch 50 | Epoch 100 | Epoch 200 | Appearance
