Video Future Frames Generation Using a Deep Encoder-Decoder Based Hierarchical Network
[Code] [Report]

Network architecture: an extension of the baseline model with the addition of skip connections; the encoder-decoder blocks are residual blocks (a minimal sketch follows below).
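
As a rough illustration of this design, the sketch below shows one encoder-decoder level built from residual blocks, with a skip connection carrying the encoder feature to the decoder. It assumes PyTorch; the channel counts, layer choices, and class names (`ResidualBlock`, `EncoderDecoder`) are illustrative and not taken from the report or code.

```python
# Minimal sketch (assumed PyTorch): residual encoder/decoder blocks with a
# skip connection between them. Shapes and layer choices are illustrative.
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """3x3 conv -> BN -> ReLU -> 3x3 conv -> BN, plus an identity shortcut."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(x + self.body(x))


class EncoderDecoder(nn.Module):
    """One encoder/decoder level; the encoder output is passed to the
    decoder through a skip connection (channel-wise concatenation)."""
    def __init__(self, channels=64):
        super().__init__()
        self.enc = ResidualBlock(channels)
        self.down = nn.Conv2d(channels, channels, 4, stride=2, padding=1)
        self.up = nn.ConvTranspose2d(channels, channels, 4, stride=2, padding=1)
        self.fuse = nn.Conv2d(2 * channels, channels, 1)  # merge skip + upsampled
        self.dec = ResidualBlock(channels)

    def forward(self, x):
        skip = self.enc(x)                                 # encoder residual block
        bottom = self.down(skip)                           # downsample
        up = self.up(bottom)                               # upsample back
        merged = self.fuse(torch.cat([skip, up], dim=1))   # skip connection
        return self.dec(merged)                            # decoder residual block


if __name__ == "__main__":
    feats = torch.randn(1, 64, 64, 64)        # (batch, channels, H, W)
    out = EncoderDecoder(64)(feats)
    print(out.shape)                          # torch.Size([1, 64, 64, 64])
```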

Abstract

Video frame generation is a challenging task due to the inherent uncertainty of the problem. In this lab project, we approach the task of video prediction using the model discussed in the lab, which is based on the Video Ladder Network. The Moving MNIST (MMNIST) and KTH Action datasets are used for the experiments. We present the effects of various design choices in the model architecture and training settings. The final results on both datasets are realistic and coherent with the given context frames, indicating the strong learning capability of the network.

Results on MMNIST

Results on KTH Action