Image and Video Generation via Deep Learning

In computer vision, image and video generation aims to synthesise photorealistic visual data either from random noise or conditioned on certain inputs, such as a class label or other data. The former is called unconditional generation, while the latter is known as conditional generation. Driven by advances in deep learning, image and video generation has attracted widespread attention over the past few years, and its practical applications have expanded widely, e.g., movie editing, style transfer, and face forensics.
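
To make the two settings concrete, the sketch below contrasts their sampling interfaces: an unconditional generator maps noise alone to an image, while a conditional generator additionally consumes a class label. The toy PyTorch architectures are hypothetical placeholders, not taken from any specific publication.

```python
import torch
import torch.nn as nn

class UnconditionalGenerator(nn.Module):
    """Maps a noise vector z to an image: x = G(z). Toy architecture for illustration."""
    def __init__(self, z_dim=128, img_pixels=3 * 64 * 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, 512), nn.ReLU(),
            nn.Linear(512, img_pixels), nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z).view(-1, 3, 64, 64)

class ConditionalGenerator(nn.Module):
    """Maps noise plus a class label to an image: x = G(z, y). Toy architecture."""
    def __init__(self, z_dim=128, num_classes=10, img_pixels=3 * 64 * 64):
        super().__init__()
        self.embed = nn.Embedding(num_classes, z_dim)  # label -> embedding
        self.net = nn.Sequential(
            nn.Linear(2 * z_dim, 512), nn.ReLU(),
            nn.Linear(512, img_pixels), nn.Tanh(),
        )

    def forward(self, z, y):
        # Condition the generator by concatenating the label embedding with the noise.
        return self.net(torch.cat([z, self.embed(y)], dim=1)).view(-1, 3, 64, 64)

z = torch.randn(4, 128)                 # random noise
x_uncond = UnconditionalGenerator()(z)  # unconditional sample: G(z)
y = torch.randint(0, 10, (4,))          # class labels acting as the condition
x_cond = ConditionalGenerator()(z, y)   # conditional sample: G(z, y)
```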

Recent advances, especially Generative Adversarial Networks (GANs), have achieved remarkable success in various image and video generation tasks, demonstrating the strong ability of deep neural networks to capture high-dimensional distributions of visual data. Despite this success, gaps between real and generated images can still be observed in certain cases, and these may be detected directly by the human eye or by tools such as frequency analysis.
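
As one illustration of such a tool, the sketch below (NumPy only; the image arrays are random placeholders standing in for real and generated samples) computes the log-magnitude Fourier spectrum of an image and its azimuthal average, a common frequency-analysis diagnostic in which generated images often deviate from real ones at high spatial frequencies. This is a minimal sketch of the general technique, not the specific analysis used in any particular paper.

```python
import numpy as np

def log_spectrum(img):
    """2D log-magnitude Fourier spectrum of a grayscale image of shape (H, W)."""
    f = np.fft.fftshift(np.fft.fft2(img))
    return np.log1p(np.abs(f))

def azimuthal_average(spectrum):
    """Average the spectrum over rings of constant radius around the center,
    yielding a 1D power profile over spatial frequency."""
    h, w = spectrum.shape
    y, x = np.indices((h, w))
    r = np.hypot(x - w // 2, y - h // 2).astype(int)
    sums = np.bincount(r.ravel(), weights=spectrum.ravel())
    counts = np.bincount(r.ravel())
    return sums / np.maximum(counts, 1)

# Placeholder images; in practice, load real and generated samples here.
real = np.random.rand(256, 256)
fake = np.random.rand(256, 256)

profile_real = azimuthal_average(log_spectrum(real))
profile_fake = azimuthal_average(log_spectrum(fake))
# Generated images often show anomalies at the high-frequency end of this
# profile, making the real/fake gap detectable beyond the human eye.
print(profile_real[-10:])
print(profile_fake[-10:])
```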

Liming’s research tackles various unresolved problems in high-fidelity image and video generation, with subtopics including image-to-image translation, face/human manipulation and editing, multi-modal synthesis, attention-based generative models, 3D generation, and neural rendering. Notably, some of his publications target training state-of-the-art generative models (e.g., StyleGAN2) to achieve better synthesis quality with less data.

Click on the video below to view a presentation on the research project!