My main research focus is text-to-video generation, video diffusion models, and related downstream applications.
I am also working on other AIGC-related research💖.
Research collaborations and discussions of any kind are welcome!
Our lab is hiring engineering-oriented research assistants (RAs). If you would like to apply, feel free to reach out with your CV!
- [10/2024] Invited talk at the ECCV 2024 VENUE (Recent Advances in Video Content Understanding and Generation) Tutorial
on the topic of "LLMs Meet Image and Video Generation".
The slides are available at our GitHub repo.
- [08/2024] Invited talk at Tencent Hunyuan
on the topic of "LLMs Meet Multimodal Generation and Editing".
- [08/2024] 1 paper was accepted to the ECCV 2024 AI4VA Workshop.
- [07/2024] 1 paper was accepted to SIGGRAPH Asia 2024.
- [07/2024] 1 paper was accepted to ECCV 2024.
- [05/2024] We released a survey paper: LLMs Meet Multimodal Generation and Editing: A Survey.
- [03/2024] Invited talk at Meituan on the topic of "Recent Advance of Text-to-Video Generation".
- [03/2024] 1 paper was accepted to CVPR 2024.
- [02/2024] 1 paper was accepted to TVCG 2024.
- [01/2024] 2 papers were accepted to ICLR 2024 (including 1 Spotlight paper).
- [12/2023] 1 paper was accepted to AAAI 2024.
- [11/2023] Released VideoCrafter 1.
- [08/2023] 1 paper was accepted to SIGGRAPH Asia 2023.
- [06/2023] Invited talk at CVPR 2023 LOVEU Workshop on the topic of "Crafting Your Videos: From Unconditional to
Controllable Video Diffusion Models".
- [04/2023] Released VideoCrafter 0.9.
- [08/2021] 1 paper was accepted to ACM MM 2021 as an Oral paper.
MMTrail is a large-scale multi-modality video-language dataset containing over 20M trailer clips. Its high-quality multimodal captions integrate context, visual frames, and background music, enabling cross-modality studies and fine-grained multimodal language model training.
This survey covers works on image, video, 3D, and audio generation and editing.
We emphasize the role of LLMs in the generation and editing of these modalities.
We also include works on multimodal agents and generative AI safety.
Given a text description and video structure (depth), our approach generates temporally coherent and high-fidelity videos. Its applications include dynamic-3D-scene-to-video creation, real-life-scene-to-video generation, and video rerendering.
We propose an unsupervised method for portrait shadow removal that leverages the facial priors of StyleGAN2.
Our approach also supports facial tattoo and watermark removal.
Academic Services
Conference Reviewer for CVPR and SIGGRAPH Asia.
Journal Reviewer for TPAMI, IJCV, and ACM Computing Surveys.