Yingqing He

Hi 👋😋. I am currently a Ph.D. student at HKUST supervised by Prof. Qifeng Chen.

My main research focus is text-to-video generation, video diffusion models, and related downstream applications. I am also working on other AIGC-related research💖.

I welcome all types of research collaboration and discussion! Our lab is hiring engineering-oriented research assistants (RAs). If you would like to apply, feel free to reach out with your CV!

Email  /  Google Scholar  /  Github  /  LinkedIn


News

- [10/2024] Invited talk at the ECCV 2024 VENUE Tutorial (Recent Advances in Video Content Understanding and Generation) on the topic of "LLMs Meet Image and Video Generation". The slides are available in our Github repo.
- [08/2024] Invited talk at Tencent Hunyuan on the topic of "LLMs Meet Multimodal Generation and Editing".
- [08/2024] 1 paper was accepted to the ECCV 2024 AI4VA Workshop.
- [07/2024] 1 paper was accepted to SIGGRAPH Asia 2024.
- [07/2024] 1 paper was accepted to ECCV 2024.
- [05/2024] We released a survey paper: LLMs Meet Multimodal Generation and Editing: A Survey.
- [03/2024] Invited talk at Meituan on the topic of "Recent Advance of Text-to-Video Generation".
- [03/2024] 1 paper was accepted to CVPR 2024.
- [02/2024] 1 paper was accepted to TVCG 2024.
- [01/2024] 2 papers were accepted to ICLR 2024 (including 1 Spotlight paper).
- [12/2023] 1 paper was accepted to AAAI 2024.
- [11/2023] Released VideoCrafter1.
- [08/2023] 1 paper was accepted to SIGGRAPH Asia 2023.
- [06/2023] Invited talk at CVPR 2023 LOVEU Workshop on the topic of "Crafting Your Videos: From Unconditional to Controllable Video Diffusion Models".
- [04/2023] Released VideoCrafter 0.9.
- [08/2021] 1 paper was accepted to ACM MM 2021 as an Oral paper.

Publications

MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions
Xiaowei Chi, Aosong Cheng, Pengjun Fang, Yatian Wang, Zeyue Tian, Yingqing He, Zhaoyang Liu, Xingqun Qi, Jiahao Pan, Rongyu Zhang, Mengfei Li, Yanbing Jiang, Wei Xue, Wenhan Luo, Qifeng Chen, Shanghang Zhang, Qifeng Liu, Yike Guo
arXiv, 2024
Project page / arXiv / Github

MMTrail is a large-scale multimodal video-language dataset with over 20M trailer clips. Its high-quality multimodal captions integrate context, visual frames, and background music, supporting cross-modality studies and fine-grained multimodal language model training.

FreeTraj: Tuning-Free Trajectory Control in Video Diffusion Models
Haonan Qiu, Zhaoxi Chen, Zhouxia Wang, Yingqing He, Menghan Xia, Ziwei Liu
arXiv, 2024
Project page / arXiv / Github

FreeTraj is a tuning-free method for trajectory-controllable video generation based on pre-trained video diffusion models.

LLMs Meet Multimodal Generation and Editing: A Survey
Yingqing He, Zhaoyang Liu, Jingye Chen, Zeyue Tian, Hongyu Liu, Xiaowei Chi, Runtao Liu, Ruibin Yuan, Yazhou Xing, Wenhai Wang, Jifeng Dai, Yong Zhang, Wei Xue, Qifeng Liu, Yike Guo, Qifeng Chen
arXiv, 2024
arXiv / Github

This survey covers works on image, video, 3D, and audio generation and editing, emphasizing the roles of LLMs in generating and editing these modalities. It also covers works on multimodal agents and generative AI safety.

Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts
Yue Ma*, Yingqing He*, Hongfa Wang, Andong Wang, Chenyang Qi, Chengfei Cai, Xiu Li, Zhifeng Li, Heung-Yeung Shum, Wei Liu, Qifeng Chen
arXiv, 2024
Project page / arXiv / Github

Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait Animation
Yue Ma, Hongyu Liu, Hongfa Wang, Heng Pan, Yingqing He, Junkun Yuan, Ailing Zeng, Chengfei Cai, Heung-Yeung Shum, Wei Liu, Qifeng Chen
SIGGRAPH Asia, 2024
Project page / arXiv

Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation
Yingqing He, Menghan Xia, Haoxin Chen, Xiaodong Cun, Yuan Gong, Jinbo Xing, Yong Zhang, Xintao Wang, Chao Weng, Ying Shan, Qifeng Chen
ECCV AI4VA Workshop, 2024
Project page / arXiv / Github

A novel story-to-video pipeline with both structure and character controls, facilitating the generation of a vlog for a teddy bear.

Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation
Lanqing Guo*, Yingqing He*, Haoxin Chen, Menghan Xia, Xiaodong Cun, Yufei Wang, Siyu Huang, Yong Zhang, Xintao Wang, Qifeng Chen, Ying Shan, Bihan Wen
ECCV, 2024
Project page / arXiv / Github

Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
Yazhou Xing*, Yingqing He*, Zeyue Tian, Xintao Wang, Qifeng Chen
CVPR, 2024
Project page / arXiv / Github

ScaleCrafter: Tuning-Free Higher-Resolution Visual Generation with Diffusion Models
Yingqing He*, Shaoshu Yang, Haoxin Chen, Xiaodong Cun, Menghan Xia, Yong Zhang, Xintao Wang, Ran He, Qifeng Chen, Ying Shan
ICLR, 2024 (Spotlight)
Project page / arXiv / Github

Generating 16× higher-resolution images and 4× higher-resolution videos without any extra data or training.

MagicStick: Controllable Video Editing via Control Handle Transformations
Yue Ma, Xiaodong Cun, Yingqing He, Chenyang Qi, Xintao Wang, Ying Shan, Xiu Li, Qifeng Chen
arXiv, 2023
Project page / arXiv / Github

FreeNoise: Tuning-Free Longer Video Diffusion via Noise Rescheduling
Haonan Qiu, Menghan Xia, Yong Zhang, Yingqing He, Xintao Wang, Ying Shan, Ziwei Liu
ICLR, 2024
Project page / arXiv / Github

Follow Your Pose: Pose-Guided Text-to-Video Generation Using Pose-Free Videos
Yue Ma*, Yingqing He*, Xiaodong Cun, Xintao Wang, Siran Chen, Ying Shan, Xiu Li, Qifeng Chen
AAAI, 2024
Project page / arXiv / Github

Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance
Jinbo Xing, Menghan Xia, Yuxin Liu, Yuechen Zhang, Yong Zhang, Yingqing He, Hanyuan Liu, Haoxin Chen, Xiaodong Cun, Xintao Wang, Ying Shan, Tien-Tsin Wong
TVCG, 2024
Project page / arXiv

Given a text description and video structure (depth), our approach generates temporally coherent, high-fidelity videos. Applications include dynamic 3D-scene-to-video creation, real-life-scene-to-video generation, and video re-rendering.

TaleCrafter: Interactive Story Visualization with Multiple Characters
Yuan Gong, Youxi Pang, Xiaodong Cun, Menghan Xia, Yingqing He, Haoxin Chen, Longyue Wang, Yong Zhang, Xintao Wang, Ying Shan, Yujiu Yang
SIGGRAPH Asia, 2023
Project page / arXiv / Github

VideoCrafter1: Open Diffusion Models for High-Quality Video Generation
Haoxin Chen*, Menghan Xia*, Yingqing He*, Yong Zhang, Xiaodong Cun, Shaoshu Yang, Jinbo Xing, Yaofang Liu, Qifeng Chen, Xintao Wang, Chao Weng, Ying Shan
arXiv, 2023
Project page / arXiv / Github

An open-source foundational text-to-video and image-to-video diffusion model for high-quality video generation.

Latent Video Diffusion Models for High-Fidelity Long Video Generation
Yingqing He, Tianyu Yang, Yong Zhang, Ying Shan, Qifeng Chen
arXiv, 2022
Project page / arXiv / Github

Interpreting Class Conditional GANs with Channel Awareness
Yingqing He, Zhiyi Zhang, Jiapeng Zhu, Yujun Shen, Qifeng Chen
arXiv, 2022
Project page / arXiv / Github

Unsupervised Portrait Shadow Removal via Generative Priors
Yingqing He*, Yazhou Xing, Tianjia Zhang, Qifeng Chen
ACM MM, 2021 (Oral)
arXiv / Github

We propose an unsupervised method for portrait shadow removal, leveraging facial priors from StyleGAN2. Our approach also supports facial tattoo and watermark removal.

Academic Services

Conference Reviewer for CVPR, SIGGRAPH Asia.
Journal Reviewer for TPAMI, IJCV, ACM Computing Surveys.


Webpage template is borrowed from this.