Yingqing He
Hi there👋😋.
I am a Ph.D. student at HKUST, supervised by Prof. Qifeng Chen .
My research focuses on text-to-video generation, multimodal generation, and controllable generation.
I welcome any form of research collaboration and discussion!
Email /
Google Scholar /
Github /
LinkedIn
News
- [12/2024] 1 paper was accepted to AAAI 2025.
- [10/2024] 1 paper was accepted to WACV 2025.
- [08/2024] 1 paper was accepted to the ECCV 2024 AI4VA Workshop.
- [07/2024] 1 paper was accepted to SIGGRAPH Asia 2024.
- [07/2024] 1 paper was accepted to ECCV 2024.
- [05/2024] We released a survey paper: LLMs Meet Multimodal Generation and Editing: A Survey.
- [03/2024] 1 paper was accepted to CVPR 2024.
- [02/2024] 1 paper was accepted to TVCG 2024.
- [01/2024] 2 papers were accepted to ICLR 2024 (including 1 Spotlight paper).
- [12/2023] 1 paper was accepted to AAAI 2024.
- [11/2023] We released VideoCrafter1.
- [08/2023] 1 paper was accepted to SIGGRAPH Asia 2023.
- [04/2023] We released VideoCrafter 0.9.
- [08/2021] 1 paper was accepted to ACM MM 2021 as an Oral paper.
MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions
Xiaowei Chi ,
Aosong Cheng ,
Pengjun Fang ,
Yatian Wang ,
Zeyue Tian ,
Yingqing He ,
Zhaoyang Liu ,
Xingqun Qi ,
Jiahao Pan ,
Rongyu Zhang ,
Mengfei Li ,
Yanbing Jiang ,
Wei Xue ,
Wenhan Luo ,
Qifeng Chen ,
Shanghang Zhang ,
Qifeng Liu ,
Yike Guo
arXiv , 2024
Project page /
arXiv /
Github
MMTrail is a large-scale multimodal video-language dataset with over 20M trailer clips. Its high-quality multimodal captions integrate context, visual frames, and background music, supporting cross-modality studies and fine-grained multimodal-language model training.
FreeTraj: Tuning-Free Trajectory Control in Video Diffusion Models
Haonan Qiu ,
Zhaoxi Chen ,
Zhouxia Wang ,
Yingqing He ,
Menghan Xia ,
Ziwei Liu
arXiv , 2024
Project page /
arXiv /
Github
FreeTraj is a tuning-free method for trajectory-controllable video generation based on pre-trained video diffusion models.
LLMs Meet Multimodal Generation and Editing: A Survey
Yingqing He ,
Zhaoyang Liu ,
Jingye Chen ,
Zeyue Tian ,
Hongyu Liu ,
Xiaowei Chi ,
Runtao Liu ,
Ruibin Yuan ,
Yazhou Xing ,
Wenhai Wang ,
Jifeng Dai ,
Yong Zhang ,
Wei Xue ,
Qifeng Liu ,
Yike Guo ,
Qifeng Chen
arXiv , 2024
arXiv /
Github
This survey covers works on image, video, 3D, and audio generation and editing, emphasizing the roles of LLMs in generating and editing these modalities.
It also covers works on multimodal agents and generative AI safety.
Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts
Yue Ma * ,
Yingqing He * ,
Hongfa Wang ,
Andong Wang ,
Chenyang Qi ,
Chengfei Cai ,
Xiu Li ,
Zhifeng Li ,
Heung-Yeung Shum ,
Wei Liu ,
Qifeng Chen
AAAI , 2025
Project page /
arXiv /
Github
Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait Animation
Yue Ma ,
Hongyu Liu ,
Hongfa Wang ,
Heng Pan ,
Yingqing He ,
Junkun Yuan ,
Ailing Zeng ,
Chengfei Cai ,
Heung-Yeung Shum ,
Wei Liu ,
Qifeng Chen
SIGGRAPH Asia , 2024
Project page /
arXiv
Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation
Yingqing He * ,
Menghan Xia * ,
Haoxin Chen * ,
Xiaodong Cun ,
Yuan Gong,
Jinbo Xing ,
Yong Zhang ,
Xintao Wang ,
Chao Weng,
Ying Shan ,
Qifeng Chen
ECCV AI4VA Workshop , 2024  
Project page /
arXiv /
Github
A novel story-to-video pipeline with both structure and character controls, facilitating the generation of a vlog for a teddy bear.
Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation
Lanqing Guo * ,
Yingqing He * ,
Haoxin Chen ,
Menghan Xia ,
Xiaodong Cun ,
Yufei Wang ,
Siyu Huang,
Yong Zhang ,
Xintao Wang ,
Qifeng Chen ,
Ying Shan ,
Binhan Wen
ECCV , 2024
Project page /
arXiv /
Github
Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
Yazhou Xing * ,
Yingqing He * ,
Zeyue Tian * ,
Xintao Wang ,
Qifeng Chen
CVPR , 2024
Project page /
arXiv /
Github
ScaleCrafter: Tuning-Free Higher-Resolution Visual Generation with Diffusion Models
Yingqing He * ,
Shaoshu Yang * ,
Haoxin Chen ,
Xiaodong Cun ,
Menghan Xia ,
Yong Zhang ,
Xintao Wang ,
Ran He,
Qifeng Chen ,
Ying Shan
ICLR , 2024   (Spotlight)
Project page /
arXiv /
Github
Generating 16x higher-resolution images and 4x higher-resolution videos without any extra data and training effort.
MagicStick: Controllable Video Editing via Control Handle Transformations
Yue Ma ,
Xiaodong Cun ,
Yingqing He ,
Chenyang Qi ,
Xintao Wang ,
Ying Shan ,
Xiu Li,
Qifeng Chen
WACV , 2025
Project page /
arXiv /
Github
FreeNoise: Tuning-Free Longer Video Diffusion via Noise Rescheduling
Haonan Qiu ,
Menghan Xia ,
Yong Zhang ,
Yingqing He ,
Xintao Wang ,
Ying Shan ,
Ziwei Liu
ICLR , 2024
Project page /
arXiv /
Github
Follow Your Pose: Pose-Guided Text-to-Video Generation Using Pose-Free Videos
Yue Ma * ,
Yingqing He * ,
Xiaodong Cun ,
Xintao Wang ,
Siran Chen,
Ying Shan ,
Xiu Li,
Qifeng Chen
AAAI , 2024
Project page /
arXiv /
Github
Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance
Jinbo Xing ,
Menghan Xia ,
Yuxin Liu,
Yuechen Zhang ,
Yong Zhang ,
Yingqing He ,
Hanyuan Liu ,
Haoxin Chen,
Xiaodong Cun ,
Xintao Wang ,
Ying Shan ,
Tien-Tsin Wong
TVCG , 2024
Project page /
arXiv
Given a text description and video structure (depth), our approach generates temporally coherent and high-fidelity videos. Applications include dynamic 3D-scene-to-video creation, real-life-scene-to-video generation, and video re-rendering.
TaleCrafter: Interactive Story Visualization with Multiple Characters
Yuan Gong,
Youxi Pang,
Xiaodong Cun ,
Menghan Xia ,
Yingqing He ,
Haoxin Chen,
Longyue Wang,
Yong Zhang ,
Xintao Wang ,
Ying Shan ,
Yujiu Yang
SIGGRAPH Asia , 2023
Project page /
arXiv /
Github
VideoCrafter1: Open Diffusion Models for High-Quality Video Generation
Haoxin Chen * ,
Menghan Xia * ,
Yingqing He * ,
Yong Zhang ,
Xiaodong Cun ,
Shaoshu Yang,
Jinbo Xing ,
Yaofang Liu,
Qifeng Chen ,
Xintao Wang ,
Chao Weng,
Ying Shan
arXiv , 2023
Project page /
arXiv /
Github
An open-source foundation text-to-video and image-to-video diffusion model for high-quality video generation.
Latent Video Diffusion Models for High-Fidelity Long Video Generation
Yingqing He ,
Tianyu Yang ,
Yong Zhang ,
Ying Shan ,
Qifeng Chen
arXiv , 2022
Project page /
arXiv /
Github
Interpreting class conditional GANs with channel awareness
Yingqing He ,
Zhiyi Zhang,
Jiapeng Zhu,
Yujun Shen ,
Qifeng Chen
arXiv , 2022
Project page /
arXiv /
Github
Unsupervised portrait shadow removal via generative priors
Yingqing He * ,
Yazhou Xing * ,
Tianjia Zhang,
Qifeng Chen
ACM MM , 2021   (Oral)
arXiv /
Github
We propose an unsupervised method for portrait shadow removal that leverages the facial priors of StyleGAN2 .
Our approach also supports facial tattoo and watermark removal.
Academic Services
Conference Reviewer for CVPR, ICLR, SIGGRAPH Asia.
Journal Reviewer for TPAMI, IJCV, ACM Computing Surveys.
Webpage template is borrowed from this