Hi there! I am Joy Yingqing He. 👋
I received my Ph.D. degree from HKUST in 2025, and was honored with a Best Research Award from HKUST in 2026.
My research journey began in 2021 under the supervision of Prof. Qifeng Chen, focusing on Generative AI. My first first-author paper, a project completed in one month, was accepted as an Oral presentation at ACM MM 2021.
During two years (2022-2024) working on 0-to-1 video pre-training and video foundational architectures at Tencent AI Lab, I published 10 papers in 2024, including 7 lead-authored publications (2 first-author, one of which is an ICLR 2024 Spotlight, and 5 co-first-author) at top-tier AI conferences such as ICLR, SIGGRAPH Asia, CVPR, ICCV, and AAAI.
These works provide systematic solutions for large-scale video generation, spanning controllability, multi-modality, and foundational quality.
Prior to 2021, I was an MPhil student in the HKUST InnoX program (2019-2021), led by Prof. Zexiang Li, where I engaged in multidisciplinary exploration across technological frontiers such as robotics.
- [11/2025] 1 paper was accepted to AAAI 2026 as an Oral paper.
- [09/2025] 2 papers were accepted to ICCV 2025.
- [03/2025] 1 paper was accepted to CVPR 2025.
- [12/2024] 1 paper was accepted to AAAI 2025.
- [10/2024] 1 paper was accepted to WACV 2025.
- [08/2024] 1 paper was accepted to ECCV 2024 AI4VA Workshop.
- [07/2024] 1 paper was accepted to SIGGRAPH Asia 2024.
- [07/2024] 1 paper was accepted to ECCV 2024.
- [05/2024] We released a survey paper: LLMs Meet Multimodal Generation and Editing: A Survey.
- [03/2024] 1 paper was accepted to CVPR 2024.
- [02/2024] 1 paper was accepted to TVCG 2024.
- [01/2024] 2 papers were accepted to ICLR 2024 (including 1 Spotlight paper).
- [12/2023] 1 paper was accepted to AAAI 2024.
- [11/2023] We released VideoCrafter 1.
- [08/2023] 1 paper was accepted to SIGGRAPH Asia 2023.
- [04/2023] We released VideoCrafter 0.9.
- [08/2021] 1 paper was accepted to ACM MM 2021 as an Oral paper.
Talks
[12/2024] Invited talk, The 20th CSIG Conference on Young Scientists (CSIG 2024), Hangzhou, China.
VideoTuna is, to the best of our knowledge, the first repo that integrates multiple AI video generation models (text-to-video, image-to-video, and text-to-image) for fine-tuning and post-training.
Additionally, VideoTuna provides a comprehensive video generation pipeline, covering pre-training, continuous training, post-training (alignment), and fine-tuning.
MMTrail is a large-scale multimodal video-language dataset with over 20M trailer clips. Its high-quality multimodal captions integrate context, visual frames, and background music, aiming to support cross-modality studies and fine-grained multimodal language model training.
This survey includes works of image, video, 3D, and audio generation and editing.
We emphasize the roles of LLMs on the generation and editing of these modalities.
We also include works on multimodal agents and generative AI safety.
Given a text description and video structure (depth), our approach generates temporally coherent and high-fidelity videos. Its applications include dynamic 3D-scene-to-video creation, real-life scene-to-video generation, and video re-rendering.
We propose an unsupervised method for portrait shadow removal, leveraging facial priors from StyleGAN2.
Our approach also supports facial tattoo and watermark removal.
Internships
Tencent AI Lab | Research Intern, Video Diffusion Models | 2022 – 2024