Awesome-LLMs-meet-Multimodal-Generation

LLMs Meet Multimodal Generation and Editing: A Survey

🤗 Introduction


📋 Contents

💘 Tips

๐Ÿ“ Multimodal Generation

Image Generation

🔅 LLM-based

Non-LLM-based (CLIP/T5)

Datasets

Video Generation

🔅 LLM-based

Non-LLM-based

Datasets

3D Generation

🔅 LLM-based

Non-LLM-based (CLIP/T5)

Datasets

Audio Generation

🔅 LLM-based

Non-LLM-based

Generation with Multiple Modalities

🔅 LLM-based

Non-LLM-based

๐Ÿ“ Multimodal Editing

Image Editing

🔅 LLM-based

Non-LLM-based (CLIP/T5)

Video Editing

🔅 LLM-based

Non-LLM-based (CLIP/T5)

3D Editing

🔅 LLM-based

Non-LLM-based (CLIP/T5)

Audio Editing

🔅 LLM-based

Non-LLM-based (CLIP/T5)

๐Ÿ“ Multimodal Agents

๐Ÿ“ Multimodal Understanding with LLMs

Multiple Modalities

Image Understanding

Video Understanding

3D Understanding

Audio Understanding

๐Ÿ“ Multimodal LLM Safety

Attack

Defense and Detection

Alignment

Datasets

3D, Video and Audio Safety

๐Ÿ“ Related Surveys

LLM

Vision

๐Ÿ‘จโ€๐Ÿ’ป Team

Here is the list of contributors for each modality covered in this repository.

| Modality/Task | Contributors |
| --- | --- |
| Image Generation | Jingye Chen, Xiaowei Chi, Yingqing He |
| Video Generation | Yingqing He, Xiaowei Chi, Jingye Chen |
| Image and Video Editing | Yazhou Xing |
| 3D Generation and Editing | Hongyu Liu |
| Audio Generation and Editing | Zeyue Tian, Ruibin Yuan |
| LLM Agent | Zhaoyang Liu |
| Safety | Runtao Liu |
| Leaders | Yingqing He, Zhaoyang Liu |

😉 Citation

If you find this work useful in your research, please cite the paper as below:

@article{he2024llms,
    title={LLMs Meet Multimodal Generation and Editing: A Survey},
    author={He, Yingqing and Liu, Zhaoyang and Chen, Jingye and Tian, Zeyue and Liu, Hongyu and Chi, Xiaowei and Liu, Runtao and Yuan, Ruibin and Xing, Yazhou and Wang, Wenhai and Dai, Jifeng and Zhang, Yong and Xue, Wei and Liu, Qifeng and Guo, Yike and Chen, Qifeng},
    journal={arXiv preprint arXiv:2405.19334},
    year={2024},
}

โญ๏ธ Star History

Star History Chart