<img src="assets/fig.jpg" width="300">
</p>
Press Ctrl + F and type the author name; the dropdown list of authors will expand automatically while you search.

Tags: customization, interactive, human motion generation, tokenizer. (More tags are on the way.)

I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models (12 Feb 2025)
MetaMorph: Multimodal Understanding and Generation via Instruction Tuning (18 Dec 2024)
X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models (2 Dec 2024)
Cosmos Tokenizer: A suite of image and video neural tokenizers (06 Nov 2024)
tokenizer
[ICLR 2025 Spotlight] Rare-to-Frequent: Unlocking Compositional Generation Power of Diffusion Models on Rare Concepts with LLM Guidance (29 Oct 2024)
ElasticTok: Adaptive Tokenization for Image and Video (10 Oct 2024)
tokenizer
DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation (10 Oct 2024)
VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation (6 Sep 2024)
OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation (13 Jun 2024)
tokenizer
InstantUnify: Integrates Multimodal LLM into Diffusion Models (Aug 2024)
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation (22 Aug 2024)
Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions (11 Jun 2024)
T2S-GPT: Dynamic Vector Quantization for Autoregressive Sign Language Production from Text (11 Jun 2024)
Open-World Human-Object Interaction Detection via Multi-modal Prompts (11 Jun 2024)
Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? (11 Jun 2024)
An Image is Worth 32 Tokens for Reconstruction and Generation (11 Jun 2024)
TRINS: Towards Multimodal Language Models that Can Read (10 Jun 2024)
[LlamaGen] Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation (10 Jun 2024)
Chameleon: Mixed-Modal Early-Fusion Foundation Models (16 May 2024)
SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation (22 Apr 2024)
Graphic Design with Large Multimodal Model (22 Apr 2024)
PMG: Personalized Multimodal Generation with Large Language Models (7 Apr 2024)
MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control (19 Mar 2024)
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment (8 Mar 2024)
StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis (30 Jan 2024)
tokenizer
DiffusionGPT: LLM-Driven Text-to-Image Generation System (18 Jan 2024)
StarVector: Generating Scalable Vector Graphics Code from Images (17 Dec 2023)
VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation (14 Dec 2023)
StoryGPT-V: Large Language Models as Consistent Story Visualizers (13 Dec 2023)
GENIXER: Empowering Multimodal Large Language Models as a Powerful Data Generator (11 Dec 2023)
Customization Assistant for Text-to-image Generation (5 Dec 2023)
customization
ChatIllusion: Efficient-Aligning Interleaved Generation ability with Visual Instruction Model (29 Nov 2023)
DreamSync: Aligning Text-to-Image Generation with Image Understanding Feedback (29 Nov 2023)
COLE: A Hierarchical Generation Framework for Graphic Design (28 Nov 2023)
TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering (28 Nov 2023)
LLMGA: Multimodal Large Language Model based Generation Assistant (27 Nov 2023)
Self-correcting LLM-controlled Diffusion Models (27 Nov 2023)
[ParaDiffusion] Paragraph-to-Image Generation with Information-Enriched Diffusion Model (29 Nov 2023)
[TEAL] Tokenize and Embed ALL for Multi-modal Large Language Models (8 Nov 2023)
tokenizer
WordArt Designer: User-Driven Artistic Typography Synthesis using Large Language Models (20 Oct 2023)
LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts (16 Oct 2023)
Making Multimodal Generation Easier: When Diffusion Models Meet LLMs (13 Oct 2023)
Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation (12 Oct 2023)
OpenLEAF: Open-Domain Interleaved Image-Text Generation and Evaluation (11 Oct 2023)
Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models (11 Oct 2023)
[DALL-E 3] Improving Image Generation with Better Captions
MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens (3 Oct 2023)
Kaizhi Zheng, Xuehai He, Xin Eric Wang.
Making LLaMA SEE and Draw with SEED Tokenizer (2 Oct 2023)
tokenizer
InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists (30 Sep 2023)
InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition (26 Sep 2023)
Text-to-Image Generation for Abstract Concepts (26 Sep 2023)
DreamLLM: Synergistic Multimodal Comprehension and Creation (20 Sep 2023)
SwitchGPT: Adapting Large Language Models for Non-Text Outputs (14 Sep 2023)
Wang, Xinyu, Bohan Zhuang, and Qi Wu.
NExT-GPT: Any-to-Any Multimodal LLM (11 Sep 2023)
LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation (9 Aug 2023)
Planting a SEED of Vision in Large Language Model (16 Jul 2023)
Generative Pretraining in Multimodality (11 Jul 2023)
SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs (30 Jun 2023)
Controllable Text-to-Image Generation with GPT-4 (29 May 2023)
Generating Images with Multimodal Language Models (26 May 2023)
[NeurIPS 2023] Koh, Jing Yu, Daniel Fried, and Ruslan Salakhutdinov.
LayoutGPT: Compositional Visual Planning and Generation with Large Language Models (24 May 2023)
Visual Programming for Text-to-Image Generation and Evaluation (24 May 2023)
[NeurIPS 2023] Jaemin Cho, Abhay Zala, Mohit Bansal.
LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models (23 May 2023)
Interactive Data Synthesis for Systematic Vision Adaptation via LLMs-AIGCs Collaboration (22 May 2023)
LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation (18 May 2023)
SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models (9 May 2023)
Grounding Language Models to Images for Multimodal Inputs and Outputs (31 Jan 2023)
[ICML 2023] Koh, Jing Yu, Ruslan Salakhutdinov, and Daniel Fried.
[RPG-DiffusionMaster] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (22 Jan 2024)
RealCompo: Balancing Realism and Compositionality Improves Text-to-Image Diffusion Models (20 Feb 2024)
Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models (11 Nov 2024)
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation (3 Apr 2024)
InstantID: Zero-shot Identity-Preserving Generation in Seconds (15 Jan 2024)
PIXART-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis (30 Sep 2023)
TextDiffuser: Diffusion Models as Text Painters (18 May 2023)
TiGAN: Text-Based Interactive Image Generation and Manipulation (Dec 2022)
Tags: interactive
Multi-Concept Customization of Text-to-Image Diffusion (8 Dec 2022)
Tags: customization
DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation (25 Aug 2022)
Tags: customization
An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion (2 Aug 2022)
Tags: customization
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding (23 May 2022)
[NeurIPS 2022]
High-Resolution Image Synthesis with Latent Diffusion Models (20 Dec 2021)
[CVPR 2022 (Oral)]
MIMIC-IT: Multi-Modal In-Context Instruction Tuning (8 Jun 2023)
[LAION-Glyph] GlyphControl: Glyph Conditional Control for Visual Text Generation (29 May 2023)
[MARIO-10M] TextDiffuser: Diffusion Models as Text Painters (18 May 2023)
DataComp: In search of the next generation of multimodal datasets (27 Apr 2023)
[LLava-instruct] Visual Instruction Tuning (17 Apr 2023)
Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text (14 Apr 2023)
Language Is Not All You Need: Aligning Perception with Language Models (27 Feb 2023)
LAION-5B: An open large-scale dataset for training next generation image-text models (16 Oct 2022)
LAION COCO: 600M SYNTHETIC CAPTIONS FROM LAION2B-EN (15 Sep 2022)
[M3W] Flamingo: a Visual Language Model for Few-Shot Learning (29 Apr 2022)
[LAION-FACE] General Facial Representation Learning in a Visual-Linguistic Manner (6 Dec 2021)
[LAION-400M] Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs (3 Nov 2021)
WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning (2 Mar 2021)
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts (17 Feb 2021)
[ALIGN] Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision (11 Feb 2021)
[MS COCO] Microsoft COCO: Common Objects in Context (1 May 2014)
[Im2Text] Describing Images Using 1 Million Captioned Photographs (12 Dec 2011)
[NeurIPS 2011] Vicente Ordonez, Girish Kulkarni, Tamara Berg
Loong: Generating Minute-level Long Videos with Autoregressive Language Models (3 Oct 2024)
Compositional 3D-aware Video Generation with LLM Director (31 Aug 2024)
Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation (19 Aug 2024)
[BSQ-ViT] Image and Video Tokenization with Binary Spherical Quantization (11 Jun 2024)
[Tech Report] Yue Zhao, Yuanjun Xiong, Philipp Krähenbühl
tokenizer
DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation (11 Mar 2024)
[Sora] Video generation models as world simulators (15 Feb 2024)
[LWM] World Model on Million-Length Video And Language With Blockwise RingAttention (13 Feb 2024)
[LGVI] Towards Language-Driven Video Inpainting via Multimodal Large Language Models (18 Jan 2024)
Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization (5 Feb 2024)
tokenizer
VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM (2 Jan 2024)
[PRO-Motion] Plan, Posture and Go: Towards Open-World Text-to-Motion Generation (22 Dec 2023)
VideoPoet: A Large Language Model for Zero-Shot Video Generation (21 Dec 2023)
FlowZero: Zero-Shot Text-to-Video Synthesis with LLM-Driven Dynamic Scene Syntax (27 Nov 2023)
human motion generation
general human motion generation
GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning (21 Nov 2023)
[LVD] LLM-grounded Video Diffusion Models (29 Sep 2023)
VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning (26 Sep 2023)
Free-Bloom: Zero-Shot Text-to-Video Generator with LLM Director and LDM Animator (25 Sep 2023)
[Dysen-VDM] Empowering Dynamics-aware Text-to-Video Diffusion with Large Language Models (26 Aug 2023)
[DirecT2V] Large Language Models are Frame-level Directors for Zero-shot Text-to-Video Generation (23 May 2023)
OSV: One Step is Enough for High-Quality Image to Video Generation (17 Sep 2024)
[PAB] Real-Time Video Generation with Pyramid Attention Broadcast (26 Jun 2024)
Video-Infinity: Distributed Long Video Generation (24 Jun 2024)
Pandora: Towards General World Model with Natural Language Actions and Video (12 Jun 2024)
Text-Animator: Controllable Visual Text Video Generation (25 Jun 2024)
MotionBooth: Motion-Aware Customized Text-to-Video Generation (25 Jun 2024)
FreeTraj: Tuning-Free Trajectory Control in Video Diffusion Models (24 Jun 2024)
Identifying and Solving Conditional Image Leakage in Image-to-Video Diffusion Model (22 Jun 2024)
Image Conductor: Precision Control for Interactive Video Synthesis (21 Jun 2024)
VIDEOSCORE: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation (21 Jun 2024)
Dreamitate: Real-World Visuomotor Policy Learning via Video Generation (24 Jun 2024)
[MCM] Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation (11 Jun 2024)
Searching Priors Makes Text-to-Video Synthesis Better (5 Jun 2024)
ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation (3 Jun 2024)
EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture (30 May 2024)
[MOFT] Video Diffusion Models are Training-free Motion Interpreter and Controller (23 Mar 2024)
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text (21 Mar 2024)
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis (22 Feb 2024)
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models (17 Jan 2024)
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets (25 Nov 2023)
VideoCrafter1: Open Diffusion Models for High-Quality Video Generation (30 Oct 2023)
DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors (18 Oct 2023)
FreeNoise: Tuning-Free Longer Video Diffusion via Noise Rescheduling (23 Oct 2023)
Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation (13 Jul 2023)
Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance (1 Jun 2023)
Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos (3 Apr 2023)
Real-time Controllable Denoising for Image and Video (29 Mar 2023)
VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation (15 Mar 2023)
DLFR-VAE: Dynamic Latent Frame Rate VAE for Video Generation (17 Feb 2025)
VideoVAE+: Large Motion Video Autoencoding with Cross-modal Video VAE (23 Dec 2024)
VidTwin: Video VAE with Decoupled Structure and Dynamics (23 Dec 2024)
VidTok: A Versatile and Open-Source Video Tokenizer (17 Dec 2024)
[CVPR 2025] WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model (26 Nov 2024)
[CVPR 2025] [IV-VAE] Improved Video VAE for Latent Video Diffusion Model (10 Nov 2024)
[Tech Report] Cosmos Tokenizer: A suite of image and video neural tokenizers (6 Nov 2024)
[NeurIPS 2024] CV-VAE: A Compatible Video VAE for Latent Generative Video Models (30 May 2024)
[ICLR 2024] [MAGVIT-v2] Language Model Beats Diffusion – Tokenizer is Key to Visual Generation (9 Oct 2023)
VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models (20 Nov 2024)
[VideoGen-Eval] The Dawn of Video Generation: Preliminary Explorations with SORA-like Models (7 Oct 2024)
ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation (26 Jun 2024)
TAVGBench: Benchmarking Text to Audible-Video Generation (22 Apr 2024)
Sora Generates Videos with Stunning Geometrical Consistency (27 Feb 2024)
[CVPR 2024 Highlight] VBench: Comprehensive Benchmark Suite for Video Generative Models (29 Nov 2023)
[CVPR 2024] EvalCrafter: Benchmarking and Evaluating Large Video Generation Models (23 Mar 2024)
VidGen-1M: A Large-Scale Dataset for Text-to-video Generation (5 Aug 2024)
Vript: A Video Is Worth Thousands of Words (10 Jun 2024)
MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions (30 Jul 2024)
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation (13 Jul 2023)
[HD-VG-130M] VideoFactory: Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation (18 May 2023)
[VideoCC3M] Learning Audio-Video Modalities from Image Captions (1 Apr 2022)
CelebV-Text: A Large-Scale Facial Text-Video Dataset (26 Mar 2023)
[HD-VILA-100M] Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions (19 Nov 2021)
[YT-Temporal-180M] MERLOT: Multimodal Neural Script Knowledge Models (4 Jun 2021)
[WebVid-10M] Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval (1 Apr 2021)
[WTS70M] Learning Video Representations from Textual Web Supervision (29 Jul 2020)
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips (7 Jun 2019)
VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research (6 Apr 2019)
How2: A Large-scale Dataset for Multimodal Language Understanding (7 Jun 2019)
[ActivityNet Captions] Dense-Captioning Events in Videos (2 May 2017)
[LSMDC] Movie Description (12 May 2016)
MSR-VTT: A Large Video Description Dataset for Bridging Video and Language (CVPR 2016)
SceneCraft: An LLM Agent for Synthesizing 3D Scene as Blender Code (2 Mar 2024)
MotionScript: Natural Language Descriptions for Expressive 3D Human Motions (19 Dec 2023)
HOLODECK: Language Guided Generation of 3D Embodied AI Environments (19 Dec 2023)
PoseGPT: Chatting about 3D Human Pose (30 Nov 2023)
3D-GPT: Procedural 3D Modeling with Large Language Models (19 Oct 2023)
DreamPolisher: Towards High-Quality Text-to-3D Generation via Geometric Diffusion (12 Mar 2024)
Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior (12 Mar 2024)
AToM: Amortized Text-to-Mesh using 2D Diffusion (1 Feb 2024)
DreamControl: Control-Based Text-to-3D Generation with 3D Self-Prior (12 Mar 2024)
UniDream: Unifying Diffusion Priors for Relightable Text-to-3D Generation (14 Dec 2023)
Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior (11 Dec 2023)
Learn to Optimize Denoising Scores for 3D Generation: A Unified and Improved Diffusion Prior on NeRF and 3D Gaussian Splatting (8 Dec 2023)
DreamPropeller: Supercharge Text-to-3D Generation with Parallel Sampling (28 Nov 2023)
RichDreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in Text-to-3D (28 Nov 2023)
DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models (30 Nov 2023)
LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching (2 Dec 2023)
GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models (12 Oct 2023)
Text-to-3D using Gaussian Splatting (28 Sep 2023)
EfficientDreamer: High-Fidelity and Robust 3D Creation via Orthogonal-view Diffusion Prior (10 Sep 2023)
TADA! Text to Animatable Digital Avatars (21 Aug 2023)
SweetDreamer: Aligning Geometric Priors in 2D Diffusion for Consistent Text-to-3D (20 Oct 2023)
Noise-Free Score Distillation (26 Oct 2023)
Text-to-3D with Classifier Score Distillation (26 Oct 2023)
HiFA: High-fidelity Text-to-3D Generation with Advanced Diffusion Guidance (28 Nov 2023)
MVDream: Multi-view Diffusion for 3D Generation (31 Aug 2023)
DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation (28 Sep 2023)
Let 2D Diffusion Model Know 3D-Consistency for Robust Text-to-3D Generation (11 Apr 2023)
IT3D: Improved Text-to-3D Generation with Explicit View Synthesis (22 Aug 2023)
HD-Fusion: Detailed Text-to-3D Generation Leveraging Multiple Noise Estimation (30 Jul 2023)
Re-imagine the Negative Prompt Algorithm: Transform 2D Diffusion into 3D, alleviate Janus problem and Beyond (11 Apr 2023)
Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures (14 Nov 2022)
Magic3D: High-Resolution Text-to-3D Content Creation (18 Nov 2022)
Score Jacobian Chaining: Lifting Pretrained 2D Diffusion Models for 3D Generation (1 Dec 2022)
High-fidelity 3D Face Generation from Natural Language Descriptions (5 May 2023)
RODIN: A Generative Model for Sculpting 3D Digital Avatars Using Diffusion (12 Dec 2022)
ClipFace: Text-guided Editing of Textured 3D Morphable Models (24 Apr 2023)
DreamFusion: Text-to-3D using 2D Diffusion (29 Sep 2022)
ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation (25 May 2023)
HeadSculpt: Crafting 3D Head Avatars with Text (25 May 2023)
ATT3D: Amortized Text-to-3D Object Synthesis (6 Jun 2023)
Fantasia3D: Disentangling Geometry and Appearance for High-quality Text-to-3D Content Creation (24 Mar 2023)
Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models (10 Sep 2023)
X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance (28 Mar 2023)
StyleAvatar3D: Leveraging Image-Text Diffusion Models for High-Fidelity 3D Avatar Generation (31 May 2023)
TextMesh: Generation of Realistic 3D Meshes From Text Prompts (24 Apr 2023)
Clip-forge: Towards zero-shot text-to-shape generation (28 Apr 2022)
Zero-Shot Text-Guided Object Generation with Dream Fields (2 Dec 2021)
Text2Mesh: Text-Driven Neural Stylization for Meshes (6 Dec 2021)
TANGO: Text-driven Photorealistic and Robust 3D Stylization via Lighting Decomposition (20 Oct 2022)
CLIP-Mesh: Generating textured meshes from text using pretrained image-text models (24 Mar 2022)
MotionCLIP: Exposing Human Motion Generation to CLIP Space (15 Mar 2022)
Objaverse-XL: A Universe of 10M+ 3D Objects (11 Jul 2023)
Objaverse: A Universe of Annotated 3D Objects (15 Dec 2022)
SongComposer: A Large Language Model for Lyric and Melody Composition in Song Generation (27 Feb 2024)
ChatMusician: Understanding and Generating Music Intrinsically with LLM (25 Feb 2024)
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling (19 Feb 2024)
Boosting Large Language Model for Speech Synthesis: An Empirical Study (30 Dec 2023)
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action (28 Dec 2023)
M2UGen: Multi-modal Music Understanding and Generation with the Power of Large Language Models (19 Nov 2023)
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT (7 Oct 2023)
LLaSM: Large Language and Speech Model (30 Aug 2023)
AudioPaLM: A Large Language Model That Can Speak and Listen (22 Jun 2023)
Pengi: An Audio Language Model for Audio Tasks (19 May 2023)
SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities (18 May 2023)
Sparks of Artificial General Intelligence: Early experiments with GPT-4 (22 Mar 2023)
Audiobox: Unified Audio Generation with Natural Language Prompts (25 Dec 2023)
Apoorv Vyas, Bowen Shi, Matthew Le
Music ControlNet: Multiple Time-varying Controls for Music Generation (13 Nov 2023)
Loop Copilot: Conducting AI Ensembles for Music Generation and Iterative Editing (19 Oct 2023)
MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models (18 Oct 2023)
UniAudio: An Audio Foundation Model Toward Universal Audio Generation (1 Oct 2023)
Dongchao Yang, Jinchuan Tian, Xu Tan
AudioLM: a Language Modeling Approach to Audio Generation (7 Sep 2022)
WavJourney: Compositional Audio Creation with Large Language Models (26 Jul 2023)
Investigating the Utility of Surprisal from Large Language Models for Speech Synthesis Prosody (16 Jun 2023)
Simple and Controllable Music Generation (8 Jun 2023)
Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation (29 May 2023)
Jukebox: A Generative Model for Music (30 Apr 2020)
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head (25 Apr 2023)
TANGO: Text-to-Audio Generation using Instruction Tuned LLM and Latent Diffusion Model (24 Apr 2023)
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face (30 Mar 2023)
[VALL-E] Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers (5 Jan 2023)
Libriheavy: A 50,000 Hours ASR Corpus with Punctuation Casing and Context (15 Sep 2023)
WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition (7 Oct 2021)
VGGSound: A Large-Scale Audio-Visual Dataset (29 Apr 2020)
Libri-Light: A Benchmark for ASR with Limited or No Supervision (17 Dec 2019)
The MTG-Jamendo Dataset for Automatic Music Tagging (15 Jun 2019)
LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech (5 Apr 2019)
Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset (29 Oct 2018)
Audio Set: An ontology and human-labeled dataset for audio events (05 Mar 2017)
LibriSpeech: An ASR Corpus Based on Public Domain Audio Books (19 Apr 2015)
C3LLM: Conditional Multimodal Content Generation Using Large Language Models (25 May 2024)
CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation (30 Nov 2023)
TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models (8 Nov 2023)
tokenizer
NExT-GPT: Any-to-Any Multimodal LLM (11 Sep 2023)
CoDi: Any-to-Any Generation via Composable Diffusion (19 May 2023)
DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-driven Holistic 3D Expression and Gesture Generation (9 Jan 2024)
Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners (27 Feb 2024)
UltraEdit: Instruction-based Fine-Grained Image Editing at Scale (7 Jul 2024)
TIE: Revolutionizing Text-based Image Editing for Complex-Prompt Following and High-Fidelity Editing (27 May 2024)
SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models (11 Dec 2023)
Self-correcting LLM-controlled Diffusion Models (27 Nov 2023)
Emu Edit: Precise Image Editing via Recognition and Generation Tasks (16 Nov 2023)
Guiding Instruction-based Image Editing via Multimodal Large Language Models
CHATEDIT: Towards Multi-turn Interactive Facial Image Editing via Dialogue (20 Mar 2023)
HIVE: Harnessing Human Feedback for Instructional Visual Editing (16 Mar 2023)
Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models (8 Mar 2023)
InstructPix2Pix: Learning to Follow Image Editing Instructions (17 Nov 2022)
[CVPR 2023 (Highlight)] Brooks, Tim, Aleksander Holynski, and Alexei A. Efros.
SeedEdit: Align Image Re-Generation to Image Editing (11 Nov 2024)
Yichun Shi, Peng Wang, Weilin Huang
DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing (4 Feb 2024)
ZONE: Zero-Shot Instruction-Guided Local Editing (28 Dec 2023)
Watch Your Steps: Local Image and Scene Editing by Text Instructions (17 Aug 2023)
DragonDiffusion: Enabling Drag-Style Manipulation on Diffusion Models (5 Jul 2023)
Differential Diffusion: Giving Each Pixel Its Strength (1 Jun 2023)
Visual Instruction Inversion: Image Editing via Visual Prompting (26 Jul 2023)
MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing (17 Apr 2023)
PAIR-Diffusion: A Comprehensive Multimodal Object-Level Image Editor (30 Mar 2023)
Zero-shot Image-to-Image Translation (6 Feb 2023)
SINE: SINgle Image Editing with Text-to-Image Diffusion Models (8 Dec 2022)
Interactive Image Manipulation with Complex Text Instructions (25 Nov 2022)
Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation (22 Nov 2022)
Imagic: Text-Based Real Image Editing with Diffusion Models (17 Oct 2022)
Null-text Inversion for Editing Real Images using Guided Diffusion Models
Prompt-to-Prompt Image Editing with Cross Attention Control
DiffEdit: Diffusion-based semantic image editing with mask guidance (20 Oct 2022)
DiffusionCLIP: Text-Guided Diffusion Models for Robust Image Manipulation (6 Oct 2021)
[CVPR 2022] Gwanghyun Kim, Taesung Kwon, Jong Chul Ye.
SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations (2 Aug 2021)
Consistent Video-to-Video Transfer Using Synthetic Dataset (1 Nov 2023)
Jiaxin Cheng, Tianjun Xiao, Tong He.
InstructVid2Vid: Controllable Video Editing with Natural Language Instructions (21 May 2023)
AudioScenic: Audio-Driven Video Scene Editing (25 Apr 2024)
LatentWarp: Consistent Diffusion Latents for Zero-Shot Video-to-Video Translation (1 Nov 2023)
MagicStick: Controllable Video Editing via Control Handle Transformations (1 Nov 2023)
MagicEdit: High-Fidelity Temporally Coherent Video Editing (28 Aug 2023)
StableVideo: Text-driven Consistency-aware Diffusion Video Editing (18 Aug 2023)
CoDeF: Content Deformation Fields for Temporally Consistent Video Processing (15 Aug 2023)
TokenFlow: Consistent Diffusion Features for Consistent Video Editing (19 Jul 2023)
Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation (13 Jun 2023)
ControlVideo: Adding Conditional Control for One Shot Text-to-Video Editing (26 May 2023)
Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts (15 May 2023)
Michal Geyer, Omer Bar-Tal, Shai Bagon, Tali Dekel.
Pix2Video: Video Editing using Image Diffusion (22 Mar 2023)
[ICCV 2023] Ceylan, Duygu, Chun-Hao P. Huang, and Niloy J. Mitra.
FateZero: Fusing Attentions for Zero-shot Text-based Video Editing (16 Mar 2023)
Video-P2P: Video Editing with Cross-attention Control (8 Mar 2023)
Dreamix: Video Diffusion Models are General Video Editors (2 Feb 2023)
Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation (22 Dec 2022)
M3L: Language-based Video Editing via Multi-Modal Multi-Level Transformers (2 Apr 2021)
SceneCraft: An LLM Agent for Synthesizing 3D Scene as Blender Code (2 Mar 2024)
3D-GPT: Procedural 3D Modeling with Large Language Models (19 Oct 2023)
Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models (16 Nov 2023)
3D Paintbrush: Local Stylization of 3D Shapes with Cascaded Score Distillation (16 Nov 2023)
Blending-NeRF: Text-Driven Localized Editing in Neural Radiance Fields (23 Aug 2023)
SINE: Semantic-driven Image-based NeRF Editing with Prior-guided Editing Field (23 Mar 2023)
TextDeformer: Geometry Manipulation using Text Guidance (26 Apr 2023)
Instruct-NeRF2NeRF: Editing 3D Scenes with Instructions (22 Mar 2023)
DreamEditor: Text-Driven 3D Scene Editing with Neural Fields (23 Jun 2023)
SKED: Sketch-guided Text-based 3D Editing (19 Mar 2023)
Blended-NeRF: Zero-Shot Object Generation and Blending in Existing Neural Radiance Fields (22 Jun 2023)
ClipFace: Text-guided Editing of Textured 3D Morphable Models (2 Dec 2022)
CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields (9 Dec 2021)
Loop Copilot: Conducting AI Ensembles for Music Generation and Iterative Editing (19 Oct 2023)
UniAudio: An Audio Foundation Model Toward Universal Audio Generation (1 Oct 2023)
Dongchao Yang, Jinchuan Tian, Xu Tan
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing (1 Nov 2023)
Tags: Image Chat
Image Segmentation
, Image Generation
Image Editing
ControlLLM: Augment Language Models with Tools by Searching on Graphs (26 Oct 2023)
Tags: Image Understanding
Image Generation
Image Editing
Video Understanding
Video Generation
Video Editing
Audio Understanding
Audio Generation
ImageBind-LLM: Multi-modality Instruction Tuning (7 Sep 2023)
Modalities: text
image
video
audio
point cloud
ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models (2 Sep 2023)
InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language (9 May 2023)
Condition Modality: text
image
video
audio
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face (30 Mar 2023)
Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models (8 Mar 2023)
text, video, audio
Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions (11 Jun 2024)
T2S-GPT: Dynamic Vector Quantization for Autoregressive Sign Language Production from Text (11 Jun 2024)
Open-World Human-Object Interaction Detection via Multi-modal Prompts (11 Jun 2024)
Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? (11 Jun 2024)
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks (21 Dec 2023)
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (28 Nov 2023)
Yanwei Li, Chengyao Wang, Jiaya Jia
CogVLM: Visual Expert for Pretrained Language Models (6 Nov 2023)
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning (14 Oct 2023)
OphGLM: Training an Ophthalmology Large Language-and-Vision Assistant based on Instructions and Dialogue (21 Jun 2023)
InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition (26 Sep 2023)
[LaVIT] Unified Language-Vision Pretraining in LLM with Dynamic Discrete Visual Tokenization (9 Sep 2023)
tokenizer
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond (24 Aug 2023)
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks (18 May 2023)
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning (11 May 2023)
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models (20 Apr 2023)
Visual Instruction Tuning (17 Apr 2023)
StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification (11 Nov 2024)
Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding (22 Sep 2024)
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution (19 Sep 2024)
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs (25 Apr 2024)
PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning (25 Apr 2024)
MovieChat: From Dense Token to Sparse Memory for Long Video Understanding (3 Dec 2023)
Enxin Song, et al.
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (28 Nov 2023)
Yanwei Li, et al.
Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models (27 Nov 2023)
Munan Ning, et al.
PG-Video-LLaVA: Pixel Grounding Large Video-Language Models (22 Nov 2023)
Shehan Munasinghe, et al.
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection (16 Nov 2023)
Bin Lin, et al.
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding (14 Nov 2023)
Peng Jin, et al.
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding (5 Jun 2023)
Hang Zhang, Xin Li, Lidong Bing (EMNLP 2023 demo track)
AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos? (31 Jul 2023)
Qi Zhao, et al.
Valley: Video Assistant with Large Language model Enhanced ability (12 Jun 2023)
Ruipu Luo, et al.
Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models (8 Jun 2023)
Muhammad Maaz, Hanoona Rasheed, Salman Khan, et al.
VideoChat: Chat-Centric Video Understanding (10 May 2023)
KunChang Li, et al.
VideoLLM: Modeling Video Sequence with Large Language Models (22 May 2023)
Guo Chen, et al.
Learning video embedding space with Natural Language Supervision (25 Mar 2023)
Phani Krishna Uppala, Shriti Priya, Vaidehi Joshi
Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding (12 Oct 2024)
Situation3D: Situational Awareness Matters in 3D Vision Language Reasoning (12 Oct 2024)
[CVPR 2024] Yunze Man, Liang-Yan Gui, Yu-Xiong Wang
LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning (30 Nov 2023)
LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding (21 Dec 2023)
Senqiao Yang, Jiaming Liu, Ray Zhang, et al.
3D-LLM: Injecting the 3D World into Large Language Models (24 Jul 2023)
PointLLM: Empowering Large Language Models to Understand Point Clouds (31 Aug 2023)
PointCLIP: Point Cloud Understanding by CLIP (31 Aug 2023)
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action (28 Dec 2023)
M2UGen: Multi-modal Music Understanding and Generation with the Power of Large Language Models (19 Nov 2023)
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models (14 Nov 2023)
SALMONN: Towards Generic Hearing Abilities for Large Language Models (20 Oct 2023)
MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models (18 Oct 2023)
Llark: A multimodal foundation model for music (11 Oct 2023)
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT (7 Oct 2023)
Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation (29 Sep 2023)
Connecting Speech Encoder and Large Language Model for ASR (25 Sep 2023)
Can Whisper perform speech-based in-context learning (13 Sep 2023)
Music understanding LLaMA: Advancing text-to-music generation with question answering and captioning (22 Aug 2023)
On decoder-only architecture for speech-to-text and large language model integration (8 Jul 2023)
AudioPaLM: A Large Language Model That Can Speak and Listen (22 Jun 2023)
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face (30 Mar 2023)
Sparks of Artificial General Intelligence: Early experiments with GPT-4 (22 Mar 2023)
Listen, Think, and Understand (18 May 2023)
SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities (18 May 2023)
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head (25 Apr 2023)
Jailbreaking GPT-4V via Self-Adversarial Attacks with System Prompts (20 Jan 2024)
Defending ChatGPT Against Jailbreak Attack via Self-Reminders (1 Dec 2023)
Misusing Tools in Large Language Models With Visual Adversarial Examples (4 Oct 2023)
Image Hijacks: Adversarial Images can Control Generative Models at Runtime (18 Sep 2023)
Universal and Transferable Adversarial Attacks on Aligned Language Models (27 Jul 2023)
Prompt Injection Attack Against LLM-Integrated Applications (8 Jun 2023)
Automatically Auditing Large Language Models via Discrete Optimization (8 Mar 2023)
Poisoning Web-Scale Training Datasets is Practical (20 Feb 2023)
Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks (11 Feb 2023)
Ignore Previous Prompt: Attack Techniques for Language Models (17 Nov 2022)
Fábio Perez, Ian Ribeiro (NeurIPS 2022 Workshop)
Universal Adversarial Triggers for Attacking and Analyzing NLP (20 Aug 2019)
Adversarial Examples for Evaluating Reading Comprehension Systems (23 Jul 2017)
Robin Jia, Percy Liang (EMNLP 2017)
Detecting and Correcting Hate Speech in Multimodal Memes with Large Visual Language Model (12 Nov 2023)
Minh-Hao Van, Xintao Wu
Detecting Pretraining Data from Large Language Models (3 Nov 2023)
Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations (10 Oct 2023)
Zeming Wei, Yifei Wang, Yisen Wang
SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks (5 Oct 2023)
A Watermark for Large Language Models (6 Jun 2023)
Unsafe Diffusion: On the Generation of Unsafe Images and Hateful Memes From Text-To-Image Models (23 May 2023)
TRAK: Attributing Model Behavior at Scale (3 Apr 2023)
Poisoning Web-Scale Training Datasets is Practical (20 Feb 2023)
Mitigating Inappropriate Degeneration in Diffusion Models (9 Nov 2022)
Extracting Training Data from Large Language Models (15 Jun 2021)
Direct Preference Optimization: Your Language Model is Secretly a Reward Model (13 Dec 2023)
RAFT: Reward Ranked Fine-Tuning for Generative Foundation Model Alignment (1 Dec 2023)
Better Aligning Text-to-Image Models with Human Preference (22 Aug 2023)
Scalable Agent Alignment via Reward Modeling: A Research Direction (19 Nov 2018)
Proximal Policy Optimization Algorithms (20 Jul 2017)
GOAT-Bench: Safety Insights to Large Multimodal Models through Meme-Based Social Abuse (7 Jan 2024)
ToViLaG: Your Visual-Language Generative Model is Also an Evildoer (13 Dec 2023)
FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts (13 Dec 2023)
Query-Relevant Images Jailbreak Large Multi-Modal Models (29 Nov 2023)
DRESS: Instructing Large Vision-Language Models to Align and Interact with Humans via Natural Language Feedback (16 Nov 2023)
BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset (7 Nov 2023)
Can Pre-Trained Vision and Language Models Answer Visual Information-Seeking Questions? (17 Oct 2023)
Can Language Models Be Instructed to Protect Personal Information? (3 Oct 2023)
SafetyBench: Evaluating the Safety of Large Language Models with Multiple Choice Questions (13 Sep 2023)
Safety Assessment of Chinese Large Language Models (20 Apr 2023)
Not My Voice! A Taxonomy of Ethical and Safety Harms of Speech Generators (25 Jan 2024)
Wiebke Hutiri, Orestis Papakyriakopoulos, Alice Xiang
Adv3D: Generating 3D Adversarial Examples in Driving Scenarios with NeRF (4 Sep 2023)
Leheng Li, Qing Lian, Ying-Cong Chen
Deepfake Video Detection Using Generative Convolutional Vision Transformer (13 Jul 2023)
Deressa Wodajo, Solomon Atnafu, Zahid Akhtar
M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection (19 Apr 2022)
Junke Wang, Zuxuan Wu, Wenhao Ouyang, Xintong Han, Jingjing Chen, Ser-Nam Lim, Yu-Gang Jiang
Deepfake Video Detection Using Convolutional Vision Transformer (11 Mar 2021)
Deressa Wodajo, Solomon Atnafu
Deepfakes Generation and Detection: State-of-the-art, Open Challenges, Countermeasures, and Way Forward (25 Feb 2021)
Momina Masood, Marriam Nawaz, Khalid Mahmood Malik, Ali Javed, Aun Irtaza
MM-LLMs: Recent Advances in MultiModal Large Language Models (24 Jan 2024)
A Survey on Multimodal Large Language Models (23 Jun 2023)
Multimodal Large Language Models: A Survey (22 Nov 2023)
A Survey of Large Language Models (31 Mar 2023)
Autoregressive Models in Vision: A Survey (8 Nov 2024)
State of the Art on Diffusion Models for Visual Computing (11 Oct 2023)
Diffusion Models in Vision: A Survey (10 Sep 2022)
Here is the list of contributors for each modality/task in this repository.
| Modality/Task | Contributors |
| --- | --- |
| Image Generation | Jingye Chen, Xiaowei Chi, Yingqing He |
| Video Generation | Yingqing He, Xiaowei Chi, Jingye Chen |
| Image and Video Editing | Yazhou Xing |
| 3D Generation and Editing | Hongyu Liu |
| Audio Generation and Editing | Zeyue Tian, Ruibin Yuan |
| LLM Agent | Zhaoyang Liu |
| Safety | Runtao Liu |
| Leaders | Yingqing He, Zhaoyang Liu |
If you find this work useful in your research, please cite the paper as below:
@article{he2024llms,
title={LLMs Meet Multimodal Generation and Editing: A Survey},
author={He, Yingqing and Liu, Zhaoyang and Chen, Jingye and Tian, Zeyue and Liu, Hongyu and Chi, Xiaowei and Liu, Runtao and Yuan, Ruibin and Xing, Yazhou and Wang, Wenhai and Dai, Jifeng and Zhang, Yong and Xue, Wei and Liu, Qifeng and Guo, Yike and Chen, Qifeng},
journal={arXiv preprint arXiv:2405.19334},
year={2024},
}