[Gen AI] T2I & TI2I Datasets and Benchmarks Summary | Image Generation & Editing Datasets

2025. 11. 1. 21:38 · Research / Image•Video Generation
๋ฐ˜์‘ํ˜•

This post summarizes the datasets used in research on Text-to-Image (T2I) and Image-to-Image (TI2I) models. It focuses on the trend of evaluating not only raw image generation quality but also text understanding, world knowledge, and intelligent (reasoning-based) editing.

 

1. Text-to-Image (T2I) Datasets

| Name | Data scale | Key features |
| --- | --- | --- |
| LAION-5B (Aesthetic / HighRes) | 5B (Aesthetic ~200M) | Large-scale image-text data from the open web. Quality is refined by filtering on CLIP score and aesthetic score. |
| CC12M (Conceptual Captions 12M) | 12M | Captions collected automatically from Google-sourced images, then filtered. Improves text diversity and language generalization. Captions are relatively short and accurate. |
| DiffusionDB | 14M | Maps real Stable Diffusion user prompts to their generated results. Reflects realistic prompt styles; well suited to RLHF and SFT alignment research. |
| JourneyDB | ~5M | High-aesthetic dataset built from generated images (Midjourney, Lexica, etc.). Used for style reproduction and LoRA training. |
| FLUX-Reason-6M | 6M | Reasoning-augmented T2I dataset dedicated to the FLUX series. Strengthens the composition of complex concepts and world reasoning. |
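The score-based filtering mentioned for LAION can be pictured with a small sketch. It assumes every image-text pair already carries a precomputed CLIP similarity and an aesthetic score; the `Pair` class, the field names, and both thresholds are illustrative choices, not the values used by any particular LAION release.

```python
from dataclasses import dataclass


@dataclass
class Pair:
    url: str
    caption: str
    clip_score: float       # assumed: precomputed image-text similarity
    aesthetic_score: float  # assumed: output of an aesthetic predictor


def filter_pairs(pairs, min_clip=0.28, min_aesthetic=5.0):
    """Keep only pairs that pass both thresholds (illustrative values)."""
    return [
        p for p in pairs
        if p.clip_score >= min_clip and p.aesthetic_score >= min_aesthetic
    ]


samples = [
    Pair("a.jpg", "a red bicycle leaning on a wall", 0.31, 6.2),
    Pair("b.jpg", "click here to buy", 0.12, 5.8),   # caption does not match image
    Pair("c.jpg", "blurry photo of a street", 0.30, 3.9),  # low aesthetic score
]

kept = filter_pairs(samples)
print([p.url for p in kept])
```

Applying the two thresholds jointly is one simple design; a pipeline could just as well publish several subsets at different cut-offs.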

 

2. Image-to-Image (TI2I / Image Editing) Datasets

| Name | Data scale | Key features |
| --- | --- | --- |
| ImgEdit | 1.2M | Instruction-based (text-driven) editing data. Each sample contains a source image, a target image, and an instruction. |
| HQ-Edit | 200K | Supports high-resolution editing, restoration, and inpainting. Instance masks are included, enabling fine-grained control. |
| X2I2 | 4M | "Any-to-Any" format: supports varied input conditions such as text→image, image→image, and multiple reference images→image. Designed for compound scenarios that include video frames, reference images, and editing queries. |
| GPT-Image-Edit-1.5M | 1.5M | Unifies and refines three editing sets (HQ-Edit / UltraEdit / OmniEdit) using GPT-Image-1. Instructions are classified by complexity (up to level C₃), covering both simple and higher-order edits. Complex-edit style instructions were newly written for 313K OmniEdit samples, enabling training on higher-order reasoning edits. Nine editing tasks (add / replace / change_color / transform, etc.) are evenly distributed. |
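As a rough illustration of the sample layout these editing sets share (source image, target image, instruction) and of an even spread across edit tasks, here is a minimal sketch. The `EditSample` fields and the `balanced_subset` helper are hypothetical names introduced for this example; this is not how any of the listed datasets were actually built.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class EditSample:
    source_image: str  # path to the image before editing
    target_image: str  # path to the image after editing
    instruction: str   # natural-language edit instruction
    task: str          # e.g. "add", "replace", "change_color"


def balanced_subset(samples, per_task, seed=0):
    """Draw up to `per_task` samples from each task so tasks are evenly represented."""
    rng = random.Random(seed)
    by_task = defaultdict(list)
    for s in samples:
        by_task[s.task].append(s)
    subset = []
    for task in sorted(by_task):
        group = by_task[task]
        subset.extend(rng.sample(group, min(per_task, len(group))))
    return subset


data = (
    [EditSample(f"a{i}.png", f"a{i}_out.png", "add a hat", "add") for i in range(5)]
    + [EditSample(f"r{i}.png", f"r{i}_out.png", "replace the dog with a cat", "replace") for i in range(3)]
    + [EditSample("c0.png", "c0_out.png", "make the car red", "change_color")]
)

subset = balanced_subset(data, per_task=2)
```

Tasks with fewer than `per_task` samples simply contribute everything they have, so the result is only as balanced as the rarest task allows.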

 

3. Key Benchmarks

| Name | Data scale | Key features |
| --- | --- | --- |
| GenEval | ~8K prompts | Automatic evaluation of text-to-image generation quality. Metrics center on color, count, and attribute consistency. |
| WISE (World-Knowledge Integrated Semantic Evaluation) | ~5K prompts (6 domains) | Benchmark centered on world knowledge and complex semantic understanding. Evaluated automatically with GPT-4o. |
| GEdit-Bench | Based on several thousand requests | GIER-based test of real user editing requests. Scored automatically with GPT-4.1 (G_SC, G_PQ, G_O). |
| IntelligentBench (BAGEL) | 350 samples | Evaluates intelligent editing grounded in reasoning and world knowledge. Uses GPT-4o (2024-11-20) as the evaluator. |
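The three GEdit-Bench scores are listed above without a rule for combining them. One plausible reading, assuming the benchmark follows the VIEScore convention, is that G_SC rates semantic consistency, G_PQ rates perceptual quality, and the overall score G_O is their geometric mean per sample. The sketch below encodes that assumption; the function names and the example scores are made up for illustration.

```python
import math


def overall_score(g_sc, g_pq):
    """Geometric mean of the two sub-scores (assumed VIEScore-style rule)."""
    if g_sc < 0 or g_pq < 0:
        raise ValueError("scores must be non-negative")
    return math.sqrt(g_sc * g_pq)


def mean_overall(pairs):
    """Average the per-sample overall score over (G_SC, G_PQ) pairs."""
    scores = [overall_score(sc, pq) for sc, pq in pairs]
    return sum(scores) / len(scores)


# Hypothetical judge outputs on a 0-10 scale: (G_SC, G_PQ) per edited image.
judged = [(9.0, 4.0), (0.0, 10.0), (4.0, 4.0)]
avg_g_o = mean_overall(judged)
```

Under a geometric mean, an edit that ignores the instruction (G_SC of 0) scores 0 overall no matter how clean the image looks, which an arithmetic mean would not enforce.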

 

๋ฐ˜์‘ํ˜•

'๐Ÿ› Research > Imageโ€ขVideo Generation' ์นดํ…Œ๊ณ ๋ฆฌ์˜ ๋‹ค๋ฅธ ๊ธ€

[Omni] OmniGen2: Exploration to Advanced Multimodal Generation | ํ†ตํ•ฉ ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ ์ƒ์„ฑ ๋ชจ๋ธ  (1) 2025.11.30
[T2I] Back to Basics: Let Denoising Generative Models Denoise | Just image Transformers (JiT) ๋ฆฌ๋ทฐ  (0) 2025.11.29
[Gen AI] BAGEL: Unified Multimodal Design - ์ดํ•ด์™€ ์ƒ์„ฑ์˜ ํ†ตํ•ฉ ๊ตฌ์กฐ  (0) 2025.10.31
[Gen AI] Qwen-Image ํ…Œํฌ๋‹ˆ์ปฌ ๋ฆฌํฌํŠธ ๋ถ„์„ | T2I, TI2I | ์ด๋ฏธ์ง€ ์ƒ์„ฑ ํŽธ์ง‘ ๋ชจ๋ธ  (0) 2025.09.15
[Gen AI] ์ด๋ฏธ์ง€ ์ƒ์„ฑ ๋ชจ๋ธ์˜ ํ‰๊ฐ€ ์ง€ํ‘œ ์ •๋ฆฌ | FID, IS, CLIP Score, LPIPS,...  (1) 2025.08.01
'๐Ÿ› Research/Image•Video Generation' ์นดํ…Œ๊ณ ๋ฆฌ์˜ ๋‹ค๋ฅธ ๊ธ€
  • [Omni] OmniGen2: Exploration to Advanced Multimodal Generation | ํ†ตํ•ฉ ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ ์ƒ์„ฑ ๋ชจ๋ธ
  • [T2I] Back to Basics: Let Denoising Generative Models Denoise | Just image Transformers (JiT) ๋ฆฌ๋ทฐ
  • [Gen AI] BAGEL: Unified Multimodal Design - ์ดํ•ด์™€ ์ƒ์„ฑ์˜ ํ†ตํ•ฉ ๊ตฌ์กฐ
  • [Gen AI] Qwen-Image ํ…Œํฌ๋‹ˆ์ปฌ ๋ฆฌํฌํŠธ ๋ถ„์„ | T2I, TI2I | ์ด๋ฏธ์ง€ ์ƒ์„ฑ ํŽธ์ง‘ ๋ชจ๋ธ
๋ญ…์ฆค
๋ญ…์ฆค
AI ๊ธฐ์ˆ  ๋ธ”๋กœ๊ทธ
    ๋ฐ˜์‘ํ˜•
  • ๋ญ…์ฆค
    moovzi’s Doodle
    ๋ญ…์ฆค
  • ์ „์ฒด
    ์˜ค๋Š˜
    ์–ด์ œ
  • ๊ณต์ง€์‚ฌํ•ญ

    • โœจ About Me
    • ๋ถ„๋ฅ˜ ์ „์ฒด๋ณด๊ธฐ (213)
      • ๐Ÿ“– Fundamentals (34)
        • Computer Vision (9)
        • 3D vision & Graphics (6)
        • AI & ML (16)
        • NLP (2)
        • etc. (1)
      • ๐Ÿ› Research (75)
        • Deep Learning (7)
        • Perception (19)
        • OCR (7)
        • Multi-modal (5)
        • Image•Video Generation (18)
        • 3D Vision (4)
        • Material • Texture Recognit.. (8)
        • Large-scale Model (7)
        • etc. (0)
      • ๐Ÿ› ๏ธ Engineering (8)
        • Distributed Training & Infe.. (5)
        • AI & ML ์ธ์‚ฌ์ดํŠธ (3)
      • ๐Ÿ’ป Programming (92)
        • Python (18)
        • Computer Vision (12)
        • LLM (4)
        • AI & ML (18)
        • Database (3)
        • Distributed Computing (6)
        • Apache Airflow (6)
        • Docker & Kubernetes (14)
        • ์ฝ”๋”ฉ ํ…Œ์ŠคํŠธ (4)
        • C++ (1)
        • etc. (6)
      • ๐Ÿ’ฌ ETC (4)
        • ์ฑ… ๋ฆฌ๋ทฐ (4)
  • ๋งํฌ

    • ๋ฆฌํ‹€๋ฆฌ ํ”„๋กœํ•„ (๋ฉ˜ํ† ๋ง, ๋ฉด์ ‘์ฑ…,...)
    • ใ€Ž๋‚˜๋Š” AI ์—”์ง€๋‹ˆ์–ด์ž…๋‹ˆ๋‹คใ€
    • Instagram
    • Brunch
    • Github
  • ์ธ๊ธฐ ๊ธ€

  • ์ตœ๊ทผ ๋Œ“๊ธ€

  • ์ตœ๊ทผ ๊ธ€

  • hELLOยท Designed By์ •์ƒ์šฐ.v4.10.3
๋ญ…์ฆค
[Gen AI] T2I & TI2I ๋ฐ์ดํ„ฐ์…‹ ๋ฐ ๋ฒค์น˜๋งˆํฌ ์ •๋ฆฌ | ์ด๋ฏธ์ง€ ์ƒ์„ฑ & ํŽธ์ง‘ ๋ฐ์ดํ„ฐ์…‹
์ƒ๋‹จ์œผ๋กœ

ํ‹ฐ์Šคํ† ๋ฆฌํˆด๋ฐ”