[Gen AI] Diffusion Models: Sampling & Training Tricks Summarized

2025. 7. 8. 11:28 · Research/Image•Video Generation

1. Sampling Methods

1.1 DDIM

DDIM (Denoising Diffusion Implicit Models) replaces DDPM's stochastic sampling with a deterministic procedure. This lets it generate high-quality samples in far fewer steps.

  • When DDPM walks back through the noise, it injects fresh randomness at every step. DDIM (with η = 0) instead follows a deterministic trajectory.
  • Because the trajectory is deterministic, DDIM can remove a large amount of noise in a single step without breaking it. It can therefore skip timesteps and still produce good images with fewer steps.
  • DDPM typically needs 1000 steps, while DDIM often reaches good quality in about 50 to 100 steps.
  • Example: running Stable Diffusion with a DDIM scheduler and num_inference_steps=50 greatly reduces sampling time.
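The sketch below shows a single deterministic DDIM update (η = 0). It assumes an ε-prediction model and DDPM notation, where `alpha_bar_t` is the cumulative product ᾱ_t. The function uses only elementwise arithmetic, so `x_t` and `eps_pred` can be floats, NumPy arrays or tensors. It illustrates the update rule and is not the Stable Diffusion or diffusers implementation.

```python
import math

def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update (eta = 0) from timestep t to an earlier timestep.

    x_t            : current noisy sample
    eps_pred       : the model's noise prediction eps(x_t, t)
    alpha_bar_t    : cumulative alpha product at the current timestep
    alpha_bar_prev : cumulative alpha product at the target (earlier) timestep.
                     The target may be many training steps earlier, which is
                     what lets DDIM skip steps.
    """
    # Estimate the clean sample x0 from x_t and the predicted noise.
    x0_pred = (x_t - math.sqrt(1.0 - alpha_bar_t) * eps_pred) / math.sqrt(alpha_bar_t)
    # Move x0_pred to the target noise level, reusing the same eps (no fresh randomness).
    return math.sqrt(alpha_bar_prev) * x0_pred + math.sqrt(1.0 - alpha_bar_prev) * eps_pred
```

Skipping steps just means picking a sparse timestep subsequence. For example, `range(999, -1, -20)` gives 50 of the 1000 training timesteps. You then call `ddim_step` between consecutive timesteps, using ᾱ = 1 as the final target.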

 

1.2 PNDM / LMSD / DPM-Solver

DDPM/DDIM sampling usually walks back along the trajectory with simple first-order updates, much like an Euler step. PNDM, LMSD and DPM-Solver make sampling faster and higher quality by using higher-order numerical ODE solvers instead.

  • PNDM (Pseudo Numerical Methods for Diffusion Models)
    • Instead of DDIM's single-step update, it uses an Adams-Bashforth-style multi-step solver to estimate the sampling path more accurately.
    • It corrects the trajectory by folding the noise predictions (gradients) of previous steps into the current update. Its PLMS variant uses the current prediction plus up to three earlier ones.
  • LMSD (Linear Multi-Step Discrete scheduler, e.g. LMSDiscreteScheduler in diffusers)
    • It also applies a linear multi-step ODE solver, aiming for both stability and speed.
    • It is particularly good at producing clean samples without blur or artifacts even when the number of timesteps is reduced.
  • DPM-Solver
    • It exploits the semi-linear structure of the diffusion ODE. The linear part is solved exactly, and only the noise-prediction term is approximated, using a higher-order Taylor expansion in log-SNR. An adaptive step-size variant is also available.
    • As a result, it can match or beat a 50-step DDIM run with far fewer steps (around 20 to 30).
PNDM, LMSD and DPM-Solver go beyond DDIM by using multi-step or higher-order ODE solvers. Under the same conditions they produce clean samples in fewer steps, and they are now used almost as a standard. The trade-off is that the timestep schedule and solver hyperparameters must be tuned properly to get the best quality.
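The multi-step idea can be illustrated with a simplified PLMS-style loop, the PNDM variant built on Adams-Bashforth. It keeps the last few noise predictions, combines them with Adams-Bashforth coefficients, and then takes a DDIM-style transfer step. The sketch makes three assumptions:

* `model_eps(x, i)` is a hypothetical noise predictor.
* `alpha_bars` is a sub-sampled ᾱ sequence ending at 1.0.
* The first steps fall back to lower-order formulas instead of PNDM's Runge-Kutta-style warm-up.

LMSD and DPM-Solver use different update rules.

```python
import math

# Adams-Bashforth coefficients for orders 1-4, newest prediction first.
AB_COEFFS = {
    1: [1.0],
    2: [3 / 2, -1 / 2],
    3: [23 / 12, -16 / 12, 5 / 12],
    4: [55 / 24, -59 / 24, 37 / 24, -9 / 24],
}

def multistep_eps(eps_history):
    """Combine up to the 4 most recent noise predictions (newest at the end of the list)."""
    recent = eps_history[-4:][::-1]          # reorder newest first
    coeffs = AB_COEFFS[len(recent)]          # lower order while history is short
    combined = 0.0
    for c, e in zip(coeffs, recent):
        combined = combined + c * e
    return combined

def transfer(x_t, eps, alpha_bar_t, alpha_bar_prev):
    """DDIM-style transfer from noise level alpha_bar_t to alpha_bar_prev using eps."""
    x0_pred = (x_t - math.sqrt(1.0 - alpha_bar_t) * eps) / math.sqrt(alpha_bar_t)
    return math.sqrt(alpha_bar_prev) * x0_pred + math.sqrt(1.0 - alpha_bar_prev) * eps

def plms_sample(x_T, model_eps, alpha_bars):
    """alpha_bars: cumulative alphas along the sampling schedule, noisiest first, ending at 1.0."""
    x, history = x_T, []
    for i in range(len(alpha_bars) - 1):
        history.append(model_eps(x, i))      # one network evaluation per step
        eps = multistep_eps(history)         # corrected direction from past predictions
        x = transfer(x, eps, alpha_bars[i], alpha_bars[i + 1])
        history = history[-3:]               # order 4 only needs the last 3 plus the next one
    return x
```

Each step still costs a single network evaluation. The extra accuracy comes from reusing predictions that were already computed.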

 

1.3 Classifier-free Guidance (CFG)

In conditional generation (class-conditional or text-conditional), CFG steers samples toward the condition without training a separate classifier.

  • Conditional sampling was originally guided by a classifier gradient of the form ∇_x log p(class|x). This required training a separate classifier on noisy inputs x_t, which is a significant burden.
  • Classifier-free Guidance (CFG) replaces that classifier. The same model produces both a conditional and an unconditional prediction, and the two are simply combined.
  • Both predictions are learned by one network: the condition is randomly dropped during training, so the model also learns the "no condition" case. This lets sampling (inference) use the conditional ε(x_t|c) and the unconditional ε(x_t) together to apply guidance.
  • Training the unconditional branch alongside the conditional one has two benefits:
    • a single model can flexibly produce samples with stronger or weaker conditioning;
    • generation becomes controllable, emphasizing the condition as much as desired without any separate classifier gradient.
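One network is commonly trained for both branches by randomly replacing the condition with a null condition. In this minimal sketch, `NULL_COND` (e.g. an empty-prompt embedding) and the drop probability `p_uncond` are assumptions. Values around 0.1 to 0.2 are common, but the right value depends on the model.

```python
import random

NULL_COND = None  # hypothetical placeholder for the "no condition" token / empty-prompt embedding

def maybe_drop_condition(cond, p_uncond=0.1, rng=random):
    """With probability p_uncond, replace the condition with the null condition.

    The same network then learns eps(x_t | c) on most samples
    and the unconditional eps(x_t) on the rest.
    """
    return NULL_COND if rng.random() < p_uncond else cond

def drop_conditions(batch_conds, p_uncond=0.1, seed=None):
    """Apply condition dropout independently to every item in a batch."""
    rng = random.Random(seed)
    return [maybe_drop_condition(c, p_uncond, rng) for c in batch_conds]
```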

Concretely, CFG computes two noise predictions:

  • ε(x_t|c): the noise prediction given the condition (e.g. class, text)
  • ε(x_t): the unconditional noise prediction, with no condition given
  • The two are combined as follows, where the guidance scale w sets the strength:
    ε̃(x_t|c) = ε(x_t) + w · (ε(x_t|c) − ε(x_t))
  • In other words, the unconditional prediction is subtracted out and the condition-dependent part is amplified by a factor of w.

For example, guidance scale = 7.5 in text-to-image scales the difference between the text-conditional and unconditional predictions by 7.5. The sample then follows the text condition much more strongly.

  • w = 0 gives purely unconditional sampling, and w = 1 reduces to the plain conditional prediction (no extra guidance).
  • w = 7.5 applies the condition very strongly and is a common choice in text-to-image.
  • Values around w = 15 can instead yield unstable or unnatural results, such as oversaturated images (over-guidance).
With CFG there is no separate classifier to train, which makes model development much simpler. Because w can be changed freely, guidance strength can be tuned experimentally and prompt tuning can be tried quickly in research or production. The cost is that every sampling step needs two forward passes, one conditional and one unconditional.
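At sampling time the guidance formula is a one-liner. This sketch assumes a hypothetical `model_eps(x_t, t, cond)` noise predictor, with `None` standing in for the null condition. Real pipelines usually batch the two forward passes together.

```python
def cfg_eps(eps_uncond, eps_cond, w):
    """Classifier-free guidance: eps_uncond + w * (eps_cond - eps_uncond).

    w = 0 -> purely unconditional, w = 1 -> plain conditional prediction,
    w > 1 -> the condition-dependent direction is amplified.
    """
    return eps_uncond + w * (eps_cond - eps_uncond)

def guided_prediction(model_eps, x_t, t, cond, w, null_cond=None):
    # Two forward passes of the same network: with and without the condition.
    eps_c = model_eps(x_t, t, cond)
    eps_u = model_eps(x_t, t, null_cond)
    return cfg_eps(eps_u, eps_c, w)
```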

 

2. Training Tricks and Stabilization Techniques

2.1 Noise Schedule & Beta Schedule

In DDPM, the forward process gradually adds noise to a clean image x_0, turning it into x_t.
The Noise Schedule (or Beta Schedule) decides how much noise β_t is added at each timestep. Together with it comes the cumulative product ᾱ_t = ∏_{s≤t}(1 − β_s), the fraction of signal variance left at step t.

  • Various beta schedules have been proposed, including Linear, Cosine and Exponential.
    • Linear: β_t increases at a constant rate. It is simple to implement, but ᾱ_t decays quickly. Image information is destroyed early, and the last part of the forward process is nearly pure noise that contributes little to training.
    • Cosine (Improved DDPM, Nichol & Dhariwal): ᾱ_t follows a squared-cosine curve, so β_t stays small in the early steps. This preserves image information longer and improves training stability.
    • Exponential: noise grows slowly at first and rapidly toward the end. It has improved results in some experiments but is not widely used.
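The linear and cosine schedules can be compared directly. This sketch uses commonly cited defaults, which you should treat as assumptions to tune:

* linear β from 1e-4 to 0.02 over T = 1000 steps;
* cosine offset s = 0.008, with β clipped at 0.999.

```python
import math

def linear_betas(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linear schedule: beta_t grows at a constant rate from beta_start to beta_end."""
    return [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]

def cosine_betas(T=1000, s=0.008, max_beta=0.999):
    """Cosine schedule: alpha_bar(t) = f(t) / f(0) with a squared cosine f,
    converted to beta_t = 1 - alpha_bar(t+1) / alpha_bar(t), clipped at max_beta."""
    def f(t):
        return math.cos(((t / T) + s) / (1 + s) * math.pi / 2) ** 2
    return [min(1 - f(t + 1) / f(t), max_beta) for t in range(T)]

def alpha_bars(betas):
    """Cumulative product alpha_bar_t = prod_{s <= t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out
```

The difference shows at the midpoint t = 500. The cosine schedule still keeps about half of the signal variance (ᾱ ≈ 0.49), while the linear schedule has already dropped below 0.1.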

SNR-based design in EDM

EDM (Elucidating the Design Space of Diffusion Models) designs the noise level, and therefore the SNR (Signal-to-Noise Ratio), directly instead of going through a β schedule.

SNR(t) = (signal power) / (noise power) = (α(t)^2) / (σ(t)^2)
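  • SNR is defined as above. In DDPM notation, α(t)² = ᾱ_t and σ(t)² = 1 − ᾱ_t. SNR decreases monotonically as the timestep advances.
  • EDM parameterizes the process directly by the noise level σ. The signal scale is 1, so SNR = 1/σ². At sampling time the σ values are spaced by a power law with exponent ρ = 7, which places more steps at low noise levels.
  • During training, EDM samples σ from a log-normal distribution. It also uses input/output preconditioning and a σ-dependent loss weight, which keep the effective loss magnitude roughly uniform across noise levels and stabilize training.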

In short, EDM does not ask "how much noise should be added over time?" It directly designs how the noise level, and therefore the signal-to-noise ratio, should decrease across steps. This improves both training stability and efficiency.
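EDM's noise-level discretization and loss weight fit in a few lines. The defaults σ_min = 0.002, σ_max = 80, ρ = 7 and σ_data = 0.5 follow the EDM paper's image settings. The number of steps `n` is a free choice here.

```python
def karras_sigmas(n=18, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """EDM sampling-time noise levels from sigma_max down to sigma_min,
    interpolated linearly in sigma**(1/rho) and raised back to the rho-th power."""
    hi, lo = sigma_max ** (1 / rho), sigma_min ** (1 / rho)
    return [(hi + i / (n - 1) * (lo - hi)) ** rho for i in range(n)]

def snr_from_sigma(sigma):
    """EDM uses x = y + sigma * n (signal scale 1), so SNR = 1 / sigma**2."""
    return 1.0 / sigma ** 2

def edm_loss_weight(sigma, sigma_data=0.5):
    """EDM's loss weight lambda(sigma) = (sigma^2 + sigma_data^2) / (sigma * sigma_data)^2."""
    return (sigma ** 2 + sigma_data ** 2) / (sigma * sigma_data) ** 2
```

The ρ-th power spacing makes the gaps between consecutive σ values shrink sharply toward σ_min. As a result, the sampler spends more of its steps in the low-noise, high-SNR regime.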

 

The noise schedule is one of the most important hyperparameters of a diffusion model. It directly affects training stability, performance and sampling quality. The cosine schedule and SNR-based designs in particular are now used almost as a standard.

 

2.2 V-Prediction 

Standard DDPM takes the noisy image x_t from the forward process and trains the model to predict the added noise ε (epsilon) directly.

Velocity prediction (v-prediction) also takes x_t as input, but trains the model to predict the velocity v instead.

x_t = α · x₀ + σ · ε   (with α² + σ² = 1)
v = α · ε − σ · x₀ = (α · x_t − x₀) / σ
  • The velocity is the weighted combination above. With α = cos φ and σ = sin φ, v is exactly the derivative dx_t/dφ, which is where the name comes from.
  • v therefore mixes noise and data, carrying information about the noise and the original image at the same time.
  • Predicting v brings several benefits:
    • Both x₀ = α · x_t − σ · v and ε = σ · x_t + α · v can be recovered stably at every noise level. ε-prediction, by contrast, becomes ill-conditioned as α → 0, because x̂₀ = (x_t − σ · ε̂)/α divides by α.
    • The loss variance stays more uniform even at extremely small or large timesteps (noise scales), so training is more even across timesteps.
    • It behaves robustly for high-resolution images and across different β schedules.

 

 

The model used to predict only the noise in the noisy image. Predicting the velocity v instead, as introduced by Salimans & Ho in Progressive Distillation, makes training robust to the noise scale (σ) and keeps the loss variance more uniform.
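The relationships between x_t, v, x₀ and ε can be checked numerically. This sketch assumes the variance-preserving angle parameterization α = cos φ, σ = sin φ, so α² + σ² = 1. Scalars are used for clarity, but every function is elementwise.

```python
import math

def x_t_at(x0, eps, phi):
    # Angle parameterization: alpha = cos(phi), sigma = sin(phi)
    return math.cos(phi) * x0 + math.sin(phi) * eps

def v_target(x0, eps, alpha, sigma):
    """Velocity target v = alpha * eps - sigma * x0 (the regression target for v-prediction)."""
    return alpha * eps - sigma * x0

def x0_from_v(x_t, v, alpha, sigma):
    # x0 = alpha * x_t - sigma * v  (no division, stable at every noise level)
    return alpha * x_t - sigma * v

def eps_from_v(x_t, v, alpha, sigma):
    # eps = sigma * x_t + alpha * v
    return sigma * x_t + alpha * v

def x0_from_eps(x_t, eps, alpha, sigma):
    # eps-prediction route: divides by alpha, which blows up as alpha -> 0 (pure noise)
    return (x_t - sigma * eps) / alpha
```

`x0_from_eps` is included only for contrast. It divides by α, so near pure noise (α → 0) small errors in ε̂ get amplified, while `x0_from_v` has no such division.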

 

2.3 SNR weighting 

SNR weighting is a training-stabilization technique, used for example in EDM's noise-level-dependent loss weight. It rescales the loss per timestep so that training stays balanced across the whole range instead of being dominated by particular noise levels.

 

✅ Problem with standard diffusion training

  • At small timesteps the image is almost the original, so the loss comes out small. At large timesteps the input is almost pure noise, so meaningful gradients are hard to obtain.
  • As a result, the loss concentrates on certain ranges, or the gradient variance differs widely across timesteps, and training becomes unstable.

✅ How SNR weighting helps

  • Compute the signal-to-noise ratio (SNR) at timestep t and multiply the loss by an SNR-based weight.
  • In the clean, high-SNR range, the loss is scaled down so these timesteps are not over-trained.
  • In the noisy, low-SNR range, the loss gets relatively more weight so it is learned sufficiently.
  • The clean range is easy for the model to reconstruct, so it does not need a large loss.
  • The loss must also be reinforced in the noisy range, so that the whole diffusion process is learned stably.

*SNR is large when t is small and decreases as t grows.
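A widely used concrete instance of SNR-based weighting is the Min-SNR-γ rule (Hang et al., 2023). It is shown here as a swapped-in example, not as EDM's own weighting. For an ε-prediction loss the weight is min(SNR, γ) / SNR. γ = 5 is the value suggested in that paper; treat it as an assumption.

```python
def snr_from_alpha_bar(alpha_bar):
    """DDPM-style SNR(t) = alpha_bar_t / (1 - alpha_bar_t)."""
    return alpha_bar / (1.0 - alpha_bar)

def min_snr_weight(snr, gamma=5.0):
    """Min-SNR-gamma weight for an eps-prediction loss: min(SNR, gamma) / SNR.

    High-SNR (clean, small t) timesteps get weight gamma / SNR < 1;
    low-SNR (noisy, large t) timesteps keep weight 1.
    """
    return min(snr, gamma) / snr

def weighted_mse(pred, target, snr, gamma=5.0):
    """Mean squared error over a flat list of values, scaled by the SNR-based weight."""
    mse = sum((p - q) ** 2 for p, q in zip(pred, target)) / len(pred)
    return min_snr_weight(snr, gamma) * mse
```

Clean timesteps with SNR above γ are down-weighted by γ / SNR, while noisy timesteps keep weight 1. Relative to the clean range, the noisy range therefore receives more of the training signal.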

 

2.4 Gradient Clipping & EMA

✅ Gradient Clipping

  • Diffusion models are prone to exploding gradients because the loss scale differs from timestep to timestep.
  • This is especially true at high resolution or with large batches. The gradient norm can spike momentarily, and a single optimizer step can then push the parameters to outlier values.
  • Gradient clipping prevents this. When the gradient norm exceeds a threshold, the gradient is rescaled proportionally so that its norm equals the threshold.
  • The optimizer then never makes excessively large parameter updates, which keeps training stable.
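In PyTorch this is usually a single call to `torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)` between `loss.backward()` and `optimizer.step()`. To make the rule itself visible, here is a framework-free sketch of clipping by global norm over a flat list of gradient values. The threshold 1.0 is a common but arbitrary choice.

```python
import math

def clip_by_global_norm(grads, max_norm=1.0, eps=1e-6):
    """Rescale all gradients by one shared factor if their global L2 norm exceeds max_norm.

    Returns the (possibly scaled) gradients and the norm measured before clipping.
    The direction of the gradient vector is preserved; only its length shrinks.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    scale = max_norm / (total_norm + eps)   # eps avoids division by zero
    if scale < 1.0:                         # only shrink, never enlarge
        grads = [g * scale for g in grads]
    return grads, total_norm
```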

 

✅ Exponential Moving Average (EMA)

EMA smoothly averages the model parameters during training. The base parameters θ are trained as usual, and a smoothed copy θ_ema is updated alongside them as follows:

θ_ema ← 0.999 * θ_ema + 0.001 * θ

  • This dampens the short-term fluctuations of noisy gradient updates and yields smoother parameters.
  • In practice, the EMA parameters are what is normally used for sampling (inference).
    • For example, the DDPM, LDM and EDM codebases load θ_ema for the final sampling stage when generating images.
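The EMA update is cheap to sketch. This minimal version works over a flat list of parameters and uses decay 0.999, as in the formula in this section. Real training code applies the same rule tensor by tensor, typically without gradient tracking. The class and method names here are hypothetical.

```python
class EMA:
    """Keeps an exponential moving average theta_ema of the training parameters theta."""

    def __init__(self, params, decay=0.999):
        self.decay = decay
        self.shadow = list(params)  # theta_ema starts as a copy of theta

    def update(self, params):
        # theta_ema <- decay * theta_ema + (1 - decay) * theta, called after each optimizer step
        d = self.decay
        self.shadow = [d * e + (1.0 - d) * p for e, p in zip(self.shadow, params)]

    def copy_to(self):
        """Return the averaged parameters, i.e. the weights you would load for sampling."""
        return list(self.shadow)
```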

 
