
[AI/ML] Deep Learning Regularization: Weight Decay, Batch Normalization, Early Stopping

by 뭅즤 2022. 3. 23.
๋ฐ˜์‘ํ˜•

Deep learning models can achieve high predictive performance by combining large datasets with complex neural network architectures. However, a model can fit the training data too closely and lose its ability to generalize to new data, a problem known as overfitting. Regularization techniques are used to prevent this and improve the model's generalization performance.


What is Regularization?

 

Regularization is a family of techniques that control model complexity to prevent overfitting and improve generalization. It works by adding constraints to the training process; these constraints keep the model from fitting the training data too closely, so that it generalizes better to new data.

 

Representative methods include the following.

 

 

  1. Weight Decay - L1, L2
  2. Batch Normalization
  3. Early Stopping

 

Weight Decay

  • Neural network์˜ ํŠน์ • weight๊ฐ€ ๋„ˆ๋ฌด ์ปค์ง€๋Š” ๊ฒƒ์€ ๋ชจ๋ธ์˜ ์ผ๋ฐ˜ํ™” ์„ฑ๋Šฅ์„ ๋–จ์–ด๋œจ๋ ค overfitting ๋˜๊ฒŒ ํ•˜๋ฏ€๋กœ, weight์— ๊ทœ์ œ๋ฅผ ๊ฑธ์–ด์ฃผ๋Š” ๊ฒƒ์ด ํ•„์š”.
  • L1 regularization, L2 regularization ๋ชจ๋‘ ๊ธฐ์กด Loss function์— weight์˜ ํฌ๊ธฐ๋ฅผ ํฌํ•จํ•˜์—ฌ weight์˜ ํฌ๊ธฐ๊ฐ€ ์ž‘์•„์ง€๋Š” ๋ฐฉํ–ฅ์œผ๋กœ ํ•™์Šตํ•˜๋„๋ก ๊ทœ์ œ

 

L1 Regularization vs L2 Regularization

  • L1 Regularization : weight ์—…๋ฐ์ดํŠธ ์‹œ weight์˜ ํฌ๊ธฐ์— ๊ด€๊ณ„์—†์ด ์ƒ์ˆ˜๊ฐ’์„ ๋นผ๊ฒŒ ๋˜๋ฏ€๋กœ(loss function ๋ฏธ๋ถ„ํ•˜๋ฉด ํ™•์ธ ๊ฐ€๋Šฅ) ์ž‘์€ weight ๋“ค์€ 0์œผ๋กœ ์ˆ˜๋ ดํ•˜๊ณ , ๋ช‡๋ช‡ ์ค‘์š”ํ•œ weight ๋“ค๋งŒ ๋‚จ์Œ. ๋ช‡ ๊ฐœ์˜ ์˜๋ฏธ์žˆ๋Š” ๊ฐ’์„ ์‚ฐ์ถœํ•˜๊ณ  ์‹ถ์€ sparse model ๊ฐ™์€ ๊ฒฝ์šฐ์— L1 Regularization์ด ํšจ๊ณผ์ . ๋‹ค๋งŒ ์•„๋ž˜ ๊ทธ๋ฆผ์—์„œ ๋ณด๋“ฏ์ด ๋ฏธ๋ถ„ ๋ถˆ๊ฐ€๋Šฅํ•œ ์ง€์ ์ด ์žˆ๊ธฐ ๋•Œ๋ฌธ์— gradient-base learning ์—์„œ๋Š” ์ฃผ์˜๊ฐ€ ํ•„์š”.
  • L2 Regularization : weight ์—…๋ฐ์ดํŠธ ์‹œ weight์˜ ํฌ๊ธฐ๊ฐ€ ์ง์ ‘์ ์ธ ์˜ํ–ฅ์„ ๋ผ์ณ weight decay์— ๋”์šฑ ํšจ๊ณผ์ 

 

 

Batch Normalization

  • Gradient vanishing/exploding ์„ ๋ฐฉ์ง€ํ•˜๊ธฐ ์œ„ํ•ด ํ•™์Šต ๊ณผ์ • ์ž์ฒด๋ฅผ ์•ˆ์ •ํ™”์‹œํ‚ค๊ธฐ ์œ„ํ•œ ๋ฐฉ๋ฒ•
  • ํ•™์Šต์‹œ ๋„คํŠธ์›Œํฌ์˜ ๊ฐ layer ๋˜๋Š” activation ๋งˆ๋‹ค ์ž…๋ ฅ ๊ฐ’์˜ ๋ถ„ํฌ๊ฐ€ ๋‹ฌ๋ผ์ง€๋Š” "Internal Covariance Shift" ๊ฐ€ ๋ฐœ์ƒํ•˜๊ณ  ์ด๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ์ž…๋ ฅ๊ฐ’์˜ ๋ถ„ํฌ๋ฅผ ์กฐ์ •
  • ํ‰๊ท ๊ณผ ๋ถ„์‚ฐ์„ ์กฐ์ •ํ•˜๋Š” ๊ณผ์ •์ด neural network ๋‚ด๋ถ€์— ํฌํ•จ๋˜์–ด ํ•™์Šต์‹œ batch์˜ ํ‰๊ท ๊ณผ ๋ถ„์‚ฐ์„ ์ด์šฉํ•˜์—ฌ ์ •๊ทœํ™”
  • scale๊ณผ shift(bias)๋ฅผ ๊ฐ๋งˆ, ๋ฒ ํƒ€ ๊ฐ’์œผ๋กœ ์กฐ์ •
  • Inference ์‹œ์—๋Š” ๋ฐฐ์น˜ ๋‹จ์œ„์˜ ํ‰๊ท ๊ณผ ๋ถ„์‚ฐ์„ ๊ตฌํ•  ์ˆ˜ ์—†๊ธฐ ๋•Œ๋ฌธ์— ํ•™์Šต ๋‹จ๊ณ„์—์„œ moving average ๋˜๋Š” exponential average๋ฅผ ์ด์šฉํ•˜์—ฌ ๊ณ„์‚ฐํ•œ ํ‰๊ท ๊ณผ ๋ถ„์‚ฐ์„ ๊ณ ์ •๊ฐ’์œผ๋กœ ์‚ฌ์šฉ

 

Effects of Batch Normalization

  • Gradient vanishing/exploding ์„ ์™„ํ™”ํ•˜๋ฏ€๋กœ ๋†’์€ learning rate ์‚ฌ์šฉํ•˜์—ฌ ํ•™์Šต ์†๋„ ํ–ฅ์ƒ
  • Careful weight initialization์œผ๋กœ ๋ถ€ํ„ฐ ์ž์œ ๋กœ์›Œ์ง
  • Regularization ํšจ๊ณผ : BN ๊ณผ์ •์œผ๋กœ ํ‰๊ท ๊ณผ ๋ถ„์‚ฐ์ด ์ง€์†์ ์œผ๋กœ ๋ณ€ํ•˜๊ณ  weight ์—…๋ฐ์ดํŠธ์—๋„ ์˜ํ–ฅ์„ ์ฃผ์–ด ํ•˜๋‚˜์˜ weight ๊ฐ€ ๋งค์šฐ ์ปค์ง€๋Š” ๊ฒƒ์„ ๋ฐฉ์ง€.

 

Batch Normalization Caveats

  • Batch size ๊ฐ€ ๋„ˆ๋ฌด ํฌ๊ฑฐ๋‚˜ ์ž‘์œผ๋ฉด ํšจ๊ณผ๋ฅผ ๊ธฐ๋Œ€ํ•˜๊ธฐ ์–ด๋ ค์›€
  • ์‚ฌ์šฉ ์ˆœ์„œ : Convolution - BN - Activation - Pooling - ... (BN์˜ ๋ชฉ์ ์ด ๋„คํŠธ์›Œํฌ ์—ฐ์‚ฐ ๊ฒฐ๊ณผ๊ฐ€ ์›ํ•˜๋Š” ๋ฐฉํ–ฅ์˜ ๋ถ„ํฌ๋Œ€๋กœ ๋‚˜์˜ค๊ฒŒ ํ•˜๋Š” ๊ฒƒ์ด๋ฏ€๋กœ conv ์—ฐ์‚ฐ ๋ฐ”๋กœ ๋’ค์— ์ฃผ๋กœ ์‚ฌ์šฉ/ ์•„๋‹Œ ๊ฒฝ์šฐ๋„ ์žˆ์Šต๋‹ˆ๋‹ค.)
  • Multi GPU training ์‹œ ์ฃผ๋กœ "Synchronized Batch Normalization" ์‚ฌ์šฉ

 

 

Early Stopping

  • ๊ฒ€์ฆ ์†์‹ค(validation loss)์ด ๋” ์ด์ƒ ๊ฐ์†Œํ•˜์ง€ ์•Š์„ ๋•Œ ํ•™์Šต์„ ์ค‘์ง€ํ•˜๋Š” ๊ธฐ๋ฒ•
  • Deep Neural Network๋Š” ์ผ๋ฐ˜์ ์œผ๋กœ ํ•™์Šต์„ ๋„ˆ๋ฌด ๋งŽ์ดํ•˜๋ฉด ํŠน์ • epoch ์ดํ›„์—๋Š” overftting์ด ๋ฐœ์ƒํ•˜์—ฌ test ์„ฑ๋Šฅ ํ•˜๋ฝ
  • ์ด๋ฅผ ๋ฐฉ์ง€ํ•˜๊ธฐ ์œ„ํ•ด validation set์„ ์ด์šฉํ•˜๋Š” ๋“ฑ์˜ ๋ฐฉ๋ฒ•์œผ๋กœ overfitting์ด ๋ฐœ์ƒํ•˜๊ธฐ ์ „์— ํ•™์Šต์„ ์ข…๋ฃŒ
๋ฐ˜์‘ํ˜•