๋ณธ๋ฌธ ๋ฐ”๋กœ๊ฐ€๊ธฐ
๐Ÿ“– Theory/AI & ML

[AI/ML] Cross Entropy( + Loss) & MSE Loss ์„ค๋ช…

by ๋ญ…์ฆค 2022. 3. 23.
๋ฐ˜์‘ํ˜•
  • Information(์ •๋ณด๋Ÿ‰) : ๋ถˆํ™•์‹ค์„ฑ์„ ์ œ๊ฑฐํ•˜๊ธฐ ์œ„ํ•ด ํ•„์š”ํ•œ ์งˆ๋ฌธ์˜ ์ˆ˜ ๋˜๋Š” ์–ด๋–ค ์ด๋ฒคํŠธ๊ฐ€ ๋ฐœ์ƒํ•˜๊ธฐ๊นŒ์ง€ ํ•„์š”ํ•œ ์‹œํ–‰์˜ ์ˆ˜ 
  • Entropy : ํ™•๋ฅ ๋ถ„ํฌ P(x)์— ๋Œ€ํ•œ ์ •๋ณด๋Ÿ‰์˜ ๊ธฐ๋Œ“๊ฐ’, ๋ถˆ๊ท ํ˜•ํ•œ ๋ถ„ํฌ๋ณด๋‹ค ๊ท ๋“ฑํ•œ ๋ถ„ํฌ์˜ ๊ฒฝ์šฐ ๋ถˆํ™•์‹ค์„ฑ์ด ๋” ๋†’๊ธฐ ๋•Œ๋ฌธ์— ์—”ํŠธ๋กœํ”ผ๊ฐ€ ๋” ๋†’์Œ
  • Cross Entropy : ๋ฐ์ดํ„ฐ์˜ ํ™•๋ฅ  ๋ถ„ํฌ๋ฅผ P(x), ๋ชจ๋ธ์ด ์ถ”์ •ํ•˜๋Š” ํ™•๋ฅ  ๋ถ„ํฌ๋ฅผ Q(x)๋ผ ํ• ๋•Œ, ๋‘ ํ™•๋ฅ  ๋ถ„ํฌ P์™€ Q์˜ ์ฐจ์ด๋ฅผ ์ธก์ •ํ•˜๋Š” ์ง€ํ‘œ
  • KL-divergence : ๋‘ ํ™•๋ฅ  ๋ถ„ํฌ P, Q ๊ฐ€ ์žˆ์„ ๋•Œ, P๋ฅผ ๊ทผ์‚ฌํ•˜๊ธฐ ์œ„ํ•œ Q ๋ถ„ํฌ๋ฅผ ํ†ตํ•ด ์ƒ˜ํ”Œ๋งํ•  ๋•Œ ๋ฐœ์ƒํ•˜๋Š” ์ •๋ณด๋Ÿ‰์˜ ์†์‹ค (Cross Entropy(P,Q) - Entropy(P))

 

์ด ๋•Œ ๋จธ์‹ ๋Ÿฌ๋‹ ๋ชจ๋ธ์˜ ๋ชฉํ‘œ๋Š” ํ™•๋ฅ  ๋ถ„ํฌ P์™€ ๋ชจ๋ธ์˜ ์˜ˆ์ธก ํ™•๋ฅ  ๋ถ„ํฌ Q์˜ ์ฐจ์ด์ธ KL divergence๋ฅผ ์ตœ์†Œํ™”ํ•˜๋Š” ๊ฒƒ์ด๊ณ , Entropy๋Š” ๊ณ ์ •๋œ ๊ฐ’์ด๋ฏ€๋กœ Cross Entropy๋ฅผ ์ตœ์†Œํ™”ํ•˜๋Š” ๊ฒƒ์ด ๋ชฉํ‘œ๊ฐ€ ๋ฉ๋‹ˆ๋‹ค.

 

 

KL-divergence, Cross Entropy and Entropy

 

Cross Entropy Loss 

Cross Entropy

  • Classification ๋ฌธ์ œ์—์„œ ์ฃผ๋กœ cross entropy loss ๋ฅผ ์‚ฌ์šฉ
  • True distribution P๋Š” one-hot ์ธ์ฝ”๋”ฉ๋œ vector๋ฅผ ์‚ฌ์šฉ(Ground Truth)
  • Prediction distribution Q ๋Š” ๋ชจ๋ธ์˜ ์˜ˆ์ธก ๊ฐ’์œผ๋กœ softmax layer๋ฅผ ๊ฑฐ์นœ ํ›„์˜ ๊ฐ’์œผ๋กœ ํด๋ž˜์Šค ๋ณ„ ํ™•๋ฅ  ๊ฐ’์„ ๋ชจ๋‘ ํ•ฉ์น˜๋ฉด 1

 

e.g.) P = [0, 1, 0], Q = [0.2, 0.7, 0.1] ์ผ ๋•Œ, cross entropy loss ๊ฒฐ๊ณผ๋Š” ์•„๋ž˜์™€ ๊ฐ™๋‹ค.

 

Mean Squared Error (MSE) Loss

  • ์˜ˆ์ธก ๊ฐ’๊ณผ ์ •๋‹ต๊ณผ์˜ ์ฐจ์ด๋ฅผ ์ œ๊ณฑํ•˜์—ฌ ํ‰๊ท ์„ ๋‚ธ ๊ฐ’
  • ์˜ค์ฐจ๊ฐ€ ์ปค์งˆ์ˆ˜๋ก ์ œ๊ณฑ ์—ฐ์‚ฐ์œผ๋กœ ์ธํ•ด ๊ฐ’์ด ๋šœ๋ ทํ•ด์ง
  • ์—ฐ์†์ ์ธ ๋ถ„ํฌ๋ฅผ ์ถ”์ •ํ•˜๋Š” regression ์—์„œ ์ฃผ๋กœ ์‚ฌ์šฉ

 

 

Cross Entropy Loss vs. MSE Loss
  • ๋ฐ์ดํ„ฐ๊ฐ€ ์—ฐ์†์ ์ธ ๋ถ„ํฌ์ธ gaussian ๋ถ„ํฌ์— ๊ฐ€๊นŒ์šธ ๋•Œ(continuous) → MSE Loss
  • ๋ฐ์ดํ„ฐ๊ฐ€ categoricalํ•œ bernoulli ๋ถ„ํฌ์— ๊ฐ€๊นŒ์šธ ๋•Œ(discrete) → Cross Entropy Loss

 

*ํ™•๋ฅ  ๋ถ„ํฌ ๊ด€์ ์—์„œ ๋”ฅ๋Ÿฌ๋‹ ๋„คํŠธ์›Œํฌ์˜ ์ถœ๋ ฅ์€ ์ •ํ•ด์ง„ ํ™•๋ฅ ๋ถ„ํฌ(๊ฐ€์šฐ์‹œ์•ˆ, ๋ฒ ๋ฃจ๋ˆ„์ด,..)์—์„œ ์ถœ๋ ฅ์ด ๋‚˜์˜ฌ ํ™•๋ฅ ์ด๋‹ค. ํ•™์Šต์‹œํ‚ค๋Š” ๋”ฅ๋Ÿฌ๋‹ ๋ชจ๋ธ f(x)์˜ ์—ญํ• ์€ ํ™•๋ฅ ๋ถ„ํฌ์˜ ๋ชจ์ˆ˜๋ฅผ ์ถ”์ •ํ•˜๋Š” ๊ฒƒ์ด๊ณ , ๊ณ„์‚ฐ๋œ loss๋Š” ์ถ”์ •๋œ ๋ถ„ํฌ์—์„œ ground truth์˜ likelihood๋ฅผ ํ‰๊ฐ€ํ•˜๋Š” ๊ฒƒ์ด๋‹ค. Loss๋ฅผ ์ตœ์†Œํ™”ํ•˜๋Š” ๋ฐฉํ–ฅ์œผ๋กœ ๋”ฅ๋Ÿฌ๋‹ ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ์—…๋ฐ์ดํŠธํ•˜๋Š” ๊ฒƒ์€ likelihood๋ฅผ ์ตœ๋Œ€ํ™”ํ•˜๋Š” ๊ฒƒ.

๋ฐ˜์‘ํ˜•