
📖 Theory/AI & ML (12)

[ML] Cross Entropy (+ Loss) & MSE Loss explained — Information: the number of questions needed to remove uncertainty, or the number of trials needed before an event occurs. Entropy: the expected value of the information of a probability distribution P(x); a uniform distribution is more uncertain than a skewed one, so its entropy is higher. Cross entropy: with P(x) the data distribution and Q(x) the distribution the model estimates, a measure of the difference between the two distributions P and Q. KL divergence: given two distributions P and Q, the information lost when sampling from Q in order to approximate P (Cross Entropy(P, Q) - Entropy(P)). Here the goal of a machine learning model is the KL .., the difference between the data distribution P and the model's predicted distribution Q 2022. 3. 23.
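The definitions in the entry above can be sketched directly from their formulas. This is a minimal illustration with made-up example distributions; natural log is used, so values are in nats:

```python
import math

def entropy(p):
    """H(P) = -sum p(x) * log p(x): expected information of distribution P."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """H(P, Q) = -sum p(x) * log q(x): cost of describing data from P using Q."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """KL(P || Q) = H(P, Q) - H(P): information lost approximating P with Q."""
    return cross_entropy(p, q) - entropy(p)

# A uniform distribution is more uncertain than a skewed one:
uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.7, 0.1, 0.1, 0.1]
# entropy(uniform) ≈ 1.386 (ln 4), entropy(skewed) ≈ 0.940
# kl_divergence(p, p) is 0; it is positive whenever Q differs from P.
```

Minimizing the cross entropy in Q is equivalent to minimizing KL(P || Q), since Entropy(P) does not depend on the model.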
[ML] The difference between Classification and Regression — Classification and regression algorithms are supervised learning algorithms: both are used for prediction in machine learning and work with labeled datasets. The main difference is that classification predicts and separates discrete values such as male vs. female or true vs. false, while regression predicts continuous values such as price, salary, or age. Classification: the process of finding a function that helps divide a dataset into classes based on various parameters; finding a mapping function that maps an input x to a discrete output y .. 2022. 3. 23.
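The discrete-vs-continuous distinction above can be shown with one toy model: the same linear score serves as a regressor when returned directly, and as a classifier when thresholded. The weights and threshold here are made-up illustrative values:

```python
def linear_score(x, w=2.0, b=1.0):
    """Regression view: map input x to a continuous value w*x + b."""
    return w * x + b

def classify(x, threshold=0.0):
    """Classification view: map the same input to a discrete label (0 or 1)."""
    return 1 if linear_score(x) >= threshold else 0

# linear_score(1.5) -> 4.0, a continuous quantity (e.g. a price or age)
# classify(1.5)     -> 1, a discrete class (e.g. true vs. false)
```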
[ML] How to evaluate Classification performance — A summary of classification evaluation measures: TP, TN, FP, FN, Recall, Precision, ROC, etc. Binary classification evaluation: True Positive (TP): P predicted as P (correct). True Negative (TN): N predicted as N (correct). False Positive (FP): N predicted as P (wrong). False Negative (FN): P predicted as N (wrong). Accuracy: of all input data, the fraction classified correctly. Recall: of the actual positives, the fraction predicted as positive, i.e. (P→P) out of (P→P) and (P→N). Precision: of everything predicted positive, the fraction that is actually positive .. 2022. 3. 23.
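The four confusion-matrix counts and the three metrics defined above translate directly into code. A minimal sketch, with made-up example labels (1 = Positive, 0 = Negative):

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, recall, and precision from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # P -> P (correct)
    tn = sum(t == 0 and p == 0 for t, p in pairs)  # N -> N (correct)
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # N -> P (wrong)
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # P -> N (wrong)
    accuracy = (tp + tn) / len(pairs)   # correctly classified / all inputs
    recall = tp / (tp + fn)             # of actual positives, fraction found
    precision = tp / (tp + fp)          # of predicted positives, fraction correct
    return accuracy, recall, precision

acc, rec, prec = binary_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0])
# acc = 4/6, rec = 2/3, prec = 2/3
```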
[ML] Bias and Variance: how to evaluate machine learning models — Bias: the average difference between the model's predictions and the ground truth. Variance: how much the predictions can vary across different datasets. In machine learning, bias and variance are one way to check how well a model has been trained; the best case is when both bias and variance are low (see the figure below). But that is the obvious, after-the-fact story; bias and variance need to be considered in relation to how the model trains. In the underfitting regime, where the model is undertrained, predictions are often wrong even on the training set, so bias is high; past the appropriate stopping point, the model ov[erfits] in order to minimize the loss on the training set .. 2022. 3. 22.
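The two definitions above can be estimated empirically: train the same model on several datasets, collect its predictions for one input, then compare them to the ground truth and to each other. A minimal sketch with made-up prediction values:

```python
def bias_variance(predictions, ground_truth):
    """predictions: one prediction per training dataset, all for the same input."""
    mean_pred = sum(predictions) / len(predictions)
    # Bias: average difference between predictions and the ground truth.
    bias = mean_pred - ground_truth
    # Variance: how much the predictions spread around their own mean.
    variance = sum((p - mean_pred) ** 2 for p in predictions) / len(predictions)
    return bias, variance

bias, var = bias_variance([2.0, 2.2, 1.8], ground_truth=3.0)
# bias = -1.0 (consistently low: high bias), var ≈ 0.027 (stable: low variance)
```

This is the low-variance, high-bias corner of the usual quadrant picture: the model is consistent across datasets but consistently wrong.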
[ML] Gradient Descent Algorithms: the concept — Gradient Descent is a first-order optimization algorithm for finding approximate extrema: it computes the gradient of a function and repeatedly moves in the direction opposite the slope until it reaches an extremum. In machine learning, gradient descent is used to update (train) the model in the direction that reduces its loss. SGD (Stochastic Gradient Descent): Batch Gradient Descent computes the parameters' gradients over the entire dataset (requires a lot of memory); SGD computes the gradient over a mini-batch of the dataset and updates the parameters with it. Faster than batch gradient descent, but local mini.. 2022. 1. 13.
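The update rule described above — step against the gradient until reaching an extremum — fits in a few lines. A minimal sketch on an assumed toy function f(x) = (x - 3)^2, whose gradient is 2(x - 3); the learning rate and starting point are illustrative:

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly move opposite the gradient to approach a minimum of f."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)  # step in the direction that decreases f
    return x

x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
# x converges to the minimizer x = 3
```

SGD differs only in where `grad` comes from: instead of the exact gradient over the whole dataset, it is estimated from a randomly drawn mini-batch at each step.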
[ML] Back Propagation: concept and example — What is back propagation? - Run feed forward with the current weights and biases and compute the loss, the difference between the prediction and the ground truth. - Propagate the loss in the direction opposite the forward pass, updating the weights and biases in the direction that minimizes the loss (using the gradient descent algorithm). Basic steps of back propagation: - Use the chain rule to decompose and compute how the final loss changes as each weight and bias changes. - Compute these partial derivatives using formulas expressed as matrices. Back propagation example · Parameter values (1 hidden layer) - parameter values of each layer · Weigh.. 2022. 1. 12.
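The two steps above — feed forward to get the loss, then chain-rule the loss back through each layer and apply gradient descent — can be sketched on the smallest possible network: one input, one sigmoid hidden unit, one sigmoid output. All starting values here are made-up toy numbers, not the parameters from the original post:

```python
import math

def sigmoid(z):
    """Logistic activation; its derivative is sigmoid(z) * (1 - sigmoid(z))."""
    return 1.0 / (1.0 + math.exp(-z))

x, target = 0.5, 1.0   # single training example (illustrative)
w1, b1 = 0.4, 0.1      # hidden-layer weight and bias
w2, b2 = 0.3, 0.2      # output-layer weight and bias
lr = 0.5               # learning rate for the gradient descent update

for _ in range(1000):
    # Feed forward: prediction, then loss against the ground truth.
    h = sigmoid(w1 * x + b1)
    y = sigmoid(w2 * h + b2)
    loss = 0.5 * (y - target) ** 2

    # Backward pass: chain rule splits dLoss/dParam into local derivatives.
    delta2 = (y - target) * y * (1 - y)   # dLoss/dz2 at the output layer
    grad_w2, grad_b2 = delta2 * h, delta2
    delta1 = delta2 * w2 * h * (1 - h)    # dLoss/dz1, propagated back through w2
    grad_w1, grad_b1 = delta1 * x, delta1

    # Gradient descent: move every parameter against its gradient.
    w2 -= lr * grad_w2
    b2 -= lr * grad_b2
    w1 -= lr * grad_w1
    b1 -= lr * grad_b1

# The loss shrinks toward zero as the updates repeat.
```

With many units per layer, the same per-parameter products become the matrix formulas the entry mentions: each layer's deltas are a vector, and the gradients are outer products with that layer's inputs.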