[AI/ML] ๊ฒฝ์‚ฌ ํ•˜๊ฐ•๋ฒ• (Gradient Descent Algorithms): ๋”ฅ๋Ÿฌ๋‹ ๋ชจ๋ธ ์ตœ์ ํ™”๋ฅผ ์œ„ํ•œ ์•Œ๊ณ ๋ฆฌ์ฆ˜

2024. 7. 11. 22:14ยท๐Ÿ“– Fundamentals/AI & ML
๋ฐ˜์‘ํ˜•

์˜ค๋Š˜์€ ๋”ฅ๋Ÿฌ๋‹์—์„œ ํ•ต์‹ฌ์ ์ธ ์ตœ์ ํ™” ์•Œ๊ณ ๋ฆฌ์ฆ˜์ธ Gradient Descent (๊ฒฝ์‚ฌ ํ•˜๊ฐ•๋ฒ•) ์— ๋Œ€ํ•ด ์•Œ์•„๋ณด๋ ค๊ณ  ํ•ด์š”. ์ด ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด ๋”ฅ๋Ÿฌ๋‹ ๋ชจ๋ธ์„ ํ•™์Šตํ•˜๊ณ  ์ตœ์ ํ™”ํ•˜๋Š” ๋ฐ ์–ด๋–ป๊ฒŒ ํ™œ์šฉ๋˜๋Š”์ง€ ํ•จ๊ป˜ ์‚ดํŽด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

 

๊ฒฝ์‚ฌ ํ•˜๊ฐ•๋ฒ•์˜ ๊ฐœ๋…

Gradient Descent๋Š” ํ•จ์ˆ˜์˜ ๊ธฐ์šธ๊ธฐ(Gradient)๋ฅผ ์ด์šฉํ•˜์—ฌ ํ•จ์ˆ˜์˜ ์ตœ์†Ÿ๊ฐ’์„ ์ฐพ์•„๊ฐ€๋Š” ์ตœ์ ํ™” ์•Œ๊ณ ๋ฆฌ์ฆ˜์ž…๋‹ˆ๋‹ค. ๋”ฅ๋Ÿฌ๋‹์—์„œ๋Š” ์†์‹ค ํ•จ์ˆ˜(Loss Function)์˜ ๊ฐ’์„ ์ตœ์†Œํ™”ํ•˜๊ธฐ ์œ„ํ•ด ๋งŽ์ด ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค.

 

๋™์ž‘ ์›๋ฆฌ

  1. ๊ธฐ๋ณธ ๊ฐœ๋…:
    • ๊ฒฝ์‚ฌ ํ•˜๊ฐ•๋ฒ•์€ ์ดˆ๊ธฐ์— ์ž„์˜์˜ ํŒŒ๋ผ๋ฏธํ„ฐ์—์„œ ์‹œ์ž‘ํ•˜์—ฌ, ๊ฐ ๋‹จ๊ณ„์—์„œ ์†์‹ค ํ•จ์ˆ˜์˜ ๊ทธ๋ž˜๋””์–ธํŠธ๋ฅผ ๊ณ„์‚ฐํ•˜๊ณ , ๊ทธ๋ž˜๋””์–ธํŠธ์˜ ๋ฐ˜๋Œ€ ๋ฐฉํ–ฅ์œผ๋กœ ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ์กฐ์ •ํ•˜์—ฌ ์ตœ์ ํ™”๋ฅผ ์ˆ˜ํ–‰ํ•ฉ๋‹ˆ๋‹ค.
  2. ์ฃผ์š” ๋‹จ๊ณ„:
    • ์ดˆ๊ธฐํ™”(Initialization): ํŒŒ๋ผ๋ฏธํ„ฐ(๊ฐ€์ค‘์น˜์™€ ํŽธํ–ฅ)๋ฅผ ์ž„์˜์˜ ๊ฐ’์œผ๋กœ ์ดˆ๊ธฐํ™”ํ•ฉ๋‹ˆ๋‹ค.
    • ์ˆœ์ „ํŒŒ(Forward Propagation): ์ž…๋ ฅ ๋ฐ์ดํ„ฐ๋ฅผ ๋„คํŠธ์›Œํฌ์— ์ฃผ์ž…ํ•˜๊ณ  ์˜ˆ์ธก๊ฐ’์„ ๊ณ„์‚ฐํ•ฉ๋‹ˆ๋‹ค.
    • ์˜ค์ฐจ ๊ณ„์‚ฐ(Error Calculation): ์˜ˆ์ธก๊ฐ’๊ณผ ์‹ค์ œ๊ฐ’ ์‚ฌ์ด์˜ ์˜ค์ฐจ๋ฅผ ๊ณ„์‚ฐํ•ฉ๋‹ˆ๋‹ค.
    • ์—ญ์ „ํŒŒ(Backward Propagation): ์˜ค์ฐจ๋ฅผ ์—ญ๋ฐฉํ–ฅ์œผ๋กœ ์ „ํŒŒํ•˜์—ฌ ๊ฐ ํŒŒ๋ผ๋ฏธํ„ฐ์— ๋Œ€ํ•œ ๊ทธ๋ž˜๋””์–ธํŠธ๋ฅผ ๊ณ„์‚ฐํ•ฉ๋‹ˆ๋‹ค.
    • ํŒŒ๋ผ๋ฏธํ„ฐ ์—…๋ฐ์ดํŠธ(Parameter Update): ๊ณ„์‚ฐ๋œ ๊ทธ๋ž˜๋””์–ธํŠธ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ์กฐ์ •ํ•ฉ๋‹ˆ๋‹ค. ์ด ๊ณผ์ •์„ ๋ฐ˜๋ณตํ•˜์—ฌ ์†์‹ค ํ•จ์ˆ˜์˜ ๊ฐ’์„ ์ตœ์†Œํ™”ํ•˜๋ ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค.

 

๊ฒฝ์‚ฌ ํ•˜๊ฐ•๋ฒ•์˜ ์ข…๋ฅ˜

๊ฒฝ์‚ฌ ํ•˜๊ฐ•๋ฒ• ์ข…๋ฅ˜

 

- SGD(Stochastic Gradient Descent)

  • Batch Gradient Descent : ์ „์ฒด Dataset์— ๋Œ€ํ•ด parameter ๋“ค์˜ gradient๋ฅผ ๊ตฌํ•จ(๋งŽ์€ memory ํ•„์š”)
  • SGD : ์ „์ฒด dataset์—์„œ mini-batch ๋งŒํผ์˜ gradient๋ฅผ ๊ณ„์‚ฐํ•˜์—ฌ parameter update
  • Batch gradient descent ๋ณด๋‹ค ๋น ๋ฅด์ง€๋งŒ local minima๋ฅผ ์ž˜ ๋น ์ ธ๋‚˜๊ฐ€์ง€๋Š” ๋ชปํ•จ

 

 

- Momentum

 

  • SGD+Momentum : momentum ์„ฑ๋ถ„์„ ์ถ”๊ฐ€ํ•˜์—ฌ local minima๋ฅผ ํƒˆ์ถœํ•  ์ˆ˜ ์žˆ์Œ
  • ๋‹จ์  : Global minima์—์„œ ๋ฉˆ์ถ”์ง€ ๋ชปํ•˜๊ณ  ๋„˜์–ด๊ฐ€ ๋ฒ„๋ฆด ์ˆ˜ ์žˆ์Œ

 

 

 

- NAG(Nesterov Accelerated Gradient)

 

  • Momentum step์„ ๋ฐŸ๊ณ  ์ด๋™ํ•œ ์œ„์น˜์—์„œ gradient ๊ณ„์‚ฐํ•˜์—ฌ ์ด๋™ -> minima์— ์•ˆ์ •์ ์œผ๋กœ ๋„๋‹ฌ ๊ฐ€๋Šฅ
  • ๋‹จ์  : ๋ชจ๋“  Parameter๋“ค์˜ step size๊ฐ€ ๋™์ผ(์ตœ์ ํ™”์— ๊ฐ€๊นŒ์›Œ์ง„ ๊ฐ’๋„ ์žˆ๊ณ  ๋จผ ๊ฐ’๋„ ์žˆ์„ ํ…Œ๋‹ˆ๊นŒ)

 

 

- Adagrad(Adaptive Gradient)

 

  • Gt ๋Š” gradient ๊ฐ’์„ ๋ฐ›์•„ ์–ผ๋งˆ๋‚˜ ๋ณ€ํ•ด์™”๋Š”์ง€๋ฅผ ์•Œ ์ˆ˜ ์žˆ๊ณ , Gt-1 ๊ฐ’์„ ์ถ”๊ฐ€ํ•˜์—ฌ ์ด์ „ ๊ฐ’์„ ์ถ”๊ฐ€
  • ์–ผ๋งˆ๋‚˜ ๋ณ€ํ•ด์™”๋Š”์ง€์— ๋Œ€ํ•œ ๊ฐ’์„ ๋ฐ˜๋น„๋ก€์‹œ์ผœ ๊ณฑํ•˜๋ฏ€๋กœ ๊ฐ parameter๊ฐ’์ด ๋ณ€ํ•œ ์ •๋„์— ๋”ฐ๋ผ Learning rate๋ฅผ ์กฐ์ ˆ
  • ๋‹จ์  : Gt์˜ ๊ฐ’์ด ๊ณ„์† ๋”ํ•ด์ ธ ๊ฐ€๋ฏ€๋กœ ๋ฐœ์‚ฐํ•  ์ˆ˜ ์žˆ์Œ -> Step size๊ฐ€ ๋งค์šฐ ์ž‘์•„์ง

 

 

- RMSProp

 

  • Gt์— γ ๊ฐ’์„ ์ถ”๊ฐ€ํ•ด์„œ ๋ฐœ์‚ฐํ•˜์ง€ ์•Š๋„๋ก ์กฐ์ ˆ -> step size ์ž‘์•„์ง€๋Š” ๊ฒƒ ๋ฐฉ์ง€

 

 

- Adam(Adaptive Momentum Estimation)

  • NAG์™€ RMSprop์˜ ์žฅ์ ์„ ํ•ฉ์นจ
  • β_1=0.9,β_2=0.999 ์ผ ๋•Œ, (1-β_1 )=0.1,(1-β_2 )=0.001 ์ด ๋˜์–ด, m>>v -> ์ฒซ step ๋„ˆ๋ฌด ์ปค์ง

 

๊ฒฝ์‚ฌ ํ•˜๊ฐ•๋ฒ•์˜ ์ค‘์š”์„ฑ

  • ๋ชจ๋ธ ์ตœ์ ํ™”: ๋”ฅ๋Ÿฌ๋‹ ๋ชจ๋ธ์„ ํ•™์Šตํ•˜๊ณ  ์ตœ์ ํ™”ํ•˜๋Š” ๋ฐ ํ•„์ˆ˜์ ์ธ ์•Œ๊ณ ๋ฆฌ์ฆ˜์ž…๋‹ˆ๋‹ค.
  • ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ ํŠœ๋‹: ํ•™์Šต๋ฅ (Learning Rate)๊ณผ ๊ฐ™์€ ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ์กฐ์ •ํ•˜์—ฌ ํ•™์Šต ์†๋„์™€ ์ตœ์ ํ™” ์„ฑ๋Šฅ์„ ๊ฐœ์„ ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

 

์˜ค๋Š˜์€ Gradient Descent์˜ ๊ฐœ๋…๊ณผ ์ข…๋ฅ˜์— ๋Œ€ํ•ด ์•Œ์•„๋ณด์•˜์Šต๋‹ˆ๋‹ค. ๋”ฅ๋Ÿฌ๋‹์—์„œ ์ด ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด ์–ด๋–ป๊ฒŒ ์ž‘๋™ํ•˜๋Š”์ง€ ์ดํ•ดํ•˜๋Š” ๊ฒƒ์€ ๋ชจ๋ธ์„ ๊ฐœ๋ฐœํ•˜๊ณ  ์ตœ์ ํ™”ํ•˜๋Š” ๋ฐ ์žˆ์–ด์„œ ํ•„์ˆ˜์ ์ž…๋‹ˆ๋‹ค. 

๋ฐ˜์‘ํ˜•

'๐Ÿ“– Fundamentals > AI & ML' ์นดํ…Œ๊ณ ๋ฆฌ์˜ ๋‹ค๋ฅธ ๊ธ€

[ML] ์ „ํ†ต์ ์ธ ๋จธ์‹ ๋Ÿฌ๋‹ ๋ชจ๋ธ: SVM, Decision Tree, Random Forest, Gradient Boosting  (0) 2024.12.09
[AI/ML] Matrix Factorization(ํ–‰๋ ฌ ๋ถ„ํ•ด)์™€ ๋จธ์‹ ๋Ÿฌ๋‹  (0) 2024.12.09
[AI/ML] ๋”ฅ๋Ÿฌ๋‹ ํ•ต์‹ฌ ๊ฐœ๋…: Back Propagation (์˜ค์ฐจ ์—ญ์ „ํŒŒ)์˜ ์ดํ•ด์™€ ์‘์šฉ  (0) 2024.07.11
[AI/ML] ๋จธ์‹ ๋Ÿฌ๋‹ : ์ง€๋„ํ•™์Šต, ๋น„์ง€๋„ํ•™์Šต, ๊ฐ•ํ™”ํ•™์Šต ์ดํ•ดํ•˜๊ธฐ  (0) 2024.07.11
[AI/ML] ๋จธ์‹ ๋Ÿฌ๋‹์˜ ์ •์˜์™€ ๊ธฐ๋ณธ ๊ฐœ๋…  (0) 2024.07.11
'๐Ÿ“– Fundamentals/AI & ML' ์นดํ…Œ๊ณ ๋ฆฌ์˜ ๋‹ค๋ฅธ ๊ธ€
  • [ML] ์ „ํ†ต์ ์ธ ๋จธ์‹ ๋Ÿฌ๋‹ ๋ชจ๋ธ: SVM, Decision Tree, Random Forest, Gradient Boosting
  • [AI/ML] Matrix Factorization(ํ–‰๋ ฌ ๋ถ„ํ•ด)์™€ ๋จธ์‹ ๋Ÿฌ๋‹
  • [AI/ML] ๋”ฅ๋Ÿฌ๋‹ ํ•ต์‹ฌ ๊ฐœ๋…: Back Propagation (์˜ค์ฐจ ์—ญ์ „ํŒŒ)์˜ ์ดํ•ด์™€ ์‘์šฉ
  • [AI/ML] ๋จธ์‹ ๋Ÿฌ๋‹ : ์ง€๋„ํ•™์Šต, ๋น„์ง€๋„ํ•™์Šต, ๊ฐ•ํ™”ํ•™์Šต ์ดํ•ดํ•˜๊ธฐ
๋ญ…์ฆค
๋ญ…์ฆค
AI ๊ธฐ์ˆ  ๋ธ”๋กœ๊ทธ
    ๋ฐ˜์‘ํ˜•
  • ๋ญ…์ฆค
    CV DOODLE
    ๋ญ…์ฆค
  • ์ „์ฒด
    ์˜ค๋Š˜
    ์–ด์ œ
  • ๊ณต์ง€์‚ฌํ•ญ

    • โœจ About Me
    • ๋ถ„๋ฅ˜ ์ „์ฒด๋ณด๊ธฐ (196)
      • ๐Ÿ“– Fundamentals (33)
        • Computer Vision (9)
        • 3D vision & Graphics (6)
        • AI & ML (15)
        • NLP (2)
        • etc. (1)
      • ๐Ÿ› Research (63)
        • Deep Learning (7)
        • Image Classification (2)
        • Detection & Segmentation (17)
        • OCR (7)
        • Multi-modal (4)
        • Generative AI (6)
        • 3D Vision (2)
        • Material & Texture Recognit.. (8)
        • NLP & LLM (10)
        • etc. (0)
      • ๐ŸŒŸ AI & ML Tech (7)
        • AI & ML ์ธ์‚ฌ์ดํŠธ (7)
      • ๐Ÿ’ป Programming (85)
        • Python (18)
        • Computer Vision (12)
        • LLM (4)
        • AI & ML (17)
        • Database (3)
        • Apache Airflow (6)
        • Docker & Kubernetes (14)
        • ์ฝ”๋”ฉ ํ…Œ์ŠคํŠธ (4)
        • C++ (1)
        • etc. (6)
      • ๐Ÿ’ฌ ETC (3)
        • ์ฑ… ๋ฆฌ๋ทฐ (3)
  • ๋งํฌ

  • ์ธ๊ธฐ ๊ธ€

  • ํƒœ๊ทธ

    OpenAI
    ๋„์ปค
    Computer Vision
    Python
    ํ”„๋กฌํ”„ํŠธ์—”์ง€๋‹ˆ์–ด๋ง
    airflow
    OCR
    deep learning
    LLM
    multi-modal
    3D Vision
    ๋”ฅ๋Ÿฌ๋‹
    pytorch
    VLP
    GPT
    segmentation
    Text recognition
    ๊ฐ์ฒด๊ฒ€์ถœ
    Image Classification
    AI
    ํŒŒ์ด์ฌ
    ๊ฐ์ฒด ๊ฒ€์ถœ
    nlp
    ์ปดํ“จํ„ฐ๋น„์ „
    object detection
    CNN
    material recognition
    OpenCV
    ChatGPT
    pandas
  • ์ตœ๊ทผ ๋Œ“๊ธ€

  • ์ตœ๊ทผ ๊ธ€

  • hELLOยท Designed By์ •์ƒ์šฐ.v4.10.3
๋ญ…์ฆค
[AI/ML] ๊ฒฝ์‚ฌ ํ•˜๊ฐ•๋ฒ• (Gradient Descent Algorithms): ๋”ฅ๋Ÿฌ๋‹ ๋ชจ๋ธ ์ตœ์ ํ™”๋ฅผ ์œ„ํ•œ ์•Œ๊ณ ๋ฆฌ์ฆ˜
์ƒ๋‹จ์œผ๋กœ

ํ‹ฐ์Šคํ† ๋ฆฌํˆด๋ฐ”