[Model Inference] Pytorch 2.0 Compile ์‚ฌ์šฉ ํ›„๊ธฐ ๋ฐ ์žฅ๋‹จ์  | pytorch compile ๋ชจ๋ธ ์ถ”๋ก  ์†๋„ ๊ฐœ์„  ํ…Œ์ŠคํŠธ

2023. 10. 7. 14:06ยท๐Ÿ’ป Programming/AI & ML
๋ฐ˜์‘ํ˜•

Pytorch Compile ํŠœํ† ๋ฆฌ์–ผ

Pytorch 2.0 Overview

 

Pytorch 2.0

Speedups for torch.compile against eager mode on an NVIDIA A100 GPU

 

compiled_model = torch.compile(model)

๊ธด ์„ค๋ช…ํ•  ๊ฒƒ ์—†์ด Pytorch 2.0 ์ดํ›„ compile ์ด๋ผ๋Š” ๊ฒƒ์ด ์ถ”๊ฐ€๋˜์—ˆ๋Š”๋ฐ, ์œ„ ์˜ˆ์‹œ์ฒ˜๋Ÿผ torch.comile(model)์ด๋ผ๋Š” ์งง์€ ์ฝ”๋“œ ํ•œ ์ค„๋งŒ ์ถ”๊ฐ€ํ•˜๋ฉด ๋ชจ๋ธ ์ธํผ๋Ÿฐ์Šค ์†๋„๋ฅผ ํ–ฅ์ƒ์‹œํ‚ฌ ์ˆ˜ ์žˆ๋‹ค๊ณ  ํ•œ๋‹ค.

 

๊ณต์‹ ๋ฌธ์„œ์— ๋”ฐ๋ฅด๋ฉด A100 GPU์—์„œ ๋ชจ๋ธ ํ•™์Šต ์†๋„๋Š” 43% ํ–ฅ์ƒ๋˜๊ณ , ๋ชจ๋ธ ์ธํผ๋Ÿฐ์Šค ์†๋„๋Š” Float32 precision์—์„œ 21%, AMP precision์—์„œ 51% ์ •๋„ ํ–ฅ์ƒ๋œ๋‹ค๊ณ  ํ•œ๋‹ค.

 

 

Pytorch 2.0 Compile ๋ชจ๋ธ ์ธํผ๋Ÿฐ์Šค ํ…Œ์ŠคํŠธ

torchvision์—์„œ ์ œ๊ณตํ•˜๋Š” ๊ธฐ๋ณธ์ ์ธ ๋ชจ๋ธ์ธ resnet50์œผ๋กœ pytorch compile์˜ ์†๋„ ๊ฐœ์„  ํ…Œ์ŠคํŠธ๋ฅผ ์ง„ํ–‰ํ•ด๋ดค๋‹ค.

 

# ํ…Œ์ŠคํŠธ ์ฝ”๋“œ

import torch
import torchvision.models
import numpy as np

model = torchvision.models.resnet50()
model.eval().cuda()

starter, ender = torch.cuda.Event(enable_timing=True), torch.cuda.Event(enable_timing=True)
repetitions = 10
timings=np.zeros((repetitions,1))

B = 1
H = 1000
W = 1000
input_data = torch.randn((B, 3, H, W)).float().cuda()

for _ in range(5):
    _ = model(input_data)
torch.cuda.synchronize()

with torch.no_grad():
    for rep in range(repetitions):
        starter.record()
        torch_out = model(input_data)
        ender.record()
        torch.cuda.synchronize()
        curr_time = starter.elapsed_time(ender)
        timings[rep] = curr_time
        # new_H = torch.randint(H-100,H+100,(1,))
        # new_W = torch.randint(W-100,W+100,(1,))
        # input_data = torch.randn((1, 3, new_H, new_W)).float().cuda()

print('torch ๋ชจ๋ธ ํ‰๊ท  ์†Œ์š” ์‹œ๊ฐ„ : ', np.mean(np.array(timings)))

# model_compiled = torch.compile(model, dynamic=True)
model_compiled = torch.compile(model)
del model

for _ in range(5):
    _ = model_compiled(input_data)
torch.cuda.synchronize()


timings=np.zeros((repetitions,1))
with torch.no_grad():
    for rep in range(repetitions):
        starter.record()
        complied_out = model_compiled(input_data)
        ender.record()
        torch.cuda.synchronize()
        curr_time = starter.elapsed_time(ender)
        timings[rep] = curr_time
        # new_H = torch.randint(H-100,H+100,(1,))
        # new_W = torch.randint(W-100,W+100,(1,))
        # input_data = torch.randn((1, 3, new_H, new_W)).float().cuda()

print('compiled ๋ชจ๋ธ ํ‰๊ท  ์†Œ์š” ์‹œ๊ฐ„ : ', np.mean(np.array(timings)))

# ๊ฒฐ๊ณผ ๋น„๊ต
error = torch.abs(torch_out - complied_out).mean()
print(f"Mean Absolute Error: {error.item()}")
  • ํ…Œ์ŠคํŠธ ๋ชจ๋ธ : resnet50
  • ํ…Œ์ŠคํŠธ ๋ฐฉ๋ฒ•
    • pytorch ๋ชจ๋ธ๊ณผ compileํ•œ ๋ชจ๋ธ์„ ์ƒ์„ฑํ•œ ํ›„ ๋ช‡ ๊ฐ€์ง€ ์ž…๋ ฅ ํฌ๊ธฐ์— ๋Œ€ํ•ด ์†๋„ ํ…Œ์ŠคํŠธ
    • ์ž…๋ ฅ ํ…์„œ๋ฅผ ๋ชจ๋ธ์— 10ํšŒ ๋ฐ˜๋ณตํ•ด์„œ ์ž…๋ ฅํ•œ ํ›„ ํ‰๊ท  ์†Œ์š” ์‹œ๊ฐ„์„ ์ธก์ •
  • ์ž…๋ ฅ ํฌ๊ธฐ
    • ์ •์ ์ธ ์ž…๋ ฅ ํฌ๊ธฐ๋กœ ํ…Œ์ŠคํŠธ
    • torch.compile()์˜ ํŒŒ๋ผ๋ฏธํ„ฐ์—์„œ dynamic=True๋กœ ์„ค์ •ํ•˜๋ฉด dynamic input shape์— ๋Œ€์‘์ด ๊ฐ€๋Šฅํ•˜๋‹ค ํ•˜์—ฌ ํ…Œ์ŠคํŠธ

 

 

# ํ…Œ์ŠคํŠธ ๊ฒฐ๊ณผ

 Input shape ์ธํผ๋Ÿฐ์Šค ํ‰๊ท  ์†Œ์š” ์‹œ๊ฐ„ (ms) ์†๋„ ํ–ฅ์ƒ (%)
Pytorch model Compiled model
[1,3,500,500] 9.78 14.56 -32.83
[10,3,500,500] 44.78 38.15 17.38
[1,3,1000,1000] 20.46 17.88 14.43
[8,3,1000,1000] 140.84 113.8 23.76
[1,3,900~1100,900~1100]
(Dynamic shape)
162.26 8513 -98.09
  • ์™œ์ธ์ง€ [1,3,500,500] ํฌ๊ธฐ์˜ ์ž‘์€ ์ž…๋ ฅ ํฌ๊ธฐ์—์„œ๋Š” compile ๋ชจ๋ธ์ด ์˜คํžˆ๋ ค ์†๋„ ๊ฐ์†Œ
  • ์ž…๋ ฅ ํฌ๊ธฐ๋Š” ์ž‘๋”๋ผ๋„ ๋ฐฐ์น˜ ํฌ๊ธฐ๋ฅผ ํ‚ค์šฐ๋ฉด ์†๋„ ํ–ฅ์ƒ 
  • ์ž…๋ ฅ ํฌ๊ธฐ๋ฅผ ํ‚ค์šฐ๋ฉด ๋ฐฐ์น˜=1 ์—์„œ๋„ ์†๋„ ํ–ฅ์ƒ๋˜๊ณ , ๋ฐฐ์น˜๊ฐ€ ์ปค์งˆ์ˆ˜๋ก ์†๋„ ํ–ฅ์ƒ์ด ๋‘๋“œ๋Ÿฌ์ง
    • ์ „๋ฐ˜์ ์œผ๋กœ 15~20%์˜ ์†๋„ ํ–ฅ์ƒ (๊ณต์‹ ๋ฌธ์„œ์™€ ๋น„์Šทํ•œ ์ˆ˜์น˜)
  • ๋™์ ์ธ ์ž…๋ ฅ์—์„œ์˜ ํ…Œ์ŠคํŠธ๋Š”... ์†๋„๊ฐ€ ์˜คํžˆ๋ ค ๋งค์šฐ ๋А๋ ค์ง
    • ๊ณต์‹ ๋ฌธ์„œ์— ๋”ฐ๋ฅด๋ฉด ํ˜„์žฌ๋Š” Danamic shape ์ž…๋ ฅ์— ๋Œ€ํ•œ ์ง€์›์ด ์ œํ•œ๋˜์–ด ์žˆ๋‹ค๊ณ  ํ•จ

→ Pytorch Compile ์‚ฌ์šฉ ์‹œ ์‚ฌ์šฉํ•˜๋Š” ๋ชจ๋ธ๊ณผ ์ž…๋ ฅ ํฌ๊ธฐ์— ๋”ฐ๋ผ ์†๋„ ๊ฐœ์„  ์ •๋„๊ฐ€ ๋‹ค๋ฅผ ์ˆ˜ ์žˆ์œผ๋‹ˆ ํ…Œ์ŠคํŠธ๋ฅผ ํ•œ ํ›„์— ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์ด ์ข‹์„ ๊ฒƒ ๊ฐ™๋‹ค.

→ ํ•™์Šต ์‹œ์—๋Š” ๋ฐฐ์น˜๊ฐ€ ์ž…๋ ฅ ํฌ๊ธฐ๊ฐ€ ๊ณ ์ •๋œ ๊ฒฝ์šฐ๊ฐ€ ๋งŽ์œผ๋‹ˆ ์ƒ๊ด€์—†์ง€๋งŒ, ์ธํผ๋Ÿฐ์Šค ์‹œ ๋™์ ์ธ ์ž…๋ ฅ์„ ์‚ฌ์šฉํ•˜๋Š” ๊ฒฝ์šฐ์—๋Š” ์‚ฌ์šฉ์ด ์ œํ•œ๋  ์ˆ˜ ์žˆ๋‹ค.

 

๋ฐ˜์‘ํ˜•

'๐Ÿ’ป Programming > AI & ML' ์นดํ…Œ๊ณ ๋ฆฌ์˜ ๋‹ค๋ฅธ ๊ธ€

[pytorch] ๋ชจ๋ธ ์ผ๋ถ€๋ถ„๋งŒ ์ €์žฅํ•˜๊ธฐ/๋ถˆ๋Ÿฌ์˜ค๊ธฐ  (0) 2023.12.09
[ONNX] ๋”ฅ๋Ÿฌ๋‹ ๋ชจ๋ธ ONNX Runtime์œผ๋กœ CPU ํ™˜๊ฒฝ์—์„œ ๊ฐ€์†ํ™”ํ•˜๊ธฐ  (0) 2023.11.16
[Model Inference] Torch-TensorRT ์‚ฌ์šฉ๋ฒ• | ๋”ฅ๋Ÿฌ๋‹ ๋ชจ๋ธ ์ตœ์ ํ™” ๋ฐ ์ธํผ๋Ÿฐ์Šค ๊ฐ€์†ํ™”  (1) 2023.10.02
[pytorch] Multi-GPU Training | ๋‹ค์ค‘ GPU ํ•™์Šต ์˜ˆ์‹œ| Distributed Data Parallel (DDP) | Data Parallel (DP)  (0) 2023.04.17
[pytorch]Box operator๋กœ Bounding Box ์‰ฝ๊ฒŒ ๋‹ค๋ฃจ๊ธฐ  (0) 2023.03.17
'๐Ÿ’ป Programming/AI & ML' ์นดํ…Œ๊ณ ๋ฆฌ์˜ ๋‹ค๋ฅธ ๊ธ€
  • [pytorch] ๋ชจ๋ธ ์ผ๋ถ€๋ถ„๋งŒ ์ €์žฅํ•˜๊ธฐ/๋ถˆ๋Ÿฌ์˜ค๊ธฐ
  • [ONNX] ๋”ฅ๋Ÿฌ๋‹ ๋ชจ๋ธ ONNX Runtime์œผ๋กœ CPU ํ™˜๊ฒฝ์—์„œ ๊ฐ€์†ํ™”ํ•˜๊ธฐ
  • [Model Inference] Torch-TensorRT ์‚ฌ์šฉ๋ฒ• | ๋”ฅ๋Ÿฌ๋‹ ๋ชจ๋ธ ์ตœ์ ํ™” ๋ฐ ์ธํผ๋Ÿฐ์Šค ๊ฐ€์†ํ™”
  • [pytorch] Multi-GPU Training | ๋‹ค์ค‘ GPU ํ•™์Šต ์˜ˆ์‹œ| Distributed Data Parallel (DDP) | Data Parallel (DP)
๋ญ…์ฆค
๋ญ…์ฆค
AI ๊ธฐ์ˆ  ๋ธ”๋กœ๊ทธ
    ๋ฐ˜์‘ํ˜•
  • ๋ญ…์ฆค
    CV DOODLE
    ๋ญ…์ฆค
  • ์ „์ฒด
    ์˜ค๋Š˜
    ์–ด์ œ
  • ๊ณต์ง€์‚ฌํ•ญ

    • โœจ About Me
    • ๋ถ„๋ฅ˜ ์ „์ฒด๋ณด๊ธฐ (202)
      • ๐Ÿ“– Fundamentals (33)
        • Computer Vision (9)
        • 3D vision & Graphics (6)
        • AI & ML (15)
        • NLP (2)
        • etc. (1)
      • ๐Ÿ› Research (67)
        • Deep Learning (7)
        • Image Classification (2)
        • Detection & Segmentation (17)
        • OCR (7)
        • Multi-modal (4)
        • Generative AI (8)
        • 3D Vision (3)
        • Material & Texture Recognit.. (8)
        • NLP & LLM (11)
        • etc. (0)
      • ๐Ÿ› ๏ธ Engineering (7)
        • Distributed Training (4)
        • AI & ML ์ธ์‚ฌ์ดํŠธ (3)
      • ๐Ÿ’ป Programming (86)
        • Python (18)
        • Computer Vision (12)
        • LLM (4)
        • AI & ML (18)
        • Database (3)
        • Apache Airflow (6)
        • Docker & Kubernetes (14)
        • ์ฝ”๋”ฉ ํ…Œ์ŠคํŠธ (4)
        • C++ (1)
        • etc. (6)
      • ๐Ÿ’ฌ ETC (3)
        • ์ฑ… ๋ฆฌ๋ทฐ (3)
  • ๋งํฌ

  • ์ธ๊ธฐ ๊ธ€

  • ํƒœ๊ทธ

    pytorch
    ml
    nlp
    Computer Vision
    Python
    pandas
    OCR
    ํ”„๋กฌํ”„ํŠธ์—”์ง€๋‹ˆ์–ด๋ง
    ํŒŒ์ด์ฌ
    3D Vision
    OpenCV
    ๋”ฅ๋Ÿฌ๋‹
    ๊ฐ์ฒด๊ฒ€์ถœ
    deep learning
    ChatGPT
    material recognition
    Text recognition
    CNN
    LLM
    multi-modal
    segmentation
    VLP
    OpenAI
    ๊ฐ์ฒด ๊ฒ€์ถœ
    ๋„์ปค
    object detection
    AI
    airflow
    generative ai
    ์ปดํ“จํ„ฐ๋น„์ „
  • ์ตœ๊ทผ ๋Œ“๊ธ€

  • ์ตœ๊ทผ ๊ธ€

  • hELLOยท Designed By์ •์ƒ์šฐ.v4.10.3
๋ญ…์ฆค
[Model Inference] Pytorch 2.0 Compile ์‚ฌ์šฉ ํ›„๊ธฐ ๋ฐ ์žฅ๋‹จ์  | pytorch compile ๋ชจ๋ธ ์ถ”๋ก  ์†๋„ ๊ฐœ์„  ํ…Œ์ŠคํŠธ
์ƒ๋‹จ์œผ๋กœ

ํ‹ฐ์Šคํ† ๋ฆฌํˆด๋ฐ”