[python] Python Parallel Processing | How to Use joblib | Multiprocessing | Multithreading

2024. 1. 19. 23:30 · 💻 Programming/Python
λ°˜μ‘ν˜•


joblib is a library that makes parallel processing in Python easy.


 

The Parallel class

The Parallel class is used to run a function in parallel over many inputs, that is, to process an iterable of tasks concurrently.

 

 

- n_jobs

  • The n_jobs parameter sets how many jobs run at the same time. For CPU-bound work, setting it to the number of CPU cores is usually effective.
  • n_jobs=-1 means "use every core available on the system", running as many jobs in parallel as possible.
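A minimal sketch of how n_jobs is set, assuming joblib is installed. The helper `square` is hypothetical and stands in for real work; `joblib.cpu_count()` reports how many CPUs joblib can see. The results do not depend on n_jobs, only the number of workers changes. According to the joblib documentation, values below -1 count back from the CPU total, so n_jobs=-2 uses all CPUs but one.

```python
from joblib import Parallel, delayed, cpu_count

def square(x):
    # Hypothetical stand-in for the real work
    return x * x

print("CPUs visible to joblib:", cpu_count())

# Same tasks, different numbers of workers
results_two = Parallel(n_jobs=2)(delayed(square)(x) for x in range(10))   # exactly 2 workers
results_all = Parallel(n_jobs=-1)(delayed(square)(x) for x in range(10))  # one worker per CPU

print(results_two)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
print(results_all == results_two)  # True: n_jobs changes speed, not results
```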

 

- backend

  • The backend decides how the parallel work is actually executed. loky is used by default.
  • loky (default backend)
    • loky is a process-based backend: it runs tasks in a reusable pool of worker processes and exposes an API modeled on Python's concurrent.futures.
    • Process pooling distributes the work efficiently, and because each worker is a separate process with its own interpreter, it sidesteps the GIL (Global Interpreter Lock). This makes it effective for CPU-bound tasks.
    • It can also be used effectively for I/O-bound tasks.

 

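  • threading (multithreading)
    • The threading backend implements multithreading with Python's built-in threading module.
    • Because of Python's GIL, only one thread executes Python bytecode at a time, so the speedup for CPU-bound pure-Python code is limited (code that releases the GIL, such as some compiled extensions, is the exception).
    • Effective for I/O-bound tasks, since a thread that is waiting on I/O releases the GIL for the others.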

 

  • multiprocessing (multiple processes)
    • The multiprocessing backend implements multiprocessing with Python's built-in multiprocessing module.
    • Each task runs in a separate process, so it is not limited by the GIL and can speed up CPU-bound tasks.
    • However, arguments and results have to be serialized and passed between processes, which adds inter-process communication overhead.
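To see the difference between process-based and thread-based backends, the sketch below (assuming joblib is installed; `worker_pid` is a hypothetical helper) has each task report the ID of the process that ran it. With loky the tasks run in separate worker processes, so their PIDs differ from the parent's; with threading they run in threads of the current process. backend="multiprocessing" is passed the same way but is left out to keep the sketch short; on Windows the joblib documentation asks for the main code to be protected with `if __name__ == "__main__":` when using it.

```python
import os
from joblib import Parallel, delayed

def worker_pid(_):
    # Hypothetical helper: return the ID of the process that actually ran this task
    return os.getpid()

parent_pid = os.getpid()

# loky (default): tasks run in separate worker processes
loky_pids = Parallel(n_jobs=2, backend="loky")(delayed(worker_pid)(i) for i in range(4))

# threading: tasks run in threads inside the current process
thread_pids = Parallel(n_jobs=2, backend="threading")(delayed(worker_pid)(i) for i in range(4))

print("parent:", parent_pid)
print("loky workers:", set(loky_pids))         # PIDs differ from the parent
print("threading workers:", set(thread_pids))  # same PID as the parent
```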

 

 

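The delayed function

  • delayed() wraps a function so that calling the wrapper does not run the function yet; it only records the function together with its arguments as a task.
  • Parallel then hands these recorded tasks to its workers, which run them concurrently, so one task does not have to finish before the next one starts.
  • This is especially helpful for I/O-bound tasks, where workers spend most of their time waiting.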
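A small sketch of what delayed() actually produces, assuming a current joblib release, where it returns a (function, args, kwargs) tuple. That tuple format is an implementation detail rather than a documented contract, so treat this as an illustration. The function `add` is hypothetical.

```python
from joblib import Parallel, delayed

def add(a, b, scale=1):
    # Hypothetical example function
    return (a + b) * scale

# delayed() does not call add(); it only records the call
task = delayed(add)(1, 2, scale=10)
func, args, kwargs = task
print(func is add, args, kwargs)  # True (1, 2) {'scale': 10}

# Running the recorded calls one by one reproduces what Parallel computes
tasks = [delayed(add)(i, i, scale=10) for i in range(5)]
serial = [f(*a, **kw) for f, a, kw in tasks]
parallel = Parallel(n_jobs=2)(tasks)
print(serial == parallel)  # True
```

The last line shows that Parallel returns the same values, in the same order, as running the recorded calls one after another; the speedup comes from Parallel, not from delayed.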

Parallel processing with joblib

 

1. Define the function to run in parallel

def process_data(data):
    # Perform the work and return the result
    result = data * 2
    return result

 

2. Call Parallel

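from joblib import Parallel, delayed

data_list = [1, 2, 3, 4, 5]
results = Parallel(n_jobs=-1, backend="loky")(delayed(process_data)(data) for data in data_list)
print(results)  # [2, 4, 6, 8, 10]
  • Parallel(...) is configured with n_jobs and backend; calling the resulting object with an iterable of tasks runs them and returns the results as a list, in input order.
  • The function to run in parallel is wrapped with delayed().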

 

That's all it takes!


# Code example: I/O-bound task

from joblib import Parallel, delayed
import time

# Data-processing function (I/O-bound task)
def io_bound_task(data):
    time.sleep(1)  # Simulate 1 second of I/O (e.g., waiting on disk or network)
    return data

# Input data
data_list = [1, 2, 3, 4, 5]

# Sequential processing in a single thread
start_time_single = time.time()

results_single_io = [io_bound_task(data) for data in data_list]

end_time_single = time.time()
elapsed_time_single = end_time_single - start_time_single

print("Using Single Thread (I/O-bound Task):")
print(f"Results: {results_single_io}")
print(f"Elapsed Time: {elapsed_time_single} seconds\n")

# Processing with multithreading (threading backend)
start_time_multi_io = time.time()

results_multi_io = Parallel(n_jobs=-1, backend="threading")(delayed(io_bound_task)(data) for data in data_list)

end_time_multi_io = time.time()
elapsed_time_multi_io = end_time_multi_io - start_time_multi_io

print("Using Multi-Threading (threading backend - I/O-bound Task):")
print(f"Results: {results_multi_io}")
print(f"Elapsed Time: {elapsed_time_multi_io} seconds")

 

  • For I/O-bound tasks, either the loky or the threading backend can reduce the processing time.
  • When the example above was also run with backend="loky", the threading backend was slightly faster.
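A sketch comparing the two backends on the same kind of simulated I/O, assuming joblib is installed. The 0.5 s sleep and the 8 tasks are arbitrary choices, and n_jobs is set to the number of tasks: because I/O-bound workers spend most of their time waiting, using more workers than CPU cores is reasonable here. Timings vary by machine; the loky figure includes starting the worker processes, which is one likely reason threading came out slightly faster.

```python
import time
from joblib import Parallel, delayed

def io_bound_task(data):
    time.sleep(0.5)  # simulated I/O wait, shorter than the 1 s used above
    return data

data_list = list(range(8))

def timed_run(backend):
    # One worker per task: fine for I/O-bound work, where workers mostly wait
    start = time.perf_counter()
    results = Parallel(n_jobs=len(data_list), backend=backend)(
        delayed(io_bound_task)(d) for d in data_list
    )
    return results, time.perf_counter() - start

timings = {}
outputs = {}
for backend in ("threading", "loky"):
    outputs[backend], timings[backend] = timed_run(backend)
    print(f"{backend}: {timings[backend]:.2f} s")

print(f"sequential would take about {0.5 * len(data_list):.1f} s")
```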

 

 

# Code example: CPU-bound task

from joblib import Parallel, delayed
import time

# CPU-bound task function
def cpu_bound_task(number):
    result = 0
    for _ in range(10**7):  # Loop 10 million times
        result += number ** 2
    return result

# Input data
data_list = [1, 2, 3, 4, 5]

# Time the run without parallel processing
start_time_serial = time.time()
results_serial = [cpu_bound_task(data) for data in data_list]
end_time_serial = time.time()
elapsed_time_serial = end_time_serial - start_time_serial

print("Without Parallel Processing:")
print(f"Results: {results_serial}")
print(f"Elapsed Time: {elapsed_time_serial} seconds\n")

# Time the run with parallel processing
start_time_parallel = time.time()
results_parallel = Parallel(n_jobs=-1, backend="loky")(delayed(cpu_bound_task)(data) for data in data_list)
end_time_parallel = time.time()
elapsed_time_parallel = end_time_parallel - start_time_parallel

print("Using Parallel Processing:")
print(f"Results: {results_parallel}")
print(f"Elapsed Time: {elapsed_time_parallel} seconds")

  • For CPU-bound tasks, the loky backend is the efficient choice, because its worker processes sidestep Python's GIL.
  • With the threading backend, the processing time did not go down.
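The sketch below reproduces this comparison in one script, assuming joblib is installed and a standard (GIL-enabled) CPython build. The loop count is cut to 10**6 so it runs quickly. On a multi-core machine, expect the threading time to stay close to a sequential run, since only one thread can execute the pure-Python loop at a time, while loky spreads the tasks across processes; exact numbers depend on the machine.

```python
import time
from joblib import Parallel, delayed

def cpu_bound_task(number):
    # Pure-Python loop: holds the GIL for its whole duration
    result = 0
    for _ in range(10**6):  # 10**6 instead of 10**7 so the sketch finishes quickly
        result += number ** 2
    return result

data_list = [1, 2, 3, 4, 5]
timings = {}
results_by_backend = {}
for backend in ("threading", "loky"):
    start = time.perf_counter()
    results_by_backend[backend] = Parallel(n_jobs=-1, backend=backend)(
        delayed(cpu_bound_task)(d) for d in data_list
    )
    timings[backend] = time.perf_counter() - start
    print(f"{backend}: {timings[backend]:.2f} s")
```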
λ°˜μ‘ν˜•

'πŸ’» Programming > Python' μΉ΄ν…Œκ³ λ¦¬μ˜ λ‹€λ₯Έ κΈ€

[python] 파이썬 클린 μ½”λ“œ μž‘μ„± κΏ€νŒ 8κ°€μ§€ : 더 κΉ”λ”ν•˜κ³  가독성 높은 μ½”λ“œ μž‘μ„±ν•˜κΈ°!  (0) 2024.07.11
[python] Streamlit 으둜 데이터 μ›Ή μ• ν”Œλ¦¬μΌ€μ΄μ…˜ λ§Œλ“€κΈ°! | κ°„λ‹¨ν•œ λŒ€μ‹œλ³΄λ“œ & 웹데λͺ¨ νŽ˜μ΄μ§€ 개발  (1) 2024.07.08
[python] λ©€ν‹°ν”„λ‘œμ„Έμ‹± Process μ‚¬μš©λ²• 및 μ½”λ“œ μ˜ˆμ‹œ | multiprocessing.Process | μ—¬λŸ¬ ν”„λ‘œμ„ΈμŠ€μ— μ„œλ‘œ λ‹€λ₯Έ μž‘μ—…μ„ ν• λ‹Ή  (3) 2024.01.07
[python] λ©€ν‹°ν”„λ‘œμ„Έμ‹± Pool μ‚¬μš©λ²• 및 μ½”λ“œ μ˜ˆμ‹œ | multiprocessing.Pool | python 속도 ν–₯상  (0) 2024.01.07
[pandas] νŠΉμ • μ»¬λŸΌμ—μ„œ νŠΉμ • λ¬Έμžμ—΄μ΄ ν¬ν•¨λœ ν–‰ μ°ΎκΈ° | str.contains  (0) 2023.11.17
'πŸ’» Programming/Python' μΉ΄ν…Œκ³ λ¦¬μ˜ λ‹€λ₯Έ κΈ€
  • [python] 파이썬 클린 μ½”λ“œ μž‘μ„± κΏ€νŒ 8κ°€μ§€ : 더 κΉ”λ”ν•˜κ³  가독성 높은 μ½”λ“œ μž‘μ„±ν•˜κΈ°!
  • [python] Streamlit 으둜 데이터 μ›Ή μ• ν”Œλ¦¬μΌ€μ΄μ…˜ λ§Œλ“€κΈ°! | κ°„λ‹¨ν•œ λŒ€μ‹œλ³΄λ“œ & 웹데λͺ¨ νŽ˜μ΄μ§€ 개발
  • [python] λ©€ν‹°ν”„λ‘œμ„Έμ‹± Process μ‚¬μš©λ²• 및 μ½”λ“œ μ˜ˆμ‹œ | multiprocessing.Process | μ—¬λŸ¬ ν”„λ‘œμ„ΈμŠ€μ— μ„œλ‘œ λ‹€λ₯Έ μž‘μ—…μ„ ν• λ‹Ή
  • [python] λ©€ν‹°ν”„λ‘œμ„Έμ‹± Pool μ‚¬μš©λ²• 및 μ½”λ“œ μ˜ˆμ‹œ | multiprocessing.Pool | python 속도 ν–₯상
뭅즀
뭅즀
AI 기술 λΈ”λ‘œκ·Έ
    λ°˜μ‘ν˜•
  • 뭅즀
    CV DOODLE
    뭅즀
  • 전체
    였늘
    μ–΄μ œ
  • 곡지사항

    • ✨ About Me
    • λΆ„λ₯˜ 전체보기 (200)
      • πŸ“– Fundamentals (33)
        • Computer Vision (9)
        • 3D vision & Graphics (6)
        • AI & ML (15)
        • NLP (2)
        • etc. (1)
      • πŸ› Research (65)
        • Deep Learning (7)
        • Image Classification (2)
        • Detection & Segmentation (17)
        • OCR (7)
        • Multi-modal (4)
        • Generative AI (6)
        • 3D Vision (3)
        • Material & Texture Recognit.. (8)
        • NLP & LLM (11)
        • etc. (0)
      • 🌟 AI & ML Tech (7)
        • AI & ML μΈμ‚¬μ΄νŠΈ (7)
      • πŸ’» Programming (86)
        • Python (18)
        • Computer Vision (12)
        • LLM (4)
        • AI & ML (18)
        • Database (3)
        • Apache Airflow (6)
        • Docker & Kubernetes (14)
        • μ½”λ”© ν…ŒμŠ€νŠΈ (4)
        • C++ (1)
        • etc. (6)
      • πŸ’¬ ETC (3)
        • μ±… 리뷰 (3)
  • 링크

  • 인기 κΈ€

  • νƒœκ·Έ

    deep learning
    파이썬
    객체 κ²€μΆœ
    도컀
    ChatGPT
    λ”₯λŸ¬λ‹
    OpenCV
    GPT
    pytorch
    CNN
    ν”„λ‘¬ν”„νŠΈμ—”μ§€λ‹ˆμ–΄λ§
    multi-modal
    Text recognition
    LLM
    pandas
    Image Classification
    Computer Vision
    material recognition
    κ°μ²΄κ²€μΆœ
    VLP
    AI
    segmentation
    3D Vision
    airflow
    nlp
    컴퓨터비전
    object detection
    Python
    OCR
    OpenAI
  • 졜근 λŒ“κΈ€

  • 졜근 κΈ€

  • hELLOΒ· Designed Byμ •μƒμš°.v4.10.3
뭅즀
[python] 파이썬 병렬 처리 | joblib μ‚¬μš©λ²• | λ©€ν‹°ν”„λ‘œμ„Έμ‹± | λ©€ν‹°μ“°λ ˆλ”©
μƒλ‹¨μœΌλ‘œ

ν‹°μŠ€ν† λ¦¬νˆ΄λ°”