
[python] Parallel Processing in Python | How to Use joblib | Multiprocessing | Multithreading

by 뭅즤 2024. 1. 19.


joblib is a library that makes parallel processing in Python straightforward.


 

Parallel class

The Parallel class is used to run a function in parallel over the items of an iterable.

 

 

- n_jobs

  • The n_jobs parameter sets how many jobs run concurrently; matching the number of CPU cores is generally effective.
  • n_jobs=-1 means use every core available on the system to run as much in parallel as possible.
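
As a rough illustration of how n_jobs values resolve, the sketch below asks joblib how many CPUs it detects and how many workers n_jobs=-1 would map to on the current machine. `square` is a hypothetical placeholder function, and the threading backend is chosen here only to keep the sketch lightweight.

```python
from joblib import Parallel, delayed, cpu_count, effective_n_jobs

def square(x):
    # Hypothetical placeholder for real work
    return x * x

detected_cpus = cpu_count()             # CPUs joblib detects on this machine
workers_for_all = effective_n_jobs(-1)  # worker count that n_jobs=-1 resolves to

# Run five small jobs with an explicit worker count of 2.
results = Parallel(n_jobs=2, backend="threading")(
    delayed(square)(i) for i in range(5)
)

print(f"CPUs detected: {detected_cpus}")
print(f"n_jobs=-1 resolves to {workers_for_all} workers")
print(f"Results: {results}")
```

joblib's documentation also describes values below -1: n_jobs=-2, for example, uses all CPUs but one, which can be handy when the machine should stay responsive.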

 

- backend

  • The backend determines how the parallel jobs are executed; the loky backend is used by default.
  • loky (default backend)
    • loky is a backend built on Python's concurrent.futures that runs jobs in multiple processes.
    • It distributes work efficiently through a pool of worker processes and sidesteps the GIL (Global Interpreter Lock), so it is effective for CPU-bound tasks.
    • It can also be used effectively for I/O-bound tasks.

 

  • threading (multithreading)
    • The threading backend implements multithreading with Python's built-in threading module.
    • Because of Python's GIL, the speedup it can deliver on CPU-bound tasks is limited.
    • It is effective for I/O-bound tasks.

 

  • multiprocessing
    • The multiprocessing backend implements multiprocessing with Python's built-in multiprocessing module.
    • Each job runs in a separate process, so it is not affected by the GIL and a speedup can be expected on CPU-bound tasks.
    • However, communication between processes can add overhead.
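
The backend is picked by name through the backend argument. The sketch below runs the same hypothetical function `double` under two of the backends listed above and shows that only the execution strategy changes, not the results.

```python
from joblib import Parallel, delayed

def double(x):
    # Hypothetical placeholder for real work
    return x * 2

data = [1, 2, 3, 4, 5]
results_by_backend = {}

for backend in ("loky", "threading"):
    # Same call, different execution strategy (processes vs. threads)
    results_by_backend[backend] = Parallel(n_jobs=2, backend=backend)(
        delayed(double)(x) for x in data
    )
    print(f"{backend}: {results_by_backend[backend]}")

# backend="multiprocessing" is selected the same way. On platforms that start
# worker processes by spawning (such as Windows), the calling code should then
# sit under an `if __name__ == "__main__":` guard.
```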

 

 

delayed function

  • delayed wraps a function so that calling it does not run the function right away; the call only captures the function and its arguments.
  • Parallel then dispatches the captured calls to its workers, so one job does not have to finish before the next one starts.
  • This helps most with I/O-bound tasks, where workers spend much of their time waiting.

Parallel processing with joblib

 

1. Define the function to run in parallel

def process_data(data):
    # Function that performs the work and returns the result
    result = data * 2
    return result

 

2. Call the Parallel class

from joblib import Parallel, delayed

data_list = [1, 2, 3, 4, 5]
results = Parallel(n_jobs=-1, backend="loky")(delayed(process_data)(data) for data in data_list)
  • Choose n_jobs and backend when constructing Parallel
  • Wrap the function to run in parallel with delayed()
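
The two steps can be combined into one self-contained script. The sketch below also passes a keyword argument through delayed(); the `factor` parameter is a hypothetical addition for illustration and is not part of the original process_data.

```python
from joblib import Parallel, delayed

def process_data(data, factor=2):
    # `factor` is a hypothetical extra parameter added for this sketch
    return data * factor

data_list = [1, 2, 3, 4, 5]

# Positional and keyword arguments are forwarded to process_data as-is.
results = Parallel(n_jobs=2, backend="loky")(
    delayed(process_data)(data, factor=3) for data in data_list
)
print(results)
```

The results come back as a list in the same order as the input, regardless of which worker finished first.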

 

It really is that simple to use!


# Code example: I/O-bound task

from joblib import Parallel, delayed
import time

# 데이터 처리 ν•¨μˆ˜ (I/O-bound μž‘μ—…)
def io_bound_task(data):
    time.sleep(1)  # κ°€μ •: 1초 λ™μ•ˆ I/O μž‘μ—… μˆ˜ν–‰
    return data

# 데이터 리슀트
data_list = [1, 2, 3, 4, 5]

# Single-threaded case
start_time_single = time.perf_counter()

results_single_io = [io_bound_task(data) for data in data_list]

end_time_single = time.perf_counter()
elapsed_time_single = end_time_single - start_time_single

print("Using Single Thread (I/O-bound Task):")
print(f"Results: {results_single_io}")
print(f"Elapsed Time: {elapsed_time_single} seconds\n")

# Multithreaded case (using the threading backend)
start_time_multi_io = time.perf_counter()

results_multi_io = Parallel(n_jobs=-1, backend="threading")(delayed(io_bound_task)(data) for data in data_list)

end_time_multi_io = time.perf_counter()
elapsed_time_multi_io = end_time_multi_io - start_time_multi_io

print("Using Multi-Threading (threading backend - I/O-bound Task):")
print(f"Results: {results_multi_io}")
print(f"Elapsed Time: {elapsed_time_multi_io} seconds")

 

  • For I/O-bound tasks, either the loky backend or the threading backend can reduce processing time.
  • In the example above, the threading backend was slightly faster.
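
The listing above only runs the threading backend, so here is a minimal sketch for timing both backends side by side. The sleep is shortened to 0.2 seconds to keep the run brief, and n_jobs is set to the number of tasks (an assumption of this sketch) so that every task can wait at the same time even on a machine with few cores.

```python
import time
from joblib import Parallel, delayed

def io_bound_task(data):
    time.sleep(0.2)  # stand-in for an I/O wait (shortened from 1 second)
    return data

data_list = [1, 2, 3, 4, 5]
results = {}
timings = {}

for backend in ("threading", "loky"):
    start = time.perf_counter()
    # One worker per task: a sleeping task does not occupy a CPU core.
    results[backend] = Parallel(n_jobs=len(data_list), backend=backend)(
        delayed(io_bound_task)(data) for data in data_list
    )
    timings[backend] = time.perf_counter() - start
    print(f"{backend}: {results[backend]} in {timings[backend]:.2f} seconds")
```

The loky timing includes the cost of starting worker processes on its first call, which is one plausible reason for threading coming out slightly ahead; actual numbers will vary by machine.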

 

 

# Code example: CPU-bound task

from joblib import Parallel, delayed
import time

# CPU-bound task function
def cpu_bound_task(number):
    result = 0
    for _ in range(10**7):  # Computation repeated 10 million times
        result += number ** 2
    return result

# Data list to process
data_list = [1, 2, 3, 4, 5]

# 병렬 μ²˜λ¦¬ν•˜μ§€ μ•Šμ„ λ•Œμ˜ μ‹œκ°„ μΈ‘μ •
start_time_serial = time.time()
results_serial = [cpu_bound_task(data) for data in data_list]
end_time_serial = time.time()
elapsed_time_serial = end_time_serial - start_time_serial

print("Without Parallel Processing:")
print(f"Results: {results_serial}")
print(f"Elapsed Time: {elapsed_time_serial} seconds\n")

# 병렬 μ²˜λ¦¬ν•  λ•Œμ˜ μ‹œκ°„ μΈ‘μ •
start_time_parallel = time.time()
results_parallel = Parallel(n_jobs=-1, backend="loky")(delayed(cpu_bound_task)(data) for data in data_list)
end_time_parallel = time.time()
elapsed_time_parallel = end_time_parallel - start_time_parallel

print("Using Parallel Processing:")
print(f"Results: {results_parallel}")
print(f"Elapsed Time: {elapsed_time_parallel} seconds")

  • For CPU-bound tasks, the loky backend is the efficient choice because it avoids Python's GIL.
  • With the threading backend, processing time did not decrease.
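
The threading observation can be checked with a sketch like the one below, which times the same CPU-bound function sequentially and under the threading backend. The `iterations` parameter is a hypothetical addition that shortens the loop so the script finishes quickly; on a standard CPython build with the GIL, the two timings would be expected to come out similar.

```python
import time
from joblib import Parallel, delayed

def cpu_bound_task(number, iterations=10**6):
    # `iterations` is a hypothetical parameter; the article's loop uses 10**7
    result = 0
    for _ in range(iterations):
        result += number ** 2
    return result

data_list = [1, 2, 3, 4, 5]

# Sequential baseline
start = time.perf_counter()
results_serial = [cpu_bound_task(n) for n in data_list]
elapsed_serial = time.perf_counter() - start

# Same work under the threading backend
start = time.perf_counter()
results_threading = Parallel(n_jobs=-1, backend="threading")(
    delayed(cpu_bound_task)(n) for n in data_list
)
elapsed_threading = time.perf_counter() - start

print(f"Sequential: {elapsed_serial:.2f} seconds")
print(f"threading backend: {elapsed_threading:.2f} seconds")
```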