joblib is a Python library that makes parallel processing straightforward.
Parallel class
The Parallel class runs a function in parallel over the items of an iterable.
- n_jobs
  - The n_jobs parameter sets how many tasks run at the same time; matching the number of CPU cores is usually a good starting point.
  - n_jobs=-1 means "use every core available on the system".
- backend
  - The backend decides how the parallel work is actually executed. The loky backend is used by default.
  - loky (default backend)
    - loky builds on the executor interface of Python's concurrent.futures and runs tasks in multiple processes.
    - It distributes work across a pool of worker processes and sidesteps the GIL (Global Interpreter Lock), so it is effective for CPU-bound tasks.
    - It can also be used for I/O-bound tasks.
  - threading (multithreading)
    - The threading backend implements multithreading with Python's built-in threading module.
    - Because of the GIL, the speedup it can deliver for CPU-bound tasks is limited.
    - It is effective for I/O-bound tasks.
  - multiprocessing
    - The multiprocessing backend implements multiprocessing with Python's built-in multiprocessing module.
    - Each task runs in a separate process, so the GIL does not get in the way and CPU-bound tasks can be expected to speed up.
    - Communication between processes can add overhead, however.
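The n_jobs convention above can be sketched in a few lines. This is a toy illustration, not joblib's own code: `resolve_n_jobs` and its `n_cpus` argument are hypothetical names, and the rule for values below -1 (n_cpus + 1 + n_jobs) follows what the joblib documentation describes.

```python
import os


def resolve_n_jobs(n_jobs, n_cpus=None):
    """Toy sketch of how a negative n_jobs maps to a worker count.

    Hypothetical helper for illustration only; joblib has its own logic.
    """
    if n_cpus is None:
        # os.cpu_count() can return None, so fall back to a single worker
        n_cpus = os.cpu_count() or 1
    if n_jobs == 0:
        raise ValueError("n_jobs == 0 has no meaning")
    if n_jobs < 0:
        # -1 -> all CPUs, -2 -> all but one, ...; never fewer than 1 worker
        return max(n_cpus + 1 + n_jobs, 1)
    return n_jobs


# Assuming a machine with 8 cores:
all_cores = resolve_n_jobs(-1, n_cpus=8)
all_but_one = resolve_n_jobs(-2, n_cpus=8)
```

Under that assumption, n_jobs=-2 leaves one core free, which can be handy when the machine should stay responsive while the job runs.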
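delayed function
- delayed wraps a function so that calling it does not run it right away; the call only records the function together with its arguments.
- Parallel collects these recorded calls and hands them to its workers, so the tasks run concurrently instead of one after another.
- This helps most with I/O-bound tasks, where each task spends most of its time waiting.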
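A minimal sketch of the idea behind delayed, using only the standard library. `toy_delayed` is a hypothetical stand-in written for this illustration, and `double` is an example function; the real joblib.delayed may differ in details.

```python
import functools


def toy_delayed(function):
    """Illustrative stand-in for joblib.delayed (not the library's code)."""
    @functools.wraps(function)
    def capture(*args, **kwargs):
        # Record the call instead of performing it
        return function, args, kwargs
    return capture


def double(x):
    return x * 2


task = toy_delayed(double)(21)   # nothing has been computed yet
func, args, kwargs = task
result = func(*args, **kwargs)   # the actual call happens only here
```

The point of the sketch is that a "delayed" call is just data describing work to be done, which is what lets a scheduler decide where and when to run it.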
Parallel processing with joblib
1. Define the function to run in parallel
from joblib import Parallel, delayed

def process_data(data):
    # Do the work and return the result
    result = data * 2
    return result
2. Call the Parallel class
data_list = [1, 2, 3, 4, 5]
results = Parallel(n_jobs=-1, backend="loky")(delayed(process_data)(data) for data in data_list)
- Choose n_jobs and backend when constructing Parallel
- Wrap the function to run in parallel with delayed()
It really is that simple!
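The double parentheses in the Parallel call can look odd at first: the first pair builds a configured object, and the second pair calls that object with an iterable of tasks. The sketch below mimics that shape with a hypothetical `ToyParallel` class; it uses a standard-library thread pool as a stand-in and is not how joblib is implemented.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyParallel:
    """Hypothetical callable object mimicking the shape of Parallel(...)(...)."""

    def __init__(self, n_jobs=2):
        self.n_jobs = n_jobs

    def __call__(self, tasks):
        # Each task is a (function, args, kwargs) tuple
        with ThreadPoolExecutor(max_workers=self.n_jobs) as pool:
            futures = [pool.submit(func, *args, **kwargs)
                       for func, args, kwargs in tasks]
            # Collect in submission order so results line up with inputs
            return [future.result() for future in futures]


def process_data(data):
    return data * 2


tasks = ((process_data, (d,), {}) for d in [1, 2, 3, 4, 5])
results = ToyParallel(n_jobs=2)(tasks)
```

Collecting results in submission order mirrors the behavior seen in the usage above, where the output list corresponds to data_list item by item.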
# Code example: I/O-bound task
from joblib import Parallel, delayed
import time

# Data-processing function (I/O-bound task)
def io_bound_task(data):
    time.sleep(1)  # Assumption: simulates 1 second of I/O work
    return data

# Data list
data_list = [1, 2, 3, 4, 5]

# Single-threaded processing
start_time_single = time.time()
results_single_io = [io_bound_task(data) for data in data_list]
end_time_single = time.time()
elapsed_time_single = end_time_single - start_time_single
print("Using Single Thread (I/O-bound Task):")
print(f"Results: {results_single_io}")
print(f"Elapsed Time: {elapsed_time_single} seconds\n")

# Multithreaded processing (threading backend)
start_time_multi_io = time.time()
results_multi_io = Parallel(n_jobs=-1, backend="threading")(delayed(io_bound_task)(data) for data in data_list)
end_time_multi_io = time.time()
elapsed_time_multi_io = end_time_multi_io - start_time_multi_io
print("Using Multi-Threading (threading backend - I/O-bound Task):")
print(f"Results: {results_multi_io}")
print(f"Elapsed Time: {elapsed_time_multi_io} seconds")
- For I/O-bound tasks, either the loky backend or the threading backend can reduce processing time.
- In the example above, the threading backend was slightly faster.
# Code example: CPU-bound task
from joblib import Parallel, delayed
import time

# CPU-bound task function
def cpu_bound_task(number):
    result = 0
    for _ in range(10**7):  # Roughly ten million iterations of arithmetic
        result += number ** 2
    return result

# Data list to process
data_list = [1, 2, 3, 4, 5]

# Timing without parallel processing
start_time_serial = time.time()
results_serial = [cpu_bound_task(data) for data in data_list]
end_time_serial = time.time()
elapsed_time_serial = end_time_serial - start_time_serial
print("Without Parallel Processing:")
print(f"Results: {results_serial}")
print(f"Elapsed Time: {elapsed_time_serial} seconds\n")

# Timing with parallel processing
start_time_parallel = time.time()
results_parallel = Parallel(n_jobs=-1, backend="loky")(delayed(cpu_bound_task)(data) for data in data_list)
end_time_parallel = time.time()
elapsed_time_parallel = end_time_parallel - start_time_parallel
print("Using Parallel Processing:")
print(f"Results: {results_parallel}")
print(f"Elapsed Time: {elapsed_time_parallel} seconds")
- For CPU-bound tasks, the loky backend is the efficient choice because it avoids Python's GIL.
- With the threading backend, processing time did not decrease.
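Both benchmarks repeat the same start/stop/subtract pattern. One possible way to tidy that up is a small context manager; `timed` and the `timings` dictionary are hypothetical names for this sketch, and it uses time.perf_counter, which is generally better suited to measuring short intervals than time.time.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, timings):
    """Store the elapsed wall-clock time of the with-block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the block raises
        timings[label] = time.perf_counter() - start


timings = {}
with timed("serial", timings):
    # Placeholder workload standing in for a benchmark body
    total = sum(i * i for i in range(100_000))

print(f"serial: {timings['serial']:.4f} seconds")
```

The same `with timed(...)` block could wrap the serial loop and the Parallel call in turn, leaving the measurements side by side in one dictionary.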