💻 Programming/Apache Airflow (6)

[Airflow] Watching a DB query result and pausing a task until a condition is met | SqlSensor
Apache Airflow's SqlSensor watches the result of a database query and holds the task until a specific condition is met. It is mainly used to check a query result for a particular value or condition.
How to use SqlSensor
    from airflow.sensors.sql import SqlSensor

    sql_sensor_task = SqlSensor(
        task_id='sql_sensor_task',
        conn_id='your_database_connection_id',  # database connection ID
        sql='SELECT COUNT(*) FROM your_table WHERE your_condition;',  # query to watch
        mode='poke',  #..
2023. 11. 20.
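To make the "poke until the condition is met" idea in the excerpt above concrete, here is a minimal sketch in plain Python with sqlite3. It is not Airflow code and not SqlSensor's implementation: `poke`, `wait_for`, the `orders` table, and the interval and timeout values are all hypothetical names chosen for illustration. It assumes the sensor's default success rule, where the first cell of the first row must be something other than 0, '0', '' or None.

```python
import sqlite3
import time

# Cell values that count as "condition not met" in this sketch.
FALSY_CELLS = (0, "0", "", None)


def poke(conn, sql):
    """Run the query once; pass when the first cell of the first row is truthy."""
    row = conn.execute(sql).fetchone()
    if row is None:
        return False
    return row[0] not in FALSY_CELLS


def wait_for(conn, sql, poke_interval=0.1, timeout=2.0):
    """Re-run the query every poke_interval seconds until it passes or time runs out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if poke(conn, sql):
            return True
        time.sleep(poke_interval)
    return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (status TEXT)")  # hypothetical table
query = "SELECT COUNT(*) FROM orders WHERE status = 'done'"

before = poke(conn, query)  # no matching rows yet, so the check fails
conn.execute("INSERT INTO orders VALUES ('done')")
after = wait_for(conn, query)  # a matching row now exists, so the wait ends
print(before, after)
```

In a real DAG the waiting is handled by the sensor itself; the sketch only shows why a `SELECT COUNT(*)` query is a natural fit, since a count of 0 keeps the task waiting and any positive count releases it.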
[Airflow] Installing Airflow and running the web interface
1. Install Airflow
    pip install apache-airflow
2. Configure Airflow
    cd airflow
    airflow db init
    mkdir dags
Go into the airflow folder that was created, initialize the DB, and create a dags folder.
    airflow users create -u admin -p admin -f Clueless -l Coder -r Admin -e admin@admin.com
Create an admin account.
3. Run Airflow
    airflow webserver -p 8080
Start Airflow on port 8080, then open 'localhost:8080'.
4. Airflow web interface
The Apache Airflow web server is a user interface for visualizing, monitoring, and managing Airflow workflows..
2023. 11. 20.
[Airflow] Running a Python function | Using PythonOperator
PythonOperator is the operator used in Apache Airflow to define a task that runs a Python function. With it you can call a Python function to do data processing, calculations, or custom work. Below is a simple example that uses PythonOperator.
    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator
    from datetime import datetime, timedelta

    # Define the DAG
    dag = DAG(
        'python_operator_example',
        description='Example DAG with PythonOperator',
        schedule_inte..
2023. 11. 19.
[Airflow] Running shell scripts and commands | Using BashOperator
BashOperator is the operator used in Apache Airflow to define a task that runs a shell script or command. With it you can run external programs, scripts, or commands and check the result. The following simple example uses BashOperator to run a short Bash script and log its output.
    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator
    from datetime import datetime, timedelta

    # Define the DAG
    dag = DAG(
        'bash_operator_example',
        description='Example DAG with BashOperator',
        sched..
2023. 11. 19.
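As a rough picture of what the excerpt above describes (run a command, log its output, check the result), here is a plain-Python sketch built on subprocess. It is a stand-in for illustration only, not BashOperator's code; `run_command` and the logger name are hypothetical. It assumes the behaviour of logging each output line, treating a non-zero exit code as a failure, and handing back the last output line.

```python
import logging
import subprocess

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("bash_sketch")  # hypothetical logger name


def run_command(command):
    """Run a shell command, log each output line, and return the last line."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    lines = result.stdout.splitlines()
    for line in lines:
        log.info(line)
    # A non-zero exit code is treated as a failed task.
    if result.returncode != 0:
        raise RuntimeError(f"command failed with exit code {result.returncode}")
    return lines[-1] if lines else None


last_line = run_command("echo hello")
print(last_line)
```

Inside Airflow the same command would be passed to the operator's `bash_command` argument, and retries and logging would be handled by the scheduler rather than by hand.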
[Airflow] Running a DAG at a fixed interval (scheduling) | schedule_interval | cron-based schedules
To run a DAG at a fixed interval in Apache Airflow, use the schedule_interval parameter. This parameter sets how often the DAG runs, and the interval can be given as a timedelta object. For example, to run every day, define it as timedelta(days=1). Let's look at a few examples that use timedelta.
timedelta
    # Run every day
    schedule_interval=timedelta(days=1)
    # Run every 3 days
    schedule_interval=timedelta(days=3)
    # Run every 8 days (a timedelta cannot pin a weekday; for every Monday use a cron expression such as '0 0 * * 1')
    schedule_interval=timedelta(weeks=1, days=1)
    # Run every hour
    schedule_interval=timedelta(hours=1..
2023. 11. 19.
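The timedelta examples above are fixed lengths of time, which is easy to check with the standard library alone. The sketch below is only date arithmetic, not Airflow's scheduler; `run_times` and the start date are hypothetical. It shows that timedelta(weeks=1, days=1) is an 8-day interval, so successive runs drift across weekdays, while timedelta(weeks=1) stays on the weekday of the start date.

```python
from datetime import datetime, timedelta


def run_times(start, interval, count):
    """Return the first `count` run times spaced `interval` apart from `start`."""
    return [start + interval * i for i in range(count)]


start = datetime(2023, 11, 20)  # a Monday; weekday() == 0

eight_days = run_times(start, timedelta(weeks=1, days=1), 3)
seven_days = run_times(start, timedelta(weeks=1), 3)

# weekday(): Monday is 0, Tuesday is 1, Wednesday is 2, ...
print(timedelta(weeks=1, days=1) == timedelta(days=8))  # True
print([d.weekday() for d in eight_days])  # [0, 1, 2]
print([d.weekday() for d in seven_days])  # [0, 0, 0]
```

A weekly timedelta only lands on Mondays when the first run happens to fall on a Monday, which is why a cron expression is the usual choice for "every Monday" style schedules.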
[Airflow] Airflow & DAG explained
Apache Airflow
Apache Airflow is an open-source platform for managing and scheduling data pipelines. Because pipelines are written in Python code, almost anything you can express in Python can be used to build custom pipelines. It is also easy to extend and integrates with a wide range of systems. Many companies use Airflow because its many scheduling options let pipelines run on a regular basis, it supports incremental processing, and it is open source. (A data pipeline is a series of steps and processes for organizing and running data-processing work.)
Features of Apache Airflow
Scheduling and monitoring
Airflow is used to schedule and monitor tasks. Ex..
2023. 11. 19.