Installing Airflow 2.0 on AWS EC2 (Ubuntu)

How to install Airflow 2.0 on Ubuntu.

Reference: https://github.com/keeyong/data-engineering-batch5/blob/main/docs/Airflow%202%20Installation.md

1. Install Python

sudo apt-get update

sudo apt-get install -y python3-pip

python3 --version

Python 3.8.10

2. Install Airflow and supporting modules

sudo apt-get install -y postgresql-server-dev-all

sudo apt-get install -y postgresql-common

sudo pip3 install apache-airflow

sudo pip3 install apache-airflow-providers-postgres[amazon]==2.0.0

sudo pip3 install cryptography psycopg2-binary boto3 botocore

sudo pip3 install SQLAlchemy==1.3.23
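
The commands above pin SQLAlchemy by hand. As a hedged alternative, the Airflow project publishes constraint files keyed by Airflow version and Python version, and pip can consume them with --constraint. The sketch below only builds that URL. The function name airflow_constraint_url and the version 2.0.2 are illustrative assumptions and do not come from the original post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the URL of Airflow's published constraint file.
# Arguments: Airflow version (e.g. 2.0.2) and Python major.minor (e.g. 3.8).
airflow_constraint_url() {
  local airflow_version="$1"
  local python_version="$2"
  printf 'https://raw.githubusercontent.com/apache/airflow/constraints-%s/constraints-%s.txt\n' \
    "$airflow_version" "$python_version"
}

# Example usage (not executed here; the pinned version is an assumption):
#   sudo pip3 install "apache-airflow==2.0.2" \
#     --constraint "$(airflow_constraint_url 2.0.2 3.8)"
```

With a constraint file, pip resolves every dependency to a version set that was tested against that Airflow release, so individual pins may become unnecessary.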

3. Create the airflow account

Rather than working as Ubuntu's root account, create a dedicated airflow user and run Airflow under it.

sudo groupadd airflow

sudo useradd -s /bin/bash airflow -g airflow -d /var/lib/airflow -m

Home directory of the airflow user (used as the Airflow root directory): /var/lib/airflow/

4. Install PostgreSQL

sudo apt-get install -y postgresql postgresql-contrib

Log in as the postgres user and create a PostgreSQL USER and DATABASE for Airflow.

# Switch to the postgres user

ubuntu@ip-172-31-50-243:~$ sudo su postgres

# Create the user and database

postgres@ip-172-31-50-243:/home/ubuntu$ psql

psql (10.18 (Ubuntu 10.18-0ubuntu0.18.04.1))

Type "help" for help.

postgres=# CREATE USER airflow PASSWORD 'airflow';

CREATE ROLE

postgres=# CREATE DATABASE airflow;

CREATE DATABASE

postgres=# \q

postgres@ip-172-31-50-243:/home/ubuntu$ exit

exit

# Restart PostgreSQL

ubuntu@ip-172-31-50-243:~$ sudo service postgresql restart
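
The interactive psql session above can also be scripted. The following is a minimal sketch that prints the same two SQL statements so they can be piped into psql. The function name airflow_db_sql is hypothetical, and the sketch assumes the role name, password, and database name contain no single quotes or other characters that need SQL escaping.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the SQL statements from the psql session above.
# Arguments: role name, password, database name (no escaping is performed).
airflow_db_sql() {
  local user="$1" password="$2" database="$3"
  printf "CREATE USER %s PASSWORD '%s';\n" "$user" "$password"
  printf 'CREATE DATABASE %s;\n' "$database"
}

# Example usage (not executed here):
#   airflow_db_sql airflow airflow airflow | sudo -u postgres psql
```

Printing the statements first makes it easy to review them before they touch the database.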

5. Initialize Airflow

# Switch to the airflow user

ubuntu@ip-172-31-50-243:~$ sudo su airflow

airflow@ip-172-31-50-243:/home/ubuntu$ cd /var/lib/airflow

# Create the dags folder

airflow@ip-172-31-50-243:~$ pwd

/var/lib/airflow

airflow@ip-172-31-50-243:~$ mkdir dags

airflow@ip-172-31-50-243:~$ ls

dags

# Initialize Airflow

airflow@ip-172-31-50-243:~$ AIRFLOW_HOME=/var/lib/airflow airflow db init

airflow@ip-172-31-50-243:~$ ls

airflow.cfg airflow.db dags logs webserver_config.py

5. Edit the Airflow config (airflow.cfg file)

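# Set the executor to LocalExecutor

executor = LocalExecutor

# Point sql_alchemy_conn at the PostgreSQL database created above

sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@localhost:5432/airflow

The user ID, password, and database name are all airflow, and the host name is localhost.

# Change the "load_examples" setting to False

load_examples = False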

# Re-run the initialization so the metadata tables are created in PostgreSQL

airflow@ip-172-31-50-243:~$ AIRFLOW_HOME=/var/lib/airflow airflow db init
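
The sql_alchemy_conn value set in airflow.cfg follows the pattern driver://user:password@host:port/database. The sketch below assembles that string from its parts, which can help when the credentials differ from the all-airflow defaults used here. The function name airflow_conn_uri is hypothetical, and the sketch assumes none of the parts contain characters that need URL encoding.

```shell
#!/usr/bin/env bash
# Hypothetical helper: assemble the SQLAlchemy URI used for sql_alchemy_conn.
# Arguments: user, password, host, port, database (no URL encoding is performed).
airflow_conn_uri() {
  local user="$1" password="$2" host="$3" port="$4" database="$5"
  printf 'postgresql+psycopg2://%s:%s@%s:%s/%s\n' \
    "$user" "$password" "$host" "$port" "$database"
}

# Example usage (not executed here): override airflow.cfg through the environment.
# In Airflow 2.0, AIRFLOW__CORE__SQL_ALCHEMY_CONN maps to [core] sql_alchemy_conn.
#   export AIRFLOW__CORE__SQL_ALCHEMY_CONN="$(airflow_conn_uri airflow airflow localhost 5432 airflow)"
```

Setting the value through the environment is optional; editing airflow.cfg as described in this step has the same effect.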

6. Register the Airflow webserver and scheduler as services

# Switch back to the ubuntu account

airflow@ip-172-31-50-243:~$ exit

exit

ubuntu@ip-172-31-50-243:~$

# Register the Airflow webserver as a background service

ubuntu@ip-172-31-50-243:~$ sudo vi /etc/systemd/system/airflow-webserver.service

[Unit]

Description=Airflow webserver

After=network.target

[Service]

Environment=AIRFLOW_HOME=/var/lib/airflow

User=airflow

Group=airflow

Type=simple

ExecStart=/usr/local/bin/airflow webserver -p 8080

Restart=on-failure

RestartSec=10s

[Install]

WantedBy=multi-user.target

# Register the Airflow scheduler as a background service

ubuntu@ip-172-31-50-243:~$ sudo vi /etc/systemd/system/airflow-scheduler.service

[Unit]

Description=Airflow scheduler

After=network.target

[Service]

Environment=AIRFLOW_HOME=/var/lib/airflow

User=airflow

Group=airflow

Type=simple

ExecStart=/usr/local/bin/airflow scheduler

Restart=on-failure

RestartSec=10s

[Install]

WantedBy=multi-user.target
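
The two unit files above differ only in their Description and ExecStart lines. As a sketch of that observation, the function below prints a unit file for either component from one template. The function name render_airflow_unit is hypothetical; the paths, user, and restart settings are copied from the unit files in this post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a systemd unit for one Airflow component.
# Arguments: component name (webserver or scheduler) and the airflow subcommand.
render_airflow_unit() {
  local component="$1" subcommand="$2"
  cat <<EOF
[Unit]
Description=Airflow $component
After=network.target

[Service]
Environment=AIRFLOW_HOME=/var/lib/airflow
User=airflow
Group=airflow
Type=simple
ExecStart=/usr/local/bin/airflow $subcommand
Restart=on-failure
RestartSec=10s

[Install]
WantedBy=multi-user.target
EOF
}

# Example usage (not executed here):
#   render_airflow_unit webserver "webserver -p 8080" | sudo tee /etc/systemd/system/airflow-webserver.service
#   render_airflow_unit scheduler "scheduler" | sudo tee /etc/systemd/system/airflow-scheduler.service
```

Generating both files from one template keeps shared settings such as AIRFLOW_HOME from drifting apart between the two services.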

# Enable the services

ubuntu@ip-172-31-50-243:~$ sudo systemctl daemon-reload

ubuntu@ip-172-31-50-243:~$ sudo systemctl enable airflow-webserver

Created symlink /etc/systemd/system/multi-user.target.wants/airflow-webserver.service → /etc/systemd/system/airflow-webserver.service.

ubuntu@ip-172-31-50-243:~$ sudo systemctl enable airflow-scheduler

Created symlink /etc/systemd/system/multi-user.target.wants/airflow-scheduler.service → /etc/systemd/system/airflow-scheduler.service.

# Start the services

ubuntu@ip-172-31-50-243:~$ sudo systemctl start airflow-webserver

ubuntu@ip-172-31-50-243:~$ sudo systemctl start airflow-scheduler

# Check the service status

ubuntu@ip-172-31-50-243:~$ sudo systemctl status airflow-webserver

ubuntu@ip-172-31-50-243:~$ sudo systemctl status airflow-scheduler

7. Create a login account for the Airflow webserver

ubuntu@ip-172-31-50-243:~$ AIRFLOW_HOME=/var/lib/airflow airflow users create --role Admin --username admin --email admin --firstname admin --lastname admin --password admin

[2021-09-03 13:36:03,043] {filesystemcache.py:224} ERROR - set key '\x1b[1m__wz_cache_count\x1b[22m' -> [Errno 1] Operation not permitted: '/tmp/tmpplecjlbc.__wz_cache' -> '/tmp/2029240f6d1128be89ddc32729463129'

[2021-09-03 13:36:03,079] {manager.py:788} WARNING - No user yet created, use flask fab command to do it.

8. Access Airflow

Since this setup runs on an EC2 Ubuntu instance, open [EC2 hostname]:8080 in a browser to confirm that the web UI is up.
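
Besides opening the page in a browser, the Airflow 2 webserver exposes a /health endpoint that reports the status of the metadata database and the scheduler as JSON. The sketch below builds that URL; the function name airflow_health_url is hypothetical, and the check assumes the instance's security group allows inbound traffic on port 8080.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the URL of the webserver's /health endpoint.
# Arguments: host name and optional port (defaults to 8080, as in this post).
airflow_health_url() {
  local host="$1" port="${2:-8080}"
  printf 'http://%s:%s/health\n' "$host" "$port"
}

# Example usage (not executed here); substitute the real EC2 hostname:
#   curl -fsS "$(airflow_health_url "$EC2_HOSTNAME")"
```

If the curl call times out while the services report active, the likely place to look is the network path to port 8080 rather than Airflow itself.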

from http://pearlluck.tistory.com/678 by ccl(A) rewrite - 2021-09-03 23:26:32