Airflow Docker: run Airflow 2.0 in docker container
- Published on 30 Jul 2024
#Airflow #AirflowTutorial #Coder2j
========== VIDEO CONTENT 📚 ==========
Today I am going to show you how to get Apache Airflow 2.0 running in Docker step by step. By watching this video, you will know:
👉 What is Docker and Docker Compose
👉 How to run Airflow in Docker
Video Request: forms.gle/UMp4GA3krcSMMWzy9
You don't know what Apache Airflow is yet? Check out my 8-minute introduction tutorial video for Apache Airflow 2.0. • Airflow introduction a...
========== L I N K S 🔗 ==========
GitHub Repo 👉 bit.ly/3HD5oTX
Airflow Documentation 👉 bit.ly/3wbTqv4
Docker for Mac OS 👉 dockr.ly/3cz54Hh
Docker for Windows 👉 dockr.ly/3r1CbMm
========== T I M E S T A M P ⏰ ==========
00:00 - What is Docker
00:54 - Install Docker and Docker Compose
03:00 - Create a local python environment
05:10 - Download and customize the docker-compose.yaml file
06:00 - Initialize the Airflow DB
06:33 - Launch the Airflow in Docker
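The install-and-launch steps in the timestamps can be sketched as a small preflight script. This is an illustrative sketch, not from the video; the folder names are the ones the official docker-compose.yaml mounts into the containers:

```shell
#!/bin/sh
# Preflight sketch for running Airflow 2.0 in Docker.
set -u

# Check that docker and docker-compose are available before starting.
for tool in docker docker-compose; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool found"
  else
    echo "$tool missing: install it first (see the links in the description)"
  fi
done

# Create the folders the official compose file mounts into the containers.
mkdir -p ./dags ./logs ./plugins
echo "preflight done"
```

After this, `docker-compose up airflow-init` initializes the Airflow DB and `docker-compose up` launches the services, as shown in the video.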
========== Connect with me 👏 ==========
Twitter 👉 / coder2j
Website 👉 coder2j.com
GitHub 👉 github.com/coder2j
Awesome! Pretty clear explanation. Looking forward to more advanced content.
My airflow environment was so slow; after this it runs like a charm, thank you!
Hello coder2j...
Thanks for the clear explanation, I'm going to try this at home tonight. Gotta learn fast.
Looking forward to more content! ^^
Have fun! :-)
Very clear and helpful tutorial so far, really appreciate it!
Thank you!
Hi coder2j, it was great! Thank you very much!
Great content. Pls keep doing it
THAT WAS GREAT! SUBBED!
This is soo amazing the best tutorial by far!! Thank you so very much!!!! amazing!!
Glad it helped!
Thanks man, you saved my life. Love from india
Hey and thanks for the tutorial! It is great! It also would be nice to see the terminal commands that you use in the videos. :)
Do you mean certain terminal commands are not visible in the video or you suggest having them in the video description?
Great tutorial. amazing explanation. thank u so much
You are welcome.
Thank you so much sir, finally I got airflow installed properly.
You are welcome. 🤗
Thank you so much for the amazing explanation.
Thank you!
i like the way you say "excutable"
very helpful, thanks
You are welcome!
thanks for sharing
Thanks for watching!
I need this same thing to install Kafka. Would a tutorial be possible? Thanks a lot.
How can we set this up for multiple environments like Dev and Prod? Can you please guide us through it?
finally!!
Docker Desktop is stuck on "starting..." I've tried pretty much everything suggested on Stack to fix it (wsl --update). Any ideas? I'm on Windows 10.
Thanks for the video. It can't get clearer than this. I was wondering: What if I decide not to edit the docker-compose.yaml file? Does it really matter?
The only difference is that you will be using CeleryExecutor instead of LocalExecutor if you don't change anything.
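For reference, a sketch of that customization in docker-compose.yaml (the keys follow the official Airflow 2.0 compose file; exact lines may differ by version):

```yaml
x-airflow-common:
  environment:
    AIRFLOW__CORE__EXECUTOR: LocalExecutor  # the default file ships with CeleryExecutor
    # Celery-only settings such as AIRFLOW__CELERY__BROKER_URL and
    # AIRFLOW__CELERY__RESULT_BACKEND can be dropped with LocalExecutor.
# The redis service and the airflow-worker / flower services are then no longer needed.
```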
Came to learn airflow, stayed for boom 💥
🙌🙌
Boom!! I did it
Nice to hear that.
Great vid! How can I properly remove its postgresql service and volume? I am trying to compose up just an airflow service and then hook it to my postgresql container, but I keep getting an error upon composing.
You can remove the postgres definition in the docker compose yaml file.
Good to start with 2.0; I have a question: how do I add python libraries into the image, like we usually do with RUN pip install?
The easiest way to do that is to extend the official apache airflow docker image. So basically you create a Dockerfile as follows:
FROM apache/airflow:2.0.1
COPY requirements.txt /requirements.txt
RUN pip install --user --upgrade pip
RUN pip install --no-cache-dir --user -r /requirements.txt
You will have to create a requirements.txt file in the same directory as the Dockerfile, which will be copied into the image and installed.
Then you use docker build command to build the extended image:
docker build . --tag my_airflow:latest
After that, you need to replace the airflow docker image name in the docker-compose.yaml file, from the official image to your extended image my_airflow:latest. That's it, the remaining steps are the same: you run docker-compose up airflow-init and then docker-compose up to launch the airflow webserver and scheduler.
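In docker-compose.yaml, that swap would look roughly like this (a sketch; my_airflow:latest is the tag from the build command above):

```yaml
x-airflow-common:
  # image: apache/airflow:2.0.1   # the official image, before
  image: my_airflow:latest        # the extended image, after
  # Alternatively, have compose build it from the Dockerfile directly:
  # build: .
```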
@@coder2j Yes; I figured that out on the same day, just after posting my comment :-) . We have airflow 1.x setups in our project with everything in "requirements.txt", which is executed by "entrypoint.sh" during container initialization (refer to the 1.x git and entrypoint.sh); and we were struggling to add it that way in our 2.x poc environment. Later we found all these details in the 2.x git (refer to the Dockerfile of 2.x)... but thanks for replying. Looking forward to seeing more videos like task chaining, dag chaining, and dynamic task creation on the fly to leverage multi-processing in parallel. I am reading those from the 2.x documents, but it would be good to have them in videos. Thanks again.
You are welcome! I am glad to hear that you found a solution. :-)
@cookie you are welcome!
I am trying to add a new dag in the dags folder, but I am getting an "Import airflow could not be resolved" error in my vscode. What's the best way to fix this? Thanks in advance.
If you are running airflow in docker, your airflow package dependency is installed in the docker container, which is not visible to VSCode. Therefore, you can either ignore the error, or create a python environment in VSCode, install airflow into it, and tell VSCode the path of your Python environment. That should resolve the issue!
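One way to do that last step (assuming a venv created at ./.venv with airflow installed into it; the path is an assumption) is a .vscode/settings.json like:

```json
{
  "python.defaultInterpreterPath": ".venv/bin/python"
}
```

Older versions of the VSCode Python extension used python.pythonPath instead.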
Hi coder2j, just want to ask if there is any big difference between running airflow on Kubernetes and on Docker. I know that Kubernetes can automatically reallocate resources to other Pods when some Pods are done. Would airflow on docker do the same? Thank you so much!
They are different. Airflow on docker means running airflow in a docker container runtime. Kubernetes is a tool to orchestrate container-based applications running on a cluster of servers. Running airflow in a (docker) container doesn't mean it auto-scales out of the box, but it is a prerequisite for tools like kubernetes to manage it at scale.
@@coder2j Thank you so much for your reply!
In practice, do we commonly use Kubernetes to manage airflow in docker? I found it fairly complicated to do that even when we use the Helm chart. 😅
It depends on the way you use airflow. If you outsource the heavy computation, e.g. to a spark cluster, airflow only does the basic scheduling and management jobs, which don't need a lot of resources. Otherwise, you need to scale airflow using either the CeleryExecutor or kubernetes.
@@coder2j Thank you for your reply!
I got your points, they really make sense. Thank you.
Thanks
Coder2j, thanks for this great video... Please, I am having problems with docker-compose up airflow-init. I'm getting this error consistently
docker-compose up airflow-init
[+] Running 0/15
⠹ postgres Pulling 10.1s
⠸ f1f26f570256 Pulling fs layer 1.2s
⠸ 1c04f8741265 Pulling fs layer 1.2s
⠸ dffc353b86eb Pulling fs layer 1.2s
⠸ 18c4a9e6c414 Waiting 1.2s
⠸ 81f47e7b3852 Waiting 1.2s
⠸ 5e26c947960d Waiting 1.2s
⠸ a2c3dc85e8c3 Waiting 1.2s
⠸ 17df73636f01 Waiting 1.2s
⠸ 124bb42a3852 Waiting 1.2s
⠸ dfb19482a052 Waiting 1.2s
⠸ bbb12a596105 Waiting 1.2s
⠸ aa8960c4e383 Waiting 1.2s
⠸ fdbdb6eba8dc Waiting 1.2s
⠿ airflow-init Error 10.1s
Error response from daemon: pull access denied for extending_airflow, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
Any ideas please?
If you are running with the source code from the GitHub repo, make sure to check this commit: github.com/coder2j/airflow-docker/commit/576fb2f78549c62d554e1675af0045956f7f0d69
Hi coder2j, is the password "airflow" in the yaml file different from the password of the postgres instance running on the machine?
No, they are the same. Check whether you already have a postgres instance running locally on port 5432.
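A quick way to check for that clash is to probe port 5432 before starting the containers. This is an illustrative stdlib sketch (port_in_use is a hypothetical helper, not from the video):

```python
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something already accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1)
        return s.connect_ex((host, port)) == 0

if __name__ == "__main__":
    if port_in_use(5432):
        print("Port 5432 is taken: a local postgres may clash with the container")
    else:
        print("Port 5432 is free")
```

If the port is taken, either stop the local postgres or change the published port in docker-compose.yaml.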
If you want to keep it running on CeleryExecutor, what is the difference in effect between that and LocalExecutor?
Using CeleryExecutor you have the possibility to scale up with more workers. But if you are running it on a single machine, there is not much difference from LocalExecutor.
@@coder2j Ok, thanks very much!
Can I ask why, in the install airflow step of airflow_tutorial, I can open the web UI, but in the airflow_docker video I cannot open the web UI, although I have done exactly as you instructed? Please give me some help, I have been stuck on that for 2 days.
Please check the log of the airflow webserver and see what the error is.
I was wondering why you deleted the airflow worker in docker compose, and what the reasons are? Is it fine to run airflow without an airflow worker?
If we use the local executor, all the airflow jobs run in the scheduler container. Workers are needed if you use a distributed setup, like the celery executor for example.
I've been following your guidance, but when I test run a dag manually, it is always running but never finishes... when I look at the .log file, it seems to be looping... do you know why? Thanks for the reply.
It's hard to tell what exactly went wrong with the info provided. I think you can try to check your dag implementation. It might have some loop logic that never stops.
@@coder2j I'm running the example_bash_operator dag
I'm sorry, it's my bad. I didn't turn on the dag and just found out it won't run even if you trigger it manually... sorry, beginner error.
Hey, I used your exact steps but my containers for the scheduler and webserver keep restarting, so I cannot visualize anything!! Please help.
What about your postgres container? Does it also keep restarting? Check and compare your docker-compose.yaml file with this github.com/coder2j/airflow-docker/commit/576fb2f78549c62d554e1675af0045956f7f0d69
When I run the 'airflow webserver -p 8080' command, I get an import error for pwd:
ModuleNotFoundError: No module named 'pwd'
I need some help!!! Thanks
Why do you need this command if you are running airflow in docker?
Installing airflow on docker gives a message to upgrade the airflow db. But when I try airflow db upgrade, I get the error: airflow command not found. Please help.
Can you share your docker compose yaml file?
@@coder2j I was able to start the airflow webserver; I had to fix the permissions on my path. But now I ran into a different error: "latest-test-repo-airflow-webserver-1 | error: option --workers not recognized" and "latest-test-repo-airflow-scheduler-1 | error: invalid command 'scheduler'". Also, can you please let me know how and where I can share my docker-compose yaml file with you?
unable to install docker
how can I add some python packages, I mean pyspark, s3, and so on?
Check out this video: th-cam.com/video/0UepvC9X4HY/w-d-xo.html
How do I install a new python module in airflow installed via docker?
You can check out this video th-cam.com/video/0UepvC9X4HY/w-d-xo.html
@@coder2j I get the issue ModuleNotFoundError: No module named 'pymysql'
even though I have added pymysql to the requirements.txt file:
PyMySQL==1.0.2
Is it necessary to install docker?
Theoretically you can run it locally to follow the tutorial, but it is recommended to install docker, as the following videos run airflow in docker.
I forgot to input -d.
Without -d, the container will run in the foreground.
The username is airflow, but what is the password?
The password is also airflow in the demo I showed.