One of the best etl series I've ever watched on youtube... thank you.
Watching this is really worth the time. Not like other YouTube channels where they run promotions for a minute or two. Above all, it is really a good video on getting started with Airflow. Great work Karolina. You are an amazing instructor.
I really appreciate this comment, thanks so much Teja!
@@karolinasowinska Nice video. I really love your installation. Hope you don't mind if I post my YouTube video about installing Airflow on Heroku here... Thanks, and please make more Airflow videos: th-cam.com/video/Xa4Rw-cbKIQ/w-d-xo.html
I'll be needing more of these Airflow tutorials
I'll try to do my best! :)
I am a data scientist by profession and wanted to learn data engineering in detail. I didn't find a single free online resource to learn all the data engineering skills. You are doing a great job, waiting for your videos.
I'm super glad my videos are useful! :)
@@karolinasowinska Can you please suggest free online resources to learn end-to-end data engineering?
Amazing work ma'am. I am new to all this and this tutorial was so simple and clear. Your way of explaining is also unique because you talk about errors as well which very few people do.
Thank you, I'm glad you like my talking style! :)
The video I was waiting for! I'm happy to see it, it's very well presented.
It was really useful, I now have a good feeling about how Airflow works. Can't wait to see what's next on your channel :)
Aw I'm super glad to hear that it met your expectations! Thanks! :)
I like your tutorials because they are simple. I kinda got stuck biting off more than I can chew and I was going in circles for a while. Data careers are about so many things it's easy to get lost (python, hadoop/spark, airflow, ML, math/stats, visualization, cloud).... I just needed something simple I can do easily to get going.
I found it very hard in the documentation, the book... then I found your video. Thanks a lot!
your content is getting better every time
Oh, really? I'm super glad you think so! :)
Well, I had a BI (business intelligence) project this year and I had no idea what those ETL and reporting tools were. I searched for about 10 days and tested a lot of software; some of it was useful and some of it was... meh.
And I actually liked Airflow and DBeaver, and both are in this video, what a surprise.
For people who want to test some BI tools, you have (free and open source):
ETL: Airflow, KNIME, Pentaho DI
Reporting: Superset, also some cool dashboards in Pentaho Server
DB RAT and GUI tools: DBeaver and also DbSchema
Data mining: Tanagra and Weka
Also take a look at Apache Kylin (I did not know how to set it up to get Postgres as a data source, so...)
Good luck guys, and great video, lady
Awesome, thanks so much for the tips! :)
@@karolinasowinska Thanks for your reply. I would be very happy if you made a video on how to connect Apache Kylin to new data sources (it is painful lol, I searched and it is not well documented).
A series about BI tools and data manipulation could be a great idea, since not a lot of people do it on YouTube.
Good luck
Karolina, I was enlightened by your explanation/methodology, helped me a lot to get started with Apache Airflow, mad props for this! Keep up with the work!
Cheers from Brazil
I'm so glad that my effort didn't go to waste! Thanks for your comment! :)
"back to downgrading our future" cracked me up
This is such a great introduction to Airflow. I already designed one pipeline and I am ready to implement it. Thank you so much.
That's fantastic, how did it go? :)
@@karolinasowinska It works well. I’m getting data from one customer’s FTP and it is failing using Python 🐍. Working on a solution and it will be ready for deployment.
@@AlexAcostaB Awesome stuff :)
She a real one, you can tell because she's showing all the problems she's running into
What a great tutorial, the best I've seen so far for Airflow! Thank you very much
Karolina, thank you a lot for your efforts and for making these videos! You've sparked a genuine interest in me to try the project out. Plus, it's really great to know that the Data Engineering community is empowered by women. I'm only starting my way in DE, so it's great to follow you and learn.
Love ❤️
Thanks so much for this lovely comment! Good luck on your DE journey and I hope I'll see you around here! :)
Awesome content!! I've been struggling with that stuff for a few months. Thanks for sharing
Something that I was looking for. I know Python, SQL, R and a good amount of machine learning. But I didn’t know what to do next. I just search for Apache Airflow and I got this! Thank you!
Great video, thanks a lot Karolina! I really like your clear way to explain, which is straight to the point and your great energy!
what a well done mentor's job you are doing.
Thank you!
I could grasp the concepts and how to work with Airflow just by watching the video. A very helpful video to get started with. Amazing!
Glad you enjoyed it!
Thank you for explaining such a complicated topic in a simple way. This will definitely be a help as a foundation to data engineering. Keep up the great work!
Great video Karolina, for those struggling with pip install - I suggest doing a quick learn of conda so you can create a quick conda environment to install airflow without messing up your primary python/pip libs and versions. I agree though, Airflow is so tricky to set up.
So helpful! Thank you so much for this mini series, I've learnt a lot.
My pleasure, I'm glad you're finding it useful:)
Best explanation I’ve seen about DAGs and a super helpful intro to Airflow. Makes complete sense. Thank you!
I'm glad this was helpful! :)
The moment I saw the video
Thought she has over a million subscribers
She deserves more subscribers and also more views....
Oh, that's very nice to hear, thanks! :) 🙏
More on airflow please! This was great!
I'll see what I can do! I'm glad you enjoyed it! ;)
Thank you for the neat video !
I'm new to data engineering and may be nailing my upcoming job interview thanks to you
Wow fingers crossed!;)
Thanks very much. I come from Vietnam. Right now I'm an intern Data Engineer. I hope you can do more topics on data engineering in the near future
Hello there! Nice to hear from a fellow techie. I will do for sure! ;)
Important note - You won't find the airflow directory until you run something on the CLI using airflow
Just type in airflow once and hit enter to find the config file
Thank you! Also one needs to run "source airflow-venv/bin/activate" before running the command "airflow". That way you don't get an error that "airflow" command is not found
Excellent video! Thank you very much Karolina!
Ah the joys of finding new errors when you try to install things for a new production environment... this is such an accurate depiction of real engineering life.
Exactly! ;)
Your video was exactly what you said it would be. An introduction. VERY GOOD JOB! Thank you.
Yes indeed, it'd be very hard to discuss details in a 15-minute video! I'm glad you liked it! :)
Hi Karolina, thank you for the video. I need some help as I am kind of stuck at the point where you edit the airflow.cfg file. I use a MacBook. I couldn't find the file even after installing Airflow. I don't see the airflow folder at all, in spite of running the command "export AIRFLOW_HOME=/airflow". Need your help on this.
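A small sketch that may help with locating the file. As far as I know, Airflow reads the AIRFLOW_HOME environment variable and falls back to ~/airflow when it is unset, and, as other comments here point out, airflow.cfg is only generated once some airflow command has been run. The helper name `expected_airflow_cfg` is made up for illustration. Note also that "/airflow" points at the filesystem root, which a normal user usually cannot write to, so "~/airflow" may be what was intended.

```python
import os
from pathlib import Path


def expected_airflow_cfg(environ=None):
    """Return the path where airflow.cfg is expected to live.

    Assumption: AIRFLOW_HOME is honoured and defaults to ~/airflow.
    """
    env = os.environ if environ is None else environ
    home = env.get("AIRFLOW_HOME", "~/airflow")
    return Path(home).expanduser() / "airflow.cfg"


if __name__ == "__main__":
    cfg = expected_airflow_cfg()
    if cfg.exists():
        print(f"found: {cfg}")
    else:
        # The file is created lazily, so run e.g. `airflow --help` once first.
        print(f"missing: {cfg}")
```

Run it in the same terminal session (and the same virtual environment) where the export was done, since an exported variable only applies to that shell.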
Great content! Just what I needed before starting my data engineering courses
Glad it was useful! :)
Love the extract, transform, load part! ❤️
Thank you for posting such relevant content. These are really worth it!
My pleasure! :)
Nice video. And about directed acyclic graphs - actually, you could draw an arrow from 3 to 2 in the graph you showed as an example ^_^ (because there was no way to go back from 2 to 3, so it would not make a cycle)
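The point above can be checked with a few lines of Python. This is only a sketch: the exact graph from the video is not reproduced here, so the edge list `base` below is a hypothetical stand-in with four tasks, and `is_dag` is an illustrative helper built on the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import CycleError, TopologicalSorter


def is_dag(edges):
    """Return True if the (upstream, downstream) edges contain no cycle."""
    sorter = TopologicalSorter()
    for upstream, downstream in edges:
        # add(node, *predecessors): downstream depends on upstream
        sorter.add(downstream, upstream)
    try:
        sorter.prepare()  # raises CycleError when a cycle exists
    except CycleError:
        return False
    return True


# Hypothetical example graph: 1 -> 2, 1 -> 3, 2 -> 4, 3 -> 4
base = [(1, 2), (1, 3), (2, 4), (3, 4)]

print(is_dag(base))                     # True
print(is_dag(base + [(3, 2)]))          # True: 3 -> 2 adds no way back to 3
print(is_dag(base + [(3, 2), (2, 3)]))  # False: 2 -> 3 -> 2 is a cycle
```

So an extra arrow only breaks the "acyclic" part when there is already a path in the opposite direction.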
I love Karolina , you are the best
Thank you! I loved the videos. You explained core concepts in a clear and simple way, well done :)
Glad you like them! :)
Wow, thank you so much Karolina! It helped me a lot with my project!
My pleasure!
Thank you Karolina, very useful, totally no waste of time.
I'm glad it was useful! ;)
Thanks for this introduction. I have been wondering what the big deal is with Airflow. Now I see the potential.
My pleasure! :)
DAG = a directed collection of tasks without going back. Thanks!!!!!!!!!
You can remember it this way too :)!
This is the first time I've heard about Airflow. Apache Airflow is an open-source workflow management platform for data engineering pipelines. It started at Airbnb in October 2014[2] as a solution to manage the company's increasingly complex workflows. Creating Airflow allowed Airbnb to programmatically author and schedule their workflows and monitor them via the built-in Airflow user interface.[3][4] From the beginning, the project was made open source, becoming an Apache Incubator project in March 2016 and a top-level Apache Software Foundation project in January 2019.
Airflow is written in Python, and workflows are created via Python scripts.
The nano tip is very useful!! Very good content! Thank you!
Glad it was helpful!
Wuau, an amazing video tutorial. I love your videos :)
Excellent Karo !!!! 💪💪💪
Amazing work, Karolina. You teach very well. Thank you so much.
This is really helpful for Airflow beginners like me. I appreciate your work a lot. Keep working on topics like this, girl ;)
I'm glad you enjoyed it! I will keep it up ;)
You're really good at explaining the basic concepts. Thanks so much for sharing, you're definitely going to win a lot of subscribers
You are a clear communicator. Thank you.
So much help here. You have wonderful skills for teaching.
Glad to hear that! ;)
Thank you so much Karolina! I learnt a lot today by watching your video. :)
Really impressive explanations and teaching approach. You were concise but covered so many small in-between points that I would have otherwise missed. I'm definitely subscribing and going to watch other videos!
My only complaint would be the resolution of the capture of the VS Code window - can it be a 16:9 ratio? It was so small on my phone.
Thanks! I'll try to improve resolution going forward ;)
Wow, simply amazing!
@10:16 why should we avoid passing data to operator/Task from its predecessor operator? Passing data enables creating a dynamic pipeline
Great video for Airflow beginners!! I have tried to run Airflow many times and always got stuck even before starting the webserver.
This is the first time I have successfully run it!!
For anyone who is also new to Airflow, I ran into some small issues when following the video.
Here is how I solved them, just in case anyone encounters the same issues.
To start Airflow:
1. After installing Airflow, you need to run airflow once to create the airflow.cfg in the home directory (if you haven't run it before).
Simply typing "airflow" will do the work. I didn't run it first, so I couldn't find the cfg file anywhere.
2. I also needed to run "airflow db init" to create the db for logs.
3. Last, I needed to create a user before using the webserver, otherwise there would be no user for me to log in with.
These steps are available in the Airflow documentation quick start as well.
To run the DAG as in the video:
4. I switched the toggle to ON in the DAG view, otherwise the task would remain in the running state forever.
5. To run extract.py or run_spotify.py, I needed to put extract.py in the dags folder first.
I just put the file directly in the dags folder, but I saw others put the whole Python package (a subfolder with __init__.py) in the dags folder.
The latter approach is no doubt better for bigger projects. But I still want to know: does everyone put the packages directly in the dags folder in the real world?
It still feels a little messy to me to put the DAG files together with the scripts themselves.
I have a few questions though: should I terminate and restart the Airflow scheduler every time I change my script, or will it pick up the change automatically?
I am still having a token expired issue when running the script in Airflow, even though I updated my token in the script and it ran fine on my local machine.
But it is an awesome video to me! Thanks to Karolina!
Thanks for the amazing tutorials, just curious why you created a different virtual env for airflow ?
Best.... keep posting 🙏🙏
This is a great tutorial. I am having difficulties setting this up in Windows 10 environment. I was able to setup the virtual environment, but the install process for Airflow differs.
thanks for each & every videos 👍
My pleasure!! :)
Hello Karolina,
I am just a college grad and want to learn how to start a career in data analysis
Thanks Karol :)
Nice video, thanks. But at 6:45 you can draw an arrow from 3 to 2 and it will still be a DAG
Nice video! One thing to think about concerning running Docker containers from the DAG: Airflow 1.x apparently has an issue which leaves containers in a non-started state. (At least it was a problem in our environment.) Airflow 2.0 seems to have resolved it. Thank you for making this video.
You are the best, Thank you!
AMAZING VIDEOOOOOOO! Thank you for your time!
Thank you!! :)
Great content! Thanks, it was very helpful!
Love your tutorials. Well done :)
Thanks so much! :)
I’d love more Airflow videos
I'll see what I can do :)
I had to run `airflow --help` after installing or the airflow folder and .cfg file were never created. Any airflow command should trigger the generation of the expected directory and files.
Great video, short and to the point.
But I was wondering: if the job executes daily, wouldn't our token expire? Maybe we have to update it manually.
You're 100% right, it would. But there is a way to automatically download the token and have that in our script, so that it is fresh every time the program runs:)
@@karolinasowinska Thanks, I'm using the authorization code flow (OAuth 2.0) from the Spotify website, where I had to manually get the auth code every couple of hours. I'll try your method.
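For anyone wondering what "fresh every time the program runs" could look like, here is one possible sketch. It is not necessarily the method meant in the reply above. It assumes you already have some function that obtains a new access token, for example by sending a refresh token to the provider's token endpoint (the OAuth 2.0 refresh_token grant); that function is passed in as `fetch_token`, and both it and the `TokenCache` class are hypothetical names.

```python
import time


class TokenCache:
    """Hand out a cached access token and renew it shortly before expiry."""

    def __init__(self, fetch_token, clock=time.time, margin_seconds=60):
        # fetch_token: hypothetical callable returning (token, expires_in_seconds)
        self._fetch_token = fetch_token
        self._clock = clock
        self._margin = margin_seconds
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        # Renew when there is no token yet or it is within the safety margin.
        if self._token is None or now >= self._expires_at - self._margin:
            token, expires_in = self._fetch_token()
            self._token = token
            self._expires_at = now + expires_in
        return self._token
```

The task would then call `cache.get()` right before each API request instead of hard-coding a token in the script, so a daily run never starts with a stale value.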
More on airflow? Any new video?
Also, if you could do a tutorial on Luigi... we could easily compare which one to choose
Fantastically well explained, thanks for sharing. Shame Airflow such a pita to install, easiest route I found via docker option. Subbed
Thanks! :) Shame indeed... I guess it depends on your environment, but it was a pain for me for sure haha!
Thanks. It was very helpful for me!
More more more courses on data engineering pleaseeeeee. 🥰🥰🥰🥰🥰
In progress! :)
14:02 Great taste in music 👍
Great video! Very helpful. Do you plan to make new videos about data engineering and Airflow?
Thank you for the Spotify api demo. Luckily I have installed airflow 2.0 version today in my ubuntu 18.04 with no errors.
Oh wow, looks like I was unlucky with my environment setup, good to hear that your installation went smoothly!
Great job!!!
Thank you very much, this style of quick start works really well for me, thank you for making it and sharing it!
Glad it helped!
Also, before the scheduler would run, after installation, I had to run `airflow db init`, then the scheduler started right up.
What about apache spark ?? I am waiting to see in your tutorials about it
Thanks too much : )
I'll see what I can do!;)
3:00 In my case, I encountered a different error and it required upgrading pip to its latest version and adding include-system-site-packages = true into the pyvenv.cfg file.
just watched your short data engineering course, very helpful intro to the topic. Thought I should mention that the playlist is out of order, though.
Thanks for that! I'll try to fix the order
Thank you very much!
superb explanation! thanks
Love you and thanks.
You so Funny! I like soooooo much your video! Thanks!!! 😍😍😍😍❤️❤️
My pleasure! :)
Great video.
Is there a way to pass configurations to the DAG so that they can be accessed by different tasks within the DAG?
I am aware of XCom and Variables etc. But is there a way a config file in the form of JSON or YAML can be passed to the DAG? And without using XCom or Variables from the admin menu, is there any other way to set and get values across different tasks within a DAG?
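One common pattern for read-only settings, sketched below under some assumptions: the DAG file loads a JSON file with plain Python and hands the resulting dict (or parts of it) to each task callable, for example through PythonOperator's `op_kwargs`. The file name, the keys `source_url` and `limit`, and the functions `load_pipeline_config` and `extract` are all made up for illustration. This covers getting values, not setting them from one task for another. YAML would work the same way but needs a third-party parser such as PyYAML.

```python
import json
import tempfile
from pathlib import Path


def load_pipeline_config(path):
    """Read a JSON config file and return it as a plain dict."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


def extract(config):
    """Hypothetical task callable that only reads the values it needs."""
    return f"extracting from {config['source_url']} with limit {config['limit']}"


# Demo: write a throwaway config file, then load it as a DAG file might.
with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = Path(tmp_dir) / "pipeline.json"
    config_path.write_text(
        json.dumps({"source_url": "https://example.com/api", "limit": 50}),
        encoding="utf-8",
    )
    config = load_pipeline_config(config_path)

print(extract(config))
```

Since DAG files get re-parsed regularly by the scheduler, it is probably wise to keep such a config file small and local rather than fetching it over the network.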
Hey! Thanks for this... One question though : How does one implement the same thing in AWS?
That's a topic for another video! :)
@@karolinasowinska Hey... It'll be great if you could make a video on this !
There are so many tools in the market for data extraction and loading.Which one to choose?
very cool video! well done!
If you'd like to learn data engineering, I recommend following the 4 simple steps below to land you the first job interview:
1. Learn Python
I recommend following the Python for Everybody specialization course on Coursera, which is one of the most popular courses there:
imp.i384100.net/x9gVO3
2. Learn SQL
SQL is still the lingua franca of data. I recommend going with the Learn SQL Basics for Data Science course, because it contains some chapters which are very relevant to data engineering in particular, e.g. distributed computing with Spark
imp.i384100.net/QOMZ09
3. Learn Bash scripting/Linux
I wouldn't take a full course on it, but at least read a good article.
If you do prefer to take a course/guided project, I think this one is short and good:
www.coursera.org/projects/command-line-linux
4. Learn how to develop on the cloud, e.g. on AWS
There are a few good courses around there, but I think the Coursera one is the most comprehensive
imp.i384100.net/P0MJBM
Hey, Karolina! Please check the Coursera link at item 4 because it's returning "bad merchant".
Ma'am, please share a course on Python related to data engineering.
Certainly! I'd actually soon be releasing my own course on how to enter the data career. If you'd like to get alerted, feel free to drop your email address :)
Very concise nice tutorial ! Thanks
Beginner view on deploying managed airflow ETL task on AWS?
Also, AWS airflow vs glue?
Awesome!!! New subscriber! How could it be done with PostgreSQL instead of SQLite?
You just need to create a postgreSQL database instead :)
I love your channel, thank you so much, I am learning a lot.
I'm super glad to hear that! :))
Hello, you agree that the laptop where you configure airflow etc.. needs to be on to run the etl daily ? If the laptop is off nothing will happen right ?