Why Data Engineers LOVE/HATE Airflow (FT.

  • Published on Jul 12, 2022
  • Airflow is a favorite tool of many data engineers. But some data engineers dislike it.
    It can be tricky to scale and hard to manage if set up incorrectly.
    Let's talk about it.
    Also, I am an advisor at Mage, which is working to make data workflow orchestration easier: www.mage.ai/
    If you enjoyed this video, check out some of my other top videos.
    Top Courses To Become A Data Engineer In 2022
    • Top Courses To Become ...
    What Is The Modern Data Stack - Intro To Data Infrastructure Part 1
    • What Is The Modern Dat...
    If you would like to learn more about data engineering, then check out Google's GCP certificate
    bit.ly/3NQVn7V
    If you'd like to read up on my updates about the data field, then you can sign up for our newsletter here.
    seattledataguy.substack.com/
    Or check out my blog
    www.theseattledataguy.com/
    And if you want to support the channel, then you can become a paid member of my newsletter
    seattledataguy.substack.com/s...
    Tags: Data engineering projects, Data engineer project ideas, data project sources, data analytics project sources, data project portfolio
    _____________________________________________________________
    Subscribe: / @seattledataguy
    _____________________________________________________________
    About me:
    I have spent my career focused on all forms of data. I have focused on developing algorithms to detect fraud, reduce patient readmission and redesign insurance provider policy to help reduce the overall cost of healthcare. I have also helped develop analytics for marketing and IT operations in order to optimize limited resources such as employees and budget. I privately consult on data science and engineering problems both solo as well as with a company called Acheron Analytics. I have experience both working hands-on with technical problems as well as helping leadership teams develop strategies to maximize their data.
    *I do participate in affiliate programs, if a link has an "*" by it, then I may receive a small portion of the proceeds at no extra cost to you.
  • Entertainment

Comments • 56

  • @SeattleDataGuy  1 year ago +2

    If you guys want to learn more about data engineering, then sign up for my newsletter here: seattledataguy.substack.com/ or join the Discord here: discord.gg/2yRJq7Eg3k

  • @anna__geller  1 year ago +2

    Awesome video, a very balanced perspective without focusing only on strengths or weaknesses of any single tool 👍

    • @SeattleDataGuy  1 year ago +1

      Glad you found it helpful! I really was trying to be balanced so I am glad you felt that way.

  • @miguelvera9465  1 year ago +1

    This was very interesting; glad to hear the different insights. Hope to see more collaborations in the community.

    • @SeattleDataGuy  1 year ago +1

      That is my goal! I really want to get more perspectives than my own.

  • @MarcLamberti  1 year ago +13

    Thank you for making this video. I don't want to over-promote Airflow because I'm obviously a little bit biased, but I do think a lot of people still know Airflow from version 1.10.X and haven't tried 2.X yet. Many things have been fixed (performance, DAG authoring, UI, etc.). The gap is just huge. Also, I would say the flexibility/freedom that Airflow brings is a double-edged sword: you can do a lot, you can configure many things, and you can touch any detail to fit your needs perfectly, but the deeper you go, the steeper the learning curve. It's easy to get lost in all the features and configurable parameters that Airflow brings. However, IMHO it's relatively easy if you just want to run data pipelines and execute a few tasks. ❤

    • @SeattleDataGuy  1 year ago

      Thank you. Yeah, I think, as you said, most people use Airflow at a very basic level, even if they are using 2.X. Also, I think a while back you may have commented about collaborating... I feel like I never got back to you on that.

    • @splashoui3760  1 year ago

      What is the best way to learn and practice Airflow?

    • @sreemantakesh9637  1 year ago

      Hi @SeattleDataGuy. I am seeing a lot of people using Airflow to orchestrate ADF (Azure Data Factory) in Azure. Is it really worth it, given that we already have ADF triggers?

  • @chetansurwade  1 year ago

    I, for one, didn't face any issues while working with XComs, especially with large datasets, using a custom XCom backend on Azure Blob Storage. And Airflow is by design an orchestrator, so offloading computation is more sensible.
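
The custom-backend approach described in the comment above keeps large payloads out of Airflow's metadata database by writing them to external storage and passing only a reference between tasks. Below is a minimal, standard-library-only Python sketch of that idea. It is an illustration under stated assumptions, not Airflow's actual XCom backend API: `InMemoryObjectStore`, `push_result`, `pull_result`, and the `blob://` reference scheme are all hypothetical, and the dict-backed store merely stands in for Azure Blob Storage.

```python
import json
import uuid


class InMemoryObjectStore:
    """Hypothetical stand-in for an object store such as Azure Blob Storage."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def upload(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def download(self, key: str) -> bytes:
        return self._blobs[key]


def push_result(store: InMemoryObjectStore, rows: list[dict]) -> str:
    """Write a (possibly large) task result to the store; return a short reference.

    Only the returned string would need to pass between tasks (for example via
    the orchestrator's metadata database); the payload itself stays in the store.
    """
    key = f"xcom/{uuid.uuid4()}.json"
    store.upload(key, json.dumps(rows).encode("utf-8"))
    return f"blob://{key}"  # hypothetical reference scheme


def pull_result(store: InMemoryObjectStore, ref: str) -> list[dict]:
    """Resolve a reference produced by push_result back into the original rows."""
    prefix = "blob://"
    if not ref.startswith(prefix):
        raise ValueError(f"not a blob reference: {ref!r}")
    return json.loads(store.download(ref[len(prefix):]).decode("utf-8"))


if __name__ == "__main__":
    store = InMemoryObjectStore()
    rows = [{"id": i, "value": i * i} for i in range(10_000)]
    ref = push_result(store, rows)  # a short string such as "blob://xcom/<uuid>.json"
    assert pull_result(store, ref) == rows
    print(f"{len(rows)} rows stored; reference is {len(ref)} characters")
```

The design point is that the reference stays a fixed, small size no matter how many rows the task produces, while the heavy data lives in storage built for it.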

  • @nashaeshire6534  2 months ago

    Thanks a lot, much appreciated. I plan to use Apache Kafka in a logging system. To make my ETL more maintainable (transforming in Kafka, before Elasticsearch), I would like to add Airflow. But Kafka Connect looks pretty good too. Between these two solutions, which would you choose for an ELK + Kafka pipeline?

  • @sevegarza  1 year ago +11

    Do a video about Prefect!

  • @mehdio  1 year ago +2

    Cool journalist approach, glad to have others' opinions included! 👍

    • @SeattleDataGuy  1 year ago

      Thank you so much for all your perspective on the topic!

  • @mohamedyasser5285  1 year ago +2

    Great video! I would love to hear your opinion about Apache Kedro.

    • @SeattleDataGuy  1 year ago +2

      It feels like one day it could be great, but I feel like it's still early and needs a stronger community before I would adopt it.

  • @gava5327  1 year ago +1

    Can you review the Meta Database Engineer Professional Certificate on Coursera when it comes out?

  • @rdean150  1 year ago +1

    We've adopted Argo Workflows, which is a Cloud Native Computing Foundation project built on top of Kubernetes.

    • @SeattleDataGuy  1 year ago

      Nice! Any pros and cons with that?

  • @mauludinrohman6177  1 year ago

    What is the difference between Airflow and Astronomer? Can you help me, sir?

  • @minthura24  1 year ago +1

    Thanks for the video.

  • @kevinsu2219  1 year ago +2

    Do a video about Flyte

  • @peterbizik224  1 year ago +1

    Interesting points of view, thanks for the video. As I see it, technology evolves, but the tech stacks are getting crazy complicated. In the end, it mostly gets stuck on the budget: you hire someone cheap (a data engineer who overpromises), you get a headache, you can't move beyond the dev environment, and most of the data pipelines end up being SQL anyway. But I could be wrong.

    • @SeattleDataGuy  1 year ago

      They are getting crazy. Keep things as simple as possible for as long as possible.

  • @joshi1q2w3e  1 year ago +3

    How do you feel about Prefect?

    • @SeattleDataGuy  1 year ago +1

      I still haven't gotten it into production. I believe Madison has a better opinion here: madisonmae.substack.com/p/sorry-i-hate-airflow

  • @Emanuel-yb3qk  1 year ago

    Hi,
    I'm a new subscriber and I just saw your video "roadmap to become a data engineer". I wonder if you could recommend a course to learn Python.
    Your channel is awesome!

  • @anildangol  1 year ago +6

    I don't think there is a best ETL pipeline, and I would not bother trying to find one. Each company and team operates differently depending on their skill set, line of business, and priorities. I never had problems working in SSIS and rarely have problems working in Data Factory either. Yes, each tool has lots of limitations, but you will find a way to overcome them.
    One thing I liked about Azure Data Factory is its no-code ease of use, and it is extremely cheap to maintain. Yes, I like to code in Python and work on Airflow, which gives me extreme flexibility that I couldn't have in ADF, but if ADF gives me a headache, then I will go with this tool anyway. In a week, I onboarded a junior dev who had never worked with any ETL tool. It's that easy.
    Many times we, data engineers, spend our time tweaking and searching for the best tools on the market, but companies hired us to deliver results.

    • @SeattleDataGuy  1 year ago +2

      That's fair. Tool-wise, I think it's always up in the air which is best. I think finding a process that works with the tools you have at hand is probably far more important, because once you switch companies it will be a completely new tool. As you said, results first, fancy tools later.

    • @alexischicoine2072  1 year ago

      I've also found Data Factory to work well for orchestrating, and the low code keeps it simple. The actual steps being orchestrated are already complex enough.

    • @tanmaybagul2957  1 year ago

      😅😅

  • @rguez2332  1 year ago +1

    Is Pentaho PDI used for different purposes than Airflow?

    • @SeattleDataGuy  1 year ago +1

      I don't think Pentaho is that popular... but I could be wrong. Where do you use it? Have you used it a lot?

    • @rguez2332  1 year ago +1

      @SeattleDataGuy It's used for ETL. But I can mention popular tools.
      I'm still learning, and I was wondering if you can ETL/ELT the data with tools like Fivetran/Airbyte/Stitch without using Airflow. Is Airflow used just to automate, or can you run the whole ETL process with it?

    • @SeattleDataGuy  1 year ago +2

      @rguez2332 In the past, Airflow was used for everything, but technically it's just an orchestrator. Nowadays people are trying to use other tools, like Airbyte and dbt + Airflow, to build pipelines. But that's more for open-source-style pipelines. There are so many other tools that people like out there.

    • @rguez2332  1 year ago +1

      @SeattleDataGuy thx so much!

    • @SeattleDataGuy  1 year ago +1

      @rguez2332 you're welcome!
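
The thread above describes Airflow as "just an orchestrator": it decides when tasks run and in what order, while tools such as Airbyte or dbt do the actual data work. The following standard-library-only Python sketch illustrates that split with a toy dependency-ordered runner built on `graphlib.TopologicalSorter` (Python 3.9+). It is a hypothetical illustration, not how Airflow's scheduler is implemented; `run_dag`, its retry behavior, and the task names are all assumptions made for the example.

```python
from graphlib import TopologicalSorter
from typing import Callable


def run_dag(
    tasks: dict[str, Callable[[], None]],
    upstream: dict[str, set[str]],
    max_retries: int = 1,
) -> list[str]:
    """Run each task after all of its upstream tasks succeed; return the run order.

    The runner does no data processing of its own: it only decides when each
    callable runs and retries failures, which is the orchestrator's role.
    """
    # Reject dependencies on tasks that were never defined.
    known = set(tasks)
    for name, deps in upstream.items():
        unknown = (set(deps) | {name}) - known
        if unknown:
            raise ValueError(f"unknown task(s) referenced: {sorted(unknown)}")

    # Build the dependency graph: node -> set of predecessors.
    sorter = TopologicalSorter({name: upstream.get(name, set()) for name in tasks})
    order: list[str] = []
    for name in sorter.static_order():  # raises graphlib.CycleError on cycles
        for attempt in range(max_retries + 1):
            try:
                tasks[name]()
                break
            except Exception:
                if attempt == max_retries:
                    raise  # out of retries: fail the whole run
        order.append(name)
    return order


if __name__ == "__main__":
    # Hypothetical tasks; in a real stack each callable might trigger an
    # Airbyte sync, a dbt run, and so on.
    log: list[str] = []
    tasks = {
        "extract": lambda: log.append("sync source tables"),
        "transform": lambda: log.append("build models"),
        "report": lambda: log.append("refresh dashboard"),
    }
    upstream = {"transform": {"extract"}, "report": {"transform"}}
    print(run_dag(tasks, upstream))  # ['extract', 'transform', 'report']
```

Keeping the runner ignorant of what each task does is the point of the sketch: swapping a callable from "run SQL" to "trigger an external sync" changes nothing about scheduling, ordering, or retries.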

  • @robot01001  1 year ago

    I'm halfway through this video and I still don't know wtf Airflow is. I know it has a k8s operator, but I have no idea what it is or why I would use it. Maybe this video is for advanced people.

  • @janswee1  8 months ago

    You should summarize the pros and cons at the beginning.

  • @datawitharslan  1 year ago +1

    As a beginner in the Modern Data Stack, should I learn Prefect or Airflow? What do you recommend?

    • @valerianmp  1 year ago +1

      Just pick either one; you can always learn the other later when you need it.

    • @SeattleDataGuy  1 year ago +2

      Valerian is kind of right. Airflow is far more popular for now. But in tech you're constantly learning. What is important to know this year will be old news 3 years from now.

  • @sana-sz5ue  1 year ago +1

    What are people's thoughts on what data engineering career progression is like, given that you don't gain a qualification, only work experience?

    • @SeattleDataGuy  1 year ago

      Do you mean like certificates?

    • @sana-sz5ue  1 year ago

      @SeattleDataGuy Yes, like how can you keep working your way up without qualifications? Or, in the tech industry, do certificates work the same way?

  • @lucasbayout195  1 year ago

    Airflow is amazing.

  • @TheSilpelit  1 year ago +1

    Why can't you use well-known DevOps tools like Jenkins?

    • @SeattleDataGuy  1 year ago +2

      To manage custom data pipelines? I have seen it done. It was pretty hairy though.

  • @user-gl9tr6eq7e  7 months ago +2

    Mage AI

  • @ASO-xh5vu  1 year ago

    This is a perfect channel. My only criticism is "verbosity". Too many words...

  • @samsal073  1 year ago

    Airflow sucks... it should be thrown in the trash... it doesn't support multi-system, it's all code-based, which violates low-code/no-code rules... it's impossible to install a cluster in standalone mode on premises without involving technologies like Docker and Kubernetes, which increases complexity... etc.