Scalable Data Ingestion Architecture Using Airflow and Spark | Komodo Health

  • Published on Jan 23, 2025

Comments • 8

  • @Barnabassteiniger 2 years ago +1

    Wow. Great speaker. Learned a lot. Nice to see someone is dealing with the same problem.

  • @MarioRugeles 2 years ago

    I have a question: why not use AWS EMR's autoscaling for the Spark layer?

  • @sumitkumarsahoo 4 years ago

    Can anyone tell me what that commonization tool is, the one that brings in schemas or columns for transformation or joining? I'm curious about it; it seems to be built in-house at that organization.

  • @Funfina 2 years ago

    What could be a common schema?

  • @dsinghr 4 years ago +2

    Composer vs Airflow: an Airflow version upgrade is a nightmare. You won't have to worry about that if you use Composer. Another advantage is IPv4 addresses. As they are limited, you don't have to think too much about them if you use Composer. Imagine you created multiple namespaces for different use cases, and each use case has 3-5 different environments; just think about how many IP addresses you would need. You may exhaust your quota pretty quickly that way. So Composer is great. But I think it is still in beta.

  • @supermousedd 5 years ago

    Very Cooooool!

  • @atampanday6085 4 years ago

    Why not use EKS?

  • @dsinghr 4 years ago

    Why not use Cloud Dataflow on GCP instead of Spark? Then you won't have to worry about Kubernetes at all as far as ETL is concerned. Airflow itself should definitely run inside Kubernetes.