From Poll to Push: Revolutionize Your Data Architecture with Airflow and Lambda

  • Published 5 Sep 2024
  • Say goodbye to constantly polling for data updates and embrace a proactive approach that pushes notifications and triggers workflows as soon as data is available, using AWS Lambda & the Airflow REST API.
    This video explains how to enable Apache Airflow to use event-based triggers for real-time data pipelines via the REST API (and AWS Lambda); a sketch of such a Lambda handler appears at the end of this description.
    Documentation Link:
    -------------------------------------
    airflow.apache...
    Code:
    -------------
    github.com/Sat...
    Check this playlist for more Data Engineering related videos:
    • Demystifying Data Engi...
    Apache Kafka from scratch
    • Apache Kafka for Pytho...
    Snowflake Complete Course from scratch with End-to-End Project with in-depth explanation--
    doc.clickup.co...
    🙏🙏🙏🙏🙏🙏🙏🙏
    YOU JUST NEED TO DO
    3 THINGS to support my channel
    LIKE
    SHARE
    &
    SUBSCRIBE
    TO MY TH-cam CHANNEL
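
    One way to sketch the flow described above is a small AWS Lambda handler that calls Airflow's stable REST API (POST /api/v1/dags/{dag_id}/dagRuns) whenever an S3 event arrives. The endpoint URL, DAG id, and basic-auth credentials below are placeholders, and the sketch assumes the basic-auth API backend is enabled on the Airflow webserver -- adapt it to your own deployment.

    import base64
    import json
    import urllib.request

    AIRFLOW_URL = "http://<airflow-host>:8080"  # assumed Airflow webserver address
    DAG_ID = "s3_event_dag"                     # hypothetical DAG id
    USERNAME = "airflow"                        # placeholder basic-auth credentials
    PASSWORD = "airflow"

    def lambda_handler(event, context):
        # Forward the S3 object details to the DAG run as its "conf" payload.
        record = event["Records"][0]["s3"]
        payload = {
            "conf": {
                "bucket": record["bucket"]["name"],
                "key": record["object"]["key"],
            }
        }
        request = urllib.request.Request(
            url=f"{AIRFLOW_URL}/api/v1/dags/{DAG_ID}/dagRuns",
            data=json.dumps(payload).encode("utf-8"),
            method="POST",
            headers={
                "Content-Type": "application/json",
                "Authorization": "Basic "
                + base64.b64encode(f"{USERNAME}:{PASSWORD}".encode()).decode(),
            },
        )
        # A successful response means Airflow queued a new DAG run for this event.
        with urllib.request.urlopen(request) as response:
            return {"statusCode": response.status, "body": response.read().decode()}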

Comments • 7

  • @bhumikalalchandani321
    @bhumikalalchandani321 10 months ago +2

    Helpful!

    • @KnowledgeAmplifier1
      @KnowledgeAmplifier1 10 months ago

      Thank you Bhumika Lalchandani! Happy Learning

  • @karangupta_DE
    @karangupta_DE 1 year ago +2

    Great video.

  • @Andonokar
    @Andonokar 1 year ago +3

    Hello, great video, great channel. I learned a lot of Kafka from your content and started working with it. One question: how does this DAG handle multiple requests from multiple files arriving in S3? And if it handles them badly, what is the best way to orchestrate it? Thanks for the attention.

    • @KnowledgeAmplifier1
      @KnowledgeAmplifier1 1 year ago +2

      Thank you for watching the video and for your question, Imagine Zat! I'm glad to hear that you find the content helpful. Regarding your question about how the DAG handles multiple requests from multiple files arriving in S3, the answer is the _SUCCESS file. The _SUCCESS file is a marker file that is automatically generated by Apache Spark when writing data to a destination such as a file system (e.g., HDFS, S3). Its purpose is to indicate the successful completion of a write operation. So instead of triggering the Lambda code on every file write, Lambda should be triggered only when the _SUCCESS file is written to S3; that way Lambda runs only once, and this process makes sure the Airflow DAG runs only when the whole data set has been written by the source system 😊 For details, you can refer to this video -- th-cam.com/video/otm7Nbmvy3E/w-d-xo.html I hope this clarifies your question. If you have any further inquiries, please feel free to ask.
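
      A minimal sketch of the _SUCCESS-marker guard described in this reply (assumptions: the Lambda function is subscribed to S3 object-created events, and trigger_airflow_dag() is a hypothetical stand-in for the REST API call shown earlier):

      def trigger_airflow_dag(bucket: str, prefix: str) -> None:
          # Placeholder for the POST to /api/v1/dags/{dag_id}/dagRuns.
          print(f"Triggering DAG for s3://{bucket}/{prefix}")

      def lambda_handler(event, context):
          for record in event.get("Records", []):
              bucket = record["s3"]["bucket"]["name"]
              key = record["s3"]["object"]["key"]
              # Ignore individual part-files; only Spark's completion marker fires the DAG.
              if key.endswith("_SUCCESS"):
                  trigger_airflow_dag(bucket=bucket, prefix=key)
          return {"statusCode": 200}

      Alternatively, the same effect can be achieved without any filtering code by adding a suffix filter of "_SUCCESS" to the S3 event notification itself, so Lambda is only invoked for the marker object.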

  • @user-yc3bs3gk2m
    @user-yc3bs3gk2m 7 months ago

    Let's say Lambda triggered multiple REST API requests to Airflow. How will Airflow handle that scenario? Will it create multiple DAG run instances and run them concurrently? Or will there be one DAG run with multiple concurrent tasks for each request?