Modularized ETL Writing with Apache Spark

  • Published on Dec 4, 2024

Comments • 5

  • @ajaychouksey1234 · 3 years ago · +2

    Great to learn about the data quality checks that are added before writing the transformed data to the data warehouse. We'll adopt this as a best practice for our ETL jobs as well.
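A minimal sketch of the kind of pre-write data quality gate the comment refers to, in plain Python rather than Spark. The check names, thresholds, and the `user_id` column are illustrative assumptions, not details from the talk:

```python
# Illustrative data-quality gate run before publishing a batch to the
# warehouse. Check names and thresholds are hypothetical examples.

def check_row_count(rows, min_rows=1):
    """Fail if the transformed batch is suspiciously small."""
    return len(rows) >= min_rows

def check_not_null(rows, column):
    """Fail if any row is missing a required column value."""
    return all(row.get(column) is not None for row in rows)

def run_quality_checks(rows):
    checks = {
        "row_count": check_row_count(rows),
        "user_id_not_null": check_not_null(rows, "user_id"),
    }
    failed = [name for name, ok in checks.items() if not ok]
    if failed:
        # Abort instead of publishing bad data downstream.
        raise ValueError(f"Data quality checks failed: {failed}")
    return True

batch = [{"user_id": 1, "amount": 9.99}, {"user_id": 2, "amount": 4.50}]
run_quality_checks(batch)  # passes, so the warehouse write can proceed
```

The key design point is that the checks run on the transformed data *before* the write, so a failing batch never becomes visible to consumers.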

  • @shubhamsannyasi794 · 2 years ago

    The journaling step seems ingenious, except for the part where the entire table needs to be rewritten each time. Is there any way around that?

  • @vinayemmadi9467 · 3 years ago

    How do you use the timestamp written to metadata? @4:57

    • @neeleshsalian1912 · 3 years ago · +1

      write_timestamp helps you understand the freshness of the data. The Metadata UI picks up this field and shows the user the last timestamp at which the table was written.
      The batch_id timestamp isn't directly used by anything except to mark the latest version of the data. Say a partition of a table is rewritten: it goes into a new sub-directory with a new batch_id, which now holds the latest data.
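The versioning pattern described in this reply can be sketched roughly as follows. The path layout, the in-memory `metadata` dict, and the helper names are hypothetical stand-ins for the real metadata store, not the talk's actual implementation:

```python
import time

# Hypothetical sketch: each rewrite of a partition lands in a fresh
# batch_id sub-directory; metadata records write_timestamp and the
# latest batch_id, so readers always resolve to the newest data.

metadata = {}  # stand-in for the metadata store behind the Metadata UI

def write_partition(table, partition, rows, base="/warehouse"):
    batch_id = int(time.time() * 1000)  # timestamp used as the batch_id
    path = f"{base}/{table}/{partition}/batch_id={batch_id}"
    # ... the rows would be written to `path` here ...
    metadata[(table, partition)] = {
        "latest_batch_id": batch_id,  # marks the newest version
        "write_timestamp": batch_id,  # freshness surfaced in the UI
        "path": path,
    }
    return path

def latest_path(table, partition):
    # Readers consult metadata, so stale batch_id dirs are simply ignored.
    return metadata[(table, partition)]["path"]

write_partition("events", "dt=2024-12-04", rows=[{"id": 1}])
```

Because readers resolve the partition through metadata rather than listing directories, old batch_id sub-directories can be left in place and garbage-collected later, which is why only the pointer, not the whole table, changes on each rewrite of a single partition.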

  • @santoshkumargouda6033 · 2 years ago

    show me the code