Great to know about the DataQuality checks that are added before writing the transformed data into the data warehouse. We'll take this as a best practice for our ETL jobs as well.
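For illustration, here is a minimal sketch of what such a pre-write quality gate might look like. This is plain Python, not the speaker's actual pipeline; the column names, thresholds, and the `write_fn` hook are hypothetical.

```python
# Hypothetical pre-write data-quality gate; names and thresholds are illustrative.
from datetime import datetime, timezone


def run_quality_checks(rows, required_columns, min_row_count=1):
    """Return a list of failure messages; an empty list means the batch is clean."""
    failures = []
    if len(rows) < min_row_count:
        failures.append(f"row count {len(rows)} below minimum {min_row_count}")
    for col in required_columns:
        nulls = sum(1 for r in rows if r.get(col) is None)
        if nulls:
            failures.append(f"column '{col}' has {nulls} null values")
    return failures


def write_if_clean(rows, write_fn):
    """Only publish to the warehouse when every check passes."""
    failures = run_quality_checks(rows, required_columns=["user_id", "event_ts"])
    if failures:
        # Fail the job instead of publishing bad data downstream.
        raise ValueError("data quality checks failed: " + "; ".join(failures))
    write_fn(rows, write_timestamp=datetime.now(timezone.utc))
```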
The journaling step seems ingenious, except for the part where the entire table needs to be rewritten each time. Any way around that?
How do you use the timestamp written to metadata? @4:57
write_timestamp helps understand the freshness of the data. The Metadata UI picks up this field and shows the user the last time the table was written to.
The batch_id timestamp isn't directly used by anything except to mark the latest version of the data. Say a partition of a table is re-written: it goes into a new sub-directory with a new batch_id, which now holds the latest data.
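A rough sketch of those batch_id/write_timestamp mechanics is below. The directory layout, file names, and metadata format are assumptions made for illustration, not the actual implementation described in the talk.

```python
# Illustrative only: layout and metadata format are assumptions, not the real system.
import json
import time
from datetime import datetime, timezone
from pathlib import Path


def write_partition(base_dir: str, table: str, partition: str, rows: list) -> Path:
    """Write a partition into a new batch_id sub-directory and record write_timestamp.

    The newest batch_id (a millisecond timestamp here) marks the latest version of
    the partition; readers and the Metadata UI would pick the max batch_id.
    """
    batch_id = int(time.time() * 1000)  # new batch_id for every (re)write
    out_dir = Path(base_dir) / table / partition / f"batch_id={batch_id}"
    out_dir.mkdir(parents=True, exist_ok=True)

    # Stand-in for the real file format (e.g. Parquet in the actual warehouse).
    (out_dir / "data.json").write_text(json.dumps(rows))

    # write_timestamp lets the Metadata UI show when the table was last written to.
    metadata = {
        "table": table,
        "partition": partition,
        "batch_id": batch_id,
        "write_timestamp": datetime.now(timezone.utc).isoformat(),
    }
    (out_dir / "_metadata.json").write_text(json.dumps(metadata))
    return out_dir

# Re-writing the same partition creates a new batch_id directory that now holds
# the latest data; older batch_id directories can be garbage-collected later.
```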
show me the code