Evolving Data Pipelines at Scale

  • Published on Oct 22, 2024

Comments • 2

  • @1988YUVAL 5 months ago

    Very interesting presentation. Looks like a very well-thought-out solution for managing data transformations. I wonder if it will take off like dbt.

  • @tratkotratkov126 5 months ago

    Great, much-needed and promising project! However, it is not quite clear what you mean by data versioning (DV): do you version the data itself, as LakeFS does, or only the source code that produces the data? I also find the diagrams in the presentation (Virtual/Physical layers) confusing and hard to grasp at first glance. In the next iteration, it would be nice to describe the demo objects with real-world entities such as customer, product, and sales instead of just “source”, and to wrap the demo in a quick story like “Meet Alex, the data engineer at TechCorp, a rapidly growing tech company. Alex is responsible for managing the company’s data pipelines, ensuring that data from various sources is clean, consistent, and available for analysis”; you get the idea. Finally, I would suggest switching the order of, and the time spent on, the theory and the demo: show your fantastic open source project demo first, and how easy it is to implement the 3 concepts in a meaningful story, then briefly mention the theory after each segment. Don’t let the theory consume 75% of your presentation, unless you want to be considered one of the many Data Governance “gurus” presenting on this channel. Wishing you all good luck with this fantastic project!