The future of Delta Lake and Apache Iceberg with Tathagata Das

  • Published on Jan 6, 2025

Comments

  • @utsavchanda4190 (5 months ago)

    Really insightful discussion. Thank you for that. Honestly, I've always wondered whether lakehouses built on open table formats can guarantee the same performance as MPP warehouses. The biggest reason for that concern has been how in Delta every operation (insert, update, or delete) is essentially an insert (new files) under the hood. Then there are other considerations like the small-files problem and optimized writes. I've always felt there was significant development/operational overhead in running OPTIMIZE and Z-ORDER, and now enabling DELETION VECTORS, to keep tables performant as they grow. Does LIQUID CLUSTERING take that overhead away from customers and make their lives easier? I know Databricks promises intelligent optimization and automatic clustering for managed tables, but what about external tables? Most companies have external tables whose underlying files live in their own storage.
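The point that every Delta write is "essentially an insert (new files)" can be illustrated with a toy copy-on-write model. This is a minimal sketch, assuming an in-memory table where each "file" is just a list of rows; `ToyTable` and its methods are hypothetical names, not Delta Lake's API, and real Delta records file additions and removals in its transaction log rather than in a Python dict. It shows why many small writes leave many small files, why a delete or update rewrites whole files, and what an OPTIMIZE-style compaction step does.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical toy model: data files are immutable, never edited in place.

    Illustration only; this is not how Delta Lake is implemented.
    """
    files: dict = field(default_factory=dict)  # file id -> list of rows
    next_id: int = 0

    def _write_file(self, rows):
        # Every write produces a brand-new file with a fresh id.
        fid = self.next_id
        self.next_id += 1
        self.files[fid] = list(rows)
        return fid

    def insert(self, rows):
        # An insert always adds a new file, however few rows it has.
        self._write_file(rows)

    def delete(self, predicate):
        # Copy-on-write delete: each file holding a matching row is dropped
        # and its surviving rows (if any) are written out as a new file.
        for fid in list(self.files):
            rows = self.files[fid]
            if any(predicate(r) for r in rows):
                del self.files[fid]
                kept = [r for r in rows if not predicate(r)]
                if kept:
                    self._write_file(kept)

    def update(self, predicate, fn):
        # Copy-on-write update: the whole file is rewritten with changed rows.
        for fid in list(self.files):
            rows = self.files[fid]
            if any(predicate(r) for r in rows):
                del self.files[fid]
                self._write_file([fn(r) if predicate(r) else r for r in rows])

    def optimize(self, target_rows):
        # Compaction: gather all rows and re-pack them into larger files.
        all_rows = [r for fid in sorted(self.files) for r in self.files[fid]]
        self.files.clear()
        for i in range(0, len(all_rows), target_rows):
            self._write_file(all_rows[i:i + target_rows])

    def rows(self):
        # Logical table contents, independent of the file layout.
        return sorted(r for rows in self.files.values() for r in rows)


t = ToyTable()
for i in range(10):
    t.insert([i])            # ten tiny inserts
print(len(t.files))          # 10 small files
t.delete(lambda r: r == 3)
t.update(lambda r: r == 5, lambda r: r * 100)
print(len(t.files))          # 9 files; the one holding 5 was replaced
t.optimize(target_rows=4)
print(len(t.files))          # 3 files after compaction
print(t.rows())              # [0, 1, 2, 4, 6, 7, 8, 9, 500]
```

In this toy model the logical rows never depend on how they are split across files, which is the overhead the comment describes: something (a scheduled OPTIMIZE, or an automatic service) has to keep re-packing files so reads stay efficient.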

  • @jeanchindeko5477 (5 months ago)

    Yes, Liquid Clustering is a good starting point and moves things in the right direction in terms of user/developer experience. It might not solve all the problems, but it will already help with part of your small-files concern.

  • @jeanchindeko5477 (5 months ago)

    Why did Apple switch to Apache Iceberg?