Simplify and Scale Data Engineering Pipelines with Delta Lake

  • Published 25 Dec 2024

Comments • 12

  • @jasonabhi · 4 years ago · +6

    Amazing Hands On Session

  • @KoushikPaulliveandletlive · 4 years ago · +6

    Wonderful Demonstration and very handy notebook.
    Following are my assumptions:
    1. Delta Lake keeps multiple versions of the data (like HBase).
    2. Delta Lake takes care of atomicity for the user, showing only the latest version unless specified otherwise.
    3. Delta Lake checks the schema before appending to prevent corruption of the table. This makes the developer's job easier; similar things can be achieved with manual effort, such as specifying the schema explicitly instead of inferring it.
    4. In case of an update, it always overwrites the entire table or the entire partition (DataFrames are immutable).
    Questions:
    1. If it keeps multiple versions, is there a default limit on the number of versions?
    2. Since it keeps multiple versions, is it only suitable for smaller tables? For tables in the terabytes, won't it be a waste of space?
    3. In a relational DB, data is tightly coupled with the metadata/schema, so we can only get the data from the table, not from the data files. In Hive/Spark this is different: external tables are allowed, and without access to the metadata we can recreate the table. How is this handled in Delta Lake? Since we have multiple snapshots/versions of the same table, will someone be able to access the data without the log/metadata? In Hive/Spark, multiple tables can be created on the same data with different tools (Hive, Presto, Spark). Can other tools share the same data with Delta Lake?

    • @vinyasshetty4042 · 4 years ago

      For updates, it will not overwrite the entire table; it looks at the files that hold the data that needs to be updated and creates new copies of only those files. Those new files contain the updated records plus the non-updated records from the same files. To eventually clean up the older versions you have to run a VACUUM command. Currently only Spark SQL works for querying the Delta location, but I believe they are working on making Presto and Hive work with it.
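
      As a minimal sketch of the behavior described above (file-level rewrites on UPDATE, time travel to older versions, VACUUM to reclaim them), assuming the delta-spark Python package and a hypothetical table path and column names:

      ```python
      from pyspark.sql import SparkSession
      from pyspark.sql.functions import col
      from delta.tables import DeltaTable

      spark = SparkSession.builder.getOrCreate()
      path = "/tmp/demo/loans"  # hypothetical Delta table location

      # UPDATE rewrites only the data files that contain matching rows;
      # non-matching rows in those files are copied over unchanged.
      DeltaTable.forPath(spark, path).update(
          condition=col("funded_amnt") < 1000,
          set={"funded_amnt": col("funded_amnt") + 100},
      )

      # Older versions stay queryable (time travel) until their files are vacuumed.
      v0 = spark.read.format("delta").option("versionAsOf", 0).load(path)

      # Remove files no longer referenced by recent versions
      # (168 hours is the default retention threshold).
      DeltaTable.forPath(spark, path).vacuum(168)
      ```

      The version history itself is kept in the table's _delta_log directory alongside the data files, which is what an engine needs to understand in order to query the Delta location.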

  • @andyharman1771 · 4 years ago · +6

    Starts at 3:10

    • @Databricks · 4 years ago · +1

      Thanks Andy, I trimmed it. Video starts right at 0:00

  • @CoopmanGreg · 2 years ago

    If the streaming/batch notebook you demonstrated were being run in a workflow, and let's say 100K rows have streamed in successfully, but then an error occurs and the job fails: as I understand it, the 100K rows and all other changes that occurred in the workflow would be automatically rolled back. Is this correct?
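
    For reference, a minimal sketch of the kind of streaming write into a Delta table that the question refers to, with hypothetical paths and schema; each micro-batch write is committed through the Delta transaction log and progress is tracked in the checkpoint location:

    ```python
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Read a stream of incoming JSON events (schema assumed for illustration).
    events = (spark.readStream
              .format("json")
              .schema("device STRING, ts TIMESTAMP, value DOUBLE")
              .load("/tmp/demo/events_in"))

    # Append the stream to a Delta table; each micro-batch write is recorded
    # as a commit in the table's transaction log, and the checkpoint tracks
    # which input data has already been processed.
    query = (events.writeStream
             .format("delta")
             .option("checkpointLocation", "/tmp/demo/_checkpoints/events")
             .outputMode("append")
             .start("/tmp/demo/events_delta"))
    ```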

  • @nit46hin · 4 years ago · +3

    Great demo... very useful for learning delta architecture

    • @Databricks · 4 years ago · +1

      Thanks for the feedback Nithin! Glad you enjoyed it.

  • @nit46hin · 4 years ago · +2

    Can you help share the steps to import the notebook from the GitHub link into Databricks Community Edition?

    • @dennylee4934 · 4 years ago · +5

      Please refer to the "Importing Notebooks" section of github.com/delta-io/delta/tree/master/examples/tutorials/saiseu19#importing-notebooks for step-by-step instructions. HTH!