It is amazing how concisely you put so much information in one video! Great!
Sir, I just want to say thank you so much. I've gone through many videos but was still confused; you made this crystal clear with your conceptual approach.
Thank you for the kind words. I'm so glad my videos are helping you. That's why I do them. I know this technology is not easy to learn, so kudos to you for sticking with it.
I really enjoyed the perspective you brought into the evolution. Great work. Please keep bringing in these great videos. Thank you very much.
Thank you! And you're welcome.
Beautiful explanation! Loved it
Underrated channel, really quality information.
Your videos are really helping me improve my core knowledge of Data Engineering concepts. Thank you!
Great to hear! You're welcome.
I'd love to see an unbiased comparison between Delta Lake, Hudi, and Iceberg.
So would I, lol. Iceberg seems to be the open table format Snowflake is backing for Lakehouse (it is an Apache project, not Snowflake's own). Not sure about Hudi.
Looks like Amazon is promoting Hudi.
wow that was very informative and amazing, thank you for your efforts
Again, perfectly explained. Thank you
Dude you are on the money!! Agree all 100%.
Best video on this topic ever!
Life saver 🫡 Thank you sir!
Thank you Bryan.
You're welcome Stu.
Do you have an example in any of your videos of connecting to an S3 bucket by specifying an endpoint within Databricks? Basically, how do you connect to an S3 bucket hosted by a service other than AWS? Thanks
Hmmmm.... No, I have not tried that. Have you googled it?
@BryanCafferky yeah ha, I did find a solution eventually, I think somewhere on Stack Overflow. I searched around several places so I don't have the exact source.
"sc
and run the function obvi
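For readers with the same question, one common way to point Spark at an S3-compatible store outside AWS is through the Hadoop S3A settings. The sketch below is not the snippet from the comment above (that one was cut off). It assumes the cluster reads the bucket through the S3A connector; the endpoint URL, keys, and bucket name are placeholders, and the helper names `s3a_endpoint_conf` and `apply_s3a_conf` are made up for illustration.

```python
def s3a_endpoint_conf(endpoint, access_key, secret_key, path_style=True):
    """Build Hadoop S3A settings for an S3-compatible endpoint (hypothetical helper)."""
    return {
        "fs.s3a.endpoint": endpoint,
        "fs.s3a.access.key": access_key,
        "fs.s3a.secret.key": secret_key,
        # Many non-AWS stores expect path-style URLs (endpoint/bucket/key).
        "fs.s3a.path.style.access": "true" if path_style else "false",
    }


def apply_s3a_conf(hadoop_conf, settings):
    """Copy each setting onto any object exposing set(key, value)."""
    for key, value in settings.items():
        hadoop_conf.set(key, value)


# Placeholder values: replace with your provider's endpoint and credentials
# (ideally loaded from a secret store rather than typed into the notebook).
settings = s3a_endpoint_conf("https://s3.example.com", "MY_KEY", "MY_SECRET")

# In a Databricks notebook, where `sc` and `spark` already exist, this would be:
#   apply_s3a_conf(sc._jsc.hadoopConfiguration(), settings)
#   df = spark.read.parquet("s3a://my-bucket/some/path/")
```

Whether path-style access is needed depends on the storage provider, so treat that flag as something to check against their docs.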
Best explanation
you are a hidden gem
Can we implement a data lakehouse with open source tools like Spark, Presto, and Hive Metastore? Is there any alternative to Unity Catalog in the open source ecosystem?
Lakehouse is just Delta Lake, i.e., Delta tables, which are available in open source Spark, so yes. Unity Catalog is really just a catalog of catalogs, so you could build your own central catalog by extracting the metadata from local Hive metastores. I believe open source Spark tends to work with one cluster at a time, unlike Databricks, which spins up any number of clusters as needed, so I'm not sure whether UC could be implemented on open source Spark, but perhaps.
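As a rough illustration of the "yes" above, Delta tables can be enabled on open source Spark by adding the Delta Lake package and two session settings. This is a minimal sketch, assuming `pyspark` and a matching `delta-spark` package are installed; the app name and table path are placeholders, and `delta_session_conf` and `build_delta_session` are hypothetical helper names.

```python
def delta_session_conf():
    """Spark settings that turn on Delta Lake SQL support (hypothetical helper)."""
    return {
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    }


def build_delta_session(app_name="lakehouse-demo"):
    """Create a SparkSession with Delta enabled; needs pyspark and delta-spark."""
    # Imported here so the settings helper above works without Spark installed.
    from pyspark.sql import SparkSession
    from delta import configure_spark_with_delta_pip

    builder = SparkSession.builder.appName(app_name)
    for key, value in delta_session_conf().items():
        builder = builder.config(key, value)
    # configure_spark_with_delta_pip adds the Delta jars matching the pip package.
    return configure_spark_with_delta_pip(builder).getOrCreate()


# Example use (placeholder path):
#   spark = build_delta_session()
#   spark.range(5).write.format("delta").save("/tmp/demo_delta_table")
#   spark.read.format("delta").load("/tmp/demo_delta_table").show()
```

This only covers the table format. A shared catalog across clusters, which is the part Unity Catalog handles, is a separate problem.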
Amazing contents.. Thank you Bryan
You're welcome! Glad it is helpful!
Amazing lecture! Thank you!
You're Welcome!
Is it mainly used for OLAP, or can it be used for OLTP also?
It's meant for data warehousing, i.e., lakehouse = lake + warehouse, so a warehouse on a data lake. OLTP has stringent requirements such as high transaction concurrency, referential integrity, etc. Delta logging is done at the file level, whereas SQL databases log at the row level. See my videos on Delta logs to get an understanding of what I mean.
Delta Logs 1: th-cam.com/video/pCH_qNqnms0/w-d-xo.html
Delta Logs 2: th-cam.com/video/ZSTJLfZy_Hs/w-d-xo.html
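To make the file-level point concrete: each commit in a Delta table's `_delta_log` folder is a JSON file with one action per line, and data changes show up as `add` and `remove` entries for whole Parquet files. The sketch below parses a made-up commit (the file names and size are invented for illustration, not taken from a real table) and lists those actions; `file_actions` is a hypothetical helper.

```python
import json

# Illustrative commit contents, NOT from a real table. File names are made up.
SAMPLE_COMMIT = "\n".join([
    json.dumps({"commitInfo": {"operation": "UPDATE"}}),
    json.dumps({"remove": {"path": "part-00000-old.snappy.parquet", "dataChange": True}}),
    json.dumps({"add": {"path": "part-00000-new.snappy.parquet", "size": 1024, "dataChange": True}}),
])


def file_actions(commit_text):
    """Return (action, path) pairs for the add/remove entries in one commit."""
    actions = []
    for line in commit_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        entry = json.loads(line)
        for kind in ("add", "remove"):
            if kind in entry:
                actions.append((kind, entry[kind]["path"]))
    return actions


for kind, path in file_actions(SAMPLE_COMMIT):
    print(kind, path)
```

Even if the update touched a single row, the log records one whole file being removed and another being added, which is part of why this design suits warehouse-style workloads better than high-concurrency OLTP.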
Good presentation. Thanks!
YW!
Can we use the lakehouse to replace a transactional system?
See my reply to your question about OLTP.
You're the best, thank you.
You're welcome! Thanks for watching.
Amazing stuff, as always!
Thank you!