What do you think of the data vault compared to the dimensional data warehouse? Have you built both?
For more Data warehouse options: th-cam.com/video/Tff34jj_V-0/w-d-xo.html
I would love to see more videos on how to implement this. Wish there was a Udemy course on how to implement this.
@JimRohn-u8c
I love how he mentioned at the end that data vault may not be the best option for some scenarios.
This shows that it's not about which is the better one, but it's about which one is more reasonable to use in specific scenarios.
The idea of Data Vault sounds nice. But with an ETL automation tool like WhereScape, ETLs can be adapted very nicely too, and with less overhead.
I think the main problem is the computational power/time needed to build every link table. The fact that in the end you build a reporting layer that is in fact a dimensional model negates all the effort.
The clear advantage is having the original keys in a staging area and avoiding changes to the extractors.
But this was all designed with old row- and disk-based databases in mind. With an in-memory columnstore database (SAP HANA) the link logic is not necessary; it can all be virtual. We have customers with all DWH/BI logic running on the ERP database with tables over 100 million rows, all with virtual modeling without persistence.
I enjoy your videos quite a bit, just a few pieces of constructive criticism:
I feel like a little bit more space between sentences to let the viewer digest what is being said/shown would help a lot.
I like the clean look of the visuals, but the text labels etc. help make things easier to visually process.
I think the visual example you did with the tables in this one was good. More real examples like that, showing what these concepts actually look like in the real world, even just as examples, help drive the points home.
Looking forward to seeing your channel grow, keep up the good work!
Thanks for the feedback. I'm trying to keep these as 5 minute overview videos, which is a challenge with some of these dense topics. Still trying to work out the pacing and how much detail to cram in. I have some ideas for more in depth, slower paced example videos to go along with the overviews. Just need to find the time!
@nullQueries For me, you don't need to change anything. I mean, a short video will not replace proper training, but it helps a lot. Thank you for your effort.
One of the best videos out there regarding Data Vault modelling
My data engineering team have built many data vaults, but could never quite articulate to me as a business leader why. This has been very educational for me in explaining the benefits vs complexity. The pace at which business is changing and the number of new data sources that become available make a data vault seem a more obvious choice. The business still gets its Inmon/Kimball model, but the foundational data structures in the Vault provide more capability to make changes to them. That's what I inferred from this. I hope I am on the right track.
This is a wonderful video. Unfortunately for me, I read 450 pages from Dan Lindstedt's book introducing the data vault 2.0 architecture. This is, hands down, the worst book I have ever read. It is just horrible. However, it does contain about 7 good ideas and this video captures all of them in a nicely presented coherent way. Thank you!
I went from the 3NF video to the dimensions one to this one and I feel like the only advantage I see is the dimension/Kimball one. This data vault seems just overkill. The storage will increase exponentially with all the extra keys needed and with very large storage of millions/billions of rows the performance I suppose will be greatly impacted when querying all those keys. Why is this an easier ETL solution? Am I missing something?
Hi Daniel, I think a key point of the data vault to understand is that it is exceptionally good at showing lineage. From my point of view it is only a good solution when you are dealing with many different data sources which need to be combined. A great example of a project I have helped on was combining 10 different SAP clients at a manufacturing company. Each is customized slightly; the data may be stored in the same fact table, say sales, but have different indicators or flags etc. modifying it. With the ETL solution you would do a one-off ETL to land it in a standardized table; however, in 4 years you will need to spend weeks of development trying to figure out where the mistakes are and what transformation occurred.
I see a lot of advantages with data vault but I just can't see it as an advantage over dimensional warehouse for my business context: e-commerce platform + CRM + billing system + marketing campaign system because all of these sources are quite static. Would be great to get feedback on this.
Hi Daniel! Actually, storage cost with Data Vault is on average a lot less than with dimensional data modeling. I would suggest two main factors for moving to Data Vault: 1. an enormous amount of data, 2. complexity of data and business processes. So, when building a Data Vault, you'll make a data model that's change tolerant, i.e. if something changes in the business, or in business processes, the data model will remain, which is not the case for dimensional data modeling. Data Vault is extremely hard and expensive to create, but cheap to maintain; a dimensional data model is easy and cheap to create, but expensive to maintain in the long run. Therefore there are hybrids (data vault + dimensional modeling) where you first model data in vaults, then model dimensions and facts on top of the data vault.
Thanks for making these kinds of videos, I really appreciate it. They are so useful for people like me who are learning about it.
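To make the hybrid idea above a bit more concrete, here is a minimal Python sketch of deriving a customer dimension from a hub and its satellite. The table layouts and the names `build_customer_dimension`, `customer_hk`, `load_ts`, `name` and `city` are assumptions made up for illustration; they do not come from the video or from any specific tool.

```python
def build_customer_dimension(hub_rows, sat_rows):
    """Derive a simple dimension: one row per hub key with its latest satellite attributes."""
    # Pick the most recent satellite row for each hub key.
    latest = {}
    for sat in sat_rows:
        key = sat["customer_hk"]
        if key not in latest or sat["load_ts"] > latest[key]["load_ts"]:
            latest[key] = sat

    # Join hubs to their latest attributes; hubs without a satellite row get None values.
    dimension = []
    for hub in hub_rows:
        sat = latest.get(hub["customer_hk"], {})
        dimension.append({
            "customer_id": hub["customer_id"],
            "name": sat.get("name"),
            "city": sat.get("city"),
        })
    return dimension
```

The vault tables keep the full history; the dimension is just a derived, rebuildable view of it, which is why changes to the reporting model do not force a reload from the sources.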
Can I just say "Dimensional Datamart" is my favorite cyberpunk term
We can use Data Vault (incremental loading) + Inmon (EDW) + Kimball (Star schema) + Delta Lake (ELT = Bronze, Silver and Gold data movements) methodologies and use all of them at the same time. The core will be Kimball + Inmon.
Do you happen to have a transcript of this video? Maybe posted in a blog post?
This video is very good but I need to clarify the ETL process. Suppose I have a few raw files yet to be stored. They are placed inside the data lake unmodified. From there, I insert the data as hubs, link tables and satellite tables into the raw vault, creating surrogate keys along the way. Is that right? And what does 'since objects in each layer never connect to each other' mean? 4:01
It means there are no hard foreign keys, but logically they are of course connected.
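As a rough illustration of the loading flow asked about above, the following Python sketch splits flat source rows into hub, link and satellite rows. It assumes hash keys built from the business keys (one common choice; sequence-based surrogate keys are another), and all names here (`hash_key`, `load_raw_vault`, `customer_id`, `order_id`, `customer_name`, `record_source`) are hypothetical. Nothing enforces a foreign key between the tables; the link rows only carry the hub keys, which is the "logically connected" part.

```python
import hashlib
from datetime import datetime, timezone


def hash_key(*business_keys):
    """Build a deterministic key from one or more business keys (trimmed, upper-cased)."""
    normalized = "||".join(str(k).strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def load_raw_vault(rows, record_source, load_ts=None):
    """Split flat source rows into hub, link and satellite rows (illustrative only)."""
    load_ts = load_ts or datetime.now(timezone.utc)
    meta = {"load_ts": load_ts, "record_source": record_source}
    hub_customer, hub_order, link_customer_order = {}, {}, {}
    sat_customer = []
    last_hashdiff = {}  # last seen attribute hash per customer, within this load

    for row in rows:
        customer_hk = hash_key(row["customer_id"])
        order_hk = hash_key(row["order_id"])
        link_hk = hash_key(row["customer_id"], row["order_id"])

        # Hubs hold one row per business key; setdefault keeps the first one seen.
        hub_customer.setdefault(customer_hk, {"customer_id": row["customer_id"], **meta})
        hub_order.setdefault(order_hk, {"order_id": row["order_id"], **meta})

        # The link references hubs only by key value, with no enforced constraint.
        link_customer_order.setdefault(
            link_hk, {"customer_hk": customer_hk, "order_hk": order_hk, **meta}
        )

        # The satellite gets a new row only when the descriptive attributes change.
        hashdiff = hashlib.md5(row["customer_name"].encode("utf-8")).hexdigest()
        if last_hashdiff.get(customer_hk) != hashdiff:
            sat_customer.append({
                "customer_hk": customer_hk,
                "customer_name": row["customer_name"],
                "hashdiff": hashdiff,
                **meta,
            })
            last_hashdiff[customer_hk] = hashdiff

    return {
        "hub_customer": hub_customer,
        "hub_order": hub_order,
        "link_customer_order": link_customer_order,
        "sat_customer": sat_customer,
    }
```

Because every key is computed from the source values alone, each table can be loaded without looking up keys in another table first.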
First time watching your videos and I absolutely love them! Subbed and liked. It'd be even more awesome if you could allow for an extra second to digest what you're saying. It's a lot of useful information. But even if you don't change anything, I'll still be a fan! Thank you for this!
Really, the best explanation.
Very well explained with good examples, this is very helpful!
Data vault is the curated layer in a data lake. And it has a very specific design... But really it's an Inmon/operational design.
My company is moving from a Datalake with a Raw and Curated zone, to a Datalake/Datavault with a Raw and Certified zone.
We are a huge bank with $9 billion of revenue. I feel it's a big gamble. The current system, while having governance flaws, isn't that bad, and I wonder if all the money will genuinely add value.
What do you think?
Well explained in pictorial format. But there should be some use case or an example so the newbies can understand more easily.
very well explained...tks a lot
Glad it was helpful!
Great videos .. very informative ...can you do a quick comparison between Redshift & Vertica? an overall evaluation?
Really good video! Thank you!
Quick question: what do you mean by "Business logic"? Do you mean that kind of logic that would be used with an MDM, to control whether new attributes about an entity should be added or ignored (eg if we have conflicting phone numbers for a customer)?
I'm using Business Logic to represent any time some sort of business rule alters source data. Sometimes it's explicit (e.g. phone numbers are always stored in a certain format). And sometimes it's just tribal knowledge (e.g. some sources call it a customerID and some a consumerID, but everyone in the office knows it's referred to as ClientID, so we'll convert to that naming so it's easy for users to consume). A good MDM should handle this, but it depends on how it's implemented, what it catches, and where in the architecture it makes the changes. But for the DV this would happen in the business vault layer, as the raw vault should reflect the sources.
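A small Python sketch of the two rules mentioned above, applied on the way from the raw vault to the business vault. The function names, the alias list and the digits-only phone format are assumptions chosen for illustration; the reply does not say which format is actually used.

```python
import re

# Hypothetical source column names that all mean the same thing to the business.
CLIENT_ID_ALIASES = ("customerID", "consumerID")


def normalize_phone(raw):
    """Assumed explicit rule: store phone numbers as digits only."""
    return re.sub(r"\D", "", raw)


def to_business_vault(raw_row):
    """Apply business rules to a copy of the row; the raw vault row stays as loaded."""
    row = dict(raw_row)
    # Tribal-knowledge rule: expose every alias under the agreed name ClientID.
    for alias in CLIENT_ID_ALIASES:
        if alias in row:
            row["ClientID"] = row.pop(alias)
    # Explicit rule: one consistent phone format.
    if "phone" in row:
        row["phone"] = normalize_phone(row["phone"])
    return row
```

Keeping the rules in a separate step means the raw vault still reflects the source exactly, so a rule can be changed and re-applied later without reloading anything.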
Thank you!
Hi. Thank you for this overview video. Do you also have a webpage where you can be contacted? Would be happy to get your thoughts about DWH automation (we are the creators of the Datavault Builder tool). Regards
Great content..subscribed!
Nice video, where can we learn about the other data warehouse format?
a little bit complex
All those fancy pictures make zero sense without real live examples, just think about it
Great video. I stumbled upon this channel by accident today, after reading an opinion piece by Bill Inmon on why Snowflake isn't a data warehouse (on LinkedIn). After watching your video on Inmon vs Kimball I immediately subscribed. Great content! What software do you use for the video animations? Anyway, you've got a new subscriber from Papua New Guinea. Keep it up, happy Easter.
Thanks for the compliment! I use the Adobe suite for all illustration and animations.