If you guys want to learn more about data engineering, then sign up for my newsletter here seattledataguy.substack.com/ or join the discord here discord.gg/2yRJq7Eg3k
That data modeling live will be so good!
Different data models were defined to respond to different scenarios:
- 3NF for transactional applications
- Dimensional (star or snowflake schemas) or flat for reporting
- Data Vault, Anchor, Hook for integration
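As a rough, hypothetical illustration of the dimensional-vs-flat split above (all table and column names here are invented), the same sale can be laid out both ways:

```python
# Dimensional (star schema): a narrow fact row pointing at dimension rows.
dim_customer = {101: {"customer_name": "Acme Corp", "region": "EMEA"}}
dim_product = {7: {"sku": "WIDGET-7", "category": "Widgets"}}
fact_sales = [{"customer_key": 101, "product_key": 7, "amount": 250.0}]

# Flat (one-big-table): dimensions denormalized onto every fact row.
flat_sales = [{
    "customer_name": "Acme Corp", "region": "EMEA",
    "sku": "WIDGET-7", "category": "Widgets", "amount": 250.0,
}]

# Reporting against the star means joining through the surrogate keys:
row = fact_sales[0]
report = {**dim_customer[row["customer_key"]],
          **dim_product[row["product_key"]],
          "amount": row["amount"]}
```

The joined star-schema row ends up carrying the same information as the flat row; the difference is where the redundancy lives.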
Sometimes you could even decide not to model the data at all, as many practitioners do, which can work in specific scenarios. Now we also have hybrid tables, so the established models are being challenged for those use cases too. Perhaps Anchor, with its 6NF, could be an interesting approach when working with hybrid tables? It sounds appealing to me!
But regardless of the technology challenges and innovations, when we consider the overall enterprise and its overarching data needs, I would say the most important models for any scenario are the enterprise and conceptual models (not technical models), where you define the key business elements (data domains) and the relationships relevant to the business. After that, I would even suggest you can use whatever you want at every layer, as long as the physical model is aligned with the business model.
I think that's why Data Vault with automation in place is very popular in many industries: it forces you to understand the conceptual model and focus on the business rather than the technical implementation. Data Vault without automation, though, is just a pain to integrate, and it certainly doesn't consider lakehouse architectures, where you could physically design your lake folders around data domains and define guidelines for relationships (the devil is in the details).
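A minimal sketch of why Data Vault loading lends itself to automation, assuming the common hash-key pattern (the entity, table, and column names here are invented): loading a hub and a satellite is mechanical enough to be generated from the business key alone.

```python
import hashlib
from datetime import datetime, timezone

def hash_key(business_key: str) -> str:
    """Deterministic surrogate key from the business key (a common DV pattern)."""
    return hashlib.md5(business_key.upper().encode()).hexdigest()

def load_hub(business_key: str, source: str) -> dict:
    """Hub row: just the business key, its hash, and load metadata."""
    return {
        "hub_customer_hk": hash_key(business_key),
        "customer_bk": business_key,
        "record_source": source,
        "load_dts": datetime.now(timezone.utc).isoformat(),
    }

def load_satellite(business_key: str, attrs: dict, source: str) -> dict:
    """Satellite row: descriptive attributes plus a hash-diff for change detection."""
    payload = "|".join(f"{k}={v}" for k, v in sorted(attrs.items()))
    return {
        "hub_customer_hk": hash_key(business_key),
        "hash_diff": hash_key(payload),
        **attrs,
        "record_source": source,
        "load_dts": datetime.now(timezone.utc).isoformat(),
    }

hub = load_hub("CUST-001", "crm")
sat = load_satellite("CUST-001", {"name": "Acme", "tier": "gold"}, "crm")
```

Because every table follows the same template, tooling can generate all of this from the conceptual model, which is the point the comment is making.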
But sadly, as you suggest, people usually think conceptual models are easy to create or unnecessary, and that's a wrong step toward proper scalability. Even if you follow a federated approach (data mesh or equivalent), you'll eventually have to integrate data from multiple domains, and only enterprise and conceptual modeling can soften that burden.
This is where business and data architecture play a crucial role in defining proper data domains and linking that with data development, delivery, management, and governance.
Thanks for the video!
Could you do a demonstration video? Here's what we want to model -> Here's one way to model it
This, and the logic of the micro-decisions that go into building the model, would be gold!
My team certainly doesn't follow data modeling standards, and it works perfectly fine for us. One of the main reasons is that we simply don't have enough time or people to properly put together a data model. We handle data ingestion, transformations, and reporting for many, many departments, and it would be impractical to build data models for all of them. Instead, we have processes that build tables for different subject areas. So it's technically one table that fits a subject area, and these tables get joined with other datasets, etc. I'm simplifying, but in general this is how we work. If my team covered only one subject area and had specific questions that needed to be reported on, then yeah, a data model might work in our case.
Yeah, I think a lot of teams take it on in different ways. One of those is to have looser practices; there are always tradeoffs, of course.
Basically a gold layer for a certain domain, and then dbt to create OBTs joining multiple domains.
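A toy sketch of that pattern in plain Python (the domain tables and keys are made up, and in practice the join would live in a dbt model rather than application code): one curated table per domain, then an OBT built by joining them on a shared key.

```python
# Two "gold" domain tables (hypothetical data).
orders = [{"order_id": 1, "customer_id": 10, "amount": 99.0}]
customers = [{"customer_id": 10, "name": "Acme", "segment": "SMB"}]

# Build the one-big-table (OBT) by joining domains on customer_id.
customer_by_id = {c["customer_id"]: c for c in customers}
obt = [{**o, **customer_by_id[o["customer_id"]]} for o in orders]
```

Each OBT row now carries both order facts and customer attributes, so downstream reporting needs no further joins.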
Congrats on the new home
Thank you!!!
Loved this video. Thank you
glad you liked it!
What is the best open-source data modeling option?
What do you think about a graph data model?
@SeattleDataGuy Hello, I had a question. I just finished my BS degree in Software Development. I was wondering: is that enough, as far as the degree goes, to get into data engineering? When I search for jobs, I see a lot of Data Engineer positions that require a master's degree, which I do not have.
awesome video!
Thank you!
In brief, when the requirements are clear, what can go wrong while designing a data model? Could you please share the challenges we might encounter?
The requirements will turn out to be wrong 😂
Here is one: a "perfect" model with 3 fact tables and 45 dimensions(!), modeled almost perfectly... cool, yeah!
However, 25%, 33%, and 45% of the FKs in the 3 fact tables were zeros or completely missing (without enforced constraints, of course), due to a "successful" migration years ago of the old DB into the OLTP DB this star schema was reading from... The modeling work was done, the PMs were happy, and then it was my turn to come in and do the "reporting"... biggest joke of my career.
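A simple data-quality check of the kind that catches this early, sketched in Python with made-up data: count the fact rows whose foreign keys are NULL or a zero placeholder before trusting the model for reporting.

```python
def orphaned_fk_ratio(fact_rows, fk_column):
    """Share of rows where the FK is missing (None) or a zero placeholder."""
    if not fact_rows:
        return 0.0
    bad = sum(1 for r in fact_rows if r.get(fk_column) in (None, 0))
    return bad / len(fact_rows)

# Hypothetical fact table exhibiting the problem described above.
fact_sales = [
    {"customer_key": 101, "amount": 10.0},
    {"customer_key": 0, "amount": 20.0},     # zero placeholder, no real dim row
    {"customer_key": None, "amount": 30.0},  # missing entirely
    {"customer_key": 102, "amount": 40.0},
]
ratio = orphaned_fk_ratio(fact_sales, "customer_key")  # half the rows are orphans
```

In a warehouse without enforced constraints, a check like this (or its SQL equivalent) is often the only thing standing between the model diagram and reality.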
@cnaeuspompeus3188 Ha... if it was a successful migration, then how were those tables empty? And how did you overcome the issue? Reloaded the tables?
@sarvesht7299 It's a typical situation in many DBs: at some point you add a new property to your fact table, the old data isn't updated (it has NULLs in it), and only the new data comes in populated.
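The situation described can be reproduced in a few lines with SQLite (table and column names invented): a column added after the fact is NULL for every pre-existing row unless someone backfills it.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE fact_sales (id INTEGER, amount REAL)")
con.execute("INSERT INTO fact_sales VALUES (1, 10.0)")  # "old" row

# Later, a new property is added to the fact table.
con.execute("ALTER TABLE fact_sales ADD COLUMN region_key INTEGER")
con.execute("INSERT INTO fact_sales VALUES (2, 20.0, 7)")  # "new" row, populated

rows = con.execute(
    "SELECT id, region_key FROM fact_sales ORDER BY id"
).fetchall()
# The old row has NULL for region_key; only the new data is filled in.
```

Without a backfill step (or a NOT NULL constraint with a default), every report joining on the new column silently drops or mislabels the historical rows.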
The requirements are constantly changing over time, plus new data keeps coming in.
Thank you, very useful!
Glad it helped
congrats on the new house
thank you!
There is some background noise.
Why does the camera angle always wiggle from zoom-in to zoom-out? It makes my head burst... your content is always good 🤗 Can you please fix just this one thing?
Ok, I can see if we can do that less!!
Can you give a demo instead of explaining theoretically? Many of us really need help with a real-world example of data modeling for cracking interviews.