Your channel is now my new bible when it comes to Data Engineering
Great video! Perfect for anyone looking to understand some of the key first steps in setting up a solid data architecture.
Thank you!
You rock bro. Real clean and concise.
Much appreciated! Thanks for watching
Great. Straightforward and simple.
I loved those animation parts, nice video! 😎😎
Thanks! Glad you enjoyed it
How do you know if you need a data lake? Suppose all your data sources are DBs in the cloud, except maybe 2 or 3 files uploaded to an S3 bucket periodically. I don't see why data from the operational DBs would be sent to cloud storage instead of doing a traditional ETL into the data warehouse.
Very easy to understand!
Could you explain more about facts/dimensions and slowly changing dimensions, please?
Thanks! This is a complex topic in itself, but here are the short answers to your question.
Facts/Dimensions - These terms come from what's called "Dimensional Modeling", which is a strategy for creating tables in a data warehouse. They are still just database tables, but are described with these terms to help indicate their function in an overall strategy.
Fact Tables - Typically represent an activity (e.g. sales or comments) and include quantitative data (e.g. price, quantity, etc.) along with foreign keys to dimensions.
Dimensions - Provide qualitative context around fact tables; these are descriptions (e.g. color, type, name, etc.). The goal is to join facts to dimensions and create various types of views of the underlying business activity (fact) for reporting. When designed properly, this type of relationship makes slicing up data really straightforward.
Slowly Changing Dimensions - Dimensions that have attributes that may change over time (e.g. the location of an employee). There are various strategies for handling the change (e.g. overwrite the old value vs. add a new row and attach a time frame to each row).
Again, this topic could be an entire video in itself but ultimately it revolves around a strategy for organizing a data warehouse. I suggest looking into Kimball Data Modeling to learn more as well. Hope that helps!
Here's a wiki link - en.wikipedia.org/wiki/Dimensional_modeling
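To make the reply above a bit more concrete, here is a minimal sketch of a fact table joined to a slowly changing dimension, using Python's built-in sqlite3 module. The table names, columns, sample rows, and the `move_employee` helper are all hypothetical and chosen only for illustration. It shows the "add a new row and attach time frames" strategy (often called Type 2); the "overwrite" strategy would simply be a single UPDATE on the existing row.

```python
import sqlite3

OPEN_END = "9999-12-31"  # assumed marker for "this row is the current version"

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_employee (
    employee_key INTEGER PRIMARY KEY,  -- surrogate key, one per version of the employee
    employee_id  TEXT,                 -- business key from the source system
    name         TEXT,
    location     TEXT,                 -- the attribute that may change over time
    valid_from   TEXT,
    valid_to     TEXT
);
CREATE TABLE fact_sales (
    sale_id      INTEGER PRIMARY KEY,
    employee_key INTEGER REFERENCES dim_employee(employee_key),
    sale_date    TEXT,
    quantity     INTEGER,
    price        REAL
);
""")


def move_employee(conn, employee_id, new_location, change_date):
    """Close the current dimension row and add a new one (Type 2 style)."""
    key, name = conn.execute(
        "SELECT employee_key, name FROM dim_employee "
        "WHERE employee_id = ? AND valid_to = ?",
        (employee_id, OPEN_END),
    ).fetchone()
    conn.execute(
        "UPDATE dim_employee SET valid_to = ? WHERE employee_key = ?",
        (change_date, key),
    )
    cur = conn.execute(
        "INSERT INTO dim_employee "
        "(employee_id, name, location, valid_from, valid_to) "
        "VALUES (?, ?, ?, ?, ?)",
        (employee_id, name, new_location, change_date, OPEN_END),
    )
    return cur.lastrowid  # surrogate key of the new version


# One employee, first based in Boston.
boston_key = conn.execute(
    "INSERT INTO dim_employee "
    "(employee_id, name, location, valid_from, valid_to) "
    "VALUES ('E1', 'Ana', 'Boston', '2023-01-01', ?)",
    (OPEN_END,),
).lastrowid
conn.execute(
    "INSERT INTO fact_sales (employee_key, sale_date, quantity, price) "
    "VALUES (?, '2023-03-01', 2, 10.0)",
    (boston_key,),
)

# The employee moves; later sales point at the new dimension row.
denver_key = move_employee(conn, "E1", "Denver", "2023-06-01")
conn.executemany(
    "INSERT INTO fact_sales (employee_key, sale_date, quantity, price) "
    "VALUES (?, ?, ?, ?)",
    [(denver_key, "2023-07-01", 1, 50.0), (denver_key, "2023-08-01", 3, 10.0)],
)

# Join facts to the dimension to slice revenue by location.
revenue_by_location = conn.execute("""
    SELECT d.location, SUM(f.quantity * f.price) AS revenue
    FROM fact_sales AS f
    JOIN dim_employee AS d ON d.employee_key = f.employee_key
    GROUP BY d.location
    ORDER BY d.location
""").fetchall()
print(revenue_by_location)
```

Because each sale keeps the surrogate key of the dimension row that was current when it happened, the earlier sale still reports under Boston even after the move.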
@@KahanDataSolutions love how you explain things in your own way, pretty clear for me as always.
Hope you can share more data architecture topics in a simple way :)
So if I use a data lake and a data warehouse, does that mean I'm necessarily doing ELT? After all, I'm getting the data, loading it into the lake, then structuring it better in the warehouse.
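The flow described in this question (land the raw data first, structure it afterwards) could be sketched roughly as follows. This is only an illustration under stated assumptions: a temporary local folder stands in for the data lake, an in-memory SQLite database stands in for the warehouse, and the records, file name, and function names are all hypothetical.

```python
import json
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical raw records, as they might arrive from a source system.
raw_orders = [
    {"order_id": "1", "amount": "19.99", "country": "us"},
    {"order_id": "2", "amount": "5.00", "country": "DE "},
]


def load_raw_to_lake(records, lake_dir):
    """Extract + Load: write the records unchanged as JSON lines."""
    path = Path(lake_dir) / "orders.jsonl"
    with path.open("w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
    return path


def transform_into_warehouse(lake_file, conn):
    """Transform: read the raw file, fix types and formatting, fill a table."""
    conn.execute(
        "CREATE TABLE orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, country TEXT)"
    )
    with Path(lake_file).open(encoding="utf-8") as f:
        for line in f:
            rec = json.loads(line)
            conn.execute(
                "INSERT INTO orders VALUES (?, ?, ?)",
                (
                    int(rec["order_id"]),
                    float(rec["amount"]),
                    rec["country"].strip().upper(),
                ),
            )
    conn.commit()


with tempfile.TemporaryDirectory() as lake_dir:
    lake_file = load_raw_to_lake(raw_orders, lake_dir)
    warehouse = sqlite3.connect(":memory:")
    transform_into_warehouse(lake_file, warehouse)
    orders = warehouse.execute(
        "SELECT order_id, amount, country FROM orders ORDER BY order_id"
    ).fetchall()

print(orders)
```

The raw file in the "lake" keeps the original strings untouched, so the cleanup rules can be changed and re-run later without going back to the source system.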
When, or in what situation, would you need a data lake? Wouldn't transforming the various data directly into the data warehouse be more efficient?
Great videos!!
Glad you like them!
Thank you so much. love your content!
Much appreciated! Thanks for watching
A more technical video on the differences would be appreciated, please.
Thanks a lot.