Agreed! Conversational tone, appropriate visuals, and well-paced. I'm in a $4k EDW graduate school class that teaches from bulleted PowerPoint slides. Doesn't compare 🙆😟
I watched this video to prep for an interview. Not only did I get the job but here I am a year later rewatching this with more confidence and still learning. Thank you, sir
The most understandable, detailed & overall introduction to Dimensional Modeling, with clear key words explanation and logically sequential arrangement on the slide content. Thanks Bryan!
This is the most informative session I have ever seen! 54 minutes of pure knowledge. If only everything I had an interest in learning was taught by Bryan!
I come from an OLTP software data architect side of the house and only had to hand off streaming/replication to an ODS. I just took a role that will require dimensional modeling. So glad I found your video. This clicks.
My current favorite channel on TH-cam being a senior data engineer. I would like to have a video created by you telling more about what cubes are, OLAP, ROLAP, etc. This kind of nomenclature is getting more and more rare and people joining data engineering in this modern data stack area should understand them well so that they can understand how we got here and also talk to people familiar with this data engineering nomenclature.
ROLAP and MOLAP are only used in the legacy SSAS Multidimensional Models. They were cool but the Tabular model does not use them b/c everything is in memory. Thanks for the suggestion. Are you using SSAS Multidimensional Models?
Hi Bryan, many thanks to provide this tutorial. The most solid foundation tutorial on data warehouse & dimensional modelling I have even seen. For those who want to build their career path as a data analyst, data architect etc. This is the best start point. I really regret that I did not see this video 2 years ago when I start off my career.
I'm grateful for the time and efforts you put in explaining one of the key concepts anyone deals with enterrpise data needs. Clear and comprehensive. Thanks a lot
These are the best 53 minutes I've spent recently. Crisp and Clear. I feel much more confident and I request more videos on Dimensional Modeling. Thank you for your effort.
Thanks. I did do a video on Slowly Changing Dimensions which is a big area for interview questions. What topics are you most interested in regarding dimensional modeling?
Good sir, I typically never like or comment TH-cam videos. This was a must. You are masterful, and could quite possibly mint some people into data engineers off of this video. God bless you. I hope to become a teacher like you one day
I wish I had watched it earlier when i was struggling to understand Datawarehousing and Dimension Modeling. Very informative video. Every second was worth spending.
I dont usually spend time watching a 53 min learning video most of the time unless the course and the orator is really worth it. Trust me sir, you kept me glued every minute. God bless you Sir !
A very comprehensive and fully understandable video during my very first viewing itself. I felt like having read a high quality book on the topic within an hour. Bryan never wastes time on unnecessary talks and very effectively concentrates on the core through and through. Thank you!!
@@BryanCafferky My statement was true. Not just a passing compliment. I got a full perspective on what is dimension modelling and how it is different from Data warehousing. Leave alone warehousing, it actually helped me in designing various reports using data from Database tables. I was able to see the common threads that run through them all.
This is an excellent primer. The best one I've seen. I'll recommend it if anyone asks for advice on a good starting point. One point however: USER STORY. In 99% of Agile Software Development, the example from that book is NOT a User Story. A user story actually describes something that needs to be done and why. e.g. As a business manager I need to be able to drill down into the types of sales act my book stores to see the types of books being purchased, how they are being purchased, how they are being paid for, down to the level of the book title, so that I can understand what is being sold and better stock my stores. What the book you mention is actually doing is closer to a Use Case. There is semantic impedance going on. But it's just data. :) Anyway, I really do like this video and think the approach of using use cases is very good. I think though that there will be some need to translate between different groups kind of like what is a location to different systems. :)
I have been working in data warehousing for the last 5 years and this video gave me the answer I have looking for so long- why is there so much normalisation in data warehouses we see today? Nobody ever gave me a satisfactory answer but you did sir. Big thanks !! This video deserves to be on the billboard.
A very useful introduction to data warehousing and the common terminologies, presented in an interesting and easy to understand manner. IT was a very quick 53 minutes - I only meant to get an idea of the content and then watch it later - but got completely absorbed.
This is a must watch video for anyone having a hard time understanding Dimensional Modeling. Wish you could do a full series on Database systems and Warehousing.
Thank you for this excellent presentation on Dimensional Modeling. I'm a student and it was so easy to follow because you made it interesting and you provided some excellent examples to support your slides.
Hi Bryan, This video is one of the best videos have watched and has every information required, to the point and well described...Hats off...Thank You. Highly recommend must watch for everyone who is working in DWH domain.
Excellent Video and loved it. Being an OLTP modeler, this gave me a very nice idea about dimension modeling. The only one thing which I wanted to bring it to your attention that, When you talked about 7W's of dimensional modeling, it only had 6 Ws and I was searching for the 7th one :)
Hi Sathya, Great observation! Actually, the 'How Many?' is one of the 7 but yeah, it is an H not a W. The book 'Agile Data Warehouse Design' presents it that way so I did to. It should be the 6 W's and 1 H of Data Warehouse Design but that is harder to say. :-) Actually, it helps me remember the How because it is different. Thanks!
I listed out some topics (after my failed interview )to gain clearcut understanding and this video answered all my questions in detail,Sir big thumbs to you ,if possible please do a video on interview questions and how to answer them (Dw,DBMS concepts).
Hi Sai, That's a great idea for a video. Do you have any specific questions in mind? BTW: Interviews always have questions to stump you. But I'd like to help with a video that helps. Thanks! Bryan.
Thank you so much!! Struggling through my data modeling/structuring course and your video was incredibly helpful in understanding dimensional modeling.
Thanks, This is very helpful. When you discussed the scenarios regarding a person getting married, it triggered a bunch of other questions I can ask for my project. I enjoyed the descriptions of what is fact vs what is dimension.
I learned so much from this one video. Thank you! Also, 23:51 "snowflake is something you may get questioned in an interview, so wake up" I feel personally attacked lol. I wasn't sleeping (your video was long but not boring at all) but I DID get asked about this in a recent interview and I totally flopped. At least now I can answer that question :)
@BryanCafferky Thank you for taking the time to put this together. This is a great foundational video for anyone getting started and presents the subject in a very relatable way.
This is a phenomenal presentation on dimensional modeling but i don’t understand the implementation of surrogate keys. I feel like I’m missing an obvious and low compute way of maintaining all the surrogate keys on your facts. No videos I’ve seen discuss this. But it seems every time a new fact record is generated you have to join every related dim on the foreign natural key and update the fact with the dim’s related surrogate key. So that you can later perform joins using the surrogate key. Am I thinking through this correctly?
Yeah. It does add complexity but you have the gist of it. Surrogate keys are particuarly important when you want SCD history since natural keys would result in duplicate keys on the dim tables. Also, they isolate changes from the backend systems to the dw. But they do add some extra work.
Excellent video. I do have a question at 15:57 . I was unable to understand how the design of the SalesFact table differs from what the OLTP table for Sales would have been. My OLTP Sales table would have been almost identical to SalesFact shown in this presentation, with the exception of a SalesLineItem.
Yeah. I had a hard time seeing the difference in the beginning too. Good question. In the OLTP table at about 23:09, notice the sample table has descriptive columns like CustomerFirstName, CustomerLastName, Product, SalesDate, Order Number, and OrderLineNumber. These are Dimension attributes. It also has Quantity and Price which are Facts or Measures. In a Star Schema, these cannot be in the same table. The Dimension attributes are stored in a separate table that has its own Primary Key. The Facts are stored in a Fact table with a foreign key to the Dimension table. The Dimension table Primary Key is called a Surrogate key and is usually an auto-generated identity column. There is no effort to reduce data redundancy for Dimensions, i.e. you could have Product Category values in the same table with Product Model values. It is not efficient to maintain the data that way which is why OLTP design would not do this. But it is fine for a Star Schema, i.e. Dimensional Modeled design. Make sense?
I love this video, the facts, and color commentary you present with it. But what is the relevance of star schema (vs wide flat) for consumption by analysis tools such as Tableau which implicitly and automatically creates a high performing dimensional model from a flat view without a human needing to do any dimensional data development? I would love to hear your thoughts on this.
Thanks. Sure. First, a Star Schema is arranged for efficient and easy data analysis for Tableau, Power BI, or any other tool. There is more to the world than Tableau. The use of surrogate keys supports inclusion of dimension history, i.e. slowly changing dimensions. Thinking things out like conformed dimensions builds in extensibility. The dimensional model creates the foundation of your data which will sustain your organization long term through many changes in the reporting tools that consume the data. The second reason to use a Star Schema is performance when used by reporting tools. You need to load the data and that process can be slow if a lot of joins are needed to get the data together. The Star Schema connects the fact to dimension tabless in one join. No need to join dimension to dimension tables. See medium.com/data-ops/why-do-i-need-a-star-schema-338c1b029430 Many organizations don't want to spend the effort to build a star schema but then run into problems and build hacks to solve them. It's pay now or pay later.
Hi Bryan, I found your video extremely helpful. I know dimensional modelling could be a bit outdated compared to newer tools out there these days but I find your videos easy to follow and would love to see more videos on this topic, i.e. olap cube. Thanks for your amazing work!
Hi Bryan, great tutorial! I've been dimensional modeling for quite some time but it's always good to review the basics again. I was wondering if you might have some thoughts on a fact at the atomic grain for a order header/detail fact where your order lines have different dimension attributes. For example, imagine a fact for vehicle maintenance where 2 or 3 line numbers might be for labor with 3 different employees, and 5 line numbers for different parts. This is all easy to rollup, but getting it to the atomic grain leaves a lot of zero keys for the dimensions that aren't applicable. Would be curious to see your thoughts on this. Thanks!
My thought is to consolidate the data (employye, parts, etc.) into a single dimension table and add a column, charge type with values like 'Emp', 'Part'. Assign a surrogate key and point the fact table to this. The line detail points to the related charges which is now in one table.
@@BryanCafferky Hi Bryan, thanks for the reply. Interesting. So if you had 1 line item with 1 employee and 3 parts, then another line item with the same employee and 2 parts, I'm guessing that you'd have to calculate ratios of hours/costs for the labor to each row that's got a part...otherwise, you'd have repeating values (i.e., double-counting) of the labor. In my original thought, it avoids that by having a row for each record type, but I get why that's a problem. Another approach is to just create a fact for each type of line item detail, but then a work order (similar to a sales order) would be broken up among multiple line item detail facts and make drill down impossible.
@@craigdubin6325 If feasible, creating the DW at the lowest grain you may ever need gives you a lot of future flexibility. Your second reply indicate each line item has both an employee(s) and part(s) so you could just aggregate for reporting. Maybe aggregate to the employee/part level for drill down or whatever the business needs.
@@BryanCafferky Yup, I always go down to the atomic grain if possible. And I think I probably made it more confusing. A work order is comprised of basically 2 different types of records, 1) labor, 2) parts. So imagine a quote or invoice from a mechanic. You might have the summary costs at the top with total labor and total parts. But there will be n number of lines with only labor charges, and n number of lines with only parts. They're never on the same line. So quantity on a labor line might be hours for the employee on that labor line, unit cost would be dollars per hour, and total cost would be hours x unit cost. On the parts lines, the quantity would be the number of the specific part number, the unit cost would be the cost of the particular part, and the total cost would be quantity x unit cost. My concern with this method was that the dim_employee_key will always be 0 (or possibly could make it -1 for N/A) on the parts line...while the dim_part_key will always be 0 (or -1 as above) for the part number on the labor lines. Hopefully that's making more sense!
Hi Bryan, Thank you so much for helping me understand dimensional modeling. I had a question regarding fact tables. Is it an acceptable practice to create a separate fact table that reports on a different grain? So say for example we have an orders fact table that consist of billions or rows. There are requests to create reports on the lowest grain possible, so in this case it would be the order_id but there other reports where the business wants to do their analysis at a higher grain, so say for example total number of order by day and country in the past 3 years. Due to the number of records, the query to preform this takes a a lot of time and eats into costs. If so, would it make sense to script the ETL to create this other fact table by utilizing the original fact table as the base table? I hope my question made sense. Thanks!
Yes. Aggregated fact tables are a way to do what you are saying. See www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/aggregate-fact-table-cube/#:~:text=Aggregate%20fact%20tables%20are%20simple,aggregate%20level%20at%20query%20time. I there is a need for the more detailed grain too, you can have that as a fact table too.
Hi Bryan, I really enjoyed the video! I have a question on surrogate keys that I'm hoping you could help with. It makes a lot of sense to me that the DW is in control of these keys for the reasons you shared but I'm worried about duplicates in the ETL. It seems like you suggest that the surrogate key is an auto-incremented integer. If that's the case, how do you ensure that here are no duplicates? I understand why a hash of other columns for a surrogate key would be a bad idea but you'd at least be able to recreate it during ET to prevent duplicate records or upsert. Are duplicate records just inevitable without more control of the source data and how would you deal with them when you find them? Thanks again and I'm excited to jump into more of your videos!
Thanks. Identity columns in SQL Server or a Sequence type in Oracle guarantees no duplicates. If the pk is an Identity the database will not allow dups. For SCD type 2, you would have multiple dim rows with different keys. You map the latest key to incoming data to preserve dim history.
Hi Bryan, it was very nice video for beginners. I liked the examples. I'm interested more in practical approach and best industry example for Snowflake schema (i.e. may be end-of-life applications or hardware etc.). Suggestion: It's good if you keep your subtitle smaller or bit below the slide as it was covering half or more of text in your slide. One of the quick and better video on DW-housing
I assume you mean the Snowflake product? Snowflaking is when dimensions relate to other dimensions and generally should be avoided. See books by Kimball for use cases and explanations See www.kimballgroup.com/data-warehouse-business-intelligence-resources/books/
Thanks Brian for awesome presentation, can you please also cover topics like how to handle late arriving dimensions and is it relevant in Midern Data Warehouse?
I've seen hours of videos on data warehousing. These are the most valuable 56 minutes.
Well said
53 minutes*
100% agreed
Could not agree more, this is the best video on data warehousing I have watched so far!!
Agreed! Conversational tone, appropriate visuals, and well-paced. I'm in a $4k EDW graduate school class that teaches from bulleted PowerPoint slides. Doesn't compare 🙆😟
I watched this video to prep for an interview. Not only did I get the job but here I am a year later rewatching this with more confidence and still learning. Thank you, sir
That's great to hear. Glad it helped and thanks for letting me know.
This is by far the best guide to dimensional modeling I've found on the internet!
The most understandable, detailed & overall introduction to Dimensional Modeling, with clear key words explanation and logically sequential arrangement on the slide content. Thanks Bryan!
This is the most informative session I have ever seen! 54 minutes of pure knowledge. If only everything I had an interest in learning was taught by Bryan!
Thanks. What are your tech learning interests?
I come from an OLTP software data architect side of the house and only had to hand off streaming/replication to an ODS. I just took a role that will require dimensional modeling. So glad I found your video. This clicks.
Glad it is helpful.
Using for an interview prep. Great refresher. Perfect length and depth of content...
My current favorite channel on TH-cam being a senior data engineer. I would like to have a video created by you telling more about what cubes are, OLAP, ROLAP, etc. This kind of nomenclature is getting more and more rare and people joining data engineering in this modern data stack area should understand them well so that they can understand how we got here and also talk to people familiar with this data engineering nomenclature.
ROLAP and MOLAP are only used in the legacy SSAS Multidimensional Models. They were cool but the Tabular model does not use them b/c everything is in memory. Thanks for the suggestion. Are you using SSAS Multidimensional Models?
Perfect combination of experience and theoretical handle on the subject. Thank you for the time it took to record it.
You're welcome. Glad it was helpful.
By far the MOST lucid and practical no-nonsense explanation of key terms. Loved it!
Glad it is helpful! Thank you for the kind words.
Hi Bryan, many thanks to provide this tutorial. The most solid foundation tutorial on data warehouse & dimensional modelling I have even seen. For those who want to build their career path as a data analyst, data architect etc. This is the best start point. I really regret that I did not see this video 2 years ago when I start off my career.
You're very welcome!
I'm grateful for the time and efforts you put in explaining one of the key concepts anyone deals with enterrpise data needs. Clear and comprehensive. Thanks a lot
Amazing video. Well explained. Uploaded 3 years ago, still valid for educational and professional training for 2020. Thanks Bryan!
These are the best 53 minutes I've spent recently. Crisp and Clear. I feel much more confident and I request more videos on Dimensional Modeling. Thank you for your effort.
Thanks. I did do a video on Slowly Changing Dimensions which is a big area for interview questions. What topics are you most interested in regarding dimensional modeling?
Good sir, I typically never like or comment TH-cam videos. This was a must. You are masterful, and could quite possibly mint some people into data engineers off of this video. God bless you. I hope to become a teacher like you one day
Thank You
Many Thanks !! This is the best video I have even seen on this subject. Simple explanation of all complicated areas.
This is premium content on the topic. Simple yet effective explanation shows your understanding of the subject. Thank you Bryan!
Glad it was helpful! Thanks
I wish I had watched it earlier when i was struggling to understand Datawarehousing and Dimension Modeling. Very informative video. Every second was worth spending.
Thanks. Yeah. I struggled understanding this too for some time. Glad it helped.
The best video I have ever watched till date. Very well explained and neatly presented. This video helped a lot!!!
Glad it was helpful!
I'm about 4 or 5 years late to this party. You've still done a great job in this video compared to many other sources. Thanks, and well done!
YW and thanks for watching.
I dont usually spend time watching a 53 min learning video most of the time unless the course and the orator is really worth it. Trust me sir, you kept me glued every minute. God bless you Sir
!
Thank you!
Thank you for taking the time to put this together. I found it very educational and helpful!
You're welcome and thanks for the kind words.
Great presentation!👍 Thank you Bryan ! 👏
YW. Glad it is helpful.
A very comprehensive and fully understandable video during my very first viewing itself. I felt like having read a high quality book on the topic within an hour. Bryan never wastes time on unnecessary talks and very effectively concentrates on the core through and through. Thank you!!
Thanks
@@BryanCafferky My statement was true. Not just a passing compliment. I got a full perspective on what is dimension modelling and how it is different from Data warehousing.
Leave alone warehousing, it actually helped me in designing various reports using data from Database tables. I was able to see the common threads that run through them all.
Quality. Seeing your videos, I realise that this is the subsection of my role that I enjoy the most and need to learn more about. Great video.
This is an excellent primer. The best one I've seen. I'll recommend it if anyone asks for advice on a good starting point. One point however: USER STORY. In 99% of Agile Software Development, the example from that book is NOT a User Story. A user story actually describes something that needs to be done and why. e.g. As a business manager I need to be able to drill down into the types of sales act my book stores to see the types of books being purchased, how they are being purchased, how they are being paid for, down to the level of the book title, so that I can understand what is being sold and better stock my stores. What the book you mention is actually doing is closer to a Use Case. There is semantic impedance going on. But it's just data. :) Anyway, I really do like this video and think the approach of using use cases is very good. I think though that there will be some need to translate between different groups kind of like what is a location to different systems. :)
Thanks!
An excellent video on data warehousing, easily the best I've seen.
Lovely way of teaching.
Looking up for more of your material about data warehouse on the web.
I have been working in data warehousing for the last 5 years and this video gave me the answer I have looking for so long- why is there so much normalisation in data warehouses we see today? Nobody ever gave me a satisfactory answer but you did sir. Big thanks !! This video deserves to be on the billboard.
Great! Glad it helped!
A very useful introduction to data warehousing and the common terminologies, presented in an interesting and easy to understand manner. IT was a very quick 53 minutes - I only meant to get an idea of the content and then watch it later - but got completely absorbed.
Thanks. Glad you liked it.
This is a must watch video for anyone having a hard time understanding Dimensional Modeling. Wish you could do a full series on Database systems and Warehousing.
Thanks. What specific topics are you thinking of?
Thank you for this excellent presentation on Dimensional Modeling. I'm a student and it was so easy to follow because you made it interesting and you provided some excellent examples to support your slides.
Thanks. Glad it helped. Hope you find my other videos helpful too.
The concept has been explained thoroughly with real time examples. Thanks Bryan.
YW. Thanks for watching.
Hi Bryan, This video is one of the best videos have watched and has every information required, to the point and well described...Hats off...Thank You.
Highly recommend must watch for everyone who is working in DWH domain.
Thanks! Appreciate the kind words.
This was an excellent primer. I was alert and it really grounded me on a number of key points. Thank you so much for this contribution.
among several videos i watched on dimensional modeling, this is the one with more insight and experience sharing!
AMAZING lecture, Bryan - thanks so much! Exactly what I was looking for and an extremely well articulated 56 minutes.
Excellent Video and loved it. Being an OLTP modeler, this gave me a very nice idea about dimension modeling. The only one thing which I wanted to bring it to your attention that, When you talked about 7W's of dimensional modeling, it only had 6 Ws and I was searching for the 7th one :)
Hi Sathya, Great observation! Actually, the 'How Many?' is one of the 7 but yeah, it is an H not a W. The book 'Agile Data Warehouse Design' presents it that way so I did to. It should be the 6 W's and 1 H of Data Warehouse Design but that is harder to say. :-) Actually, it helps me remember the How because it is different. Thanks!
@@BryanCafferky
hoW?
What?
When?
Where?
Who?
hoW many?
Why?
makes 7.
Thanks for the video. Very informative!!!👍
This is so helpful. I'm modeling a data warehouse for my Org and only have experience with OLTP. This has saved so much headache
Great! Yeah. Dimensional Modeling is a very different mindset. Good luck!
Great content Bryan. Great level of detail and insights (from actual experience). Please keep it up !
I am listening you for the first time and I found out that you are a great teacher.
Thanks!
@@BryanCafferky After an year I am listening again to refresh concepts.
Very well presented! Clear and concise with real-world use cases!
This is what I subscribe internet for, it's beautiful piece of 53 mins straight
Thanks. Hope you check out my other videos like the ones on Databricks and Python too.
Thank you Bryan for this video. You did an excellent job of explaining the concepts data warehouse design.
Thanks!
Great explanation of dimensional modelling. Highly insightful.
I listed out some topics (after my failed interview )to gain clearcut understanding and this video answered all my questions in detail,Sir big thumbs to you ,if possible please do a video on interview questions and how to answer them (Dw,DBMS concepts).
Hi Sai, That's a great idea for a video. Do you have any specific questions in mind? BTW: Interviews always have questions to stump you. But I'd like to help with a video that helps.
Thanks! Bryan.
One of the best videos I’ve seen on DM!!
this is great Dimensional Modeling tutorial ever.. !! Thank you so much, sir..
The ideal video for a dba trying to reach the dw world (using mssql server and also Azure) . Thanks a lot for the video !
Glad it helped!
Thank you so much!! Struggling through my data modeling/structuring course and your video was incredibly helpful in understanding dimensional modeling.
Wow! Really glad to hear that. Thanks for letting me know.
Thanks, This is very helpful. When you discussed the scenarios regarding a person getting married, it triggered a bunch of other questions I can ask for my project. I enjoyed the descriptions of what is fact vs what is dimension.
Amazing simple and focused explanations.
Thanks Bryan
You're Welcome! Glad it was helpful.
For someone just getting started, this was amazing thank you so much!
YW!
Thank you, loved the content and how well it was structured and presented. Looking forward to your other tutorials!
You are amazing!! Thank you so much for this. Best summary you can get and which can make you talk like a pro..
Thanks.
A nice and concise presentation of dimensional modeling for data warehouse.
Great Job Bryan, great content .Thanks for sharing
Thanks!
Great tutorial sir , Thank you so much for such relevent information .
You are most welcome
It's a great content and presentation. Thank you very much for this wonderful work!
YW!
I learned so much from this one video. Thank you! Also, 23:51 "snowflake is something you may get questioned in an interview, so wake up" I feel personally attacked lol. I wasn't sleeping (your video was long but not boring at all) but I DID get asked about this in a recent interview and I totally flopped. At least now I can answer that question :)
Yes. I find I always remember answers to interview questions I missed.
@BryanCafferky Thank you for taking the time to put this together. This is a great foundational video for anyone getting started and presents the subject in a very relatable way.
Thanks.
Very well done and explained. Thank you for sharing your knowledge.
Thanks for watching.
Thank you so much for sharing your knowledge and your skills to teach them in a so clean, so comprehensible.
Thanks. So glad it is helpful!
This is a phenomenal presentation on dimensional modeling but i don’t understand the implementation of surrogate keys. I feel like I’m missing an obvious and low compute way of maintaining all the surrogate keys on your facts. No videos I’ve seen discuss this. But it seems every time a new fact record is generated you have to join every related dim on the foreign natural key and update the fact with the dim’s related surrogate key. So that you can later perform joins using the surrogate key. Am I thinking through this correctly?
Yeah. It does add complexity but you have the gist of it. Surrogate keys are particuarly important when you want SCD history since natural keys would result in duplicate keys on the dim tables. Also, they isolate changes from the backend systems to the dw. But they do add some extra work.
Thank you for confirming my understanding and great presentation!
Excellent video. I do have a question at 15:57 . I was unable to understand how the design of the SalesFact table differs from what the OLTP table for Sales would have been. My OLTP Sales table would have been almost identical to SalesFact shown in this presentation, with the exception of a SalesLineItem.
Yeah. I had a hard time seeing the difference in the beginning too. Good question. In the OLTP table at about 23:09, notice the sample table has descriptive columns like CustomerFirstName, CustomerLastName, Product, SalesDate, Order Number, and OrderLineNumber. These are Dimension attributes. It also has Quantity and Price which are Facts or Measures. In a Star Schema, these cannot be in the same table. The Dimension attributes are stored in a separate table that has its own Primary Key. The Facts are stored in a Fact table with a foreign key to the Dimension table. The Dimension table Primary Key is called a Surrogate key and is usually an auto-generated identity column. There is no effort to reduce data redundancy for Dimensions, i.e. you could have Product Category values in the same table with Product Model values. It is not efficient to maintain the data that way which is why OLTP design would not do this. But it is fine for a Star Schema, i.e. Dimensional Modeled design. Make sense?
Thank you so much for this! Very organized lecture, and I love how you included the time stamps.
I love this video, the facts, and color commentary you present with it. But what is the relevance of star schema (vs wide flat) for consumption by analysis tools such as Tableau which implicitly and automatically creates a high performing dimensional model from a flat view without a human needing to do any dimensional data development? I would love to hear your thoughts on this.
Thanks. Sure.
First, a Star Schema is arranged for efficient and easy data analysis for Tableau, Power BI, or any other tool. There is more to the world than Tableau. The use of surrogate keys supports inclusion of dimension history, i.e. slowly changing dimensions. Thinking things out like conformed dimensions builds in extensibility. The dimensional model creates the foundation of your data which will sustain your organization long term through many changes in the reporting tools that consume the data.
The second reason to use a Star Schema is performance when used by reporting tools. You need to load the data and that process can be slow if a lot of joins are needed to get the data together. The Star Schema connects the fact to dimension tabless in one join. No need to join dimension to dimension tables. See medium.com/data-ops/why-do-i-need-a-star-schema-338c1b029430
Many organizations don't want to spend the effort to build a star schema but then run into problems and build hacks to solve them. It's pay now or pay later.
Amazing video, very detail level. Thanks so much.
Hi Bryan, I found your video extremely helpful. I know dimensional modelling could be a bit outdated compared to newer tools out there these days but I find your videos easy to follow and would love to see more videos on this topic, i.e. olap cube. Thanks for your amazing work!
Thanks.
Great presenation. Very clear and to the point!
@Bryan Cafferky - thank you for creating this great video. Its really a marvel , simple, realistic approach to understand.
I'm getting into data engineering and I really enjoyed this content.
Great lecture, definitely worth the watch!
Hi Bryan, great tutorial! I've been dimensional modeling for quite some time but it's always good to review the basics again. I was wondering if you might have some thoughts on a fact at the atomic grain for a order header/detail fact where your order lines have different dimension attributes. For example, imagine a fact for vehicle maintenance where 2 or 3 line numbers might be for labor with 3 different employees, and 5 line numbers for different parts. This is all easy to rollup, but getting it to the atomic grain leaves a lot of zero keys for the dimensions that aren't applicable. Would be curious to see your thoughts on this. Thanks!
My thought is to consolidate the data (employye, parts, etc.) into a single dimension table and add a column, charge type with values like 'Emp', 'Part'. Assign a surrogate key and point the fact table to this. The line detail points to the related charges which is now in one table.
@@BryanCafferky Hi Bryan, thanks for the reply. Interesting. So if you had 1 line item with 1 employee and 3 parts, then another line item with the same employee and 2 parts, I'm guessing that you'd have to calculate ratios of hours/costs for the labor to each row that's got a part...otherwise, you'd have repeating values (i.e., double-counting) of the labor. In my original thought, it avoids that by having a row for each record type, but I get why that's a problem. Another approach is to just create a fact for each type of line item detail, but then a work order (similar to a sales order) would be broken up among multiple line item detail facts and make drill down impossible.
@@craigdubin6325 If feasible, creating the DW at the lowest grain you may ever need gives you a lot of future flexibility. Your second reply indicate each line item has both an employee(s) and part(s) so you could just aggregate for reporting. Maybe aggregate to the employee/part level for drill down or whatever the business needs.
@@BryanCafferky Yup, I always go down to the atomic grain if possible. And I think I probably made it more confusing. A work order is comprised of basically 2 different types of records, 1) labor, 2) parts. So imagine a quote or invoice from a mechanic. You might have the summary costs at the top with total labor and total parts. But there will be n number of lines with only labor charges, and n number of lines with only parts. They're never on the same line. So quantity on a labor line might be hours for the employee on that labor line, unit cost would be dollars per hour, and total cost would be hours x unit cost. On the parts lines, the quantity would be the number of the specific part number, the unit cost would be the cost of the particular part, and the total cost would be quantity x unit cost. My concern with this method was that the dim_employee_key will always be 0 (or possibly could make it -1 for N/A) on the parts line...while the dim_part_key will always be 0 (or -1 as above) for the part number on the labor lines. Hopefully that's making more sense!
Excellent, specially focus on process model. Thank you
YW!
Amazing lecture. Thanks Sir. 🙏🙏🙏
Hi Bryan,
Thank you so much for helping me understand dimensional modeling. I had a question regarding fact tables. Is it an acceptable practice to create a separate fact table that reports on a different grain?
So say for example we have an orders fact table that consist of billions or rows. There are requests to create reports on the lowest grain possible, so in this case it would be the order_id but there other reports where the business wants to do their analysis at a higher grain, so say for example total number of order by day and country in the past 3 years. Due to the number of records, the query to preform this takes a a lot of time and eats into costs.
If so, would it make sense to script the ETL to create this other fact table by utilizing the original fact table as the base table?
I hope my question made sense.
Thanks!
Yes. Aggregated fact tables are a way to do what you are saying. See www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/aggregate-fact-table-cube/#:~:text=Aggregate%20fact%20tables%20are%20simple,aggregate%20level%20at%20query%20time.
I there is a need for the more detailed grain too, you can have that as a fact table too.
Excellent explanation Bryan!!
Absolutely amazing video, thanks Bryan!
YW. Glad it was helpful.
Thanks Bryan. This is a really useful overview; great to have the fruits of your extensive experience.
Thanks Michael!
Excellent job! Thank you for this wonderful video!
Thanks Bryan, great video.
OMG! thank you for making such a informative video, amazing, well done my mannnn!
YW. Thanks for watching.
This video was perfect to answer my questions! Thank you!
Glad to hear that!
Best video on DW design ever!
Thank you for this worderful session.Very clear and informative.Really enjoyed it.
Thanks!
Hi Bryan, I really enjoyed the video! I have a question on surrogate keys that I'm hoping you could help with.
It makes a lot of sense to me that the DW is in control of these keys for the reasons you shared but I'm worried about duplicates in the ETL. It seems like you suggest that the surrogate key is an auto-incremented integer. If that's the case, how do you ensure that here are no duplicates? I understand why a hash of other columns for a surrogate key would be a bad idea but you'd at least be able to recreate it during ET to prevent duplicate records or upsert. Are duplicate records just inevitable without more control of the source data and how would you deal with them when you find them?
Thanks again and I'm excited to jump into more of your videos!
Thanks. Identity columns in SQL Server or a Sequence type in Oracle guarantees no duplicates. If the pk is an Identity the database will not allow dups. For SCD type 2, you would have multiple dim rows with different keys. You map the latest key to incoming data to preserve dim history.
Thank you so much for this video.
I really appreciate your efforts to explain this in the best possible way 👍🏻
Glad to help!
You're a savior, thank you Bryan.
YW. Glad it helps.
This was very informative and also has few drawbacks, I want to know if you have done video on Data Vault 2.0 with Dimensional Modeling ???
Thanks. I have not done any videos on Data Vault yet.
Absolutely amazing explanations.
Priceless resource. Much appreciated!
This is the first time I loved blue screen on my computer. :) Very good advice.
Really enjoyable, highly informative, and easy to understand this is just best! Thank you!
Hi Bryan, it was very nice video for beginners. I liked the examples. I'm interested more in practical approach and best industry example for Snowflake schema (i.e. may be end-of-life applications or hardware etc.).
Suggestion: It's good if you keep your subtitle smaller or bit below the slide as it was covering half or more of text in your slide.
One of the quick and better video on DW-housing
I assume you mean the Snowflake product? Snowflaking is when dimensions relate to other dimensions and generally should be avoided. See books by Kimball for use cases and explanations See www.kimballgroup.com/data-warehouse-business-intelligence-resources/books/
@@BryanCafferky Thanks. Will read the book.
Excellent video! Thanks for sharing
YW
Thanks Brian for awesome presentation, can you please also cover topics like how to handle late arriving dimensions and is it relevant in Midern Data Warehouse?
Wow. well detail and explainable. Thanks Bryan
YW!
Hey Bryan! This is amazing, thank you for the great video. Could we get a download link for the slide deck?
Excellent and very informative video. Thank you very much.