Header-detail vs star schema models in Tabular and Power BI

แชร์
ฝัง
  • เผยแพร่เมื่อ 15 ม.ค. 2025

ความคิดเห็น • 67

  • @Milhouse77BS
    @Milhouse77BS 3 ปีที่แล้ว +7

    Nice touch with the expresso drink at 24:10

  • @MortenHannibalsenOlsen
    @MortenHannibalsenOlsen 3 ปีที่แล้ว +10

    These deep dive videos where you actually prove (or disprove) common data modeling theory are pure gold!

  • @toniskolik5807
    @toniskolik5807 3 ปีที่แล้ว +12

    You deliver by far the best content for the community! Thank you so much!

  • @Milhouse77BS
    @Milhouse77BS 3 ปีที่แล้ว +4

    Love the "No" answer at 1:48

  • @alenalihovidova7607
    @alenalihovidova7607 3 ปีที่แล้ว +4

    That was really useful! I was using header-detail model structures extensively to save some model size, but I had no idea that it has such a huge impact on performance.
    Will rebuild the existing models. It's good that I spot this video right now. Great many thanks!

  • @Ralom2000
    @Ralom2000 ปีที่แล้ว +1

    Excellent video. Crystal clear and very good practical demonstrations

  • @carloscantu75
    @carloscantu75 3 ปีที่แล้ว +1

    Thank you Alberto, great content as always and also thank you for taking the time to answer the questions in this section I learned not only from the video but also from you replies.

  • @ajaaskelainen
    @ajaaskelainen 3 ปีที่แล้ว +2

    I've learned a lot here! I knew that Star schema is better, but never actually investigated how much faster it is. I also liked the most interesting scenarios after around 19:00

  • @mabl4367
    @mabl4367 3 ปีที่แล้ว +1

    Fantastic video!
    It adressed questions I have had since first coming in contact with Power BI and it did it in the clearest way possible!!!
    Thank you!

  • @rafalg8575
    @rafalg8575 3 ปีที่แล้ว +1

    As always perfect! I was sure that star schema is the best but now I can prove it.

  • @beginho2454
    @beginho2454 3 ปีที่แล้ว +2

    Hello Alberto, Macro, thank your books...
    you open a new page to my career and life ;)

  • @sebastiendebosscher
    @sebastiendebosscher 3 ปีที่แล้ว +1

    Fantastic video. Very clear explination! I love you actually prove everything you say 👍👍

  • @gulhermepereira249
    @gulhermepereira249 3 ปีที่แล้ว +2

    This is gold, thank you Alberto

  • @aminejerbi1634
    @aminejerbi1634 3 ปีที่แล้ว +1

    Lovely and very convincing....Thank you Alberto

  • @hannesw.8297
    @hannesw.8297 3 ปีที่แล้ว +1

    Perfect explanation, love it! Great work, thank you!

  • @nlagreste
    @nlagreste 3 ปีที่แล้ว +1

    Excelent as always! Thanks for sharing Alberto.

  • @charlesdelanoy
    @charlesdelanoy 3 ปีที่แล้ว +1

    Thanks so much for making these videos, just great content and much appreciated...

  • @jacqueskassubeck2338
    @jacqueskassubeck2338 3 ปีที่แล้ว +1

    Thank you for the information - really great content!

  • @thomaschauvin5651
    @thomaschauvin5651 3 หลายเดือนก่อน +1

    Thank you for the video, great information! What do you do if you want to slice by, let's say order number from Sales in a Power BI report? Wouldn't you want a dimension table that stores that type of information, instead of using the order number from the fact table?

  • @luanacristinalongo2094
    @luanacristinalongo2094 11 หลายเดือนก่อน +1

    Excellent explanation.

  • @nishantkumar9570
    @nishantkumar9570 3 ปีที่แล้ว +1

    Well explained. Thank you very much.

  • @bgg732
    @bgg732 2 ปีที่แล้ว +1

    One of the conveniences of having a header table is it makes DAX much simpler for things like average difference between payment due date vs actual payment date. Those are fields at the header level and require you to use table functions like summarizecolumns() in every measure where the granularity stops at the header level. Not a show stopper just something to be aware of.

    • @SQLBI
      @SQLBI  2 ปีที่แล้ว +5

      Nothing stops you from having two fact tables with two different granularities (header and detail). The issue is to create a relationship between header and detail tables.

  • @ikar2k
    @ikar2k ปีที่แล้ว +1

    Brilliant video 👌 I'll go and have a cup of double espresso in honor of you! 😌 Molte grazie!

  • @Anthony_Lecoq
    @Anthony_Lecoq 3 ปีที่แล้ว +1

    What a beautiful fact checked demonstration. Many thanks for your valuable videos Alberto.

  • @Milhouse77BS
    @Milhouse77BS 3 ปีที่แล้ว +1

    I've been wrestling with this for nearly a year, with added twist of having another level of header-detail, say each Line has many Discounts per line. And on the other end, Transactions having many Payments.

  • @alexkim7270
    @alexkim7270 2 ปีที่แล้ว +1

    This is the second time reviewing this video. This time, I have a practical problem. I would need advice.
    For e-commerce databases, it is extremely common to have header-detail where discounts could be applied on total order and item level. From here, I have 2 questions.
    1. Denormalising HD to Star would create duplicate records of "order discount amt" and it would not be easy to sum the max of discounts (I may be wrong, as I've never done this before)
    2. I also "imputed" some relevant "product dimension" attributes to the detail table to avoid that heavy calculation (in your demo, 18 mins). Obviously this approach hinders me from easily deploying basket value analysis that requires 2 product dimension tables as you explained in your insightful articles.
    I am caught in a Catch 22 situation where both solutions would be suboptimal. Hence I chose the lesser evil by using HD on very specific use cases (such as tracking order item fulfillment), while mostly relying on Star to do the heavy lifting. What's your advise on this? Would it be optimal for me to denormalise and use Star, and figure ways out to calculate sum of max of discounts? Would love to hear from you. Thanks again for sharing!

    • @SQLBI
      @SQLBI  ปีที่แล้ว

      1) Allocate the discount - the demo with MAX is an example why star schema is better for analytical purposes.
      2) Converting to a star schema has a cost, but also benefits.
      The denormalization process can be done with many tools, usually classified as ETL. If you are using Power Query, you use the most accessible tool in Power BI, but it is not the more performant when data volume grows. Depending on your scenario, consider SSIS / ADF / SQL / other ETL tools.

    • @alexkim7270
      @alexkim7270 ปีที่แล้ว

      By allocating the discount here I assume that I need to perform an INNER JOIN between the header table and details table? I am not able to allocate the order level discount amount to respective line items as this is not how the business logic works. This is the only way I can think of denormalising the warehouse model from HD to Star Schema.
      I am thankful that for the time being, the SQL scripts I wrote is still able to handle the complex operations. I totally agree with you that it will be less performant when the data volume grows. I am considering to propose ADF as well.

  • @ivankhoo93
    @ivankhoo93 3 ปีที่แล้ว +1

    Thanks for the insightful video, is the same thing applicable on the dimension side e.g in a extended snowflake configuration like the product /subcategory/ category where it would face similar issues if the dimension key links are large as well?

    • @SQLBI
      @SQLBI  3 ปีที่แล้ว +1

      Yes, you should denormalize a snowflake into a star schema whenever possible.

  • @downunderdax
    @downunderdax 2 ปีที่แล้ว +1

    Great Video!
    Before I go and fix my model.
    I have been playing with a Header Detail Model, and encountered a behavior I don't understand.
    My model has customer --> Header and Product --> Detail.
    Building a table with
    Customer[Name], Product[SKU], filtered for a single Sales Order in the Header table, I get 4 lines in the table (the 4 products for that order).
    If I drag SalesHeader[Amount] as an implicit measure (Sum of SalesHeader[Amount]) into the Table visual, I get the expected behavior. 4 rows, repeating the same Amount value in each row.
    If however, I use a measure Total Sales = SUM(SalesHeader[Amount]), the table returns a row for every single Product SKU.
    I can fix the behavior with CROSSFILTER():
    Total Sales Fixed = CALCULATE(SUM(SalesHeader[Amount]), CROSSFILTER(SalesHeader[OrderID], SalesDetail[OrderID],BOTH))
    BUT, what I don't understand is, why does the implicit measure deliver 4 rows naturally, but the explicit measure requires CROSSFILTER() to behave the same way?
    Do you have a video explaining this?

    • @SQLBI
      @SQLBI  2 ปีที่แล้ว +1

      When you use the implicit measure, Power BI creates an additional measure COUNTROWS(Sales) - just in the query and not shown in the results - just to reduce the rows in the visualization. The presence of an explicit measure suggests to Power BI that there is already a business logic to reduce the rows to the relevant ones and there are no additional hidden measures in the query. You can analyze the different queries generated by using Performance Analyzer and running them in DAX Studio (search for related videos about them - there are many - we don't have videos about this specific behavior).

  • @ofbirkinsandmen6025
    @ofbirkinsandmen6025 3 ปีที่แล้ว

    what if there is no back end to create the fact table in the star schema? Can I just do a left outer join and use that table?

  • @solangefranco3444
    @solangefranco3444 ปีที่แล้ว +1

    Hi Alberto! I always learn a lot from your videos and the explanations.
    But currently I have been facing a huge challenge, and it has been hard to find a better way or design in Power BI. I am working with Data set, and The goal is = creating a full Data set to enable Final users to create their own reports. The Data set must contain header and details data from each step of procuremente, starting from Budget, Purchase request and lines, then the related Request for Quotation, and other specific details. And at the same time the goal is that Data model must to be easy to use for creating reports by inexpert pbi final users. Do you have a video or advice to face a model like that? Thanks

    • @SQLBI
      @SQLBI  ปีที่แล้ว +1

      Start with the Introduction to Data Modeling, and then move forward with the full Data Modeling class - that is the first step to create a model that is easy to use!
      www.sqlbi.com/p/introduction-to-data-modeling-for-power-bi-video-course/
      www.sqlbi.com/p/data-modeling-for-power-bi-video-course/

  • @marilenadonofrio9787
    @marilenadonofrio9787 3 ปีที่แล้ว +1

    Kudos, well done!

  • @andresgangas9532
    @andresgangas9532 3 ปีที่แล้ว

    Really clear explanation. But could you please explain or give some tips on what happen with the dimensions tables, and if it is necessary to "normalize" its. For instance, the product table could have a hierarchy schema like groups of materials --> lines of materials --> material. It is necessary to create each one of these levels in different tables? in only one table denormalized? or depends on what we are going to calculate/filter?
    Thanks a lot for all your videos

    • @SQLBI
      @SQLBI  3 ปีที่แล้ว

      In the star schema you should denormalize all the attributes of a business entity (like products) into a single table. Hierarchies are formed by different columns of that dimension. If you normalize the product table in multiple tables, you have a snowflake schema, which works in Power BI but it's not a best practice (you don't get the best performance and you might have issues in certain DAX expressions).

  • @shekharkumardas
    @shekharkumardas 3 ปีที่แล้ว +1

    Excellent

  • @Linger92
    @Linger92 3 ปีที่แล้ว

    It would be interesting to see the performance in the header detail version with SUMX ( DISTINCT ( SalesHeader[CustomerKey] ), 1 ).
    I had an issue with DISTINCTCOUNT in a bi-directional relationship scenario only recently and that solved it for me in this case (because I read it in one of your articles of course).

    • @SQLBI
      @SQLBI  3 ปีที่แล้ว

      It depends on the number of customers involved. Up to a few thousands is manageable, if it goes to millions it's going to be slower. It also depends on how you use that measure in the report - a matrix repeats the same calculation several times, when you do that for a small number the SUMX could be an advantage, but as usual... it depends!

  • @craigdubin6325
    @craigdubin6325 2 ปีที่แล้ว

    Toni, great video! All my facts in our business are designed this way. I was wondering what your advice would be in a sales star schema where, the line items aren't all on the same grain. so instead of solely product key in the line items, you have some lines with products and other lines with labor? I could do two different sales tables (fact_sale_products and fact_sales_labor). But the problem with this is that one sales header has both products and labor as part of the sale.

    • @ikar2k
      @ikar2k ปีที่แล้ว

      Sorry, but our sensei's name is Alberto. Toni is a brother in arms 🙏👍

  • @kamilkalecinski4115
    @kamilkalecinski4115 2 ปีที่แล้ว

    This is a struggle for my model too: heavy relationships at the lowest fact granularity. However, we have a business case i.e. Our users want to use attributes from Sales to slice Orders and other way round. So, the only solution we have introduced so far is to always split fact tables i.e. split Sales into "Sales" and "Sales Attributes" so that "Sales Attributes" become a dimension table that can slice both "Sales" and "Orders". And then we have "Orders Attributes" that can slice back "Sales". Is there any better solution for this to avoid these heavy relationships, but still follow basic star schema basics?

  • @mshparber
    @mshparber 3 ปีที่แล้ว

    Many times I need to calculate columns and measures at the Header level, for example: AVERAGEX(SalesHeader, ‘SalesHeader’[OrderAmount])
    Are you saying it is better to do this instead:
    AVERAGEX(VALUES(Sales[SalesOrder]),CALCULATE(SUM(Sales[LineAmount])))
    ?

    • @SQLBI
      @SQLBI  3 ปีที่แล้ว +1

      The first approach is faster at query time because it precalculates the order amount. However, if you have 100K orders, you might not notice the performance difference. If you have tens of millions of orders, you have to evaluate whether the additional cost for a second fact table makes sense for the performance advantage. However, also in that case you should create two fact tables related to shared dimensions instead of creating a relationship between the two fact tables.

  • @porraloka31
    @porraloka31 3 ปีที่แล้ว

    Alberto, when we need information about how many New customers we have coming from a table with The date that that client was registred and connect with a table with transactions with a different set of dates (date of transaction) are we obliged to have two calendars? Because everytime we need to filter The date to see New customers we might filter, wrongly, The transaction table as well, right?

  • @smartdata678
    @smartdata678 3 ปีที่แล้ว +3

    hahahahahaha, it is not, no, not no, no no no, nooooooo.
    I love your videos!

  • @IN2FPA
    @IN2FPA 3 ปีที่แล้ว

    Given that distinctcount of one column on a large table takes longer than distinctcount on a smaller column, would it make sense to create a dimension that contains the order header excluding the customer + date columns?
    In this case you would still have one fact table but now joined to product, customer, date and order header in a star schema.
    PS: Wondering why the distinccount of a large table on a single column is slower - does tabular not store only the distinct values for each column ... ?

    • @SQLBI
      @SQLBI  3 ปีที่แล้ว +1

      The compression is by column, so having a separate table wouldn't help. It's not that DISTINCTCOUNT is slow, it's that the other aggregations are super-fast. DISTINCTCOUNT is computationally more expensive by its nature.

  • @sebastiendebosscher
    @sebastiendebosscher 3 ปีที่แล้ว

    What to do when you have two sales tables from different ERP systems (different columns) and you want to slice by brand once, to see total sales of both ERP systems?
    I was thinking of having two sales tables with each respectively having one related item table and on top of both item tables a third (simplified) huge item table that slices both item tables.
    This would violate star schema, but how else can I solve this?

    • @SQLBI
      @SQLBI  3 ปีที่แล้ว +1

      You can append the rows from two tables into the same table, or create two independent fact tables. It depends on many details, but in none of these cases you would create a relationship between the two fact tables if you keep them separate.

  • @Narses3
    @Narses3 3 ปีที่แล้ว

    Obviously not great practice, but would performance be any better with a sales order bridge table? I guess probably not?

    • @SQLBI
      @SQLBI  3 ปีที่แล้ว +1

      Do you mean a bridge table between Header and Details? If that is the question, the answer is no.

  • @jjpp1970
    @jjpp1970 3 ปีที่แล้ว

    Alberto,
    What application did you use to make the table "Sales" by joining the tables "SalesHeader" and "SalesDetail"?
    I mean, for cases with that big amount of rows, what good choices do we have when using Power BI Desktop?
    Dax, Power Query, Any other?
    Thank you.
    Jose Padilla

    • @SQLBI
      @SQLBI  3 ปีที่แล้ว +2

      SQL is usually a good idea. We used a view in SQL Server.

    • @jjpp1970
      @jjpp1970 3 ปีที่แล้ว

      @@SQLBI Thank you.

  • @NicholasEnslow
    @NicholasEnslow 3 ปีที่แล้ว

    Can we get a video of taking a trash, 250+ column wide de-normalized model and normalizing it? That would be great for people who are dealing with Report output and having to turn it into gold.

    • @SQLBI
      @SQLBI  3 ปีที่แล้ว

      Look at these courses:
      www.sqlbi.com/p/introduction-to-data-modeling-for-power-bi-video-course/
      www.sqlbi.com/p/data-modeling-for-power-bi-video-course/
      However, they are not about the transformation itself, that can be performed with many different tools (SQL, Power Query, other tools... look for ETL tools around).

  • @Phoenixspin
    @Phoenixspin 3 ปีที่แล้ว

    "I'm sorry, that's on you."

  • @zentiumx
    @zentiumx 8 หลายเดือนก่อน

    I tend to disagree a bit with you. Your discussion is great and the topic is well covered, but one thing you need to take into consideration: when using one or the other schema, stick to the schema's best practices and use cases. Using a star schema's development style in a HD schema will of course yield worse results. In H/D, sometimes it is better to denormalize. Example, as an invoice 'HD schema' do not change over time, how about denormalizing the sum of the details and store it in the header? Welcome triggers. In a write/once read/multiple, pre-calculation is godsend, as we tend to emphasize the reading time. Now redo the math!
    In addition, when you are summing while grouping two or more tables, as in your products and customers grouping example, do the sum and group directly in the header table, linking only to the dimension tables only once you have the results. This is where writing stored procedures comes handy as they are abke to optimize such queries where DAX might not be able to.
    To conclude, when you want to compare efficiency, it would be better to do so using the best practices for each model, and not using the same practice for both models.
    Finally, you also need to compare the other pros/cons, such as inserting, deleting, updating data.
    Example, you said that one wouldn't require much fetching distinct records, a statement that is not quite correct. When you need to display your orders, which is done very frequently in production environments, the star approach will have to always prove to be very slow.
    I personally use a mixture of both, by denormalizing my HD architecture where I need the 'star' performance and cut on high 'waiting lists'.

  • @Wzxxx
    @Wzxxx ปีที่แล้ว

    Well, but that if I have a shipment cost columns at order header level? I cannot just allocate this for order line as it there would be duplicated values and at the end the sum of shipment will be wrong

    • @SQLBI
      @SQLBI  ปีที่แล้ว

      In that case you keep two fact tables, one for each granularity, without a relationship between the two fact tables. This is covered in our Data Modeling training (also watch the free Introducing Data modeling: www.sqlbi.com/p/introduction-to-data-modeling-for-power-bi-video-course/ )

    • @Wzxxx
      @Wzxxx ปีที่แล้ว

      @@SQLBI Thank You for the answer, i thought about this solution but I in this case I will not be able to filter the shipment cost with products dim

    • @SQLBI
      @SQLBI  ปีที่แล้ว +1

      You must allocate the cost if you want to see by product, no?

    • @Wzxxx
      @Wzxxx ปีที่แล้ว +1

      I would have to. I don’t understand one concept. If i keep both tables and connect them with a dim table(distinct order number) as without it i has no point to work i still need to iterate by same number of order numbers. On the other hand I need to use crossfilter to get shipment cost by product going from order items to sales through order number table. Isn’t it the as connecting both fact tables directly? All examples are always very simple and far from real cases. Why i never see dim table with order number to filter data from many fact tables.