Understand Slowly Changing Dimensions

แชร์
ฝัง
  • เผยแพร่เมื่อ 1 ก.พ. 2025

ความคิดเห็น • 57

  • @sheldonwayne1390
    @sheldonwayne1390 3 ปีที่แล้ว +3

    I'm from China and it's frustrating that you can't find any video doing a good job explaining knowledges about DWH, glad I found this on TH-cam. Cheers!

  • @kopaczp
    @kopaczp 2 หลายเดือนก่อน

    Great explanation! This channel deserves for more views !

  • @sandrafield9813
    @sandrafield9813 3 ปีที่แล้ว +1

    Thanks for this description! Its better than all the other ones I found. Its more clear with extra details without being convoluted and messy.

  • @macellit
    @macellit 4 ปีที่แล้ว +5

    Excellent content and very well explained. Thank you

  • @maisamabbas6872
    @maisamabbas6872 11 หลายเดือนก่อน

    This is the best explaination for slowly changing dimensions that I've come across on TH-cam. You definitely deserve my like and subscribe. Thanks for such an amazing video. I hope you make a lot more videos on Data Engineering topics.

  • @MuhammadMustafa-rd6vg
    @MuhammadMustafa-rd6vg 2 ปีที่แล้ว

    this is a good tutorial , watched dimensional modelling & SCD tutorial, you were pretty clear , explained in detail without making it boring !

  • @houstonfirefox
    @houstonfirefox ปีที่แล้ว

    Great explanation Bryan w/real-world examples and analogies. New sub!

  • @Pasdpawn
    @Pasdpawn ปีที่แล้ว

    man I spent a fortune to get an IT degree and here i am , learning it from a youtube tutorial which is far better and easier to understand than my database lectures. thanks a lot sir for your effort.

  • @thoroth_
    @thoroth_ 6 หลายเดือนก่อน

    This is the best explanation of this concept that I’ve seen, thank you so much

  • @cluebcke
    @cluebcke 3 ปีที่แล้ว +6

    This is really awesome, thank you so much. Just to add, one use case for Type 5 (which I had no idea was called Type 5 before watching this) is that if you have a lot of data in a record that doesn't change much, and only a few fields that change a lot, partitioning them into different dimensions can substantially reduce table bloat (as you're not creating many copies of unchanged data) and improve query performance. There are tradeoffs of course and you really have to evaluate (and test empirically) for your use case.

  • @shivam6565
    @shivam6565 2 ปีที่แล้ว

    Thank you Bryan for this wonderful tutorial. Concept lucidly explained and I like your teaching style. Liked and subscribed.

  • @guillermopalaciosgaray
    @guillermopalaciosgaray 2 ปีที่แล้ว

    Thanks for your explanation! This is the best video I've found :)

  • @sauravmisra6061
    @sauravmisra6061 2 ปีที่แล้ว

    Thank you sir for explaining the code EDW concepts in both the videos - Dimensional modeling and SCD, so well and precisely. I also followed your Databricks and Spark and these are the best I have found so far on these topics. Thank you , sir and really appreciate your effort.

    • @BryanCafferky
      @BryanCafferky  2 ปีที่แล้ว

      Glad it is helpful. Thanks for your kind words.

  • @batmanbuddha
    @batmanbuddha 4 ปีที่แล้ว +1

    Awesome Explanation, even sharing to my friend too,

    • @BryanCafferky
      @BryanCafferky  4 ปีที่แล้ว +1

      Thanks. Yeah. The more that can be helped the better.

  • @bestherk4522
    @bestherk4522 3 ปีที่แล้ว +1

    Good video, Clear Explanations. Thank you.

  • @I677000
    @I677000 3 ปีที่แล้ว +1

    Awesome, :) hop to get this question asked on my interview :) now I know how to answer it:)

  • @dataofbitcoin6685
    @dataofbitcoin6685 2 ปีที่แล้ว

    Amazing video, again! The best in biz.

  • @kamilkarczmarczyk9445
    @kamilkarczmarczyk9445 หลายเดือนก่อน

    posted 4 years ago, still great content Bryan!

  • @marcosoliveira8731
    @marcosoliveira8731 4 ปีที่แล้ว

    Topics like this still so relevant.
    I hope you have more to share about DW Architecture.

  • @helovesdata8483
    @helovesdata8483 2 ปีที่แล้ว

    you've helped me to understand data warehouses more

  • @fatallny
    @fatallny 2 ปีที่แล้ว +1

    Awesome video, now everything is clear

  • @JimRohn-u8c
    @JimRohn-u8c 3 ปีที่แล้ว +2

    Thank you so much for this! I’d love to see more videos on this.
    Btw, how does this work for Snowflake Schemas? Are those schemas only used in OLTP Databases?

    • @BryanCafferky
      @BryanCafferky  3 ปีที่แล้ว +1

      Snowflake schemas are to be avoided if at all possible. I think the product Snowflake is causing confusion about this due to its name,

    • @SwapperTheFirst
      @SwapperTheFirst 3 ปีที่แล้ว +1

      @@BryanCafferky I guess these to be avoided only if (1) compute and storage are expensive (2) compute and storage are slow (3) you don't have MPP or vectorized calculations. In current environment, compute and storage cost much less than data engineer and data architect time, so data modeling approaches changed.
      What was true for Teradata or Netezza in 1999 is not true for Snowflake or Big Query in 2022.
      On the topic of SCD, you can find the video of Airflow original creator, where he asserts, that all these approaches with SCD are just thing of the past, and now we should create daily or weekly full snapshots of dimensional tables. I don't fully understand his approach (for example, Kimball Type 2 for me sounds very intuitive), but I don't fully grasp how to solve the SCD reporting problem, if you have (say) daily snapshots of all your dimensional tables.

    • @BryanCafferky
      @BryanCafferky  3 ปีที่แล้ว +1

      @@SwapperTheFirst SCD concepts I explained are based on Kimball. It's more about the functionality required rather than the technology used. Airflow has nothing to do with this subject so not show how the Airflow creator's comments apply. You should choose the best approach for your needs. Thanks

  • @Jel.Awesh.M
    @Jel.Awesh.M 2 ปีที่แล้ว

    Thank you for sharing! A very good resource for SCDs!

  • @rickstratton1714
    @rickstratton1714 4 ปีที่แล้ว +1

    Good video, clear explanations and visual examples. Have to wonder what kind of cocktail parties you attend! :)

    • @BryanCafferky
      @BryanCafferky  4 ปีที่แล้ว

      Hmmm... Come to think of it, I was not invited back. :-) Thanks

  • @a29miller
    @a29miller 3 ปีที่แล้ว

    Well done. I really appreciate the detail

  • @AC-hh2cb
    @AC-hh2cb 3 ปีที่แล้ว +1

    Great explanation, will help me at work :)

  • @usman_oz
    @usman_oz 3 ปีที่แล้ว

    Thanks for the explanation. I just wanted to check if the "Dim customer" table at around 13:20 playtime shows the correct Start & End dates. Shouldn't the "Start Date" be +1 day of the previous record's End Date ?

  • @zerovine8025
    @zerovine8025 ปีที่แล้ว

    Thanks for the easy to understand explanation, man. (btw nice mustache)

  • @javedabdool1187
    @javedabdool1187 ปีที่แล้ว +1

    great video boss. I have 2 questions though maybe a lil stupid :P if eveer i missed something:
    should we put the source key in the fact table?if not then why bring the source key then?
    do fact table need a fact id?

    • @BryanCafferky
      @BryanCafferky  ปีที่แล้ว +1

      Thanks. The source dimension key like ProductKey should be stored in the dim table and the dim surrogate key (created by the BI ETL) is the Pk to teh Dim table and FK to the Fact table. You don't need a fact table key per se but SQL Tables perform much better with a primary key(pk) so an Identity column suffices. See my video on dim modeling th-cam.com/video/lWPiSZf7-uQ/w-d-xo.html

    • @javedabdool1187
      @javedabdool1187 ปีที่แล้ว

      @@BryanCafferky cools thnx!!

  • @anirudhvyas6069
    @anirudhvyas6069 ปีที่แล้ว

    nicely presented thank you!

  • @marcosoliveira8731
    @marcosoliveira8731 4 ปีที่แล้ว +1

    THANK YOU! Real good explanation!
    btw: Nice R mug :)

  • @canyurt1
    @canyurt1 3 ปีที่แล้ว +1

    Hey Bryan - I think Type 4 resembles Data Vault initiatives. Would you not agree?

    • @BryanCafferky
      @BryanCafferky  3 ปีที่แล้ว

      I have not studied Data Vault modeling in depth but from the wiki on the subject, I would be careful equating concepts of the two methods as Data Vault modeling does not cleanse and optimize for querying per this link en.wikipedia.org/wiki/Data_vault_modeling

  • @gokukanishka
    @gokukanishka 3 ปีที่แล้ว +1

    Hey Bryan , can you please make videos on fast changing dimensions and how to handle them

    • @houstonfirefox
      @houstonfirefox ปีที่แล้ว

      Apparently Gender went from a static dimension to a fast changing dimension 🤣

  • @elinadiary9357
    @elinadiary9357 7 หลายเดือนก่อน

    Thank you sir

  • @JimRohn-u8c
    @JimRohn-u8c 3 ปีที่แล้ว +2

    Also why would a “NULL” be better than “9999-12-31” etc. ?

    • @BryanCafferky
      @BryanCafferky  3 ปีที่แล้ว +2

      a NULL means unknown whereas '9999-12-31' is a value. If you filter where date > 2022-01-01, the later would return TRUE but usually a NULL compare would not return TRUE. Forcing vlaues is an option but technically inaccurate with repercussions. Bear in mind the 9999-12-31 would appear in groupings on reports, etc. A text value of 'NULL' is just a glitch of the extract from a database and not a true NULL.

  • @LAPOVETS
    @LAPOVETS 2 ปีที่แล้ว

    thank you!

  • @nameunknown007
    @nameunknown007 3 ปีที่แล้ว +3

    Writes a book about Microsoft and sells it on Amazon like a legend 👍

  • @marcjkeppler3590
    @marcjkeppler3590 5 หลายเดือนก่อน

    How does this video have fewer than 1k likes? 😅