Is the Kimball Data Warehouse Still Relevant in 2023?

  • Published on 3 Nov 2024

Comments • 11

  • @TheR0yalBeast
    @TheR0yalBeast 1 year ago +4

    If you only need to query your data (i.e., very specific requests or one-off questions), then you don't need it. If you need to provide BI solutions to business users, then it's still completely relevant and will remain so indefinitely, until natural language is able to generate BI seamlessly.

  • @kalheer
    @kalheer 1 year ago +4

    I think there will be an element of modelling needed in the Gold layer, like you said. But with what's available out there to plug into data lakes and give people the data they need quickly, up to date, and self-serve, times might be changing.

    • @PGVladimirovich
      @PGVladimirovich 1 year ago

      Data lakes without an intermediate DW layer to serve a large data model that end users can slice and dice in the BI layer with low latency are a huge pain in the ass in practice. The lake is nice for accessibility, DE, DS, and ad-hoc queries, but I don't see the DW going anywhere as part of the full analytics stack (specifically for larger datasets and models in BI) for some time.

    • @dlb8685
      @dlb8685 1 year ago

      I don't think most companies will stand for an older access model where only a few BI developers have any access to raw data. There will be some need to give users (at least power users or data scientists) access to a data lake or upstream raw data. Otherwise, a data team with 5% or less of a company's employees can simply never keep up with what the rest of the company is doing.
      However, the data team should always be looking for the most critical business processes and reports and modeling those out with a consistent, proven standard like Kimball's dimensional modeling. Good monitoring and communication are important to do that effectively. It's not a perfect solution, but I've found the old-school alternative of completely walling off your raw data/data lake/whatever and only giving access to modeled data to be politically and organizationally impossible to enforce in a 2020s environment.

  • @dlb8685
    @dlb8685 1 year ago +1

    I don't believe there is a huge conflict between a data lake and a Kimball dimensional model. The data lake is an upstream data storage method: you can use one to store all of your raw data and still build a dimensional model downstream for your most important business processes, as mentioned towards the end of this video.
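
A minimal sketch of the "lake upstream, dimensional model downstream" idea described in the comment above. The record layout, the table names (`dim_customer`, `dim_product`, `fact_sales`), and the helper `build_dimension` are all hypothetical and chosen only for illustration; a real pipeline would likely use SQL or a dataframe library, and would handle changing attributes, which this sketch ignores.

```python
# Hypothetical raw records, as they might land unmodified in a data lake.
raw_orders = [
    {"order_id": 1, "customer": "Acme", "region": "EU", "product": "Widget", "qty": 3, "unit_price": 10.0},
    {"order_id": 2, "customer": "Acme", "region": "EU", "product": "Gadget", "qty": 1, "unit_price": 25.0},
    {"order_id": 3, "customer": "Globex", "region": "US", "product": "Widget", "qty": 2, "unit_price": 10.0},
]


def build_dimension(rows, natural_key, attributes):
    """Assign a surrogate key (sk) to each distinct natural key, keeping its attributes."""
    dim = {}
    for row in rows:
        nk = row[natural_key]
        if nk not in dim:
            dim[nk] = {"sk": len(dim) + 1, natural_key: nk, **{a: row[a] for a in attributes}}
    return dim


# Dimensions are derived downstream; the raw records are left untouched.
dim_customer = build_dimension(raw_orders, "customer", ["region"])
dim_product = build_dimension(raw_orders, "product", ["unit_price"])

# Fact rows carry only surrogate keys and additive measures.
fact_sales = [
    {
        "order_id": r["order_id"],
        "customer_sk": dim_customer[r["customer"]]["sk"],
        "product_sk": dim_product[r["product"]]["sk"],
        "qty": r["qty"],
        "amount": r["qty"] * r["unit_price"],
    }
    for r in raw_orders
]

total_amount = sum(f["amount"] for f in fact_sales)
```

Both layers coexist here: power users can still read `raw_orders` directly, while reports for the modeled business process read `fact_sales` and its dimensions.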

  • @JoeG2324
    @JoeG2324 1 year ago +1

    Nope, not needed. We use flat tables by subject area. We don't stuff everything into one table; instead, we create tables that are not too long but cover lots of dimensions, so you don't have to do too many joins.

    • @carltonseymour869
      @carltonseymour869 1 year ago

      Interesting, Joe. How long are these tables, and how broad are they? Approximately how many rows and how many columns? What BI tool are you using to gain insights from these tables?

    • @JoeG2324
      @JoeG2324 1 year ago

      @carltonseymour869 Anywhere between 15 and 20 columns and 12 million to 50 million records. We use a combination of SSRS, Tableau, and Power BI.

    • @IcyClench
      @IcyClench 7 months ago

      With a star schema, an arbitrary n-way join can be done in a single pass, like a 1-way join. The query optimizer can take the Cartesian product of the qualifying dimension PKs and probe the index on the fact table's composite PK a single time.
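
A toy illustration of the star-join strategy described in the comment above, assuming a fact table indexed by a composite key of dimension surrogate keys. All table contents and the function `star_join` are hypothetical, and a Python dict stands in for the index. Real optimizers choose among several strategies (hash joins, bitmap filters, and so on), so this is a sketch of the idea rather than of any particular engine.

```python
from itertools import product

# Hypothetical dimensions: surrogate key -> attributes.
dim_date = {1: {"year": 2022}, 2: {"year": 2023}}
dim_store = {10: {"region": "EU"}, 11: {"region": "US"}}
dim_product = {100: {"category": "toys"}, 101: {"category": "books"}}

# Fact table indexed by its composite key (date_sk, store_sk, product_sk).
fact_index = {
    (1, 10, 100): 5.0,
    (2, 10, 100): 7.0,
    (2, 10, 101): 3.0,
    (2, 11, 100): 4.0,
}


def star_join(date_pred, store_pred, product_pred):
    """Filter each dimension, cross the surviving keys, then probe the fact index."""
    # Step 1: filter each (small) dimension independently.
    date_keys = [k for k, v in dim_date.items() if date_pred(v)]
    store_keys = [k for k, v in dim_store.items() if store_pred(v)]
    product_keys = [k for k, v in dim_product.items() if product_pred(v)]
    # Step 2: take the Cartesian product of the surviving dimension keys.
    # Step 3: probe the composite-key index once per combination.
    return [
        (key, fact_index[key])
        for key in product(date_keys, store_keys, product_keys)
        if key in fact_index
    ]


rows = star_join(
    lambda d: d["year"] == 2023,
    lambda s: s["region"] == "EU",
    lambda p: True,
)
total = sum(amount for _, amount in rows)
```

The number of index probes grows with the product of the surviving dimension key counts, so this approach pays off mainly when the dimension filters are selective.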

  • @Milhouse77BS
    @Milhouse77BS 1 year ago

    Kimball Abides

  • @SamDawson-gv4rm
    @SamDawson-gv4rm 1 year ago

    interesting but not clear