How to Create a Data Modeling Pipeline (3 Layer Approach)

แชร์
ฝัง
  • เผยแพร่เมื่อ 25 ก.ค. 2024
  • A data warehouse acts as the main hub for most data teams, yet it often becomes a mess.
    While there are many different strategies to handle this, in this video I want to share the approach I follow.
    It's based around a simple 3-layered design to take raw source data into meaningful data marts ready for analytics.
    Enjoy!
    ►► The Starter Guide for Modern Data (Free PDF)→ bit.ly/starter-mds
    Simplify “modern” architectures + better understand common tools & components
    Timestamps:
    0:00 - Intro
    0:28 - High Level Overview
    1:04 - Staging Layer
    1:37 - Warehouse Layer
    1:54 - Marts Layer
    3:13 - Example: Staging
    6:05 - Example: Warehouse
    7:35 - Example: Marts
    8:23 - Importance of Modeling
    Title & Tags:
    [How to Create a Data Modeling Pipeline (3 Layer Approach)
    #kahandatasolutions #dataengineering #datamodeling

ความคิดเห็น • 25

  • @KahanDataSolutions
    @KahanDataSolutions  2 หลายเดือนก่อน +1

    ►► The Starter Guide for Modern Data → bit.ly/starter-mds
    Simplify “modern” architectures + better understand common tools & components

  • @runningwithstelvio
    @runningwithstelvio 2 หลายเดือนก่อน +4

    This is a three layer approach that I like. The weakness I believe is the integration and data loading performance of a star schema, which is more complex then a more normalized model like the Data Vault for example. With the flat wide DM tables in any case you ensure performance in extracting the data from the DWH, so why don't use a normalized model instead of a star schema, to ensure loading performance and agile integration of the model, and then use a star schema or flat tables in the the DMs to ensure performance in extracting the data? Thanks for your video, I like the format and the way you present these data model topics. And luckily we have still someone who strongly believe in data modelling!

    • @KahanDataSolutions
      @KahanDataSolutions  2 หลายเดือนก่อน +2

      Thanks for the comment! Star Schema/Dimensional Modeling is just one of many approaches you could take in that middle "warehouse" layer. It's personally what I'm most comfortable with but you could certainly use other techniques.

  • @minhtungo453
    @minhtungo453 2 หลายเดือนก่อน +1

    Straight to the point, with clear examples. It does help me alot. Thanks Kahan

  • @AdamSmith-lg2vn
    @AdamSmith-lg2vn 2 หลายเดือนก่อน +2

    Really really clearly explained. I like the idea of using separating out a staging view for trivial rename/cleaning vs. business logic in the warehouse layer loads. I push for a very similar architecture but I'm going to integrate that detail going forward.

  • @ahmedsomir
    @ahmedsomir 2 หลายเดือนก่อน

    Simple as great thanks for this, but from my point of view, the staging layer is the raw data tables (incremental one-one from sources) and the enforced reading mechanize with VIEW is perfect with a retention for staging after 2-3 days.
    what do u think?

  • @vishal_uk
    @vishal_uk 2 หลายเดือนก่อน +1

    Thanks a lot! I watch all your vids and I've subscribed. Could you share the layers for DEV and CI, please?

    • @KahanDataSolutions
      @KahanDataSolutions  2 หลายเดือนก่อน +1

      Check out this video - th-cam.com/video/a7H9Rkz2kqo/w-d-xo.htmlsi=e-qIKGRHuo34dvwu

    • @vishal_uk
      @vishal_uk 2 หลายเดือนก่อน

      @@KahanDataSolutions Thanks a bunch!

  • @alxsbn
    @alxsbn 2 หลายเดือนก่อน

    Hi @KahanDataSolution. How did you merge this organization with dbt folder structure recommandations especially for the intermediate and marts layers/folders. Do you advise to create a new folder named warehouse? Did you advise not letting end users having access to facts and dimensions? you're speaking about OBT, is this the way to go?

    • @KahanDataSolutions
      @KahanDataSolutions  2 หลายเดือนก่อน +1

      I tend to break out warehouse as a separate directory. In dbt it'll look something like this:
      models/
      marts
      staging
      warehouse
      For intermediate, these are more like "helper" models. So you could organize them in a separate directory right under models/ or nest it under the particular layer it's supporting. For example:
      models/
      marts
      staging
      warehouse
      intermediate
      As mentioned in the video, I prefer to keep end-user access limited to Marts but some organizations allow direct fact/dimension access. It really depends on the user base, their understanding & familiarity with those concepts.
      Overall, there is no single "right" approach but will be a mixture based on your particular company. But this 3 layer design has worked well for me and others as a great starting point.
      For more on the different "warehouse" models, check out this video - th-cam.com/video/IdCmMkQLvGA/w-d-xo.htmlsi=ZyfIVFSII7qznVBI
      Hope this helps!

  • @reiko551
    @reiko551 2 หลายเดือนก่อน

    This is a idempotent pipeline. Can you do incremental one via dbt?

  • @dmeeuwsen4105
    @dmeeuwsen4105 2 หลายเดือนก่อน

    So where in this example would you implement the dbt snapshotting?
    At the client im working for atm we have the raw source data coming in, in the landing schema. Then in the staging schema the history is build (dbt snapshot, with a select * from landing.table).
    The snapshotted tables are then used in the models, where (among other things ofc) we rename columns and clean some data.

    • @KahanDataSolutions
      @KahanDataSolutions  2 หลายเดือนก่อน

      I'd probably look to do something like this:
      source > snapshot > staging > warehouse > mart

  • @sanjidnet
    @sanjidnet 2 หลายเดือนก่อน +3

    What’s the benefit of setting up staging as view instead of a regular table?

    • @KahanDataSolutions
      @KahanDataSolutions  2 หลายเดือนก่อน +3

      To avoid duplicate storage since typically there are no other joins involved. Think of it like a glorified select statement. However, if performance becomes an issue you may look to deploy as a table.

    • @andynelson2340
      @andynelson2340 2 หลายเดือนก่อน

      Views will always grab the latest data upstream, they'll always be fresh relative to the source data.

  • @StuartWeir
    @StuartWeir 2 หลายเดือนก่อน +1

    Where do you feel Entity Resolution fits into this? For example, I have multiple data sources pertaining to the same data type; does ER fit between staging and the DW in this 3 Layered Approach?

    • @KahanDataSolutions
      @KahanDataSolutions  2 หลายเดือนก่อน

      Sounds like that would be handled in the Warehouse layer. Each table in the different sources would have its own Staging view, then combined in the Warehouse to create the single entity.

  • @chiragpramod308
    @chiragpramod308 2 หลายเดือนก่อน +1

    Thanks alot for Practical pov !!
    Do you still suggest reading Kimball or are there better and more modern & Practical books on modelling
    Love your content man

    • @KahanDataSolutions
      @KahanDataSolutions  2 หลายเดือนก่อน +1

      I still like to follow a star schema for the core "warehouse" model for clarity & organizational purposes. Then create wider user-facing "marts" on top of those to leverage the capabilities of more modern cloud DBs.

    • @chiragpramod308
      @chiragpramod308 2 หลายเดือนก่อน

      ​@@KahanDataSolutionsThanks for your insight.
      I heard about Lakehouse modelling being a Hype but It only happens on Larger companies. So is it important to get thoroughly in touch with the basics before heading towards such Complex Topics?

    • @KahanDataSolutions
      @KahanDataSolutions  2 หลายเดือนก่อน

      @@chiragpramod308 Learning the basics before moving to complex topics is always a good strategy.