Building a robust data pipeline with dbt, Airflow, and Great Expectations

  • Published on 5 Jan 2025

Comments • 14

  • @mahmoudnabil3667
    @mahmoudnabil3667 2 years ago +1

    Appreciate your simple explanations, thanks!

  • @kshitijpathak8646
    @kshitijpathak8646 2 years ago +6

    Great explanations! One quick question: in dbt we can test/profile our data during the 'T' (Transformation) step and also run tests on source data. How is Great Expectations different from the features already built into dbt?

    • @personalbranddata
      @personalbranddata 1 year ago +8

      Frankly, it's unnecessary to incorporate Great Expectations into the stack. You could indeed just replace it with dbt tests and be done.
      There are just a few things to consider when choosing between the two:
      - Great Expectations tests are written in Python while dbt tests are written (mostly) in SQL. You might find a particular test easier to implement one way or the other. Personally, I think dbt tests cover everything you need in 99% of cases.
      - Great Expectations provides more kinds of tests out of the box. To counter this point, though, the dbt-expectations package provides out-of-the-box tests as well.
      - [This one is huge!] Great Expectations requires you to load data from the database into your Python processing environment while dbt tests work within the database. So Great Expectations tests tend to be more expensive and also take longer to complete than equivalent dbt tests.
      - Great Expectations produces data quality reports. Perhaps they are important to your organisation.
      I'm sure there are arguments for using Great Expectations as well. You certainly could use both, as the talk suggests. I just couldn't think of a good use case. The talk didn't convince me, but in its defense it's been over 2 years. Perhaps dbt tests and the dbt-expectations package had fewer features at that time. Or perhaps they had a legacy codebase with Great Expectations tests before they adopted dbt and didn't want to re-implement them.
      In any case, my recommendation would be to ignore this advice and just go with dbt tests in most cases.
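
The cost difference described in the third point above can be sketched roughly as follows. This is not dbt or Great Expectations code. It is a minimal illustration using only Python's standard `sqlite3` module, and the table, column, and function names (`orders`, `customer_id`, `null_count_in_database`, `null_count_in_python`) are hypothetical. The first check runs entirely inside the database and returns a single number; the second pulls every value into Python before checking it.

```python
import sqlite3


def build_demo_connection() -> sqlite3.Connection:
    """Create an in-memory database with a small, hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 10), (2, None), (3, 12), (4, 10)],
    )
    return conn


def null_count_in_database(conn: sqlite3.Connection, table: str, column: str) -> int:
    """Push the check down to the database: only one number comes back."""
    # Identifiers cannot be bound as query parameters, so this sketch
    # assumes trusted table and column names.
    query = f'SELECT COUNT(*) FROM "{table}" WHERE "{column}" IS NULL'
    return conn.execute(query).fetchone()[0]


def null_count_in_python(conn: sqlite3.Connection, table: str, column: str) -> int:
    """Pull every value into Python first, then check it in memory."""
    rows = conn.execute(f'SELECT "{column}" FROM "{table}"').fetchall()
    return sum(1 for (value,) in rows if value is None)


if __name__ == "__main__":
    conn = build_demo_connection()
    # Both approaches agree on the result; they differ in how much data moves.
    print(null_count_in_database(conn, "orders", "customer_id"))
    print(null_count_in_python(conn, "orders", "customer_id"))
```

On a four-row table the two are indistinguishable, but the second approach transfers one value per row, so the gap grows with table size.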

    • @kshitijpathak8646
      @kshitijpathak8646 1 year ago

      @personalbranddata Thank you!

  • @brads2041
    @brads2041 1 year ago +1

    Testing raw source data can be an interesting challenge at times, because you may have to apply some form of transform to get it into a testable state.

  • @johnfordice5763
    @johnfordice5763 3 years ago +3

    Would love to see a repo for this!

    • @dbt-labs
      @dbt-labs 3 years ago +13

      A bit late, but we got you: github.com/spbail/dag-stack

  • @FirstNameLastName-fv4eu
    @FirstNameLastName-fv4eu 5 months ago

    This is like making a small problem very big, then creating a giant open source project to solve it!

  • @willianrocha8615
    @willianrocha8615 3 years ago +1

    Code? Repo?

  • @smileysuvarna5889
    @smileysuvarna5889 2 years ago

    How do I create a dbt pipeline?

  • @naveenn3143
    @naveenn3143 3 years ago

    Is there a repo to refer to?

  • @JanekBogucki
    @JanekBogucki 3 years ago

    Nice intro.

  • @echo2net
    @echo2net 3 years ago

    No repo?