Microsoft Fabric: How to append only Incremental data using Data Pipeline in Lakehouse

  • Published on Jul 30, 2024
  • This video explores how to run an incremental ETL (data load) that appends only new data using a Data Pipeline, without a SQL procedure. The Lakehouse SQL endpoint does not support writing data with SQL, so how do you append only the new data? Dataflow Gen 2 is not used in this video.
    00:00 Overview of Incremental ETL
    02:20 Create Data Pipeline with Append option
    08:20 Create Data Pipeline for Incremental Data Load
    16:30 Test the Data Pipeline
    Last_date (Lookup activity query) =
    select max([collectdate]) maxDate from [dbo].[incr_info]
    max_date (variable expression) =
    @activity('Last_date').output.firstRow.maxDate
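    For reference, here is a minimal PySpark sketch of the same append-only incremental load, written for a Fabric notebook rather than the Data Pipeline used in the video. The target table incr_info and the collectdate column come from the snippet above; the staging source table name is a placeholder.

    from pyspark.sql import functions as F

    # `spark` is the session a Fabric notebook provides out of the box
    target_table = "incr_info"                          # Lakehouse destination table
    source_df = spark.read.table("incr_info_staging")   # placeholder staging/source table

    # Same idea as the Last_date lookup: latest date already loaded into the target
    last_date = (
        spark.read.table(target_table)
        .agg(F.max("collectdate").alias("maxDate"))
        .collect()[0]["maxDate"]
    )

    # Keep only rows newer than the watermark (first run: target is empty, load everything)
    new_rows = source_df if last_date is None else source_df.filter(F.col("collectdate") > F.lit(last_date))

    # Append only the new rows, leaving existing rows untouched
    new_rows.write.mode("append").saveAsTable(target_table)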
    GIT File link - github.com/amitchandakpbi/pow...
    All files: github.com/amitchandakpbi/
    Power BI - Azure Free Tier, Create Azure Data Lake Gen 2 and connect that in Power BI: • Power BI: Connect with...
    Get Azure SQL DB and Configure it on Power BI: • Abstract Thesis 93: Po...
    Official Document: learn.microsoft.com/en-us/fab...
    Video from GuyinaCube: • Microsoft Fabric Launc...
    Buy a Microsoft Fabric subscription: learn.microsoft.com/en-us/fab...
    SKU: learn.microsoft.com/en-us/fab...
    Licenses: learn.microsoft.com/en-us/fab...
    Enable Fabric: learn.microsoft.com/en-us/fab...
    🔵 What is Microsoft Fabric
    Microsoft Fabric is an all-in-one analytics solution for enterprises that covers everything from data movement to data science, Real-Time Analytics, and business intelligence. It offers a comprehensive suite of services, including data lake, data engineering, and data integration, all in one place.
    With Fabric, you don't need to piece together different services from multiple vendors. Instead, you can enjoy a highly integrated, end-to-end, and easy-to-use product that is designed to simplify your analytics needs.
    How to enable Fabric, find details - learn.microsoft.com/en-us/fab...
    Microsoft Fabric license details - learn.microsoft.com/en-us/fab...
    Don't forget to like, comment, and subscribe for more great content!
    ▶️Please Subscribe: / @amitchandak
    ▶️Data:
    • Covid 19: covid19.who.int/WHO-COVID-19-...
    • Get The data and Pbix files on Github: github.com/amitchandakpbi/pow...
    • DropBox files: www.dropbox.com/sh/90ycq7hpmn...
    ---------
    ▶️Follow me on:
    • Medium: / amitchandak
    • Twitter: / amitchandak78
    • LinkedIn: / amitchandak78
    ---------
    ▶️My Other Videos:
    • Mastering Power BI: • Power BI for Beginners...
    • Expertise Power BI: • Expertise Power BI- Ho...
    ~-~~-~~~-~~-~
    Please watch: "Microsoft Power BI Tutorial For Beginners✨ | Power BI Full Course 2023 | Learn Power BI"
    • Microsoft Power BI Tut...
    ~-~~-~~~-~~-~
  • Science & Technology

Comments • 23

  • @brianmorante1621
    @brianmorante1621 11 months ago +1

    I have been looking for this. You explained this so I can easily understand! This helped my team. Thank you.

  • @Jhonhenrygomez1
    @Jhonhenrygomez1 21 days ago

    Hello, is there a way in Fabric to do incremental loading without a watermark, i.e. without relying on a date field? With a source like PostgreSQL, changes can be identified "automatically" if the tool consumes the WAL, so incremental loading needs less manual processing. I want to know whether Fabric supports this, because with the approach you described, a 200 GB table (for example) would take a long time to refresh and must have a date field to drive the incremental load.

  • @adefwebserver
    @adefwebserver 1 year ago +1

    Thanks! You are my "Go To" guy on this stuff :)

    • @AmitChandak
      @AmitChandak  1 year ago

      Thanks! 🙏
      Hope you are enjoying the series
      th-cam.com/video/p-v0I5S-ybs/w-d-xo.html
      Please Share
      Please find 370+ videos, blogs, and files in 70+ hours of content in the form of an organized course. Get Fabricated with Microsoft Fabric, Power BI, SQL, Power Query, DAX
      biworld.graphy.com/courses/Get-Fabricated-Learn-Microsoft-Fabric-Power-BI-SQL-Power-Query-DAX-Dataflow-Gen2-Data-Pipeline-from-Amit-Chandak-649506b9e4b06f333017b4f5

  • @directxrajesh
    @directxrajesh 4 months ago +3

    Since there is no upsert, how do we handle updates to existing data at the source?

  • @anushav3342
    @anushav3342 8 months ago +1

    Hi Amit! How do I work with REST API data to append incremental data into Fabric? Do I need to reset any of the steps, or do I follow the same procedure?

    • @AmitChandak
      @AmitChandak  8 months ago +1

      If you can pass filters to the REST API (between dates, or >= a date), then we can implement the same logic. Just let me know what you can pass.
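      A rough Python sketch of that idea, assuming the API accepts a date filter; the endpoint URL and the updated_after parameter are made up for illustration, so substitute whatever filter the API actually supports (between dates, or >= a date).

      import requests

      last_date = "2024-07-30"                      # e.g. max(collectdate) already in the target table
      resp = requests.get(
          "https://example.com/api/records",        # hypothetical endpoint
          params={"updated_after": last_date},      # hypothetical filter parameter
      )
      resp.raise_for_status()
      rows = resp.json()                            # expects a list of records

      # In a Fabric notebook, append only these new rows to the Lakehouse table
      new_df = spark.createDataFrame(rows)
      new_df.write.mode("append").saveAsTable("incr_info")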

  • @remek5758
    @remek5758 6 months ago

    Do you happen to know if mapping data flow will be available at some point in Fabric?

    • @AmitChandak
      @AmitChandak  6 months ago

      I will update you on this

  • @clvc699
    @clvc699 1 month ago +1

    Could you do this with files (parquet) in the lakehouse using incremental data?

    • @AmitChandak
      @AmitChandak  1 month ago +1

      For that, you need to use PySpark with append mode (sketched after this thread).

    • @heyrobined
      @heyrobined 29 days ago

      @@AmitChandak Please make a tutorial on that, because on-premises data can't be loaded directly into the workspace and requires external staging storage, so the only way is to load files, but there is no append option there. Can you show a way?
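      A minimal sketch of the append-mode approach from the reply above, assuming a Fabric notebook with a default lakehouse attached; the Files folder and table name are placeholders.

      # Load the new parquet files landed in the Lakehouse Files area...
      incoming = spark.read.parquet("Files/incoming/2024-07-30/")   # placeholder folder of new files

      # ...and append them to the Delta table without touching existing rows
      incoming.write.mode("append").saveAsTable("incr_info")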

  • @sansophany4952
    @sansophany4952 1 year ago +1

    Thanks for sharing, this is very helpful. I wonder, is it possible to do real-time data ingestion (a real-time pipeline) into a lakehouse or warehouse?

    • @AmitChandak
      @AmitChandak  1 year ago +1

      Yes, please explore streaming datasets, eventstreams & Kusto:
      learn.microsoft.com/en-us/fabric/real-time-analytics/

    • @sansophany4952
      @sansophany4952 1 year ago

      @@AmitChandak Thank you. will go through that.

  • @PrabhakaranManoharan2794
    @PrabhakaranManoharan2794 10 months ago +1

    Hi Amit. That was a great tutorial. Can we get a video on the same scenario when the data source is .csv/excel files rather than an SQL Server?

    • @AmitChandak
      @AmitChandak  10 months ago +1

      Thanks 🙏
      If the Excel or CSV files contain only incremental data next time, you can use the append functionality of the pipeline or Dataflow Gen 2.
      If they contain the full data, you can follow the same process, but the query will not fold at the source.

  • @christianharrington8145
    @christianharrington8145 1 year ago +1

    Thanks, great video! 🙂
    I wonder about the strategy when data can be modified. For example, if you load purchasing documents or any other documents, some attributes or measures already loaded into the warehouse might change. In that case, it's not only a matter of adding new records, but also of updating them. Since there is no concept of a unique primary key in Fabric so far (that might change though), I wonder how to achieve this?
    That reminds me of the data warehouse 101 old days, where we needed to reverse documents that had changed: say the original doc had qty 100, and now you load the same doc changed to qty 90, so you needed to add a record with qty -100 and another with qty +90.
    There might be some easier solution for sure. Any clue? 😊
    (And there are deletes as well!)
    Thanks!

    • @AmitChandak
      @AmitChandak  1 year ago +1

      Yes, a unique key is something for which I have already seen some ideas in place; it should be there soon. As of now, I have managed updates in the fact table using the source key.
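      For illustration, a hedged PySpark sketch of that update-by-source-key approach using a Delta merge; the table and key names are placeholders, not taken from the video.

      from delta.tables import DeltaTable

      updates_df = spark.read.table("purchasing_docs_staging")   # incoming changed/new documents
      target = DeltaTable.forName(spark, "purchasing_docs")      # existing fact table in the Lakehouse

      (
          target.alias("t")
          .merge(updates_df.alias("s"), "t.source_key = s.source_key")
          .whenMatchedUpdateAll()      # update documents that changed at the source
          .whenNotMatchedInsertAll()   # append brand-new documents
          .execute()
      )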

  • @saikrishnanimmagadda6469
    @saikrishnanimmagadda6469 10 months ago

    Hi Amit,
    I have been actively following your instructional videos on data ingestion via Data Pipeline, specifically from On-premises to Fabric. While attempting to implement the process, I consistently encounter the following error message. I am reaching out to seek your expert guidance in resolving this issue. Your insights and assistance would be greatly appreciated in helping me overcome this obstacle in my data ingestion efforts.
    Thank you in advance for your time and support.
    ERROR [08S01] [Microsoft][ODBC PostgreSQL Wire Protocol driver]Socket closed. ERROR [HY000] [Microsoft][ODBC PostgreSQL Wire Protocol driver]Can't connect to server on 'xxxxxxx' ERROR [01S00] [Microsoft][ODBC PostgreSQL Wire Protocol driver]Invalid attribute in connection string: sslmode.
    Kind Regards,
    Sai.

    • @AmitChandak
      @AmitChandak  10 months ago

      Please uncheck the encrypted flag and try again.