Extract and Load from External API to Lakehouse using Data Pipelines (Microsoft Fabric)

  • Published 13 Dec 2024

Comments • 52

  • @jampeauk · 1 year ago · +4

    Just want to say a massive thank you for your Fabric videos; they have been amazing. Keep up the great work.

    • @LearnMicrosoftFabric · 1 year ago

      Hi, thanks for watching! Don't worry, there are plenty more videos to come!

    • @jampeauk · 1 year ago

      @@LearnMicrosoftFabric I may have missed this in your videos, but do you have a section on how to show the contents of a file directly and how to load the most recent file (my files all have date stamps in them)?
      I have not had any luck with os.listdir().

    • @LearnMicrosoftFabric · 1 year ago · +1

      @@jampeauk Hi James, for file-system searching you probably want to use mssparkutils, which has list-files-in-a-directory functionality. I plan to cover this in my upcoming video on mssparkutils 👍

    • @jampeauk · 1 year ago

      @@LearnMicrosoftFabric Awesome, thanks Will, looking forward to this.
      To provide a little extra context, I would like to list the files located in my S3 bucket, which I have added as a shortcut.
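A minimal Python sketch of the "load the most recent file" part, assuming the date stamp appears in each file name as YYYY-MM-DD or YYYYMMDD (a hypothetical convention; adjust the regex to your naming). It only needs a list of names. In a Fabric notebook those could come from mssparkutils as the reply suggests, e.g. `[f.name for f in mssparkutils.fs.ls("Files/<your_s3_shortcut>")]`, assuming the returned entries expose `.name`; the folder path is a placeholder.

```python
import re
from datetime import datetime

# Hypothetical naming convention: a date such as 2024-12-13 or 20241213 inside the file name.
DATE_STAMP = re.compile(r"(\d{4})-?(\d{2})-?(\d{2})")


def date_from_name(name):
    """Return the date stamp embedded in a file name, or None if there is no valid one."""
    match = DATE_STAMP.search(name)
    if match is None:
        return None
    try:
        return datetime(int(match.group(1)), int(match.group(2)), int(match.group(3)))
    except ValueError:  # e.g. 20241399 matches the pattern but is not a real date
        return None


def latest_file(names):
    """Pick the name with the most recent date stamp; names without a stamp are ignored."""
    dated = [(date_from_name(n), n) for n in names]
    dated = [(d, n) for d, n in dated if d is not None]
    if not dated:
        return None
    return max(dated)[1]
```

Parsing the date from the name, rather than relying on file modification times, keeps the choice stable even if files are re-copied into the shortcut location.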

  • @KurtJ-r8w · 3 months ago

    Really hope you do more Fabric content.
    You were clear, structured and concise in your teaching.

  • @arunsundar3739 · months ago

    That is a fantastic video. I could follow along, and it's quality content: not only are the concepts covered, but also the best practices for security and organizing the files. Thank you very much :)

  • @chescov · 1 year ago · +1

    Much appreciated my good sir 👏👏

  • @Hamza-qs7ez · 17 days ago

    Thank you, Will.
    Can I ask for advice regarding this problem?
    Suppose you have a key to an external API that grants read-write access to the backend.
    Are there tools within Fabric or elsewhere that can ensure I only send GET requests, so that the key is not used, accidentally or maliciously, for a non-GET request?
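One client-side safeguard, sketched in Python under stated assumptions: wrap the key in a small client that refuses to build anything other than GET or HEAD requests. The class name and the `Bearer` header scheme are hypothetical. A guard like this only prevents accidents in your own code; anyone holding the key can bypass it, so the stronger protection is a read-only key or scope issued by the API provider, if it offers one.

```python
import urllib.request

# Only read-only HTTP methods are allowed through this wrapper.
ALLOWED_METHODS = frozenset({"GET", "HEAD"})


class ReadOnlyApiClient:
    """Hypothetical wrapper that refuses to build any non-read-only request."""

    def __init__(self, base_url, api_key):
        self.base_url = base_url.rstrip("/")
        self._api_key = api_key

    def build_request(self, method, path):
        method = method.upper()
        if method not in ALLOWED_METHODS:
            raise PermissionError(f"{method} is blocked: this client is read-only")
        return urllib.request.Request(
            f"{self.base_url}/{path.lstrip('/')}",
            headers={"Authorization": f"Bearer {self._api_key}"},  # assumed auth scheme
            method=method,
        )

    def get(self, path, timeout=30.0):
        """Send a GET request and return the raw response body."""
        request = self.build_request("GET", path)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read()
```

Because every request goes through `build_request`, there is a single place where the method check happens, which is easier to review than scattered `urlopen` calls.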

  • @chetan2309 · 1 year ago · +2

    Hey! Massive thanks! Do you have plans to cover any OAuth-based APIs on your channel? Also, how to parallelise these API calls for massive data loads? Let's say you want to fetch data for 100 cities on a daily basis, plus triggers for when a 101st city is added, and all those scenarios.

    • @LearnMicrosoftFabric · 1 year ago

      Hi,
      Great questions! Absolutely, I plan to do more videos about handling different auth scenarios, and also about loading very big datasets with parallel reads. Watch this space :)
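A minimal notebook-side sketch of the "100 cities" case, assuming `fetch_city` is a hypothetical function that makes one API request for one city. Threads suit I/O-bound HTTP calls; keep `max_workers` modest so you stay within the API's rate limits. If the city list is read from a table or file (also an assumption) rather than hard-coded, a 101st city is simply picked up on the next scheduled run.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_all_cities(cities, fetch_city, max_workers=8):
    """Call fetch_city(city) for every city concurrently.

    Returns (results, errors), both dicts keyed by city, so one failing
    city does not abort the whole load.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch_city, city): city for city in cities}
        for future in as_completed(futures):
            city = futures[future]
            try:
                results[city] = future.result()
            except Exception as exc:  # record the failure for a retry or an alert
                errors[city] = exc
    return results, errors
```

Collecting errors per city instead of failing fast is a design choice: for a daily batch it usually matters more that 99 cities land than that the run stops on the first bad response.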

  • @peternguynguyen5208 · 1 year ago

    Nice instructions, thank you

  • @KAshIf0o7 · 1 year ago

    Waiting for the next part.

  • @WillOSullivan-k1q · 1 year ago

    Good explanations, mate. Keep up the good work.

  • @stevengarcia7277 · 6 months ago

    Thanks mate, well explained.

  • @samirsahin5653 · 1 year ago

    I came here with the same question that some people have already asked:
    how to call this API for multiple cities.
    I watched your other videos where you used a notebook to transform the data, and another where you scheduled it in a pipeline. If you could show how to call this API for multiple cities, it would make a great project. You could create a playlist as an end-to-end project.
    I really like your channel and am following your daily Spark videos.
    I believe this will become one of the main Fabric channels on YouTube.

    • @samirsahin5653 · 1 year ago

      Just saw you already have a playlist :)

    • @LearnMicrosoftFabric · 1 year ago

      Hey! Yes, I plan on continuing this series and going a bit deeper on data pipelines very soon! Thanks for watching and for your kind words 💪🙏

  • @anushav3342 · 1 year ago

    Great content. Thanks for explaining the different options available in Fabric. I need to load fact data (bookings data) through a REST API call. How do I set up loading into the lakehouse to ingest weekly updates? Do I need to start with a pipeline, or is there a way to start directly with a notebook to load the data into the lakehouse?

    • @LearnMicrosoftFabric · 1 year ago

      Thanks for watching! It really depends on the complexity of your API call. If it's simple, you can use Dataflows or Data Pipelines; more complex authentication or transformation will require a notebook.
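For the notebook route, one common pattern is a watermark: remember the last booking date already loaded and only request the weeks after it. A minimal sketch, assuming the bookings API accepts a date range (the exact parameter names are not known here):

```python
from datetime import date, timedelta


def weekly_windows(last_loaded, today):
    """Split the period after the watermark into complete 7-day (start, end) windows.

    last_loaded is the last booking date already in the lakehouse. Only
    windows that end before `today` are returned, so a week still in
    progress is left for the next scheduled run.
    """
    windows = []
    start = last_loaded + timedelta(days=1)
    while start + timedelta(days=6) < today:
        end = start + timedelta(days=6)
        windows.append((start, end))
        start = end + timedelta(days=1)
    return windows
```

For each window you would call the API (for example with hypothetical `from`/`to` query parameters), append the rows with something like `spark.createDataFrame(rows).write.mode("append").saveAsTable("bookings")`, and then store the window's `end` as the new watermark. The notebook itself can be scheduled weekly or called from a pipeline.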

  • @hotrung5469 · 1 year ago

    Thank you so much, Will, for your detailed instructions!!! Could you make a guide on loading Excel files in OneLake (specifically, stored in a lakehouse) into tables in a Data Warehouse?

    • @LearnMicrosoftFabric · 1 year ago

      Hey, thanks for watching! To read Excel into a lakehouse table, you can either use pandas to load it into a pandas DataFrame and convert that to a Spark DataFrame (and then a lakehouse table), or use the pyspark.pandas library (pandas within Spark). Good luck!
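A sketch of the pandas route from the reply, under assumptions: it runs in a Fabric notebook where `spark` is available, the file path is a placeholder, and pandas plus openpyxl are installed. Excel headers often contain spaces or brackets, which Delta tables reject (the characters ` ,;{}()\n\t=`) unless column mapping is enabled, so the headers are cleaned first.

```python
import re

# Characters Delta rejects in column names unless column mapping is enabled.
_INVALID = re.compile(r"[ ,;{}()\n\t=]+")


def clean_column_names(columns):
    """Replace invalid characters and de-duplicate headers (Excel often has both issues)."""
    cleaned, seen = [], {}
    for col in columns:
        base = _INVALID.sub("_", str(col).strip()).strip("_") or "column"
        count = seen.get(base, 0)
        seen[base] = count + 1
        cleaned.append(base if count == 0 else f"{base}_{count}")
    return cleaned


def excel_to_lakehouse_table(spark, excel_path, table_name, sheet_name=0):
    """pandas DataFrame -> Spark DataFrame -> Delta table in the lakehouse."""
    import pandas as pd  # imported here so the helper above works without pandas

    pdf = pd.read_excel(excel_path, sheet_name=sheet_name)  # .xlsx needs openpyxl
    pdf.columns = clean_column_names(pdf.columns)
    sdf = spark.createDataFrame(pdf)
    sdf.write.mode("overwrite").format("delta").saveAsTable(table_name)
    return sdf
```

Usage might look like `excel_to_lakehouse_table(spark, "/lakehouse/default/Files/raw/bookings.xlsx", "bookings")`, assuming a default lakehouse is attached so that its Files area is mounted at that local path.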

  • @matask23 · 10 months ago

    Amazing video, thanks for this, Will! Would PySpark be the best choice to achieve this, or could I use SQL to achieve the same goal?

    • @LearnMicrosoftFabric · 10 months ago · +1

      Yes, you could also use SQL! The good thing about Fabric is that you're free to use whichever language you're comfortable with (well, as long as it's T-SQL, Python, R, Scala or KQL)!

    • @matask23 · 10 months ago

      @@LearnMicrosoftFabric Thanks for that, that's really useful to know! I guess my follow-up would be whether there are any compatibility issues or limitations I might encounter if I were to use SQL within MS Fabric?

  • @gguuyypp · 5 months ago

    Thanks! Can you make a video about extracting a file from SFTP?

  • @FranciscoRodriguezFabric · 10 months ago · +1

    Thanks !

  • @mshparber · 1 year ago

    Thanks. Please explain the best practice for making nested API calls and merging the results back into one JSON file. For example, the first API call, /students, gives me a list of all students; then for each student I need to make another call, /{student_id}/courses, to get their course information. I need to save the results of all students' courses as one JSON file. It's easy to do in a Dataflow, but a Dataflow cannot save the results as JSON, only as a table. So what is the right way to do it in a Pipeline? Thanks!

    • @LearnMicrosoftFabric · 1 year ago

      Hey, it's not something I've done with Data Pipelines, to be honest, but it might be possible with the ForEach activity. If you know how to use Python, I would recommend doing this in Fabric Notebooks with the requests library; it's much easier to manage this kind of logic in a notebook.

    • @mshparber · 1 year ago · +1

      @@LearnMicrosoftFabric Thanks. One of the main advantages of Power BI tools is low-code/no-code. I know Python, but we need a simple, low-code GUI experience like Power Query / Dataflow. I hope Pipeline can provide it.

    • @jampeauk · 1 year ago

      @@mshparber If it helps, there is now a GUI that should do what you are after: do some watching/reading on "Data Wrangler". It is currently only available for pandas in Notebooks, but it should be useful.
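A minimal sketch of the notebook approach from the reply, using the standard library's `urllib` instead of `requests` so it has no dependencies. The base URL and the `id` field on each student are assumptions; `get_json` is injectable so the logic can be tried without a live API.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical API root


def http_get_json(url):
    """Fetch a URL and parse the JSON response body."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def collect_student_courses(get_json=http_get_json, base_url=BASE_URL):
    """Call /students, then /{student_id}/courses per student; merge into one list."""
    merged = []
    for student in get_json(f"{base_url}/students"):
        student_id = student["id"]  # assumed field name
        courses = get_json(f"{base_url}/{student_id}/courses")
        merged.append({**student, "courses": courses})
    return merged


def save_json(records, path):
    """Write all merged records to a single JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)
```

In a Fabric notebook with a default lakehouse attached, the output path could be something like `/lakehouse/default/Files/students_courses.json` (a placeholder).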

  • @sreekanth0112 · 6 months ago

    Hi,
    Please make a video on extracting files from SharePoint to a lakehouse through a Data Pipeline (Data Factory) in Fabric.

  • @dineshreddy2207 · 7 months ago

    Hi, I have an XML file and want to ingest it into MS Fabric without using a notebook. Can you help me?

    • @LearnMicrosoftFabric · 7 months ago

      You should be able to use either a Dataflow or a Data Pipeline, but if it's horribly nested XML, a notebook will probably be necessary.

    • @itversityitversity7690 · 7 months ago

      I used the Copy activity, but there seem to be some problems. Any suggestions? Please suggest another way.
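For the "horribly nested XML" case the reply mentions, a notebook can flatten each record element into one row with the standard library's `xml.etree.ElementTree`. A sketch, assuming you know which tag marks one record (`order` in the test is hypothetical). Nested paths become dotted column names, attributes become `@name` columns, and repeated sibling tags get an index.

```python
import xml.etree.ElementTree as ET


def flatten_xml(xml_text, record_tag):
    """Turn every <record_tag> element into one flat dict (one row per record)."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.iter(record_tag):
        row = {}
        _flatten(record, "", row)
        rows.append(row)
    return rows


def _flatten(elem, prefix, row):
    # Attributes become "<path>@<attr>" columns, e.g. "@id" on the record itself.
    for attr, value in elem.attrib.items():
        row[f"{prefix}@{attr}"] = value
    children = list(elem)
    if not children:
        row[prefix or "#text"] = (elem.text or "").strip()
        return
    # Repeated sibling tags (several <item>, say) get an index so they don't overwrite each other.
    totals = {}
    for child in children:
        totals[child.tag] = totals.get(child.tag, 0) + 1
    used = {}
    for child in children:
        key = f"{prefix}.{child.tag}" if prefix else child.tag
        if totals[child.tag] > 1:
            index = used.get(child.tag, 0)
            used[child.tag] = index + 1
            key = f"{key}[{index}]"
        _flatten(child, key, row)
```

The resulting list of dicts can then be turned into a pandas or Spark DataFrame and saved as a lakehouse table. For deeply repeated structures, writing them to a separate child table is often cleaner than indexed columns.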

  • @alex24tech · 8 months ago

    How do I run a pipeline for data copying? I have an API that uses two authentication systems: a token and basic authentication (user and password).
    The first connection to the API (via the POST method) retrieves the token, which is then used by the second request to execute the query itself. Is it possible to create a pipeline that can do the job? Should I use notebooks, or is there another solution?
    The result of the second query will of course be stored in a lakehouse table.

    • @LearnMicrosoftFabric · 8 months ago · +1

      Yes, it should be possible in either a Data Pipeline or a Notebook. You can make the POST request, then pass the token to your next activity.

    • @alex24tech · 8 months ago · +1

      @@LearnMicrosoftFabric Thank you, sir. Do you have any resources that could help me?
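A notebook sketch of the two-step flow described in the reply: POST with HTTP Basic credentials to get a token, then send the token on the data request. The URLs, the optional JSON body, the `access_token` response field and the `Bearer` scheme are assumptions; check your API's documentation for the real names. In a Data Pipeline the equivalent is usually a Web activity for the POST, with the next activity reading its output through an expression such as `@activity('GetToken').output.access_token` (activity name hypothetical).

```python
import base64
import json
import urllib.request


def build_token_request(token_url, username, password, body=None):
    """Step 1: POST with HTTP Basic auth to obtain a token (JSON body is optional)."""
    credentials = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    headers = {"Authorization": f"Basic {credentials}"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(token_url, data=data, headers=headers, method="POST")


def build_data_request(data_url, token):
    """Step 2: the actual query, authorised with the token from step 1."""
    return urllib.request.Request(data_url, headers={"Authorization": f"Bearer {token}"}, method="GET")


def fetch_with_token(token_url, data_url, username, password, token_field="access_token", timeout=30):
    """Run both steps and return the parsed JSON of the data request."""
    with urllib.request.urlopen(build_token_request(token_url, username, password), timeout=timeout) as resp:
        token = json.load(resp)[token_field]
    with urllib.request.urlopen(build_data_request(data_url, token), timeout=timeout) as resp:
        return json.load(resp)
```

Keeping request construction separate from sending makes each step easy to inspect before pointing it at the real API. In practice, read the username and password from a secret store rather than hard-coding them.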

  • @tarunsachdeva3570 · months ago

    Brother, I am stuck on pagination. Can you help?
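APIs paginate in different ways (page numbers, offsets, cursors, next-page links). A minimal sketch of the next-link style, assuming each page is JSON with an `items` list and a `next` URL that is absent or null on the last page; both field names are assumptions. If you would rather stay in a Data Pipeline, the Copy activity's REST source has pagination-rule settings worth checking for the same job.

```python
def fetch_all_pages(get_page, first_url, items_key="items", next_key="next"):
    """Follow next-page links until the API stops returning one.

    get_page(url) is a hypothetical function returning the parsed JSON of one page.
    """
    records, url, visited = [], first_url, set()
    while url:
        if url in visited:  # guard against an API that keeps returning the same link
            raise RuntimeError(f"pagination loop detected at {url}")
        visited.add(url)
        page = get_page(url)
        records.extend(page.get(items_key, []))
        url = page.get(next_key)
    return records
```

The loop guard matters more than it looks: a misconfigured paginated call that never terminates can burn through an API quota quickly.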

  • @fnplazatuc · 8 months ago

    Hi, how are you? After data extraction, what is the next step to transform the data and visualize it in MS Power BI?

    • @LearnMicrosoftFabric · 8 months ago

      Hi there, good thanks, you? In this video I go right through end-to-end, talking about extraction, storage and then visualization. Hope it helps 👍 th-cam.com/video/hwwU8V48g-4/w-d-xo.html

    • @fnplazatuc · 8 months ago

      @@LearnMicrosoftFabric Will, how are you? Your videos are useful! I have a question: is it possible to get data from a JSON REST API and transform it into a table in a data lake? I can't get this to work; I can only transform it in a Warehouse! Thanks!
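A common sticking point when turning REST API JSON into a lakehouse table is nesting. A minimal sketch that flattens each JSON record into flat columns, joining nested keys with `_` and keeping lists as JSON strings (both are design choices, not the only option):

```python
import json


def flatten_record(record, parent_key="", sep="_"):
    """Flatten nested dicts into a single-level dict suitable for one table row."""
    flat = {}
    for key, value in record.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten_record(value, new_key, sep))
        elif isinstance(value, list):
            flat[new_key] = json.dumps(value)  # keep lists as JSON strings
        else:
            flat[new_key] = value
    return flat
```

In a Fabric notebook, the flattened rows could then be written with something like `spark.createDataFrame(rows).write.mode("overwrite").saveAsTable("api_data")` (table name hypothetical). If values of the same field vary in type across records, you may need to pass an explicit schema.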

  • @DinoAMAntunes · 9 months ago

    Hello, very good, thanks very much. My ERP is 100% online but I can't connect to it. I think I have all the necessary details: URL, DB name, username, password or API.

    • @LearnMicrosoftFabric · 9 months ago · +1

      Hey, if it's 100% online and an ERP system, it's likely to have an API you can connect to. Google "{ERP NAME} API documentation" to find out how to connect to it. Or, if it's one of the big ERP systems, you could use a Dataflow, because there might be a pre-built connector available for your ERP system. Good luck!

  • @rashane1000 · 1 year ago

    Awesome video, keep them coming! How about covering the OAuth2 protocol? New subscriber here, thanks very much!

    • @LearnMicrosoftFabric · 1 year ago · +1

      Hey, thanks for watching! I haven't covered this yet, but yes, I should make something about OAuth2 because it's such a common use case.

    • @rashane1000 · 1 year ago

      @@LearnMicrosoftFabric Thanks heaps. Looking forward to your next vids 🔥🔥🔥
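OAuth2 has several flows. For unattended data loads, the client-credentials grant (RFC 6749, section 4.4) is a common one: POST a form-encoded `grant_type=client_credentials` request to the provider's token endpoint and read `access_token` from the JSON response. A minimal sketch; the token URL and scope are hypothetical, and some providers prefer the client ID and secret in an HTTP Basic header rather than the form body.

```python
import json
import urllib.parse
import urllib.request


def build_client_credentials_request(token_url, client_id, client_secret, scope=None):
    """Build the token request for the OAuth2 client-credentials grant."""
    form = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }
    if scope:
        form["scope"] = scope
    return urllib.request.Request(
        token_url,
        data=urllib.parse.urlencode(form).encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def get_access_token(token_url, client_id, client_secret, scope=None, timeout=30):
    """Send the request and return the access_token from the JSON response."""
    request = build_client_credentials_request(token_url, client_id, client_secret, scope)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["access_token"]
```

The returned token is then sent on API calls, typically as `Authorization: Bearer <token>`, until it expires and a new one is requested.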

  • @rdeheld · 5 months ago

    That's not complicated. I would like to see it the other way around.