Microsoft Fabric: Inspecting 28 MILLION row dataset in Bronze Lakehouse - Part 2

  • Published on 7 Sep 2024

Comments • 33

  • @endjin
    @endjin  4 months ago

    Thank you for watching! If you enjoyed this episode, please hit like 👍, subscribe, and turn notifications on 🔔. It helps us more than you know. 🙏

  • @ThiagoOliveira-fj9me
    @ThiagoOliveira-fj9me 9 days ago +1

    That is brilliant -- thanks for making this series of such quality on Microsoft Fabric.

  • @1974mkc
    @1974mkc 1 year ago +2

    Excellent Demonstration. Will look forward to the upcoming episodes. Many Thanks

    • @endjin
      @endjin  1 year ago

      Thank you! There's a lot more to come!

  • @rajavarman4657
    @rajavarman4657 1 year ago +2

    Nice representation of the medallion architecture using Microsoft Fabric!

  • @phoenixoo7
    @phoenixoo7 1 year ago +1

    Very nice presentation... Looking forward to the upcoming episodes...

    • @endjin
      @endjin  1 year ago

      Thank you! Next episode out next week!

  • @MucahitKatirci
    @MucahitKatirci 6 months ago +1

    Thanks

    • @endjin
      @endjin  4 months ago

      There should be a new video dropping soon, seeing that you've binged everything so far!

  • @clyderodrigo9183
    @clyderodrigo9183 8 months ago +1

    👍👍

    • @endjin
      @endjin  4 months ago

      Thanks!

  • @raviv5109
    @raviv5109 1 year ago +1

    Really great! But I'd request that you include links to the previous parts and also mention the part number in the title. That way we know it is a series and that there are other parts. Thx!

    • @endjin
      @endjin  1 year ago

      There should be a link to the previous part in the description, but thanks for the suggestion

  • @user-zs5kp5vz3m
    @user-zs5kp5vz3m 11 months ago +1

    That's a great video. However, I still don't understand why there is a separate semantic layer. It looks like a presentation layer rather than a semantic one. Could you please clear up this confusion? Looking forward to your response, as I am about to implement it.

    • @endjin
      @endjin  11 months ago

      Thanks for the kind words! To address your statement:
      Power BI has effectively branded itself as a "semantic layer" for a number of years now. Capturing domain logic, user-friendly naming, calculations and relationships is what makes up the "semantic layer", and it goes above and beyond what you can generally do in your upstream data store. The beauty of Fabric and Direct Lake is that you don't actually need to import a copy of the "Gold" data into the semantic layer - it's the same copy of data, queried directly. What the semantic layer is still useful for is augmenting the Gold layer with additional metadata - column/table renames, additional table relationships and measures.
      Think of Gold as your "serving" layer. You've processed it into the structure where you don't have to do much in the downstream BI layer to get the data into the correct shape (oftentimes your M queries will be nothing more than just pointing to the tables in the data store). All you need to focus on then is adding the finishing touches required for the end-user to consume.

  • @michaelmurgado
    @michaelmurgado 1 year ago +3

    Would you mind sharing the Visio template used?

    • @endjin
      @endjin  1 year ago

      Yes, we'll share that soon! If you subscribe to our blog, we'll post when it's available: endjin.com/rss.xml

    • @ChrisDowns88
      @ChrisDowns88 1 year ago +1

      @endjin would be super useful! Has this been released yet?

    • @endjin
      @endjin  1 year ago +1

      @ChrisDowns88 Not yet. Barry is working on it.

  • @andreanneee1995
    @andreanneee1995 6 months ago +1

    Can you share the Visio diagram?

    • @endjin
      @endjin  4 months ago

      Ed's planning to release it once he gets to the end of the series.

  • @KickersKaiser
    @KickersKaiser 8 months ago +1

    Thank you for your video! How would you recommend organizing the 3 lakehouses in terms of data governance and data access? 3 separate workspaces or one workspace? If one workspace, how do you organize data access and governance?

    • @endjin
      @endjin  4 months ago

      We would default to one workspace containing all Lakehouses. We'd then have a Dev, Test and Prod version of this workspace. A separate workspace per Bronze, Silver and Gold would increase maintenance complexity, especially when taking into account a Dev/Test/Prod version.
      W.r.t. Data access - this one's tough to answer, because it totally depends on your security requirements. Fabric offers Workspace-level roles, artifact-level permissions, and more recently Data Access Roles (blog.fabric.microsoft.com/en-us/blog/9046/).
      In our experience, if you're developing an Enterprise solution, developers will tend to have workspace-level roles. End users will tend to have artifact-level permissions or more granular Data Access Roles within an artifact. For example, an end user would be given read-only access to the "Gold" Lakehouse.
      For Self-Service solutions (managed or business-led), you might want to loosen your restrictions to enable users to create items within a workspace, in which case they'd need a workspace level role. Or you might want to just provide write access to existing artifacts - in which case you'd use artifact level permissions.
      My point is - it totally depends. But there are currently quite a few ways to implement security. I would suggest mapping out your roles/personas and understanding what each persona needs to be able to do, and then try to map that to Fabric's permissions model.
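The persona-mapping exercise suggested above can be sketched in plain Python. This is only an illustration: the persona names, target names and the `Grant` structure are assumptions made up for the example and are not Fabric API objects. The three scope labels simply echo the mechanisms mentioned in this reply.

```python
from dataclasses import dataclass

# Hypothetical labels for the three mechanisms mentioned in the reply.
WORKSPACE_ROLE = "workspace-level role"
ARTIFACT_PERMISSION = "artifact-level permission"
DATA_ACCESS_ROLE = "data access role"


@dataclass(frozen=True)
class Grant:
    persona: str  # who needs access (made-up persona name)
    target: str   # workspace or artifact name (made-up)
    scope: str    # which mechanism would carry the grant
    access: str   # "read" or "write"


# Assumed example mapping; a real one comes from your own requirements.
GRANTS = [
    Grant("developer", "Dev workspace", WORKSPACE_ROLE, "write"),
    Grant("end user", "Gold Lakehouse", ARTIFACT_PERMISSION, "read"),
    Grant("self-service analyst", "Prod workspace", WORKSPACE_ROLE, "write"),
]


def grants_for(persona):
    """Return every grant recorded for the given persona."""
    return [g for g in GRANTS if g.persona == persona]


def can_write(persona, target):
    """True if the persona has a write grant on the target."""
    return any(
        g.target == target and g.access == "write" for g in grants_for(persona)
    )
```

Writing the mapping down first, in whatever form, makes it easier to spot personas that need only read access to a single artifact versus those that need to create items in a workspace.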
      From a Data Governance perspective, as ever, it's all about consistency. There are various features in Fabric to help with Governance and Discoverability (e.g. Domains, Workspace Contact Lists, Certification/Endorsement, Information Protection (Sensitivity Labels), Purview Hub in Fabric, Purview Compliance Portal (for auditing), Metadata scanning). I recommend you read through this article: learn.microsoft.com/en-us/fabric/governance/governance-compliance-overview
      Hope this helps!

    • @endjin
      @endjin  4 months ago

      Part 8 - Good Notebook Development Practices - is now available: th-cam.com/video/UyS6ZUgh-Wc/w-d-xo.html

  • @brianmunyao
    @brianmunyao 1 year ago

    What are the pros and cons of using a lakehouse for each piece of the medallion architecture vs using a single lakehouse and a file folder for each piece?

    • @endjin
      @endjin  1 year ago +1

      Hi, thanks for the comment! This question has been asked by other commenters too... so we'll just reshare our answer...
      The reality is that there's no one-size-fits-all approach to architecting solutions in Fabric. When it comes to your specific question about how the Medallion architecture maps to Lakehouses in Fabric: we default to one Lakehouse per layer. That's mainly because of two things - organizational flexibility and security flexibility.
      If we start combining the layers into a single Lakehouse, we lose flexibility on the organization of the managed Delta tables that we create. That's because within a single Lakehouse there's no way to group related tables other than by using a table naming convention - i.e. there's no equivalent of a T-SQL "SCHEMA" in a Lakehouse, nor is there a notion of sub-folders.
      Generally in the Silver and Gold layers we're creating Tables as outputs, rather than Files, so this flexibility is useful in order to separate Tables from one layer to the next layer. If you're purely dealing with "Files" then there's less of an obvious benefit of creating separate Lakehouses. But I would question why Silver/Gold datasets are being stored in the Files section rather than the Tables section - the Tables section is written in Delta format and heavily optimized for reporting purposes.
      From a security perspective, there's a clear separation of data when using separate Lakehouses, which can mean that different security provisions can be put in each layer. You might not want people to have access to `raw` (i.e. `Bronze`) data, but you might want them to be able to access `Silver` data. Having everything in a single Lakehouse would make this a little tricky.
      In reality, though, "it depends". If you have a really simple use-case, one Lakehouse could be sufficient. I've also seen people suggest combining "Bronze and Silver" into a single Lakehouse and "Gold" into a separate Lakehouse, since in "Bronze" you're usually only dealing with "Files", and therefore you may as well utilize the "Tables" section for your "Silver" layer. You just need to determine what works best for your use-case, factoring in the above points alongside other concerns such as data residency and cost management/chargeback. And it's likely that even internally, your Lakehouse architecture will differ from one project to the next.
      Remember in Fabric we have the power of "Shortcuts", which allows us to seamlessly combine data from other Lakehouses. So you can't go far wrong whatever architecture you choose!
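As a rough illustration of the naming-convention point above (when layers share one Lakehouse, the table name is the only grouping mechanism described in this reply), here is a minimal plain-Python sketch. The `layer_table` prefix scheme, the function names and the `price_paid` table name are assumptions for the example, not a Fabric convention.

```python
from collections import defaultdict

# Assumed layer prefixes for a single-Lakehouse naming convention.
LAYERS = ("bronze", "silver", "gold")


def qualified_name(layer, table):
    """Build a prefixed table name such as 'silver_price_paid'."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer}")
    return f"{layer}_{table}"


def group_by_layer(table_names):
    """Group table names by their layer prefix; anything else is 'unknown'."""
    groups = defaultdict(list)
    for name in table_names:
        prefix, _, rest = name.partition("_")
        key = prefix if prefix in LAYERS and rest else "unknown"
        groups[key].append(name)
    return dict(groups)
```

The sketch also shows the weakness of the approach: nothing enforces the prefix, so a table created without one simply falls into the "unknown" bucket, whereas separate Lakehouses give that separation for free.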

  • @mnhworkdev1652
    @mnhworkdev1652 1 year ago +1

    I'm not too familiar with Visio, but I know it has a components library. Where did you find all of the icons and the HM Land Registry logo?

    • @endjin
      @endjin  1 year ago +1

      You can download Microsoft Fabric icons from learn.microsoft.com/en-us/fabric/get-started/icons

    • @GuillaumeBerthier
      @GuillaumeBerthier 1 year ago +1

      @endjin thanks for this video series, it's very interesting! Is there any advantage in splitting Bronze and Silver across 2 different Fabric Lakehouse items? Why not just use a single Fabric item where the Files zone is the Bronze layer and the (managed) Tables zone is the Silver layer?
      PS: thanks for the Fabric item icons link. Would you mind sharing your End2End Demo architecture Visio file in the description section as a reference for this video, if possible?

    • @endjin
      @endjin  1 year ago +1

      One reason is that some folks do want to store their `raw` data in a queryable format (i.e. Delta tables). If we were to share a Lakehouse with the `Silver` layer and then create formal tables for our raw `Bronze` data, then we'd have `Bronze` and `Silver` tables intermixed, which is a bit of an organizational headache in a single Lakehouse (since there's no way to group the separate tables other than by table name convention - i.e. there's no concept of a "schema" in a Lakehouse).

      There's also a clear separation of data when using separate Lakehouses, which can mean that different security provisions can be put in each zone. You might not want people to have access to `raw` (i.e. `Bronze`) data, but you might want them to be able to access `Silver` data. Having everything in a single Lakehouse would make this a little tricky.

      W.r.t. the Visio diagram - yes, we'll share that soon! If you subscribe to our blog, we'll publish a post when it's available: endjin.com/rss.xml

    • @GuillaumeBerthier
      @GuillaumeBerthier 10 months ago

      @endjin I did subscribe to your YT channel and RSS feed but I didn't see the Visio diagram. Any chance I just missed it? Thanks 😜

  • @shivaog007
    @shivaog007 8 months ago

    Can you share the architecture link here?

    • @endjin
      @endjin  4 months ago

      I think the plan is to release some assets once the series is complete.