Advancing Fabric - A Quick Microsoft Fabric Tour

  • Published 31 May 2023
  • So you've watched Microsoft Build, you've seen the marketing slides, maybe seen a few demo clips and heard lots about the various experiences and workloads available... but what does it actually look and feel like to use Fabric? What are you going to see when you open up that box?
    In this video Simon is joined by Craig Porteous, our very own Chief Fabricator, to help show us around our Microsoft Fabric. We'll see some of the workloads, like Data Engineering and Data Warehousing, and get a sense for how new objects like Lakehouses fit into the Power BI Workspace concept.
    If you're thinking about building a platform on Microsoft Fabric, and need to speak to some real Lakehouse experts, get in touch with Advancing Analytics.

Comments • 18

  • @ianstats97
    @ianstats97 1 year ago +3

    I really like all your videos. I am wondering how CI/CD will work for data engineering in the new Fabric workspace.

  • @EngineerNick
    @EngineerNick 1 year ago +1

    Thank you for the tour :)

  • @vjraitila
    @vjraitila 1 year ago +1

    I agree that it is a bit confusing that "managed" tables are not also visible on the files side under the lakehouse workload. Does it also mean that you cannot refer to tables by path in your notebooks? And the same for tables in a warehouse?

  • @HiYurd
    @HiYurd 1 year ago +1

    Great video. Really like seeing a demo. So how are the Synapse "Data Pipelines" in Fabric different than the Data Factory components in Fabric?

    • @chasedoe2594
      @chasedoe2594 1 year ago

      It is the same thing. When you click on the pipelines, it will just lead you to the Data Factory view. There seems to be no concept of an ADF workspace anymore. After you save, it will appear as a pipeline in the Fabric workspace, alongside other items like the DWH, LH, Spark, etc.

  • @amitnavgire8113
    @amitnavgire8113 1 year ago +1

    When I sign up for Fabric it redirects me to Power BI, and I don't see these options like Data Factory etc.


  • @crouch.g
    @crouch.g 1 year ago

    How is performance/cost managed?
    There are no choices on Spark pools etc.
    In one way this is great; in another, how will Fabric manage that in the background?

    • @jordanfox470
      @jordanfox470 1 year ago

      You can choose Spark pools at the workspace level: X-Small, Small, Medium, Large, etc. Core availability is managed at the SKU level for Fabric across a 24-hour period, though, so it's going to be super confusing and honestly extremely expensive if every workspace is spinning up its own Spark pools and warehouses.

  • @chasedoe2594
    @chasedoe2594 1 year ago +1

    I am still confused about the compute sizing for both Spark and Data Warehouse compute.
    Fabric seems to be backed by a "per capacity" model, so does this mean I need to reserve CPU capacity ahead of using Spark or Data Warehouse compute?
    Because normally in Synapse / Databricks we can provision whatever we need without any reservation. I hope I've misunderstood, because if it requires a reservation at the Fabric level, then who on earth will pay for a 24x7 capacity reservation to cover 4 hours a day of Spark utilization?

    • @AdvancingAnalytics
      @AdvancingAnalytics 1 year ago

      I agree, it's a confusing model! You pay for a capacity, and that capacity can be used for any of the Fabric workloads. If you get the smallest capacity, an F2, that gives you basically 4 CPUs of power. This is averaged over 24 hours - so you could run a Spark job using 12 cores for a couple of hours on that F2 capacity, and that averages out well within your capacity. I think it's going to take a while before people figure out how best to work: do you take one big capacity and try to fit all of your different workloads inside it, or several smaller capacities so important workloads don't end up throttling each other?
      There are some hints at further things coming - surge protection etc., so you can put some guardrails around workloads to keep some of your capacity for other activities.
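      The averaging in that reply can be sketched as simple core-hour arithmetic. This is a minimal illustration only; the "F2 ≈ 4 CPUs" figure and the 24-hour smoothing window are taken from the comment above, not from official Fabric pricing documentation.

      ```python
      # Sketch of 24-hour capacity smoothing as described in the reply above.
      # Assumption (from the comment, not official docs): an F2 capacity
      # provides roughly 4 CPU cores of power, averaged over 24 hours.

      CAPACITY_CORES = 4   # hypothetical F2 core figure from the comment
      WINDOW_HOURS = 24    # smoothing window

      def within_capacity(bursts):
          """bursts: list of (cores_used, hours) tuples in one 24h window."""
          used_core_hours = sum(cores * hours for cores, hours in bursts)
          budget_core_hours = CAPACITY_CORES * WINDOW_HOURS
          return used_core_hours <= budget_core_hours

      # The example from the reply: a Spark job using 12 cores for 2 hours
      # consumes 24 core-hours against a 4 * 24 = 96 core-hour budget.
      print(within_capacity([(12, 2)]))   # True - well within the F2 budget
      ```

      The same check shows why sustained load breaks the model: 12 cores running all 24 hours would consume 288 core-hours, three times the budget.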

    • @crouch.g
      @crouch.g 1 year ago

      @@AdvancingAnalytics There is no way I can see in Fabric of choosing 12 cores for a couple of hours? Am I missing something?

    • @chasedoe2594
      @chasedoe2594 1 year ago

      @@AdvancingAnalytics Really? What happens if there is a surge of data, like month-end processing? Does that mean it will just not finish the execution because my capacity reservation is not enough?
      Or do I need to do Synapse Dedicated SQL Pool-style scaling, where I call the Azure API to scale up my capacity?
      Data Factory in Fabric is just the worst Data Factory ever. The one upside of ADF was that a data engineer could orchestrate jobs across various Azure resources without handling any API integration. Now they've taken out all the Azure activities, and it requires AWS-style integration where everything goes through the equivalent of AWS Lambda. Connection management has switched to "Power BI Data sources" (now just called Data), which has a horrible UI. So what's the point of using Data Factory? And the connections to Fabric cannot be created manually, so when they are not created automatically... Data Factory just cannot load data into the Lakehouse/Warehouse and there is nothing I can do about it. And yes, we can do that API integration ourselves: we currently run custom-made CI/CD for Power BI reports based on the Power BI API, and after Build 2023, MS slightly updated some of its APIs and my Power BI CI/CD just broke. That's the maintenance cost of manually doing API integration with any system.
      And MS is selling this product as a "single workspace for everything", but I speculate that enterprises will need to split workspaces between data transformation workloads (Spark + DF) and query workloads (ad-hoc query + BI) anyway, in order to manage compute resource allocation. So it will not be a single workspace anyway. Moreover, managing per-capacity billing on Power BI Premium is already a dedicated job on its own; now that same capacity will be shared with four more services. From my POV, this is just unnecessary complexity for the sake of so-called "simplicity and a single compute and workspace."
      The Synapse warehouse update is welcome, but I was also expecting a serverless option for the Dedicated SQL Pool, since AWS Redshift and GBQ already have those options on the table. Rather than offering a serverless pricing model, MS is now all-in on RI-style pricing, which is even worse for engineers managing usage surges. Yes, there might be workarounds, but on AWS and GCP we just don't need to manage that at all.
      In the long run, Synapse is being pushed out of Azure without ever getting serverless pricing, Data Factory might become just a crippled orchestrator, and managing Spark compute infrastructure might get even more complicated. The one upside I can see is that Power BI will read lake storage directly. Since we would have to refactor/migrate out of Azure into O365 anyway... I think we might re-evaluate cloud vendors entirely. I'm really not sure there is enough upside to being on Fabric. I can see a use case for Fabric as a single presentation layer, but apart from that, other PaaS products seem to be a better option, IMHO. And there is still no good option for a serverless data warehouse.
      Sorry for the very long rant.

  • @amitnavgire8113
    @amitnavgire8113 1 year ago

    The Copy activity has very limited connectors, and there are far fewer activities in Data Factory... Fabric is basically a scaled-down version of Synapse.

    • @chasedoe2594
      @chasedoe2594 1 year ago

      Yeah, I still say this is another repackage-and-rebrand from MS. Yes, they have a new Synapse that works with Delta, but that's it; the rest just makes the Synapse workspace worse and even harder to manage.

  • @NeumsFor9
    @NeumsFor9 1 year ago

    I love how vendors create a market by requiring people to rearrange their mental map to do essentially the same things they've been doing... but really, what's the new business value? Where is THAT?!

  • @AnnChu-tb4hp
    @AnnChu-tb4hp 9 months ago

    Are they copying Databricks?