Using SQL Server On-Prem as Power BI source data must be one of the most extended use cases out there. Before Fabric exposing SQL Server tables to Power BI was very straight forward using Power BI Gateways. Seems that now with fabric we need to create SQL Server partitions, then provision BLOB storage in ADLS, then create a CETaS to create that partition as a parquet file on ADLS, then expose this to OneLake using shortcuts, then format again the parquet virtualized files in OneLake using a delta table, then create the Power BI data model in Power BI and only then you can create a Power BI report out of a SQL Server table. Am I the only one that sees this as some convoluted crazy overengineering process? Probably there are other ways using Azure Pipelines to copy data from SQL Server On-Prem to OneLake (not in preview I believe). But even when that is GA, the concept of having to copy your on-prem SQL Server structured data to a BLOB storage persistence layer to do simple reporting looks twisted. Companies that want to embrace Fabric are going to have to re-engineer their ETL processes to write on OneLake and deal with a heck of one-time data load for the existing historical data. Just to reach probably a worse performance scenario than the one they already have. Direct lake connectivity might be the one saving grace on this whole thing... we will have to see. But as of now I'm very sceptical and a bit pissed at MS with their strategy to push everyone (eventually) to this one Lake architecture that in many use cases is absolutely not required.
Appreciate the long comment and thougts you put into that. To be clear, the approaches you've done in the past with Power BI and leveraging data from on-prem, still work! That hasn't changed. There are new approaches and things are evolving. If you still want to Import your data or use DirectQuery against something like SQL Server via a gateway, go for it! If you want to leverage OneLake for reuse of data, you can do that now as well. They are different architectures though. It's the question of would you benefit from more of a lakehouse approach, or something different. Fabric gives you a lot of options and you have different tools in your toolbelt. A lot of companies already have a data lake type approach where data is already in storage of some kind. This is an easy fit for Fabric. As you mentioned, if you want to take your existing on-prem assets to a lake centric approach, there is some thought and architecture that needs to go into doing that. But again, if what you have is working, that gives you time to think about that approach and if it makes sense from a business perspective. What problem are you trying to solve? Does a lake centric approach with Microsoft Fabric's new capabilities help solve that problem? The ability to use the gateway with pipelines, in Fabric, isn't an option yet. Get your feedback in at aka.ms/fabricideas! Lots of folks commenting the desire to have it - and I feel it makes sense.
Thanks for making this comment - I thought I was taking crazy pills watching that video. Taking data from an existing table, running it through several very brittle engineering steps, only to convert to a new table for use had me wondering what I was missing. Very skeptical of this use case.
@@jeffgray8076I agree with you. I would have liked to see Fabric as just repackaged Azure storage gen2+Data Factory+Synapse into RDB database like Azure SQL Database or Serverless. This move to Parquet is ONLY beneficial to Databricks imo. AND you get stuck with basically having to write a Scala/Python ELT framework just to handle the Delta Lake Ingestion/conversion/time logic.. Oh and It's real fun querying the data in ssms and having to use an exact Azure url 🤣🤣 Sidenote i hate Parquet and Databricks already after 8months tackling it Not to mention you never knew the past years If Excel/Teams stored data in Onedrive/Sharepoint Onedrive/MS Stream. AND now there's Dataverse also, that actjally looks very promising for the power platform with the new power pages to maybe replace sharepoint altogether!
would love to see a video from you guys showing some examples or best practices how to get data from different sources and databases into Power BI. For example: we have a CRM running on mysql, a ERP on firebird and a financial ERP on MSSQL. What is the best strategy to bring data togehter in order to analyse it in Power BI. And where?! (in fabric one lake, in an azure sql Datawarehouse,...)
We are already thinking about this. When we think of on-prem data, for lakehouse and warehouse, the only real option right now is Dataflows Gen2. The existing approaches with Power BI and the on-premises data gateway are still valid approaches for reporting on data. That doesn't take advantage of the new workloads though. I'll go back to what business problem are you trying to solve though and which tool will help you solve that problem. Going with a lake centric approach with Lakehouse/Warehouse/OneLake should be approached with some architecture in mind and the understanding that orchestration will need to take place regardless of the sources.
Question: I can easily extract data from a .html financial stocks table using a PySpark notebook w BeautifulSoup crawler, define schema and column headers, create a dataframe and save result in Delta (Stocks.part01,Stocks.part02..,Stocks.part0n) in Lakehouse subfolder under Files. (This could also be achieved using Python .py file as Spark Job Definition w schedule).Still Delta doesn't bring much options other than read (Preview) at this stage. So guess we have not much choice than create a SQL Table in Warehouse using CETAS. Sample shows a single file, but what if we need to... sort of "concatenate" all Delta parts (Stocks.part01,Stocks.part02..,Stocks.part0n) into one table from same Delta location.
Any word on if we will have the ability to see usage metrics when moving items over from SQL into a Fabric Lakehouse or Warehouse similar to how we currently can utilize the Monitoring features in Azure to see Azure SQL server space and DTU usage when performing read/write operations? If I point a Power BI Report/Dataset to the Lakehouse SQL Endpoint and set an hourly refresh rate on that, it would be helpful to compare that read/write usage to the comparable Azure SQL metrics to see what makes sense to use cost-wise for different scenarios.
things are getting very excited! Is there are way to stream live SQL Server production data into Fabric? We need to report on live data but we don't want to drop and recreate the data sources.. what's the best option for streaming live data into Fabric?
Am i right in understanding that you can not create shortcuts in a MS Fabric lakehouse to directly access PAAS sql server tables ? So you need complex work-around to make it happen . What is the story behind this limitation ? Is there a more straightforward solution on it's way in future?
Got a specific question and don’t know how to navigate. I have an Azure Manager SQL Instance in the same Resource Group as my Fabric capacity however I cannot for the life of me get Fabric Data Factory to connect to it. I can connect to the instance remotely SSMS. Any idea what to trouble shoot? Looks networking related somehow.
If It aint broke don't fix It. This is just ETL with convoluted extra steps 😅 I am really starting to miss Blob Storage and Gen 2 storage and a fool prrof SQL Database.. If the Datalake filesystem was a GUI, fine. But cases I've seen you need a Scala/Python framework to handle Parquet ingestion/conversion. Oh and querying the data needing an Azure URL string is no fun 😭
I want to ask that as on one click fabric show dashboard report of power bi same as on one click fabric can show sql report mean as we find total sales,total order,max sales etc same these and other sql query can find in one click in fabric if this happen then it will be great then we dont need to learn sql we can get all sql queries in one click in fabric pls give my question
"I didn't write a single line of code"- as he shows approximately 60 lines of SQL Scripting. I think this no-code thing is not quite as ready for primetime as Microsoft wants us to believe.
Agreed. Settkng up external table managing probably isn't static and has to run in a stored procedure. On top of that tou have that sweet sweet SQL that needs to Query an dxact Azure storage URL 😂
Hot data is what you typically use. For instance measures and KPIs in Power BI, you jsually compare to "Same Period Last Year". So hot data is usually 2 years. Cold data is historic data >2years. Lets say year 3 through year 10.
Just waiting for the perfect world where delta performs as well as, or better, than an indexed relational data warehouse. I'm skeptical of Microsoft's use of delta in Fabric Warehouse - and I hope it improves, the performance has been poor. It seems like Synapse Serverless.
Using SQL Server On-Prem as Power BI source data must be one of the most extended use cases out there. Before Fabric exposing SQL Server tables to Power BI was very straight forward using Power BI Gateways.
Seems that now with fabric we need to create SQL Server partitions, then provision BLOB storage in ADLS, then create a CETaS to create that partition as a parquet file on ADLS, then expose this to OneLake using shortcuts, then format again the parquet virtualized files in OneLake using a delta table, then create the Power BI data model in Power BI and only then you can create a Power BI report out of a SQL Server table. Am I the only one that sees this as some convoluted crazy overengineering process?
Probably there are other ways using Azure Pipelines to copy data from SQL Server On-Prem to OneLake (not in preview I believe). But even when that is GA, the concept of having to copy your on-prem SQL Server structured data to a BLOB storage persistence layer to do simple reporting looks twisted. Companies that want to embrace Fabric are going to have to re-engineer their ETL processes to write on OneLake and deal with a heck of one-time data load for the existing historical data. Just to reach probably a worse performance scenario than the one they already have.
Direct lake connectivity might be the one saving grace on this whole thing... we will have to see. But as of now I'm very sceptical and a bit pissed at MS with their strategy to push everyone (eventually) to this one Lake architecture that in many use cases is absolutely not required.
Well said
Appreciate the long comment and thougts you put into that. To be clear, the approaches you've done in the past with Power BI and leveraging data from on-prem, still work! That hasn't changed. There are new approaches and things are evolving. If you still want to Import your data or use DirectQuery against something like SQL Server via a gateway, go for it! If you want to leverage OneLake for reuse of data, you can do that now as well. They are different architectures though. It's the question of would you benefit from more of a lakehouse approach, or something different. Fabric gives you a lot of options and you have different tools in your toolbelt.
A lot of companies already have a data lake type approach where data is already in storage of some kind. This is an easy fit for Fabric. As you mentioned, if you want to take your existing on-prem assets to a lake centric approach, there is some thought and architecture that needs to go into doing that. But again, if what you have is working, that gives you time to think about that approach and if it makes sense from a business perspective. What problem are you trying to solve? Does a lake centric approach with Microsoft Fabric's new capabilities help solve that problem?
The ability to use the gateway with pipelines, in Fabric, isn't an option yet. Get your feedback in at aka.ms/fabricideas! Lots of folks commenting the desire to have it - and I feel it makes sense.
Thanks for making this comment - I thought I was taking crazy pills watching that video. Taking data from an existing table, running it through several very brittle engineering steps, only to convert to a new table for use had me wondering what I was missing. Very skeptical of this use case.
@@jeffgray8076I agree with you. I would have liked to see Fabric as just repackaged Azure storage gen2+Data Factory+Synapse into RDB database like Azure SQL Database or Serverless.
This move to Parquet is ONLY beneficial to Databricks imo.
AND you get stuck with basically having to write a Scala/Python ELT framework just to handle the Delta Lake Ingestion/conversion/time logic..
Oh and It's real fun querying the data in ssms and having to use an exact Azure url 🤣🤣
Sidenote i hate Parquet and Databricks already after 8months tackling it
Not to mention you never knew the past years If Excel/Teams stored data in Onedrive/Sharepoint Onedrive/MS Stream.
AND now there's Dataverse also, that actjally looks very promising for the power platform with the new power pages to maybe replace sharepoint altogether!
Yoooo!!! Great content guys. You guys are making my fabric journey easy to navigate with your amazing content. I love the fabric community
This was my second video I've watched. Wowsers, blown away, all that movement of data and zero code.
A+
BAM! Welcome to the channel. Appreciate you watching!
would love to see a video from you guys showing some examples or best practices how to get data from different sources and databases into Power BI.
For example: we have a CRM running on mysql, a ERP on firebird and a financial ERP on MSSQL. What is the best strategy to bring data togehter in order to analyse it in Power BI. And where?! (in fabric one lake, in an azure sql Datawarehouse,...)
We are already thinking about this. When we think of on-prem data, for lakehouse and warehouse, the only real option right now is Dataflows Gen2. The existing approaches with Power BI and the on-premises data gateway are still valid approaches for reporting on data. That doesn't take advantage of the new workloads though.
I'll go back to what business problem are you trying to solve though and which tool will help you solve that problem. Going with a lake centric approach with Lakehouse/Warehouse/OneLake should be approached with some architecture in mind and the understanding that orchestration will need to take place regardless of the sources.
Question: I can easily extract data from a .html financial stocks table using a PySpark notebook w BeautifulSoup crawler, define schema and column headers, create a dataframe and save result in Delta (Stocks.part01,Stocks.part02..,Stocks.part0n) in Lakehouse subfolder under Files. (This could also be achieved using Python .py file as Spark Job Definition w schedule).Still Delta doesn't bring much options other than read (Preview) at this stage. So guess we have not much choice than create a SQL Table in Warehouse using CETAS. Sample shows a single file, but what if we need to... sort of "concatenate" all Delta parts (Stocks.part01,Stocks.part02..,Stocks.part0n) into one table from same Delta location.
Fabric & one data lake concept is cool
We agree! Can take things to a whole new level 👊
Any word on if we will have the ability to see usage metrics when moving items over from SQL into a Fabric Lakehouse or Warehouse similar to how we currently can utilize the Monitoring features in Azure to see Azure SQL server space and DTU usage when performing read/write operations? If I point a Power BI Report/Dataset to the Lakehouse SQL Endpoint and set an hourly refresh rate on that, it would be helpful to compare that read/write usage to the comparable Azure SQL metrics to see what makes sense to use cost-wise for different scenarios.
Great video, cool demo from Bob 🥳
Much appreciated! Thanks for watching 👊
things are getting very excited! Is there are way to stream live SQL Server production data into Fabric?
We need to report on live data but we don't want to drop and recreate the data sources.. what's the best option for streaming live data into Fabric?
Amazing, info! Thanks, Bob!
Appreciate that Scott! 👊
Am i right in understanding that you can not create shortcuts in a MS Fabric lakehouse to directly access PAAS sql server tables ? So you need complex work-around to make it happen .
What is the story behind this limitation ? Is there a more straightforward solution on it's way in future?
Got a specific question and don’t know how to navigate. I have an Azure Manager SQL Instance in the same Resource Group as my Fabric capacity however I cannot for the life of me get Fabric Data Factory to connect to it. I can connect to the instance remotely SSMS. Any idea what to trouble shoot? Looks networking related somehow.
If It aint broke don't fix It. This is just ETL with convoluted extra steps 😅
I am really starting to miss Blob Storage and Gen 2 storage and a fool prrof SQL Database..
If the Datalake filesystem was a GUI, fine. But cases I've seen you need a Scala/Python framework to handle Parquet ingestion/conversion. Oh and querying the data needing an Azure URL string is no fun 😭
Is there anyway to do this prior to SQL Server 2022?
I want to ask that as on one click fabric show dashboard report of power bi same as on one click fabric can show sql report mean as we find total sales,total order,max sales etc same these and other sql query can find in one click in fabric if this happen then it will be great then we dont need to learn sql we can get all sql queries in one click in fabric pls give my question
How long until the Azure SQL table can exist as a Fabric table (the lines are becoming blurred).
Great question! Not sure if that's even in the works. Get your feedback in though at aka.ms/fabricideas if that's something you'd like to see.
I wonder what the SQL DB compatibility level needs to be to achieve this.
There's no compatibility level requirement for CETAS.
Yo ! To both of y’all!!
"I didn't write a single line of code"- as he shows approximately 60 lines of SQL Scripting.
I think this no-code thing is not quite as ready for primetime as Microsoft wants us to believe.
Right....I was like how did you create the archive tables 😂
@@reams It is a fair comment. What we meant is no code is needed once I had the cold data archived in Azure Storage
Agreed. Settkng up external table managing probably isn't static and has to run in a stored procedure. On top of that tou have that sweet sweet SQL that needs to Query an dxact Azure storage URL 😂
Now how do I convince my IT team, who controls all this but doesn't use any of these tools, that we need this by the end of the week?
Cold data vs Hot Data ?
Hot data is what you typically use. For instance measures and KPIs in Power BI, you jsually compare to "Same Period Last Year". So hot data is usually 2 years.
Cold data is historic data >2years. Lets say year 3 through year 10.
Amazing content.
Glad you liked it! 👊
#QuickCreateEverything!!!
hahaha it got the job done!
Important question... are people seriously calling it "The Fabric"?
Just waiting for the perfect world where delta performs as well as, or better, than an indexed relational data warehouse.
I'm skeptical of Microsoft's use of delta in Fabric Warehouse - and I hope it improves, the performance has been poor. It seems like Synapse Serverless.
#IWatchPatrick #TheFabric
haha Patrick appreciates that! 🙏👊
great video
I just did a yo😂
BAM! and... YOOO!
the same thing that we have already in synapse analystics but now we can use a new word...shortcut....
Shortcuts are amazing! This helps you to leverage the next generation of Synapse!
Geat video, but anyone else feeling uncomfortable due to a lack of naming conventions used for sql tables and columns? 😀