Import vs DirectQuery in Power BI

  • Published on Oct 4, 2024
  • Should you use Import or DirectQuery data connectivity mode for your Power BI models? Let's make the right choice by comparing the performance and scalability of the two options!
    Optimizing DAX video course: www.sqlbi.com/...
    How to learn DAX: www.sqlbi.com/...
    The definitive guide to DAX: www.sqlbi.com/...
  • Science & Technology

Comments • 85

  • @kirillperian • 3 years ago +11

    All I have to do is show this to the client and drop the mic... Grazie, Alberto!

  • @abrammakgalemele3491 • 2 years ago +1

    Great stuff, Marco Russo. Thank you. There is a lot of meat in this video; I will watch it at a slow pace.

  • @philhersh • 3 years ago +30

    Power BI is much more sexy with an Italian accent.

    • @nicknick-71 • 3 years ago +1

      "Forza BI" in that case.

    • @jjessica7582 • 3 years ago +1

      Agreed 👍👍👍👍👍👍👍😁😁😁😁

  • @johng5295 • 2 years ago +1

    Thanks a million! Very well explained. This is the nth time that I am watching this. Great content. Awesome. I couldn't find this explained so simply anywhere else. “Great teachers are hard to find”. Grade: A++ 💥

  • @sane7263 • 2 years ago +3

    I was searching for this video and here I am. First, thank you, Alberto - great vid! ❤
    I have a question: if a one-side table in DQ mode is connected to a many-side table that is in Dual mode, why do we have a weak relationship?
    Even when both have a live connection to the same data source?
    (I know both should be Dual in order to have a strong relationship.)
    Every relationship is hard to understand, you know 😀

  • @ericet19 • 3 years ago +3

    Awesome video. Thank you for your expertise and taking the time to share it here.

  • @noenavarro7482 • 2 years ago +1

    Such a great video and comparison. Thank you for putting this together!

  • @franskrost • 3 years ago +1

    DirectQuery is also required if you need to keep the data on your own servers, e.g. for client data contract reasons.

    • @АндрейГаркуша-ж5и • 3 years ago +1

      It's PBIRS that is required for that kind of thing; DQ on even a few million rows will anger end users anyway.
      I like to think of DQ as a tool for analysts who use PBI with raw data (and who are fed up with Excel limitations), but when you build a solution for regular users, only Import is the way to go. You need to analyze the whole model and decide which part could remain in DQ - the minimum part possible.

  • @gonzaloj.coaquira7434 • 3 years ago +2

    Very good explanation, thank you very much.

  • @kebincui • 10 months ago +1

    Great video. Thanks Alberto 🌹👍

  • @huqsami • 3 years ago +1

    Thanks, beautifully explained!!!

  • @trinitxu • 3 years ago +5

    Well explained! I am very intrigued by your monitor setup. How do you use five monitors, and what is the role of each in data modeling? Are you using a special graphics card or a USB adapter for this?

  • @vishaljhaveri7565 • 1 year ago

    SQL Server database: choosing between Import mode and DirectQuery mode in Power BI.
    DirectQuery mode is better than Import mode if you have a very large database and you are a professional working on that large database.
    Otherwise, Import mode is better, as VertiPaq retrieves the data and evaluates the calculations way, way faster than DirectQuery.

  • @RodWhiteley • 3 years ago +4

    One thing I am confused by is the refresh experience; I was hoping you might clarify. An advantage of DirectQuery is the immediate background refresh of the source data without any scheduled refresh. Is this advantage worthwhile in your opinion, or what do you think the workflow should be for Import mode to ensure the user experience is good with up-to-date data (say, when new data is being added daily)?

    • @SQLBI • 3 years ago +7

      Import is better for:
      - query performance
      - scalability
      DirectQuery is a second choice for cases where the refresh time required by Import is not acceptable. Disadvantages of DirectQuery:
      - slower query performance
      - reduced scalability (the cost of each query is bigger, and concurrent users become a problem much faster than with Import)
      - data consistency: the total in a report could be different from the sum of the individual values in the same or in another visual, because there are different underlying queries - if data changes in the meantime, the user could notice the inconsistency. If you want real-time, you get real-time, with all the consequences.
      You can definitely refresh data every hour (with Analysis Services we have implementations that refresh every 15 minutes).
      If you need a latency lower than 15 minutes, then you need DirectQuery. But you have to manage the related costs and disadvantages.

    • @thanomnoimoh9299 • 1 year ago +1

      @@SQLBI I read your answer too. All clear.

  • @Kmsbi • 3 years ago +2

    Thanks for sharing - great video!

  • @nithinshankar866 • 3 years ago +5

    You didn't clear the cache when running the DAX the second time. Won't that give incorrect server timings?

    • @SQLBI • 3 years ago +1

      Good catch! In this particular example the difference was not there, but only because we were lucky; for this reason we didn't do another take. In reality, we should have done that!

  • @chaitanyabhardwaj6371 • 1 year ago +1

    Grazie mille!

  • @jorgeeduardomoltenimoreno3048 • 3 years ago +1

    Great video thanks!

  • @dba3579 • 2 years ago +1

    Alberto, great information. Can I ask you about the desktop memory requirements in Import vs DirectQuery? Would Import consume more of the desktop's memory? Btw, you make using DAX Studio look so easy. Thanks.

  • @vsr1727 • 3 years ago +1

    Thank you

  • @JeffersonCanedo • 2 years ago +1

    Nice video thanks 👍

  • @sau002 • 2 years ago +1

    Very nice video. What was the tuning you did on SQL Server to give it the performance boost?

  • @ansharora5698 • 4 months ago

    Any way to connect to a database over an SSH tunnel?

  • @maazmajid2948 • 3 years ago +1

    Thanks a lot!

  • @Irmasdedragoesdepapel • 3 years ago +1

    How do we connect SAS?

  • @gampashivakumar • 3 years ago +1

    Excellent

  • 3 years ago

    Awesome content as usual! I'd like to ask two questions though:
    1) How much RAM does the 8bn-row dataset take on PBI Desktop? I would have thought it impossible to have so much data on a "normal" desktop machine.
    2) Does the same apply to complex DAX calculations? If I use measures that calculate temporary tables in the process (the SUMMARIZE / GROUPBY families), wouldn't there be cases where the performance may tip towards DirectQuery? We've been having some performance issues with some dynamic retention calculations in PBI and were told that moving to DQ mode may help. Now I'm no longer sure!

    • @SQLBI • 3 years ago +1

      1) The 4bn-row model requires Analysis Services or PBI Premium; it needs 16 GB of RAM and partitioning (not available in Desktop).
      2) No, it would be incredibly strange to see a case where DAX runs faster with DirectQuery. First, optimize the DAX...

  • @excelrati • 3 years ago

    Ciao Alberto.
    Thank you for yet another great video, as usual.
    Do you have a comparison between Import vs Live Connection?

    • @SQLBI • 3 years ago

      If you have a Live Connection you consume a remote model. By using Import, you create a copy of the data, losing all the measures and relationships of the remote model. Take a look at the recent videos about the new composite models.

    • @excelrati • 3 years ago +1

      @@SQLBI will do. Thanks a million!

  • @FabioRBelotto • 7 months ago

    Which SQL servers are optimized for DirectQuery?

  • @aurelbk8214 • 2 years ago

    I have an SSAS Tabular cube. What is the best mode to connect to the data source: Import mode or Live Connection? Thank you.

  • @erinfield1943 • 6 months ago

    But what if you have users who use the pbi service dashboard without telling you, and you need to click refresh in the backend first in order for them to get the up-to-date info they need? My report is geared toward users who don't know how to use the desktop version.

    • @SQLBI • 6 months ago

      You can schedule an automatic refresh on the backend in Power BI.

  • @hafidzfairiz8398 • 1 year ago

    Thanks, well explained! But I'm concerned about the size of the .pbix file; it must be one of the disadvantages of Import mode.
    Do you have some tips and tricks about it?
    To be honest, I really need Import mode for speed, but at the same time the size of the .pbix file is annoying me.
    Thanks.

    • @SQLBI • 1 year ago

      There ain't no such thing as a free lunch!

  • @pl4195 • 2 years ago

    Hi Sir, I have one question. I make a sales report every week with one Excel file as the data source. How can I save all these weekly reports in the same Power BI report, so I can look back at a previous report by clicking a filter?

  • @Bharath_PBI • 3 years ago

    I feel Import should be the first choice; DQ should only be used as an exception, as suggested. Great. 👍
    I have a DirectQuery model, but I wanted to make it into a composite model, so that I can use some derived calculated tables in DAX, along with making the dimensions Dual so that slicers fetch their lists quickly.
    Do those 4bn rows fit within the 1 GB model-size limit of Power BI Pro, along with the other dimensions, which would probably be a few million rows? What is the model size shown in the .vpax?

    • @SQLBI • 3 years ago

      The model is around 16 GB in memory.

    • @Bharath_PBI • 3 years ago

      @@SQLBI Wow - if it is loaded in Power BI Desktop, then you need to have enough RAM and CPU. Since it's a memory-intensive operation, how much RAM is recommended for this 4bn model?

    • @SQLBI • 3 years ago

      32 GB should be enough for that particular model. Please note it is extremely optimized for Tabular; the compression depends on the number of columns and the data distribution. You cannot generalize.

    • @Bharath_PBI • 3 years ago

      @@SQLBI Agree. Thank you 👍

  • @MattiasGregersen • 3 years ago

    In the last example between Import and DirectQuery, could you give some info about the size of each model? 4 billion rows sounds like it could put some serious demands on the machine used.
    Thanks anyway, great video!

    • @SQLBI • 3 years ago +1

      The 4 billion rows are stored in 16 GB of RAM (it is extremely optimized, of course!).

    • @atomek1000 • 3 years ago

      @@SQLBI How big was the PBIX and how long did the import take? Dataset refresh time needs to be considered in a real-world scenario, especially on Premium nodes that have limits on the number of concurrent refreshes.

    • @SQLBI • 3 years ago

      The PBIX is 1.5 GB. You can use partition refreshes in Premium (not the incremental refresh used by default in Power BI). Processing time is several hours (full refresh), depending on the hardware.

  • @devallamahesh989 • 2 years ago

    I have two sources: one is SQL and the other is a web connection. I merged both tables using M code in Desktop; it refreshes there, but in the Service it does not refresh. Can you tell me what the reason is?
    Error: rebuild the query

  • @angusadams • 3 years ago

    Great video thanks. Can you outline the rough steps that would need to be taken to optimise SQL server for direct query?

    • @SQLBI • 3 years ago +3

      No silver bullets, you have to analyze queries and optimize the SQL Server database.

    • @tangtom2478 • 3 years ago

      I also second the comment. Maybe some examples would be helpful! Thank you!

    • @jurgenfabender9719 • 3 years ago +2

      @@SQLBI like always 😄😉

    • @SQLBI • 3 years ago +1

      You have to look at other channels for SQL Server; we cannot cover that, too! :)

  • @trench6118 • 3 years ago

    How do you feel about DirectQuery with dataflows created with the enhanced compute engine? Do you think it is best to filter the dataflow for a smaller amount of data, create the model, and then publish and remove the filters, compared to using DirectQuery? It takes a lot longer to import data to PBI desktop (or perhaps it is due to laptops + VPN and multiple Geographies), so it can be tempting to use DirectQuery (also since the dataset will refresh automatically instead of requiring a separate schedule - the refresh of a source dataflow does not trigger the refresh of downstream datasets).

    • @SQLBI • 3 years ago

      You can publish the model on PBI Premium (now available per user) or Azure Analysis Services, so you don't have the memory limit of the client.
      Some aggregation and pre-aggregation can reduce the size so the dataset fits in memory. The main takeaway of the video is to clarify the extreme performance difference between the two options, because it is a common mistake to choose DirectQuery when Import is the better choice.

  • @sau002 • 2 years ago

    Alberto, were you involved in the development of the Vertipaq engine?

    • @SQLBI • 2 years ago +1

      No: VertiPaq has been developed internally by Microsoft, we do not work at Microsoft.

  • @maurotorres5675 • 1 year ago

    Was your SQL Server on-prem or on Azure? I would expect the latter to probably run much quicker.

    • @SQLBI • 1 year ago

      No, it would have been slower because of latency and bandwidth. The hardware used in these demos is good...

  • @ketanpatil3489 • 3 years ago

    Hi Alberto - what if we optimize the SQL/DB tables into a denormalized form, with fact and dimension attributes in a single table, and then use DirectQuery on that in Power BI... will it help? Please do share more tips on DirectQuery and also on composite models. Thank you.

    • @SQLBI • 3 years ago

      A single table in SQL wouldn't be optimized; the star schema is the best approach. However, if the tables are large and you want good performance, you also have to implement columnstore indexes and materialized views. It is going to be expensive...
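
      As a rough illustration of what that tuning can look like on SQL Server - a sketch only, with hypothetical table and column names, not the setup used in the video:

      -- Columnstore index: compresses the fact table and speeds up the
      -- aggregation queries that DirectQuery generates.
      CREATE CLUSTERED COLUMNSTORE INDEX ccs_Sales ON dbo.Sales;
      GO
      -- Indexed (materialized) view: pre-aggregates at a coarser grain.
      -- SCHEMABINDING and COUNT_BIG(*) are required; SalesAmount is assumed NOT NULL.
      CREATE VIEW dbo.SalesByDateStore WITH SCHEMABINDING AS
      SELECT OrderDateKey, StoreKey,
             SUM(SalesAmount) AS SalesAmount,
             COUNT_BIG(*)     AS RowsCount
      FROM dbo.Sales
      GROUP BY OrderDateKey, StoreKey;
      GO
      CREATE UNIQUE CLUSTERED INDEX ucx_SalesByDateStore
          ON dbo.SalesByDateStore (OrderDateKey, StoreKey);

      The optimizer can then answer matching aggregate queries from the view instead of scanning the whole fact table (automatically on Enterprise edition, or via WITH (NOEXPAND) on other editions).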

    • @bandishgupta2046 • 3 years ago

      @@SQLBI Hi Alberto, as star schemas are the best approach, what approach should be followed to convert the source model to a star schema when the source is an ERP and a CRM and DirectQuery is the requirement? Are views created at the source level to convert to a star schema?
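
      One common pattern - sketched here with hypothetical ERP/CRM schema, table, and column names - is a layer of views at the source that reshape the operational tables into dimensions and facts:

      -- Dimension view: one row per customer, taken from the CRM account table.
      CREATE VIEW dw.DimCustomer AS
      SELECT AccountId   AS CustomerKey,
             AccountName AS CustomerName,
             CountryCode AS Country
      FROM crm.Account;
      GO
      -- Fact view: one row per order line, with keys pointing at the dimensions.
      CREATE VIEW dw.FactSales AS
      SELECT o.OrderDate,
             o.AccountId AS CustomerKey,
             l.ProductId AS ProductKey,
             l.Quantity,
             l.Quantity * l.UnitPrice AS SalesAmount
      FROM erp.OrderHeader AS o
      JOIN erp.OrderLine   AS l ON l.OrderId = o.OrderId;

      With DirectQuery the engine still pays for these joins at query time, so on large sources such views are often materialized into real tables during the ETL instead.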

  • @MichaelsGaming2032 • 3 years ago

    Hi Alberto - I appreciate the video, and the trick on the query timing will come in handy.
    However, the comparison seems a little incomplete, or it reviews only the consumption side of things.
    Can you comment on the time it took to load the 4bn rows into memory? While the end query may be faster, what if the dataset was updated the following hour - then Analysis Services may be needed? Thanks, and good video.

    • @SQLBI • 3 years ago

      The 4bn rows in the example come from a real project where the incremental update can balance the processing time. However, the reason we created this video is that we measured that 90% of the DirectQuery models we see with performance issues did not have a valid reason to be in DirectQuery instead of Import mode. Many people are not aware of the huge performance difference at query time.
      Of course, as always, it depends.

    • @MichaelsGaming2032 • 3 years ago

      @@SQLBI Of course, got it. Let's say the model was updated weekly, monthly, or annually - something where constant maintenance didn't need to take place. For your computer specs, how long did the model take to load?

    • @SQLBI • 3 years ago +2

      Loading time depends on too many factors: data model, data source, hardware. The 4-billion-row example requires a few hours, but it has a very small number of columns. There could be tables with more columns and fewer rows that require days for a full refresh because of a slow data source, but only a few hours for the monthly refresh. As usual, it depends...

  • @jurgenfabender9719 • 3 years ago

    Alberto, what would you do in an environment where the data is held in an SSAS Tabular cube, but the dashboard is in the Power BI service: DirectQuery mode or Import mode?

    • @SQLBI • 3 years ago +5

      Live connection!

  • @zeisslaimen1 • 3 years ago +1

    As always, the Italians are the best.

  • @ThePunisher005 • 3 years ago

    I'm wondering if Import mode will create serious issues when running large datasets. Loading all these files into memory will create serious performance issues, and it also depends on the complexity of your model. I'm using a model that has connections to over 20 various databases, and not all of them are structured properly; if I used Import mode, I'm sure my query would crash or my memory wouldn't cope.
    When you say large data, what is the definition of large data?
    What if you are dealing with unstructured data?

    • @SQLBI • 3 years ago +1

      Importing data does not mean "import data as-is". You should follow best practices in data modeling for analytics; see www.sqlbi.com/blog/marco/2017/10/02/why-data-modeling-is-important-in-powerbi/ and www.sqlbi.com/articles/start-learning-data-modeling-for-free/
      To apply data transformations you need a proper tool, like SQL, Power Query, or specialized ETL tools.
      Once you have a proper data model, if you have more than 50 billion rows in the largest table and you cannot reduce the granularity, then you need DirectQuery - but don't expect response times under one second.
      On the other hand, if you connect a DirectQuery model to 20 different data sources without any transformation, you will probably get bad performance and big scalability issues for the data sources.
      The choice between Import and DirectQuery mode discussed in the video assumes we are getting the same data (model).
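
      As a small illustration of reducing granularity before Import - a sketch with hypothetical table and column names:

      -- Aggregate transactions to one row per product per day before importing;
      -- this often shrinks the model by orders of magnitude.
      SELECT CAST(TransactionTime AS date) AS TransactionDate,
             ProductId,
             SUM(Amount)  AS Amount,
             COUNT_BIG(*) AS TransactionCount
      FROM dbo.Transactions
      GROUP BY CAST(TransactionTime AS date), ProductId;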

    • @ThePunisher005 • 3 years ago

      @@SQLBI Thanks so much for your kind reply. I work in advanced analytics in a bank with over 100 years of operation (some of our corporate clients have current accounts that are 50+ years old). I look after various data sources from origination, marketing, risk, sales, digital analytics, operations, programs, finance, etc., with very strict IT and a very challenging mindset around IT infrastructure design and security.
      Digital analytics by itself is 7 different databases that I'm not sure how you would model, because the source or scope of the data doesn't always align with the other tables, e.g. (pageviews vs events) or (behavioral vs transactional).
      IT tried doing a proper restructure of our mainframe data; that process took over 7-9 years and massive losses in investment, such that today I'm questioning whether such an investment was even necessary. Yes, we evolved from the 1970s to the 2000s, yet when I look at your data model I just can't help feeling "it must be great where you are, because that is nothing close to our reality" - unless you are working in a smaller or newer company with limited historical data; then yes, this is totally possible.
      While I believe a well-structured, best-practice set of tables managed by effective ETL processes can be something amazing, achieving such a vision might cost tens of millions of dollars and years of development in such an environment. Is this ideal? Definitely not, but it is reality, and it is very complex to change the bank. Marketing keeps adding new sources on a regular basis, while risk keeps expanding their tools for fraud detection, and the same goes for other areas. So having a perfect data infrastructure is not practically possible, and therefore I'm constantly challenged to balance between best practices and what is possible. Splitting these databases into different tables will be a very interesting project, but I'm sure it will take hundreds of different tables to properly structure the data. I will look into this methodology and see if it is possible in my world.

    • @marcorusso7472 • 3 years ago

      @@ThePunisher005 A virtual data warehouse or a semantic model that links 20 different data sources without any transformation is just a way to postpone the solution. Which could be a good idea if you don't need a solution at all.
      I completely understand the problem of the physical transformation into a single, consistent data warehouse. However, the underlying problem is the consistency of data coming from different data sources. You can try to solve this problem in many different ways, applying business logic to transform and reconcile data in different layers. However, I am pretty sure that there is no single product or methodology that can remove the "manual" work required to identify how to correctly combine and adapt data.
      We can try to be more flexible in the methodology (we published one a few years ago: www.sqlbi.com/whitepapers/sqlbi-methodology/ ), but you always have to find a balance between:
      - creating "one version of the truth" that works for any question
      - quickly creating a report that provides an answer to a specific question, but must be rebuilt for a slightly different one (or for a different data source)
      - validating data within an acceptable error margin
      The last item is often underestimated. Many recent approaches are not easy to validate and certify, because there is a single transformation of the data and it is hard to validate the numbers. This is not a problem if you are measuring the number of likes on YouTube videos (you can live with some error), but it's definitely not acceptable if you produce the quarterly reports for a publicly traded company. In the middle, it depends.
      You don't have an easy problem to solve, I know!

  • @joannak4329 • 3 years ago

    Can I have DirectQuery from an OData feed? PBI does not allow me to switch.

    • @SQLBI • 3 years ago +1

      No, DirectQuery is supported on a limited number of data sources: docs.microsoft.com/en-us/power-bi/connect-data/power-bi-data-sources

  • @baskarvijayan3282 • 3 years ago

    Hi Alberto,
    I need your help to understand what's wrong with my model with a Direct connection (Oracle).
    Source: Oracle
    Connection type: DirectQuery
    Tables used: Global Team Members (max 25,980 rows), Address (max 400,000 rows)
    Direction: one (Address) to many (Global Team Members)
    Visual used: Table
    The direct query executes in less than 300 ms in Oracle, but the DAX query takes up to 3 minutes.
    Only direct columns are used in the table visual; there is no calculation.
    With the help of DAX Studio I have captured the timings:
    Storage engine: 300 ms
    Formula engine: 3 minutes
    I need your help to understand what is going on. I've been working on this for the past 2 weeks and have referred to many articles, but found no clue/help for my case.

    • @SQLBI • 3 years ago

      It's probably something related to the DAX measures. You can use our remote consulting services for this: www.sqlbi.com/p/remote-consulting/

    • @baskarvijayan3282 • 3 years ago

      @@SQLBI There is no DAX measure

    • @SQLBI • 3 years ago

      It could be something related to the relationships. You should analyze the query plan and try to reduce and isolate the problem, working on the DAX query executed.