Can't thank you enough for 1) your knowledge, 2) the way you share your knowledge, and 3) the clarity. Your talent and humility are both amazing!
I'm smiling while watching this video. This guy is not only the GOD of DAX, but also has the humor of a smart man!
Man, that was impressive from start to end.
I'm starting to get into Data Analysis and trying to move into Data Science as well, and this content is crystal clear.
Thanks for the content!
Just started the video and already know that it is going to be a good one! Your work is of astonishing value!
Another great video. My approach has always been to create calculated columns in Power Query so that they're available to use in PQ transformations, which isn't the case with DAX calculated columns.
This is the best explanation I have ever seen so far! Clean, simple, and satisfying. Thanks a lot!
I will state the obvious: they are the best DAX teachers on the planet, and they have had a significant positive influence on the careers of many people. Microsoft is very lucky to have them; they uncover the power of the tool.
Haha, I loved section 1, Alberto. Thanks a million for all these videos. I have learned so much from you!
I literally almost spit my breakfast out when you said "That's the end of section 1" 😂
Great video as usual
Very good tutorials. Everyone will benefit from your very deep discussion. Thank you, Alberto Ferrari. Respect & ❤️ from Bangladesh.
This was a legitimately interesting video, with a clear explanation. Thanks.
Great explanation. I guess an advantage of using PQ over DAX in this instance is that more of your similar logic (i.e., data modelling) lives in one place.
This was a great insight into DAX and model management. Have subscribed and will be back for more. Many thanks from Scotland.
Thank you for sharing your knowledge. Which method is best for performance?
It depends!
Incredible, thanks for your great contribution. A big hug from Brazil!
Super interesting point about the partitions that I had never considered. I'm glad I watched this video.
Soooooo helpful! Thank you Alberto. This answered my question in the exact detail I needed.
Thank you, enjoyed the reasoning behind using the different methods.
Thank you for the explanation of the subject. I have a question: is there a method to optimize a model given the hidden optimization logic of the Power BI database engine?
Thank you... my DAX is 10 times better with your videos!
Amazing class... You are the best, Alberto.
Yes, but one place where we cannot use a DAX calculated column is as a sort-by column, as it creates a circular dependency.
The first part of the video was really quick, and I was expecting it. Happy to hear you talking about large models.
Yes sir, we're with you till the end of the video. Why are the DAX Studio numbers off at 15:00? I have faced this sometimes too, especially after resizing the columns.
It depends on how you create the calculated column; a circular dependency is not guaranteed, it depends...
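To illustrate the point in this thread, here is a minimal DAX sketch on a hypothetical 'Date' table (all column names are assumptions, not from the video). A calculated column that depends only on stored columns can usually be used as a Sort By column, whereas a calculated column defined with CALCULATE triggers context transition and therefore depends on every column of its table, which is what can create the circular dependency:

    -- These two calculated columns depend only on the stored [Date]
    -- column, so [Month Number] can usually be set as the Sort By
    -- column for [Month Name]:
    Month Name = FORMAT ( 'Date'[Date], "MMMM" )
    Month Number = MONTH ( 'Date'[Date] )

    -- This column uses CALCULATE, so context transition makes it
    -- depend on every column of the table, including [Month Name];
    -- choosing it as the Sort By column for [Month Name] would close
    -- the loop and raise the circular-dependency error
    -- ([MonthKey] is an assumed stored column):
    Month Number CT = CALCULATE ( MAX ( 'Date'[MonthKey] ) )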
@SQLBI Yes. 👍
It would be interesting to show a fourth example of this calculated column in SQL. Although I realize that the point of this video is that it doesn't really matter.
I was so curious about that subject and happy to see you cover it 🙂 but I was also curious about query performance. Yes, you said there would not be a big difference because of the model size, but it would be good to see it in terms of seconds.
The difference, if any, would not be measurable. It is way below the margin of error of the clock used to gather the timings. Only in some veeeery borderline scenarios might there be a difference, due to compression.
This is truly a MASTERPIECE.
I have a 25M-record view that I pull into an on-premises SSAS model from an Oracle cloud database. Unfortunately, I don't have the ability to create a materialized table from the view; however, I can modify the view on the server. I had three options to create columns:
1) Create additional columns in the SQL view - probably the best practice, but due to the complex logic it's much easier to do it in Power Query.
2) Create additional columns in Power Query - I created the columns, but the solution started to fail. I suspect this happens because Power Query tries to apply query folding on top of an already complex view on top of a cross-cloud data pull; it either timed out or the Oracle database choked running the view due to insufficient resources.
3) Create additional columns in the model itself using DAX - I had no problem doing this, even though they have to be recalculated every time a single partition is updated (there are 3 now).
In my case, I used DAX calculated columns as a way of distributing the load between the Oracle database, which lacked the resources to execute the entire view, and the Analysis Services database, which performed the last step in creating the complete view.
In another case, I had to create a calculated column based on logic that required two tables coming from different clouds. Power Query did the join operations extremely slowly with cross-cloud data on 25 million records, while the DAX engine didn't even blink.
So the answer is IT DEPENDS... :) There could be hundreds of reasons to prefer one option over another. In general, my order is SQL, Power Query, DAX, but that's not always possible for one reason or another, and I have to deviate from the general order.
Great video--incredibly helpful and informative.
Very well explained. Thank you.
Good and interesting question. Complete and useful answer. I'll take it into account. A thousand thanks!!
Nice one, very funny. I laughed when you said "That's the end of section 1" and then kept staring, hahaha. You fooled us! Nice!
I like to think of Power Query for data cleaning and DAX for analysis purposes. Splitting into two columns? Power Query. Price range? DAX. Great video as usual!
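For readers new to the pattern: a banding column like that Price Range might look like the following in DAX. This is only a sketch; the table name, column name, and band boundaries are assumptions, not necessarily the ones used in the video.

    -- Calculated column on a hypothetical Sales table, assigning
    -- each row to a price band:
    Price Range =
        SWITCH (
            TRUE (),
            Sales[Net Price] < 100, "Low",
            Sales[Net Price] < 1000, "Medium",
            "High"
        )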
I am really amazed by your explanations and the outcome, so I can foresee some changes to some of my models. However, does it impact the model that the "Price Range" calculated column created through Power Query was defined not as Text (ABC) but as an undefined field (ABC-123)?
Thanks again, learning from you is always great!
Very well explained, thank you!!!!!!!!
Great explanation, Alberto.
Thank you so much!
Your videos are great and insightful as always
Thank you. Great video, as always.
Thanks so much! Great video
For me, one point: when you increase the number of DAX calculated columns or tables in your model, it increases the waiting time to create or edit measures, relationships, columns, and tables, especially measures that refer to calculated columns or tables. In huge datasets, creation can take too long.
This is caused by the internal validation of the model, which also happens when you create many measures. When you have this problem, you have another reason to switch to editing in Tabular Editor, which is not affected by this issue (validation happens when you commit the changes, not for each single one).
Great video and chess in the background! :)
Hello Alberto,
there is a big discussion going on in the company I work for. One part of the reporting team considers the use of Power Query to be bad practice. Their view is that all transformations and calculations, except perhaps some basic filtering, should happen in the data source itself (we base our reports on SQL Server databases). The other part is in favor of Power Query, since it is extremely flexible and allows adjustments to be implemented very fast.
Which group has the better point? Is there an easy answer to this question?
As usual, no easy answers. But if you have a central data warehouse with all the transformations made there, it's wonderful.
If you have to create a quick report combining data from different sources, Power Query is very handy.
In the middle, it depends. But we think that both tools are useful in different scenarios.
@SQLBI Thank you, Alberto! Always looking forward to learning new things from your videos!
I love the way you explain this! I have one question: is this the same for tables created with DAX and tables created in Power Query?
It's not the same. Calculated tables are compressed in an optimal way (as opposed to calculated columns), but the price to pay is that the entire table must be materialized uncompressed in memory before compression, so it can be memory-expensive. It can be a bad idea for large tables, which require too much memory during the process.
When huge models can be updated with ALM Toolkit, the calculated column wins.
I used Power Query to create a primary key in a 2-million-row dimension table and then merge that key into a 6-million-row fact table; applying the query to the model took more than 15 minutes. I guess sometimes it's better to just go back to SQL for these things.
Yes, SQL is better if there's no query folding.
Definitely - managing slowly changing dimensions (SCD) is a very bad idea in both DAX and M - use SQL or other ETL tools for that!
Great video! Thanks as always.
He really knows this stuff.
Very informative! I have a question, Mr. Ferrari: how did you come to learn about these "sorting" operations that happen during data load? Are you in contact with the Power BI developers directly?
This is how the VertiPaq engine works. It was explained by the product team in public sessions during conferences in 2010-2012 and we reported it in our content, with additional details we gathered over time.
What about group by? Should I use Group By in Power Query or create a new table via DAX?
It depends - do your own benchmark and evaluation. The general pros and cons are the same as described in the video, but your priorities might change depending on the specific data model.
@SQLBI Thank you! I will analyse and decide whether to go with Group By in Power Query or use SUMMARIZECOLUMNS.
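For reference, the DAX alternative mentioned in this thread is a calculated table. A minimal sketch, assuming hypothetical Sales and Product tables (names and columns are assumptions):

    -- Calculated table grouping Sales by product category:
    Sales by Category =
        SUMMARIZECOLUMNS (
            'Product'[Category],
            "Total Quantity", SUM ( Sales[Quantity] )
        )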
Super useful tutorial. Thanx :-)
LOL I loved section 1 of the video 😉
The main idea of any video on this channel: you don't know the power of the DAX side, young Padawan!
I find one important aspect that you haven't mentioned in this video: the (very slow) speed of Power BI Desktop when recalculating the model when you have several calculated columns in it. I think we should vote for an idea for manual recalculation of the model in the settings. What do you think?
That was brilliant!!
I always stay till the end 🌝
Lol, very funny short answer. Excellent video!
Thanks
Terrific video!!
Although this was good, I don't believe it was sufficient to test just one new column against another. The overall model size was a couple of hundred MB smaller using PQ; if 10 new columns were added, would the gap start to widen and a clear winner (my guess is PQ) emerge? Regardless, you guys are awesome 👌
As usual, it depends. In general, PQ is better when you just need to compute other columns of the same row, whereas DAX can be more convenient when you have to aggregate rows of another table. But there are exceptions, so it's better to know all the options available.
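For example, this is the kind of cross-table aggregation that is a one-liner as a DAX calculated column but comparatively awkward in Power Query. A sketch, assuming a hypothetical Product table related to a Sales table:

    -- Calculated column on Product; the context transition performed
    -- by CALCULATE filters Sales down to the current product through
    -- the relationship:
    Quantity Sold = CALCULATE ( SUM ( Sales[Quantity] ) )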
Hilarious!!!
"That's the end of Section One"
The Godfather of Power BI
My wait time increases when creating a DAX column.
Are you still there?? Hahahaha