Thanks for the follow up, i was really confused when I was reviewing my models. To people asking about performance. If you're having to use filter within your measures over a large dataset then I would make a calculated column that returns 1 if the logic is met, then have a measure that just sums the calc column. If you are only doing arithmetic then the calc column doesn't make much sense, unless you are going to have the value presented in a raw data table in your report. With a raw data table, the measure will execute over each row, and you will possibly exceed resources or have performance issues. As Marco is pointing out, it's all a balance of storage, performance, and customer requirements. Sometimes an extra column can't be done if the table is too large.
An effective engine data to pack; fabulous, only the necessary. Thanks Marco and Patrick for the power of vertipaq and the session with this data sample.
What I'm thinking of is, could you mention a few regular use-cases where you would sacrifice memory for using a "calculated column" rather than an iteration function? Let's say summing up 5, 10, 20, 50 columns, or if the calculated column's logic is just "complex" (I know-I know, "What is complex?" :D), using conditions, multiplications, etc. How can you decide not to use a measure, but rather store the value in a column, so you can just sum on that, meaking the measure much more faster and less resource intensive. Thanks in advanced.
It depends. For example: with 10 billions rows a column with Line Amount could be 30% faster than multiplying the two columns. However, the storage for the additional column could be 20-30 GB. In a report, the difference can be 1-2 seconds faster, something like 3 seconds instead of 4.5 seconds. Memory is not free, CPU is not free. Customer choice, but the real issue is that you cannot continue to add memory, so you have to make choices. However, if you have 10 million rows, the difference is smaller in percentage, it could be actually faster and in any case the difference (if any) would be in milliseconds. So I would always save memory in those cases. For Power BI, it's a no brainer: save memory - you cannot have 10 billions rows.
I have what I think could be a common problem to many: My millions of rows of data have a unique identifier in them. I need to keep the identifier despite its terrible cardinality :-(. However, it's worse that that. I need to add a string prefix to the unique Id field because it ends up being a lengthy URL which I display as a link icon in the UI within a table. This makes the problem much worse because the size of the column and its unique values prevent any significant compression and the file becomes huge. Is there something that I can do to improve this in the DAX instead? Maybe in a similar way to your SUMX function described in the video? My concern would be that a measure that 'calculates' the eventual URL string would get run for every row in my display table whenever a filter was changed and that would take several seconds...
I second this comment. This video kind of frustrated me a little because how can we remove IDs when we want our users to be able to identify and return to the records at the Order or OrderLine level of granularity? Is this for only summary level models that don't require drill-down? And I agree that it's even worse in most datamodels because of the hyperlinks that are created for these IDs to be used on the querystring because it's not only one column for each ID, but now two because of the URL. These ID and URL-to-ID-record columns take up the vast majority of space in our model. After thinking about this some more, I was able to make the hyperlink into a measure instead of a calculated column www.screencast.com/t/9o4sT9jV1 which saves a little bit of space, but not nearly as much as I was expecting based upon what DAX Studio showed. Also, you can't just get rid of an ID that is used to link tables together can you? Granted the SUMX function was a true size-saver in that you didn't sacrifice any functionality but could still save some space, but I left feeling that the ID removal was completely impractical. Maybe I'm just missing something... I actually hope that I am.
Hello, thank you for the video. In the last portion of the video, by using SUMX in a measure instead of a calculated column - granted, the model size is reduced. But by doing this, won't you put additional strain on the formula engine, and user interactions that use the measure with SUMX be significantly slower?
Awesome video! Managed to get a recent report from 200 down to 40. Seeing the impact of a single column is really useful. Do I really need it, can I use a measure instead of a column? It really makes a difference :) Thanks!
Nice tips, thank you. I think there is a balance to be struck as well though... I generally advise creating calculated columns as far back in the process as possible - ideally on the database prior to connecting Power BI. This is to centralise the "single version of the truth" and minimise maintenance. This is an approach I would stick with unless performance proved to be an issue. Thoughts?
Correct - but the memory consumption described in the video is not caused by the calculated column, it is caused by the cardinality of the column that produces low compression.
What happens to rendering speed when executing the new measure? Maybe it doesn't matter so much in this example, but in general, I would think you would embed compution heavy formulas in a calculated column, especially if there are dependencies.
Thanks for the video, always love the performance optimization videos! I was just wondering though; is it always better to replace calculated columns with measures? Isn't a calculated column calculated once (on load of the initial report), whereas measures are recalculated with constantly when you navigate the visuals pane? I'd imagine there would be some cases where I'd rather want to wait a bit longer initially to have snappier performance navigating the dashboard once loaded.
Calculated columns are not compressed with as much efficiency as the regular data due to when they are created in the build process. Therefore from a compression standpoint, they are not preferred.
@@dbszepesi This is very specific to calculated columns, if you create it in PQ, you are "fine". Your model will be still bigger, obviously, but the engine can compress your model therefore making it smaller compared to a DAX calculated column.
@@arklur3193 Not really, the biggest difference in compression is the number of unique values of the column, which is bigger in Line Amount rather than Quantity / Net Price if you consider this example. The difference between computed columns and calculated columns is minimal and relevant only for columns that have a small number of unique values (up to 10/100, certainly not 1000 or more).
If you're having to use filter within your measures over a large dataset then I would make a calculated column that returns 1 if the logic is met, then have a measure that just sums the calc column. If you are only doing arithmetic then the calc column doesn't make much sense, unless you are going to have the value presented in a raw data table in your report. With a raw data table, the measure will execute over each row, and you will possibly exceed resources or have performance issues.
Quoting a comment from below: "I had the same problem. But then downloaded and installed 2.9.5 from ci.appveyor.com/project/DarrenGosbell/daxstudio/builds/28512060/artifacts and it worked as expected."
I had the same problem. But then downloaded and installed 2.9.5 from ci.appveyor.com/project/DarrenGosbell/daxstudio/builds/28512060/artifacts and it worked as expected.
So basically Marco recommends not using calculated cols unless completely necessary because of their memory consumption. What's to stop you from computing this in the source and importing it in ready to use. I would assume you can use this against tabular models aswell ? 12 millions rows is not big in pbi.
If you create your calculated column in PQ rather than DAX, your modell will be still bigger but not just that big as if using DAX. And yes, 12M rows is not that much, but this is just a video explaining this stuff, you only need to worry about these if your model is much bigger.
@@arklur3193 See comment above - the difference would be minimal in this case, computed columns in PQ are equally expensive to calculated columns if there is a medium-high cardinality.
Thanks for the follow up, i was really confused when I was reviewing my models.
To people asking about performance.
If you're having to use filter within your measures over a large dataset then I would make a calculated column that returns 1 if the logic is met, then have a measure that just sums the calc column.
If you are only doing arithmetic then the calc column doesn't make much sense, unless you are going to have the value presented in a raw data table in your report. With a raw data table, the measure will execute over each row, and you will possibly exceed resources or have performance issues.
As Marco is pointing out, it's all a balance of storage, performance, and customer requirements. Sometimes an extra column can't be done if the table is too large.
Thanks Marco and Patrick for the insights...really nice seeing both of you together
Simple and effective, thank you Patrick & Marco
Thank you Marco & Patrick very Helpful👍
Thank you Marco for this useful introduction
Mind blown! Thank you Patrick and Marco!
An effective engine data to pack; fabulous, only the necessary. Thanks Marco and Patrick for the power of vertipaq and the session with this data sample.
Awesome Video. Marco, I'm your fan after seeing this. You do a much better job explaining than Alberto. Productive 12 mins of my life.
great value in video - implemented !!!!
great help guys! thanks
Great video.
awesome video!!!!! Thank you so much guys!!!
Thanks for watching! 👊
Thank you guys, great as usual!
Great Optimization
This is one of the most important item when building a report thank you
Fantastic job, guys, thank you!
Awesome, thank you for this video!!!
Informative 👍
Very helpful! Thank you
First nice t-shirt
Second, your videos with Marco are a great good one, Patrick 👍👍👍
Great session. Nicely explained (for us lesser mortals) .. more please!
love it!
Amazing
Amazing!!!!!
Wow marco and Patric together
Hi there, how did you get de Advance mode in Dax Studio?
You need to enable it in the Options, as Marco said --> imgur.com/GnFxG1c
Thanks all
What I'm thinking of is, could you mention a few regular use-cases where you would sacrifice memory for using a "calculated column" rather than an iteration function? Let's say summing up 5, 10, 20, 50 columns, or if the calculated column's logic is just "complex" (I know-I know, "What is complex?" :D), using conditions, multiplications, etc. How can you decide not to use a measure, but rather store the value in a column, so you can just sum on that, meaking the measure much more faster and less resource intensive. Thanks in advanced.
I had a similar question. There has to be a trade off somewhere between file size and execution speed
It depends. For example: with 10 billions rows a column with Line Amount could be 30% faster than multiplying the two columns. However, the storage for the additional column could be 20-30 GB. In a report, the difference can be 1-2 seconds faster, something like 3 seconds instead of 4.5 seconds. Memory is not free, CPU is not free. Customer choice, but the real issue is that you cannot continue to add memory, so you have to make choices. However, if you have 10 million rows, the difference is smaller in percentage, it could be actually faster and in any case the difference (if any) would be in milliseconds. So I would always save memory in those cases.
For Power BI, it's a no brainer: save memory - you cannot have 10 billions rows.
@@marcorusso7472 Thanks for the detailed answer, I'll keep it in mind!
Hi! Is there an easy way to see which columns in the data model are not used? Thanks!
I have what I think could be a common problem to many: My millions of rows of data have a unique identifier in them. I need to keep the identifier despite its terrible cardinality :-(. However, it's worse that that. I need to add a string prefix to the unique Id field because it ends up being a lengthy URL which I display as a link icon in the UI within a table. This makes the problem much worse because the size of the column and its unique values prevent any significant compression and the file becomes huge.
Is there something that I can do to improve this in the DAX instead? Maybe in a similar way to your SUMX function described in the video? My concern would be that a measure that 'calculates' the eventual URL string would get run for every row in my display table whenever a filter was changed and that would take several seconds...
I second this comment. This video kind of frustrated me a little because how can we remove IDs when we want our users to be able to identify and return to the records at the Order or OrderLine level of granularity? Is this for only summary level models that don't require drill-down? And I agree that it's even worse in most datamodels because of the hyperlinks that are created for these IDs to be used on the querystring because it's not only one column for each ID, but now two because of the URL. These ID and URL-to-ID-record columns take up the vast majority of space in our model.
After thinking about this some more, I was able to make the hyperlink into a measure instead of a calculated column www.screencast.com/t/9o4sT9jV1 which saves a little bit of space, but not nearly as much as I was expecting based upon what DAX Studio showed.
Also, you can't just get rid of an ID that is used to link tables together can you? Granted the SUMX function was a true size-saver in that you didn't sacrifice any functionality but could still save some space, but I left feeling that the ID removal was completely impractical. Maybe I'm just missing something... I actually hope that I am.
Hello, thank you for the video.
In the last portion of the video, by using SUMX in a measure instead of a calculated column - granted, the model size is reduced. But by doing this, won't you put additional strain on the formula engine, and user interactions that use the measure with SUMX be significantly slower?
Awesome video! Managed to get a recent report from 200 down to 40. Seeing the impact of a single column is really useful. Do I really need it, can I use a measure instead of a column? It really makes a difference :) Thanks!
nicely done fellas....for what it's worth, 'yo Marco' is much more approachable than 'professor Marco' :-)
Is there a way to use PowerShell to export the vpax file from the service automatically?
Nice tips, thank you. I think there is a balance to be struck as well though... I generally advise creating calculated columns as far back in the process as possible - ideally on the database prior to connecting Power BI. This is to centralise the "single version of the truth" and minimise maintenance. This is an approach I would stick with unless performance proved to be an issue. Thoughts?
Correct - but the memory consumption described in the video is not caused by the calculated column, it is caused by the cardinality of the column that produces low compression.
What happens to rendering speed when executing the new measure? Maybe it doesn't matter so much in this example, but in general, I would think you would embed compution heavy formulas in a calculated column, especially if there are dependencies.
It depends - see similar comments above
Thanks for the video, always love the performance optimization videos! I was just wondering though; is it always better to replace calculated columns with measures? Isn't a calculated column calculated once (on load of the initial report), whereas measures are recalculated with constantly when you navigate the visuals pane? I'd imagine there would be some cases where I'd rather want to wait a bit longer initially to have snappier performance navigating the dashboard once loaded.
Calculated columns are not compressed with as much efficiency as the regular data due to when they are created in the build process. Therefore from a compression standpoint, they are not preferred.
@@dbszepesi This is very specific to calculated columns, if you create it in PQ, you are "fine". Your model will be still bigger, obviously, but the engine can compress your model therefore making it smaller compared to a DAX calculated column.
@@arklur3193 Not really, the biggest difference in compression is the number of unique values of the column, which is bigger in Line Amount rather than Quantity / Net Price if you consider this example. The difference between computed columns and calculated columns is minimal and relevant only for columns that have a small number of unique values (up to 10/100, certainly not 1000 or more).
If you're having to use filter within your measures over a large dataset then I would make a calculated column that returns 1 if the logic is met, then have a measure that just sums the calc column.
If you are only doing arithmetic then the calc column doesn't make much sense, unless you are going to have the value presented in a raw data table in your report. With a raw data table, the measure will execute over each row, and you will possibly exceed resources or have performance issues.
But the Order Number if it is the foriegn key how you deleted it?
Genius!!
Do the same rules apply with direct query models?
No, DirectQuery doesn't import data and you cannot use aggregations for measures that have row-level calculations like in this demo.
Hey, How to enable Vertipaq Analyzer Preview ?
Quoting a comment from below:
"I had the same problem. But then downloaded and installed 2.9.5 from ci.appveyor.com/project/DarrenGosbell/daxstudio/builds/28512060/artifacts and it worked as expected."
@@DreeKun Great, worked for me as well. thanks.
Following Marco's instructions but cannot view result, i.e the VertiPaq Analyzer Preview is not showing....
Please read this thread and check possible solutions or post more details: github.com/DaxStudio/DaxStudio/issues/235
I had the same problem. But then downloaded and installed 2.9.5 from ci.appveyor.com/project/DarrenGosbell/daxstudio/builds/28512060/artifacts and it worked as expected.
So basically Marco recommends not using calculated cols unless completely necessary because of their memory consumption. What's to stop you from computing this in the source and importing it in ready to use. I would assume you can use this against tabular models aswell ? 12 millions rows is not big in pbi.
If you create your calculated column in PQ rather than DAX, your modell will be still bigger but not just that big as if using DAX. And yes, 12M rows is not that much, but this is just a video explaining this stuff, you only need to worry about these if your model is much bigger.
@@arklur3193 See comment above - the difference would be minimal in this case, computed columns in PQ are equally expensive to calculated columns if there is a medium-high cardinality.
Is it just me or are they blurry?