This is the first time I ever subscribed to a channel.
One of the best Synapse videos out there; highly recommend!!!
This is the first time I ever subscribed to a channel as well. Huge thanks!!!!
So much hard work went into creating this video. Very good content.
Have watched many videos related to this but yours is awesome.
I wish there were 100000 LIKE buttons. THE BEST VIDEO on Azure Synapse distribution. I understood the distributions clearly with the demo. Thank you so much 🙏
Wow, such a detailed explanation with all the visuals and query examples makes it so easy to understand...
Very well explained how data is distributed in Synapse SQL DW
Thanks Anuradha, I am happy it was helpful for you!
Thank you very much for this video.
It was very helpful and I learnt a lot about Synapse.
Thanks for sharing this knowledge. Really helpful!!
JUST GOLD
Simply Amazing Explanation !
👍🏻👍🏻👍🏻
Amazing explanation and nice representation of all the aspects. Thank you so much Arshad
Really happy to find this video. Loved the practical demo of how the distributions happen. Subscribed (500th subscriber 😁). Waiting for more such awesome content 🤩
Great Sir 👌
Never saw an explanation like this on Azure Synapse. Amazing :)
Thanks Vaibhav for your kind words, glad it was helpful!
Thanks a lot ALI, very useful in my case.
Amazing explanation, thanks; the concepts are very clear and practical to understand. I hope to find more content from you. 🤗
This is a very well done and helpful video. Thank you for making it.
Looking forward to the next session.
Thanks Mohammed, I just posted a video on CI/CD and am planning to post a few more in the next couple of weeks.
👍👍👍👍
Extraordinary 👌
Thank you for explaining the concepts in detail.
You are welcome!
Thank you so much. You are amazing.
Very Good session to understand the concepts in Synapse Analytics
Thanks Mohammed for your kind words, glad it was helpful!
Excellent explanation. Thank you.
You are welcome!
Excellent explanation.. Thanks..
You are welcome
Very nicely explained Azure Synapse, especially the SQL pool. I have a question here. Both Synapse and Azure Databricks have a Spark engine. How would I choose between them for my project work?
Thanks for this video! Question: you touched quickly on creating statistics in Synapse prior to running queries, based on the query patterns. In my case I have a large group of users, from admins to analysts to developers, and I cannot predict the types of queries they will run. Are there best practices I can pass on to the users for creating stats before they run their queries? Do you plan any future tutorials on this topic? Thanks!
Thanks Tiffany! While creating stats in advance is a proactive way to optimize performance, the engine also learns from queries the first time they are submitted and optimizes future runs of them, as long as the AUTO_CREATE_STATISTICS setting is ON (which it is by default). You can find more details about it here: docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-statistics
To shorten statistics maintenance time, be selective about which columns have statistics, or which need the most frequent updating. For example, you might want to update date columns where new values may be added daily. Focus on having statistics for columns involved in joins, columns used in the WHERE clause, and columns found in GROUP BY. docs.microsoft.com/en-us/azure/synapse-analytics/sql/best-practices-dedicated-sql-pool#maintain-statistics
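In case it helps, here is a minimal sketch of what that looks like in T-SQL on a dedicated SQL pool. The table and column names (dbo.FactSales, ProductKey, OrderDateKey) are just placeholders for illustration, not something from the video:

-- Single-column statistics on a join key used in most queries (hypothetical table).
CREATE STATISTICS stat_FactSales_ProductKey ON dbo.FactSales (ProductKey);

-- Sampled statistics on a frequently filtered date column.
CREATE STATISTICS stat_FactSales_OrderDate ON dbo.FactSales (OrderDateKey) WITH SAMPLE 20 PERCENT;

-- Refresh after daily loads so the optimizer sees the new date values.
UPDATE STATISTICS dbo.FactSales (stat_FactSales_OrderDate);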
How do I make an external table?
Thank you for the video, one of the best I have ever watched in terms of learning data.
Just a quick question: in a round-robin table, you said the data will be shuffled when you query with GROUP BY ProductKey, and the distribution will be organized by that field. So what if, after that, I execute the same query but grouping by a different field? Will the shuffle happen again, and will the distribution then be by this other field I'm grouping on?
Yes, the shuffle will happen again, this time on the new grouping column. A round-robin table is not co-located by any column, so the engine re-shuffles the data for each such query; the shuffle only applies to that query's execution, and the underlying distribution of the table does not change.
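If most of your heavy queries group or join on the same column, one way to avoid that repeated data movement is to create the table hash-distributed on that column instead. A rough sketch, where dbo.FactSales_RoundRobin and dbo.FactSales_Hash are hypothetical table names:

CREATE TABLE dbo.FactSales_Hash
WITH
(
    DISTRIBUTION = HASH(ProductKey),   -- rows with the same ProductKey land in the same distribution
    CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT * FROM dbo.FactSales_RoundRobin;  -- copy the data from the existing round-robin table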
How does replicate distribution work when we have 1 compute node?
Where do I need to store the files in blob storage?
Thank you for the clear explanation. However, I am not clear about where the 60 buckets or 60 distributions get stored. Is it in Azure storage? In short, I am not getting the purpose/difference of Azure storage versus the SQL database instance attached to the compute node. Could you please explain more about it?
For developers, I think the important thing to consider is how it scales out. For example, if you have 2 nodes, each of those nodes will have 30 distributions attached to it; likewise, if you have 4 nodes, each node will have 15 distributions. By scaling out from 2 to 4 nodes, each node now holds roughly half the data (assuming there is no data skew) and will take roughly half the time to complete processing. docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/memory-concurrency-limits#service-levels
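If you want to check how evenly a table's rows actually land across the 60 distributions (i.e. whether there is skew), you can run the following against a dedicated SQL pool; dbo.FactSales is just a placeholder table name:

-- Reports rows and space used per distribution for the table, which reveals skew.
DBCC PDW_SHOWSPACEUSED("dbo.FactSales");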
The 60 distributions are stored in the SQL database instances in the SQL pool. Data from Azure storage is distributed across the distributions in different patterns, depending on the distribution type defined on the SQL pool table during table creation. The SQL engine then reads this data from the distributions as instructed by your query, which may or may not require it to move data around before executing the aggregate function on the data and sending the output to the control node, which in turn sends it to the user for viewing.
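For reference, the distribution type is declared per table at creation time. A minimal sketch with hypothetical table names, one per option:

CREATE TABLE dbo.FactSales (ProductKey INT, Amount MONEY)
WITH (DISTRIBUTION = HASH(ProductKey));      -- rows with the same key go to the same distribution

CREATE TABLE dbo.StagingLoad (Col1 INT)
WITH (DISTRIBUTION = ROUND_ROBIN);           -- rows spread evenly, no co-location by value

CREATE TABLE dbo.DimProduct (ProductKey INT, ProductName NVARCHAR(50))
WITH (DISTRIBUTION = REPLICATE);             -- a full copy is cached on each compute node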