Arshad Ali - Aas Trailblazers
  • 11
  • 97 465
Azure Synapse Analytics | Spark pool | Catalyst Optimizer and Adaptive Query Execution
Today, as businesses demand faster time to insight, data engineers, data scientists, and developers expect the platform not only to offer a quicker way to write code, so that the learning curve is minimized, but also to optimize that code as far as possible during execution so that it runs faster. This lets them focus on solving business problems while the platform does all the heavy lifting behind the scenes.
This video talks about the Spark Catalyst Optimizer and Adaptive Query Execution, an enhancement introduced in Spark 3.0. The Catalyst Optimizer is a query optimization engine that optimizes your code when it is submitted for execution as well as while it runs, whereas Adaptive Query Execution (AQE) reoptimizes and adjusts query plans based on runtime metrics collected during the execution of the query.
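As a rough illustration of how AQE is switched on, here is a minimal Python sketch. The three configuration keys are the standard Spark SQL settings for AQE; the helper name `apply_aqe_settings` and the dictionary used as a stand-in for a live session are assumptions made for this example and are not taken from the video or its script.

```python
# Spark SQL settings that control Adaptive Query Execution (Spark 3.0+).
AQE_SETTINGS = {
    # Master switch: re-plan the query using runtime statistics.
    "spark.sql.adaptive.enabled": "true",
    # Merge small shuffle partitions after a stage finishes.
    "spark.sql.adaptive.coalescePartitions.enabled": "true",
    # Split heavily skewed partitions in sort-merge joins.
    "spark.sql.adaptive.skewJoin.enabled": "true",
}


def apply_aqe_settings(set_conf, settings=AQE_SETTINGS):
    """Hypothetical helper: call set_conf(key, value) for each setting.

    Returns the list of keys that were applied, in order.
    """
    applied = []
    for key, value in settings.items():
        set_conf(key, value)
        applied.append(key)
    return applied


# Stand-in for a live Spark session: record the settings in a plain dict.
recorded = {}
apply_aqe_settings(recorded.__setitem__)
print(recorded)
```

In a notebook where a `spark` session already exists, the same helper could be called as `apply_aqe_settings(spark.conf.set)`, after which `df.explain()` on a query with a shuffle should show an adaptive plan node if AQE is active.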
00:00 Introduction
00:43 Catalyst Optimizer
06:03 Physical Execution of Spark Jobs
07:57 RDD vs Dataframe (Spark SQL)
09:50 Catalyst Optimizer - Its stages and optimization
15:10 RDD vs Dataframe vs Dataset APIs
20:56 Adaptive Query Execution (AQE)
28:13 Demo - Understand and Analyze Execution Plan
Thank you once again for watching. Please like, subscribe, and let me know your feedback or any specific topic you would like me to cover next.
GitHub Repo to download deck and script used in the video:
github.com/AasTrailblazers/AzureSynapse/tree/main/Spark%20pool
Apache Spark
spark.apache.org/
RDD vs Dataframe vs Dataset APIs
docs.microsoft.com/en-us/azure/hdinsight/spark/optimize-data-storage#choose-data-abstraction
Dataframe - performance comparison with RDD
databricks.com/blog/2015/02/17/introducing-dataframes-in-spark-for-large-scale-data-science.html
Adaptive Query Execution
databricks.com/blog/2020/05/29/adaptive-query-execution-speeding-up-spark-sql-at-runtime.html
Adaptive Query Execution Configuration
spark.apache.org/docs/latest/sql-performance-tuning.html#adaptive-query-execution
Views: 1 987

Videos

Azure Synapse Analytics | Spark pool | Delta Lake - Part 1
7K views, 3 years ago
Delta Lake is an open-source storage layer (a sub-project of the Linux Foundation) that sits on Azure Data Lake Store when you use it within a Spark pool of Azure Synapse Analytics. Put another way, you can think of Delta Lake as an optimized Spark table that brings data reliability and performance optimization at scale. It makes your data lake faster, more reliable and accelerates the pa...
Azure Synapse Analytics | Spark pool | Introduction and Getting Started
3.4K views, 3 years ago
Azure Synapse Analytics is a comprehensive and unified platform for all your analytical needs. Whether you are building a modern data warehouse, a big data solution or aiming to combine these two capabilities to build a lake house architecture, Azure Synapse has all the capabilities in a single comprehensive service for your end-to-end needs. This video is focused on introducing and getting you...
Azure Synapse Analytics | Continuous Integration and Continuous Delivery (CI/CD)
28K views, 3 years ago
With ever-changing business dynamics and an ever-evolving technical landscape, being agile and nimble is no longer a choice but a necessity for IT teams. This video talks in detail about Azure Synapse’s native integration with Azure DevOps and GitHub and demonstrates how you can set up a Git repository and build pipelines for continuous integration and delivery of Synapse workspace artifacts (or app...
Azure Synapse Analytics | Data Ingestion Patterns, Polybase, and Copy Command
8K views, 3 years ago
Having the right data ingestion framework and patterns is an important consideration for having consistent data ready for analytics in a timely manner, especially when you are dealing with large data volumes and expect your data to grow every day. In this video, you will learn about the different data ingestion patterns and options available with Azure Synapse Analytics - dedicated SQL pool. 00:00:00 Introduc...
Azure Synapse Analytics | Workload Management and Concurrency
6K views, 3 years ago
In any data warehouse system, there are different types of workloads. In broad categories, there is one to load and process data, and another for data querying. Even within these broad categories, there might be a few requests that require higher importance and more resources than others. For example, there might be a critical data load that needs to be completed within tig...
Azure Synapse Analytics | Table Partition | Best Practices
12K views, 3 years ago
Table partitioning is a powerful feature to optimize the performance of data load and data querying. This feature is available with almost all the data platform technologies we have available today and Azure Synapse Analytics - SQL pool is no exception here. In this video, you will learn about Table partition, how it works and how it helps in both data load (or data lifecycle management) and da...
Azure Synapse Analytics | Index Options | Columnstore Index | Best Practices
9K views, 3 years ago
Being able to create database indexes is a powerful feature for improving the speed of data retrieval and querying. However, the same type of index might not be suitable for all types of workload, such as transactional versus analytical workloads. In this video, you will learn about the columnstore index, how it accelerates the performance of analytical queries, and what the best practices and consideration...
Azure Synapse Analytics | Data Distribution Strategy and Best Practices
13K views, 3 years ago
In any distributed system, for efficient parallel processing and better performance, the data distribution strategy for storing data evenly and the colocation of data across nodes play important roles. In this video, in the context of Azure Synapse Analytics - dedicated SQL pool, I am going to walk you through data distribution strategy - by way of distributions, different data strategies like rou...
Azure Synapse Analytics | Introduction and Getting Started
7K views, 3 years ago
This series of videos is going to cover Azure Synapse Analytics from an end-to-end perspective, whether you want to use it for interactive querying, batch processing, stream processing, machine learning, or other types of workloads. In this first video of the series, I am going to explain what Azure Synapse Analytics is, its architecture with different components, different types of compute engines an...
Aas Trailblazers | Introduction
483 views, 3 years ago
Welcome to this channel. This channel was created for anyone who works on data platforms and data analytics or wants to build a career in this domain. This introductory video walks you through what you can expect to learn from this channel and who the intended audience is. References used in this introductory video: www.gartner.com/en/information-technology/trends/top-priorities-data-ana...

Comments

  • @abc_987
    @abc_987 5 months ago

    JUST GOLD

  • @contactbhasker7483
    @contactbhasker7483 5 months ago

    background music is very disturbing

  • @auhumanmedium
    @auhumanmedium 6 months ago

    Thank You, Sir.... This is ULTIMATE

  • @abhishektalghatkar3416
    @abhishektalghatkar3416 8 months ago

    Will it create partition for future dates/years automatically? Please do reply.

  • @Mohammad.aarif_222
    @Mohammad.aarif_222 9 months ago

    From where I need to store files in blob storage

  • @Mohammad.aarif_222
    @Mohammad.aarif_222 9 months ago

    How do I make external table

  • @plearns4551
    @plearns4551 9 months ago

    Too good sir, pls keep going

  • @datoalavista581
    @datoalavista581 9 months ago

    Thank you for sharing !!

  • @amiyakumarnayak8286
    @amiyakumarnayak8286 9 months ago

    It's the most detailed and thorough explanation. Thank you

  • @JustBigdata
    @JustBigdata 10 months ago

    I have added x largerc in database_principals to target user. But still if that user runs queries, it always used smallrc. I want that user queries to use xlargec . What am I missing? Thanks

    • @camvinh3522
      @camvinh3522 9 months ago

      go to this user and run "select user_name() to get exactly the user info that should be grant

  • @GirishTharwani1992
    @GirishTharwani1992 11 months ago

    Loved the video, its amazing information.

  • @orxanbabashov
    @orxanbabashov 11 months ago

    This is the first time I ever subscribed a channel as well. Huge thanks !!!!

  • @mohsinalam8085
    @mohsinalam8085 11 months ago

    wow wow Just amazing. You have done a great job explaining the concepts .

  • @jamieclarke2694
    @jamieclarke2694 11 months ago

    Im halfway through this video and it is amazing amd well detailed, thank you! I am well versed in old-school on-prem SQL Server (2005-2012 and some 2016-2019). I now work for a very large company and am part of a project building Azure DW on top of our new Data Lake to replace an old BYOD system. We started with serverless SQL Pool and are now building pipelines to pump data from the Lake into a Dedicated SQL Pool and testing performance. I have so far learned about distribution groups and am using a stored procedure with a CTAS statement to create a new table with only the transaction data from one company in, to speed up future querying. I am using Hash method with the field we use to join in most our queries, and am applying relevant indexes to this table. The next step was learning about partitioning to see if i can speed up the querying of this new table even more. So far you have given me a lot of information on this and i think it will be useful. Its the most interesting project i have been a part of in a long time!

  • @ParveenKumar-oc3np
    @ParveenKumar-oc3np 1 year ago

    Thanks for this video. This is one of the best video to understand synapse ci cd deployment.

  • @ContainsAll
    @ContainsAll 1 year ago

    Excellent and very well structured video. Very helpful. Thanks for Uploading and sharing the knowledge !

  • @madhushr6058
    @madhushr6058 1 year ago

    Whats the difference between delta lake and lakehouse in synapse?

  • @Farisito
    @Farisito 1 year ago

    Thank you a lot ALI, very useful in my case

  • @andrejbelak9936
    @andrejbelak9936 1 year ago

    Sir, dont use backround music please. Thanks

  • @CliffordRayGentiles-n1b
    @CliffordRayGentiles-n1b 1 year ago

    Hello good day. We have 2 existing synapse workspaces, dev and prod. After setting up git configuration on dev workspace, I noticed that all artifacts (pipelines, storage, datasets, scripts, notebooks) from prod are now existing on the master branch. How is this possible to see prod artifacts when I haven't explicitly connected prod into our dev workspace? Thanks

  • @azurelearner4055
    @azurelearner4055 1 year ago

    Clear and simple explantion !! kudos

  • @Ali-q4d4c
    @Ali-q4d4c 1 year ago

    👍👍👍👍

  • @KathyLoisAmores
    @KathyLoisAmores 1 year ago

    Thanks so much

  • @KathyLoisAmores
    @KathyLoisAmores 1 year ago

    Easy to follow. Thanks so much.

  • @gudukumar748
    @gudukumar748 1 year ago

    Content and explanation is really simple to understand...thank you sir jee. awesome to see you after long time.

  • @xuananh431
    @xuananh431 1 year ago

    Thank you !

  • @Ali-q4d4c
    @Ali-q4d4c 1 year ago

    👍🏻👍🏻👍🏻

  • @ashutoshmishra287
    @ashutoshmishra287 1 year ago

    Thank you

  • @oluwadamilolaadeniji9737
    @oluwadamilolaadeniji9737 1 year ago

    Thank you so much for this!

  • @getsid25
    @getsid25 1 year ago

    Arshad sir.... amazing series...sincere thanks for making the effort to educate all

  • @husnabanu4370
    @husnabanu4370 1 year ago

    wow so detailed explaination with all the visuals and query example is making so easy to understand...

  • @cebabu
    @cebabu 1 year ago

    Nice tutorial. My query about DWC. Can you please suggest what volume of data should i go for 1000DW and above.

  • @naveengaddameedi7193
    @naveengaddameedi7193 1 year ago

    Thank you for the cool demo, it is very much useful.. one quick question on the storage account when you have created 3 synapse workspaces. in my case I want to use a different storage account names for each workspace individually, does it still work without parameterising the storage account name ?

  • @elinuxbr
    @elinuxbr 1 year ago

    Your introduction about Azure Synapse is incredible. Thanks for sharing your knowledge.

  • @NikhilShetye-tn1cs
    @NikhilShetye-tn1cs 1 year ago

    Can we remove partition from table ?

  • @VK-ln9vk
    @VK-ln9vk 1 year ago

    wondeful video as usual.Thank you for sharing your knowledge with us.

  • @sergeiillarionov4057
    @sergeiillarionov4057 1 year ago

    It was wonderful. Pretty clear. Thank you.

  • @user-eg1ss7im6q
    @user-eg1ss7im6q 1 year ago

    thank you, i have learned something.

  • @VK-ln9vk
    @VK-ln9vk 1 year ago

    wonderful explanation as always

  • @VK-ln9vk
    @VK-ln9vk 1 year ago

    wonderful video on indexes.Thank you so much sir

  • @VK-ln9vk
    @VK-ln9vk 1 year ago

    i wish there are 100000 LIKE buttons. THE BEST VIDEO on the azure synapse distribution. Understood clearly about the distributions with the demo.Thank you so much 🙏

  • @teamlorio
    @teamlorio 1 year ago

    Great tutorial!

  • @julianromero3359
    @julianromero3359 1 year ago

    Amazing explanation, thanks for concepts are very clear and practical to understand. I hope find more contents from you. 🤗

  • @hoanglieuit
    @hoanglieuit 1 year ago

    This is the first time I ever subscribed a channel.

  • @pradeepyadagani7856
    @pradeepyadagani7856 1 year ago

    Good video, but it would be better if the background music is turned off. It is very disturbing

  • @jiashengfan2269
    @jiashengfan2269 1 year ago

    Awesome explanation.

  • @raghuram424
    @raghuram424 2 years ago

    It was a great session. Can you cover a video on Serverless Performance

  • @karthickj8045
    @karthickj8045 2 years ago

    This was really helpful in implementing sliding window of partition. Thank you

  • @LifeOfDreams0
    @LifeOfDreams0 2 years ago

    Hello sir, I have a question , after successful deploy in UAT/PROD , is there anyway like date/time stamp to verify changes in upper environment ??

  • @f4bglv
    @f4bglv 2 years ago

    Excellent and useful video. Thank you for sharing your experience!