Part 5 - Data Transformation (1) | End to End Azure Data Engineering Project | Mounting the Datalake
- Published 29 Sep 2024
- #azuredataengineer #endtoendproject #azuredataengineeringproject #azureintamil #azuredatafactory #azuredatabricks #azuresynapseanalytics #azuredatalake #datalake #powerbi #keyvault
This is a long-awaited video of mine. Let's build a complete End to End Azure Data Engineering Project. In this project we are going to create an end to end data platform, covering Data Ingestion, Data Transformation, Data Loading and Reporting.
The tools covered in this project are:
1. Azure Data Factory
2. Azure Data Lake Storage Gen2
3. Azure Databricks
4. Azure Synapse Analytics
5. Azure Key Vault
6. Azure Active Directory (AAD) and
7. Microsoft Power BI
The use case for this project is building an end to end solution: ingesting tables from an on-premise SQL Server database using Azure Data Factory and storing the data in Azure Data Lake. Azure Databricks is then used to transform the raw data into its cleanest form, Azure Synapse Analytics loads the clean data, and finally Microsoft Power BI integrates with Azure Synapse Analytics to build an interactive dashboard. We also use Azure Active Directory (AAD) and Azure Key Vault for monitoring and governance purposes.
Part 6 will be uploaded soon. Stay tuned.
- - - Book a Private One on One Meeting with me (1 Hour) - - -
www.buymeacoff...
- - - Express your encouragement by brewing up a cup of support for me - - -
www.buymeacoff...
- - - Other useful playlist: - - -
Azure Data Factory Playlist: • Azure Data Factory Tut...
Azure General Topics Playlist: • Azure Beginner Tutorials
Microsoft Fabric Playlist: • Microsoft Fabric Tutor...
Azure Databricks Playlist: • Azure Databricks Tutor...
Azure End to End Project Playlist: • End to End Azure Data ...
Databricks CICD Playlist: • CI/CD (Continuous Inte...
End to End Azure Data Engineering Project: • An End to End Azure Da...
- - - Let’s Connect: - - -
Email: mrktalkstech@gmail.com
Instagram: mrk_talkstech
- - - Tools & Equipment (Gears I use): - - -
Disclaimer: Links included in this description might be affiliate links. If you purchase a product or service with the links that I provide, I may receive a small commission. There is no additional charge to you! Thank you for supporting me so I can continue to provide you with free content each week!
DJI Mic: amzn.to/3sNpDv8
Dell XPS 13 Plus 13.4" 3.5K : amzn.to/45KqH1c
Rode VideoMicro Vlogger Kit: amzn.to/3sVFW8Y
DJI Osmos Action 3: amzn.to/44KYV3x
DJI Mini 3 PRO: amzn.to/3PwRwAr
- - - About me: - - -
Mr. K is a passionate teacher who created this channel with only one goal: "TO HELP PEOPLE LEARN ABOUT THE MODERN DATA PLATFORM SOLUTIONS USING CLOUD TECHNOLOGIES"
I will be creating playlists covering the below topics (with DEMOs):
1. Azure Beginner Tutorials
2. Azure Data Factory
3. Azure Synapse Analytics
4. Azure Databricks
5. Microsoft Power BI
6. Azure Data Lake Gen2
7. Azure DevOps
8. GitHub (and several other topics)
After creating some basic foundational videos, I will create videos with real-time scenarios / use cases specific to the three common data fields:
1. Data Engineer
2. Data Analyst
3. Data Scientist
Can't wait to help people with my videos.
- - - Support me: - - -
Please Subscribe: / @mr.ktalkstech
Without any doubt, Mr K is the best teacher for technical topics, especially Azure. I find them to be focused and informative. Please keep up the good work
I have gone through many but did not find one which really makes me understand concepts easily. Finally my wait is over. Thank you for such valuable videos. I am sure many cannot find videos this informative in paid courses from other platforms. Keep going.
Thank you so much :)
In order to perform this exercise, we need a premium subscription.
In case I wanted a pay-as-you-go subscription, how much would the total cost be?
Your videos are invaluable. Extremely helpful. Thank you.
Thank you soo much :)
Your videos are so good. I paid a huge amount to get ADF training but was not happy with the trainer's way of conducting classes. I skipped all his videos and was regretting joining his classes. Your videos are so good that I watched all 6 parts back to back without losing focus. Keep it up. I really enjoyed your videos.
Thank you so much :) Glad that you liked the videos :)
Bro, pls mention that training name, so that others won't join
Clean N precise... explanation 👏
Access Azure Data Lake Storage using Microsoft Entra ID (formerly Azure Active Directory) credential passthrough (legacy)
Is it good to use such a method (Microsoft Entra ID credential passthrough (legacy)) to connect Databricks to ADLS? Are there no security/privacy issues? Is this method used in real-time projects? Please do reply
Your explanation is awesome. But why didn't you explain how to create resources like ADLS Gen2 and Databricks? I am very new to Azure and facing issues while creating them and assigning roles to them.
Hi, what is the access level of the bronze/silver/gold containers you created? In my case they are private (the default option) and I cannot get data from them. How can I fetch data from a private container? Thanks in advance
I am working on an Azure free account and trying to create a cluster in the Databricks service. I have tried to create a single node cluster. It continues to show that it's creating, and after 1 hour it fails. I have tried Canada Central and Australia East as regions. Please help.
Great presentation and content. Keep up the good work!!
In Databricks, do we not need to create a SparkSession? If yes, why?
Hello! Can you please share the SQL files so we can practice in parallel.
Thanks for the video. waiting for next video. Pls use Autoloader incremental process if possible
Thank you :) Sorry I am not doing incremental processing in this project, will cover that in the future videos :)
I dont have a premium data bricks, what I do next? Please reply
Try to mount the data lake using the service principal approach by following the article below, but I recommend you use the premium workspace, as a lot of functionality has been removed from the standard workspace.
www.linkedin.com/pulse/how-mount-adls-gen-2-storage-account-databricks-ananya-nayak
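For reference, the service-principal mount mentioned above generally looks like the sketch below. This is a minimal, hedged example, assuming you have an Azure AD app registration (service principal) with the Storage Blob Data Contributor role on the storage account; all the angle-bracket values and the `bronze` container name are placeholders, not values from the video.

```python
# Minimal sketch: mounting ADLS Gen2 in Databricks via a service principal.
# Replace the placeholder values with your own app registration details;
# the secret should really come from a secret scope, e.g.
# dbutils.secrets.get(scope="<scope>", key="<key>").

application_id = "<application-id>"   # client id of the app registration
directory_id = "<directory-id>"       # Azure AD tenant id
client_secret = "<client-secret>"     # illustrative only; use a secret scope

# OAuth settings the ABFS driver needs to authenticate as the principal
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": application_id,
    "fs.azure.account.oauth2.client.secret": client_secret,
    "fs.azure.account.oauth2.client.endpoint":
        f"https://login.microsoftonline.com/{directory_id}/oauth2/token",
}

# dbutils is predefined inside a Databricks notebook; outside one, this
# call is not available, so it is shown commented out:
# dbutils.fs.mount(
#     source="abfss://bronze@<storage-account>.dfs.core.windows.net/",
#     mount_point="/mnt/bronze",
#     extra_configs=configs,
# )
```

After mounting, the container is visible under `/mnt/bronze` with ordinary file paths, so the same notebooks work regardless of which auth method performed the mount.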
very informative... kindly share the upcoming videos on a daily basis, as it would be helpful for us to convey the same in the interviews
thank you :) I am still working on it- will upload the rest of the videos asap- thanks for understanding :)
bring more projects like this..
and bring real time interview questions as well
Sure, will do- thanks :)
Hey, can we mount ADLS to Databricks community edition. Thank you for help.
Haven't used that before, but based on the documentation it seems like it's not supported :)
Yes we can, and I mounted ADLS Gen2 to community edition and worked around it
Really loved the way you are explaining. Waiting for next part.
Thank you :)
amazing content
Hi, I followed your steps but Databricks is not able to write the data back to the storage account; it can only read.
Could you please check whether the Databricks service principal has the Storage Blob Data Contributor access? Maybe you gave it the Storage Blob Data Reader access?
@@mr.ktalkstech I have given the blob storage contributor access, and I even followed the steps on the Microsoft site. Still no luck.
This section requires a premium subscription to follow along.
A premium subscription on what resource?
@@KapperPedersen Databricks
Hey.. Thank u for the video. For the trial version, "Azure Data Lake Storage credential passthrough" isn't enabled. Because of that I got an error: "com.databricks.backend.daemon.data.client.adl.AzureCredentialNotFoundException: Could not find ADLS Gen2 Token".
Can u please help me with this. Thanks in advance.
Hi, you need to enable the credential passthrough option to access the data lake. Re-create a new cluster with credential passthrough enabled, then check that you have the Storage Blob Data Contributor access to the data lake and give it a try; it should work.
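For anyone following this thread: once a premium-tier cluster has "Enable credential passthrough for user level data access" ticked, the mount configuration in the (legacy) passthrough flow is roughly the sketch below. All names are placeholders, not values from the video, and this is a sketch of the documented legacy method, not a security recommendation.

```python
# Minimal sketch: the extra_configs used for a (legacy) credential
# passthrough mount of ADLS Gen2. Inside a Databricks notebook the token
# provider class is read from the cluster's Spark conf, e.g.
#   spark.conf.get("spark.databricks.passthrough.adls.gen2.tokenProviderClassName")
# Here it stays a placeholder so the snippet is self-contained.

configs = {
    "fs.azure.account.auth.type": "CustomAccessToken",
    "fs.azure.account.custom.token.provider.class": "<token-provider-class>",
}

# dbutils is predefined inside a Databricks notebook:
# dbutils.fs.mount(
#     source="abfss://bronze@<storage-account>.dfs.core.windows.net/",
#     mount_point="/mnt/bronze",
#     extra_configs=configs,
# )
```

With passthrough, no secret is stored in the config at all; access is evaluated against the Azure AD identity of whoever runs the notebook, which is why the user also needs the Storage Blob Data Contributor role on the account.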
@@mr.ktalkstech It is only available for Premium members.
Yes, credential passthrough is available only for the premium workspace.
@@mr.ktalkstech Shall we connect on linkedin !!??
please connect with me on my Instagram channel - mrk_talkstech
Hi! Are you going to use autoloader for incremental processing?
Hi, nope- I haven't used the incremental processing in this project- will cover this topic later in a separate video- thanks :)
@@mr.ktalkstech Then, how will you ensure that once the pipeline is executed, the already processed data will not be processed again?
It's not handled in this project (we just overwrite the data), since the dataset we deal with is a small one. I will make a complete data warehousing (Type 2) project in the future :)
Excellent tutorial
Thank you so much :)
Great one.... also waiting for next section :)
Thank you :)
@@mr.ktalkstech when is the next one?
Will upload it asap- I don't have everything ready yet- I am working on it, will edit and upload the next part before end of this week. Thanks :)
@@mr.ktalkstech Great and thanks.... lovely content
Extremely good
Thank you so much :)
Please share the next part today or by tomorrow morning please
Sorry, it took me some time to get the next part ready. I have just uploaded it. Thanks :)