Part 8 - Data Loading (Azure Synapse Analytics) | End to End Azure Data Engineering Project

  • Published on Jun 12, 2023
  • #azuredataengineer #endtoendproject #azuredataengineeringproject #azureintamil #azuredatafactory #azuredatabricks #azuresynapseanalytics #azuredatalake #datalake #powerbi #keyvault
    This is a long-awaited video of mine. Let's build a complete end-to-end Azure Data Engineering project. In this project we are going to create an end-to-end data platform covering Data Ingestion, Data Transformation, Data Loading and Reporting.
    The tools covered in this project are:
    1. Azure Data Factory
    2. Azure Data Lake Storage Gen2
    3. Azure Databricks
    4. Azure Synapse Analytics
    5. Azure Key Vault
    6. Azure Active Directory (AAD) and
    7. Microsoft Power BI
    The use case for this project is building an end-to-end solution: the tables from an on-premises SQL Server database are ingested with Azure Data Factory and stored in Azure Data Lake. Azure Databricks is then used to transform the raw data into its cleanest form, Azure Synapse Analytics is used to load the clean data, and finally Microsoft Power BI is integrated with Azure Synapse Analytics to build an interactive dashboard. In addition, Azure Active Directory (AAD) and Azure Key Vault are used for monitoring and governance purposes.
    Part 9 will be uploaded soon. Stay tuned.
    - - - Book a Private One on One Meeting with me (1 Hour) - - -
    www.buymeacoffee.com/mrktalks...
    - - - Express your encouragement by brewing up a cup of support for me - - -
    www.buymeacoffee.com/mrktalks...
    - - - Other useful playlist: - - -
    Azure Data Factory Playlist: • Azure Data Factory Tut...
    Azure General Topics Playlist: • Azure Beginner Tutorials
    Microsoft Fabric Playlist: • Microsoft Fabric Tutor...
    Azure Databricks Playlist: • Azure Databricks Tutor...
    Azure End to End Project Playlist: • End to End Azure Data ...
    Databricks CICD Playlist: • CI/CD (Continuous Inte...
    End to End Azure Data Engineering Project: • An End to End Azure Da...
    - - - Let’s Connect: - - -
    Email: mrktalkstech@gmail.com
    Instagram: mrk_talkstech
    - - - Tools & Equipment (Gears I use): - - -
    Disclaimer: Links included in this description might be affiliate links. If you purchase a product or service with the links that I provide, I may receive a small commission. There is no additional charge to you! Thank you for supporting me so I can continue to provide you with free content each week!
    DJI Mic: amzn.to/3sNpDv8
    Dell XPS 13 Plus 13.4" 3.5K : amzn.to/45KqH1c
    Rode VideoMicro Vlogger Kit: amzn.to/3sVFW8Y
    DJI Osmos Action 3: amzn.to/44KYV3x
    DJI Mini 3 PRO: amzn.to/3PwRwAr
    - - - About me: - - -
    Mr. K is a passionate teacher who created this channel with only one goal: "TO HELP PEOPLE LEARN ABOUT THE MODERN DATA PLATFORM SOLUTIONS USING CLOUD TECHNOLOGIES"
    I will be creating playlists covering the topics below (with demos):
    1. Azure Beginner Tutorials
    2. Azure Data Factory
    3. Azure Synapse Analytics
    4. Azure Databricks
    5. Microsoft Power BI
    6. Azure Data Lake Gen2
    7. Azure DevOps
    8. GitHub (and several other topics)
    After creating some basic foundational videos, I will create videos with real-time scenarios / use cases specific to the three common data roles:
    1. Data Engineer
    2. Data Analyst
    3. Data Scientist
    Can't wait to help people with my videos.
    - - - Support me: - - -
    Please Subscribe: / @mr.ktalkstech
  • Science & Technology

Comments • 71

  • @pridename2858
    @pridename2858 9 months ago +3

    Not only your videos but also the questions and answers are very beneficial. Thank you for all this.

  • @parivallalm
    @parivallalm 2 months ago +1

    Man, you nailed it..!! Thank you... One of the best.

  • @balamuruganpandiyan5435
    @balamuruganpandiyan5435 1 month ago +1

    Absolutely stunning... please do more videos like this. Fantastic channel to upskill with.

  • @mehmetadalar7606
    @mehmetadalar7606 1 year ago +1

    Following all the parts so far. Great job. It gives me a lot of ideas for my current work.

  • @jackvieiraoficial
    @jackvieiraoficial 3 months ago +1

    Absolutely stunning, precise, and direct content. Definitely one of the best! 👏👏👏

    • @mr.ktalkstech
      @mr.ktalkstech  2 months ago

      Thank you so much :)

  • @top_1_percent
    @top_1_percent 1 year ago

    Thanks for taking the time to make these videos. They were very insightful and clarified many doubts I had. You are a legend.

  • @Amarjeet-fb3lk
    @Amarjeet-fb3lk 1 year ago

    I was looking for exactly this type of project, and this series is worth watching. Thanks for making this video series. Waiting for more videos and incremental load.

    • @mr.ktalkstech
      @mr.ktalkstech  1 year ago

      Thank you so much :) sure, will do :)

  • @Majdi_Akrmi
    @Majdi_Akrmi 1 year ago +1

    You are the best thanks for this End-to-End project ❤

  • @helovesdata8483
    @helovesdata8483 10 months ago

    This project is top tier and should be in everyone's portfolio. I'm going to create my own project using different data.

    • @mr.ktalkstech
      @mr.ktalkstech  10 months ago

      Thank you so much :)

  • @kanikevenki9988
    @kanikevenki9988 11 months ago

    Thanks sir.
    Learnt new things today.
    Waiting for interview-perspective questions.

    • @mr.ktalkstech
      @mr.ktalkstech  11 months ago

      Thank you so much :)

  • @hannahabigail909
    @hannahabigail909 6 months ago

    Thank you very much for your effort on this project. May I ask why we have to create a linked service connection to access the stored procedure?

  • @emily-bose
    @emily-bose 1 month ago

    This is a very useful video. Thanks for this resource. Just a question as I am learning this path: why is Azure Synapse being used here? Could we not skip the Azure Synapse part, use ADF and Databricks, and connect to Power BI from there?

  • @mevijay1610
    @mevijay1610 26 days ago

    Hello, thanks for the detailed explanation. How can I copy from gold storage to a dedicated database instead of a serverless database? Please help...

  • @tejanaidu926
    @tejanaidu926 7 months ago

    Nice explanation

    • @mr.ktalkstech
      @mr.ktalkstech  7 months ago

      Thank you so much :)

  • @mizan903092
    @mizan903092 9 months ago

    Hi, I really loved your videos. They boosted my confidence in Azure Data Engineering. However, I have two questions: 1) How do I showcase an Azure Data Engineering project to a potential employer? We can easily show a website to a potential employer, but how can we show our work as an Azure Data Engineer? 2) You used multiple files (txt/sql etc.) in Azure Databricks and Synapse; could you share them with us please? Thanks

  • @lucaslira5
    @lucaslira5 6 months ago

    My pipeline is created in Data Factory. How can I make this stored procedure run in Data Factory whenever my Copy Data activity finishes writing to gold? Do I need to call this Synapse pipeline there?

  • @techieKundan
    @techieKundan 11 months ago

    Thank you for the insightful video. It was really helpful. As Synapse has all the features of ADF and Databricks, why do organisations not use just Synapse for the entire project?

    • @mr.ktalkstech
      @mr.ktalkstech  11 months ago

      Thank you. Some organisations will just use Synapse for the complete ETL process; it depends on each company's preference in terms of performance and budget.

  • @aymanarmiti3
    @aymanarmiti3 1 year ago

    Thanks and great job. How do you move the new data after you run the pipeline, how do you sync the CDC, and where do you save it in the data lake?

    • @mr.ktalkstech
      @mr.ktalkstech  1 year ago

      Please do watch the earlier videos, I have covered those points already (as mentioned in the earlier video, CDC is not covered in this project; I will cover it in future project videos). Thanks :)

  • @destroyer3706
    @destroyer3706 9 months ago

    Can we use Synapse to ingest the data from the very beginning, instead of using ADF? In that case we would have only Data Lake Gen2 as storage, and from there use Synapse to create pipelines for data cleansing and storing in the data warehouse. What are your thoughts?

    • @mr.ktalkstech
      @mr.ktalkstech  9 months ago +2

      Yes, we can use synapse instead of ADF :)

  • @sanjuktabaruah5586
    @sanjuktabaruah5586 2 months ago

    I am not able to find the gold container under the Linked tab's Azure Data Lake Storage Gen2. I can see the managed identity created by Azure Synapse, but inside it there are no containers from ADLS.

    • @user-nv6ho7uk8b
      @user-nv6ho7uk8b 2 months ago

      You will need to make the container on your own; it won't be auto-created.

  • @Amarjeet-fb3lk
    @Amarjeet-fb3lk 1 year ago

    Hi, can you explain why you have N after SET @statement =?
    Why have you used N in the statement if @statement is of varchar type and not nvarchar?

    • @mr.ktalkstech
      @mr.ktalkstech  1 year ago +1

      You can use both varchar and nvarchar for this scenario,
      but the key difference between them is:
      if your data consists only of non-Unicode characters, using VARCHAR will save storage space and may provide better performance;
      if your data includes Unicode characters or you want to ensure compatibility with different languages and character sets, you should use NVARCHAR.
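
      A minimal T-SQL sketch of the difference (the variable names and values are made up for illustration, not from the project):

      -- N'...' marks a Unicode (NVARCHAR) literal; '...' without N is a non-Unicode (VARCHAR) literal.
      DECLARE @v  VARCHAR(20)  = 'Chennai';      -- fine for plain ASCII text
      DECLARE @nv NVARCHAR(20) = N'தமிழ்';        -- Unicode is needed for non-Latin scripts
      SELECT @v AS varchar_value, @nv AS nvarchar_value;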

  • @Amarjeet-fb3lk
    @Amarjeet-fb3lk 1 year ago

    Hi, when we create a dedicated SQL pool, does it provision a SQL server, and where is the data stored?

    • @mr.ktalkstech
      @mr.ktalkstech  1 year ago +1

      It will be stored in distributed data files within the dedicated SQL pool itself.
      When you create a table in the dedicated SQL pool, the data is automatically partitioned and distributed across multiple compute nodes within the dedicated pool.
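
      For illustration, a minimal sketch (the table and columns are invented, not from this project) of how a dedicated SQL pool table declares the way its rows are distributed across compute nodes:

      -- Hash-distribute rows on CustomerID and store them as a clustered columnstore
      CREATE TABLE dbo.FactSales
      (
          SalesOrderID INT NOT NULL,
          CustomerID   INT NOT NULL,
          TotalDue     DECIMAL(19, 4)
      )
      WITH
      (
          DISTRIBUTION = HASH (CustomerID),
          CLUSTERED COLUMNSTORE INDEX
      );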

    • @Amarjeet-fb3lk
      @Amarjeet-fb3lk 1 year ago

      @@mr.ktalkstech So what will its structure be? Will it be stored as flat files or as a SQL structure?

    • @mr.ktalkstech
      @mr.ktalkstech  1 year ago +2

      @@Amarjeet-fb3lk just the SQL structure (relational tables)

  • @sivaani37
    @sivaani37 5 months ago

    How will this pipeline be triggered?

  • @TheMapleSight
    @TheMapleSight 1 year ago

    Could you please include the scripts that you are loading in the description? Thanks for the great content!

    • @mr.ktalkstech
      @mr.ktalkstech  1 year ago +2

      Just pasting the code in here, hope that helps :)
      USE gold_db
      GO

      CREATE OR ALTER PROC CreateSQLServerlessView_gold @ViewName NVARCHAR(100)
      AS
      BEGIN
          DECLARE @statement VARCHAR(MAX)

          SET @statement = N'CREATE OR ALTER VIEW ' + @ViewName + ' AS
          SELECT *
          FROM OPENROWSET(
              BULK ''https://mrkdatalakegen2.dfs.core.windows.net/gold/SalesLT/' + @ViewName + '/'',
              FORMAT = ''DELTA''
          ) AS [result]'

          EXEC (@statement)
      END
      GO
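
      For reference, the procedure would typically be invoked once per gold table, for example from the pipeline's stored procedure activity; a minimal hedged example (the table name 'Address' is only an assumed SalesLT table):

      EXEC CreateSQLServerlessView_gold 'Address'
      GO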

  • @muhammedsherinkc3195
    @muhammedsherinkc3195 11 months ago +1

    Why do we use the binary format? Please reply.

    • @mr.ktalkstech
      @mr.ktalkstech  11 months ago

      The binary format is easy to use for listing the files or directories :)

  • @anjibabumakkena
    @anjibabumakkena 11 months ago

    In an organisation, whose responsibility is it to create the modelling?
    Is it possible to create the modelling while doing transformations in Databricks notebooks?

    • @mr.ktalkstech
      @mr.ktalkstech  11 months ago

      Mostly a Data Engineer's job as well. Yes, we can create it using notebooks.

    • @anjibabumakkena
      @anjibabumakkena 11 months ago

      @@mr.ktalkstech Then can you make a video on how to do modelling in Azure Databricks? It would be great if you did that video.

  • @muhammedsherinkc3195
    @muhammedsherinkc3195 10 months ago

    I can't get childItems in the ForEach statement; I can see only value and count. Why does it show like that? Please reply.

    • @mr.ktalkstech
      @mr.ktalkstech  10 months ago

      Please follow the steps correctly and you should be able to do it :) Are you still facing the same issue?

    • @muhammedsherinkc3195
      @muhammedsherinkc3195 10 months ago

      @@mr.ktalkstech I added everything as in your video (clicked New and selected childItems), but after using ForEach I can't get output.childItems; I only get output.value and output.count.

    • @helovesdata8483
      @helovesdata8483 10 months ago

      @@muhammedsherinkc3195 Change value to childItems and see if that works.

  • @anjibabumakkena
    @anjibabumakkena 11 months ago

    Again, why are we loading into Azure Synapse Analytics?
    My point is that ADLS Gen2 has Bronze, Silver and Gold layers, and the Gold layer has the transformed data.
    Instead of loading into Synapse, why not connect directly to Power BI?
    And in Synapse, why do we need to create views and stored procedures?

    • @mr.ktalkstech
      @mr.ktalkstech  11 months ago +1

      We can do that as well, but it's a best practice to use a database as a serving layer to feed Power BI reports, for scalability and easy maintenance.

  • @charles-sambo
    @charles-sambo 1 year ago

    When is the next one...?

    • @mr.ktalkstech
      @mr.ktalkstech  1 year ago

      Just uploaded it :)

    • @charles-sambo
      @charles-sambo 1 year ago

      @@mr.ktalkstech Thanks... hope you are thinking of adding CI/CD to the series later.

    • @mr.ktalkstech
      @mr.ktalkstech  1 year ago

      Yes, I will do that as a new series :)

  • @manjulakumarisammidi1833
    @manjulakumarisammidi1833 1 year ago +1

    Why do we need to load data into Synapse when we have the purest form of data in the Gold layer of the data lake? Can't we directly send the data from the data lake to Power BI? What is the main use of having Synapse here?

    • @mr.ktalkstech
      @mr.ktalkstech  1 year ago

      We can do that as well; it's a best practice to use a database as a serving layer to feed Power BI reports, for scalability and easy maintenance.

    • @manjulakumarisammidi1833
      @manjulakumarisammidi1833 1 year ago

      @@mr.ktalkstech Okay, doesn't it increase the cost?

    • @mr.ktalkstech
      @mr.ktalkstech  1 year ago

      Surely there would be some cost associated with it: minimal for the serverless DB, and higher if we need to scale up to the dedicated pool. This is the trade-off we need to consider when weighing scalability against cost.

    • @harshitagrwal9975
      @harshitagrwal9975 11 months ago

      Can we replace Synapse with Snowflake here? If yes, then what is the procedure? I am not that familiar with Synapse or with how much Snowflake is being used with Azure... please do reply.

  • @nileshyadav7543
    @nileshyadav7543 1 year ago

    New video coming??

    • @mr.ktalkstech
      @mr.ktalkstech  1 year ago +1

      Working on it, will be uploaded soon.

  • @RahulKumar-we1zk
    @RahulKumar-we1zk 11 months ago +1

    Please help me, I am getting an error at the last step; please reply as soon as possible, sir.
    For each table name
    {
    "errorCode": "ActionFailed",
    "message": "Activity failed because an inner activity failed; Inner activity name: Stored procedure1, Error: Execution fail against sql server. Please contact SQL Server team if you need further support. Sql error number: 13807. Error Message: Content of directory on path cannot be listed.",
    "failureType": "UserError",
    "target": "For each table name",
    "details": ""

    • @mr.ktalkstech
      @mr.ktalkstech  11 months ago +1

      Could you please check whether Synapse Analytics has Storage Blob Data Contributor access to the data lake?

    • @TrinadhKumarMadupu
      @TrinadhKumarMadupu 11 months ago

      @RahulKumar-we1zk I got the same error. In the stored procedure (in the SELECT statement), change the file format from DELTA to PARQUET. It worked.
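
      As a hedged sketch of that suggestion, the same folder can be probed with a standalone serverless query that swaps the format to PARQUET (the table name 'Address' is assumed, the storage path is the one from the posted procedure, and this only applies if the gold folders actually contain plain Parquet files):

      -- Ad-hoc check against one gold table folder using PARQUET instead of DELTA
      SELECT TOP 10 *
      FROM OPENROWSET(
          BULK 'https://mrkdatalakegen2.dfs.core.windows.net/gold/SalesLT/Address/**',
          FORMAT = 'PARQUET'
      ) AS [result];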

    • @hamzaelbelghiti
      @hamzaelbelghiti 8 months ago

      @@mr.ktalkstech Yes! For me this was the fix: you just need to add Storage Blob Data Contributor access for the Synapse Analytics service and you're good to go! Thank you for the fix.

    • @user-er6io1mv6e
      @user-er6io1mv6e 2 months ago

      @@hamzaelbelghiti "errorCode": "2402",
      "message": "Execution fail against sql server. Please contact SQL Server team if you need further support. Sql error number: 156. Error Message: Incorrect syntax near the keyword 'SELECT'.",
      "failureType": "UserError",
      "target": "Stored procedure1",
      "details": []
      }
      I got this error, can you fix it?

    • @user-er6io1mv6e
      @user-er6io1mv6e 2 months ago

      @@hamzaelbelghiti Can you please tell me how to add Storage Blob Data Contributor access for Synapse Analytics?