Data Transformation (Part-4) - End to End Azure Data Engineering Project using Microsoft Fabric

  • Published Oct 2, 2024
  • Unlock the power of Bing News Data Analytics in our latest tutorial, where we embark on a comprehensive End-to-End Azure Data Engineering Project utilizing Microsoft Fabric! 🌐
    The topics covered in this project are:
    1. Data Ingestion from Bing API using Data Factory: Learn how to seamlessly pull in data from external sources, setting the foundation for your analytics project.
    2. Data Transformation using Synapse Data Engineering: Dive into the process of shaping and refining your raw JSON data into a curated Delta table, including techniques like incremental loading to keep your processes efficient.
    3. Sentiment Analysis using Synapse Data Science: Uncover insights hidden within the news description by predicting the sentiment of the news classified as Positive, Negative or Neutral.
    4. Orchestration using Data Factory via pipelines: Discover the art of orchestrating your data workflows, ensuring smooth and efficient operations.
    5. Data Reporting using Power BI: Visualize your data in a compelling and actionable manner, empowering stakeholders with valuable insights.
    6. Configuring Alerts using the Data Activator: Stay ahead of potential issues by setting up alerts and notifications within your Power BI visuals using a new tool called Data Activator.
    7. End to End Pipeline Testing: The complete flow will be tested, from data ingestion through data transformation to the report updating with incoming new data, to validate the integrity and performance of your pipelines, ensuring reliability and accuracy.
    This project revolves around Bing News Data Analytics, a practical application that involves ingesting news data daily and generating insightful reports. By walking through each step in a simplified manner, I aim to make Azure Data Engineering accessible to all enthusiasts, regardless of their background.
    - - - Book a Private One on One Meeting with me (1 Hour) - - -
    www.buymeacoff...
    - - - Express your encouragement by brewing up a cup of support for me - - -
    www.buymeacoff...
    - - - Other useful playlist: - - -
    1. Microsoft Fabric Playlist: • Microsoft Fabric Tutor...
    2. Azure General Topics Playlist: • Azure Beginner Tutorials
    3. Azure Data Factory Playlist: • Azure Data Factory Tut...
    4. Databricks CICD Playlist: • CI/CD (Continuous Inte...
    5. Azure Databricks Playlist: • Azure Databricks Tutor...
    6. Azure End to End Project Playlist: • End to End Azure Data ...
    7. End to End Azure Data Engineering Project: • An End to End Azure Da...
    - - - Let’s Connect: - - -
    Email: mrktalkstech@gmail.com
    Instagram: mrk_talkstech
    - - - About me: - - -
    Mr. K is a passionate teacher who created this channel with only one goal: "TO HELP PEOPLE LEARN ABOUT THE MODERN DATA PLATFORM SOLUTIONS USING CLOUD TECHNOLOGIES"
    I will be creating playlists covering the topics below (with DEMOs):
    1. Azure Beginner Tutorials
    2. Azure Data Factory
    3. Azure Synapse Analytics
    4. Azure Databricks
    5. Microsoft Power BI
    6. Azure Data Lake Gen2
    7. Azure DevOps
    8. GitHub (and several other topics)
    After creating some basic foundational videos, I will create videos with real-time scenarios / use cases specific to the three common data roles:
    1. Data Engineer
    2. Data Analyst
    3. Data Scientist
    Can't wait to help people with my videos.
    - - - Support me: - - -
    Please Subscribe: / @mr.ktalkstech

Comments • 6

  • @EzaPierre • 3 months ago

    Hi, I'm not sure why at 14:21, after using new_json = json.loads(json_list[25]), it didn't have "category". Could it be that the Bing resource changed? Thanks for the video, it really helps my study.
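The missing "category" key is plausible: not every Bing News article carries every field, so a parsed article dict may simply lack it. A defensive sketch (the sample string and the "Uncategorized" default are illustrative assumptions, not from the video) uses dict.get to supply a fallback instead of raising a KeyError:

```python
import json

# Hypothetical article payload with no "category" key.
json_list = ['{"name": "Headline A", "description": "Desc A"}']

article = json.loads(json_list[0])
category = article.get("category", "Uncategorized")  # default instead of KeyError
print(category)  # → Uncategorized
```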

  • @metihosseini • 7 months ago • +2

    Great tutorial, can't wait for your next videos.
    I think we could also use the getItem method, which would result in less complex code, and we would not collect the DataFrame into the driver. Like this:
    def colextract(sourcecol, colname):
        colout = col(sourcecol).getItem(colname)
        return colout

    cols_to_extract = ["name", "description", "category", "url", "datePublished"]
    extracted_cols = []
    for colname in cols_to_extract:
        extracted_col = colextract("json_value", colname).alias(colname)
        extracted_cols.append(extracted_col)

    image_col = col("json_value").getItem("image").getItem("thumbnail").getItem("contentUrl").alias("image")
    # provider is an array of organizations, so take the first element's name
    provider_col = col("json_value").getItem("provider").getItem(0).getItem("name").alias("provider")
    df_table = df_exploded.select(*extracted_cols, image_col, provider_col)

  • @joostbazelmans6093 • 3 months ago

    I am getting an error when trying to write the data to the data lake: 'DataFrameWriter' object has no attribute 'saveAsTAble'. How do I solve this?

  • @NaveenKumar-kb2fm • 5 months ago

    I have a scenario: I have hundreds of tables in HANA as the source, each with a different schema. While migrating data to Azure ADLS using ADF, it converts the decimal data type to decimal(38,18) by default. How can I migrate all the tables dynamically with their original source data types using ADF? Can you do a demo on this, please?

  • @SynonAnon-vi1ql • 7 months ago

    Thank you for this great video and playlist! I have been watching your videos very closely, as we are evaluating Fabric for ourselves.
    Currently we use Databricks heavily, but I feel like Fabric has obviated the need for it. Is my understanding correct?
    Also, with the serverless cluster option I am feeling a little nervous about the lack of control over pricing, e.g. the use-it-or-lose-it $9k. Could you please explain its benefit and the scenarios in which it would be useful?
    Thank you so much!

  • @paradoxsingh • 7 months ago • +1

    I am first