CognitiveCoders
  • 150
  • 36 975
How to create incremental key using Data Flow in ADF | Azure Data Factory | Real Time Scenario
Welcome to our comprehensive Azure Data Factory real-time scenario series, where we'll take you through the process of generating a surrogate key using a data flow in Azure Data Factory. Whether you're a beginner or looking to expand your Azure skills, this video is designed to help you master ADF with practical, step-by-step instructions.
Why Use Azure Data Factory?
Azure Data Factory (ADF) is a powerful cloud-based data integration service that allows you to create data-driven workflows for orchestrating and automating data movement and data transformation. It's a key component for any data engineer working with big data, ETL processes, and data lakes in the Azure environment.
🚀 Key Topics Covered:
Real-Time Data Ingestion: Learn how to ingest data in real-time using Azure Data Factory and integrate it with Azure Stream Analytics and Event Hubs.
Building Complex Pipelines: Understand the step-by-step process of designing and deploying complex data pipelines tailored to real-world business needs.
Data Transformation & Mapping: Explore advanced data transformation techniques using ADF’s Mapping Data Flow for real-time data processing.
Automation & Monitoring: Discover how to automate and monitor your pipelines to ensure seamless data flow in production environments.
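The core idea behind this scenario, reading the current maximum key from the sink and offsetting the keys generated for the new batch, can be sketched in plain Python. In ADF itself this is the Surrogate Key transformation plus a derived column; the function and column names below are illustrative only:

```python
def assign_incremental_keys(existing_max_key, new_rows):
    # Each new row gets existing_max_key plus its 1-based position in the batch,
    # mirroring a Surrogate Key transformation offset by the sink's current max key.
    return [
        {**row, "surrogate_key": existing_max_key + i}
        for i, row in enumerate(new_rows, start=1)
    ]

batch = [{"name": "Rahul"}, {"name": "Priya"}]
keyed = assign_incremental_keys(100, batch)
# keys become 101 and 102
```

On the next pipeline run the lookup returns the new maximum (here 102), so keys keep incrementing across loads instead of restarting at 1.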
🔗 Additional Resources:
PySpark playlist : th-cam.com/play/PL7DrGo85HcssOo4q5ihH3PqRRXwupRe65.html
PySpark RealTime Scenarios playlist : th-cam.com/play/PL7DrGo85HcstBR0D4881RTIzqpwyae1Tl.html
Azure Data Factory playlist : th-cam.com/play/PL7DrGo85HcsueO7qbe3-W9kGa00ifeQMn.html
Azure Data Factory RealTime Scenarios playlist : th-cam.com/play/PL7DrGo85HcsulFTAXy2cgcS6bWRlIsHnn.html&si=N-h5PCLjhHRrrE1s
PySpark Interview Question : th-cam.com/play/PL7DrGo85Hcssd_NsYufl8K7T8VSuZUy8j.html
Scenario Based Interview Question : th-cam.com/play/PL7DrGo85Hcsu_Qbtbp373uhUVODcs-XAP.html
Unit Testing in PySpark : th-cam.com/play/PL7DrGo85HcsuiY6C_08z_9jHoO_j3ZTe0.html
SQL Interview Series : th-cam.com/play/PL7DrGo85HcsvC5gCN_hsjCHOUnbn9ddct.html&si=2z-CtbNUZoAWhJC4
👨‍💻 About the Instructor:
As an experienced data engineer and interview coach, I've helped countless professionals land their dream jobs at top tech companies. My practical approach and industry insights will give you the confidence to excel in your interview.
🔔 Stay Connected:
Don't miss out on future interview tips and data engineering content! Subscribe to this channel and hit the notification bell to stay updated.
📚 Additional Learning Materials:
ADF Documentation : learn.microsoft.com/en-us/azu...
Linkedin : / pritam-saha-060516139
GitHub : github.com/Pritamsaha627/Pyspark
Telegram channel : t.me/CognitiveCoders
WhatsApp channel : whatsapp.com/channel/0029Va4x...
Instagram : / pritamsaha627
📧 Need Personalized Guidance?
For one-on-one coaching or specific interview questions, feel free to reach out:
Email: pritamsaha2708@gmail.com
Topmate : topmate.io/pritamsaha627
Like, Share & Subscribe for More Tutorials!
If this video helped you, please give it a thumbs up and share it with your network. Don’t forget to subscribe and hit the notification bell for more tutorials on Azure, cloud computing, and data engineering.
#AzureDataFactory #RealTimeData #DataEngineering #dataflow #etlprocesses #surrogatekey #incrementalkey
Views: 41

Videos

How to remove duplicate rows using dataflow in ADF | Azure Data Factory | Real Time Scenario
Views: 60 · days ago
Welcome to our comprehensive Azure Data Factory RealTime scenarios where we'll take you through the process to remove duplicate data using dataflow in Azure Data Factory. Whether you're a beginner or looking to expand your Azure skills, this video is designed to help you master ADF with practical, step-by-step instructions. Why Use Azure Data Factory? Azure Data Factory (ADF) is a powerful clou...
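The first-row-wins dedup the video builds in a data flow (typically an Aggregate transformation grouped on the key columns) can be sketched in plain Python; the column names are illustrative:

```python
def remove_duplicates(rows, key_columns):
    # Keep the first row seen for each distinct combination of key columns,
    # the same result as grouping on the keys and taking first() of the rest.
    seen = set()
    unique_rows = []
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        if key not in seen:
            seen.add(key)
            unique_rows.append(row)
    return unique_rows

rows = [
    {"Id": 2, "Name": "Raj", "City": "Delhi"},
    {"Id": 2, "Name": "Raj", "City": "Delhi"},
    {"Id": 3, "Name": "Priya", "City": "Mumbai"},
]
deduped = remove_duplicates(rows, ["Id", "Name", "City"])
# 2 rows remain
```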
Latest Data Engineering Interview Question from PWC | BigData | SQL | Azure Data Engineer
Views: 162 · 14 days ago
If you like this video please do like,share and subscribe my channel. PySpark playlist : th-cam.com/play/PL7DrGo85HcssOo4q5ihH3PqRRXwupRe65.html PySpark RealTime Scenarios playlist : th-cam.com/play/PL7DrGo85HcstBR0D4881RTIzqpwyae1Tl.html Azure Datafactory playlist : th-cam.com/play/PL7DrGo85HcsueO7qbe3-W9kGa00ifeQMn.html Azure Data Factory RealTime Scenarios playlist : th-cam.com/play/PL7DrGo8...
How to process fixed length text file using ADF DataFlow| Azure Data Factory | Real Time Scenario
Views: 41 · 14 days ago
Welcome to our comprehensive Azure Data Factory RealTime scenarios where we'll take you through the process to handle fixed length text file using ADF dataflow. Whether you're a beginner or looking to expand your Azure skills, this video is designed to help you master ADF with practical, step-by-step instructions. Why Use Azure Data Factory? Azure Data Factory (ADF) is a powerful cloud-based da...
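Fixed-length parsing boils down to slicing each field out of the line by character position, which is what substring() in a derived column does inside the data flow. A plain-Python sketch, with an assumed column layout for illustration:

```python
# Assumed layout for illustration: name in chars 0-10, city in 10-20, age in 20-23.
LAYOUT = {"name": (0, 10), "city": (10, 20), "age": (20, 23)}

def parse_fixed_width(line, layout):
    # Slice each field out of the line by position and strip the padding.
    return {col: line[start:end].strip() for col, (start, end) in layout.items()}

record = parse_fixed_width("Rahul     Kolkata    33", LAYOUT)
# {'name': 'Rahul', 'city': 'Kolkata', 'age': '33'}
```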
How to copy last n days data incrementally from ADLS Gen2 | Azure Data Factory | Real Time Scenario
Views: 93 · 28 days ago
Welcome to our comprehensive Azure Data Factory RealTime scenarios where we'll take you through the process to Get last n days data from Source ADLS Gen2 using Azure Data Factory. Whether you're a beginner or looking to expand your Azure skills, this video is designed to help you master ADF with practical, step-by-step instructions. Why Use Azure Data Factory? Azure Data Factory (ADF) is a powe...
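The "last n days" pattern comes down to generating one dated folder path per day and copying each of them, which an ADF pipeline does with a ForEach over @range(0, n) plus formatDateTime(). A plain-Python sketch; the yyyy/MM/dd folder convention is an assumption for illustration:

```python
from datetime import date, timedelta

def last_n_days_paths(base, n, today=None):
    # Build the dated folder paths for the last n days, newest first.
    today = today or date.today()
    return [f"{base}/{today - timedelta(days=i):%Y/%m/%d}" for i in range(n)]

paths = last_n_days_paths("raw/sales", 3, today=date(2024, 9, 15))
# ['raw/sales/2024/09/15', 'raw/sales/2024/09/14', 'raw/sales/2024/09/13']
```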
Delta Lake : Slowly Changing Dimension (SCD Type2) | Pyspark RealTime Scenario | Data Engineering
Views: 212 · a month ago
How do you handle a Slowly Changing Dimension Type 2 (SCD Type 2) requirement in Databricks using PySpark? This video covers the end-to-end development steps of SCD Type 2 using PySpark in a Databricks environment. If you like this video, please like, share and subscribe to my channel. PySpark playlist : th-cam.com/play/PL7DrGo85HcssOo4q5ihH3PqRRXwupRe65.html PySpark RealTime Scenarios playlist : th-cam.com/pla...
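The essence of SCD Type 2, closing the current version of a changed business key and appending the new version, can be sketched in plain Python. In the video this is done set-wise with Delta Lake's MERGE; here it is shown row-wise, with illustrative column names, and it assumes the incoming batch holds only new or changed keys:

```python
from datetime import date

def scd2_merge(dimension, changes, key, today):
    # Close the current version of every changed business key,
    # then append the incoming row as the new current version.
    changed_keys = {row[key] for row in changes}
    result = []
    for row in dimension:
        if row["is_current"] and row[key] in changed_keys:
            row = {**row, "is_current": False, "end_date": today}
        result.append(row)
    for row in changes:
        result.append({**row, "is_current": True,
                       "start_date": today, "end_date": None})
    return result

dim = [{"id": 1, "city": "Delhi", "is_current": True,
        "start_date": date(2024, 1, 1), "end_date": None}]
merged = scd2_merge(dim, [{"id": 1, "city": "Mumbai"}], "id", date(2024, 6, 1))
# 2 rows: the Delhi version is closed, the Mumbai version is current
```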
How to copy latest or last modified file from ADLS Gen2| Azure Data Factory | Real Time Scenario
Views: 160 · a month ago
Welcome to our comprehensive Azure Data Factory RealTime scenarios where we'll take you through the process to Get latest or last modified File from Source folder in Azure Data Factory. Whether you're a beginner or looking to expand your Azure skills, this video is designed to help you master ADF with practical, step-by-step instructions. Why Use Azure Data Factory? Azure Data Factory (ADF) is ...
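Picking the latest file is a max-by-timestamp over the folder listing, the same comparison the pipeline makes after a Get Metadata (childItems) plus a per-file Get Metadata 'lastModified' lookup. A plain-Python sketch with illustrative names:

```python
def latest_file(child_items):
    # child_items: (name, last_modified) pairs. ISO-8601 timestamps
    # compare correctly as plain strings, so max() by timestamp works.
    return max(child_items, key=lambda item: item[1])[0]

files = [("sales_0101.csv", "2024-01-01T10:00:00Z"),
         ("sales_0222.csv", "2024-02-22T09:30:00Z"),
         ("sales_0115.csv", "2024-01-15T08:00:00Z")]
newest = latest_file(files)
# 'sales_0222.csv'
```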
Latest Tiger Analytics coding Interview Questions & Answers | Data Engineer Prep 2024
Views: 987 · a month ago
Top Tiger Analytics SQL Interview Questions and Boost Your Data Engineering Career in 2024! Are you preparing for a data engineering interview at Tiger Analytics? You've come to the right place! This comprehensive guide covers essential pyspark interview questions and provides in-depth answers to help you succeed in your upcoming interview. Whether you're a seasoned professional or just startin...
How to get source file name dynamically in ADF | Azure Data Factory | Real Time Scenario
Views: 103 · a month ago
Welcome to our comprehensive Azure Data Factory RealTime scenarios where we'll take you through the process to Get Source File Names Dynamically from Source folder in Azure Data Factory. Whether you're a beginner or looking to expand your Azure skills, this video is designed to help you master ADF with practical, step-by-step instructions. Why Use Azure Data Factory? Azure Data Factory (ADF) is...
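Getting source file names dynamically amounts to filtering the folder listing that the Get Metadata activity's childItems output gives you, then iterating it in a ForEach. A plain-Python sketch of that filter; the dict shape mirrors childItems and the suffix is an illustrative assumption:

```python
def file_names(child_items, suffix=".csv"):
    # child_items mirrors Get Metadata's childItems output: a list of
    # {"name": ..., "type": ...} dicts. Keep only files with the suffix.
    return [item["name"] for item in child_items
            if item["type"] == "File" and item["name"].endswith(suffix)]

items = [{"name": "orders.csv", "type": "File"},
         {"name": "archive", "type": "Folder"},
         {"name": "sales.csv", "type": "File"}]
names = file_names(items)
# ['orders.csv', 'sales.csv']
```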
Top LTIMindtree SQL Interview Questions | Data Engineering Career Guide 2024 | Data Engineering
Views: 1.2K · a month ago
Master LTIMindtree SQL Interview Questions and Boost Your Data Engineering Career in 2024! Are you preparing for a data engineering interview at LTIMindtree? You've come to the right place! This comprehensive guide covers essential SQL interview questions and provides in-depth answers to help you succeed in your upcoming interview. Whether you're a seasoned professional or just starting your ca...
How to upsert data into delta table using PySpark | Pyspark RealTime Scenario | Data Engineering
Views: 153 · a month ago
Upsert implementation in Delta Lake. If you like this video, please like, share and subscribe to my channel. PySpark playlist : th-cam.com/play/PL7DrGo85HcssOo4q5ihH3PqRRXwupRe65.html Azure Data Factory playlist : th-cam.com/play/PL7DrGo85HcsueO7qbe3-W9kGa00ifeQMn.html&si=IVjV6k5Yzr_xaU6p PySpark RealTime Scenarios playlist : th-cam.com/play/PL7DrGo85HcstBR0D4881RTIzqpwyae1Tl.html PySpark Intervie...
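An upsert updates rows whose key already exists in the target and inserts the rest, which is the effect of Delta Lake's DeltaTable.merge(...).whenMatchedUpdateAll().whenNotMatchedInsertAll(). A plain-Python sketch of that merge logic, with illustrative column names:

```python
def upsert(target, updates, key):
    # Index the target by key, overwrite matched keys, add unmatched ones.
    merged = {row[key]: row for row in target}
    for row in updates:
        merged[row[key]] = row
    return list(merged.values())

target = [{"id": 1, "qty": 5}, {"id": 2, "qty": 3}]
updates = [{"id": 2, "qty": 9}, {"id": 3, "qty": 1}]
result = upsert(target, updates, "id")
# id 2 updated to qty 9, id 3 inserted: 3 rows total
```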
Write a pyspark code to get the given output | Data Engineering Interview Question | DealShare
Views: 140 · 2 months ago
11. Write spark code to find the employee count under each manager | Pyspark | SQL Solution
Views: 494 · 2 months ago
10. Find the employees with their primary department | Pyspark | SQL Solution
Views: 131 · 2 months ago
Find the top three high earner employees from each department | Data Engineer Interview | Michaels
Views: 151 · 2 months ago
Find the popularity percentage for each user on Meta | Data Engineering Interview Question | Meta
Views: 132 · 3 months ago
Find the house that has won max no of battles for each region | Data Engineering Interview | fractal
Views: 192 · 3 months ago
How to parameterize Linked Services in ADF | Azure Data Factory Tutorial for Beginners
Views: 73 · 3 months ago
calculate the percentage difference of total sales Q1 & Q2 | Data Engineering Interview | Prologis
Views: 113 · 3 months ago
Integration Runtime in ADF | Azure Data Factory Tutorial For Beginners
Views: 54 · 3 months ago
BandhanBank SQL Interview Questions and Answers | Data Engineering | SQL Interview Question
Views: 421 · 3 months ago
How to Install Microsoft SQL Server & SSMS 20.1 on Windows | Complete guide
Views: 74 · 3 months ago
Write a query to find out third highest salary | SQL Interview Question | HCLTech
Views: 1.2K · 3 months ago
Find the top employees from each shop with highest sales | Data Engineering Interview | Maersk
Views: 187 · 4 months ago
Different types of trigger in ADF | Azure Data Factory Tutorial For Beginners
Views: 53 · 4 months ago
How to trigger one pipeline from another pipeline | Azure Data Factory Tutorial For Beginners
Views: 60 · 4 months ago
9. Find the credit card and how many cards were issued in its launch month | PySpark
Views: 127 · 4 months ago
How to delete data using ADF | Azure Data Factory Tutorial For Beginners
Views: 64 · 4 months ago
Find the room types that are searched the most number of times | Data Engineering Interview | Airbnb
Views: 532 · 4 months ago
How to copy data in ADF | Azure Data Factory Tutorial For Beginners
Views: 80 · 4 months ago

Comments

  • @AkhilShaik-p4k
    @AkhilShaik-p4k 2 days ago

    Thanks for sharing🎉

  • @BeingHanumanLife
    @BeingHanumanLife 6 days ago

    Can you please create a video on DLT streaming tables? I'm facing issues while using SCD1. My bronze notebook is separate and my silver notebook is separate. I'm facing issues while reading the bronze table as a stream and loading it into silver.

  • @siddhantmishra6581
    @siddhantmishra6581 6 days ago

    thanks for sharing. keep up the good work!!

  • @rahuldave6699
    @rahuldave6699 7 days ago

    query = spark.sql("""
        with cte as (
            select dept_id, emp_name, salary,
                   row_number() over (partition by dept_id order by salary desc, emp_name) as rn,
                   count(dept_id) over (partition by dept_id order by dept_id) as dept_count
            from emp
        )
        select dept_id,
               max(case when rn = 1 then emp_name else Null end) as max_salary,
               min(case when rn = dept_count then emp_name else Null end) as min_salary
        from cte
        group by dept_id
    """)

  • @rahuldave6699
    @rahuldave6699 8 days ago

    product_data = [(1, 'Laptop', 'Electronics'), (2, 'Jeans', 'Clothing'), (3, 'Chairs', 'Home Appliances')]
    product_schema = ['product_id', 'product_name', 'category']
    product_df = spark.createDataFrame(product_data, product_schema)
    product_df.show()
    sales_data = [(1, 2019, 1000.00), (1, 2020, 1200.00), (1, 2021, 1100.00),
                  (2, 2019, 500.00), (2, 2020, 600.00), (2, 2021, 900.00),
                  (3, 2019, 300.00), (3, 2020, 450.00), (3, 2021, 400.00)]
    sales_schema = ['product_id', 'year', 'total_sales_revenue']
    sales_df = spark.createDataFrame(sales_data, sales_schema)
    # assign the result back, and use Spark's lowercase 'yyyy' pattern for to_date
    sales_df = sales_df.withColumn("year", to_date(col("year").cast("string"), 'yyyy'))
    sales_df.show()

  • @sibanandaroutray9721
    @sibanandaroutray9721 14 days ago

    This question is for how many years of experience candidates ?

  • @harshavardhansaimachineni587
    @harshavardhansaimachineni587 16 days ago

    Can we write: CTE2 AS (SELECT DISTINCT company FROM cte1 WHERE rnk = 1)?

  • @gauravgaikwad2939
    @gauravgaikwad2939 19 days ago

    Why is your audio echoing?

    • @CognitiveCoders
      @CognitiveCoders 18 days ago

      We've resolved the issue. From the next video onward you won't face it.

  • @prajju8114
    @prajju8114 20 days ago

    match_df1 = match_df.withColumn('team', expr("concat(team_A, ',', team_B)"))
    match_df1 = match_df1.drop('team_A', 'team_B')
    match_df1.show()
    match_df1 = match_df1.withColumn('team', split(col('team'), ','))
    match_df1 = match_df1.withColumn('team', explode(col('team')))
    match_df1 = match_df1.select('team', 'win')
    match_df1.show()
    match_df2 = match_df1.groupBy('team').agg(count('*').alias('played'))
    match_df3 = match_df1.groupBy('win').agg((count('*') / 2).cast('int').alias('total_win'))
    final_df = match_df2.join(match_df3, col('team') == col('win'), 'left').orderBy(col('total_win').desc())
    final_df = final_df.select('team', 'played', 'total_win', coalesce(col('total_win'), lit(0)).alias('total_wins'))
    final_df = final_df.drop('total_win')
    final_df.show()
    This is my alternative approach and it's working well.

  • @siddu1036
    @siddu1036 22 days ago

    For duplicates:
    WITH CTE AS (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY dept, name, salary ORDER BY salary) AS rnk
        FROM emp
    )
    SELECT * FROM CTE WHERE rnk = 1
    This will return only the records without duplicates.

  • @prajju8114
    @prajju8114 22 days ago

    Here is my approach; this also works.
    from pyspark.sql.functions import *
    from pyspark.sql.window import Window
    win = Window.partitionBy('dept_id').orderBy(col('salary').desc())
    df1 = df.withColumn('highest_salary', dense_rank().over(win))
    df1.show()
    df2 = df1.groupBy('dept_id').agg(
        min(when(col('highest_salary') == 2, col('emp_name'))).alias('min_salaried_emp'),
        max(when(col('highest_salary') == 1, col('emp_name'))).alias('max_salaried_emp'))
    df2.display()

  • @prajju8114
    @prajju8114 25 days ago

    I think we should use row_number, not dense_rank. Please clarify.

  • @prajju8114
    @prajju8114 25 days ago

    Can't we use any other function to populate the * character instead of repeat?

  • @prajju8114
    @prajju8114 25 days ago

    This was a straightforward question. From a glimpse of the dataset, I could tell we have to create an array and explode it to get the room types.

  • @june17you
    @june17you a month ago

    Small suggestion: your voice is echoing, so there's a little disturbance when trying to hear you properly. I appreciate you creating these kinds of videos.

    • @CognitiveCoders
      @CognitiveCoders a month ago

      Thanks for the feedback. We'll try to improve the sound quality

  • @prajwalreddy2882
    @prajwalreddy2882 a month ago

    select distinct * from employee order by salary desc offset 2 rows fetch next 1 row only;

  • @prajwalreddy2882
    @prajwalreddy2882 a month ago

    with cte as (
        select * from employee e1
        where not exists (select 1 from employee e2 where e1.emp_id = e2.manager_id)
    ),
    cte2 as (
        select department, emp_id,
               dense_rank() over (partition by department order by salary desc) as dnk
        from cte
    )
    select department, emp_id, dnk from cte2 where dnk = 1;

  • @KaiwalyaSevekar
    @KaiwalyaSevekar a month ago

    Bhai, be original. You're trying to talk in a different accent.

  • @vighneshbuddhivant8353
    @vighneshbuddhivant8353 a month ago

    df = spark.read.csv("/content/jobs.csv", header=True)
    df.show()
    result = df.groupBy('job').agg(count('name').alias('total_count'))
    result.show()
    rows = result.rdd.collect()
    result_dict = dict((row['job'], row['total_count']) for row in rows)
    print(result_dict)

  • @vighneshbuddhivant8353
    @vighneshbuddhivant8353 a month ago

    customer_df = spark.createDataFrame(customer_data, customer_schema)
    order_df = spark.createDataFrame(order_data, order_schema)
    customer_df.show()
    order_df.show()
    group_df = customer_df.join(order_df, 'customer_id', 'left_anti')
    group_df.show()

  • @bulluhemanth9149
    @bulluhemanth9149 a month ago

    Haven't you already filtered out the nulls in the second-to-last step? The coalesce was unnecessary. Also, there was a spelling mistake: cdf should be cDf.

    • @CognitiveCoders
      @CognitiveCoders a month ago

      Please share your solution for all the community members

  • @arsalanansari5066
    @arsalanansari5066 a month ago

    To solve Question 1, use the following query:
    select name, dept, salary
    from emp e
    where salary = (select max(salary) from emp where dept = e.dept)
    order by dept

  • @vijaybandi5417
    @vijaybandi5417 a month ago

    Nice session. Thanks a lot.

  • @ramswaroop1520
    @ramswaroop1520 a month ago

    Can you share your resume, please? I've been applying on Naukri for a month and haven't received a single call or reply. Please guide me.

  • @vikasbk8573
    @vikasbk8573 2 months ago

    Infosys interview question: I have two sources.
    Source 1:
    Col1 col2
    1    2
    3    4
    5    6
    Source 2:
    C1 c2 c3
    A  b  c
    D  e  f
    I want to show the records like this:
    C1 c2 c3 c4 c5
    1  3  5  A  D
    2  4  6  b  e
    .  .  .  C  f
    Will you please solve this and upload a video? Or you can give the code in a comment.

  • @sabesanj5509
    @sabesanj5509 2 months ago

    SELECT employee_id, department_id FROM employees
    GROUP BY employee_id
    HAVING COUNT(employee_id) = 1
    UNION
    SELECT employee_id, department_id FROM employees WHERE primary_flag = 'Y';

  • @srinubathina7191
    @srinubathina7191 2 months ago

    Can you please suggest a resource to learn PySpark testing?

  • @NiteeshKumarPinjala
    @NiteeshKumarPinjala 2 months ago

    In the final result, how come the houses in a region have the same rank? We used dense_rank, right? It shouldn't give us the same rank. Please clarify.

  • @CognitiveCoders
    @CognitiveCoders 2 months ago

    data = [(1, 'Rahul', 33, 'Kolkata'), (2, 'Raj', 12, 'Delhi'), (2, 'Raj', 12, 'Delhi'),
            (3, 'Priya', 37, 'Mumbai'), (3, 'Priya', 37, 'Indore')]
    schema = "Id int, Name string, Roll int, City string"

  • @akshaychowdhary8534
    @akshaychowdhary8534 3 months ago

    Please try to give the dataset in the description. Nice explanation, but your voice doesn't seem clear!

  • @gsreenivasulu3246
    @gsreenivasulu3246 3 months ago

    Please place the datasets in the description.

    • @CognitiveCoders
      @CognitiveCoders 2 months ago

      data = [(1, 5), (1, 3), (1, 6), (2, 1), (2, 6), (3, 9), (4, 1), (7, 2), (8, 3)]
      schema = ['user1', 'user2']

    • @gsreenivasulu3246
      @gsreenivasulu3246 2 months ago

      @CognitiveCoders We request that you please place the datasets in upcoming videos.

  • @atharvjoshi9959
    @atharvjoshi9959 3 months ago

    The voice is echoing. Please rectify it.

  • @UjjwalSinghPal-um3pb
    @UjjwalSinghPal-um3pb 3 months ago

    If I start learning now, there are very few data engineering internships (if you have a data internship it is easier to get a job, and you also get hands-on experience). So should I work as a data analyst first, only for the internship, and after that study data engineering for a job? That also seems like a good path. I need to do an internship because it is part of our college criteria; they do not give much support if you have not done one.

  • @sabesanj5509
    @sabesanj5509 3 months ago

    Bro, first add the question in either the comments or the description section.

  • @VinayGautam-s8s
    @VinayGautam-s8s 3 months ago

    How do we run the notebook in a prod environment? I mean CI/CD, IaC, and anything else. Please help us with that as well.

    • @CognitiveCoders
      @CognitiveCoders 3 months ago

      Using that environment's cluster.

    • @VinayGautam-s8s
      @VinayGautam-s8s 3 months ago

      @CognitiveCoders I meant, how do we schedule and orchestrate the notebook created in Databricks?

  • @TheOmen08
    @TheOmen08 3 months ago

    Thanks for the help, this works well. 💌

    • @CognitiveCoders
      @CognitiveCoders 3 months ago

      Please do Like, Share & Subscribe for supporting us.

  • @ranerutuja
    @ranerutuja 3 months ago

    Thanks, helped a lot