Question 11: PWC Interview Questions Part 2 | Data Engineers

  • Published on 5 Sep 2024
  • In this video I discuss an interview question asked in a PWC interview for data engineers.
    Q: Suppose you have a dataset with information about employee projects, and you want to find the most recent project and the total number of projects for each employee.
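    from pyspark.sql import SparkSession

    # Reuse the active session if one exists (e.g. in a notebook), otherwise create one.
    spark = SparkSession.builder.getOrCreate()

    # Each row is (employee_id, project_name, project_date).
    employee_projects_data = [
        (1, 'Project1', '2022-01-10'),
        (1, 'Project2', '2022-02-15'),
        (1, 'Project3', '2022-03-20'),
        (2, 'Project1', '2022-01-05'),
        (2, 'Project2', '2022-02-10'),
        (2, 'Project3', '2022-03-15'),
        (2, 'Project4', '2022-04-20'),
    ]
    # project_date is loaded as a string in yyyy-MM-dd form.
    schema = "employee_id int, project_name string, project_date string"
    df_employee_projects = spark.createDataFrame(data=employee_projects_data, schema=schema)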
    The solution is in PySpark.
    Check out this video and let me know your doubts. We can connect on
    LinkedIn: / priyam-jain-0946ab199
    Subscribe to @pysparkpulse for more such questions.
    #pyspark #spark #bigdata #bigdataengineer #dataengineering #dataengineer #deloitte #pwc #mnc

Comments • 4

  • @bolisettisaisatwik2198 4 months ago

    We can also order by date and then filter the latest date.

  • @siddharthchoudhary103 7 months ago +1

    What if the date data is not in order? Then we have to use rank and then filter, right?

    • @pysparkpulse 7 months ago

      Yes right 😊