Data Engineering Mock Interview | Myntra | Part 1

  • Published on 6 Oct 2024

Comments • 13

  • @vikram--krishna · 8 months ago · +2

    Great video!!
    It would be great if one of your panel members could share their approach to the scenario-based question on LinkedIn or here. Thanks!

  • @adijos92 · 8 months ago · +2

    When will you upload the 2nd part of this interview? Great interview video.

    • @TheBigDataShow · 8 months ago

      In a day or two.

    • @TheBigDataShow · 8 months ago

      It is uploaded now. Please check

  • @user-yv1uu2op8o · 8 months ago

    Hi team, one question for you on Spark executor memory calculation:
    10 nodes, each node with 16 GB RAM and 8 cores.
    Scenario 1: 7 cores per executor and 10 executors in total, each executor with 16 GB RAM (including memory overhead).
    Scenario 2: 3 cores per executor and 20 executors in total, each executor with 8 GB RAM (including overhead).
    If the input file has 600 partitions, which scenario is more useful?
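
    A minimal sketch of the two scenarios expressed as standard Spark properties, so the trade-off can be compared side by side (the figures ignore spark.executor.memoryOverhead, and the helper function is hypothetical):

    # Sketch only: the two scenarios from the question as static-allocation configs.
    from pyspark.sql import SparkSession

    # Scenario 1: 10 executors x 7 cores x 16 GB
    #   -> 70 task slots, 600 partitions / 70 ≈ 9 waves, ~2.3 GB of executor memory per running task
    scenario_1 = {
        "spark.executor.instances": "10",
        "spark.executor.cores": "7",
        "spark.executor.memory": "16g",
    }

    # Scenario 2: 20 executors x 3 cores x 8 GB
    #   -> 60 task slots, 600 partitions / 60 = 10 waves, ~2.7 GB of executor memory per running task
    scenario_2 = {
        "spark.executor.instances": "20",
        "spark.executor.cores": "3",
        "spark.executor.memory": "8g",
    }

    def build_session(conf: dict, app_name: str) -> SparkSession:
        # Hypothetical helper: apply one scenario's properties to a new session.
        builder = SparkSession.builder.appName(app_name)
        for key, value in conf.items():
            builder = builder.config(key, value)
        return builder.getOrCreate()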

  • @ameygoesgaming8793 · 6 months ago · +3

    My solution to the question:
    We want to process 1 TB of data every day.
    Ingestion: Assumption - the filenames will be of the form 'filename_timestamp.csv'.
    I will ingest all the files, separating them at day-level granularity, since we only require the last 90 days of data.
    I will have a 90-day deletion (lifecycle expiration) rule active for each file in S3.
    Once a file is processed successfully, archive it; if an error occurs, re-upload it to the landing location so that it is picked up by the next job.
    Processing:
    Each day's file is processed in its entirety, and records are inserted on a WAP basis - write, audit, publish to production - after processing in Spark.
    The records are first written to a staging area, all data-quality checks are applied, records that fail the checks are sent to an audit table, de-duplication is applied, and then the records are written to production using an UPSERT.
    Spark calculation:
    I will have 5 executors with 16 GB memory and 5 cores per executor; the driver will also have 16 GB and 5 cores.
    To process a 1 TB file: 1024 GB / (16 GB x 5 executors) ≈ 13, so the executors work through the data in roughly 13 waves.
    Also, I didn't take overhead memory into account - we can go into that detail later.
    Warehousing:
    The warehouse will be designed with fact and dimension tables, where the fact table holds the transactions and one of the dimensions is a date dimension; the analytics team joins the fact table to the date dimension to query the last 90 days of data.
    (Rough sketches of each of these steps follow this comment.)
    Please suggest improvements to my answer.
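
    A minimal sketch of the 90-day S3 expiration mentioned in the Ingestion step, using boto3; the bucket name and prefix are placeholders:

    import boto3

    s3 = boto3.client("s3")

    # Expire (delete) raw landing files 90 days after creation.
    s3.put_bucket_lifecycle_configuration(
        Bucket="raw-ingest-bucket",                      # placeholder bucket
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "expire-raw-files-after-90-days",
                    "Filter": {"Prefix": "incoming/"},   # placeholder prefix
                    "Status": "Enabled",
                    "Expiration": {"Days": 90},
                }
            ]
        },
    )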
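
    A rough sketch of the write-audit-publish flow described under Processing, assuming the production table is Delta Lake or Iceberg so that MERGE INTO is available; table and column names are placeholders:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("wap-daily-load").getOrCreate()

    # Write: the day's file has already been loaded into a staging table.
    staged = spark.read.table("staging.daily_transactions")

    # Audit: apply data-quality checks and keep the failing records for review.
    valid = staged.filter(F.col("order_id").isNotNull() & (F.col("amount") >= 0))
    staged.subtract(valid).write.mode("append").saveAsTable("audit.failed_records")

    # De-duplicate before publishing.
    valid.dropDuplicates(["order_id"]).createOrReplaceTempView("deduped_batch")

    # Publish: UPSERT the clean batch into the production table.
    spark.sql("""
        MERGE INTO prod.transactions AS t
        USING deduped_batch AS s
        ON t.order_id = s.order_id
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
    """)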
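
    Spelling out the arithmetic behind the Spark calculation above (overhead memory ignored, as noted):

    # Back-of-the-envelope numbers for 5 executors x 16 GB x 5 cores on ~1 TB/day.
    total_data_gb = 1024
    executors, executor_memory_gb, cores_per_executor = 5, 16, 5

    cluster_memory_gb = executors * executor_memory_gb              # 80 GB held in memory at once
    waves = total_data_gb / cluster_memory_gb                       # ~12.8 passes over the day's data
    parallel_tasks = executors * cores_per_executor                 # 25 concurrent tasks
    memory_per_task_gb = executor_memory_gb / cores_per_executor    # ~3.2 GB per task slot

    print(round(waves, 1), parallel_tasks, memory_per_task_gb)      # 12.8 25 3.2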
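
    And a sketch of the last-90-days analytical query against the star schema described under Warehousing; the fact and dimension names are placeholders:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("last-90-days").getOrCreate()

    # Join the transaction fact to the date dimension and keep only the last 90 days.
    last_90_days = spark.sql("""
        SELECT d.calendar_date, SUM(f.amount) AS total_amount
        FROM prod.fact_transactions AS f
        JOIN prod.dim_date AS d
          ON f.date_key = d.date_key
        WHERE d.calendar_date >= date_sub(current_date(), 90)
        GROUP BY d.calendar_date
        ORDER BY d.calendar_date
    """)
    last_90_days.show()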

    • @TheBigDataShow · 6 months ago · +1

      Good approach👏

    • @TheBigDataShow · 6 months ago · +1

      Please watch the latest Data Engineering mock interview video that we have uploaded on our channel. You will find more interesting system design problems to solve. Take your time to comprehend the problem and share your solution in the comments section. I will review your comments and provide feedback to help you improve.
      And you are already doing well.

  • @ameygoesgaming8793 · 6 months ago

    I felt the question was not clear... he said 90 days of data from now, but he also said 90 days of data over the last 5 years.

    • @SanketPatole · 6 months ago

      He meant any consecutive 90 days from the past 5 years.

  • @Sharath_NK98 · 8 months ago · +2

    Not satisfied with the answers.

    • @TheBigDataShow · 8 months ago · +5

      Please check the next part of the video. This interview has gone on for more than two hours and Vipul has answered many questions wonderfully. It takes courage to come into the public eye and give an interview. People do get nervous, but answering questions for more than two hours shows lots of intelligence, calibre, and a wonderful attitude.

    • @noob_2377 · 8 months ago

      @TheBigDataShow 👏👏