Data Engineering Interview | System Design

  • Published on 21 Jul 2024
  • Data Engineering Mock Interview - First Round
    Join Ankur Ranjan, an experienced Data Engineering professional with over 5 years of experience, and Aastha for an exciting and informative Data Engineering mock interview session.
    If you're preparing for a Data Engineering interview, this is the perfect opportunity to enhance your skills and increase your chances of success. The mock interview simulates a real-life scenario and provides valuable insights and guidance. The topics covered include SQL vs NoSQL, Git practices, deployment methodology in data engineering, and more.
    It covers questions on Apache Spark, SQL, Airflow, file formats, AWS services such as SQS, SNS, Step Functions, Lambda, AWS Glue, Athena and EMR clusters, as well as data modelling, database technologies, cloud platforms, and more. You'll get to see how professionals tackle technical questions and problem-solving challenges in a structured and efficient manner.
    By watching this mock interview, you'll learn effective strategies to approach technical questions and problem-solving scenarios, gain familiarity with the data engineering interview process and format, enhance your communication skills and ability to articulate your thoughts clearly, identify areas of improvement, receive expert feedback on your performance, boost your confidence, and reduce nervousness for future interviews.
    This mock interview suits all levels of experience, whether you're a fresh graduate, a career changer, or a seasoned professional looking to improve your interview skills. Don't miss out on this invaluable learning experience! Subscribe to our channel and hit the notification bell to be notified when the mock interview is released. Stay tuned for a deep dive into the world of data engineering.
    Subscribe now and be the first to watch the Data Engineering Mock Interview with Ankur & Aastha.
    🔅 To book a Mock interview - topmate.io/ankur_ranjan/15155
    🔅 LinkedIn - / thebigdatashow
    🔅 Instagram - / ranjan_anku
    🔅 Ankur (Interviewer)'s LinkedIn profile - / thebigdatashow
    🔅 Aastha (Interviewee)'s LinkedIn profile - / aastha-jain-851ab3140
    Chapters:
    00:00 - Introduction
    04:54 - What is a fact and dimension table? And what is the difference between fact & dimension?
    06:33 - What is the difference between row-based and columnar file formats? And why are columnar file formats such as Parquet or ORC favoured for analytics?
    08:00 - Different compression techniques such as Snappy, bzip2 and LZO. And which one to choose?
    08:20 - What is the write-ahead log?
    09:19 - When to choose a data lake and when to use a data warehouse
    10:41 - What is the difference between #datalake & #datawarehouse ?
    11:38 - Difference between RDD, Dataframe & Dataset
    12:28 - Difference between SparkSession & SparkContext
    12:57 - PySpark optimisation techniques
    16:35 - Spark is an in-memory compute engine, so why do we need cache in #apachespark ?
    17:20 - Difference between cache and persist
    19:55 - Lazy evaluation in PySpark
    22:53 - What are SQS and SNS in AWS?
    24:00 - What is the work of Step functions in #aws?
    25:40 - Use of AWS Glue
    27:28 - How do you decide which should go to the Data Warehouse and which should be treated as an external table?
    28:29 - How do you choose the database?
    29:47 - What is Elasticsearch?
    31:18 - SQL Question
    32:48 - System Design Question in #dataengineering
    56:00 - Review of interview & recommendations for systems design in Data Engineering
    #dataengineering #interview #interviewquestions #bigdata #mockinterview

Comments • 27

  • @vineetjain7518 4 months ago +4

    This is quality data engineering. There is something for everyone.

    • @TheBigDataShow 4 months ago +1

      Thanks a lot Vineet. Keep learning 👏🎉

    • @abdulsami-xn6ss 4 months ago +2

      ❤❤❤

  • @priyankashaw2956 4 months ago +1

    All the interviews available on this channel are great and helpful. Thank you for uploading, and keep going.

    • @TheBigDataShow 4 months ago

      Thank you Priyanka 🙇🙏🎊

  • @vikram--krishna 2 months ago

    Can you please share your approach to the question below?
    Design a data pipeline for a news broadcast app.
    Considerations:
    1. Active users: 1 million
    2. News will be delivered by push notification
    3. Users can comment on each news item
    4. Users can like or dislike each news item
    5. Based on the reactions, customize the news type for a specific user group by running an ML model
    6. The pipeline should be fast / near real time
    7. Users should also get messages based on their current location (local news)

  • @Sandip_Patle 4 months ago +2

    Got a great understanding of Big Data interviews again today. Thanks for such useful content.

    • @TheBigDataShow 4 months ago +2

      Thank you @Sandip_Patle . Thank you for your kind words.

    • @Sandip_Patle 4 months ago

      Sir, can I get a video where the candidate explains their data engineering project in the retail domain only?
      I've been following this channel for a long time now but I haven't seen this project in any mock interview so far. Perhaps I missed it.
      Could you please share the relevant link if possible?

  • @savirawat6671 3 months ago +1

    Great video, in-depth knowledge shared.

    • @TheBigDataShow 3 months ago

      Thank you for your kind words 😊 Keep learning

    • @TheBigDataShow 3 months ago

      We have more than 25 other Data Engineering Mock Interview videos. Do watch them in your free time and let me know your thoughts.

  • @lakshaychopra 3 months ago +1

    God bless your channel 🙏🏻

    • @TheBigDataShow 3 months ago

      Thank you for your kind words Lakshay :)

  • @bhaveshchavan6075 4 months ago +2

    Hello, can you conduct a data engineer interview for a fresher? This is a very advanced interview for us freshers who are doing the CDAC PGDBDA course.

    • @TheBigDataShow 4 months ago

      There are many videos for freshers in the Data Engineering playlist. Try watching the older videos in the playlist. All of them are on our channel.

  • @brownwolf05 4 months ago +2

    For the last question, on sharing data with external systems: we can create a public portal with role-based access and provide a data retrieval request form. The form takes an email address, to which we can send the data through an SMTP server, and the data will also be available to download from the portal. The data will always be filtered, because the request form has a field for the user whose data is needed and the range to fetch for that user.
    Kindly share your thoughts on this approach.
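
The filtering step in the comment above could be sketched roughly as follows. This is only an illustration under stated assumptions: the role names, the `DataRequest` fields, the record layout, and the reading of "range" as an inclusive date range are all hypothetical and not taken from the video.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical roles that are allowed to pull data through the portal.
ALLOWED_ROLES = {"partner", "auditor"}


@dataclass(frozen=True)
class DataRequest:
    """Fields of the hypothetical data retrieval request form."""
    requester_role: str
    user_id: str   # the user whose data is requested
    start: date    # assumed: the "range" is an inclusive date range
    end: date


def fulfil_request(records, request):
    """Return only the records matching the requested user and date range."""
    # Role-based access check before any data is touched.
    if request.requester_role not in ALLOWED_ROLES:
        raise PermissionError(f"role {request.requester_role!r} may not export data")
    if request.start > request.end:
        raise ValueError("start date must not be after end date")
    return [
        r for r in records
        if r["user_id"] == request.user_id
        and request.start <= r["event_date"] <= request.end
    ]
```

The filtered list would then be attached to the email or offered as a download; that delivery step is left out here.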

    • @TheBigDataShow 4 months ago

      This is a really good approach and answer Arnav✨👏👍

    • @brownwolf05 4 months ago +1

      We can optimise this a bit more by digging deeper and filtering the data at the initial stage. This will be text data, mostly consumed in JSON format through APIs, which are slow to read. So rather than taking a one-time dump from the API, we can have an incremental load pipeline that writes the data into a document-based DB, so that we can fetch it with lower latency when needed and then continue with the rest of the process.
      Correct me wherever I can improve my observation and thinking capability.
      @TheBigDataShow
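
A minimal sketch of the incremental load idea in the comment above, assuming each record carries an `id` and a numeric `updated_at` timestamp. The `fake_api` function and the plain dict standing in for the document database are hypothetical placeholders, not a real API or a real database client.

```python
# Hypothetical upstream data; in practice this would sit behind a slow JSON API.
SOURCE = [
    {"id": "a", "updated_at": 100, "text": "first"},
    {"id": "b", "updated_at": 200, "text": "second"},
]


def fake_api(since):
    """Stand-in for an API call that returns records changed after `since`."""
    return [r for r in SOURCE if r["updated_at"] > since]


def incremental_load(fetch_since, store, watermark):
    """Upsert records newer than `watermark` into `store`.

    `store` is a dict keyed by record id, standing in for a document database.
    Returns the new watermark to persist for the next run.
    """
    new_watermark = watermark
    for record in fetch_since(watermark):
        store[record["id"]] = record  # upsert: the latest version wins
        new_watermark = max(new_watermark, record["updated_at"])
    return new_watermark
```

Each run passes in the watermark returned by the previous run, so only new or changed records are pulled from the API rather than a full dump.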

  • @louisxuan-em6lk 4 months ago +1

    Why can't I find the break points of the video?

    • @TheBigDataShow 4 months ago

      Chapters are listed in the video description. You can click on them to jump to each break point.

  • @sharankarthick3364 3 months ago +1

    Insightful!!

    • @TheBigDataShow 3 months ago

      We are also creating multiple Data Engineering interview questions for practice in the community section of our YouTube channel. Visit our channel and go to the community tab to find all the practice questions.
      And do watch our other Data Engineering mock interviews in the Mock Interview playlist too. We have more than 25 Data Engineering mock interviews.

  • @AraviDen 4 months ago

    The candidate did not state why they need a data lake vs a warehouse. A lake can store semi-structured and unstructured data, but a warehouse can't.

    • @TheBigDataShow 4 months ago +1

      Please check 10:41. But your answer is also good 👏 Keep learning.

    • @AraviDen 4 months ago

      @TheBigDataShow Now I see you discussed it. Thanks for your response.