Ace Your Data Science Interview: Top SQL Questions to Master

แชร์
ฝัง
  • เผยแพร่เมื่อ 30 ก.ย. 2024

ความคิดเห็น • 34

  • @emma_ding
    @emma_ding  2 ปีที่แล้ว +10

    All SQL problems and resources mentioned in this video
    medium.com/@emmading/practice-for-3-types-of-sql-interview-questions-2bd057a88b4f

    • @avishayisraeli1894
      @avishayisraeli1894 2 ปีที่แล้ว +1

      really nice, i was asked a similar ratio question in Meta a month ago... I solved it a little bit different but it was good.

    • @emma_ding
      @emma_ding  2 ปีที่แล้ว +1

      @@avishayisraeli1894 Thanks for sharing your experience! :)

  • @sophiezheng4850
    @sophiezheng4850 2 ปีที่แล้ว +6

    I think it could be also helpful if you can summarize leetcode python questions in basic algorithm which are good for data scientists to prepare for technical interviews. Some of the companies do ask basic algorithm questions during coding interviews.

    • @emma_ding
      @emma_ding  2 ปีที่แล้ว

      Thank you for your feedback!

  • @TheElementFive
    @TheElementFive 2 ปีที่แล้ว +2

    SUM(col) OVER (ORDER BY date ROWS BETWEEN 4 PRECEDING and CURRENT ROW) is an easy way to calculate a rolling sum. Useful in problems like the last one you showed.

  • @songsong2334
    @songsong2334 2 ปีที่แล้ว +2

    Hi Emma! Good video overall. One small mistake in the Data Categorization section. '[10-15>' is tricky here since the above code won't return the answer as 0. Because we cannot case when something does not exist in the table. To do this, we might want to union all to consider 0 as the case.

  • @ashleyblanchard8993
    @ashleyblanchard8993 2 ปีที่แล้ว +1

    thanks! looking forward to window functions vidoe!

  • @kaitoukid1088
    @kaitoukid1088 2 ปีที่แล้ว +1

    Not SQL-related but what do you think are the skills data scientists need to learn now aside from Python and SQL?

    • @emma_ding
      @emma_ding  2 ปีที่แล้ว

      Hope this video answers your questions th-cam.com/video/yhjK6fC68eU/w-d-xo.html :)

  • @sarveshpatki
    @sarveshpatki 2 ปีที่แล้ว +1

    Wow, this is a great video!

  • @cococnk388
    @cococnk388 ปีที่แล้ว

    Hello Emma,
    For the Categorization problem here is my solving method.
    1) I create the table that I going to use fo the join
    DROP TABLE IF EXISTS bins_value
    CREATE TABLE bins_value (
    bin_name varchar(20)
    )
    INSERT INTO bins_value VALUES
    ('[0-5>'),
    ('[5-10>'),
    ('[10-15>'),
    ('[15-20>'),
    ('20 ans more')
    2) The query to the problem
    WITH BINS as(
    SELECT
    CASE
    WHEN duration >= 0 and duration < 5 THEN '[0-5>'
    WHEN duration >= 5 and duration < 10 THEN '[5-10>'
    WHEN duration >= 10 and duration < 15 THEN '[10-15>'
    WHEN duration >=15 and duration < 20 THEN '[15-20>'
    ELSE '20 ans more'
    END as bin
    FROM SESSIONS
    )
    SELECT a.bin_name, ISNULL(b.total,0) T_total FROM
    (SELECT bin, COUNT(*) total FROM BINS
    GROUP BY bin) as b
    right join bins_value as a on b.bin = a.bin_name

  • @hammadahmed2339
    @hammadahmed2339 2 ปีที่แล้ว

    Hi Emma! A very good video.
    But I think games_played_so_far will not show the running total. It will show total games played by some one on certain date.
    Reason:
    1st Step: Join two tables
    2nd Step: Extract rows using where clause (it would return rows from the derived/joined table and WILL EXECUTE ONLY ONCE in query for all the rows where a.event_date is either greater or equat to b.event_date)
    3rd Step: Grouping data according to a.player_id, a.event_date
    4th Step: Selecting a.player_id, a.event_date and SUM(b.games_played). Here summation gives the total number of games played by a player on certain date.
    So far as I know we do not need WHERE clause here. I think the condition should be changed in ON clause and it should be a.event_date >= b.event_date.
    I may be wrong. Please guide if I am.

  • @selinsu5790
    @selinsu5790 2 หลายเดือนก่อน

    really a clear breakdown of the SQL problems, thanks!

  • @dougkenney
    @dougkenney ปีที่แล้ว

    One suggestion for the first order question: the "first_order" table technically doesn't only reflect customers' first orders. consider, for example, a customer whose first order is placed on 1/1/2022 for delivery on 1/2/2022, and then in their second order they place it also on 1/1/2022 but this one is in fact an immediate delivery (for 1/1/2022). they'll show up now in your first query, but their first order was not, in fact, an immediate order (their second order was).

  • @kexinfu8647
    @kexinfu8647 2 ปีที่แล้ว

    Will be having a live SQL interview next week, the video helps a lot!

  • @MashiroRedo
    @MashiroRedo 2 ปีที่แล้ว

    For Data Categorization, can you show me how to Left join to get the missing bin? I checked around and everyone uses a union to satisfy that condition.

    • @cococnk388
      @cococnk388 ปีที่แล้ว +2

      Hello try this :
      1) First create your session table :
      DROP TABLE IF EXISTS SESSIONS
      CREATE TABLE SESSIONS (
      session_id int,
      duration int
      )
      INSERT INTO SESSIONS VALUES
      (1, 30),
      (2, 199),
      (3, 299),
      (4, 580),
      (5, 1000),
      (6, 2),
      (7, 10),
      (8,5)
      2) Create the table to show all the bins even the ones with no values
      DROP TABLE IF EXISTS bins_value
      CREATE TABLE bins_value (
      bin_name varchar(20)
      )
      INSERT INTO bins_value VALUES
      ('[0-5>'),
      ('[5-10>'),
      ('[10-15>'),
      ('[15-20>'),
      ('20 ans more')
      3) Answer to the question
      WITH BINS as(
      SELECT
      CASE
      WHEN duration >= 0 and duration < 5 THEN '[0-5>'
      WHEN duration >= 5 and duration < 10 THEN '[5-10>'
      WHEN duration>= 10 and duration < 15 THEN '[10-15>'
      WHEN duration>=15 and duration < 20 THEN '[15-20>'
      ELSE '20 ans more'
      END as bin
      FROM SESSIONS
      )
      SELECT a.bin_name, ISNULL(b.total,0) T_total FROM
      (SELECT bin, COUNT(*) total FROM BINS
      GROUP BY bin) as b
      right join bins_value as a on b.bin = a.bin_name
      I use the platform Microsoft SQL SERVER, just copy and paste you will have the answer, later on understand the steps.
      Hope it helps.

  • @djjiang3718
    @djjiang3718 2 ปีที่แล้ว

    Thank you Emma! Good stuff!

  • @__goyal__
    @__goyal__ 2 ปีที่แล้ว

    The only challenge that I have been facing with leetcode is the quality of test cases against which the solution is tested. Also, the solution sometimes doesn't go with what was asked in the first place!

    • @emma_ding
      @emma_ding  2 ปีที่แล้ว

      Thanks for sharing your experience Shubham!

  • @anggipermanaharianja6122
    @anggipermanaharianja6122 2 ปีที่แล้ว

    clear and concise, nice vid

  • @lizhouf
    @lizhouf 2 ปีที่แล้ว

    Nice video! Thank you, Emma! One question would be: for the categorization example, will it be easier to use UNION to add the 0 record row (10-15 in this case, at 08:50)? I was not able to come up with/find an easy solution using JOIN here, could you point me to some possible answers? Thanks!

    • @songsong2334
      @songsong2334 2 ปีที่แล้ว

      Try this it works.
      SELECT '[0-5>' bin, SUM(CASE WHEN duration*1.0/60 >=0 and duration*1.0/60 =5 and duration*1.0/60 =10 and duration*1.0/60 =15 THEN 1 ELSE 0 END) total
      FROM Sessions

    • @lizhouf
      @lizhouf 2 ปีที่แล้ว

      @@songsong2334 Thank you! I came up with a similar solution before, using UNIONm while still wondering if there is anything that can directly improve from Emma's solution that doesn't cover the 0 cases.

  • @howardsmith4128
    @howardsmith4128 2 ปีที่แล้ว

    Hello at 4:22 Customer_id 3 and Delivery_id have the same order_date and customer_pref_delivery_date. Why is Customer 3 not satisfying the criteria you outlined?

    • @MohanRaj-tm2ml
      @MohanRaj-tm2ml 2 ปีที่แล้ว +2

      @Howard Smith Because the first order of customer_id 3 is on 2019-08-21 which is in next row and that order is not an immediate order.

    • @howardsmith4128
      @howardsmith4128 2 ปีที่แล้ว +1

      @@MohanRaj-tm2ml Thanks for pointing that out.

    • @MashiroRedo
      @MashiroRedo 2 ปีที่แล้ว

      It's not the cust id 3 first order. They have an earlier order that doesn't satisfy the criteria.

  • @__goyal__
    @__goyal__ 2 ปีที่แล้ว

    How long do we have to wait till your next video on window functions?

    • @emma_ding
      @emma_ding  2 ปีที่แล้ว

      Hang in there, it won't be long!

  • @oldfairy
    @oldfairy ปีที่แล้ว

    too much hands movement

  • @cumibakar10
    @cumibakar10 ปีที่แล้ว

    But how do I compute the L ratio?