Advanced SQL Questions From Amazon (Handling complex logic in data science interviews)

แชร์
ฝัง
  • เผยแพร่เมื่อ 20 ม.ค. 2025

ความคิดเห็น • 94

  • @vibhutikathuria7857
    @vibhutikathuria7857 3 ปีที่แล้ว +10

    Hi Nate, Don't know if you're gonna see this. But I religiously followed your youtube channel and even took a subscription at StrataScratch for an amazon interview for the role of data engineer. I can't believe that I cracked it, just got the mail that I got selected!!! Thank you so much. Everything you taught was put to right use and came in handy during my interviews! THANK YOU SO MUCH OMG ILY

    • @stratascratch
      @stratascratch  3 ปีที่แล้ว +2

      Congrats on your new role! Very happy for you and I'm glad you found the videos useful. Thank you so much for the kind words. Good luck on the start of your new job!

  • @priyankalad7789
    @priyankalad7789 3 ปีที่แล้ว +13

    that was a great explanation for a tough problem, Really could not think of all scenarios. I agree there isn't much good quality content of these series out there. Thanks for doing these

    • @stratascratch
      @stratascratch  3 ปีที่แล้ว +1

      Thanks so much Priyanka! I'm glad I can add to the technical content out there on TH-cam.

  • @LeviewFPV
    @LeviewFPV 4 ปีที่แล้ว +6

    Man, thanks for these videos. I'm in an interview process and always get nervous or mix up in tech interviews. These videos are really helpful to structure the questions and the answers! Already subbed!

    • @stratascratch
      @stratascratch  4 ปีที่แล้ว +1

      Thanks for watching and I'm glad you found this channel! I'll keep posting more videos for sure. Let me know if you have any topics you'd like me to cover. I always try to make videos that my audience wants.

    • @LeviewFPV
      @LeviewFPV 4 ปีที่แล้ว +1

      @@stratascratch keep going with your current content IMO. As you said in another video, there isn't much of this on YT! (or good quality at least)

    • @stratascratch
      @stratascratch  4 ปีที่แล้ว +1

      @@LeviewFPV Then I'll keep doing what I do! Thanks for watching and for your input.

  • @yashvirriar8833
    @yashvirriar8833 3 ปีที่แล้ว +9

    Dude, seriously helpful stuff here to walk through your thought process. Very hard to translate practical experience into words, but it was extremely beneficial into breaking this down into digestible parts!

    • @stratascratch
      @stratascratch  3 ปีที่แล้ว +1

      Thanks so much for watching and for the kind words! Hope you stick around. Will be creating more and more videos!

  • @Qornv
    @Qornv 4 ปีที่แล้ว +3

    Very clear and nice explanation. I'm watching these out of pure curiosity to discover new ways I can approach problems. If problems/explanations on the platform are just as real-world oriented it might be worth to sign up 😏

    • @stratascratch
      @stratascratch  4 ปีที่แล้ว

      Thanks for watching! Yes, those questions on the platform are real interview questions from companies. Many of them are straight out of interviews, while others are re-structured but still test for the same concepts. There are a few difficult ones I put on the platform to show you what a few real world questions would look like on the job. Hope they are helpful! There's 50 free questions on the platform so just do those until you want more=)

    • @akashchandra2223
      @akashchandra2223 4 ปีที่แล้ว

      @@stratascratch are those 50 free ones important too?

    • @stratascratch
      @stratascratch  4 ปีที่แล้ว

      @@akashchandra2223 Yes, they are. They are/were interview questions from companies.

  • @vsmanyamvadali7344
    @vsmanyamvadali7344 2 ปีที่แล้ว +1

    excellent way of explanation👍👌

  • @amadysvlog3449
    @amadysvlog3449 3 ปีที่แล้ว

    I love how you break the question down and explained it . Thank so for this video and for sharing the link to that question.

  • @jameshizon4861
    @jameshizon4861 3 ปีที่แล้ว +2

    This one is pretty good. I needed to improve on understanding the logic of solving difficult SQL questions. Will have to practice more.
    Recap:
    1. Identify all scenarios and edge cases.
    2. Individually map out solution for each scenario.
    3. Keep the logic separate from each other.

  • @rodrigozuniga9
    @rodrigozuniga9 2 ปีที่แล้ว +2

    I find your videos very useful but in this case I think your solution is a little more complicated than needed. You could select the user if and their min purchase date and create a table with just what they bought the first time. You can then join on the min date and eliminate all the records of the first purchase date. You then do the same join but using product id. You are then left with a table that exclude for each user their first date purchase and any latter purchase that includes any of the products purchased the first date. The count of distinct user is gives you the result of how many users were reached by your campaign.

  • @StoicRichie
    @StoicRichie 2 ปีที่แล้ว +1

    Thank you so much for this content . Can you start similar series for python interview for data science.

  • @Juan-Hdez
    @Juan-Hdez 2 ปีที่แล้ว +1

    Very useful. Thank you!

    • @stratascratch
      @stratascratch  2 ปีที่แล้ว

      You're welcome. I am glad it helped you.

  • @sauravkumar9454
    @sauravkumar9454 4 ปีที่แล้ว +1

    These kinda questions will only improve the ability to solve them. Please keep posting advance SQL questions

    • @stratascratch
      @stratascratch  4 ปีที่แล้ว +1

      Thanks for watching! Will keep doing these videos!

  • @heyMrSIDD
    @heyMrSIDD 2 ปีที่แล้ว +1

    this is very helpful thanks

    • @stratascratch
      @stratascratch  2 ปีที่แล้ว

      You're welcome. Visit our channel for more helpful videos.

  • @Manoj-nn8gd
    @Manoj-nn8gd 5 หลายเดือนก่อน

    Plz make more videos with advanced SQL querys

  • @deepak12428
    @deepak12428 ปีที่แล้ว

    the last scenario says multiple products on multiple days, shouldn't 46 not be in included in the marketing campaign as well ?

  • @debjyotiroy842
    @debjyotiroy842 3 ปีที่แล้ว

    Man this channel should be given a Nobel prize

  • @majubodas6206
    @majubodas6206 2 ปีที่แล้ว

    Thanks a lot Nate!

  • @datalearningsihan
    @datalearningsihan 2 ปีที่แล้ว

    I will always miss out on many products many days but same as the first purchase scenario. No matter how much i revise this same question and answer.

  • @muzafferckay2609
    @muzafferckay2609 3 ปีที่แล้ว

    Since you can see multiple solutions for the same problem this is one of my favorite platforms to practice SQL skills.

    • @stratascratch
      @stratascratch  3 ปีที่แล้ว

      Thanks for the kind words! It's one of the features everyone loves.

  • @caiyu538
    @caiyu538 2 ปีที่แล้ว

    Thank you for explanation, this question is quite difficult.

  • @rick2591
    @rick2591 2 ปีที่แล้ว +1

    what about using CTEs...
    with fst as (
    select user_id, min(created_at) firstDate
    from marketing_campaign
    group by user_id),
    firstOrders as (
    select * from marketing_campaign a
    where exists (select 1 from fst b where b.user_id=a.user_id and b.firstDate=a.created_at)),
    newOrders as (
    select * from marketing_campaign a
    where not exists (select 1 from firstOrders b where b.user_id=a.user_id and
    (b.created_at=a.created_at or b.product_id=a.product_id)))
    select count(distinct user_id) from newOrders

    • @stratascratch
      @stratascratch  2 ปีที่แล้ว

      put it in the platform and see if it validates! =)

  • @saikatdasgupta2006
    @saikatdasgupta2006 3 ปีที่แล้ว +16

    Easier solution
    select count(distinct m.user_id) from marketing_campaign m,
    (select user_id, min(created_at) from marketing_campaign group by user_id) a where
    a.user_id = m.user_id and
    m.product_id not in (
    select product_id from marketing_campaign where created_at=a.min and user_id=a.user_id )

    • @stratascratch
      @stratascratch  3 ปีที่แล้ว +3

      Much more efficient. Thanks for watching the video.

    • @tarekelias463
      @tarekelias463 3 ปีที่แล้ว +1

      Still trying to understand it but looks really good

    • @laurak5196
      @laurak5196 3 ปีที่แล้ว

      I think the reason this is confusing for me to read is that there’s a subquery in the from clause along with a table…..but I’m glad you shared this. It’s always helpful to see the different ways to solve the same problem, keeping in mind efficiency too. I’d also like to know how long it took people like you and Nate to be this good at SQL lol. I’m still new but sometimes it’s frustrating when I don’t learn everything as fast as I’d like to.

    • @kaushikshankar6051
      @kaushikshankar6051 2 ปีที่แล้ว

      I have written a similar code but im getting 1 extra user id (24 instead of 23). Could anyone tell me what is wrong.
      select count(distinct(a.user_id)) from marketing_campaign a
      inner join (select user_id,created_at,b.product_id from marketing_campaign b group by 1,2,3 )b on a.user_id = b.user_id and a.created_at < b.created_at and a.product_id b.product_id
      ;

  • @yujiefu2251
    @yujiefu2251 3 ปีที่แล้ว

    I have learned a lot from you. I am trying to use your framework to in my following interviews.

    • @stratascratch
      @stratascratch  3 ปีที่แล้ว

      Thanks great. Glad you watched the videos!

  • @himanish2006
    @himanish2006 2 ปีที่แล้ว

    After getting a job will stratscratch help me to become expert in database or this is just for interviews ?

  • @aakashbansal314
    @aakashbansal314 3 ปีที่แล้ว +1

    QQ Nate: If we remove 1st subsql, then also code should work. What is the purpose to select user_id from the subsql? We should just find those combination of concat(userid,productid) that are bought on 1st day, and just filter out from main table. please suggest.

    • @stratascratch
      @stratascratch  3 ปีที่แล้ว

      I believe you can do it that way. My code isn't the most efficient. A lot of that has to do with how I would explain the code to an interviewer. It's sometimes easier to write less efficient code but have the code follow logic step by step, especially if you're trying to walk someone through your solution. The last part of the interviews are usually talking about how to optimize the code, so that's where your version would work.

    • @aakashbansal314
      @aakashbansal314 3 ปีที่แล้ว

      @@stratascratch thank you for your advise.

  • @shipragupta4382
    @shipragupta4382 3 ปีที่แล้ว

    Amazing Nate. Thanks for this

    • @stratascratch
      @stratascratch  3 ปีที่แล้ว

      Thank you for watching the video!

  • @459B
    @459B 4 ปีที่แล้ว +4

    A difficult but inspiring question

    • @stratascratch
      @stratascratch  4 ปีที่แล้ว +5

      Probably one of the more difficult questions you'd see out there. The last part in handling one of the edge cases was definitely difficult so I hope I explained it right.

  • @winnumber101
    @winnumber101 3 ปีที่แล้ว

    The actual process for uncovering this is so elegant though-the initial word problem made me discouraged tbh

    • @stratascratch
      @stratascratch  3 ปีที่แล้ว +1

      It's definitely not great phrasing on the problem. I may have reworded it a bit on the platform but it's probably still pretty difficult to really understand. This was a last rounder, in-person interview question that's meant to be comprehensive so I tried my best to word the problem as concisely as possible. Thanks for the feedback and thanks for watching!

  • @laurak5196
    @laurak5196 3 ปีที่แล้ว

    Nate this is a really helpful video, as are all of these where you work through a real problem. As someone wanting to become a data analyst, I’m wondering how in-depth my SQL interview questions are likely to be, vs those for a data science position. Any idea?

    • @stratascratch
      @stratascratch  3 ปีที่แล้ว +2

      For SQL, there is no difference between a DA and DS. The DS position differs because you need to know much more modeling and statistics.

    • @laurak5196
      @laurak5196 3 ปีที่แล้ว

      @@stratascratch thanks Nate!

  • @Sarthak631
    @Sarthak631 2 ปีที่แล้ว

    Will it work if instead of concatenating the user_id and product_id column, we just take product_id from the subquery and do product_id NOT IN subquery?

    • @stratascratch
      @stratascratch  2 ปีที่แล้ว +1

      That sounds like a good approach. You can code it up on the platform and see if it produces the same output.

    • @Sarthak631
      @Sarthak631 2 ปีที่แล้ว

      @@stratascratch Yes, I will try it and share the results with you in the comments.

    • @Sarthak631
      @Sarthak631 2 ปีที่แล้ว

      @@stratascratch I got your point about why we need to use CONCAT. If we don't create a new column, we will lose all the required rows as wel because they have the same ID's as well. Very nice question and an even better explanation.

  • @AmanSingh-od2ue
    @AmanSingh-od2ue 4 ปีที่แล้ว

    Hey Nate,
    I don't understand why do we have to remove all the users with rank 1 in the second part of the query. In that, not all users buy the same product on a different date!

    • @stratascratch
      @stratascratch  4 ปีที่แล้ว +1

      I'm actually only removing user-product combo for the user's 1st purchase because I don't care about their 1st transaction. If the user purchased the same items on a different date, then those transactions are also removed, which is what we want to do according to the question. What is kept are the products that are different than those the user purchased in a future transactions. This is definitely a difficult question to keep all cases in your head. What I'd do is follow a few of the users that I mentioned in the video. Head over to the platform and for each logic in the query, output the user and products purchased. Then see how specific rules filter out users. Hope that helps!

    • @AmanSingh-od2ue
      @AmanSingh-od2ue 4 ปีที่แล้ว

      @@stratascratch Yeah, that helps now. I got confused for a while but I got it now. Thanks!!

  • @shekharbagwe
    @shekharbagwe 3 ปีที่แล้ว

    How about this?
    select count(distinct user_id) from marketing_campaign where concat(user_id,'_',product_id) not in(
    select usr_prd from (select concat(user_id,'_',product_id) usr_prd, created_at, rank() over(partition by user_id order by created_at asc) rnk from marketing_campaign ) up where rnk = 1
    );
    subquery creates a list of user_id+product_id for all first transactions. I am using rank window function to identify all first transactions. After that using "not in" clause to select all other combinations.

    • @stratascratch
      @stratascratch  3 ปีที่แล้ว

      Looks like it would work! Test it out on the platform!

    • @shekharbagwe
      @shekharbagwe 3 ปีที่แล้ว

      @@stratascratch yeah i tested this, works fine :)

  • @xl2274
    @xl2274 2 ปีที่แล้ว

    Great video! But, I'm wondering whether there is a candidate who can finish this problem within 30 min during a Amazon tech interview.

    • @stratascratch
      @stratascratch  2 ปีที่แล้ว +1

      This is a question that is likely for a white boarding session that would take place on an on-site interview. I believe you could finish the problem in 30-min on a white board since it's more collaborative with the interviewer than coding on a text editor via Zoom. =)

  • @gill4u
    @gill4u 3 ปีที่แล้ว

    With userFirstPurchase(user_id, first_purchase_dt,created_at, product_id)
    as
    (select user_id, min(created_at) over (partition by user_id), created_at, product_id
    from marketing_campaign )
    select count(distinct user_id)
    from userFirstPurchase
    where created_at > first_purchase_dt
    and concat(user_id,product_id) not in (select concat(user_id,product_id) from userFirstPurchase where first_purchase_dt=created_at )

  • @Datapassenger_prashant
    @Datapassenger_prashant 6 หลายเดือนก่อน

    I still don't understand,why 46 is in the marketing campaign.. if on next date he bought the same product?

  • @DeepanshuKalra
    @DeepanshuKalra 4 ปีที่แล้ว +1

    Hi Nate, love your channel, and I have binge-watching your videos for the last 3 days.
    More power to you, for helping all of us.
    I came up with this solution using a self join:
    select
    COUNT(DISTINCT md1.user_id)
    from
    marketing_campaign md1 --day one
    JOIN
    marketing_campaign md2 --day two
    ON md1.user_id=md2.user_id --same user purchasing again
    AND md1.created_at::date

    • @stratascratch
      @stratascratch  4 ปีที่แล้ว +2

      That's a nice solution. You're missing an edge case where a user cannot purchase the same items as the items in their first purchase. You have user 25 in your output but if you look at the raw data, user 25 purchase both item id 114, 115 during their 1st purchase. And then purchased the same items in the future. So 114 and 115 should not count and thus user 25 should not be counted.
      My code:
      concat((user_id),'_', (product_id)) not in
      (SELECT user_product
      FROM
      (SELECT *,
      rank() over(PARTITION BY user_id
      ORDER BY created_at) AS rn,
      concat((user_id),'_', (product_id)) AS user_product
      FROM marketing_campaign
      Handles this exact edge case.

  • @followmycrafts8811
    @followmycrafts8811 4 ปีที่แล้ว

    Thanks a lot Sir

  • @yashsinha16
    @yashsinha16 3 ปีที่แล้ว

    select count(distinct mc1.user_id)
    from marketing_campaign mc1
    join marketing_campaign mc2 on
    mc1.user_id=mc2.user_id and mc2.created_at>=mc1.created_at+interval '1 day'
    and mc2.product_id!=mc1.product_id
    i am getting 24 and not 23? can anyone tell

  • @nargisparvin4267
    @nargisparvin4267 4 ปีที่แล้ว

    Thank You !!!

    • @stratascratch
      @stratascratch  4 ปีที่แล้ว

      Thanks for watching. Please let me know if you have any topics you'd like me to cover!

    • @tahiliani22
      @tahiliani22 3 ปีที่แล้ว

      @@stratascratch advanced SQL. basically, continue with these videos, please.

  • @athraaproductions
    @athraaproductions 4 ปีที่แล้ว +1

    Do you need video editor? I'll do it for you

    • @stratascratch
      @stratascratch  4 ปีที่แล้ว

      I'm okay for now. Thanks man!

  • @goingfurther8092
    @goingfurther8092 2 ปีที่แล้ว

    My answer;
    SELECT
    COUNT(DISTINCT user_id)
    FROM(
    SELECT
    user_id
    ,created_at
    ,product_id
    ,last_order
    ,DATEDIFF(created_at,last_order) AS date_diff
    FROM(
    SELECT
    user_id
    ,created_at
    ,product_id
    ,LAG(created_at) OVER(PARTITION BY user_id ORDER BY created_at) last_order
    FROM(
    SELECT
    user_id
    ,created_at
    ,product_id
    ,ROW_NUMBER()OVER(PARTITION BY user_id,product_id ORDER BY created_at) AS row_num
    FROM marketing_campaign)a
    WHERE row_num = 1)b)c
    WHERE date_diff > 0

  • @BummerSlug
    @BummerSlug 3 ปีที่แล้ว

    User 50 bought
    product 118 x4 at a price of 35
    Then next day
    product 118 x4 at a price of 29
    Then three weeks later
    product 118 x5 at a price of 299
    Then next day
    product 118 x2 at a price of 199
    Must have been GameStop stocks.

  • @usmanyousaf841
    @usmanyousaf841 4 ปีที่แล้ว

    Hi Nate!
    Hope you are doing fine.
    Please check your email.

  • @imvikraman
    @imvikraman 5 หลายเดือนก่อน

    WITH CTE AS (
    SELECT
    user_id,
    product_id,
    created_at,
    RANK() OVER (PARTITION BY user_id ORDER BY created_at) AS rn
    FROM
    marketing_campaign
    ),
    firstDayPurchases AS ( SELECT * FROM CTE WHERE rn = 1 ),
    subsequentPurchases AS ( SELECT * FROM CTE WHERE rn > 1 )
    SELECT
    COUNT(DISTINCT user_id)
    FROM
    subsequentPurchases
    WHERE
    (user_id, product_id) NOT IN (SELECT user_id, product_id FROM firstDayPurchases)

  • @plekkchand
    @plekkchand 2 ปีที่แล้ว

    Hello. Not "hey guys".

  • @vijaycvcU
    @vijaycvcU 3 ปีที่แล้ว

    Exists - ensures that the current row is not the first transaction for the user
    Not exists - ensure that the current product in the row is bought previously
    Select count(distinct user_id) from Market_Campaign MC
    Where exists (select 1 from Market_Campaign MCI where MCI.user_id = MC.user_id and MC.created_at > MCI.created_at)
    and not exists (select 1 from Market_Campaign MCI where MC.user_id = MC.user_id and MC.created_at > MCI.created_at and MC.product_id = MCI.product_id)

  • @prasadmundewadi
    @prasadmundewadi 2 ปีที่แล้ว

    Nice explanation. But while testing it seems that it is missing one edge case. A user bought same product on Day 1 and Day 2 but he bought different products on Day3 and Day4. To cover all edge cases you can use below query:
    Here is dataset (check user_id 6):
    User_id Prod_id Created_at
    1 11 1/1/2022
    1 12 1/2/2022
    2 13 4/1/2000
    2 13 2/1/2022
    3 14 3/1/2022
    4 15 3/15/2022
    4 16 3/15/2022
    5 17 5/15/2022
    5 18 5/16/2022
    5 18 5/17/2022
    SELECT COUNT (DISTINCT USER_ID)
    FROM USER3
    WHERE USER_ID IN
    ( SELECT USER_ID
    FROM USER3
    GROUP BY USER_ID
    HAVING COUNT (DISTINCT CREATED_AT) > 1 AND COUNT (DISTINCT PRODUCT_ID) > 1)
    AND
    USER_ID NOT IN
    (
    SELECT USER_ID
    FROM (SELECT PRODUCT_ID,
    USER_ID,
    RANK () OVER (PARTITION BY USER_ID ORDER BY CREATED_AT) RN
    FROM USER3) A
    WHERE RN IN (1, 2) --rank 1 and 2 are the 1st and 2nd orders which should be different
    GROUP BY USER_ID
    HAVING COUNT (DISTINCT PRODUCT_ID) = 1
    )

  • @techiewithcamera
    @techiewithcamera ปีที่แล้ว

    My Solution:
    with first_user_purchase as
    (select *
    from
    (select user_id,
    created_at,
    string_agg(distinct product_id::varchar, ';#') first_product,
    dense_rank() over (partition by user_id order by created_at::date) rank
    from marketing_campaign
    group by 1, 2)x
    where rank=1)
    select *
    from
    (select t1.user_id,
    t1.created_at,
    t1.product_id,
    t2.first_product
    from marketing_campaign t1
    join first_user_purchase t2 on t1.user_id=t2.user_id
    and t1.created_att2.created_at)x
    where first_product not like '%'|| product_id ||'%';