Advanced Facebook Data Science SQL interview question [RANK()]

แชร์
ฝัง
  • เผยแพร่เมื่อ 2 ต.ค. 2024

ความคิดเห็น • 79

  • @rick2591
    @rick2591 2 ปีที่แล้ว +4

    I took a different approach:
    with sums as (
    SELECT country,
    sum(case when created_at between cast('2019-12-1' as date) and cast('2019-12-31' as date) then number else 0 end) dec,
    sum(case when created_at between cast('2020-1-1' as date) and cast('2020-1-31' as date) then number else 0 end) jan
    FROM users a inner join counts b on a.user_id=b.user_id)
    group by country),
    ranks as (
    select country,
    dense_rank() over (partition by country order by dec) decRank,
    dense_rank() over (partition by country order by jan) JanRank
    from sums)
    select country
    from ranks
    where janRank < decRank

  • @kumarshishir92
    @kumarshishir92 3 ปีที่แล้ว +14

    Thanks for the walkthrough bro! Key takeaways: Understand data, formulate approach and then write code. BTW you used 2 CTEs which means 2 different joins on the same set of tables. You could have just used the one join along with sum(case when ...) statements to sum up the Dec and Jan comments. Just some food for thought. Cheers!

  • @ishannahsi
    @ishannahsi 3 ปีที่แล้ว +5

    For sure the best resources online for interview prep for not just solving SQL problem but also approaching it conceptually. Thanks again.

    • @stratascratch
      @stratascratch  3 ปีที่แล้ว +1

      Thanks for watching! Really appreciate the support.

  • @d.xinshengguo970
    @d.xinshengguo970 3 ปีที่แล้ว +2

    Just wanted to say that your videos and the StrataScratch platform have been incredibly helpful in preparing for interviews. Thanks for the amazing content.

  • @sandeepreddy4178
    @sandeepreddy4178 2 ปีที่แล้ว +6

    This is much faster and efficient:
    With cte as
    (
    select base.*, dense_rank() over(partition by year_month order by total_comments desc) as ranks from
    (Select country, format(created_at, ‘YYYY-MM) year_month, sum(comments) total_comments
    From fb_active_users u
    inner join comments_per_user c
    On c.user_id = u.user_id and
    created_at between ‘2019-12-01’ and ‘2020-01-30’
    Group by country, year_month) base
    )
    select cte.country, ranks, lead(ranks) over (partition by country order by year_month) as next_month_ranks from cte
    where next_month_ranks < ranks;

    • @stratascratch
      @stratascratch  2 ปีที่แล้ว +1

      Love it! Thanks for sharing

    • @harshalpatel555
      @harshalpatel555 2 ปีที่แล้ว

      I like your query but I think Sandeep there is flaw when u r calculating the rank based on partition of only year and month , as it is calculating the rank for Dec or Jan month by comparing with all the Dec or Jan dates of all countries. What if we put country also in that partition so it would give us the rank based on country and year_month. Please correct me if I am going wrong way

    • @harshalpatel555
      @harshalpatel555 2 ปีที่แล้ว

      We can discuss more on this Sandeep if you are okay.

    • @sandeepreddy4178
      @sandeepreddy4178 2 ปีที่แล้ว

      Could work! I honestly didn’t run this against real data, just thought using a two level ranking function could make the query smaller so put it in here quick and dirty! So whatever works best 😊

  • @rajyalakshmikadiyala6002
    @rajyalakshmikadiyala6002 2 ปีที่แล้ว

    Can't we use Rank function here in window function

  • @callravik
    @callravik 2 ปีที่แล้ว

    God, am I glad I found StrataScratch or what?

  • @SelmanAy
    @SelmanAy 3 ปีที่แล้ว +1

    ​ @Nate at StrataScratch Hi Nate, I love your videos and they are very helpful, would you also prefer similar videos for python? for those people who prepares data engineer interviews ( like me :) ) would be very useful as well. Thanks for creating amazing content.

  • @shobhamourya8396
    @shobhamourya8396 2 ปีที่แล้ว

    Hi Nate, here's my solution without left join
    with cte1
    as
    (
    -- for users who have commented get the country details
    -- extract the month and year from created_at
    select c.user_id, u.country,
    extract(YEAR from created_at) as year,
    extract(MONTH from created_at) as month,
    number_of_comments as comments
    from fb_comments_count c,
    fb_active_users u
    where c.user_id = u.user_id
    ),
    cte2
    as
    (
    -- for each country find all the comments in dec and jan
    select country,
    sum(case when month = 12 and year = '2019' then comments else 0 end ) as dec_comments,
    sum(case when month = 1 and year = '2020' then comments else 0 end ) as jan_comments
    from cte1
    group by country
    order by country
    ),
    cte3
    as
    (
    -- rank the each country based on dec comments and jan comments
    select country,dec_comments,
    dense_rank() over(order by dec_comments desc) dec_rank,
    jan_comments,
    dense_rank() over(order by jan_comments desc) jan_rank
    from cte2
    order by country
    )
    -- highest number of comments get first rank, second-highest second rank
    -- lesser rank in Jan means risen in ranking
    --select country, jan_comments,jan_rank, dec_comments, dec_rank
    select country
    from cte3
    where jan_rank < dec_rank
    order by country

  • @shivaniagarwal1841
    @shivaniagarwal1841 3 ปีที่แล้ว +3

    Hi, thank you for your time and effort in creating these videos! Immensely helpful!I had a query about the date syntax - Is there a specific reason you use the operator clause for filtering instead of between?

    • @stratascratch
      @stratascratch  3 ปีที่แล้ว +1

      No there was no specific reason. In reality, you can use either. I often switch. between the two myself. Thanks for watching!

  • @sauravkumar9454
    @sauravkumar9454 3 ปีที่แล้ว +2

    Great Explanation Nate. Thanks. Btw I also implemented the solution using group by on "Country" and "Month" and pivoted the table on the month column then used dense_rank() over "December" and "January" to extract the country that improved their ranking. It sounds complex though.😁

  • @oscararmandocisnerosruvalc8503
    @oscararmandocisnerosruvalc8503 2 ปีที่แล้ว +1

    Bro, your channel is just gold !!!! I´ve been spending 3 hours a day watching every day because before watching your videos I was not able to solve a single question, but since I started watching your videos everything seems to be a piece of cake.

    • @stratascratch
      @stratascratch  2 ปีที่แล้ว

      Wow, thanks! Music to my ears. I am happy the videos I prepared, together with my team have helped you.

  • @jaymo2024
    @jaymo2024 2 ปีที่แล้ว +1

    I believe the online ide is not working properly, unless you put in the expected result. Can someone please explain, why it is throwing errors when I use user_id and country columns (it is saying column name is ambiguous). I had to comment it out in order for the above code to run.
    SELECT
    --user_id,
    sum(number_of_comments) as comments_dec,
    --country,
    dense_rank() over(order by sum(number_of_comments)desc) as country_rank
    FROM fb_active_users as a
    LEFT JOIN fb_comments_count as b
    ON a.user_id = b.user_id
    WHERE created_at BETWEEN DATE('2019-12-01') AND ('2019-12-31') and country is not null
    GROUP BY country;

    • @stratascratch
      @stratascratch  2 ปีที่แล้ว +1

      It's throwing an error because there's column user_id in both tables. So, yes, these columns are ambiguous. What you're missing is an alias in front of the column user_id. Which alias you'll use depends on which table you want to fetch user_id from. But if you're following my solution, there's no user_id in the SELECT statement; only country, SUM(), and DENSE_RANK()

    • @jaymo2024
      @jaymo2024 2 ปีที่แล้ว +1

      @@stratascratch I was a super noop when I wrote that comment but I know now. Thanks 🙏🏾 for the tutorials

    • @stratascratch
      @stratascratch  2 ปีที่แล้ว

      @@jaymo2024 Everyone has to start somewhere. You're doing great!

    • @jaymo2024
      @jaymo2024 2 ปีที่แล้ว

      @@stratascratch I don’t think I should have been working on window functions as a beginner. I hope you make a video guide of the sql concepts to learn first from A to Z

  • @parantikaghosh4396
    @parantikaghosh4396 3 ปีที่แล้ว +1

    you're the best SQL teacher I ever got!! :O amazing explanations :)

    • @stratascratch
      @stratascratch  3 ปีที่แล้ว

      Thanks for watching! Really appreciate it 😎

  • @priyapatil724
    @priyapatil724 2 ปีที่แล้ว

    I have doubt that in your output Australia is having same number of comments in both the months means there is no rise so how it should be in the output. Was is necessary to use ranking function? Instead can we compare the comments?

  • @faiazrummankhan5589
    @faiazrummankhan5589 3 ปีที่แล้ว +1

    Your videos helped me a lot in increasing my SQL skills ! Do you use any other language (python / R) at your work ?! Thanks !

    • @stratascratch
      @stratascratch  3 ปีที่แล้ว +1

      Yes, I use python a lot! Some people use R but python helps with automation

  • @yuthpatirathi2719
    @yuthpatirathi2719 2 ปีที่แล้ว

    Thanks nate

  • @ashokvishalsr5591
    @ashokvishalsr5591 2 ปีที่แล้ว

    That was awesome!💥🔥💡 Thanks a lot.

  • @bindidesai2191
    @bindidesai2191 2 ปีที่แล้ว

    An attendance log for every student in a school district attendance_events :
    date | student_id | attendance
    • A summary table with demographics for each student in the district all_
    students : student_id | school_id | grade_level | date_of_birth | hometown
    Using this data, you could answer questions like the following:
    • What percent of students attend school on their birthday?
    • Which grade level had the largest drop in attendance between yesterday
    and today?

  • @manphu2515
    @manphu2515 2 ปีที่แล้ว

    Great solution, learn a lot from this. Thank you

  • @tigerbear3038
    @tigerbear3038 2 ปีที่แล้ว

    Hey Nate, your "created_at

    • @rick2591
      @rick2591 2 ปีที่แล้ว

      Not a huge deal as the optimizer will determine this and switch to an inner join.

  • @beyzayurt5350
    @beyzayurt5350 3 ปีที่แล้ว +1

    Amazing video! One question: Every Facebook Data Science interview example I've seen included SQL only, so I was wondering if FB only asks SQL coding questions? Or do they also ask Python?

    • @stratascratch
      @stratascratch  3 ปีที่แล้ว +1

      I think most people decide to use SQL. You can pick any scripting language to answer coding questions on FB interviews (from my experience).

    • @SelmanAy
      @SelmanAy 3 ปีที่แล้ว

      Hi Nate, I love your videos and they are very helpful, would you also prefer similar videos for python? for those people who prepares data engineer interviews ( like me :) ) would be very useful as well. Thanks for creating amazing content.

    • @stratascratch
      @stratascratch  3 ปีที่แล้ว +3

      @@SelmanAy Yes, I've been playing around with the idea of some python videos! I'm planning to add some algorithm questions that pops up on a few interviews so I may use python for that. Thanks for watching!

    • @SelmanAy
      @SelmanAy 3 ปีที่แล้ว

      @@stratascratch so happy to hear this! looking forward to it!

  • @fadwa2413
    @fadwa2413 3 ปีที่แล้ว

    best of the best
    please, keep going

  • @joaojoca100
    @joaojoca100 2 ปีที่แล้ว

    this is a hard question....

  • @kebincui
    @kebincui 2 ปีที่แล้ว

    Thanks Nate👍🏻👍🏻👍🏻

  • @oguzhanakkurt3947
    @oguzhanakkurt3947 3 ปีที่แล้ว

    If you can shoot a video each week, it will be amazing for us. Thanks Nate :)

    • @stratascratch
      @stratascratch  3 ปีที่แล้ว +1

      That was my original plan! It's been difficult but I have more videos coming out soon.

  • @ayeoh47
    @ayeoh47 2 ปีที่แล้ว

    no Coalesce?

    • @stratascratch
      @stratascratch  2 ปีที่แล้ว

      If you wanted to, I supposed you could. But it wasn't required for this question.

  • @plttji2615
    @plttji2615 3 ปีที่แล้ว

    Hello, I've been watching this youtube channel a lot and found it really helpful. If you don't mind me asking a simple question?. Although I saw your video related which kinds of interview questions do DS role has to take, is it quite common to solve normal algorithm coding test (the one that for SWE for usual) for a DS interview? Thank you.

    • @stratascratch
      @stratascratch  3 ปีที่แล้ว +1

      Yea it's pretty common these days to be asked algo questions on a DS interview. It might not be very common for a data analyst or similar position but for a DS it's starting to trend. I'm hoping to do a few videos on this once I add these types of questions to my platform. Thanks for watching!

    • @plttji2615
      @plttji2615 3 ปีที่แล้ว +2

      @@stratascratch Thank you so much for answering my question. I got a lot of help from watching your channel as a MS DS student. Thank yoiu

  • @walter_ullon
    @walter_ullon 2 ปีที่แล้ว

    Basically, I'm screwed... :(

    • @stratascratch
      @stratascratch  2 ปีที่แล้ว +1

      Keep practicing! it gets much easier!

  • @joaopedroreissilva7075
    @joaopedroreissilva7075 3 ปีที่แล้ว

    That was AMAZING!

  • @michaelb1082
    @michaelb1082 3 ปีที่แล้ว

    This explanation is so good. I wish I found this channel a few months earlier!

    • @stratascratch
      @stratascratch  3 ปีที่แล้ว +1

      Thanks for watching, buddy!

  • @playlist4371
    @playlist4371 3 ปีที่แล้ว

    Thanks for this Nate, really helpful. Quick question: is there a reason why you are summing in the window functions? I am assuming each country only has one row

    • @stratascratch
      @stratascratch  3 ปีที่แล้ว

      The window function just helps with the ranking. I'm not partitioning my data but just taking the ranking over the entire window (which is the entire dataset) so that I can get a ranking.

    • @emanne8067
      @emanne8067 2 ปีที่แล้ว

      Hi @@stratascratch: Shouldn't the dense_rank be ordered by num_of_comments rather than sum(num_of_comments)

  • @szco9814
    @szco9814 3 ปีที่แล้ว

    Dude, you rock!

    • @stratascratch
      @stratascratch  3 ปีที่แล้ว

      Thanks, dude! Really appreciate it!

  • @nikhilverma2605
    @nikhilverma2605 3 ปีที่แล้ว

    Should i learn MYSQL or Postgresql
    please tell me that

    • @stratascratch
      @stratascratch  3 ปีที่แล้ว +2

      Given the 2, I would learn postgres bc it's the engine is created for analytics in mind. Many analytical teams use postgres. But in practice, it doesn't really matter. Most industry jobs will use other engines like Snowflake or HIVE. What's important is that you learn any SQL engine and learn to think about how to solve analytical questions.

    • @nikhilverma2605
      @nikhilverma2605 3 ปีที่แล้ว

      Would you reccomend learning from udemy

    • @stratascratch
      @stratascratch  3 ปีที่แล้ว

      @@nikhilverma2605 yea some courses are ok. I don’t know if any Udemy courses unfortunately. Try some free courses first like Mode Analytics before paying.

    • @nikhilverma2605
      @nikhilverma2605 3 ปีที่แล้ว

      well what about data analytic by google at cousera

    • @stratascratch
      @stratascratch  3 ปีที่แล้ว

      @@nikhilverma2605 ahh yea that's right. I forgot about that one. I hear great things about it.

  • @y_yy_2844
    @y_yy_2844 2 ปีที่แล้ว

    Obtain country and count of comments by month. Restrict it to December 2019 and January 2020 comments. Subtract December 2019 from January 2020 value. See the country that had the maximum positive difference.
    I could do this query with click and drag in about 3 seconds in Microsoft Access, but no, let's write out SQL code like it's 1976.

    • @ssoossdd
      @ssoossdd 2 ปีที่แล้ว

      Now try to do this with 1 billion rows , ok buddy?

    • @y_yy_2844
      @y_yy_2844 2 ปีที่แล้ว

      @@ssoossdd ? You just make the query in MS Access so all t he SQL is written for you. Then copy it to whatever else you're using.

  • @rameespudussery2700
    @rameespudussery2700 2 ปีที่แล้ว

    Hi Nate. Thanks for your video. I tried the below code to get sum of number of comments for Dec and Jan. But I am getting a different number when compared with yours. For Eg : In USA for Dec and Jan i am getting 11 and 9 respectively.
    But for you both months are 11. Confused!!
    select COUNTRY,
    SUM(CASE WHEN MONTH(created_at)='12' AND Year(created_at)='2019' THEN number_of_comments END) as dec_2019,
    SUM(CASE WHEN MONTH(created_at)='1' AND Year(created_at)='2020' THEN number_of_comments END) as Jan_2020
    from fb_active_users a
    LEFT JOIN fb_comments_count b
    ON a.user_id=b.user_id
    GROUP BY COUNTRY;

    • @stratascratch
      @stratascratch  2 ปีที่แล้ว +1

      Hey, thanks for giving it a try. Can you leave a comment in the discussion forum on the platform? Someone from my team will be able to help much more quickly with technical questions!

    • @rameespudussery2700
      @rameespudussery2700 2 ปีที่แล้ว

      @@stratascratch Sure. Thanks for your reply