SQL Window Functions In a Data Science Interview Question by Airbnb in 2021

แชร์
ฝัง
  • เผยแพร่เมื่อ 5 ก.ย. 2024

ความคิดเห็น • 33

  • @karunakaranr2473
    @karunakaranr2473 4 หลายเดือนก่อน

    Thank you for your time and effort. Explained in detail...

  • @elfridhasman4181
    @elfridhasman4181 2 ปีที่แล้ว

    Thank you very much Nate🤗🤗🤗

  • @AdnanKashemOxe
    @AdnanKashemOxe 3 ปีที่แล้ว

    Your contents are awesome. Keep up the good work.. Best Wishes

  • @bobbybarnes4249
    @bobbybarnes4249 3 ปีที่แล้ว

    Thank you for breaking down each step. Love the videos!

    • @stratascratch
      @stratascratch  3 ปีที่แล้ว

      Thanks for watching! more videos to come

  • @manavsaxena5579
    @manavsaxena5579 3 ปีที่แล้ว

    Came across your channel by accident. It took me One Video to Subscribe! Great Content, please keep it coming.

    • @stratascratch
      @stratascratch  3 ปีที่แล้ว

      Thanks so much. Let me know if you have any topic requests or have any feedback!

  • @poesaste
    @poesaste 3 ปีที่แล้ว

    This was great Nate 👍

    • @stratascratch
      @stratascratch  3 ปีที่แล้ว +2

      Appreciate the kind words. I wrap up window functions next week with a video on the 4 different types of window functions you'll see on a coding interview and examples of each. Hope you watch that one too.

  • @RedShipsofSpainAgain
    @RedShipsofSpainAgain 3 ปีที่แล้ว +3

    Really great Nate. Curious how you would implement this in Python using Pandas?

    • @stratascratch
      @stratascratch  3 ปีที่แล้ว +1

      If you go to the link in the description, I have the python solution in the platform =)

  • @ashimneupane1234
    @ashimneupane1234 3 ปีที่แล้ว +1

    Thank you

    • @stratascratch
      @stratascratch  3 ปีที่แล้ว

      Thanks for watching. Feel free to provide comments, new ideas, and feedback.

  • @dwaipayansaha4443
    @dwaipayansaha4443 2 ปีที่แล้ว +1

    My solution,
    with t1 as (select year(host_since) yearly,count(id) pres_year,lag(count(id),1,0) over(order by year(host_since)) prev_year from airbnb_search_details
    group by yearly)
    select yearly,pres_year,prev_year,round((pres_year-prev_year)/prev_year,2)*100 per_growth from t1
    order by yearly

    • @stratascratch
      @stratascratch  2 ปีที่แล้ว

      That is awesome! Thank you for sharing.

  • @arijitchakraborty266
    @arijitchakraborty266 3 ปีที่แล้ว

    If you kindly post the test data, it will be much helpful. Thanks much !!

    • @stratascratch
      @stratascratch  3 ปีที่แล้ว +1

      There's a link to the test data in the description (platform.stratascratch.com/coding-question?id=9637&python=). You can code on the platform. It's totally free to use the data and execute the code.

    • @arijitchakraborty266
      @arijitchakraborty266 3 ปีที่แล้ว

      @@stratascratch Take love Sir ! What you are doing just awesome. God bless you.

  • @pramodyadav4422
    @pramodyadav4422 2 ปีที่แล้ว

    Hello Nate,
    Thanks for your videos, Each video brings a new learning.
    I'm little bit confuse regarding ::FLOAT. Sometime it works, sometime it don't.
    I started to use CAST function to do this.
    Can you please explain when ::FLOAT works?
    I was using MySQL(Beta) as interpreter. I remember few days back it worked for me with ::FLOAT.
    But for this question it gave an error.

    • @stratascratch
      @stratascratch  2 ปีที่แล้ว +1

      ::float is a postgres notation for cast(column as float). You can use cast() for either postgres or mysql. But ::float is a postgres syntax so it only works for postgres.

    • @pramodyadav4422
      @pramodyadav4422 2 ปีที่แล้ว

      @@stratascratch thank you for clarifying.
      I didn't know I learnt postgres as well while watching your videos 😊

  • @nasos00
    @nasos00 2 ปีที่แล้ว

    The date field mentions "host since". Doesn't that mean that we need to count that host for every subsequent year as well? So before using the lag function, shouldn't we compute the cumulative sum of current_year_host? Of course this assumes that once they registered as a host they remained a host and didn't change their status. Anyway, curious about your thoughts on this.

    • @stratascratch
      @stratascratch  2 ปีที่แล้ว +1

      That's a good assumption and something I would consider. However, when I interpret the question, it's taking the # of hosts registered in the current year vs previous year. So I'm not caring about hosts from other years and thus I don't take the cum sum. That's my way of thinking about the problem and interpreting it. But with most data science questions, there are grey areas and it's always about answering the question given the business context. This is a question I would talk through with the interviewer to get a better understanding of what he/she wants.

  • @shobhamourya8396
    @shobhamourya8396 2 ปีที่แล้ว

    Would it affect the performance if I used count(*) instead of count(id) ?
    Here's my solution using CTE:
    with cte1
    as
    (
    select id, EXTRACT(YEAR from host_since) curr_year
    from airbnb_search_details
    order by curr_year
    ),
    cte2
    as
    (
    select curr_year, count(id) hosts_curr_year
    from cte1
    group by curr_year
    order by curr_year
    ),
    cte3
    as
    (
    select curr_year, hosts_curr_year,
    lag(hosts_curr_year, 1) over(order by curr_year) hosts_prev_year
    from cte2
    )
    select curr_year, hosts_curr_year, hosts_prev_year,
    round(((hosts_curr_year - hosts_prev_year)*1.0/hosts_prev_year*1.0)*100.00::Decimal) as growth_rate
    from cte3

    • @stratascratch
      @stratascratch  2 ปีที่แล้ว

      It won't effect performance. But you should ask yourself you're interested in counting the rows or counting non-null rows!

    • @shobhamourya8396
      @shobhamourya8396 2 ปีที่แล้ว

      @@stratascratch count(*) will count all rows, including Null values of the aggregating column whereas count(id) will count only not-Null values of column id for the aggregating column. In this dataset, host ids are unique and not-Null so it won't make a difference.

  • @bananaboydan3642
    @bananaboydan3642 ปีที่แล้ว

    why is just checking if our solution is correct a premium feature? or am i doing something wrong on the site

    • @stratascratch
      @stratascratch  ปีที่แล้ว

      Checking your solution is a premium feature. Most platforms will lock the question completely if it's premium. We allow you to execute code and manually check the output of your solution. But the solution itself is a premium feature.

  • @MadSKWIREL
    @MadSKWIREL 3 ปีที่แล้ว

    Not the best showcase of the window function, because self "join on t1.year = t2.year +1" is more simple and intuitive solution.

    • @stratascratch
      @stratascratch  3 ปีที่แล้ว

      That's true. A self join with a year = year+1 is another great way of solving the problem. It would save you from actually implementing the lag(). Most things would stay about the same as the calculations use both the current year and last year values. Thanks for actually watching the coding part of the video.

  • @kritiverma1342
    @kritiverma1342 2 ปีที่แล้ว +1

    Same method:
    Select year, current, prev, ROUND(((current-prev)/prev::FLOAT)*100) as perc_growth
    FROM(
    (select extract(year from host_since::DATE) as year, count(id) as current,
    LAG(count(id) , 1) Over (Order by extract(year from host_since::DATE)) as prev
    from airbnb_search_details
    WHERE host_since IS NOT NULL
    GROUP BY extract(year from host_since::DATE))) sub

    • @anggipermanaharianja6122
      @anggipermanaharianja6122 2 ปีที่แล้ว

      yours is basically true, for readability I think it would be better if you put it like this:
      with cte as (
      select extract(year from host_since::date) as curr_year,
      count(id) as curr_year_count
      from airbnb_search_details
      group by 1
      order by 1 asc
      )
      select cte.curr_year as current_year,
      cte.curr_year - 1 as previous_year,
      coalesce(100 * (cte.curr_year_count - lag(curr_year_count, 1) over (order by curr_year asc)) / lag(curr_year_count, 1) over (order by curr_year asc),0) as growth_pct
      from cte

  • @techiewithcamera
    @techiewithcamera ปีที่แล้ว

    My Solution:
    select
    airbnb_year,
    curr_host_cnt,
    prev_host_cnt,
    round((curr_host_cnt-prev_host_cnt)/prev_host_cnt::numeric, 2)*100 growth_rate
    from
    (select
    to_char(host_since , 'YYYY') airbnb_year,
    count(distinct id) curr_host_cnt,
    lag(count(distinct id)) over (order by to_char(host_since , 'YYYY')) prev_host_cnt
    from airbnb_search_details
    group by 1)t1;