SQL Window Functions on Data Science Interviews in 2021 | Asked By Airbnb, Netflix, Twitter, Uber

StrataScratch

มุมมอง 80 877

เพิ่มลงใน
- เพลย์ลิสต์ของฉัน
- ดูภายหลัง
แชร์

แชร์

ฝัง

ขนาดวิดีโอ:

แสดงแผงควบคุมโปรแกรมเล่น

เล่นอัตโนมัติ

เล่นใหม่

เผยแพร่เมื่อ 21 ก.ค. 2024
Let's cover 4 SQL window function interview questions asked on Airbnb, Netflix, Twitter, and Uber data science interviews. I cover the questions and how to solve them, and exactly what the functions are doing to the data.
Link to questions:
Aggregate Functions -- Uber: platform.stratascratch.com/co...
Ranking -- Twitter: platform.stratascratch.com/co...
NTILE() -- Netflix: platform.stratascratch.com/co...
LEAD/LAG -- Airbnb: platform.stratascratch.com/co...
______________________________________________________________________
This series is for both beginner and intermediate data scientists and analysts interested in learning how to solve common data science interview questions in SQL.
👉 Subscribe to my channel: bit.ly/2GsFxmA
👉 Playlist for more data science interview questions and answers: bit.ly/3jifw81
👉 Practice more real data science interview questions: platform.stratascratch.com/co...
______________________________________________________________________
Timestamps:
Intro: (0:00)
Aggregate Functions: (1:03)
Rankings(): (7:35)
Ntile(): (11:27)
Lag/Lead: (16:00)
Conclusion: (21:05)
______________________________________________________________________
About The Platform:
I'm using StrataScratch, a platform that allows you to practice real data science interview questions. There are over 1000+ interview questions that cover coding (SQL and python), statistics, probability, product sense, and business cases.
So, if you want more interview practice with real data science interview questions, visit platform.stratascratch.com/co.... All questions are free and you can even execute SQL and python code in the IDE, but if you want to check out the solutions from me or from other users, you can use ss15 for a 15% discount on the premium plans.
______________________________________________________________________
Contact:
If you have any questions, comments, or feedback, please leave them here!
Feel free to also email me at nathan@stratascratch.com
#sql #datascience #sqlinterviews

ความคิดเห็น • 174

@tarekelias463 3 ปีที่แล้ว ⁺⁶
Dude that was so helpful! You’re a great teacher as well
@luckykukreja95 3 ปีที่แล้ว ⁺³
Loved the problems curated, and the way you explained the function with a problem to practice it upon! Awesome work!
@stratascratch 3 ปีที่แล้ว
Thanks for watching! Let me know if there are other topics you'd like me to cover and I'll try my best.
@AnnieCushing 3 ปีที่แล้ว ⁺³⁴
LOVE the way you chart out your approach before you start coding. I'm adopting that stat. I learned so much from this video and then discovered you have a playlist of similar videos. Feels like Christmas.
@stratascratch 3 ปีที่แล้ว ⁺²
Thanks for watching! There are dozens of videos you can watch if you want to watch how I answer questions =) I use the same framework every time.
@glin96 2 ปีที่แล้ว ⁺²
I love how you make every question so easy to understand after you break it down! Thank you for sharing your knowledge!
@stratascratch 2 ปีที่แล้ว
Glad you appreciate it! Thanks for watching, Gabby! 😎
@alinazem6662 2 ปีที่แล้ว
Loving the way you lay out the problem and break it down to bits. That "VISION" is gold. Thank you for every second of your time you spend educating people.
@Zzzz-hk5ft 2 ปีที่แล้ว
Great stuff here, I went in having only used windowed functions by copy and pasting from stackoverflow, but after watching this I really feel like I understand what is going on. Hope I get to the next round of my interview tomorrow.. wish me luck!
@stratascratch 2 ปีที่แล้ว
Glad I could help! And good luck on your interview.
@abhishekarora1 3 ปีที่แล้ว ⁺¹
Really loved your video! Appreciate all the effort you put in it. From good selection on question to very good explanation.
@stratascratch 3 ปีที่แล้ว ⁺¹
Thank you so much! Will keep going and making more videos!
@hugomichaelnunezmejia3384 3 ปีที่แล้ว ⁺⁴
Thank you for posting this video along with the website where we can practice the queries. Very well explained!
For the ranking part of the video I was able to get the same result with only one derived table instead of two and no grouping clause. I think this query is a bit easier to digest:
select distinct department,salary, ranking
from
(
select *, dense_rank() over( partition by department order by salary desc ) as Ranking
from twitter_employee
) DerivedTable1
where ranking < 4
order by department, salary desc
Also, for the Percentile Fraud section, your query returns 6 rows instead of 5 for Texas. I was able to fix this problem by incrementing Ntile from 100 to 101
@stratascratch 3 ปีที่แล้ว ⁺²
I like it! Thanks for engaging with the video and actually coding with me!
@stutidey5902 2 ปีที่แล้ว
This video has become my go-to refresher before any interview
@niveditakumari701 3 ปีที่แล้ว
Really awesome to clarify the concepts and their usage, loved it
@tamago_roe 2 ปีที่แล้ว
This is THE ONE video that I review before every single one of my interviews. Thanks for the video!
@stratascratch 2 ปีที่แล้ว
Glad it was helpful!
@granand 2 ปีที่แล้ว
Really awesome mate. I subscribed and happy to know if you have a awesome courses you teach which is as unique and productive as these. Thank you
@bananaboydan3642 ปีที่แล้ว
Amazing video man, exactly what i was looking for
@davidallantodd7999 ปีที่แล้ว
Huge thumbs up for being so crisp and concise, which imho projects a more positive character than content with fluff and frills.
@swapnilrana3080 2 ปีที่แล้ว
Thanks a lot for creating this awesome channel, its amazingly well organized and makes things easy.
@shaunliu 3 ปีที่แล้ว
Loving the example walkthroughs, very helpful, thank you!
@stratascratch 3 ปีที่แล้ว ⁺¹
Thank you for watching! Hope you enjoy the series of walkthroughs
@judyhe686 3 ปีที่แล้ว ⁺²
Thanks for the video! I have 4 questions regarding the Distance to Dollar example that confused me: 1) why do we need to also group by dist_to_cost and avg_dist_to_cost when calculating their difference for each request date? I thought we only need to group by request date to get the difference for that date, which is returning an error... 2) the question asks about the avg difference for each year_month, does that mean we need to further group by year_month to get that average? 3) when calculating average dist_to_cost for each year_month, I noticed you used avg function, I'm actually thinking of using total sum of dist_to_travel/total sum of monetary_cost for each year_month which is giving a different average value, I wonder why should we use avg function here? 4)do we assume abs is needed to get the difference between dist_to_cost and avg_dist_to_cost since the question doesn't specify that? Thanks a lot!!
@sugandhtiwari1667 3 ปีที่แล้ว
You are really good at this , best explained!!!
@BellTollsForThee 2 ปีที่แล้ว
Great content I love your videos and clarity of speech!
Thanks
@mstalponabanu6227 3 ปีที่แล้ว ⁺⁷
Love Your teaching style
@stratascratch 3 ปีที่แล้ว
Thank you for the kind words! Hope you check out more of my vids. Let me know if you have any feedback or requests for content.
@shalinigarg859 2 ปีที่แล้ว
Thank you so much. Your videos are really informative and helpful. Much appreciated 👍
@ohh_nina_nyc 3 ปีที่แล้ว
Hi Nate, thank you for the video. For the distance per dollar question. I don't think we need group by in the end. Because we just need difference per day.
@stratascratch 3 ปีที่แล้ว ⁺¹
I think that rounding function was giving me an error if I didn't add a group by? But theoretically you're right that you don't need a group by since we're just calculating the difference per day. If anything, we could be de-duplicating on accident if we're not careful.
@yingqixu7002 2 ปีที่แล้ว
Thanks for patiently going through this problem! I have a question regarding the Distance Per Dollar case. If there are multiple rides per day, should we calculate the total distance and total cost first for each day and then generate the distance-per-dollar matrics? Thanks!
@stratascratch 2 ปีที่แล้ว
Yes of course. You should definitely try the problem on the platform and then post your comments/questions on the discussion forum. Someone from my team help guide you through the problem and answer any specific questions. We try to get to it in a few days. That'll be much faster than waiting for me!
@RaviTheVlogger 2 ปีที่แล้ว
Excellent way of explanation. thank you for the video.
@julias.2712 3 ปีที่แล้ว
Thanks for all your videos, you are awesome! Quick question on the last one: If we want to also look at growth over year for each city, would we additionally include a partition before the order by year ?
@stratascratch 3 ปีที่แล้ว ⁺¹
Yea that's right, you'd include a PARTITION BY city ORDER BY year. Then you'll have a breakdown of year, city, growth. Thanks for following and it's great that you're so engaged even with the code!
@warnercooler4488 2 ปีที่แล้ว
Awesome content, well presented. Thank you so much!
@eugeniosp3 2 ปีที่แล้ว
such an underrated channel.
@user-vw9dn2rz9s 2 ปีที่แล้ว
Thank you so much for this!
@stratascratch 2 ปีที่แล้ว
Thanks, Jermaine! Appreciate it
@laurax.5738 3 ปีที่แล้ว ⁺¹
hi Nate , love your approach/video . qq , how about remove the second sub-query as following ?
select a.request_date
, round(abs(a.avgCost - AVG(a.avgCost) over (partition by a.mnth))::DECIMAL,2) as diffcost
from (
select request_date
,(distance_to_travel/monetary_cost) as avgCost
,to_char(request_date::date, 'YYYY-MM') as mnth
from uber_request_logs
) a
@stratascratch 3 ปีที่แล้ว
Yes, great optimization. The code in some of my videos isn't optimized to the extent it can be. Usually it's because I'm guiding the "interviewer" through my understanding and approach to solving the problem. Most times in an interview, you'll be asked to optimize your code at the end. In this case I would have remove the 2nd subquery. Thanks for your feedback!
@snigdho642 2 ปีที่แล้ว ⁺²
Very good explanation of the problems. Really enjoyed the video. Just one thought - on the first question about averaging the fare/km for the month, what we did here is basically an average of the ratio. Should we instead calculate it as the SUM(distance for the month)/SUM(fare for the month)?
@terryh558 ปีที่แล้ว
I agree with you.
@nadavnavon2140 2 ปีที่แล้ว
Superb!
Quick question regarding the first Uber task, how about dropping the subquery and just use:
SELECT
request_date,
distance_to_travel / monetary_cost as dist_to _cost
avg( distance_to_travel / monetary_cost) over (partition by extract(month from date_request)) as avg_dist_to_cost
FROM
uber_request_logs
@stratascratch 2 ปีที่แล้ว ⁺¹
Sounds like it would work in theory and I like the optimization. You can also test your code on the platform (links to the questions are in the description) to see if you get the same output.
@sabaamanollahi5901 2 ปีที่แล้ว
very useful, thank you
@kshitijdhakal 2 ปีที่แล้ว
Love this tutorial. It's very easy to grasp everything.
@stratascratch 2 ปีที่แล้ว ⁺¹
Thank you, glad you like it.
@ahyoungkim8256 3 ปีที่แล้ว
best sql teacher ever
@stratascratch 3 ปีที่แล้ว
Thanks for watching! Will keep uploading these videos!
@Osteomorphis 3 ปีที่แล้ว
If anyone isn't a fan of using a subquery in the WHERE clause, try using a WITH clause, aka Temporary Tables. I use WITH more often with window functions just out of habit.
@stratascratch 3 ปีที่แล้ว
Great approach. Do you just join the table to your CTE and use that as a filtering mechanism?
@Osteomorphis 3 ปีที่แล้ว ⁺²
@@stratascratch Yes. I find my brain can wrap around a temp table using WITH better than a subquery in the FROM clause.
@alexwhb122 3 ปีที่แล้ว
Fantastic video. Thank you!
@stratascratch 3 ปีที่แล้ว
Thank you for watching! Hope you like the series.
@prernakalra5281 3 ปีที่แล้ว ⁺¹
very structured content, I am finding now fun to revise some SQL concepts. Thank you Leon !
@stratascratch 3 ปีที่แล้ว ⁺¹
Hi thanks for watching! I'll keep making these videos! Let me know if there are any topics that interests you.
@prernakalra5281 3 ปีที่แล้ว ⁺¹
@@stratascratch yes , If its possible can you make one video taking one real time dataset and performing exploratory data analysis ?. would be a great help !
@stratascratch 3 ปีที่แล้ว
@@prernakalra5281 Hi Prerna that's a great idea. Are you thinking of exploratory analysis in SQL or in python? I'm also guessing you're interested in how someone would explore data when starting a project?
@prernakalra5281 3 ปีที่แล้ว
@@stratascratch Yes, I am interested of exploratory analysis in sql. can we pick any dataset from Kaggle and start answering questions using SQL Queries ?
@stratascratch 3 ปีที่แล้ว ⁺¹
@@prernakalra5281 Yes, that's something I can do. Some of my datasets on StrataScratch already come from Kaggle or other company take home exams so I'll take one of those and show you how I go about doing exploratory analysis. I'll queue this up in my schedule and see where it can fit in.
@emiliogarza6446 2 ปีที่แล้ว
Really useful, thanks a lot
@stratascratch 2 ปีที่แล้ว ⁺¹
Glad to hear that!
@VarunSood0109 2 ปีที่แล้ว
I think a more efficient solution to the Unique Salaries question would be as below:
select distinct x.department, x.salary from
(select *, dense_rank() over (partition by department order by salary desc) as salary_rank
from twitter_employee) x
where x.salary_rank < 4
order by department , salary desc;
Let me know if I am correct. Thanks! :)
@anirudhvallabh419 3 ปีที่แล้ว
For the NTILE question I think it would make more sense to ORDER BY ASC on fraud score in the window function. Currently your percentile calculation takes the highest fraud score to be 1st percentile but it should actually be 100th percentile.
So 90th percentile would mean, your score is higher than 90% of all the scores out there.
@timmytesla9655 ปีที่แล้ว
I had the same thought
@ryezizzle 3 ปีที่แล้ว
nate you the GOAT
@stratascratch 3 ปีที่แล้ว
haha thanks!
@joaopedroreissilva7075 2 ปีที่แล้ว
Another really good content, thanks.
But Nate, on the second question (ranking), this method you used to have the unique values could be done with 'DISTINCT' too, right?
About the third question(NTILE), instead of NTILE, dense_rank would works by exactly the same logic, right?
@stratascratch 2 ปีที่แล้ว ⁺¹
In theory yes, you could use DISTINCT too for the 2nd question. Give it a try on the platform and see if the output is the same. NTILE and dense_rank aren't the same. You might be able to use the 2 functions interchangeably due to the dataset and question, but I wouldn't think of the 2 functions as the same. The use of them solely depends on the dataset and business context.
@yiangzhu8717 3 ปีที่แล้ว
Hey Nate! Thank you so much for uploading these SQL videos, and I learn a lot from your step-by-step breakdown of those questions. Now I can develop similar framework when I tackle these hard questions! I just got a question about the ranking question, and here is my code:
WITH result AS (
SELECT department,
salary,
DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank
FROM twitter_employee)
SELECT DISTINCT department,
salary
FROM result
WHERE salary_rank
@stratascratch 3 ปีที่แล้ว
I think the only difference is dense_rank vs rank and why you'd want to use one over the other. Otherwise I don't see any edge cases and your solution is similar to mine so no problems there either. Good job on the code! I like that you tried it out!
@PurpleTurkeyPatty 2 ปีที่แล้ว
I think that you can do this in just one subquery without the groupbuys:
select abs(round(dpd_month.dpd - avg(dpd_month.dpd) over(partition by dpd_month.request_month)::DECIMAL,2)) as difference,
dpd_month.request_date
from (
select, *
to_char(request_date::date, 'YYYY-MM') as request_month,
distance_to_travel/monetary_cost as dpd
from uber_request_logs
) dpd_month
order by dpd_month.request_date asc
@VarunSood0109 2 ปีที่แล้ว
Thanks so much for such an informative video! I have a question: Why do we even need to do a Group by here? For me, the solution ended where we selected the date and the abs column from the subquery. Kinda confused here!
@yuye2941 3 ปีที่แล้ว
Hi Nate,
Thanks for the video. May I ask why do we put "a" after order by year in 19:25? Especially we are not referring "a" in the select statement.
@stratascratch 3 ปีที่แล้ว
It's an alias given to the subquery. Sometimes it's required when writing a SQL query. You're basically "naming" the table.
@CeruleanAnthracite 2 ปีที่แล้ว
Hi, I'm confused about why you had to group by all columns in the aggregate functions section for the final query? once you got the table after doing the over(partition by...) bit, why can't we just calculate the difference b/w the columns directly? (i'm referring to the very first question you discussed in the video)
@chicagobeast12 3 ปีที่แล้ว ⁺²
In the first example with the rank function, the question states unique salaries. So wouldn't you need a SELECT statement that counts distinct salaries and filters out any with a count > 1? And then self join back into the table and then perform rank?
@stratascratch 3 ปีที่แล้ว
That's a great question. The rank() function will give the same rank for the same value, so we'll have department, salary, rank in the SELECT clause (in the inner query) with a GROUP BY of department and salary. This will group all duplicates since any salary that has a tie also has the same rank. But I agree with you that adding a distinct would be more explicit especially when someone else is reading your code.
@ramdas9761 3 ปีที่แล้ว
Really informative !
@stratascratch 3 ปีที่แล้ว
Thank you!
@pralineblues 2 ปีที่แล้ว
can someone explain why we are interested in (distance per dollar) and (the absolute difference bw that and the average distance per dollar for the month) or how they would be used as metrics
@Saboor-Hamedi 2 ปีที่แล้ว
It would be good to write query instead of skipping it, that way we can see the errors and the way you fix it
@mertcancoskun4658 3 ปีที่แล้ว ⁺¹
Superb.
@stratascratch 3 ปีที่แล้ว
Thank you!
@callravik 2 ปีที่แล้ว
I wish you included sliding window options in these tutorials.
@MaxKashirin 2 ปีที่แล้ว ⁺¹
Hi Nate, I have a question.
Why in the first task you count avg(dist_to_cost) partition by month, not SUM(distance_to_travel)/SUM(monetary_cost) partition by month ?
@sophie20324940 3 หลายเดือนก่อน
agree, I think the solution he provided is wrong. I think the distance per dollar for year month should use avg distance_per_dollar not on dist_to_cost, since each request have different distance_to_cost.
@haneulkim4902 3 ปีที่แล้ว
Thanks! When using ntile(100) what if for each partition there are more than 100 rows?
@stratascratch 3 ปีที่แล้ว ⁺²
It'll just bucket all rows into 100 partitions basically. You definitely aren't required to have only 100 rows to use ntile(100). It's just a measure of how many buckets you want. net/sql-server-window-functions/sql-server-ntile-function/
@arthuramanyire526 3 ปีที่แล้ว
Super helpful. Could you please share the dataset?
@stratascratch 3 ปีที่แล้ว
It's on the platform! Most people will just copy and paste the output to an excel
@ranpan2861 3 ปีที่แล้ว
For the NTILE question, if i'm understanding this function correctly, it just divides the data into 100 bins. So what if I want to find a percentile when the data is less than 100, say 80? Then the top 5 percentile isn't the top 5 bins anymore. Thanks
@stratascratch 3 ปีที่แล้ว
Yes, that's right. It wouldn't necessarily be the top 5 bigs. It may just be the top 4 bins that represent the top 5 percent since you'll only have 80 bins.
@classkori5507 3 ปีที่แล้ว ⁺¹
Thank you Sir
@stratascratch 3 ปีที่แล้ว
Thanks for watching!
@hendrywaldochendrawan6065 3 ปีที่แล้ว
For the second question - 11:24
why do we need to create query = a ? why not just use one with rank ID as the subquery. is there a flaw by doing so?
SELECT DEPARTMENT, SALARY, RANK_ID
FROM (
SELECT DEPARTMENT, SALARY, RANK() OVER (PARTITION BY DEPARTMENT ORDER BY SALARY DESC) AS RANK_ID
FROM TWITTER_EMPLOYEE
GROUP BY 1, 2
ORDER BY 1, 2 desc) A
WHERE RANK_ID < 4
@diftq8191 ปีที่แล้ว ⁺¹
Is there an operational definition you could give for when to use each type of window function? For partition by I've been going with: "use partition by when you want to group by something but not lose any rows" but that doesn't help me with most partition by questions. Any help here?
@stratascratch ปีที่แล้ว
I think of it more like..."what do I need to do in order to answer the question?". If it means that I need to group by something but not lose any rows, then I'll use a partition by. I just start with envisioning the output and then I pick appropriate functions and implement the right logic to get me there.
@arpan8076 2 ปีที่แล้ว
Hello
I work in a supermarket and I normally do some query in SQL Server as my basic day to day work
I want to upgrade my selft and learn more into this and get a high payable job
From where can I learn medium and advance SQL and What PL language will I need to know to be able to get a job in DA,DS ?? Can i get some useful links to study and get certified ..
Thank you
@iamyuvi009 2 ปีที่แล้ว
Hey Nate, thank you for the video.. do you see anything wrong with this code?
select department, salary from
(select department, salary, dense_rank() over(partition by department order by sal desc) drnk from twitter_employee
group by 1,2
order by 1) a
where drnk
@stratascratch 2 ปีที่แล้ว
I don't see anything wrong with it but give it a try on the platform to see if you see any syntax errors!
@SandeepKumar-nk2ru 2 ปีที่แล้ว
why can't you use rank instead of ntile in the scenario you explained?
@evazhong4419 2 ปีที่แล้ว
Hi, I have a question for the first question, You use a avg() window function, I am kinda confused that can we just use normal avg() function and group by after the from statement in this case
@stratascratch 2 ปีที่แล้ว
It's because I want to find the avg of the month. So I could probably use the avg() function with a group by, however, the a.dist_to_cost column would get in the way. So if you remove that, then you can go with your approach. But if you want all 3 columns, you'll need to take the average of the month's partition.
@Quenchedfooty 3 ปีที่แล้ว ⁺¹
Thanks SQL part is clear but a non SQL question about this example is bothering me, to compute average distance to cost - should we consider average of row averages (distance_to_travel/monetary_cost) or consider Sum(distance_to_travel) / Sum(monetary_cost ). Kindly help.
@stratascratch 3 ปีที่แล้ว
"averages (distance_to_travel/monetary_cost) " I think should work on this. If you have a specific question like this, feel free to ask the question on the platform. Someone on my team will help you within a few days.
@tarekelias463 2 ปีที่แล้ว
In the subquery you’ve used to remove duplicates, couldn’t a select distinct do the job?
@stratascratch 2 ปีที่แล้ว
But you have a count function as the 2nd column, so you have to add a group by in the subquery. It does remove the duplicates but the main purpose is to allow the count function to operate. You'll get an error if you used a distinct instead of a group by. Hope that helps!
@yuye2941 3 ปีที่แล้ว
May I also ask what does "t1) t2" means in 20:18? Thanks so much
@stratascratch 3 ปีที่แล้ว
Hi, it's the same thing as your last question. We're aliasing the subqueries and giving them a name.
@kritiverma1342 2 ปีที่แล้ว
Why do we need to group by in the very first example? I would appreciate it if you can point out a mistake in the below query
select a.request_month_yr, ROUND(ABS((a.dist_cost - AVG(a.dist_cost) OVER (partition by a.request_month_yr ))), 2) AS absolute_diff
from (
select format (request_date, ‘YYYY-MM’) AS request_month_yr, (distance_to_travel/monetary_cost) as dist_cost
FROM uber_request_logs) a
@stratascratch 2 ปีที่แล้ว
Great try! I would suggest posting this code and asking for help on our forum on the platform. Someone from my team will analyze your code and get back to you about any errors they see. This would be the fastest way to an answer!
@blackswan2020 3 ปีที่แล้ว
When do you need to do `a.column_name` vs just `column_name`?
@stratascratch 3 ปีที่แล้ว
Great question. It's mandatory if you have 2 tables with the same column name. So you need to alias the tables so SQL knows which column to use. A lot of people will alias the tables anyways and include it in their code even if all the column names are unique across both tables.
@shobhamourya8396 2 ปีที่แล้ว
Here's my solution using CTE:
with cte1
as
(
-- calculate the distance per dollar, and extract the yyyy-mm from request date
select request_date,
avg(distance_to_travel*1.0/monetary_cost*1.0) over(partition by request_date order by request_date) distance_per_dollar,
to_char(cast(request_date as date), 'YYYY-MM') as request_year_month
from uber_request_logs
),
cte2
as(
-- use the yyyy-mm to find monthly distance_per_cost
select request_date, distance_per_dollar,
avg(distance_per_dollar) over(partition by request_year_month order by request_year_month) as year_month_avg
from cte1
)
-- calculate the difference between monthly and daily distance_per_cost
select distinct request_date, round(abs(year_month_avg - distance_per_dollar)::DECIMAL,2) diff
from cte2
order by request_date
@stratascratch 2 ปีที่แล้ว ⁺¹
Great solution! Thank you for adding it to the comments.
@arielbarbosa7337 2 ปีที่แล้ว
Do you recommend any books to learn advanced sql? Thanks.
@stratascratch 2 ปีที่แล้ว ⁺¹
I haven't found any books that I liked for advanced SQL. My favorite tutorial is Mode Analytics SQL (mode.com/sql-tutorial/introduction-to-sql/). They have everything you need to learn the functions and theory. The rest is practice!
@bananaboydan3642 ปีที่แล้ว
Why cant you use distinct for unique salaries in the twitter employees example?
@stratascratch ปีที่แล้ว
You lose information if you use a distinct for unique salaries. Give it a try. The link to the question is in the description. You can code and see the output with the free tier.
@niveditakumari701 2 ปีที่แล้ว
Hey, quick question, GROUP BY should be a.request_date as not b. as we are doing AVG() and have a.request_date and a.dist_to_cost there?
@niveditakumari701 2 ปีที่แล้ว
For the 1st example here (Uber)
@stratascratch 2 ปีที่แล้ว
Not sure if it makes a difference here. I don't think we'll get a different solution.
@niveditakumari701 2 ปีที่แล้ว
@@stratascratch thanks, I just wanted to verify my understanding!
@niveditakumari701 2 ปีที่แล้ว
For the Twitter Question can we deduplicate by using DISTINCT keyword instead of GROUP BY?
@stratascratch 2 ปีที่แล้ว ⁺¹
Yes, I believe that should work.
@niveditakumari701 2 ปีที่แล้ว
@@stratascratch Thanks!
@rubyjiang8836 3 ปีที่แล้ว
can you please explain why you need to group by 3 columns in the aggregate problem? i'm very confused here. Thanks
@stratascratch 3 ปีที่แล้ว
Are you talking about @20:39? That problem? If so, that's basically the rule for using an aggregate. If you have columns that are not being aggregated, you'll need to group then for the aggregate function.
@yidanshang1382 3 ปีที่แล้ว ⁺¹
@@stratascratch I have a similar question - @7:02 why you need to group by the 3 fields? looks to me there is no need to do so to group by the request_date, dis_to_cost, and avg_dis_to_cost
@stratascratch 3 ปีที่แล้ว
@@yidanshang1382 I think for sure you need to group by the 1st two columns since avg_dis_to_cost is an avg() and the other 2 are columns. But if you group by those 3, you're basically deduplicating any rows. I dont' think you'll need to worry about duplicate rows since there probably isn't a high chance that there are duplicates between those 3 columns. So group by 1, 2 should work just fine.
@PurpleTurkeyPatty 2 ปีที่แล้ว
heres a solution for the rankings() problem:
WITH result AS (
SELECT
salary,
department,
dense_rank() OVER(partition by department order by salary desc) as rank
from twitter_employee
group by 1, 2
order by 2, 1 desc
)
SElECT department, salary
FROM RESULT
WHERE RANK
@kareemmajzoub4426 2 ปีที่แล้ว
Why did you use a group by request date in the first problem?
@stratascratch 2 ปีที่แล้ว
This is a syntax rule. All the columns appearing in the SELECT statement have to appear in the GROUP BY clause. The question requires column request_date to appear in the solution, so it also must be included in the GROUP BY clause.
@sbedekar93 3 ปีที่แล้ว
What if there are multiple years in the first Question? So the average for a month will change by the year
@stratascratch 3 ปีที่แล้ว ⁺¹
Yea, you'd need to add a year column or concat year-month to get to the right avg per month and not mix up years.
@sbedekar93 3 ปีที่แล้ว
@@stratascratch Thanks
@sumeshkumar71078 3 ปีที่แล้ว
Can I get job in FANG with only SQL experience. ?
@stratascratch 3 ปีที่แล้ว
Probably not. Most DE, DA, DS jobs will require at least another scripting language like python or R.
@manuelsojan9093 3 ปีที่แล้ว
For the ranking question, I did this and it worked, but it doesn't seem like the "correct" way:
SELECT DISTINCT Department, Salary FROM
(SELECT department, salary, DENSE_RANK() OVER(PARTITION BY department ORDER BY salary DESC) AS dense_rank from twitter_employee) AS D1
WHERE dense_rank IN (1,2,3)
order by department, salary desc
@stratascratch 3 ปีที่แล้ว ⁺¹
I like it! Looks like a correct way to me!
@manuelsojan9093 3 ปีที่แล้ว
@@stratascratch thanks!
@lanasedayanoch 3 ปีที่แล้ว
is this only for PostgreSQL?
@stratascratch 3 ปีที่แล้ว
Yes this is postgres. By the end of the year, the platform should be able to handle most db engines. What db engine are you looking for?
@mysteriousbd3743 3 ปีที่แล้ว ⁺¹
Excellent, Sir can you upload new video quickly?
@stratascratch 3 ปีที่แล้ว ⁺¹
Of course I can. I upload videos once a week! Let me know what you'e like to see.
@larryhatcher8927 2 ปีที่แล้ว
I've been working in Stratascratch for about three month doing hundreds of the problems and I feel like I have a pretty good grasp of coding in MYSQL......even in the windows functions that this video covers. I'm probably not going to get a job in a FAANG company but my question is: The coding that I have learned in Stratascratch.....Is that all I need when I see a job that requires a year of SQL? Is there more to it than just the coding?
@stratascratch 2 ปีที่แล้ว
For a job that requires a year of SQL, coding using some advance techniques like window functions and rankings is all you probably really need. Having a good grasp of how to breakdown problems and code up solutions is a bonus at that level (which I'm guessing is sort of entry level analytics if it's requiring a year of experience). Understanding the business context and best practices would probably be taught on the job. Good luck!
@larryhatcher8927 2 ปีที่แล้ว
@@stratascratch Thanks......one thing that prompted me to ask this question was several weeks ago I was on the linkedin site and just on a whim I took a skills test for MYSQL. I thought considering how many problems I done on StrataScratch I should do well. There was a lot of stuff on that test that I had not seen before. A few things about downloading data and a few other things I had not seen. These questions didn't seem to have very much to do with queries and it just crossed my mind.....is there more I should be learning.....Of course you are right...a lot of stuff would be taught on the job
@larryhatcher8927 2 ปีที่แล้ว
@@stratascratch BTW my biggest bug-a-boo in coding are the REGEXP functions. The windows functions, once understand, have not been a problem.
@ZhensongRen 2 ปีที่แล้ว
--Distance Per Dollar
SELECT DISTINCT request_mnth,
ROUND(abs(m.dis_to_cost - m.monthly_avg)::DECIMAL, 2) AS difference
FROM (SELECT
to_char(request_date, 'YYYY-MM') AS request_mnth,
distance_to_travel/monetary_cost AS dis_to_cost,
AVG(distance_to_travel/monetary_cost) OVER (PARTITION BY to_char(request_date, 'YYYY-MM')) as monthly_avg
FROM uber_request_logs) m
ORDER BY request_mnth

ต่อไป

เล่นอัตโนมัติ

SQL Window Functions In a Data Science Interview Question by Airbnb in 2021