Im new to your channel and so happy I ve found it; quite rare to find real world queries clearly explained beginning to end. I'm writing similarly long queries at work(I m a BI) and it was a bit scary at the beginning. I m curious what are some advanced SQL concepts in data science? Thank you for the great content!
A lot of the advanced concepts are things like window partitions, rankings, playing around with dates/timestamps, and subqueries. But at work, what makes things advanced is the complexity of the problem and having to deal with problems with a lot of corner cases or problems where you need to manipulate/clean the data more than normal. This makes your code long even if advanced functions aren't used. On interviews, they'll test you on the latter (questions with corner cases) and test your ability to identify and solve each corner case. Whether or not you use an advanced sql concept is up to you as you can usually solve problems without them (but it might take longer)
My sql solution:- with t1 as(select facility_name,pe_description,record_id from los_angeles_restaurant_health_inspections lashi where facility_name like '%TEA%' or facility_name like '%CAFE%' or facility_name like '%JUICE%'), t2 as(select pe_description, count(record_id) no_rec,rank() over(order by count(record_id) desc) rnk from t1 group by pe_description) select facility_name from t1 join t2 on t1.pe_description=t2.pe_description where rnk=3
is this correct? can someone check? with ranks as ( select *, dense_rank() over (order by count_of_cateogry desc) as ranks from ( select pe_description, count(*) as count_of_cateogry from los_angeles_restaurant_health_inspections where lower(facility_name) Like '%cafe%' or lower(facility_name) Like '%tea%' or lower(facility_name) Like '%juice%' group by pe_description)temp ) select la.facility_name, r.pe_description from ranks r join los_angeles_restaurant_health_inspections la on r.pe_description = la.pe_description where ranks = 3
Thanks Nate! Your videos are so useful to understand how to break down and approach interview questions. Also, just for reference, here is a simplified solution to the same question: with cte_3rd_most as ( select pe_description from ( select pe_description , dense_rank() over(order by count(facility_name) desc) as rk from ( select facility_name , pe_description from los_angeles_restaurant_health_inspections where lower(facility_name) ~ '\y(tea|cafe|juice)\y' ) as t1 group by pe_description) as t2 where rk = 3) select facility_name from los_angeles_restaurant_health_inspections where lower(facility_name) ~ '\y(tea|cafe|juice)\y' and pe_description in (select pe_description from cte_3rd_most) ;
It's all practice in my opinion. You can try on StrataScratch, Leetcode, HackerRank, etc. I would try to solve the problems yourself and look at other user solutions.
Thanks for making these videos, you are awesome. Like you explained it is really hard to find these tips specially in SQL and though process on solving problems. Usually we only find basic syntax / or anything related to what is joins are , what is rank or sql . Which we never get asked in interview.
In my experience, this type of question would appear on one of the last rounds of the interview process where the questions are often much more complex and lengthy. I would expect this to take about 15-20 minutes on the whiteboard while also talking to the interviewer.
It would be just as efficient to use a case statement vs ilike. You can give it a try in the platform (link in the description). If you get stuck, just ask for help in the discussion. Someone from my team or myself will answer you!
HI Nate, i have a problem that require to get the total lenght for these data [0,30] [5,10] [15,20] [25,40] where we dont count the overlapp more than once. Can you pls try to solve this using SQL?
@@stratascratch i see i Tried to solve it using SQL and using CTE which you have taught in your sample. Below is the code. The answer seems correct when I simulate it PostgreSQL. FIND THE TOTAL LENGHT FROM THE LIST OF SEGMENT -- 5 WITH MAX_CTE AS ( SELECT 1 x , b.l, max(b.r) max from segments b group by x, b.l order by x,b.l ) , OVERLAP AS ( SELECT 1 as x , a.l as al, c.l ,c.max , (case when a.l > c.l and a.l < c.max and a.r > c.max then 'overlap' else '' end ) overl, (case when a.l > c.l and a.l < c.max and a.r > c.max then a.l else c.max end ) newl FROM segments a LEFT JOIN MAX_CTE c ON x = c.x ) , GET_MIN AS ( SELECT max(e.l), max(e.newl) max2, max(e.max) emax from ( SELECT d.l,d.newl,d.max from OVERLAP d ) e -- 3 GROUP BY e.newl, e.l ORDER BY l,newl ) ,
SAMPLE_CTE AS ( -- 4 select f.max, min(f.max2) from GET_MIN f group by f.max order by f.max -- 4 ) , SAMPLE_CTE5 AS ( SELECT 1 AS X, MIN(Min) MIN4 , MAX(MIN) MAX4 FROM SAMPLE_CTE GROUP BY X ) , SAMPLE_CTE45 AS ( SELECT 1 AS X,min AS MIN45, min-max as lenght2 FROM SAMPLE_CTE order by max asc LIMIT 1 ) , SAMPLE_CTE55 AS ( select 1 AS X,max as min from SAMPLE_CTE GROUP BY max LIMIT 1 ) , SAMPLE_CTE3 AS ( SELECT 1 AS X,yy.MAX AS l, yy.MIN as r , yy.min, ZZ.MIN45, JJ.MIN, zz.lenght2,MAX4, yy.min - yy.max as lenght, (case when yy.max zz.MIN45 then yy.min - ZZ.MIN45 else 0 end ) as right
FROM SAMPLE_CTE YY LEFT JOIN SAMPLE_CTE45 ZZ ON ZZ.X=X LEFT JOIN SAMPLE_CTE5 XX ON xx.X = XX.X LEFT JOIN SAMPLE_CTE55 JJ ON XX.X = JJ.X ) , SAMPLE_CTE4 AS ( SELECT 1AS X,L,R,R-L AS LEN from SAMPLE_CTE3 ORDER BY L LIMIT 1) , SAMPLE_CTE6 AS ( SELECT * FROM SAMPLE_CTE4 A LEFT JOIN SAMPLE_CTE5 B ON A.X=B.X )
--SELECT * FROM SAMPLE_CTE6
select a.x,sum(A.left)+sum(A.right)+MAX(B.LEN) Total_length,MAX(B.LEN) as first_element_lenght , sum(A.left)+sum(A.right) Others_Lenght froM SAMPLE_CTE3 A LEFT JOIN SAMPLE_CTE4 B ON A.x=B.x GROUP BY A.x
Thanks Nate! Love your channel! Just a question after watching a couple of your videos. I cannot find some of the cases in your video in StrataScratch, and I guess they are removed or renamed? This is a little bit inconvenient for me as I cannot preview the table and follow your step by step to test the stage output, only frequently pausing the video to see the structure of the original table, and if my syntax and structure are the same with yours. I've noticed that each case has a case Id, and I have been trying to search by this ID in StrataScratch but does not work.
Hi, you're right that some of the questions aren't found in the platform anymore. If you click on the direct link in the description, you'll still have access to the question. But the question itself isn't searchable on the platform. That's because we've had to take a few off the platform (I think around 4?). Most videos will be searchable on the platform though. You can search using the question's title. Thanks for following and sorry for the inconvenience. Unfortunately, those 4 that were taken off happen to be very popular on YT
@@stratascratch I tried the CTE but got error like "with is not valid at this position for this server version". I'm not able to fix it. so I did subquery instead and here's my solution. SELECT facility_name FROM ( SELECT qq.minI,Y.pe_desc FROM ( SELECT min(Q.n_issues) as minI from (SELECT pe_desc,sum(score) n_issues FROM facility GROUP BY pe_desc ORDER BY n_issues DESC limit 3 ) Q ) as qq LEFT JOIN ( SELECT pe_desc,sum(score) n_issues FROM facility GROUP BY pe_desc ) Y ON Y.n_issues = qq.minI ) zz LEFT JOIN ( SELECT facility_name,pe_desc FROM facility ) J ON J.pe_desc = zz.pe_desc WHERE facility_name LIKE '%TEA%' OR facility_name LIKE '%CAFE%' OR facility_name LIKE '%JUICE%'
@@PATRICKCHUAD Can you do something like this? I'm not sure what your position was with the WITH but this works: WITH counts AS (SELECT pe_description, COUNT(record_id) cnt FROM los_angeles_restaurant_health_inspections WHERE facility_name ILIKE '%tea%' OR facility_name ILIKE '%cafe%' OR facility_name ILIKE '%juice%' GROUP BY 1), ranks AS (SELECT pe_description, DENSE_RANK() OVER( ORDER BY cnt DESC) rnk FROM counts) SELECT facility_name FROM los_angeles_restaurant_health_inspections WHERE pe_description IN (SELECT pe_description FROM ranks WHERE rnk = 3) AND ((facility_name ILIKE '%CAFE%' OR facility_name ILIKE '%TEA%' OR facility_name ILIKE '%JUICE%'))
Hi, thanks for watching. There's no shortcut unless you program the hotkey yourself. Otherwise, it's an issue between using tab or 4 spaces. I am a 4 space type of guy and mainly just will either hotkey it in my editor (I use sublime) or just type in the 4 spaces. Hope that helps.
Im new to your channel and so happy I ve found it; quite rare to find real world queries clearly explained beginning to end. I'm writing similarly long queries at work(I m a BI) and it was a bit scary at the beginning.
I m curious what are some advanced SQL concepts in data science?
Thank you for the great content!
A lot of the advanced concepts are things like window partitions, rankings, playing around with dates/timestamps, and subqueries. But at work, what makes things advanced is the complexity of the problem and having to deal with problems with a lot of corner cases or problems where you need to manipulate/clean the data more than normal. This makes your code long even if advanced functions aren't used. On interviews, they'll test you on the latter (questions with corner cases) and test your ability to identify and solve each corner case. Whether or not you use an advanced sql concept is up to you as you can usually solve problems without them (but it might take longer)
My sql solution:-
with t1 as(select facility_name,pe_description,record_id from los_angeles_restaurant_health_inspections lashi
where facility_name like '%TEA%' or
facility_name like '%CAFE%' or
facility_name like '%JUICE%'),
t2 as(select pe_description, count(record_id) no_rec,rank() over(order by count(record_id) desc) rnk from t1 group by pe_description)
select facility_name from t1
join t2
on t1.pe_description=t2.pe_description
where rnk=3
Thank you for sharing!
Excellent, excellent, excellent!! Thank you for sharing, this is super helpful!!
Thank you for watching!
is this correct? can someone check?
with ranks as (
select *, dense_rank() over (order by count_of_cateogry desc) as ranks from (
select pe_description, count(*) as count_of_cateogry
from los_angeles_restaurant_health_inspections
where lower(facility_name) Like '%cafe%'
or lower(facility_name) Like '%tea%'
or lower(facility_name) Like '%juice%'
group by pe_description)temp )
select la.facility_name, r.pe_description from ranks r
join los_angeles_restaurant_health_inspections la on r.pe_description = la.pe_description
where ranks = 3
Thanks Nate! Your videos are so useful to understand how to break down and approach interview questions.
Also, just for reference, here is a simplified solution to the same question:
with cte_3rd_most as (
select
pe_description
from (
select
pe_description
, dense_rank() over(order by count(facility_name) desc) as rk
from (
select
facility_name
, pe_description
from los_angeles_restaurant_health_inspections
where lower(facility_name) ~ '\y(tea|cafe|juice)\y' ) as t1
group by pe_description) as t2
where rk = 3)
select
facility_name
from los_angeles_restaurant_health_inspections
where lower(facility_name) ~ '\y(tea|cafe|juice)\y'
and pe_description in (select pe_description from cte_3rd_most) ;
That is wonderful. Keep rockin!
Do you or does anyone have a recommendation, either books or courses, where I could deep dive this type of workflow when querying databases?
It's all practice in my opinion. You can try on StrataScratch, Leetcode, HackerRank, etc. I would try to solve the problems yourself and look at other user solutions.
I have one doubt-if we use dense-rank function to find top 3 and then just filter where rank=3 then we don’t need 3rd CTE i.e Categories
Probably works! Try it out on the platform and see if you get the same output.
Thanks for making these videos, you are awesome. Like you explained it is really hard to find these tips specially in SQL and though process on solving problems. Usually we only find basic syntax / or anything related to what is joins are , what is rank or sql . Which we never get asked in interview.
Thanks for watching! Totally agree with you. YT has a ton of basic SQL stuff but nothing overly complicated. Glad I can fill the gap!
Solving this during the interview would take time. How quick is this question expected to be solved in a typical interview?
In my experience, this type of question would appear on one of the last rounds of the interview process where the questions are often much more complex and lengthy. I would expect this to take about 15-20 minutes on the whiteboard while also talking to the interviewer.
Hi, is it possible to replace the where Ilike by case when ilike here to use count ? if yes how would it be, and would it be more efficient or no ? :D
It would be just as efficient to use a case statement vs ilike. You can give it a try in the platform (link in the description). If you get stuck, just ask for help in the discussion. Someone from my team or myself will answer you!
Need to say.. best video on SQL for knowing how to solve complex queries
HI Nate, i have a problem that require to get the total lenght for these data [0,30] [5,10] [15,20] [25,40] where we dont count the overlapp more than once. Can you pls try to solve this using SQL?
This might be better to solve on python =)
@@stratascratch i see
i Tried to solve it using SQL and using CTE which you have taught in your sample. Below is the code. The answer seems correct when I simulate it PostgreSQL.
FIND THE TOTAL LENGHT FROM THE LIST OF SEGMENT
-- 5
WITH MAX_CTE AS (
SELECT 1 x , b.l, max(b.r) max from segments b
group by x, b.l
order by x,b.l
)
,
OVERLAP AS (
SELECT 1 as x , a.l as al, c.l ,c.max ,
(case when a.l > c.l and a.l < c.max and a.r > c.max then 'overlap' else '' end ) overl,
(case when a.l > c.l and a.l < c.max and a.r > c.max then a.l else c.max end ) newl
FROM segments a
LEFT JOIN MAX_CTE c
ON x = c.x
)
,
GET_MIN AS (
SELECT max(e.l), max(e.newl) max2, max(e.max) emax
from (
SELECT d.l,d.newl,d.max from
OVERLAP d
) e
-- 3
GROUP BY e.newl, e.l
ORDER BY l,newl
)
,
SAMPLE_CTE AS (
-- 4
select f.max, min(f.max2) from
GET_MIN f
group by f.max
order by f.max
-- 4
)
,
SAMPLE_CTE5 AS (
SELECT 1 AS X, MIN(Min) MIN4 , MAX(MIN) MAX4
FROM SAMPLE_CTE
GROUP BY X
)
,
SAMPLE_CTE45 AS (
SELECT 1 AS X,min AS MIN45, min-max as lenght2 FROM SAMPLE_CTE order by max asc
LIMIT 1
)
,
SAMPLE_CTE55 AS (
select 1 AS X,max as min from SAMPLE_CTE GROUP BY max LIMIT 1
)
,
SAMPLE_CTE3 AS (
SELECT 1 AS X,yy.MAX AS l, yy.MIN as r , yy.min,
ZZ.MIN45, JJ.MIN, zz.lenght2,MAX4, yy.min - yy.max as lenght,
(case when yy.max zz.MIN45
then yy.min - ZZ.MIN45 else 0 end ) as right
FROM SAMPLE_CTE YY
LEFT JOIN SAMPLE_CTE45 ZZ
ON ZZ.X=X
LEFT JOIN SAMPLE_CTE5 XX
ON xx.X = XX.X
LEFT JOIN SAMPLE_CTE55 JJ
ON XX.X = JJ.X
)
,
SAMPLE_CTE4 AS (
SELECT 1AS X,L,R,R-L AS LEN from SAMPLE_CTE3
ORDER BY L
LIMIT 1)
,
SAMPLE_CTE6 AS (
SELECT * FROM SAMPLE_CTE4 A
LEFT JOIN SAMPLE_CTE5 B
ON A.X=B.X
)
--SELECT * FROM SAMPLE_CTE6
select a.x,sum(A.left)+sum(A.right)+MAX(B.LEN) Total_length,MAX(B.LEN) as first_element_lenght ,
sum(A.left)+sum(A.right) Others_Lenght
froM SAMPLE_CTE3 A
LEFT JOIN SAMPLE_CTE4 B
ON A.x=B.x
GROUP BY A.x
One word only… Amazing!!! The way u explained. I was having so much confusion & was looking for some internal tips n boom.. subscribed..
How to go for subqueries, any study material you would suggest
Why didn't I find your videos earlier? -_-
I guess my WHERE clause weren't too specific :p
=) Expert level joke! Glad you found my channel!
Hey, really helpful explanation, Can we also do a OFFSET along with LIMIT to get 3rd highest issue?
I think so? give it a try on the platform and see if you get the same solution!
Thanks Nate! Love your channel! Just a question after watching a couple of your videos. I cannot find some of the cases in your video in StrataScratch, and I guess they are removed or renamed? This is a little bit inconvenient for me as I cannot preview the table and follow your step by step to test the stage output, only frequently pausing the video to see the structure of the original table, and if my syntax and structure are the same with yours. I've noticed that each case has a case Id, and I have been trying to search by this ID in StrataScratch but does not work.
I am not sure if it is the case.Maybe due to my bad search approach😂
Hi, you're right that some of the questions aren't found in the platform anymore. If you click on the direct link in the description, you'll still have access to the question. But the question itself isn't searchable on the platform. That's because we've had to take a few off the platform (I think around 4?). Most videos will be searchable on the platform though. You can search using the question's title. Thanks for following and sorry for the inconvenience. Unfortunately, those 4 that were taken off happen to be very popular on YT
This approach is very helpful -- thanks for documenting how you solve these types of SQL problems!
Thanks a lot, Jim! Thanks for watching my videos!
Hello Nate,
The shortened link opens your YT video, please can you update the URL.
(Neat content & presentation!)
Thank You!
Hey Hari, sorry about that. Here you go! platform.stratascratch.com/coding-question?id=9701&python=
Thanks!
You've earned a sub +1
💪🏽
@@hariguhan8399 I appreciate it man! Feel free to let me know if you have specific topics or anything else you're interested in learning.
Excellent, Thanks for your dedication Nate
Thanks for always watching!
Thanks for sharing. Really find this sample very helpful.
Thanks so much for watching.
@@stratascratch I tried the CTE but got error like "with is not valid at this position for this server version". I'm not able to fix it. so I did subquery instead and here's my solution.
SELECT facility_name
FROM
(
SELECT qq.minI,Y.pe_desc
FROM
(
SELECT min(Q.n_issues) as minI from
(SELECT pe_desc,sum(score) n_issues
FROM facility
GROUP BY pe_desc
ORDER BY n_issues DESC limit 3 ) Q
) as qq
LEFT JOIN
(
SELECT pe_desc,sum(score) n_issues
FROM facility
GROUP BY pe_desc
) Y
ON Y.n_issues = qq.minI
) zz
LEFT JOIN
(
SELECT facility_name,pe_desc
FROM facility
) J
ON J.pe_desc = zz.pe_desc
WHERE facility_name LIKE '%TEA%' OR
facility_name LIKE '%CAFE%' OR
facility_name LIKE '%JUICE%'
@@PATRICKCHUAD Can you do something like this? I'm not sure what your position was with the WITH but this works:
WITH counts AS
(SELECT pe_description,
COUNT(record_id) cnt
FROM los_angeles_restaurant_health_inspections
WHERE facility_name ILIKE '%tea%'
OR facility_name ILIKE '%cafe%'
OR facility_name ILIKE '%juice%'
GROUP BY 1),
ranks AS
(SELECT pe_description,
DENSE_RANK() OVER(
ORDER BY cnt DESC) rnk
FROM counts)
SELECT facility_name
FROM los_angeles_restaurant_health_inspections
WHERE pe_description IN
(SELECT pe_description
FROM ranks
WHERE rnk = 3)
AND ((facility_name ILIKE '%CAFE%'
OR facility_name ILIKE '%TEA%'
OR facility_name ILIKE '%JUICE%'))
@@stratascratch let me try this. thks
@@stratascratch i tried it works. Thanks.
👍👌👌
Thanks for watching! Please let me know if there's any topics you'd like for me to cover
Hi, May I know the shortcut to indent multiple lines of code at once for organizing.
Hi, thanks for watching. There's no shortcut unless you program the hotkey yourself. Otherwise, it's an issue between using tab or 4 spaces. I am a 4 space type of guy and mainly just will either hotkey it in my editor (I use sublime) or just type in the 4 spaces. Hope that helps.
I guess he’s asking about indenting multiple lines at the same time… It will mostly be select the lines and use Ctrl+] or CMD+]
@@ChandraKanth7 yes, you are right. Thank you sharing!
Just highlight the block and hit tab
Wow so good
Thanks for watching!
Thanks sir
Thanks for watching!