I think it could be also helpful if you can summarize leetcode python questions in basic algorithm which are good for data scientists to prepare for technical interviews. Some of the companies do ask basic algorithm questions during coding interviews.
SUM(col) OVER (ORDER BY date ROWS BETWEEN 4 PRECEDING and CURRENT ROW) is an easy way to calculate a rolling sum. Useful in problems like the last one you showed.
Hi Emma! Good video overall. One small mistake in the Data Categorization section. '[10-15>' is tricky here since the above code won't return the answer as 0. Because we cannot case when something does not exist in the table. To do this, we might want to union all to consider 0 as the case.
Hello Emma, For the Categorization problem here is my solving method. 1) I create the table that I going to use fo the join DROP TABLE IF EXISTS bins_value CREATE TABLE bins_value ( bin_name varchar(20) ) INSERT INTO bins_value VALUES ('[0-5>'), ('[5-10>'), ('[10-15>'), ('[15-20>'), ('20 ans more') 2) The query to the problem WITH BINS as( SELECT CASE WHEN duration >= 0 and duration < 5 THEN '[0-5>' WHEN duration >= 5 and duration < 10 THEN '[5-10>' WHEN duration >= 10 and duration < 15 THEN '[10-15>' WHEN duration >=15 and duration < 20 THEN '[15-20>' ELSE '20 ans more' END as bin FROM SESSIONS ) SELECT a.bin_name, ISNULL(b.total,0) T_total FROM (SELECT bin, COUNT(*) total FROM BINS GROUP BY bin) as b right join bins_value as a on b.bin = a.bin_name
Hi Emma! A very good video. But I think games_played_so_far will not show the running total. It will show total games played by some one on certain date. Reason: 1st Step: Join two tables 2nd Step: Extract rows using where clause (it would return rows from the derived/joined table and WILL EXECUTE ONLY ONCE in query for all the rows where a.event_date is either greater or equat to b.event_date) 3rd Step: Grouping data according to a.player_id, a.event_date 4th Step: Selecting a.player_id, a.event_date and SUM(b.games_played). Here summation gives the total number of games played by a player on certain date. So far as I know we do not need WHERE clause here. I think the condition should be changed in ON clause and it should be a.event_date >= b.event_date. I may be wrong. Please guide if I am.
One suggestion for the first order question: the "first_order" table technically doesn't only reflect customers' first orders. consider, for example, a customer whose first order is placed on 1/1/2022 for delivery on 1/2/2022, and then in their second order they place it also on 1/1/2022 but this one is in fact an immediate delivery (for 1/1/2022). they'll show up now in your first query, but their first order was not, in fact, an immediate order (their second order was).
For Data Categorization, can you show me how to Left join to get the missing bin? I checked around and everyone uses a union to satisfy that condition.
Hello try this : 1) First create your session table : DROP TABLE IF EXISTS SESSIONS CREATE TABLE SESSIONS ( session_id int, duration int ) INSERT INTO SESSIONS VALUES (1, 30), (2, 199), (3, 299), (4, 580), (5, 1000), (6, 2), (7, 10), (8,5) 2) Create the table to show all the bins even the ones with no values DROP TABLE IF EXISTS bins_value CREATE TABLE bins_value ( bin_name varchar(20) ) INSERT INTO bins_value VALUES ('[0-5>'), ('[5-10>'), ('[10-15>'), ('[15-20>'), ('20 ans more') 3) Answer to the question WITH BINS as( SELECT CASE WHEN duration >= 0 and duration < 5 THEN '[0-5>' WHEN duration >= 5 and duration < 10 THEN '[5-10>' WHEN duration>= 10 and duration < 15 THEN '[10-15>' WHEN duration>=15 and duration < 20 THEN '[15-20>' ELSE '20 ans more' END as bin FROM SESSIONS ) SELECT a.bin_name, ISNULL(b.total,0) T_total FROM (SELECT bin, COUNT(*) total FROM BINS GROUP BY bin) as b right join bins_value as a on b.bin = a.bin_name I use the platform Microsoft SQL SERVER, just copy and paste you will have the answer, later on understand the steps. Hope it helps.
The only challenge that I have been facing with leetcode is the quality of test cases against which the solution is tested. Also, the solution sometimes doesn't go with what was asked in the first place!
Nice video! Thank you, Emma! One question would be: for the categorization example, will it be easier to use UNION to add the 0 record row (10-15 in this case, at 08:50)? I was not able to come up with/find an easy solution using JOIN here, could you point me to some possible answers? Thanks!
Try this it works. SELECT '[0-5>' bin, SUM(CASE WHEN duration*1.0/60 >=0 and duration*1.0/60 =5 and duration*1.0/60 =10 and duration*1.0/60 =15 THEN 1 ELSE 0 END) total FROM Sessions
@@songsong2334 Thank you! I came up with a similar solution before, using UNIONm while still wondering if there is anything that can directly improve from Emma's solution that doesn't cover the 0 cases.
Hello at 4:22 Customer_id 3 and Delivery_id have the same order_date and customer_pref_delivery_date. Why is Customer 3 not satisfying the criteria you outlined?
All SQL problems and resources mentioned in this video
medium.com/@emmading/practice-for-3-types-of-sql-interview-questions-2bd057a88b4f
really nice, i was asked a similar ratio question in Meta a month ago... I solved it a little bit different but it was good.
@@avishayisraeli1894 Thanks for sharing your experience! :)
I think it could be also helpful if you can summarize leetcode python questions in basic algorithm which are good for data scientists to prepare for technical interviews. Some of the companies do ask basic algorithm questions during coding interviews.
Thank you for your feedback!
SUM(col) OVER (ORDER BY date ROWS BETWEEN 4 PRECEDING and CURRENT ROW) is an easy way to calculate a rolling sum. Useful in problems like the last one you showed.
Hi Emma! Good video overall. One small mistake in the Data Categorization section. '[10-15>' is tricky here since the above code won't return the answer as 0. Because we cannot case when something does not exist in the table. To do this, we might want to union all to consider 0 as the case.
thanks! looking forward to window functions vidoe!
Not SQL-related but what do you think are the skills data scientists need to learn now aside from Python and SQL?
Hope this video answers your questions th-cam.com/video/yhjK6fC68eU/w-d-xo.html :)
Wow, this is a great video!
Hello Emma,
For the Categorization problem here is my solving method.
1) I create the table that I going to use fo the join
DROP TABLE IF EXISTS bins_value
CREATE TABLE bins_value (
bin_name varchar(20)
)
INSERT INTO bins_value VALUES
('[0-5>'),
('[5-10>'),
('[10-15>'),
('[15-20>'),
('20 ans more')
2) The query to the problem
WITH BINS as(
SELECT
CASE
WHEN duration >= 0 and duration < 5 THEN '[0-5>'
WHEN duration >= 5 and duration < 10 THEN '[5-10>'
WHEN duration >= 10 and duration < 15 THEN '[10-15>'
WHEN duration >=15 and duration < 20 THEN '[15-20>'
ELSE '20 ans more'
END as bin
FROM SESSIONS
)
SELECT a.bin_name, ISNULL(b.total,0) T_total FROM
(SELECT bin, COUNT(*) total FROM BINS
GROUP BY bin) as b
right join bins_value as a on b.bin = a.bin_name
Hi Emma! A very good video.
But I think games_played_so_far will not show the running total. It will show total games played by some one on certain date.
Reason:
1st Step: Join two tables
2nd Step: Extract rows using where clause (it would return rows from the derived/joined table and WILL EXECUTE ONLY ONCE in query for all the rows where a.event_date is either greater or equat to b.event_date)
3rd Step: Grouping data according to a.player_id, a.event_date
4th Step: Selecting a.player_id, a.event_date and SUM(b.games_played). Here summation gives the total number of games played by a player on certain date.
So far as I know we do not need WHERE clause here. I think the condition should be changed in ON clause and it should be a.event_date >= b.event_date.
I may be wrong. Please guide if I am.
really a clear breakdown of the SQL problems, thanks!
One suggestion for the first order question: the "first_order" table technically doesn't only reflect customers' first orders. consider, for example, a customer whose first order is placed on 1/1/2022 for delivery on 1/2/2022, and then in their second order they place it also on 1/1/2022 but this one is in fact an immediate delivery (for 1/1/2022). they'll show up now in your first query, but their first order was not, in fact, an immediate order (their second order was).
Will be having a live SQL interview next week, the video helps a lot!
For Data Categorization, can you show me how to Left join to get the missing bin? I checked around and everyone uses a union to satisfy that condition.
Hello try this :
1) First create your session table :
DROP TABLE IF EXISTS SESSIONS
CREATE TABLE SESSIONS (
session_id int,
duration int
)
INSERT INTO SESSIONS VALUES
(1, 30),
(2, 199),
(3, 299),
(4, 580),
(5, 1000),
(6, 2),
(7, 10),
(8,5)
2) Create the table to show all the bins even the ones with no values
DROP TABLE IF EXISTS bins_value
CREATE TABLE bins_value (
bin_name varchar(20)
)
INSERT INTO bins_value VALUES
('[0-5>'),
('[5-10>'),
('[10-15>'),
('[15-20>'),
('20 ans more')
3) Answer to the question
WITH BINS as(
SELECT
CASE
WHEN duration >= 0 and duration < 5 THEN '[0-5>'
WHEN duration >= 5 and duration < 10 THEN '[5-10>'
WHEN duration>= 10 and duration < 15 THEN '[10-15>'
WHEN duration>=15 and duration < 20 THEN '[15-20>'
ELSE '20 ans more'
END as bin
FROM SESSIONS
)
SELECT a.bin_name, ISNULL(b.total,0) T_total FROM
(SELECT bin, COUNT(*) total FROM BINS
GROUP BY bin) as b
right join bins_value as a on b.bin = a.bin_name
I use the platform Microsoft SQL SERVER, just copy and paste you will have the answer, later on understand the steps.
Hope it helps.
Thank you Emma! Good stuff!
The only challenge that I have been facing with leetcode is the quality of test cases against which the solution is tested. Also, the solution sometimes doesn't go with what was asked in the first place!
Thanks for sharing your experience Shubham!
clear and concise, nice vid
Nice video! Thank you, Emma! One question would be: for the categorization example, will it be easier to use UNION to add the 0 record row (10-15 in this case, at 08:50)? I was not able to come up with/find an easy solution using JOIN here, could you point me to some possible answers? Thanks!
Try this it works.
SELECT '[0-5>' bin, SUM(CASE WHEN duration*1.0/60 >=0 and duration*1.0/60 =5 and duration*1.0/60 =10 and duration*1.0/60 =15 THEN 1 ELSE 0 END) total
FROM Sessions
@@songsong2334 Thank you! I came up with a similar solution before, using UNIONm while still wondering if there is anything that can directly improve from Emma's solution that doesn't cover the 0 cases.
Hello at 4:22 Customer_id 3 and Delivery_id have the same order_date and customer_pref_delivery_date. Why is Customer 3 not satisfying the criteria you outlined?
@Howard Smith Because the first order of customer_id 3 is on 2019-08-21 which is in next row and that order is not an immediate order.
@@MohanRaj-tm2ml Thanks for pointing that out.
It's not the cust id 3 first order. They have an earlier order that doesn't satisfy the criteria.
How long do we have to wait till your next video on window functions?
Hang in there, it won't be long!
too much hands movement
But how do I compute the L ratio?