SQL Interview Questions For Data Scientists And Data Engineers - Tips For Practicing SQL Interviews

แชร์
ฝัง
  • เผยแพร่เมื่อ 20 ก.ค. 2024
  • SQL has become a common skill requirement across industries and job profiles over the last decade.
    Companies like Amazon and Google will often demand that their data analysts, data scientists and product managers are at least be familiar with SQL. This is because SQL remains the language of data.
    So, in order to be data-driven, people need to know how to access and analyze data.
    With so many people looking at, slicing, manipulating, and analyzing data we wanted to provide some tips to help improve your SQL.
    InterviewQuery
    * www.interviewquery.com/?ref=sdg
    If you enjoyed this video, check out some of my other top videos.
    What Skills Do Data Engineers Need?
    • What Skills Do Data En...
    Data Engineering Project Ideas
    • 5 Data Sources for You...
    If you want to learn more about machine learning, check out DataCamps Machine Learning Course
    bit.ly/3BeLEml
    If you'd like to read up on my updates about the data field, then you can sign up for our newsletter here.
    seattledataguy.substack.com/​​
    Or check out my blog
    www.theseattledataguy.com/
    Tags: Data engineering projects, Data engineer project ideas, data project sources, data analytics project sources, data project portfolio
    0:00 Intro
    1:01 Question 1 Answer 1
    3:13 Question 1 Answer 2
    4:56 Question 2
    10:59 Question 3
    _____________________________________________________________
    Subscribe: / @seattledataguy
    _____________________________________________________________
    About me:
    I have spent my career focused on all forms of data. I have focused on developing algorithms to detect fraud, reduce patient readmission and redesign insurance provider policy to help reduce the overall cost of healthcare. I have also helped develop analytics for marketing and IT operations in order to optimize limited resources such as employees and budget. I privately consult on data science and engineering problems both solo as well as with a company called Acheron Analytics. I have experience both working hands-on with technical problems as well as helping leadership teams develop strategies to maximize their data.
    *I do participate in affiliate programs, if a link has an "*" by it, then I may receive a small portion of the proceeds at no extra cost to you.

ความคิดเห็น • 90

  • @gatorpika
    @gatorpika ปีที่แล้ว +19

    Damn, been writing SQL for years and just starting to use more CTEs in the past year instead of subqueries, but never realized that you could make the second query dependent on the results of the first. That's huge! Clicked all the thumbs and bells and stuff.

    • @SeattleDataGuy
      @SeattleDataGuy  ปีที่แล้ว

      Thanks for clicking all the things! Hope you find more helpful things!

  • @naheliegend5222
    @naheliegend5222 2 ปีที่แล้ว +17

    SQL is so underrated. Could not imaging how do to my job without it.

    • @SeattleDataGuy
      @SeattleDataGuy  2 ปีที่แล้ว +3

      SQL is always clutch! I love it and probably over use it haha.

  • @jsonify
    @jsonify 2 ปีที่แล้ว +2

    Thanks for putting this out. I heard you on Ken's Nearest Neighbors this past week and it was great to hear more of your backstory. Seattle for the win!!!

    • @SeattleDataGuy
      @SeattleDataGuy  2 ปีที่แล้ว

      I am so glad you got to see it! Thanks for the comment

  • @FebbyTV
    @FebbyTV 2 ปีที่แล้ว +19

    Great vid.
    One minor thing - please don't teach people to name their CTEs t1, t2,t3 etc or alias their tables in joins with t1,t2,t3.
    In my opinion you should always try to give the CTEs names that somewhat can explain whats going on.

    • @rimvydasposkus5617
      @rimvydasposkus5617 2 ปีที่แล้ว +1

      Yeah, same. Then you have such queries on your Work place is headache to read!

    • @andreasahrlund-richter2289
      @andreasahrlund-richter2289 6 หลายเดือนก่อน +1

      For the grouping columns Ive noticed most people prefer to use "1, 2, 3" but yes, for ctes an abbreviation that makes sense is good.

  • @subimalkhatua2886
    @subimalkhatua2886 2 ปีที่แล้ว +6

    for second question you could have easily used a mix of dense_rank and row_number ------------> dense_rank() over(partition by user_id order by song_id) r_in_sng ,row_number () over(partition by user_id,song_id order by created_at) r_in_grp ----------> where r_in_sng = 3 and r_in_grp = 1

  • @mirirshadali33
    @mirirshadali33 2 ปีที่แล้ว +3

    This is really coool. I like the 1st problem you solved using group by clause. it never occur to me before. I hope you will continue to publish more sql problems.

    • @SeattleDataGuy
      @SeattleDataGuy  2 ปีที่แล้ว

      Glad you enjoyed it! We shall see, I might have to do more! It's always fun to walk through. Maybe next time I can get someone to do better animations.

  • @nanomartin
    @nanomartin 2 ปีที่แล้ว +2

    I really liked this one, more than others! Didn't know about that site. I'll use it for sure. And seeing that, kind of, practical class, helps me a lot figure out something that maybe I didn't know so far.
    Thanks Ben!

    • @SeattleDataGuy
      @SeattleDataGuy  2 ปีที่แล้ว

      I am so glad you found this helpful!

  • @severlight
    @severlight 2 ปีที่แล้ว +1

    Hey, a pleasure to watch you, thanks for the practice source

  • @GuyThompsonFWTX
    @GuyThompsonFWTX 2 ปีที่แล้ว +5

    Personal nit-picky thing, I think in learning and training it's a really good practice to use 'AS' to show that you're using an alias. Again, this isn't something you need to do all the time, but for those unfamiliar with writing and reading SQL, I think it's a big help in understanding the syntax.

    • @SeattleDataGuy
      @SeattleDataGuy  2 ปีที่แล้ว +1

      Totally accurate statement. Fun fact, most people at meta would just right NIT before a nit-picky comment. We got rid of a lot of these nits by implementing a style guide that was automatically implemented up saving. Reduces the amount of time code is in review and number of people being like, hey can you tab this over 1 more time.

  • @kedarjoshi5878
    @kedarjoshi5878 2 ปีที่แล้ว +1

    Most awaited video. Thanks for this👍

    • @SeattleDataGuy
      @SeattleDataGuy  2 ปีที่แล้ว

      Thanks, hopefully it was helpful!!

  • @BJTangerine
    @BJTangerine 2 ปีที่แล้ว +3

    Love this kind of content, keep it up

  • @shreykhetrapal9585
    @shreykhetrapal9585 2 ปีที่แล้ว +3

    This was really helpful, specially the join on last problem was a fantastic approach

    • @SeattleDataGuy
      @SeattleDataGuy  2 ปีที่แล้ว

      Thank you! I am glad you found it helpful!

  • @iamkv
    @iamkv 2 ปีที่แล้ว +1

    Great video, thank you Ben!!

  • @SeattleDataGuy
    @SeattleDataGuy  2 ปีที่แล้ว

    If you like this content, then check out my newsletter! seattledataguy.substack.com/p/the-baseline-data-stack-building
    Also if you want to try out a low code tool for free, you can try out Rivery here bit.ly/3HMnsuo

  • @Alez101010
    @Alez101010 2 ปีที่แล้ว +1

    As always, very useful video! Thank you!! Just getting my DE interview on Wednesday 🙄🙄

  • @vamshibokka9794
    @vamshibokka9794 9 หลายเดือนก่อน

    For the third question we can order by the scores and then use lead() or lag() function to compare the scores

  • @asd855280
    @asd855280 2 ปีที่แล้ว +1

    Another great video, thank you.

  • @rvian4
    @rvian4 2 หลายเดือนก่อน

    second question now requires returning users who dont have a third song with 'null' for column song and data.. for this you can replace join on second cte to a right join

  • @Chowmeingeejah
    @Chowmeingeejah ปีที่แล้ว

    For the last example why not self join onto the rank ordered by score? i.e. join rank(score) = rank(score) - 1 and then calculate the difference of the scores. Wouldn't a cross join be too resource intensive?

  • @iqjayfeng
    @iqjayfeng 2 ปีที่แล้ว +3

    AMAZING VID!

  • @remek5758
    @remek5758 2 ปีที่แล้ว +2

    What i love about this Interview questions is that no one will ever have a working scenario for such dumb tasks x

    • @SeattleDataGuy
      @SeattleDataGuy  2 ปีที่แล้ว

      I have :) But everyone works with different SQL problems.

  • @S1A917
    @S1A917 2 ปีที่แล้ว +1

    Minor detail - but your question 1 answer 1 assumes that u.id is non-null (which it should be as an id column)

  • @SHIBIYAYA
    @SHIBIYAYA 4 หลายเดือนก่อน +1

    Thanks very much !

  • @eliseocandaza4581
    @eliseocandaza4581 4 หลายเดือนก่อน +1

    Thank you so much bro~

  • @hassaanali7405
    @hassaanali7405 2 ปีที่แล้ว +17

    Hey Ben! I hope you're extensively looking into the IBM Data Engineering Professional Certificate on Coursera, as it provides an end to end solution using everything from SQL to Airflow to basic Warehousing. Hoping for a video detailing this soon.

    • @SeattleDataGuy
      @SeattleDataGuy  2 ปีที่แล้ว +13

      I filmed it this week and sent it to my editor

    • @BJTangerine
      @BJTangerine 2 ปีที่แล้ว

      just started this course last week

    • @fredbradshaw4308
      @fredbradshaw4308 2 ปีที่แล้ว

      @@SeattleDataGuy Thank you!!!!

    • @HI-bw8fe
      @HI-bw8fe 2 ปีที่แล้ว

      @@BJTangerine how do you like it so far? I was thinking doing this after I finished the IBM Data Analyst.

  • @brucehendry270
    @brucehendry270 2 ปีที่แล้ว +2

    I would have answered the first question with a not exists as it's probably the easiest to read and understand what the code is supposed to be doing, good job on the other two questions, I'll have to have a look at the site and have a play myself 😋

    • @SeattleDataGuy
      @SeattleDataGuy  2 ปีที่แล้ว

      Let me know your thoughts on the site!

  • @coding3438
    @coding3438 2 ปีที่แล้ว +1

    The third question about the SAT would’ve been achieved by lead lag functions as well right?

    • @SeattleDataGuy
      @SeattleDataGuy  2 ปีที่แล้ว

      That could be a solution as well. Sometimes some companies prefer you not use window functions when you are interviewing. I have heard "can you do it without a window function" a few times

  • @ajaxaj6749
    @ajaxaj6749 2 ปีที่แล้ว +1

    We can do partition by user name and song and use row _num=3 with order by asc. No need of group by CTE in 2nd question

    • @SeattleDataGuy
      @SeattleDataGuy  2 ปีที่แล้ว

      You mean row_number() over (partition by username, song_id order by created)? Did you test it out?

    • @SeattleDataGuy
      @SeattleDataGuy  2 ปีที่แล้ว +1

      Because if you do that, all it will do is start counting at 1 at the top of every new song_id.

  • @alexjenkins8026
    @alexjenkins8026 5 หลายเดือนก่อน

    @12:14 Closest SAT Scores...
    You do not need to do a "join of all possible combinations of students" - this is completely overkill since you know the person with the best score and lowest score can't possible have the smallest difference in scores.
    More generally, any pair of students with another student between them in the rankings can't possibly be the smallest difference in scores.
    Therefore, instead of doing this crazy expensive join, we can just use LEAD or LAG to get the scores of the NEXT/PREVIOUS score:
    WITH diff AS (
    SELECT
    student AS one_student,
    LEAD(student) OVER (ORDER BY score) AS other_student,
    ABS(score - LEAD(score) OVER (ORDER BY score)) AS score_diff
    FROM scores
    )
    SELECT
    one_student,
    other_student,
    score_diff
    FROM diff
    WHERE
    score_diff IS NOT NULL
    ORDER BY
    score_diff ASC,
    LEAST(one_student, other_student) ASC
    LIMIT 1;
    As a final note, you cannot use the < join as simply as you stated, because the names of the students might be the wrong way around (one_student might be James and other_student might be Alice for one pair and Bob and Sam for the other - but will be ordered Bob first and James second). You've solved this by joining EVERY pair together so you have a (James, Alice) pair and a (Alice James) pair but this is doubling the computation requirements - which is terrible.
    I've solved this with a LEAST(name1,name2) in the ORDER BY so that the name "higher in the alphabet" is retrievable from either variable.

  • @rememberthename911g
    @rememberthename911g 2 ปีที่แล้ว +3

    How does the order by 3, 2, 1 select the student combination that is higher in the alphabet?

    • @SeattleDataGuy
      @SeattleDataGuy  2 ปีที่แล้ว +1

      So 3,1,2 represents the columns in their respective orders. 3 is the abs(lowest score), 1 student name, 2, other student name. It's a short cut instead of writing out the entire column. Is that what you are asking?

  • @KoldFiyure
    @KoldFiyure 2 ปีที่แล้ว +3

    Can we do this with Python as well? I am a new data engineer that is trying to get to my next assignment. I need to hurry and upskill on Python by doing. I want to use this site youve used here to practice more SQL as well. I could never get good enough with SQL it seems.

    • @SeattleDataGuy
      @SeattleDataGuy  2 ปีที่แล้ว +3

      I might have to put this on the list of videos!

  • @milanmiletic9819
    @milanmiletic9819 2 ปีที่แล้ว +1

    Thank u for this video 👋👋👋👋👋👋👋👋👋

  • @lokesh1505
    @lokesh1505 2 ปีที่แล้ว +2

    Lag or lead would work as well for the third problem

    • @SeattleDataGuy
      @SeattleDataGuy  2 ปีที่แล้ว

      Yup! You could. This video was already getting long so I wanted to avoid rambling too long.

  • @ass2412
    @ass2412 2 ปีที่แล้ว +1

    Interview query is expensive. Is stratascratch good enough for product companies? I am thinking of getting that for practicing SQL.

    • @SeattleDataGuy
      @SeattleDataGuy  2 ปีที่แล้ว +1

      I think stratascratch is also solid. I generally start with free things and only go to paid if I need it.

  • @davidoh6342
    @davidoh6342 11 หลายเดือนก่อน

    Why not order by score and lead by 1 and get the diff? Cartesian join must be expensive

  • @lloydwang8108
    @lloydwang8108 ปีที่แล้ว

    First solution may not work if data is unclean, eg if there actually existed a user but his id was accidentally stored as NULL

  • @rememberthename911g
    @rememberthename911g 2 ปีที่แล้ว +1

    For the first question, couldn't a specific neighborhood be null in one row but filled in another row?

    • @SeattleDataGuy
      @SeattleDataGuy  2 ปีที่แล้ว +3

      We are using the neighborhood has the table on the left so when we join what will happen is if there isn't a matching neighborhood_id in the user table then the user id will be null.
      This is a slightly less explicit solution which is why I provide the second solution. They are really doing the same thing. Finding the rows where in the user table didn't have a matching neighborhood id.

  • @karelhalim6967
    @karelhalim6967 2 ปีที่แล้ว +1

    Hello Ben, i've been following your data engineer roadmap, im still learning python, is it important to learn flask/django or any python web framework? Will it help me to understand data engineering better?

    • @SeattleDataGuy
      @SeattleDataGuy  2 ปีที่แล้ว

      You don't have too. Personally, I think its not a bad skill to have. But you could get away without it.

    • @karelhalim6967
      @karelhalim6967 2 ปีที่แล้ว

      @@SeattleDataGuy thanks for the reply Ben!

  • @datakube3053
    @datakube3053 2 ปีที่แล้ว +1

    How much understanding of Data structure and algorithms is required to crack top product based companies

    • @SeattleDataGuy
      @SeattleDataGuy  2 ปีที่แล้ว

      Facebook, not too much. Understand basic data structures and honestly...not a ton of algorithms. Lyft is pretty ds&a heavy from what I have heard, amazon had 0 coding questions.

  • @kasinath640
    @kasinath640 2 ปีที่แล้ว +1

    I am new to coding and stuff , pls provide best source for learning the SQL.

    • @SeattleDataGuy
      @SeattleDataGuy  2 ปีที่แล้ว

      Maybe my video for great data courses will help th-cam.com/video/kW8_l57w74g/w-d-xo.html

  • @coding3438
    @coding3438 2 ปีที่แล้ว +1

    Select distinct name from neighbourhoods where id not in ( select distinct meighborhood_id from users)

    • @SeattleDataGuy
      @SeattleDataGuy  2 ปีที่แล้ว

      I believe this should work as well.

  • @frumiousgaming
    @frumiousgaming 4 หลายเดือนก่อน +1

    I don’t understand why greater than failed but less than worked?

  • @carlitos4505
    @carlitos4505 2 ปีที่แล้ว +1

    I was always told by the senior engineers that using a SELECT DISTINCT is a sign of an inexperienced developer. But they never tell me why, could you possibly help me why that is?
    Edit: Spelling

    • @SeattleDataGuy
      @SeattleDataGuy  2 ปีที่แล้ว +1

      I can see why they would say that. There are many reasons DISTINCT isn't thought of highly. It's often used as a bandage. Instead of fixing bad data, we will just shove "distinct' over it and de duplicate the data. DISTINCT is also often an expensive operation. Also, DISTINCT can come off like you don't fully understand the data that is underlying. Again, I think this answer was literally my 3-second response. The distinct is actually not necessary. But when it comes to interviews you will have to be fast for screening rounds. Facebook will try to get you to answer 5 questions in 30 minutes and most people only get to 2 or 3 in my experience. So when putting code in production, sure, avoid distinct but realize it also has its place.

    • @carlitos4505
      @carlitos4505 2 ปีที่แล้ว

      @@SeattleDataGuy ah, I see. Thank you so much for your speedy response! I love your videos!

  • @Eastmaniac
    @Eastmaniac 2 ปีที่แล้ว

    It's funny to me how a lot of the SQL questions in interviews are things you'll never have to do in your day to day job.
    They just find a leetcode problem and ask it. So dumb.

  • @muhammadbadar6089
    @muhammadbadar6089 ปีที่แล้ว +1

    every time you mention naruto in a video. I pause it go like the video and then resume.

  • @ronin2963
    @ronin2963 2 ปีที่แล้ว +1

    Why would you use SQL for this? Python or JS are much better suited for this type of operation.

    • @SeattleDataGuy
      @SeattleDataGuy  2 ปีที่แล้ว

      Oh! That’s interesting. I would love to hear more on how python and js could do a better job.

  • @galgottlieb2908
    @galgottlieb2908 2 ปีที่แล้ว

    Thank you for your video.
    Can you please explain the meaning of "Unique song" in question 2?
    As far as I understand unique song it is a song that was not not singed by other singers before and your query doesn't approach this situation.
    For example, if Eminem singed a song of Dr Dre that was created in '01/01/2010' in '01/01/2022' , then the created date will be '01/01/2010' but it is not unique at this time.
    This will be my answer to questions two:
    WITH UNIQUE_SONG AS(
    SELECT US.ID, SP.ID , ROW_NUMBER(S[.DATE) OVER (PARTITION BY US.ID ORDER BY SP.DATE) AS 'RN'
    FROM USERS AS US
    INNER JOIN SONG AS SP
    ON SP.ID = US.ID
    WHERE US.SONG_ID IN(
    SELECT Song.id
    FROM SONG as SP
    group by SP.id
    HAVIGN DATE = min(SP.date)))
    SELECT UNIQUE_SONG.*
    FROM UNIQUE_SONG
    WHERE RN=3