Three Tricky Analytics Interview Questions with Andrew

แชร์
ฝัง
  • เผยแพร่เมื่อ 5 ต.ค. 2024

ความคิดเห็น • 97

  • @Vintagetube310
    @Vintagetube310 3 ปีที่แล้ว +55

    Man this was so dope. In a data analytics student and I’m so happy that a lot of this resonated with me as I’m still learning. Everyone who has or is studying coding and programming, understands that it’s a steep learning curve and a lot of times you might find yourself in a rut thinking you’re never going to get the material. But when you observe mock interviews and the info they are covering definitely resonates with you, it reinforces your progress knowing you’re heading in the right direction. Everyone who is learning how to program, KEE GOING! thanks fellas’

  • @Auslims
    @Auslims 3 ปีที่แล้ว +25

    I really like andrew’s thought process. It gives me so much insights to prepare for the coming interviews!

  • @KisaanTuber
    @KisaanTuber 2 ปีที่แล้ว +14

    Thanks Jay for this awesome session. I have few sugestions regarding Part 1. We should emphasize on using RANK()/DENSE_RANK() function instead of ORDER BY and LIMIT. RANK()/DENSE_RANK() will take care of scenarios when multiple employees have same salary. Let me know your thoughts.

  • @shashankrai10
    @shashankrai10 3 ปีที่แล้ว +16

    For the second question can we do this:
    SELECT sum(Sum_Of_Salary) as Total_sum
    FROM (
    select sum(e.salary) as Sum_Of_Salary, Count(end_dt) as num_projects
    FROM employees as e
    RIGHT JOIN projects as p
    ON e.id = p.employee_id
    GROUP BY(A.id)
    )
    WHERE num_projects = 0
    The right join takes only the values from projects that have an id in employees. So that takes care of the third definition - employees who have no project started are not counted. Among those employees who have no end_dt, their count of end_dt should appear as 0. And then outside of brackets, we simply sum the salaries where number of projects are 0.

    • @Hotobu
      @Hotobu 2 ปีที่แล้ว

      ...Please don't use right joins, there's never a good reason to use a right join.

    • @5lanediver
      @5lanediver ปีที่แล้ว

      good call -- they never addressed that third case

    • @5lanediver
      @5lanediver ปีที่แล้ว

      @@Hotobu then just rewrite it as a left join -

  • @黃冠豪-y2w
    @黃冠豪-y2w 3 ปีที่แล้ว +12

    Regarding the 2nd question:
    If an employee has 3 projects on hand and none of them are finished, wouldn't her salary be summed up 3 times according to the query in this video?

    • @rakeshpillai2173
      @rakeshpillai2173 2 ปีที่แล้ว

      Yes. Also not sure how this works in MySQL, but in both postgres and sql server, grouping by id will give you separate rows of salaries and won't add them up.

  • @nickayala7263
    @nickayala7263 2 ปีที่แล้ว +3

    First Question:
    I think it would be better to COUNT(DISTINCT Project_id) and have a WHERE End_dt IS NOT NULL.
    We don't know if this is a unique column or not. Could a project have been closed and then re-opened? Would that have the same ID or a new ID?
    Counting just end dates could lead us to incorrect counts.

  • @arjunekbote8979
    @arjunekbote8979 3 ปีที่แล้ว +6

    Steps:
    1. Gets project from boss to find employees who haven't finished project
    2. Remembers that he hasn't completed any project till now
    3. Does not complete project

    • @khiemgluong
      @khiemgluong 3 ปีที่แล้ว

      Were not creating a table of employees we need to "furlough" or anything

  • @shrutisinghal8556
    @shrutisinghal8556 3 ปีที่แล้ว +4

    In second question, Andrew assumed that Employees who have not finished any project should be considered while Jay assumed that any employee who has even one unfinished project must be considered. None of them made clear their assumption and therefore both had such different queries. I like how Andrew thought! Good interview session.

  • @shao4686
    @shao4686 3 ปีที่แล้ว +6

    I really liked the thought process sharing. Thanks.

  • @omerbraude5694
    @omerbraude5694 2 ปีที่แล้ว +2

    In the first problem, how can he order by e.salary without having it in the group by clause? Wouldn't it throw an error? I would imagine it must be 'order by avg(salary)' or 'max(salary)' which would yield the same result. Alternatively, you could add the salary column to the group by clause. Am I right?

    • @0yustas0
      @0yustas0 2 ปีที่แล้ว

      you're right

  • @Mario-ox5dm
    @Mario-ox5dm 3 ปีที่แล้ว +6

    This mock interview definitely was helpful! Thank you Jay & Andrew!

    • @venkatnetha8382
      @venkatnetha8382 3 ปีที่แล้ว

      For a 1200 long pages of question bank on real world scenarios to make you think like a data scientist. please visit:
      payhip.com/b/ndY6
      You can download the sample pages so as to see the quality of the content.

  • @D3Jia
    @D3Jia 2 ปีที่แล้ว +2

    andrew is such articulate and intelligent dude.

  • @ShowTiimee123
    @ShowTiimee123 2 ปีที่แล้ว +1

    Happy to see that what I'm learning in my classes right now is actually applicable to the real world!

  • @AbhishekSharma.0311
    @AbhishekSharma.0311 2 ปีที่แล้ว +2

    Hey Jay.
    I think the query by Andrew has got an issue. He has used Group By e.id and then taken sum(salary). This would output salary for each employee and NOT a SUM of All employees salary.
    Correct me, if otherwise
    Thanx for the good work
    Cheers

  • @dbc3745
    @dbc3745 ปีที่แล้ว

    Thank you for this. Please give us more glimpses of real world data analytics/scientist work and how they solve problems through data

  • @cinyenne
    @cinyenne 3 ปีที่แล้ว +1

    This sort of discussion helps us to understand your thought process better

    • @AndrewMoMoney
      @AndrewMoMoney 3 ปีที่แล้ว

      Cheers, Cy :)

    • @venkatnetha8382
      @venkatnetha8382 3 ปีที่แล้ว

      For a 1200 long pages of question bank on real world scenarios to make you think like a data scientist. please visit:
      payhip.com/b/ndY6
      You can download the sample pages so as to see the quality of the content.

  • @arindampatra6283
    @arindampatra6283 3 ปีที่แล้ว +5

    Not as tricky as I expected , but I enjoyed how the guy thinks

  • @GH-hg6vs
    @GH-hg6vs 2 ปีที่แล้ว

    I was having this question - For the second part, why not just selecting the employees that don't have a valid "end_date". In this case, no matter how many project he/she has been assigned to, he/she didn't finish any of them. Then I saw Jay's solution. Perfect.

  • @antonisdaniil2857
    @antonisdaniil2857 3 ปีที่แล้ว +1

    I believe this is better so we if we have employee with same salary we get all of them
    with cte as
    (select e.id, Dense_RANK() over(order by salary) as r
    from projects t
    join employees e
    on e.id=t.employee_id
    group by t.employee_id
    having count(end_dt)>10)
    select * from cte where r=5

    • @averw3047
      @averw3047 2 ปีที่แล้ว

      My solution is similar to yours, except "Where r

  • @dontstopn0w
    @dontstopn0w 3 ปีที่แล้ว +4

    Really appreciate this video. Extremely helpful!

  • @quishzhu
    @quishzhu ปีที่แล้ว

    i am sooo lucky to pump into this video before the data science tech interview tomorrow!

  • @leos5246
    @leos5246 3 ปีที่แล้ว +2

    During the second task with this complex construction, I literally thought "Why it's so complex? Why you cannot use just where clause" and it's surprising that Jay is added this solution after :).
    The only one thing about performance. If the tables are huge you don't want to use "NOT IN" construction, it's preferable to use NOT EXISTS instead, much more effective but mostly the same code.

  • @briceinsingapore
    @briceinsingapore 3 ปีที่แล้ว +2

    For question 2, does the following query work as well?
    select sum(salary)
    from(
    select distinct id, salary
    from employees e
    inner join projects p on e.id=p.employee_id
    where end_dt is null or end_dt>current_date()) as cte

    • @Han-ve8uh
      @Han-ve8uh 3 ปีที่แล้ว +2

      Strange to name this cte at the end when no WITH is used. You mean subquery? This query will include employees who have also finished some projects. The question wanted people who have started 1 or many projects but never finished a single one

  • @danielxing1034
    @danielxing1034 2 ปีที่แล้ว +1

    Fantastic mock! Very thorough thinking paths! But wonder if we really need to go through this entire path during the interview? Because the number of problems solved is also an important metric.

  • @TalhaFurqan-l4n
    @TalhaFurqan-l4n หลายเดือนก่อน +1

    andrew pretty much messed up the entire second query. There was no need to over-complicate it.
    select sum(salary)
    from employees
    where id IN (
    select distinct employeed_id
    from projects
    where end_dt is null
    )

  • @iqjayfeng
    @iqjayfeng  3 ปีที่แล้ว +8

    Thanks for watching! If you're interested in learning more about SQL, check out our SQL course. The course is designed to help you learn everything you need to know about working with data, from basic concepts to more advanced techniques.
    www.interviewquery.com/learning-paths/sql

    • @mypure
      @mypure 3 ปีที่แล้ว

      Thanks a ton. I will do honest attempt to solve the problems and learn.

  • @GalaxyZhou1006
    @GalaxyZhou1006 3 ปีที่แล้ว +4

    for part 1 the solution does not seem to include case for example that the sixth lowest paid employee has the same salary with the fifth one, do you include the sixth person or not. in that case you could apply rank or dense_rank?

    • @sunny73555
      @sunny73555 3 ปีที่แล้ว

      You should use dense rank

  • @rick2591
    @rick2591 2 ปีที่แล้ว +1

    The join would break the query causing salaries double triple, etc.

  • @brothermalcolm
    @brothermalcolm 3 ปีที่แล้ว +1

    You don’t need a cte if you place the case statement inside the having statement sum(case when...)

    • @YT-yt-yt-3
      @YT-yt-yt-3 3 ปีที่แล้ว

      infact do we need case statement? why not this - having count(e.id) = count(p.end_dt is null)

  • @abhijitbhaduriya117
    @abhijitbhaduriya117 3 ปีที่แล้ว

    These are super helpful. Thanks for this, Jay.

  • @MattIn3rtia
    @MattIn3rtia 5 หลายเดือนก่อน

    Not sure about that first one. The end date might be in the future so the projects unfinished. The Qs a bit ambigous. I'd have gone with the following:
    SELECT
    id
    FROM employees
    WHERE id IN
    (SELECT DISTINCT employee_id
    FROM projects
    WHERE COUNT(DISTINCT project_id_) > =10)
    ORDER BY salary ASC
    LIMIT 5

  • @pratik903
    @pratik903 4 หลายเดือนก่อน

    Just out if curiosity, a weird question related to 3rd scenerio. As it is mentioned in the video that person who completed 10 projects in a week shouldn't be given more raise than a person who completes 5 project in a month.
    Ideally person completing 10 projects in leas time is more efficient and should be given more raise?
    Please suggest where I am wrong

  • @tilibanner5920
    @tilibanner5920 2 ปีที่แล้ว

    Question 2: sum(e.salary) and then group by employee_id? That's not producing what the question is asking for.

  • @LeoLeung.93
    @LeoLeung.93 หลายเดือนก่อน

    They both did wrong on the second question, they agreed on the definition that employees without projects on hand are not considered but Jay's answer included those employees.

  • @brittosabu8401
    @brittosabu8401 2 ปีที่แล้ว +1

    Can we use this for second question ?
    SELECT SUM(salary)
    FROM Employees
    WHERE employee_id IN
    (SELECT employee_id
    FROM Projects
    GROUP BY 1
    HAVING MAX(END_DATE) IS NULL)

    • @mattcarter4749
      @mattcarter4749 2 ปีที่แล้ว

      I believe the only error is using "employee_id" in your where statement. Since you have "From Employees", you would need "Where id IN".

    • @mattcarter4749
      @mattcarter4749 2 ปีที่แล้ว +2

      This is the query I wrote to solve, very similar to yours except I used "having count(end_dt) = 0" instead of max()
      select sum(salary)
      from Employees
      where id in (select employee_id
      from Projects
      group by employee_id
      having count(end_dt) = 0
      )

  • @santiagocanon5281
    @santiagocanon5281 3 ปีที่แล้ว +1

    I have an interview with a large tech company on Monday for a DS position. While I am confident in my SQL skills, it is good to see and understand the soft skills that should be leveraged during the interview. I did want to ask is my position also requires R which I am proficient in as well, but do you have any insight into R interview questions?

    • @venkatnetha8382
      @venkatnetha8382 3 ปีที่แล้ว

      For a 1200 long pages of question bank on real world scenarios to make you think like a data scientist. please visit:
      payhip.com/b/ndY6
      You can download the sample pages so as to see the quality of the content.

  • @hardikojha220
    @hardikojha220 3 ปีที่แล้ว +1

    Very interesting thought process

  • @shifaali1923
    @shifaali1923 3 ปีที่แล้ว

    how about?
    select r.id from (select e.id as id,,e.salary,p.project_id,rank() over(Partition by e.id,p.projectid order by e.salary asc) as rank from employ e inner join project p on e.id=p.project_id where p.end_dt is not null)r
    where r.rank

  • @OkechukwuOfili
    @OkechukwuOfili 3 ปีที่แล้ว +1

    What if different projects finished on the same date:
    [ count (p.end_dt) >= 10 ]
    won't your count then be off?
    Would it not be better to do a
    [ count (distinct p.projects_id) >= 10 ] ?

    • @hariniprabakaran5434
      @hariniprabakaran5434 3 ปีที่แล้ว

      Yes, also We need to handle the end date case

    • @pavelbakunovitch
      @pavelbakunovitch 3 ปีที่แล้ว +1

      No, it won't. count(p.end_dt) is counting every not null p.end_dt even they are the same.

  • @mimansamaheshwari4664
    @mimansamaheshwari4664 2 ปีที่แล้ว

    In the first question if employee table why did we drop cases with null end_date?

  • @5lanediver
    @5lanediver ปีที่แล้ว

    for problem 2 you never tacked case 3, when there is no projectid associated with an employeeid...

  • @iremozen7172
    @iremozen7172 2 ปีที่แล้ว

    great video!! thank you.

  • @Rohit_Ankita
    @Rohit_Ankita 2 ปีที่แล้ว

    For part 1: can we do count(distinct project_id) >= 10 in having clause and where clause should have end date is not null. Thoughts.

  • @yyyj926
    @yyyj926 3 ปีที่แล้ว

    Thank you, Andrew and Jay.

    • @venkatnetha8382
      @venkatnetha8382 3 ปีที่แล้ว

      For a 1200 long pages of question bank on real world scenarios to make you think like a data scientist. please visit:
      payhip.com/b/ndY6
      You can download the sample pages so as to see the quality of the content.

  • @nppreaches
    @nppreaches 3 ปีที่แล้ว

    which query of the two will perform better assuming you have a big number in terms of employees and projects?

  • @akshitsinghal8590
    @akshitsinghal8590 2 ปีที่แล้ว

    Really productive session

  • @khiemgluong
    @khiemgluong 3 ปีที่แล้ว

    learning SQL and this is very helpful

  • @gabiscarabel
    @gabiscarabel ปีที่แล้ว

    Excellent video!

  • @ananyadutta7852
    @ananyadutta7852 3 ปีที่แล้ว

    For the first ques what if two projects have the same end date we would end up counting it as 1 project which could be actually multiple projects. I propose to filer as end date not null and count project id. Am i missing something here?

    • @Amber-ws5hy
      @Amber-ws5hy 2 ปีที่แล้ว

      Andrew used count() instead of count(distinct), so it shouldn't matter I think

  • @max007sh
    @max007sh 3 ปีที่แล้ว +1

    Was wondering why you used aggregating (and having clause) for the first question. Wouldn't be simpler to use a where clause?

    • @nanfengbb
      @nanfengbb 3 ปีที่แล้ว

      Nope. "WHERE COUNT(p.*)>=10 " would show an syntax error. It has to be GROUP BY followed by HAVING.

    • @max007sh
      @max007sh 3 ปีที่แล้ว

      @@nanfengbb @B Karcher select e.id from employee e inner join (select employee_id eid, count(end_dt) c from project group by employee_id) temp on e.id=temp.eid
      where temp.eid>10
      order by ...

    • @max007sh
      @max007sh 3 ปีที่แล้ว

      but now that I think about it, having seems simpler 😅

  • @joehatch3289
    @joehatch3289 3 ปีที่แล้ว

    Can I ask, for part 1 what ensures that there is no duplicates? If the lowest salary emplyee was on multiple projects?

    • @Han-ve8uh
      @Han-ve8uh 3 ปีที่แล้ว

      group by e.id removes possibility of any duplicates in employee/salary that can cause some of lowest 5 salaries to come from same employee

  • @tz1938
    @tz1938 3 ปีที่แล้ว

    This is awesomeeee!

  • @AnujJaiswal91
    @AnujJaiswal91 3 ปีที่แล้ว

    Great process

  • @itsadamjosh
    @itsadamjosh 2 ปีที่แล้ว

    Whats the code for amc and gme to go to the moon

  • @brothermalcolm
    @brothermalcolm 3 ปีที่แล้ว

    Employees who’ve done at least 10 project, then select bottom 5 by salary

  • @SamairaJain11
    @SamairaJain11 2 ปีที่แล้ว

    In the first question he is literally passing the time🤣

  • @flyingcucumber
    @flyingcucumber 2 ปีที่แล้ว

    Got a interview tomorrow. I'm f*cked

  • @Peekabostreet
    @Peekabostreet 3 ปีที่แล้ว

    They miscommunicated a couple times but very helpful

  • @comatosetorpor3602
    @comatosetorpor3602 3 ปีที่แล้ว +1

    i dont think they r tricky at all. for any one who knows and uses sql in industry, they r basic.

  • @annxiao7721
    @annxiao7721 ปีที่แล้ว +1

    I can’t talk and think at the same time. I hate it that straightforward SQL case can appear so much more difficult just because u have to explain to the interviewers while u r working on it.. it’s just anti humanity…. 😢

  • @ravitanwar9537
    @ravitanwar9537 3 ปีที่แล้ว +1

    oo princes of data science :)

    • @AndrewMoMoney
      @AndrewMoMoney 3 ปีที่แล้ว +1

      Wahahaha our royalty status is highly in question

    • @ravitanwar9537
      @ravitanwar9537 3 ปีที่แล้ว

      @@AndrewMoMoney 😂
      I can vouch for that

  • @pinkyellow1able
    @pinkyellow1able 3 ปีที่แล้ว

    can you please do benford's law with the 2020 election numbers? would really get your channel some views and open up the truth to alot of people

  • @masheTyMasha
    @masheTyMasha ปีที่แล้ว

    Cool

  • @jonathancauchi6457
    @jonathancauchi6457 3 ปีที่แล้ว

    Cool video, but not tricky at all.

  • @drkenny7928
    @drkenny7928 2 ปีที่แล้ว

    Dude made this overly complicated

  • @tsume184tg
    @tsume184tg 2 ปีที่แล้ว

    SELECT e.id, e.salary, p.employee_id, p.project_id, COUNT(project_id) AS numberofprojects
    FROM projects AS p
    INNER JOIN employees AS e
    ON e.id = p.employee_id
    WHERE numberofprojects >= 10
    GROUP BY e.id, e.salary, p.employee_id, p.project_id
    ORDER BY e.salary DESC LIMIT 5;
    I think that should work for the first scenario. I am still learning, on day 2. Hopefully i got that correct. If i can get some feedback that would be great.

  • @1danielsz24
    @1danielsz24 2 ปีที่แล้ว

    I am completely started in sql but I imagined a different way and I would like to know if I would return the output correctly
    SELECT
    employee_id,
    MIN(salary),
    COUNT(project_id) AS project_number
    FROM
    employee e
    INNER JOIN
    projects p
    ON
    e.id = p.employee_id
    WHERE project_number >= 10
    GROUP BY employee_id
    ORDER BY salary
    LIMIT = 5