From Scratch: How to Code Logistic Regression in Python for Machine Learning Interviews!

  • Published 26 Aug 2024

Comments • 38

  • @willjohnson9
    @willjohnson9 3 years ago +32

    I got the job! Thanks so much for these great videos, Emma! Having these as study materials helped me practice in ways that are really hard to duplicate, especially when everyone is remote these days. I think I went through every video about three times, and several of the ideas you raised in your videos were brought up by the interviewers in the on-site. The hypothetical questions were the best parts because I could pause, practice answering the question myself, then follow your input as a way to get feedback. I’m now recommending this channel to anyone in my DataSci network looking to brush up on interview practice. This channel and the StatQuest channel made the difference, you rock!

    • @emma_ding
      @emma_ding  3 years ago +3

      Congrats, Will! And thanks for the feedback! It is people like you who keep me motivated. Best of luck with your new job!

    • @hariniupparpalli807
      @hariniupparpalli807 2 years ago

      D

  • @serhiidyshko
    @serhiidyshko 3 years ago +10

    Even though logistic regression is used for classification, it is still a regression! It is only the choice of a decision boundary/threshold (which may be different from 0.5) that turns it into a classification algorithm.
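
    A minimal sketch of that two-step view (the function names here are illustrative, not taken from the video):

        import numpy as np

        def predict_proba(X, beta):
            # Regression step: map the linear score to a probability with the sigmoid
            return 1.0 / (1.0 + np.exp(-X @ beta))

        def predict_label(X, beta, threshold=0.5):
            # Classification step: apply a decision threshold (not necessarily 0.5)
            return (predict_proba(X, beta) >= threshold).astype(int)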

  • @tiantiantianlan
    @tiantiantianlan 3 years ago +6

    In practice, when computing on a GPU, GD is actually faster than batch GD, because GD can be parallel-computed, while the part of batch GD that can be parallelized is quite limited. Many papers have verified this. Mentioning only that GD is slow is a bit one-sided. The biggest benefits of batch GD are that it is less likely to end up where the gradient is 0 (optimization) and it is less likely to overfit (generalization) (related papers have run experiments on this).

    • @liuauto
      @liuauto 3 years ago

      Could you post the papers? Maybe the OP was referring to convergence speed?

  • @Han-ve8uh
    @Han-ve8uh 3 years ago +7

    Similar to the linear regression implementation video, I still have problems with the argument that "the sign used to update the gradient depends on how you set up the loss function". I don't think it does; I'm thinking it should always be param -= gradient * LR.
    At 9:23, there's a discussion of derror_dy = pred - y[i] vs y[i] - pred. How do these two relate to the log loss at 5:10? Where in the model formulation did we have the freedom to set it up as pred - y[i] or y[i] - pred (and thus leading to the point at 9:12 that you want the audience to pay attention to)?

    • @Bookerer
      @Bookerer 2 years ago +1

      I agree with you. I found this Stanford resource web.stanford.edu/~jurafsky/slp3/5.pdf where section 5.6, equation 5.25 shows the gradient descent equation: it always moves in the direction of the negative gradient. Section 5.6.1, equations 5.29 and 5.30, should address your concerns about pred - y vs y - pred. In the end, it doesn't matter; you just need to be careful with the negative signs (don't forget, the negative LLH itself also has a negative sign), and the final answer will be the same.
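
    A tiny numeric check of that point (illustrative values only, not the video's code):

        import numpy as np

        x_i, y_i, lr = np.array([1.0, 2.0]), 1.0, 0.1
        beta = np.zeros(2)
        pred = 1.0 / (1.0 + np.exp(-(beta @ x_i)))   # sigmoid prediction

        # Gradient of the *negative* log-likelihood -> step against it (descent)
        beta_down = beta - lr * (pred - y_i) * x_i
        # Gradient of the log-likelihood itself -> step along it (ascent)
        beta_up = beta + lr * (y_i - pred) * x_i

        assert np.allclose(beta_down, beta_up)       # both conventions give the same update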

  • @jairocarreon6806
    @jairocarreon6806 2 years ago

    Thank you so much for making these videos 🥺 This weekend I'll watch them all and take notes. These are so helpful 🥰

  • @saeidsamizade6257
    @saeidsamizade6257 3 years ago +1

    Really clear and easy to understand description. 👍

  • @lydiamai6861
    @lydiamai6861 3 years ago +1

    Thanks Emma, I am learning regression, p and residual, etc.

    • @emma_ding
      @emma_ding  3 years ago

      Good to hear, Lydia! Let me know if you have any questions!

  • @diegozpulido
    @diegozpulido 3 years ago +1

    Thanks. Great video!

  • @pavankulkarni7628
    @pavankulkarni7628 3 years ago

    Very good explanation. Thank you

  • @user-me2mm2xu7j
    @user-me2mm2xu7j 3 years ago

    Could you explain a little more about the learning rate and how to optimize this parameter?
    Thx!

  • @TheACG22
    @TheACG22 1 year ago

    When computing gradient descent on mini-batches, is it okay to draw random data points with replacement? With the randint method, you could get the same data point multiple times.
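
    One common alternative to drawing indices with randint (a sketch, assuming x and y are NumPy arrays; not the video's code) is to shuffle once per epoch and slice contiguous batches, so each point is used exactly once per epoch and never repeated within a batch:

        import numpy as np

        def minibatches(x, y, batch_size, rng=np.random.default_rng(0)):
            # Shuffle the indices, then yield non-overlapping batches.
            idx = rng.permutation(len(x))
            for start in range(0, len(x), batch_size):
                batch = idx[start:start + batch_size]
                yield x[batch], y[batch]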

  • @jiayiwu4101
    @jiayiwu4101 3 years ago

    I saw some textbooks and documents mention using iteratively reweighted least squares or Newton's method to get estimates of parameters. What are the differences between these two and Gradient Descent? Thank you!
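
    For context, Newton's method (which for logistic regression coincides with IRLS) uses the Hessian as well as the gradient, so each step is more expensive than a gradient descent step, but far fewer steps are usually needed. A minimal, illustrative sketch of one update (not from the video):

        import numpy as np

        def newton_step(X, y, beta):
            # X: (m, n) data, y: (m,) labels in {0, 1}, beta: (n,) coefficients
            p = 1.0 / (1.0 + np.exp(-X @ beta))      # predicted probabilities
            grad = X.T @ (p - y)                     # gradient of the negative log-likelihood
            w = p * (1.0 - p)                        # diagonal weights of the Hessian
            H = X.T @ (X * w[:, None])               # Hessian: X^T diag(w) X
            return beta - np.linalg.solve(H, grad)   # Newton step (plain GD would use beta - lr * grad)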

  • @hl1449
    @hl1449 3 years ago +1

    How do you remember the gradient formula at 6:10 (to use it in an interview)?
    Every data scientist can derive it in 10 minutes if they are **not** under pressure, but in an interview setting you'd better remember it. Any tips?
    ---
    BTW for those who like to vectorize - this formula can be succinctly written as:
    dJ/db = sum_over_i ( p(x_i) - y_i ) x_i

    • @junqichen6241
      @junqichen6241 3 years ago

      It's actually pretty easy to remember. p(x_i) - y_i is basically a residual (the difference between prediction and actual).

    • @hl1449
      @hl1449 3 years ago

      @@junqichen6241
      I think every data scientist knows p(x_i) - y_i is the residual.
      I guess my question is how to find an intuitive explanation, using *** zero derivation ***, for why the derivative of the binary log loss is exactly sum_over_i ( p(x_i) - y_i ) x_i.
      Maybe zero derivation is a bit extreme, but it may be possible to see that this formula is obvious in some sense (a short derivation sketch follows this thread).

    • @maryamaghili1148
      @maryamaghili1148 2 years ago

      This is a valid point. I have difficulty calculating the derivative of the log loss on the fly without having memorized it a few days earlier.

    • @hl1449
      @hl1449 2 years ago

      @@maryamaghili1148 I know right.
      This is truly the bottleneck of implementing logistic regression in an **interview setting**.
      And she simply glossed over it (regrettably).
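
    For what it's worth, the formula can be reconstructed with a single chain-rule step. For one example with p = sigmoid(beta^T x) and loss l = -[ y*log(p) + (1-y)*log(1-p) ]:

        dl/dp = (p - y) / (p * (1 - p))    and    dp/dbeta = p * (1 - p) * x,

    so dl/dbeta = (p - y) * x: the p * (1 - p) factors cancel, leaving just "residual times input". Summed over examples this is the dJ/db = sum_over_i ( p(x_i) - y_i ) x_i given above, which in NumPy could be written as (a sketch, assuming X has shape (m, n), y shape (m,), beta shape (n,)):

        import numpy as np

        def log_loss_gradient(X, y, beta):
            p = 1.0 / (1.0 + np.exp(-X @ beta))   # predicted probabilities
            return X.T @ (p - y)                  # sum_i (p(x_i) - y_i) x_i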

  • @nan4061
    @nan4061 3 years ago

    Hi, Emma, thank you for the great video! Does this kind of knowledge depth apply to the data analyst interview? Thanks!

    • @emma_ding
      @emma_ding  3 years ago +1

      Typically not. This is more focused on Data Science (Machine Learning) or Machine Learning Engineer roles.

  • @nataliatenoriomaia1635
    @nataliatenoriomaia1635 3 years ago

    Hi Emma, thanks for the great video! I wonder if during the interview they would allow us to use python modules like "numpy" to help implement the algorithm. Do you know if that is usually allowed?

    • @emma_ding
      @emma_ding  2 years ago +3

      Hey! In truth, it depends on the company. I would suggest checking with recruiters to find that out. For company-specific interviews, recruiters are your best friend for getting information like this.

  • @katekatebangbang2435
    @katekatebangbang2435 3 years ago +1

    Front row! Did the like-comment-subscribe triple and now starting to watch right away -v-

  • @thampasaurusrex3716
    @thampasaurusrex3716 3 years ago

    Could you explain what happens if we set the derivative of the loss function to 0? Why can't we do that?

    • @jiayingyou8277
      @jiayingyou8277 3 years ago +1

      Because it's hard to compute: unlike linear regression, there is no closed-form solution (see the note after this thread). See more here: stats.stackexchange.com/questions/23128/solving-for-regression-parameters-in-closed-form-vs-gradient-descent

    • @thampasaurusrex3716
      @thampasaurusrex3716 3 years ago

      @@jiayingyou8277 thank you
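
    For reference, setting the derivative of the log loss to 0 gives the first-order conditions sum_over_i ( sigmoid(beta^T x_i) - y_i ) x_i = 0. Because beta sits inside the nonlinear sigmoid, these equations have no closed-form solution (unlike the normal equations of linear regression), so an iterative method such as gradient descent or Newton's method is needed.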

  • @MrReapzzGaming
    @MrReapzzGaming 3 years ago

    When x = all our independent features, shouldn't n = len(x) and m = len(x[0])?
    You say that n is the number of dimensions/features and m is the number of data points, but in your code n and m are the other way around.

    • @emma_ding
      @emma_ding  3 years ago

      No, the number of data points is the first dimension, i.e. len(x), and the number of features is the 2nd dimension, i.e. len(x[0]).
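
    A small shape example illustrating this (values are illustrative):

        x = [[5.1, 3.5], [4.9, 3.0], [6.2, 2.9]]   # 3 data points, each with 2 features
        num_points = len(x)        # 3 -> first dimension: number of data points
        num_features = len(x[0])   # 2 -> second dimension: number of features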

  • @jennywu799
    @jennywu799 3 years ago

    My understanding is that beta is each feature's weight in logistic regression, and the larger the beta, the more important that feature? I'd like to ask whether this way of thinking is correct?

    • @jiayingyou8277
      @jiayingyou8277 3 years ago

      Yes, to be precise it's the absolute value of beta.

    • @ys2660
      @ys2660 2 years ago

      No, that's not right.

    • @xiangyangmeng4354
      @xiangyangmeng4354 2 years ago

      Some features are correlated, so the betas are not necessarily accurate.

  • @Bookerer
    @Bookerer 2 years ago

    In the mini-batch gradient descent, how do you ensure you do not sample data points that you have already processed? It doesn't seem like the code handles that.