Why We Don't Use the Mean Squared Error (MSE) Loss in Classification

  • Published on 26 Sep 2024

Comments • 18

  • @datamlistic
    @datamlistic  1 month ago

    **Video correction:**
    - BCE should be plus infinity at the end (thx @guilhermethomaz8328)

  • @ShahFahad-ez1cm
    @ShahFahad-ez1cm 2 months ago +2

    I would like to suggest a correction regarding linear regression: the data itself is not assumed to come from a normal distribution; rather, the errors are assumed to come from a normal distribution.

    • @datamlistic
      @datamlistic  2 months ago +1

      Agreed, sorry for the novice mistake. I've corrected myself in my latest video. :)

  • @guilhermethomaz8328
    @guilhermethomaz8328 1 month ago

    Excellent video.
    BCE should be plus infinity at the end.

    • @datamlistic
      @datamlistic  1 month ago

      Thanks for the correction!

  • @陳峻逸-s6r
    @陳峻逸-s6r 1 year ago +1

    It's very helpful!! Many thanks.

    • @datamlistic
      @datamlistic  1 year ago

      Thanks! Glad it was helpful! :)

  • @shahulrahman2516
    @shahulrahman2516 7 months ago

    Great lecture.

    • @datamlistic
      @datamlistic  7 months ago

      Thanks! Glad you liked it! :)

  • @ramirolopezvazquez4636
    @ramirolopezvazquez4636 8 months ago

    Thanks for the wonderful video.
    Could anybody be so kind as to comment (or share some reference) on why the MSE loss assumes a Gaussian distribution for the underlying data?

    • @datamlistic
      @datamlistic  8 months ago +1

      You're welcome! Here's a link that explains in much more detail why the MSE loss assumes a Gaussian prior: towardsdatascience.com/why-using-mean-squared-error-mse-cost-function-for-binary-classification-is-a-bad-idea-933089e90df7. Hope it helps! :)

    • @ramirolopezvazquez4636
      @ramirolopezvazquez4636 8 months ago

      Thanks a lot for your kind answer and awesome work! @@datamlistic

  • @atendragautam4925
    @atendragautam4925 1 year ago +1

    Summary:
    Q) Why can't we use MSE loss in logistic regression instead of binary cross-entropy loss?
    Ans:
    1. When maximizing the likelihood, if you assume the output comes from a Gaussian distribution, it can be proven that this is equivalent to minimizing the MSE loss; but if we take the output distribution to be Bernoulli, the BCE loss follows instead. So there is a mismatch in the assumed output distributions.
    2. If you use MSE as the loss in logistic regression, the loss becomes a non-convex function (this can be shown by taking the second derivative), whereas with BCE it's convex.
    3. MSE doesn't penalize misclassifications enough; BCE does.
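    Point 3 of the summary above can be checked numerically. Below is a minimal sketch (not from the video itself) comparing the two losses on a single confidently wrong prediction; the function names `mse` and `bce` are illustrative, not part of any library:

    ```python
    import math

    def mse(p, y):
        # squared error between predicted probability p and label y
        return (p - y) ** 2

    def bce(p, y):
        # binary cross-entropy for a single prediction
        return -(y * math.log(p) + (1 - y) * math.log(1 - p))

    # Confidently wrong prediction: true label is 1, predicted probability 0.01.
    # MSE is bounded above by 1 on [0, 1], so the penalty saturates.
    # BCE grows without bound as p -> 0, which is the "+infinity" limit
    # mentioned in the video correction.
    print(mse(0.01, 1))  # 0.9801
    print(bce(0.01, 1))  # ~4.605
    ```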

    • @datamlistic
      @datamlistic  1 year ago

      Thanks for the summary! :)

    • @anamitrasingha6362
      @anamitrasingha6362 1 year ago

      So I tried out the math, and am I correct to say that on the interval 0 to 1 the loss function is neither convex nor concave? Hence it becomes hard to optimize this loss function via methods that assume the function is either convex or concave.

    • @datamlistic
      @datamlistic  1 year ago

      @@anamitrasingha6362 That's really nice. Would you mind sharing your calculation? :)
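    The non-convexity claim discussed in this thread can be verified with a quick numerical sketch (an illustration, not the commenter's actual calculation): for target y = 1, the curvature of MSE(sigmoid(z)) is positive near a correct prediction but negative for a confidently wrong one, so the loss is neither convex nor concave in z. The helper names below are assumptions for this sketch:

    ```python
    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    def mse_loss(z, y=1.0):
        # squared error between sigmoid output and target, as a function of the logit z
        return (sigmoid(z) - y) ** 2

    def second_derivative(f, z, h=1e-4):
        # central finite-difference approximation of f''(z)
        return (f(z + h) - 2.0 * f(z) + f(z - h)) / (h * h)

    # Curvature flips sign: positive where the prediction is roughly right,
    # negative where it is confidently wrong, so gradient-based convex
    # optimization guarantees do not apply.
    curv_near = second_derivative(mse_loss, 1.0)   # near-correct prediction
    curv_far = second_derivative(mse_loss, -3.0)   # confidently wrong
    print(curv_near > 0, curv_far < 0)  # True True
    ```

    Analytically, the second derivative works out to -2·s·(1-s)²·(1-3s) with s = sigmoid(z), which changes sign at s = 1/3.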

  • @Lilk-m7d
    @Lilk-m7d 1 year ago

    Wonderful video 😢