**Video correction:**
- BCE should be plus infinity at the end (thx @guilhermethomaz8328)
I would like to suggest a correction: in linear regression, the data itself is not assumed to come from a normal distribution; rather, the errors are assumed to come from a normal distribution.
Agreed, sorry for the novice mistake. I've corrected myself in my latest video. :)
Excellent video.
BCE should be plus infinity at the end.
Thanks for the correction!
It's very helpful!! Many thanks.
Thanks! Glad it was helpful! :)
great lecture
Thanks! Glad you liked it! :)
Thanks for the wonderful video.
Could anybody be so kind as to comment (or share some reference) on why the MSE loss assumes a Gaussian distribution for the underlying data?
You're welcome! Here's a link that explains in much more detail why the MSE loss assumes a Gaussian prior: towardsdatascience.com/why-using-mean-squared-error-mse-cost-function-for-binary-classification-is-a-bad-idea-933089e90df7. Hope it helps! :)
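In brief, the argument the link makes can be sketched like this (this also matches the correction above: it's the errors, not the raw data, that are assumed Gaussian). Assume each target is the model's prediction plus Gaussian noise:

```latex
% Assume y_i = f_\theta(x_i) + \varepsilon_i with \varepsilon_i \sim \mathcal{N}(0, \sigma^2).
% The log-likelihood of the data under this model is
\log L(\theta)
  = \sum_{i=1}^{n} \log \frac{1}{\sqrt{2\pi\sigma^2}}
    \exp\!\left(-\frac{\bigl(y_i - f_\theta(x_i)\bigr)^2}{2\sigma^2}\right)
  = -\frac{n}{2}\log\!\left(2\pi\sigma^2\right)
    - \frac{1}{2\sigma^2}\sum_{i=1}^{n} \bigl(y_i - f_\theta(x_i)\bigr)^2 .
% The first term and the factor 1/(2\sigma^2) do not depend on \theta,
% so maximizing \log L(\theta) is the same as minimizing
% \sum_i (y_i - f_\theta(x_i))^2, i.e. the MSE.
```

So minimizing MSE is maximum-likelihood estimation under a Gaussian noise assumption; swap the Gaussian for a Bernoulli likelihood and the same derivation yields binary cross-entropy instead.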
Thanks a lot for your kind answer and awesome work! @@datamlistic
Summary:
Q) Why can't we use MSE loss in logistic regression instead of binary cross-entropy loss?
Ans:
1. When maximizing the likelihood, if you assume the output comes from a Gaussian distribution, then it can be proven that this is equivalent to minimizing the MSE loss; but if we take the output distribution to be Bernoulli, then the BCE loss emerges. So using MSE for a binary output means there is a mismatch in the assumed output distribution.
2. If you use MSE as the loss in logistic regression, the loss becomes a non-convex function (this can be shown by taking the second derivative), whereas with BCE it's convex.
3. MSE doesn't penalize confident misclassifications enough; BCE does.
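A quick numeric check of point 3 (a minimal sketch in plain Python; the helper names `mse` and `bce` are mine, not from the video): for a confidently wrong prediction, MSE is bounded by 1, while BCE grows without bound.

```python
import math

def mse(p, y):
    # Squared error between predicted probability p and label y
    return (p - y) ** 2

def bce(p, y, eps=1e-12):
    # Binary cross-entropy; clamp p away from 0/1 to avoid log(0)
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# Confident misclassification: true label 1, predicted probability 0.01
p, y = 0.01, 1
print(mse(p, y))  # 0.9801 -- can never exceed 1
print(bce(p, y))  # ~4.605 -- grows without bound as p -> 0
```

As the predicted probability approaches 0 for a positive example, `mse` saturates at 1 while `bce` diverges, which is why BCE produces much larger gradients on badly misclassified examples.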
Thanks for the summary! :)
So I tried out the math, and am I correct to say that on the interval 0 to 1 the loss function is neither convex nor concave? Hence it becomes hard to optimize this loss via methods which assume the function is either convex or concave.
@@anamitrasingha6362 That's really nice. Would you mind sharing your calculation? :)
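The non-convexity claim above can be checked numerically. This is a sketch under the assumption that "the loss function" means the per-example MSE composed with the sigmoid, viewed as a function of the logit z (the helper names are mine): the second derivative changes sign, so the function is neither convex nor concave.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mse_loss(z, y=1.0):
    # Per-example MSE as a function of the pre-sigmoid logit z
    return (sigmoid(z) - y) ** 2

def second_derivative(f, z, h=1e-4):
    # Central finite-difference estimate of f''(z)
    return (f(z + h) - 2 * f(z) + f(z - h)) / h ** 2

# For label y = 1, the curvature changes sign along the z-axis:
print(second_derivative(mse_loss, -2.0))  # negative -> locally concave
print(second_derivative(mse_loss, 0.0))   # positive -> locally convex
```

Since the second derivative is negative at some points and positive at others, no convexity-based guarantee applies; BCE composed with the sigmoid avoids this because its second derivative in z is σ(z)(1 − σ(z)) ≥ 0 everywhere.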
Wonderful video 😢
Thanks!