For the past two days, I was watching different videos and reading articles to understand the core of ridge regression. I got tired because I wasn't understanding it. And here I am, halfway through this video, and I think I've got the grasp. It'd be biased to say the previous content didn't help me at all, but your lecture is so much more insightful than those. Thank you very much for sharing your learnings with us.
I touch your feet in respect, sir 🙏
Never ever have I seen such brilliance in teaching.
Well explained, sir. Please, I request you, never stop.
Omg this is brilliant. Exactly what I've been looking for. Thanks for making our lives easier
You look like a genius and teach like a professor.
At 10:51, why did we consider training data points during testing? But yes, there are other training points which will contribute to the y − ŷ value.
Very well explained. Sir, you have done a lot of hard work on your lectures. Keep going.
Mesmerising, such a lucid explanation.
God of ML teaching
Thank You Sir.
You are a game changer sir
Great explanation sir.
Nitish!!!! You are amazing
Awesome series, brother. Great work done by you. Looking forward to an MLOps video.
Hi, how can we say that if the slope is very high then it's a case of overfitting? It could be underfitting too. I think a high slope doesn't mean the line will fit our training data perfectly. Please help me out.
This is the first video whose concept wasn't very clear to me, but still a great video, sir.
Krish Naik (Hindi) has explained this better; among the rest so far, CampusX seems good.
Has Krish Naik explained only this algorithm better, or all the algorithms?
I watched his video just now after reading your comment; it's mostly the same, nothing better. Even he hasn't explained why: what if the incorrect fit line is already on the right side and the imaginary true fit is on the left? Then ridge will shift it further right, away from the true fit.
Sir, you are a legend.
Hi Nitish,
Very nice video.
Just one thing I noticed, around 04:00: for a given intercept b, when m changes, it is basically the orientation, or the angle the line makes with the x-axis, that changes. So when m is either too small or too high, there is underfitting; as can be seen geometrically, the line is quite far from the data points for high and low m. So overfitting, meaning the line is very close to the data points, occurs only for certain values of m, particularly between the high and low m values. Please let me know your thoughts on this.
Regards,
Krish
That statement is incorrect
Sir, will you be including SVM, t-SNE and more ahead in the 100 Days of ML playlist?
Yes
Beautiful explanation!
Sir, what if the incorrect fit line is already on the right side and the imaginary true fit is on the left? Then ridge will shift it further right, away from the true fit. It becomes "irregularisation" then, doesn't it?
Thanks for this session.
I appreciate your work and no one can teach like you, but there is just one thing: overfitting doesn't mean a high slope in simple linear regression. Overfitting means you have used a very complex model that is not able to generalise to new data outside the training set. Simple linear regression is the simplest model, so there can't be overfitting in it; there can only be underfitting.
Exactly, overfitting can't happen in simple linear regression, because the line will not bend to pass through each and every data point in the training dataset. Yes, it can underfit or be a best fit.
Sir, this video has come after 5 days; everything is fine now.
The whole idea should be to reduce the overfitting of the 1st line. But we are given a 2nd line with different parameters. There should be only the first line, and when we add the lambda*m^2 term to its loss, it should give less error. Here we already have the 2nd line. When I calculated the loss without the lambda term, the loss was even less. I don't know. Someone please clarify this.
But in the graph at 7:28, if those 2 training points were below the test points, wouldn't the best fit line's slope have to be increased to handle the overfitting??? I think this video's logic is flawed.
Sir, please use a slightly darker marker.
The slope value in a linear regression model does not directly indicate overfitting.
Yes, of course, but I think what he's trying to suggest is that some suspiciously high values "might" be indicative of overfitting.
But sir, why did you choose two training points above the actual dataset? If we chose those two training points below the actual dataset, then the correct line's slope would be higher than the predicted line's slope, so the loss for the predicted line's slope would be less.
Exactly, this is the problem that brought me to the comment box!!
I mean, if we give it all the data and not only those two points, normal linear regression will also choose the line that we want after ridge regression.
Is regularization the same as regression here??
Just one issue: why did you multiply 0.9*3 while calculating the loss at the second point?
Even I am confused about this :-(
It is clearly mentioned by Sir in the video that it is just an assumption.
There are two points in our training dataset -> (1, 2.3) and (3, 5.3).
For calculating the loss at the second point,
Yi = 5.3, Xi = 3.
Y_hat = m*Xi + b, where m = 0.9, Xi = 3, b = 1.5.
Y_hat = 0.9*3 + 1.5 = 4.2, so the squared error there is (5.3 - 4.2)^2.
I hope that helps!
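For anyone tracing the arithmetic, here is a minimal Python sketch of the same calculation. The values (training points (1, 2.3) and (3, 5.3), m = 0.9, b = 1.5) come from the thread above; the lambda value is just an illustrative assumption, not one quoted from the video.

```python
# Minimal sketch of the loss calculation discussed above, assuming the
# values quoted in the thread: training points (1, 2.3) and (3, 5.3),
# line parameters m = 0.9, b = 1.5, and an illustrative lambda = 1.0.
X = [1, 3]
y = [2.3, 5.3]
m, b = 0.9, 1.5
lam = 1.0  # hypothetical regularization strength

# Ordinary squared error of the line on the training points.
squared_error = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(X, y))

# Ridge adds a penalty on the slope: loss = squared error + lambda * m^2.
ridge_loss = squared_error + lam * m ** 2

print(squared_error, ridge_loss)
# At the second point: y_hat = 0.9*3 + 1.5 = 4.2, error = (5.3 - 4.2)^2 = 1.21
```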
@@somanshkumar1325 yooo...
Awesome
I guess bias is higher compared to variance in overfitting, and vice versa in underfitting. Please correct me.
It's the opposite.
Low bias in overfitting and low variance in underfitting.
The overfitting line you drew doesn't satisfy the low-bias, high-variance definition at all. I don't think overfitting is possible in the case of simple linear regression, because the line can't bend to pass through each and every data point of the training dataset.
Can someone please tell me in which section training error, generalization error, testing error and irreducible error are covered? My exam is on 20 Dec.
Why is there no learning-rate hyperparameter in scikit-learn Ridge/Lasso/ElasticNet? Since it has a max_iter hyperparameter, that suggests it uses gradient descent, but still there is no learning rate among the hyperparameters. If anyone knows, please help me out with it.
Did you get the answer??
@@near_. No, still waiting for some expert to reply.
I hadn't thought about this...
I just saw from the documentation that not all solvers use gradient descent (which is what I had assumed).
'sag' uses Stochastic Average Gradient descent; the step size (learning rate) is set to 1 / (alpha_scaled + L + fit_intercept), where L is the maximum sum of squares over all samples.
'svd' uses a Singular Value Decomposition (matrix-based),
'cholesky' (matrix-based),
...
Otherwise, like 'sag', each solver automatically calculates the learning rate based on the data and the solver.
What's your opinion?
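For reference, a small sketch of what the scikit-learn API actually exposes: Ridge takes alpha, solver and max_iter but no learning-rate parameter, while SGDRegressor with an L2 penalty is the closest option if you want explicit learning-rate control. The toy data and parameter values here are just placeholder assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge, SGDRegressor

# Toy data, purely for illustration.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.3, 3.6, 5.3, 6.1])

# Ridge exposes alpha, solver and max_iter, but no learning-rate parameter:
# each iterative solver (e.g. 'sag') derives its own step size from the data.
ridge = Ridge(alpha=1.0, solver="sag", max_iter=1000).fit(X, y)

# For explicit learning-rate control with an L2 penalty, SGDRegressor is the
# closest equivalent (eta0 is the initial step size).
sgd = SGDRegressor(penalty="l2", alpha=0.1, eta0=0.01, max_iter=1000).fit(X, y)

print(ridge.coef_, sgd.coef_)
```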
Thank you!
I really appreciate your effort and all your videos, but I think the explanation is incorrect here.
m being high is not the definition of overfitting.
In a typical linear regression with m + 1 weights, if we do not constrain the values of the weights and let them be anything, then they can represent very complex functions, and that causes overfitting.
We have to penalize large values of the weights (by adding a penalty to the loss function) so that our function has lower capacity to represent complexity and hence won't learn complex functions that merely fit the training data well.
You're absolutely right. Overfitting occurs when the model becomes too complex, which can happen if the weights are unconstrained and grow too large, allowing the model to fit the noise in the data. Regularization techniques like Ridge regression help prevent this by adding a penalty to the weights, ensuring the model remains simpler and generalizes better to unseen data. Great explanation!
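To make the "penalize large weights" point from this thread concrete, here is a minimal NumPy sketch of ridge's closed-form solution, w = (XᵀX + αI)⁻¹Xᵀy, showing that the fitted weights shrink as alpha grows. The toy data and alpha values are assumptions for illustration, not from the video.

```python
import numpy as np

# Toy data (assumed for illustration): a noisy linear relationship.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + rng.normal(scale=0.1, size=20)

def ridge_weights(X, y, alpha):
    """Closed-form ridge solution: w = (X^T X + alpha * I)^-1 X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

# Larger alpha -> larger penalty on the squared weights -> smaller weights.
for alpha in (0.0, 1.0, 100.0):
    print(alpha, np.round(ridge_weights(X, y, alpha), 3))
```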
Sorry to say, but you shared bookish knowledge this time; the practical intuition is not there. Adding something shifts the line upward in parallel, so how is it able to change the slope? You said most data scientists keep this model as a default since it will only be active if there is a situation of overfitting; kindly explain how. And how the model finds the best fit line of the test set is something you assumed on your own. Does the algorithm actually do the same?
Regularization: th-cam.com/play/PLKnIA16_RmvZuSEZ24Wlm13QpsfLlJBM4.html
Check out this playlist, maybe this will help
Please spend a lot more time on the code.