This is the most clear explanation I have seen!! Thank you so much !! :)
thank you!
Thank you buddy! This makes a lot of sense after my self-study of Machine Learning and using the built-in sklearn models.
THANK YOU TO THE MOON AND BACK... BEST EXPLANATION I HAVE EVER SEEN
Excellent video. I'm really starting to get a good understanding of the ML algorithms after watching your videos.
that's great!
The way you relate Linear Regression to Logistic Regression makes it so clear thank you so much!
Glad it was helpful!
Although a logarithmic loss function was shared in the material, the gradient descent implementation is done using the squared-loss/cross-entropy gradient.
there is a small typo in the sigmoid function (1:00)
As-is: h_hat = 1 / (1 + e^ "-wx+b")
To-be: h_hat = 1 / (1 + e^ "-(wx+b)")
Always appreciate these great videos~
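A quick numeric check of why the parentheses matter (a minimal sketch, assuming numpy; w, x, b are arbitrary example values, not from the video):

import numpy as np

w, x, b = 2.0, 3.0, 0.5
wrong = 1 / (1 + np.exp(-w * x + b))     # exponent is -(w*x) + b
right = 1 / (1 + np.exp(-(w * x + b)))   # exponent is -(w*x + b), the actual sigmoid input
print(wrong, right)                      # roughly 0.9959 vs 0.9985, so the parentheses change the result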
Thanks for the video, now everything that is going on behind the scenes makes sense.
glad to hear that :)
Great work bro, I am sure you will reach 100K soon. Best of luck
Thank you :)
You have explained this very easily. Keep it going on. :)
You saved my Ass!!!
thank you !!! your videos help me a lot :)
isn't it a single-layer neural net with a sigmoid activation function?
thx dude, I was searching all over the web to find out whether you have to put the 0.5 truncating/threshold step into the function used by gradient descent / the cost function, but you successfully showed me that it's just for the prediction hypothesis which is used afterwards
Thank you for the summary
glad you like it
I was looking for a basic logistic regression model built with algorithmic modeling. Thank you very much, I like your video.
Glad I discovered this video.))))))
glad to have you here :)
I have 2 questions:
1. Why are we transposing X? (I checked the numpy documentation; it swaps the dimensions, but I cannot see the point here.)
2. How are we getting the summation without applying np.sum?
Can you please answer?
so to evaluate the test data we should not use fit_transform....... only transform is required?
Guys, for those wondering: the gradient descent used here is the same as for linear regression, because the derivative of the log loss with respect to the weights and bias of X·weights + bias works out to the same expression, including the 1/n factor.
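A minimal sketch of that update step (assuming numpy; the variable names are my own, not necessarily the exact ones from the video):

import numpy as np

# one gradient descent step for logistic regression (same form as for linear regression)
def step(X, y, weights, bias, lr):
    n_samples = X.shape[0]
    linear_model = np.dot(X, weights) + bias
    y_predicted = 1 / (1 + np.exp(-linear_model))            # sigmoid
    dw = (1 / n_samples) * np.dot(X.T, (y_predicted - y))    # gradient of the log loss w.r.t. the weights
    db = (1 / n_samples) * np.sum(y_predicted - y)           # gradient w.r.t. the bias
    return weights - lr * dw, bias - lr * db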
Very nice Explanation ... Thanks
Is learning calculus a pre-requisite to this series -- I am learning, but feel a bit lost when it comes to the implementations because it is difficult for me to understand the underlying mathematical concepts. I do appreciate the videos!
A little bit of calculus would be good here. Have a look at some free math courses here: www.python-engineer.com/posts/ml-study-guide/
this is great thank you!!!
When I code and run this model on the "advertising data set" from Kaggle, the accuracy is only in the 40-50% range, while the sklearn LogisticRegression model is over 90%. I've tried varying the number of iterations and the learning rate, but I can't get an accuracy score above 50%.
Please note that this code is not optimized at all. You can try to apply feature scaling before training.
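For example, a sketch of applying feature scaling first (a minimal sketch; the class name LogisticRegression and the constructor parameters lr and n_iters are assumptions and may differ from the actual from-scratch code):

from sklearn.preprocessing import StandardScaler

# X_train, X_test, y_train come from the train_test_split used in the video
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)   # fit the scaler on the training data only
X_test_scaled = scaler.transform(X_test)         # only transform the test data, never fit on it

clf = LogisticRegression(lr=0.0001, n_iters=1000)  # the from-scratch model (parameter names assumed)
clf.fit(X_train_scaled, y_train)
predictions = clf.predict(X_test_scaled)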
I wonder how I can plot the logistic regression decision boundary calculated from that.
Love your video!!!
Thank you!!
Sir, my code (the sigmoid function) is giving an exp overflow error during the iterations. How can I overcome it?
This is because your x contains very large negative values, so np.exp(-x) gets too large for your datatype. You could clip x in this case and set a maximum value, or try to use np.float64 instead of np.float32.
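One possible way to guard the sigmoid against overflow (a sketch, assuming numpy; the clipping range is an arbitrary but safe choice for float64):

import numpy as np

def sigmoid(x):
    x = np.asarray(x, dtype=np.float64)   # float64 has a much larger range than float32
    x = np.clip(x, -500, 500)             # exp(500) still fits into float64, so no overflow warning
    return 1 / (1 + np.exp(-x))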
Congrats, because a lot of people do not do it from scratch
Hello, in which environment is the Python code written?
Thank you very much for your video. I wonder why you are not checking your model at each iteration and returning the model with lowest error, instead of returning the model with the last w and b parameters of the for loop.
good point. I wanted to keep it simple here, but in practice of course you can/should check for the best model
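A sketch of what that could look like (tracking the cross-entropy and keeping the best weights; the structure loosely follows the video's fit method but the names and details are my own):

import numpy as np

def fit_keep_best(X, y, lr=0.01, n_iters=1000):
    n_samples, n_features = X.shape
    w, b = np.zeros(n_features), 0.0
    best_w, best_b, best_loss = w.copy(), b, np.inf
    for _ in range(n_iters):
        y_pred = 1 / (1 + np.exp(-(np.dot(X, w) + b)))
        dw = (1 / n_samples) * np.dot(X.T, (y_pred - y))
        db = (1 / n_samples) * np.sum(y_pred - y)
        w -= lr * dw
        b -= lr * db
        # cross-entropy loss after the update; keep the best parameters seen so far
        y_pred = 1 / (1 + np.exp(-(np.dot(X, w) + b)))
        eps = 1e-15
        loss = -np.mean(y * np.log(y_pred + eps) + (1 - y) * np.log(1 - y_pred + eps))
        if loss < best_loss:
            best_w, best_b, best_loss = w.copy(), b, loss
    return best_w, best_b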
Just started learning this and tried running the code in a Jupyter notebook. It keeps saying no module named 'logistic_regression'.
It might be a stupid question, but please let me know why it's happening.
Import statement might be different when using Windows and/or jupyter notebook. try from .logistic_regression import LogisticRegression (with the dot)
Hi bro,
you have used the squared loss function, but logistic regression has log loss. If we differentiate the square loss w.r.t. w and b, do we get the same as the log loss derivative?
Hi. The loss I'm showing is the log loss (or better known as cross-entropy). However, the gradient is the same as for square loss in this case. You can check the further readings I provided in the description for a detailed gradient calculation :)
No, you have used the linear regression cost function. You have shown the log loss, but the derivative answer is that of linear regression.
thank you so much.
glad you like it!
Thank you sir. If we want to use elastic net regularization along with this logistic regression, how should we approach it?
Nice video, thanks a lot. Very good compared to comparable ones that i have looked at.
Thanks! Glad you like it
Lovelyyyyy. Cheers!
Hi, your videos are just awesome! One question: do the iterations of the fit method's for loop correspond to a neural network's hidden layers? Is that true?
No, this training loop here has nothing to do with neural networks, it's simply how long this optimization should try to improve. However, you can compare it with the number of epochs when training a neural net.
Great insight, thank you. It would be even better if you could have shown us the raw data and just explained the variables and what exactly we were trying to predict, etc. Thanks.
Hello, please, is there any way we could have access to the JUPYTER NOTEBOOK you made reference to in this video?
Hi, not yet, but I'm planning to release them on my website soon.
How do I implement one-vs-rest from scratch and integrate it with logistic regression?
I have questions regarding random_state. Some sources set it to 42 or 95, and when I changed this number the accuracy changed as well. For example, on the make_blobs dataset, if I set it to 95 the classifier gave a good accuracy (~99%), but when I set it to 42 it gave around 88%. Also, I got this error (RuntimeWarning: overflow encountered in exp, return 1 / (1 + np.exp(-x))) when I changed the learning rate value.
random_state lets you reproduce your result. It does not matter which number you use (some people just like 42). It affects the training and test samples, and some splits work better than others. If your x contains very large negative values, you can get an overflow because exp(-x) gets too large. Try clipping x to a maximum, or use datatype float64 instead of float32.
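A small sketch of the reproducibility point (the dataset sizes and seeds are just example values):

from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split

X, y = make_blobs(n_samples=500, centers=2, random_state=123)

# the same random_state gives an identical split on every run; a different value gives a
# different split, which can move the reported accuracy up or down a bit
for seed in (42, 95):
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=seed)
    print(seed, X_test[:1])   # the first test sample differs between the two seeds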
@@patloeber Thank you very much, I will try these solutions. I tried absolute (abs), but the plot of the loss function looks weird (I got values below zero).
what about loss function?
This is a great tutorial! Thank you!
Thank you! Glad you like it :)
@@patloeber I am reading the book "Data Science from Scratch" by Joel Grus, which is really good too, but sometimes I have some trouble understanding the use of some formulas. Your channel is great because it has the same "from scratch" approach, and I think it is really useful for those who are interested in the statistics/math behind all the magic of ML.
@@parismollo7016 Yes some topics can be challenging, but keep going! I'm happy if this is helpful :)
Can you explain why I am getting this error?
ValueError: not enough values to unpack (expected 2, got 1)
def fit(self, X, y):
---> 12 n_samples, n_features = X.shape
The error is on line 12 above.
Edit: when I do this with the normal logistic regression from sklearn it works, but why not with the one we created?
Because sklearn can handle incorrect shapes and then transforms them to the correct one for you. Here you have to do this yourself. X does not have the correct shape here.
Excellent!
Hi... I am using the code lines below to update the weight and bias, but it is giving me an error. Could you please help?
Here is the code:
w = w - (alpha_lr_rate * dw)
b = b - (alpha_lr_rate * db)
where w = np.random.normal(loc=0.0, scale=1, size = X.shape[1])
b=0
error: operands could not be broadcast together with shapes (15,) (15,37500)
Things are explained much more elaborately
How can I plot the sigmoid curve the same way you plotted the fitted line in Linear Regression at the end?
import numpy as np
import matplotlib.pyplot as plt

sigmoid = lambda x: 1 / (1 + np.exp(-x))
x = np.linspace(-10, 10, 100)
fig = plt.figure()
plt.plot(x, sigmoid(x), 'b', label='linspace(-10,10,100)')
plt.show()
@@patloeber Thank you! I also added X and y to the plot to see how they go with the sigmoid curve. Works like a charm.
Why use gradient descent... can't we set the derivative with respect to the parameters equal to zero and find the parameters w and b by solving the equations? Please answer this question, I really need the answer.
Good question! Gradient descent is an iterative approach to this solution. In theory your analytical method is optimal. However, in practice, this requires solving complex equations which is too expensive for higher dimensions. Moreover, in the real world, many cost functions don't have valid derivatives everywhere.
Are there any resources for this from you... any blog that can be read? I read the blog post on linear regression but could not find the same method for logistic regression.
Please bro, do you have any blogs for this?
@@thecros1076 I have a blog at python-engineer.com. Unfortunately, at this moment I do not have articles for the machine learning tutorials, but they will be added in the future
It is very hard to set the derivative equal to zero and find the minimum, because in machine learning optimization the derivatives are in complex forms. Let's say you have the derivative of some function as 5x+5; you can set this to 0 and solve for x to find the minimum. That's why GD/SGD algorithms are used in ML, as the toy example below shows.
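A toy sketch of gradient descent on a function whose derivative is 5x+5 (that is, f(x) = 2.5x^2 + 5x, with the analytic minimum at x = -1):

def grad(x):
    return 5 * x + 5          # derivative of f(x) = 2.5x^2 + 5x

x, lr = 0.0, 0.1
for _ in range(100):
    x -= lr * grad(x)         # gradient descent step
print(x)                      # converges towards -1, the same x you get by solving 5x+5 = 0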
Where is the entropy loss implemented in this code?
Hi, I am getting a runtime warning:
RuntimeWarning: overflow encountered in exp
return 1/(1+np.exp(-x))
0.8947368421052632
What should I do to avoid this?
you probably have values in x with very large negative numbers. Try applying a standard scaler
@@patloeber you are the best, thanks..
@@patloeber thanks, that solved the issue..
Very good!
thanks!
You saved me. Thank you.
Where is the loss applied, please?
I used this logistic regression algorithm for prediction of a disease, and while pickling the model I got this error. Can you please explain what kind of error this is and how to overcome it? Please help me out.
UnpicklingError: invalid load key, '\xe2'.
I guess you try to unpickle something that has not been pickled correctly...
Hi, can this algorithm be extended to a multi-class problem?
Yes it can. For multinomial logistic regression you have to use the softmax function instead of the sigmoid function to approximate the y. So y_predicted = self._softmax(linear_model). Furthermore you have to apply the cross-entropy as loss function and then calculate the gradients. Then again you can use the update rule with the correct gradient: self.weights -= self.lr * gradient
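A minimal sketch of one such softmax update step (assuming numpy and a one-hot encoded Y, which is an extra step compared to the binary case; the names W, b, lr are my own):

import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)          # subtract the row max for numerical stability
    exp_z = np.exp(z)
    return exp_z / exp_z.sum(axis=1, keepdims=True)

# one update step; X: (n_samples, n_features), Y_onehot: (n_samples, n_classes)
def step(X, Y_onehot, W, b, lr):
    n_samples = X.shape[0]
    Y_pred = softmax(np.dot(X, W) + b)            # (n_samples, n_classes)
    dW = (1 / n_samples) * np.dot(X.T, (Y_pred - Y_onehot))
    db = (1 / n_samples) * np.sum(Y_pred - Y_onehot, axis=0)
    return W - lr * dW, b - lr * db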
Hi,
I wrote your code and tried applying it to the custom dataset below, but it raised an error. I also tried reshaping the vector into 3 features, and that also gave an error. It only works on the code you gave. Why is that? Help me.
I also tried loading the Boston dataset and running your code, and I faced an error there as well. Could you tell me why that is?
X = np.arange(10)
X_train = np.arange(7)
y_train = np.arange(7)
X_test = np.array([8,9,10])
y_test = np.array([8.5,9.5,10.5])
Your X does not have the correct dimension! You have to add one more axis to X_train and X_test, e.g. X_test = np.array([[8,9,10]]). Or use np.newaxis to add a new one (have a look at my new numpy tutorial where I show this).
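For example, one way to do it (a small sketch, treating each number as one sample with a single feature; the point is that fit expects X to be 2-dimensional, (n_samples, n_features)):

import numpy as np

X_train = np.arange(7)
print(X_train.shape)                 # (7,)  -> only one dimension, so unpacking n_samples, n_features fails

X_train_2d = X_train.reshape(-1, 1)  # or X_train[:, np.newaxis]
print(X_train_2d.shape)              # (7, 1) -> 7 samples with 1 feature each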
@@patloeber thank you so much for your response. I will use your suggestion. I will share my inputs with you , then you can show me where and why i am making the mistake. That way i will get more insight
-(wx+b) instead of -wx+b
TypeError: unsupported operand type(s) for *: 'int' and 'NoneType'
I followed the exact thing you did, yet I get an error that says "object has no attribute 'sigmoid'", although I typed the exact same thing. In addition, your code in the video and on GitHub is different and needs updating, for example learning_rate vs lr :)
Compare again with code on GitHub? There must be a typo somewhere. Do you use ‘self’?
Python Engineer I copied it exactly!! Where is the code on GitHub? Can you kindly provide a link?
github.com/python-engineer/MLfromscratch
Python Engineer Thank u I got the link. I love your videos by the way!
Thanks!
Could you please tell me how to do the logistic regression with L2 regularization?
Hello. Please check out my video about SVM to see how a regularization term is applied. Basically you add the regularization term to your cost function, and then calculate the gradients. You also have to use a regularization parameter to balance the effect of the regularization during optimization. You can also check out this code: github.com/pickus91/Logistic-Regression-Classifier-with-L2-Regularization
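A sketch of what the L2-regularized gradient step could look like (lambda_param is my name for the regularization strength; the bias is left unregularized, which is a common but not universal choice):

import numpy as np

def step_l2(X, y, w, b, lr, lambda_param):
    n_samples = X.shape[0]
    y_pred = 1 / (1 + np.exp(-(np.dot(X, w) + b)))
    # gradient of the cross-entropy plus the gradient of (lambda/2) * ||w||^2
    dw = (1 / n_samples) * np.dot(X.T, (y_pred - y)) + lambda_param * w
    db = (1 / n_samples) * np.sum(y_pred - y)        # the bias is usually not regularized
    return w - lr * dw, b - lr * db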
@@patloeber Doesn't the bias term do the same thing as a regularization term though?
How can we update it to a multiclass version, with more than 2 labels?
Hello. For multinomial logistic regression you have to use the softmax function instead of the sigmoid function to approximate the y. So y_predicted = self._softmax(linear_model). Furthermore you have to apply the cross-entropy as loss function and then calculate the gradients. Then again you can use the update rule with the correct gradient: self.weights -= self.lr * gradient
@@patloeber Hello, thank you for putting these great tutorials online. We're learning a lot from them.
I have implemented the logistic regression using the softmax function as you described above, but it returns a score/accuracy of zero on the iris dataset.
for i in range(self.n_iterations):
    model = np.dot(X, self.weights) + self.bias
    y_predicted = self.softmax(model)
    dw = (1/n_samples)*np.dot(X.T,(y_predicted - y))
    db = (1/n_samples)*np.sum(y_predicted - y)
    self.weights -= self.lr*dw
    self.bias -= self.lr*db

def predict(self, X):
    model = np.dot(X, self.weights) + self.bias
    y_predicted = self.softmax(model)
    print(y_predicted)
    return y_predicted

def softmax(self, x):
    return np.exp(x) / float(sum(np.exp(x)))
@@dattijomakama9703 Same, it doesn't work for me either. My model is predicting only one class despite having a balanced dataset.
EDIT: I ran it for about 1000 iterations and it worked for me.
Can I use this in writer identification??
Can you respond fast?
Yes LR can be used for classification tasks
We don't need to define an accuracy function. We can use sklearn.metrics.accuracy_score instead.
Of course you can. But I want to implement it from scratch ;)
@@patloeber my bad :P I forgot about the main goal of the playlist
@@damianwysokinski3285 No problem. It's actually good that you know about these sklearn functions :)
Why do we use the bias?
The approximation is w*x + b. You can think of this in 2D, then this is equivalent to a line equation m*x + t. The bias is the intercept. It can shift the whole data up or down. So if your data is not centered around the origin, then we need to shift it to get the correct prediction. Hence we also try to learn the bias/intercept.
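A tiny made-up example of data that is not centered around the origin, where a learned intercept is needed (a sketch, assuming numpy):

import numpy as np

X = np.arange(10).reshape(-1, 1)     # feature values 0..9
y = (X.ravel() >= 5).astype(int)     # class boundary sits at x = 5, far from the origin

# sigmoid(w*x) alone forces the decision boundary through x = 0,
# so the model also learns b in sigmoid(w*x + b) to shift the boundary to x = 5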
I think you use the linear regression cost function, not the logistic regression cost function, in your code.
Great job. Your code is so concise and logical, but it does require a solid background. I just wish I got better accuracy. I used it for the "Titanic" dataset on Kaggle and could only get 66%. That's the lowest of all the models I have tried. Markov Chain Monte Carlo gave me the best so far at 78%. Any idea of how I can get a better score?
Thanks! Yes the model is not optimized at all. Titanic dataset is all about cleaning and preprocessing your data, so maybe that could improve it :)
SHOULDN'T DW HAVE AN NP.SUM OUTSIDE AS WELL? YOU HAVE SUMMED UP THE DB's BUT NOT THE DW's
the dot product already includes a sum (np.dot applied for dw)
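A quick check that the dot product already sums over the samples (a sketch with a tiny made-up X and residual vector):

import numpy as np

X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])        # 3 samples, 2 features
r = np.array([0.1, -0.2, 0.3])    # y_predicted - y for the 3 samples

dw_dot = np.dot(X.T, r)                           # shape (2,), one entry per feature
dw_sum = np.sum(X * r[:, np.newaxis], axis=0)     # explicit multiply-and-sum over the samples
print(dw_dot, dw_sum)                             # both give the same result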
I'm a bit confused. The teacher showed us a different way where the gradient descent is calculated with an ugly formula that involves a logarithm.
The cost function I showed also involves the log. Not sure which final formula your teacher uses, but it is very common to keep the logarithm in order to avoid overflow for large numbers...
Logistic regression minimizes the log loss. You are reducing the square loss... Why is that?
Hi. The loss I'm showing is in fact the log loss (or better known as cross-entropy). However, the gradient is the same as for square loss in this case. You can check the further readings I provided in the description for a detailed gradient calculation :)
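If you want to verify that claim numerically, here is a sketch comparing the analytic cross-entropy gradient (1/n) * X.T.dot(y_predicted - y) with a finite-difference approximation on random toy data (bias omitted for brevity):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = (rng.random(20) > 0.5).astype(float)
w = rng.normal(size=3)

def loss(w):
    p = 1 / (1 + np.exp(-np.dot(X, w)))
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))   # cross-entropy / log loss

# analytic gradient of the cross-entropy w.r.t. w
p = 1 / (1 + np.exp(-np.dot(X, w)))
analytic = (1 / len(y)) * np.dot(X.T, (p - y))

# central finite-difference approximation of the same gradient
eps = 1e-6
numeric = np.array([(loss(w + eps * e) - loss(w - eps * e)) / (2 * eps)
                    for e in np.eye(3)])
print(np.allclose(analytic, numeric, atol=1e-5))   # True: the gradients match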
Very nice!
thanks!
I thank you, good sir!