Normalization Vs. Standardization (Feature Scaling in Machine Learning)
- Published Jun 30, 2024
- In this video, we will cover the difference between normalization and standardization.
Feature scaling is an important step prior to training machine learning models, ensuring that all features are on the same scale.
Normalization rescales feature values to the range 0 to 1.
Standardization transforms the data to have a mean of zero and a standard deviation of 1.
Standardization is also known as Z-score normalization, after which features have the properties of a standard normal distribution.
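The two transformations described above can be sketched directly from their formulas. This is a toy example with made-up values (not the video's dataset); it computes the same quantities as sklearn's MinMaxScaler and StandardScaler would:

```python
import numpy as np

# Hypothetical feature column (made-up values, not from the video)
x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

# Normalization (min-max scaling): maps values into the [0, 1] range
x_norm = (x - x.min()) / (x.max() - x.min())

# Standardization (Z-score): zero mean, unit standard deviation
x_std = (x - x.mean()) / x.std()

print(x_norm)        # [0.   0.25 0.5  0.75 1.  ]
print(x_std.mean())  # ~0.0
print(x_std.std())   # ~1.0
```

Note that normalization is sensitive to outliers (a single extreme value stretches the min-max range), while standardization is comparatively more robust.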
Check top-rated Udemy courses below:
10 days of No Code AI Bootcamp
www.udemy.com/course/10-code-...
Modern Artificial Intelligence with Zero Coding
www.udemy.com/course/modern-a...
Python & Machine Learning for Financial Analysis
www.udemy.com/course/ml-and-p...
Modern Artificial Intelligence Masterclass: Build 6 Projects
www.udemy.com/course/modern-a...
AWS SageMaker Practical for Beginners | Build 6 Projects
www.udemy.com/course/practica...
Data Science for Business | 6 Real-world Case Studies
www.udemy.com/course/data-sci...
AWS Machine Learning Certification Exam | Complete Guide
www.udemy.com/course/amazon-w...
TensorFlow 2.0 Practical
www.udemy.com/course/tensorfl...
TensorFlow 2.0 Practical Advanced
www.udemy.com/course/tensorfl...
Machine Learning Regression Masterclass in Python
www.udemy.com/course/machine-...
Machine Learning Practical Workout | 8 Real-World Projects
www.udemy.com/course/deep-lea...
Machine Learning Classification Bootcamp in Python
www.udemy.com/course/machine-...
MATLAB/SIMULINK Bible | Go From Zero to Hero!
www.udemy.com/course/matlabsi...
Python 3 Programming: Beginner to Pro Masterclass
www.udemy.com/course/python-3...
Autonomous Cars: Deep Learning and Computer Vision in Python
www.udemy.com/course/autonomo...
Control Systems Made Simple | Beginner's Guide
www.udemy.com/course/control-...
Artificial Intelligence in Arabic | الذكاء الصناعي مبتدئ لمحترف (AI from Beginner to Professional)
www.udemy.com/course/artifici...
The Complete MATLAB Computer Programming Bootcamp
www.udemy.com/course/the-comp...
Thanks and see you in future videos!
#featurescaling #normalization - Science & Technology
This is by far the best explanation I've come across. So simple to understand. Thank you, Prof. You just earned a follower!!
King!!! Very good explanation. I watched multiple videos on YT and asked ChatGPT many questions, but after your video I finally understand it.
Great explanation, however I think saying scaling is not required for distance-based algorithms is wrong, as these algorithms are the most affected by the range of features. Can you comment on this?
I think the same
Exactly! Scaling is crucial for distance-based algorithms.
I believe that in the case of k-means, the algorithm calculates distances column versus same column, as opposed to a neural network where each column can have an impact on the target output.
As distances are measured on the same scale (column by column), of course one feature is going to affect the clustering more (for instance), but that's the point of k-means: we want to see which features describe the data distribution across dimensions.
I am really grateful for your detailed explanation! I am self-studying machine learning this summer holiday, and I am at this point now. I was so confused before watching your video; now I finally understand this point. Thank you so much!
Hi Professor, thank you so much for this video! Clear and concise you have no idea how much I needed this. Keep up the great work, I will be sure to check out your other videos as well 😊
Impressed with your way of teaching. You explain very well with the right examples... awesome work!
One small request: the sequence of your 'Artificial Intelligence, Machine Learning, and Deep Learning' playlist is jumbled; please keep the playlist in order for easy learning.
Thank you so much, Professor Ryan. You just made my life easy. Best explanation, so simple to understand even for someone who doesn't have a background in machine learning.
This was pretty clearly explained.
For anyone else looking for this, the standardization chapter begins at 6:49.
Your explanation is as amazing as a rainbow cloud after a thunderstorm!!! I'm so glad I found this visual explanation!
Well explained about standardization and normalization. Now I have full clarity on these topics. Thanks for taking the effort and explaining it this way.
Amazing explanation.. In just one run, I got your whole point in an easy way. Big thanks!
This professor is so pleasant for all senses. Thanks for sharing knowledge selflessly :)
Thank you so much! I couldn't wait until the end of the video to thank you! You made it super clear.
This was such a crystal clear explanation! Thank you so much sir!
Came here from your udemy course. You are a life saver, prof!
Many thanks for this video... one of the best explanations I've ever seen.
A great scientist and teacher. keep it up, sir. thank you.
Amazing video! Clearly explained! Congratulations, Professor!
For ML context: if the data follows a Gaussian distribution (bell shape), use standardization; else go with normalization (improves cluster scaling as well).
Thanks a lot... worth watching. You explained each concept in a simple way.
Great video. I would say we need scaling for distance-based algorithms, as they give wrong results if features are on different scales. We don't need scaling for tree-based algorithms, as they are insensitive to feature scale.
Awesome. I understand finally. Very good explanation. Easy to follow
Thank You Leonard Hofstadter..🙂
Hahaha thanks ❤️😂
Fantastic Explanation Sir ! Thanks so much !
Many thanks for this video... One of the best explanations
The outlier thing is so crucial actually, damn. I haven't seen this in a machine learning course before. Banger!
Fantastic explanation ! Thank you so much.
Thank you very much. I can't pass without thanking you and subscribing for the clarity you gave me on this topic.
Clear as crystal, thank you.
Great explanation. Thank you very much, Sir!
Thank you for your explanation; it was so easy to understand.
Such a great explanation. Thank you very much
Awesome explanation. Thank you!
Very Clear Explanation.
Thank you :)
This is my first time watching your video. You look very, very similar to Saif Ali Khan; in fact the smile is also the same. One like from me. A gentle smile on your face makes you different from all the others.
Thank you for the clear explanation!
Amazing explanation!
good. clearly explained. thanks
The best simple explanation ever
thank you boss man, just used normalization instead of standardization, life saver
Outstanding content.
Excellent explanation.
Awesome explanation for a beginner like me. Wish I had access to the S&P 500 dataset.
Thank you so much, Prof!
Hello Professor, the video explained the concepts and their practical implementation in a concise manner. Awesome work!
Many thanks!
Excellent thanks!!!
Informative!
Gr8 explanation!!!
Appreciated it, Thanks.
Excellent!
Great explanation!! Could you say more about when the input is image datasets - like CNNs?
Great job 👏👏❤
Thank you Prof!
Hey professor, that was a very cool and simple video to follow and understand. Could I ask where I could find the notebook you used at the end?
Thank you it is helpful
Thanks for sharing ❤
Hi Prof. Ryan,
Thank you for explaining the subject in a simple manner.
I have a Human Resources situation at hand. We have an employee appraisal system and the rating is on a 6 point scale (ranging from Poor performer to Outstanding performer). We have 15 departmental heads who rate their respective team members on this 6 point rating scale.
However, immense biases creep in during evaluation. Also, some evaluators are tougher/more lenient than others. Consequently, we end up with different ranges/averages.
As the ratings are linked to incentives, sometimes, good performers lose out against their peers in other departments.
I intend to eliminate this bias/lack of neutrality in the ratings given by the 15 different departments (for 1000 employees). Can you suggest how I should go about this situation, please?
Regards...Muralidhar
Thank you!
Can you show an example of scaling with train test split? Do you scale the train and test data with the same scaler?
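For what it's worth, the usual convention (a minimal sketch with made-up numbers, not from the video) is to fit the scaler on the training set only and reuse those statistics on the test set. With sklearn this would be `fit_transform` on train and plain `transform` on test; here the same idea is hand-rolled with numpy:

```python
import numpy as np

# Hypothetical train/test split (made-up values)
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
X_test  = np.array([[2.5], [5.0]])

# Fit the scaling statistics on the TRAINING data only...
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)

# ...then apply the SAME statistics to both sets.
# Re-fitting on the test set would leak test information
# and make train/test features inconsistent with each other.
X_train_scaled = (X_train - mu) / sigma
X_test_scaled  = (X_test - mu) / sigma
```

Note that a test value outside the training range (like 5.0 here) simply maps outside the training statistics; that is expected and fine.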
Thank you for this video
My pleasure
SUPEEEEEEEER clear. Thanks!
Helpful!
Superb illustration.
Thank you so much 😀
@@professor-ryanahmed You're welcome, Prof. Please, the link to the dataset?
Can you share the dataset you used for this demo pls?
Wonderful.. outstanding
My dear Prof!
Good theoretical explanation.. but I think scaling is needed for k-means and k-NN.
thx Prof
Top explanation along with code. Can you upload the notebook file with each video you explain? Thanks.
For supervised algorithms, can we use both as data input?
As always: outstanding! Your enthusiasm is inspiring... On the other hand, it is clear why tree-based algorithms do not require feature scaling. However, distance-based algorithms such as K-Nearest Neighbors and k-means require Euclidean distance calculation, which means that feature scaling is necessary for them. Am I wrong?
I think you should scale features for k-means and k-NN. Think about it intuitively: if you are looking at two points and their x and y (feature) distances, how would you want to define their closeness? Do you want the features to be considered equally when calculating the distance, or is one feature more important than the other? If you want both x and y to be considered on an equal playing field, then you should scale them so that the computed distance reflects their importance.
Scale each feature by the method that makes more sense for that feature. This is most likely [0, 1] across samples.
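To make the intuition in this thread concrete, here is a small sketch (with made-up points and assumed feature ranges, not from the video) showing how an unscaled feature dominates Euclidean distance:

```python
import numpy as np

# Two features on very different scales: salary (tens of thousands)
# and age (tens). Made-up points for illustration.
a = np.array([50000.0, 25.0])
b = np.array([51000.0, 60.0])

# Unscaled: the salary difference (1000) swamps the age difference (35)
d_raw = np.linalg.norm(a - b)

# Min-max scale each feature to [0, 1] using assumed feature ranges
mins = np.array([30000.0, 18.0])
maxs = np.array([120000.0, 70.0])
a_s = (a - mins) / (maxs - mins)
b_s = (b - mins) / (maxs - mins)

# Scaled: the small RELATIVE salary gap no longer dominates;
# the large relative age gap now drives the distance, as it should
d_scaled = np.linalg.norm(a_s - b_s)

print(d_raw)     # ≈ 1000.6, almost entirely the salary term
print(d_scaled)  # now reflects relative differences per feature
```

A k-NN or k-means model built on the raw features would effectively ignore age; after scaling, both features contribute in proportion to how different the points actually are within each feature's range.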
Question: what if our model encounters a bigger value than what we had in the training data? How do we handle that?
You've not missed a single base, brother. What an explanation!
He used to be on Stemplicity as well.
Firstly, I like your explanation very much.
Secondly, I would like to know: how do you plot the raw and rescaled data? Do you use the histogram function from pandas?
Thank you very much and keep up the good work!
I have already found it. :D
import seaborn as sns
sns.pairplot(df)  # pairwise scatter plots, with histograms on the diagonal
I'm finding a lot of sources are saying feature scaling is advised when using k nearest neighbours. Is there more nuance to this point? Is scaling required after all?
Please, where's the link to the dataset? I'd really appreciate if you can paste it here, Prof. Thanks a lot.
top of the top
Good
thx
The best, marhaba (hello)!
Just came from a KMeans clustering course that demonstrates how normalization results in better clusters. But at 11:40, you say KMeans clustering doesn't require standardization or normalization. I'm confused.
could you put a link to the csv file so we can download and try the exercise ourselves please?
I really liked your explanation, thanks
P.S.
Are you Egyptian?
I mean your accent is perfect, but your pauses while speaking give the intuition that you're from the Great Egypt.
11:40 Wait, I am confused now, because I thought that since distances between data points are so important in algorithms such as kNN, SVM, etc., scaling is a MUST preprocessing step, but now you are saying it is not required? Could you please clarify this?
Can you please share the github repo link for accessing the data files used in the video
A whole semester in 20 minutes
At 11:27 you mentioned in the last bullets that scaling is not required for K-NN and SVM; that is not correct. K-NN and SVM exploit distances or similarities, so they do require scaling.
very true!!
How do we apply z-score normalization to live data??? 🙏🙏🙏
Could've added this to your Udemy course.
where can I get the dataset?
Thank you. Where I can download the notebook code?
I also have this question
Distance-based methods assume that features are normalized, so feature scaling is required? Please confirm.
tree-based does not need scaling
dataset please
Dear Ryan, how do we test a model trained on scaled data?
I used it this way and the predicted value is very different:
X_testing = np.array([[550, 440, 110, 0, 0, 0, 0.33, 400, 8.8, 0, 863, 771]])  # expected 78.6
ypred = model.predict(scaler.fit_transform(X_testing))  # predicted should be ~78, but I got [[0.17291696]]
Also, without fit_transform the value is still different.
Many thanks for your reply.
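A likely cause (my guess, not confirmed in the thread): calling fit_transform on a single test row recomputes the scaling statistics from that one row alone, and if the target y was also scaled during training, the model's output stays in the scaled space. A minimal sketch of the usual fix, using a hand-rolled standardizer with made-up data in place of the actual scaler and model:

```python
import numpy as np

# Made-up training data: 3 features, scalar target
X_train = np.array([[1.0, 10.0, 100.0],
                    [2.0, 20.0, 200.0],
                    [3.0, 30.0, 300.0]])
y_train = np.array([[50.0], [60.0], [70.0]])

# Fit scaling statistics on the TRAINING data once
x_mu, x_sd = X_train.mean(axis=0), X_train.std(axis=0)
y_mu, y_sd = y_train.mean(axis=0), y_train.std(axis=0)

def transform(X):
    # reuse training statistics; never re-fit on a test row
    return (X - x_mu) / x_sd

def inverse_transform_y(y_scaled):
    # map a scaled prediction back to the original target units
    return y_scaled * y_sd + y_mu

X_new = np.array([[2.0, 20.0, 200.0]])
x_scaled = transform(X_new)         # correct: transform, not fit_transform

# stand-in for model.predict(x_scaled); pretend it predicts in scaled space
y_scaled_pred = np.array([[0.0]])
y_pred = inverse_transform_y(y_scaled_pred)
print(y_pred)  # [[60.]] — back in the original units
```

With sklearn scalers, the equivalent is `scaler.transform(X_testing)` (the scaler having been fitted on the training set) and, if a separate scaler was fitted on y, `y_scaler.inverse_transform(ypred)`.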