This is one of the best videos on the Internet for this topic.
Can't thank you enough sir.
Thank you Krish, for the concise and clear explanation!
I think it's better to specify how much variance you want to keep rather than the number of components. For example:
PCA(.80)
# this will keep enough principal components to retain 80% of the variance.
Hope this is helpful.
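For anyone who wants to try that: a minimal runnable sketch with scikit-learn (the breast cancer loader is just an assumed stand-in dataset; a float n_components between 0 and 1 is standard sklearn behaviour):

from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Scale first, then let PCA pick the component count for 80% variance.
X = StandardScaler().fit_transform(load_breast_cancer().data)
pca = PCA(n_components=0.80)
X_reduced = pca.fit_transform(X)
print(pca.n_components_)  # how many components PCA actually kept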
Can we always use PCA for creating our ML model?
@@alankarshukla4385 No, we cannot always use PCA; we only use it when we have too many features or variables.
@@rehansiddique1875 Why is PCA only applicable for unsupervised models?
@@manusingh9007 You can also use it with supervised models.
Thanks for making this type of content. You explain things in a very clear and easy way.
Sir, the video is very helpful and the analysis is excellent.
Super explanation, Anna.
You rocked data science.
Great. Now I have completed my practice inside a Jupyter notebook successfully. Cheers!
Excellent!! Your full channel is extremely helpful. Very well explained.
Fantastic. Thanks from Nairobi Krish.
Amazingly explained video, sir. Keep it up.
Beautifully explained
Thanks for the video, Krish.
But I'm wondering: a fresher like me can get puzzled by so many feature selection techniques. It would be great if you could tell us which feature selection technique to use and when.
Regards,
Pritam
Very good explanation. Thank you so much!
Thank you for putting the video back :)
Thank you Krish * 1 million
Thanks for this video.👌👌👌
Thanks sir... it is great.....
Thanks for sharing!
Thank you so much Sir.
Thank you. This is very helpful.
excellent
I have some doubts. First, can we apply PCA to categorical data? Second, I wish to know how we can calculate the optimum number for n_components. Do we have to calculate the explained variance by manually trying out different values for n_components?
You can use np.cumsum(pca.explained_variance_ratio_) to see how much variance is explained as you add components. Let's say 7 components explain 80% of the variance; then use those 7 components for your model.
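A short sketch of that suggestion (X is assumed to be your already-scaled feature matrix):

import numpy as np
from sklearn.decomposition import PCA

pca = PCA().fit(X)                                     # keep all components
cumulative = np.cumsum(pca.explained_variance_ratio_)
n_components = int(np.argmax(cumulative >= 0.80)) + 1  # first count reaching 80%
print(cumulative, n_components)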
Wonderful, thank you for doing it, sir.
Thank you so much sir
Using PCA the number of dimensions can be reduced, but can you please tell us on what basis these dimensions/variables are reduced? Is it the entropy value, or something else?
He explained it at the start: the reduction is based on the projection of the data.
You can think about it like the shadow of the data in 2 dimensions.
Best explanation of PCA. Could you please make a video on Linear Discriminant Analysis? Also, please explain the eigenvector and eigenvalue concepts behind PCA.
If the orthogonal line gives a huge loss of variance for each variable, then for all 30 features can we take only the 1st component? Why do we even have to consider the 2nd component? Please provide your insights.
You should go into more of the maths and how it's working... anyone can call fit and transform.
This is a super useful video, Krish.
If the data has categorical variables, what do we have to do?
@krish naik, could you please tell us whether PCA is learnable/trainable?
Make a video on PCR.
awesome content... amazing
Explained well!
Thanks for the nice presentation with the hands-on.
great explanation! Thanks Krish
Can we check which two features were selected from the 30?
How can you determine the optimal number of components you should reduce your features to? Love your tutorials, btw!
Generally, we use a scree plot for this. You can plot it using the explained_variance_ratio_ attribute of PCA in sklearn.
Super explanation sir.
Thanks so much, sir.
Thanks for your videos, I'm learning a lot from you. Can you show that when you increase the number of dimensions the model accuracy decreases? Also, is it necessary to reduce my dimensions if I have only a few, like 5? Will I still improve my model if 5 dimensions were reduced to 2?
Hi Krish, nice way you explained it. Thanks.
But I have one question: how can we find out the efficiency of PCA? For example, how can we verify whether reducing to 2 dimensions is less fruitful than reducing to 3 dimensions? In other words, how can we be sure that not much information is lost by PCA?
Why does PCA rotate the axes? What is the significance of that?
Hi, I have a question: what if we have more than 100 features? How will we decide how many components to take?
I mean, is there any methodology to decide n_components?
Step 1: Do PCA with n_components=None.
Step 2: Now view the explained_variance_ratio_ for all of the resulting components (say there are 10 by default).
Step 3: Find the number of components whose ratios sum to the variance you want; that number is your n_components.
Example: say you have 5 components with ratios (0.70, 0.10, 0.08, ...). Your first 3 components explain 88% of the total variance, so you can decide on PCA(n_components=3).
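Those three steps as a sketch (X is assumed to be your scaled feature matrix):

import numpy as np
from sklearn.decomposition import PCA

# Step 1: keep every component.
pca_full = PCA(n_components=None).fit(X)
# Step 2: inspect the per-component variance ratios.
print(pca_full.explained_variance_ratio_)
# Step 3: find the smallest count reaching the target variance, then refit.
k = int(np.searchsorted(np.cumsum(pca_full.explained_variance_ratio_), 0.88)) + 1
X_reduced = PCA(n_components=k).fit_transform(X)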
Hi Krish,
your videos are very useful, thank you.
I have a doubt: the reason we are doing PCA is to reduce the number of features, right? So how will we know which of the given features are useful while applying different models to our data?
We can check with the help of correlation: if a feature has some correlation with the target, whether positive or negative, we could say that feature will be useful for us. Coming to PCA, it's not what you think: PCA comes up with new values using the existing variables, and we use those PCA-derived variables for analysis.
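A hypothetical sketch of that correlation check, assuming an all-numeric pandas DataFrame df with a 'target' column (both names are assumptions here):

import pandas as pd

# Correlation of every feature with the target, strongest first.
corr_with_target = df.corr()['target'].drop('target')
print(corr_with_target.abs().sort_values(ascending=False))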
Super video😀
Thanks
Great explanation. Need to get my hands dirty in a Jupyter notebook. Thanks.
PCA is a statistical technique first invented in 1901 by Karl Pearson.
Hi,
how do we choose the correct n_components during PCA? For example, if I have 80 features in the dataset, how do I choose n_components? Is there any logic for selecting the number of components?
I have the same question; while searching for it I found this:
stackoverflow.com/questions/12067446/how-many-principal-components-to-take
Thanks
Great sir
What is the difference between PCA and SVD?
Hello sir,
I have an ID_code column and a target column, and there are 200,000 observations in the dataset with 202 features in total. How can I apply PCA to this dataset? All the data is numeric.
Is there any math behind selecting the number of components in PCA?
Thanks for the video, Krish. One question: how do we come to know which feature we have to select as a principal component, and why don't other features work in place of cancer['Target'] during scatter plotting?
Did we remove the target/output feature from the dataset before applying PCA?
@krish no.
Hello sir, I have a question: how can we be sure that we have to apply 2 principal components? Why are we neglecting other features? Maybe the other features are important for the model. Please answer this, sir, I'm in doubt.
I also have the same doubt.
We are not neglecting any features; PCA by no means discards some of the features to reduce the dimensions. In PCA we are generally creating linear combinations of all the features, and finally the principal components that explain the maximum variance are selected.
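To see how every original feature feeds into each component, you can inspect the loadings; a sketch, assuming a pca already fitted on the 30-feature breast cancer data:

import pandas as pd
from sklearn.datasets import load_breast_cancer

# One row per principal component, one column per original feature.
loadings = pd.DataFrame(pca.components_,
                        columns=load_breast_cancer().feature_names)
print(loadings.iloc[0].abs().sort_values(ascending=False).head())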
Hi Krish, your videos are great!! Thanks a ton :)
Could you please explain PCA much more mathematically, covering eigenvectors and eigenvalues?
Thanks for the nice video; I had one doubt. How do we decide when to apply PCA? Let's say when the features are 2, 3, or more than 3. Is there any constant number of features for that, and can you explain the math behind it? Kudos and cheers, mate!
It is based on your requirement: the number of columns you need.
I would suggest not applying PCA based on the number of features alone. Instead, we should apply PCA in the following scenarios: 1. to reduce the memory footprint of the dataset; 2. to improve the learning speed of the algorithm; 3. to visualize high-dimensional data in 2D or 3D plots.
super krish
This is great content, Krish. I don't understand how we interpret the two features, though. Would someone please explain the final graph to me?
The new features are a transform of the original features mapped onto new axes; you can't say there is a one-to-one mapping to the original ones. When you predict, it is important to apply the same PCA transform to the input before passing it to the ML algorithm.
Hi Krish, kindly do a video on the ARIMA model.
Still confused... I didn't understand. When we draw a perpendicular line and project the points onto that line, it means we are creating one feature, so what will be the values of that feature?
How will I know what I should set as the value of n_components? How do you decide to reduce 30 features to just 2?
Same problem I am facing.
Which system are you using??
How do I find how many features are obtained from an image if the size of the image is 100×100?
Why does PCA actually require perpendicular lines for the second, third, ... components?
One way of stating the goal of PCA is to find the linear projection that gives you the "best" representation of your data for a given dimensionality. It defines "best" as the representation with the minimal squared reconstruction error.
When looking at PCA from 2 dimensions to 1 dimension, as you do there, you are not actually trying to find the line that best predicts y from x. Rather, you're trying to find the combination of y and x such that the new, combined value "best" represents all your initial 2-D points.
Essentially, the reason PCA considers the perpendicular distance is that it doesn't actually try to model y as a function of x.
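You can see the "minimal squared reconstruction error" idea numerically; a sketch (X is assumed to be your scaled feature matrix):

import numpy as np
from sklearn.decomposition import PCA

# Project down to k components and back up, then measure the squared error.
for k in (1, 2, 3):
    pca = PCA(n_components=k).fit(X)
    X_back = pca.inverse_transform(pca.transform(X))
    print(k, np.mean((X - X_back) ** 2))  # error shrinks as k grows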
@@krishnaik06 thanks
Can PCA be used with multiple linear regression?
Sir, one question: do we always use only a standard scaler before PCA, even if some of the features are highly skewed? Or can we use a robust scaler in that case?
Always use a standard scaler. PCA picks out the eigenvectors of the covariance matrix, and that works best when the features are on comparable scales.
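A minimal sketch of scaling before PCA, using a Pipeline so the same fitted scaler is reused everywhere (swapping in RobustScaler for skewed features would be an assumption, not something from the video):

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# The pipeline scales the raw matrix X, then projects it to 2 components.
pipeline = make_pipeline(StandardScaler(), PCA(n_components=2))
X_reduced = pipeline.fit_transform(X)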
Hi Krish, great video. I have one question: how do you decide the n_components value? Is there an ideal value, or should it be decided based on the initial number of features?
Step 1: Do PCA with n_components=None.
Step 2: Now view the explained_variance_ratio_ for all of the resulting components (say there are 10 by default).
Step 3: Find the number of components whose ratios sum to the variance you want; that number is your n_components.
Example: say you have 5 components with ratios (0.70, 0.10, 0.08, ...). Your first 3 components explain 88% of the total variance, so you can decide on PCA(n_components=3).
There is something called a scree plot; read about it. You pick the PC number where the variance explained becomes roughly constant.
It is drawn with the cumulative variance explained on the y-axis and the PC number on the x-axis. Some people also use eigenvalues instead of variance explained; it gives you the same thing.
@krishnaik wanna comment?
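A sketch of that scree plot (X is assumed to be your scaled feature matrix):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

pca = PCA().fit(X)
ks = np.arange(1, len(pca.explained_variance_ratio_) + 1)
plt.plot(ks, np.cumsum(pca.explained_variance_ratio_), marker='o')
plt.xlabel('Principal component number')
plt.ylabel('Cumulative variance explained')
plt.show()  # pick the point where the curve flattens out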
My question is: why don't you consider PCA an ML technique? I have used PCA for unsupervised clustering and achieved amazing results.
@@socially_apt Yes, true, but it purely depends on the data.
@@swathys7818 You might have explained it well, but I didn't get what you said. Could you please elaborate a bit more? Thanks in advance.
How do we convert the input values to two features to predict whether a person has cancer or not?
It is not a prediction problem.
Nice video. Can you also apply a linear regression on top of the PCA and show a sample? I mean, do a train/test split and predict, just to see how it works?
Why don't you start a community on Slack?
That would be awesome
Sir, so as per your explanation, can I say that while transforming an image we should select PC1 only, as there is less data loss in that?
For visualization in 2D, the first two PC lines are enough.
@@bharathkumar5870 But how do we decide the PC1 line in the first place?
So... if we apply PCA to reduce the number of dimensions of our dataset, and then create a model to predict a class (as in the cancer dataset), what happens when we receive information from a new patient and we need to make the classification?
In other words, how do we handle new data given to us in the original format (all features) if our classification algorithm is based on the new PCA variables?
We have to apply the same fitted PCA to every new patient's data and then send the result to the model.
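A sketch of that workflow: fit the scaler, the PCA, and a classifier on the training data, then push a new raw-format row through the same fitted pipeline before predicting (LogisticRegression is an assumed choice, not necessarily what the video uses):

from sklearn.datasets import load_breast_cancer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

data = load_breast_cancer()
model = make_pipeline(StandardScaler(), PCA(n_components=2), LogisticRegression())
model.fit(data.data, data.target)

new_patient = data.data[:1]        # stand-in for one new row with all 30 features
print(model.predict(new_patient))  # the pipeline scales and projects it first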
We need to find totally independent components in those features, which can only be obtained from the eigenvalues.
How do we decide the number of features for PCA?
Hello Linux Tubers, that depends on how much variance you want to capture after dimensionality reduction (more variance == preserving more information).
I created a script for this; it might be helpful to you if you are using MATLAB:
th-cam.com/video/iMHTgwTFJjQ/w-d-xo.html
Happy learning :-)
Thanks Bhai..
Clear explanation; keep uploading videos like this.
Hello sir, I have a question about PCA: why are we neglecting other features? Maybe the other features are important for the model. On what basis is a particular column selected? Please answer this, sir, I'm in doubt.
He never replies...
PCA vs feature selection?
In this playlist, the previous video is private. Can we have it?
How do we decide n_components?
Without finding the eigenvalues and eigenvectors, how can you determine that n_components = 2?
Why has Krish not answered this? Actually, this is a good question.
I think you could plot a scree plot so that you know how many components are representative enough.
Here the original data is 2-dimensional, so after applying PCA we cannot go beyond the original number of dimensions; N=2 is the maximum PCA can generate.
Getting more dimensions than the original dataset through a transformation is something we can do with a kernel function. The kernel functions used in kernel SVMs project the data into a higher dimension so that the classes have a clearly separable boundary, allowing a linear classifier boundary with a margin to be drawn for the SVC classifier.
Hence, PCA is for compression and the kernel is for the opposite.
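Related to that point: sklearn combines the two ideas in KernelPCA. A sketch on a toy dataset where plain PCA cannot separate the classes linearly (make_circles is an assumed stand-in dataset):

from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

# Two concentric circles: linearly inseparable in the original 2D space.
X_circles, y_circles = make_circles(n_samples=400, factor=0.3, noise=0.05)

# The RBF kernel implicitly maps to a higher dimension before projecting.
X_kpca = KernelPCA(n_components=2, kernel='rbf', gamma=10).fit_transform(X_circles)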
Can you do PCA for a regression problem? I'm curious how different it is to implement PCA in supervised learning, as I saw in some articles.
I am thinking the same.
@@manusingh9007 I have seen the mathematics side of it and it seems quite confusing. I hope somebody does more videos about this.
How do I find the actual dataset?
I have a doubt: whenever we have many features in a dataset, do we compulsorily have to use PCA?
Yes
Why don't we use PCA in every project to reduce the dimensions? When should we apply PCA?
No, we cannot always use PCA; we only use it when we have too many features or variables.
PCA also uses some computation in the background; if there are only a few features it will not give much benefit.
Good crisp explanation but not detailed.
Bro, I think you know Tamil; if possible, explain at least one video in Tamil. - Aravind
Thanks