Dimensionality Reduction | Principal Component Analysis

  • Published on Sep 15, 2024
  • Here is a detailed explanation of Dimensionality Reduction using Principal Component Analysis.
    Github link: github.com/kri...
    Please subscribe to the channel
    / @krishnaik06
    Machine Learning Playlist: • Data Science and Machi...
    You can buy my book, where I have provided a detailed explanation of how we can use Machine Learning and Deep Learning in Finance using Python.
    Packt URL: prod.packtpub....
    Amazon URL: www.amazon.com...

Comments • 134

  • @shushantgambhir2002
    @shushantgambhir2002 3 months ago

    This is one of the best videos on Internet for this topic.
    Can't thank you enough sir.

  • @kamalkantverma6252
    @kamalkantverma6252 5 years ago +2

    Thanks for making this type of content. You explain things in a very clear and easy way.

  • @vineetsansi
    @vineetsansi 5 years ago +7

    I think it's better to specify how much variance you want to keep rather than the number of components. For example:
    PCA(0.80)
    # this will retain 80% of the variance and create as many principal components as needed to keep it.
    Hope this is helpful
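    A minimal runnable sketch of this suggestion (assuming scikit-learn and the breast cancer dataset used in the video):

    from sklearn.datasets import load_breast_cancer
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA

    X = StandardScaler().fit_transform(load_breast_cancer().data)

    # A float in (0, 1) tells PCA to keep as many components as are
    # needed to explain that fraction of the total variance.
    pca = PCA(n_components=0.80)
    X_reduced = pca.fit_transform(X)
    print(X_reduced.shape, pca.explained_variance_ratio_.sum())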

    • @alankarshukla4385
      @alankarshukla4385 4 years ago

      Can we always use PCA for creating our ML model?

    • @rehansiddique1875
      @rehansiddique1875 4 years ago

      @@alankarshukla4385 No, we cannot always use PCA; we only use it when we have too many features or variables.

    • @manusingh9007
      @manusingh9007 4 years ago +1

      @@rehansiddique1875 Why is PCA only applicable to unsupervised models?

    • @rehansiddique1875
      @rehansiddique1875 4 years ago

      @@manusingh9007 You can also use it with supervised models.

  • @kevinkennynatashawilfredpa9023
    @kevinkennynatashawilfredpa9023 10 months ago

    Thank you Krish, for the concise and clear explanation!

  • @bea59kaiwalyakhairnar37
    @bea59kaiwalyakhairnar37 2 years ago

    Sir, the video is very helpful and the analysis is spot on.

  • @pritamgorain8365
    @pritamgorain8365 4 years ago +4

    Thanks for the video Krish.
    But I'm wondering: a fresher like me can get puzzled by the many feature selection techniques. It would be great if you could tell us which technique to use and when.
    Regards,
    Pritam

  • @srashtisingh1799
    @srashtisingh1799 2 years ago

    Excellent!! Your whole channel is extremely helpful. Very well explained.

  • @AkshaykumarPatilAkki
    @AkshaykumarPatilAkki 4 years ago

    Super explanation, Anna.
    You rocked data science.

  • @sunilc8684
    @sunilc8684 4 years ago +1

    Best explanation of PCA. Could you please make a video on Linear Discriminant Analysis? Also, please explain the eigenvector and eigenvalue concepts behind PCA.

  • @manojnahak7776
    @manojnahak7776 4 years ago +5

    Using PCA the number of dimensions can be reduced, but can you please tell us on what basis these dimensions/variables are reduced? Is it the entropy value, or something else?

    • @abul4933
      @abul4933 2 years ago

      He explained it at the start: the reduction is based on the projection of the data.
      You can think of it like the shadow of the data in two dimensions.
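      A tiny NumPy sketch of this "shadow" intuition (illustrative only, not from the video): project centered 2-D points onto the first principal component.

      import numpy as np

      rng = np.random.default_rng(0)
      X = rng.normal(size=(100, 2)) @ np.array([[3.0, 1.0], [1.0, 0.5]])  # correlated 2-D data
      Xc = X - X.mean(axis=0)                       # center the data

      # Eigenvector of the covariance matrix with the largest eigenvalue
      eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
      pc1 = eigvecs[:, -1]                          # eigh sorts eigenvalues ascending

      shadow = Xc @ pc1                             # 1-D "shadow" of each point along PC1
      print(shadow.shape)                           # (100,)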

  • @adityasingh788
    @adityasingh788 5 years ago +2

    Thank you for putting the video back :)

  • @jazzorcazz
    @jazzorcazz 1 year ago

    Very good explanation. Thank you so much!

  • @hassamsiddiqui3373
    @hassamsiddiqui3373 1 year ago

    Amazingly explained video, sir. Keep it up.

  • @sandipansarkar9211
    @sandipansarkar9211 4 years ago

    Great. Now I have completed my practice inside a Jupyter notebook successfully. Cheers

  • @soumendradash5979
    @soumendradash5979 4 years ago +3

    I have some doubts. First, can we apply PCA to categorical data? Second, how can we calculate the optimum number for n_components? Do we have to calculate the explained variance by manually trying out different values of n_components?

    • @raj_harsh_
      @raj_harsh_ 2 years ago

      You can use np.cumsum on the explained variance ratios to see how much variance each number of components explains. Say 7 components explain 80% of the variance; then use those 7 components for your model.
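      A quick sketch of this (assuming scikit-learn; the 0.80 target is illustrative):

      import numpy as np
      from sklearn.datasets import load_breast_cancer
      from sklearn.preprocessing import StandardScaler
      from sklearn.decomposition import PCA

      X = StandardScaler().fit_transform(load_breast_cancer().data)
      pca = PCA().fit(X)
      print(np.cumsum(pca.explained_variance_ratio_))
      # e.g. [0.44 0.63 0.73 ...] -> read off where the running total crosses 0.80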

  • @justusndegwa
    @justusndegwa 2 years ago

    Fantastic. Thanks from Nairobi, Krish.

  • @jongcheulkim7284
    @jongcheulkim7284 2 years ago

    Thank you. This is very helpful.

  • @the_imposter_analyst
    @the_imposter_analyst 5 years ago +2

    How can you determine the optimal number of components you should reduce your features to? Love your tutorials btw!!!

    • @rajanbalki7553
      @rajanbalki7553 3 years ago +1

      Generally, we use a scree plot for this. You can build it from the explained_variance_ratio_ attribute of PCA in sklearn.
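      A minimal scree-plot sketch (assuming scikit-learn and matplotlib; the dataset choice is illustrative):

      import matplotlib.pyplot as plt
      from sklearn.datasets import load_breast_cancer
      from sklearn.preprocessing import StandardScaler
      from sklearn.decomposition import PCA

      X = StandardScaler().fit_transform(load_breast_cancer().data)
      pca = PCA().fit(X)

      # Explained variance per component; look for the "elbow"
      plt.plot(range(1, len(pca.explained_variance_ratio_) + 1),
               pca.explained_variance_ratio_, marker="o")
      plt.xlabel("Principal component")
      plt.ylabel("Explained variance ratio")
      plt.title("Scree plot")
      plt.show()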

  • @pankajgoikar4158
    @pankajgoikar4158 2 years ago

    Thank you so much Sir.

  • @osho2810
    @osho2810 2 years ago

    Thanks sir... it is great.....

  • @betanapallisandeepra
    @betanapallisandeepra 2 years ago

    Wonderful.. thank you for doing it sir

  • @gurinderpartapsingh8694
    @gurinderpartapsingh8694 4 years ago +2

    Hello sir, I have a question: how can we be sure that we should keep only 2 principal components? Why are we neglecting other features, when those features may be important for the model? Please answer this sir, I'm in doubt.

    • @varunkukade7971
      @varunkukade7971 4 years ago

      I also have the same doubt.

    • @deepjyotisaikia382
      @deepjyotisaikia382 4 years ago

      We are not neglecting any features; PCA does not mean discarding some of the features to reduce the dimensions. In PCA we create linear combinations of all the features, and finally the principal components that explain the maximum variance are selected.
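      A small sketch of this point (assuming scikit-learn): each principal component carries a weight for every one of the original features.

      from sklearn.datasets import load_breast_cancer
      from sklearn.preprocessing import StandardScaler
      from sklearn.decomposition import PCA

      data = load_breast_cancer()
      X = StandardScaler().fit_transform(data.data)
      pca = PCA(n_components=2).fit(X)

      # components_ has shape (2, 30): no original feature is dropped
      print(pca.components_.shape)
      for name, w in zip(data.feature_names[:5], pca.components_[0, :5]):
          print(f"{name}: {w:+.3f}")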

  • @SantoshKumar-fr5tm
    @SantoshKumar-fr5tm 4 years ago +1

    Hi Krish, you explained it nicely. Thanks.
    But I have one question: how can we measure the efficiency of PCA? For example, how can we tell that reducing to 2 dimensions is not as fruitful as reducing to 3 dimensions? In other words, how can we be sure that not much information is lost by PCA?
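    One hedged way to quantify the information lost (a sketch assuming scikit-learn): compare the total explained variance and the reconstruction error at different n_components.

    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA

    X = StandardScaler().fit_transform(load_breast_cancer().data)
    for n in (2, 3):
        pca = PCA(n_components=n).fit(X)
        X_rec = pca.inverse_transform(pca.transform(X))   # back to the original 30-D space
        mse = np.mean((X - X_rec) ** 2)                   # reconstruction error
        print(n, pca.explained_variance_ratio_.sum(), mse)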

  • @johannesmphaka7433
    @johannesmphaka7433 5 years ago +2

    Thanks for your videos, I'm learning a lot from you. Can you show that when you increase the number of dimensions the model accuracy decreases? Also, is it necessary to reduce my dimensions if I have only a few, like 5? Will I still improve my model if those 5 dimensions were reduced to 2?

  • @vijaymurugesan581
    @vijaymurugesan581 4 years ago

    Hi Krish, your videos are great!! Thanks a ton :)

  • @vinyasshreedhar9833
    @vinyasshreedhar9833 3 years ago

    If for each variable the orthogonal line gives a huge loss of variance, then for all 30 features we can only take the 1st component, right? Why do we even have to consider the 2nd component? Please provide your insights.

  • @dilipgawade9686
    @dilipgawade9686 5 years ago +1

    This is a super useful video, Krish.

  • @gopalakrishna9510
    @gopalakrishna9510 4 years ago +2

    If the data has categorical variables, what do we have to do?

  • @ramleo1461
    @ramleo1461 5 years ago +2

    Hi Krish,
    Your videos are very useful, thank you for them.
    I have a doubt: the reason we are doing PCA is to reduce the number of features, right? So how will we know which features from the given data are useful while applying different models to our data?

    • @g.sai_koushik
      @g.sai_koushik 1 year ago

      We can check with the help of correlation: if a feature has some correlation, whether positive or negative, we could say that feature will be useful for us. Coming to PCA, it's not what you think; PCA comes up with new values from the existing variables, and we use those PCA-derived variables for analysis.

  • @sandipansarkar9211
    @sandipansarkar9211 4 years ago

    Great explanation. Need to get my hands dirty in a Jupyter notebook. Thanks

  • @Sir_AD
    @Sir_AD 5 months ago

    Can we check which two features are selected from the 30?

  • @MrSubhransusekhar
    @MrSubhransusekhar 3 years ago

    Beautifully explained

  • @TheMangz1611
    @TheMangz1611 3 years ago

    Should go into more maths and how it's working... anyone can fit and transform.

  • @souhamahmoudi7745
    @souhamahmoudi7745 1 year ago

    Thanks for sharing!

  • @vishalsharda7508
    @vishalsharda7508 2 years ago

    Thanks for this video.👌👌👌

  • @nikitasinha8181
    @nikitasinha8181 1 year ago

    Thank you so much sir

  • @lovefrommars7468
    @lovefrommars7468 4 years ago

    Still confused... I didn't understand: when we plot a perpendicular line and project the points onto that line, it means we are creating one feature, so what will be the values for that feature?

  • @divyanipal2705
    @divyanipal2705 4 years ago +1

    Hi, I have a question: what if we have more than 100 features? How do we decide how many components to take?
    I mean, is there any methodology to decide n_components?

    • @swathys7818
      @swathys7818 4 years ago +6

      Step 1: Do PCA with n_components = None.
      Step 2: Now view the explained_variance_ratio_ for, say, the default 10 PCA components.
      Step 3: Find the number of components whose summed variance reaches the level you want; that is your n_components.
      Example: say you have 5 components (0.70, 0.10, 0.08, ...). Your first 3 components can explain 88% of your total variance, hence you can decide PCA(n_components = 3).
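      A sketch of these three steps in code (assuming scikit-learn; the 88% target mirrors the example above):

      import numpy as np
      from sklearn.datasets import load_breast_cancer
      from sklearn.preprocessing import StandardScaler
      from sklearn.decomposition import PCA

      X = StandardScaler().fit_transform(load_breast_cancer().data)

      pca_full = PCA(n_components=None).fit(X)                 # Step 1: keep all components
      cumvar = np.cumsum(pca_full.explained_variance_ratio_)   # Step 2: inspect the ratios
      n = int(np.argmax(cumvar >= 0.88)) + 1                   # Step 3: smallest count reaching 88%
      X_reduced = PCA(n_components=n).fit_transform(X)         # refit with the chosen count
      print(n, X_reduced.shape)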

  • @kushkumar6467
    @kushkumar6467 3 years ago +2

    Thanks for the nice video, I had one doubt. So how do we decide when to apply PCA? Let's say when the features are 2, 3, or more than 3. Is there any constant number of features for that and can you explain the math behind it? Kudos and cheers mate!

    • @saipavan5194
      @saipavan5194 2 years ago

      It's based on the requirement: the number of columns you need.

    • @vishalverma5837
      @vishalverma5837 2 years ago

      I would suggest not applying PCA based on the number of features. Instead, we should apply PCA in the following scenarios: 1. to reduce the memory space for the data set; 2. to improve the learning speed of the algorithm; 3. to visualize high-dimensional data in 2D or 3D plots.

  • @saikatroy3818
    @saikatroy3818 4 years ago

    Thanks for the nice presentation with hands-on.

  • @DanielWeikert
    @DanielWeikert 5 years ago +3

    Why does PCA actually require perpendicular lines for the second, third, ... components?

    • @krishnaik06
      @krishnaik06  5 years ago +19

      One way of stating the goal of PCA is to find the linear projection that gives you the "best" representation of your data for a given dimensionality. It defines "best" as the representation with the minimal squared reconstruction error.
      When looking at PCA from 2 dimensions to 1 dimension, as you do there, you are not actually trying to find the line that best predicts y from x. Rather, you're trying to find the combination of y and x such that the new, combined value "best" represents all your initial 2-D points.
      Essentially, the reason PCA considers the perpendicular distance is that it doesn't actually try to model y as a function of x.
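      A numeric illustration of this answer (a NumPy sketch, not from the video): least squares minimizes vertical error when predicting y from x, while PC1 minimizes perpendicular reconstruction error, so the two lines generally differ.

      import numpy as np

      rng = np.random.default_rng(1)
      x = rng.normal(size=200)
      y = 0.5 * x + rng.normal(scale=0.3, size=200)
      X = np.column_stack([x, y]) - [x.mean(), y.mean()]      # centered data

      slope_ols = (X[:, 0] @ X[:, 1]) / (X[:, 0] @ X[:, 0])   # regression slope (vertical error)

      eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
      pc1 = eigvecs[:, -1]
      slope_pc1 = pc1[1] / pc1[0]                             # PC1 direction (perpendicular error)

      print(slope_ols, slope_pc1)   # close, but generally not equal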

    • @DanielWeikert
      @DanielWeikert 5 years ago +1

      @@krishnaik06 thanks

  • @MonilModi10
    @MonilModi10 1 year ago

    Why does PCA rotate the axes? What is the significance of that?

  • @apekshaagnihotri5124
    @apekshaagnihotri5124 4 years ago +1

    This is great content, Krish. I don't understand how we interpret the two features. Would someone please explain the final graph to me?

    • @SivaKumar-ny8pg
      @SivaKumar-ny8pg 3 years ago

      The features are transforms of the original features mapped onto new axes; you can't say there is a one-to-one mapping to the original ones. When you predict, it is important that you apply the same PCA to the prediction input before passing it to the ML algorithm.
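      A sketch of the kind of final graph being discussed (assuming scikit-learn and matplotlib): the 30-feature cancer data projected onto two principal components, colored by the target.

      import matplotlib.pyplot as plt
      from sklearn.datasets import load_breast_cancer
      from sklearn.preprocessing import StandardScaler
      from sklearn.decomposition import PCA

      data = load_breast_cancer()
      X2 = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(data.data))

      plt.scatter(X2[:, 0], X2[:, 1], c=data.target, cmap="coolwarm", s=15)
      plt.xlabel("First principal component")
      plt.ylabel("Second principal component")
      plt.show()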

  • @deepcontractor6968
    @deepcontractor6968 4 years ago +1

    Make a video on PCR.

  • @prernanichani8516
    @prernanichani8516 3 years ago +1

    Hi,
    How do I choose the correct n_components during PCA? For example, if I have 80 features in the dataset, how do I choose n_components? Is there any logic for selecting the number of components?

    • @KK-rh6cd
      @KK-rh6cd 3 years ago +1

      I have the same question; while searching for it I found this:
      stackoverflow.com/questions/12067446/how-many-principal-components-to-take

  • @pablo_CFO
    @pablo_CFO 4 years ago +3

    So ... if we apply PCA to reduce the number of dimensions of our dataset, and then create a model to predict a class (as in the cancer dataset), what happens if we receive information from a new patient and we need to make the classification?
    In other words, how do we handle the new data given to us in the original format (all features) if our classification algorithm is based on the new variables of the PCA?

    • @adityakyatham
      @adityakyatham 4 years ago +3

      We have to apply the same fitted PCA to every new patient's data and then send it to the model.
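      A sketch of this workflow (assuming scikit-learn; the logistic regression classifier is an illustrative choice): fit the scaler and PCA once on training data, then reuse the same fitted pipeline for new patients.

      from sklearn.datasets import load_breast_cancer
      from sklearn.model_selection import train_test_split
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler
      from sklearn.decomposition import PCA
      from sklearn.linear_model import LogisticRegression

      X, y = load_breast_cancer(return_X_y=True)
      X_train, X_new, y_train, y_new = train_test_split(X, y, random_state=0)

      model = make_pipeline(StandardScaler(), PCA(n_components=2),
                            LogisticRegression(max_iter=1000))
      model.fit(X_train, y_train)

      # New patients arrive in the original 30-feature format; the pipeline
      # applies the SAME scaling and PCA projection before classifying.
      print(model.predict(X_new[:5]))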

    • @asutoshghanto3419
      @asutoshghanto3419 3 years ago

      We need to find totally independent components in those features, which can only be obtained from the eigenvalues.

  • @HonestADVexplorer
    @HonestADVexplorer 4 years ago

    great explanation! Thanks Krish

  • @travelling.pandas
    @travelling.pandas 4 years ago

    Thanks for the video Krish. One question: how do we come to know which feature to select as a principal component, and why, during scatter plotting, don't other features work in place of cancer['Target']?

  • @md.ahsanulkabirarif5448
    @md.ahsanulkabirarif5448 3 years ago +10

    Without finding the eigenvalues & eigenvectors, how can you determine that n_components = 2?

    • @basavarajpatil9821
      @basavarajpatil9821 3 years ago +2

      Why hasn't Krish answered this? Actually this is a good question.

    • @brianlatuconsina6169
      @brianlatuconsina6169 3 years ago

      I think you could try to plot the scree plot so that you know how many components are representative enough.

    • @SoumendraBagh
      @SoumendraBagh 3 years ago +2

      Here the original data is 2-dimensional, so after applying PCA we cannot go beyond the original number of dimensions; N=2 is the maximum PCA can generate.
      Getting more dimensions than the original dataset through a transformation is something we can do with a kernel function. Kernel functions, as used in kernel SVMs, project the data into a higher dimension so that it has a clearly separable boundary, allowing a linear classifier boundary with margin to be drawn for an SVC classifier.
      Hence, PCA is for compression and the kernel is for the opposite.
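      A sketch of the contrast drawn above (assuming scikit-learn; the circles dataset is illustrative): plain PCA cannot add dimensions, while KernelPCA implicitly maps the data into a richer feature space.

      from sklearn.datasets import make_circles
      from sklearn.decomposition import PCA, KernelPCA

      X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

      X_pca = PCA(n_components=2).fit_transform(X)    # just a rotation; circles stay tangled
      X_kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10).fit_transform(X)
      print(X_pca.shape, X_kpca.shape)                # in the RBF embedding the circles separate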

  • @fitbeat8231
    @fitbeat8231 4 years ago

    Hello sir,
    I have two columns, ID_code and target. There are a total of 200000 observations in the dataset and 202 features. How can I apply PCA to this dataset? All the data is numeric.

  • @af121x
    @af121x 3 years ago

    Thank you Krish * 1 million

  • @tagoreji2143
    @tagoreji2143 2 years ago

    Thanks so much, sir.

  • @mehnaztabassum1878
    @mehnaztabassum1878 3 years ago

    @krish naik, could you please tell us whether PCA is learnable/trainable?

  • @anands2239
    @anands2239 4 years ago

    Nice video. Can you also apply a linear regression on top of the PCA and show a sample? I mean, do a train/test run and predict, just to see how it works?

  • @CFATrainer
    @CFATrainer months ago

    Excellent

  • @shashiaradya6905
    @shashiaradya6905 5 years ago

    Super explanation sir.

  • @kalpanaregmi2137
    @kalpanaregmi2137 4 years ago

    Hello sir, I have a question about PCA: why are we neglecting other features, when those features may be important for the model? On what basis is a particular column selected? Please answer this sir, I'm in doubt.

  • @itsmesuchethanv
    @itsmesuchethanv 3 years ago

    How do we find how many features are obtained from an image if the size of the image is 100*100?

  • @linuxtubers7313
    @linuxtubers7313 4 years ago +1

    How do we decide the number of components for PCA?

    • @KnowledgeAmplifier1
      @KnowledgeAmplifier1 3 years ago

      Hello Linux Tubers, that depends on how much variance you want to capture after dimensionality reduction (more variance == preserving more information).
      I created a script for this, which might be helpful to you if you are using MATLAB --
      th-cam.com/video/iMHTgwTFJjQ/w-d-xo.html
      Happy Learning :-)

  • @mayankgupta1728
    @mayankgupta1728 1 year ago

    Thanks

  • @JaiSreeRam466
    @JaiSreeRam466 4 years ago +1

    How do we convert the input values to two features to predict whether a person has cancer or not?

  • @ankitac4994
    @ankitac4994 3 years ago

    Explained well!

  • @jetendramulinti6443
    @jetendramulinti6443 5 years ago +1

    Super video😀

  • @mercyjhansi8190
    @mercyjhansi8190 2 years ago

    Sir, one question: do we always use only StandardScaler before PCA, even if some of the features are highly skewed? Or can we use RobustScaler in that case?

    • @unfiltered_24
      @unfiltered_24 1 year ago

      Always use StandardScaler. PCA basically picks out eigenvectors, which work best with scaled numbers.
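      A small sketch of why scaling matters here (assuming scikit-learn): without scaling, the features with the largest units dominate the principal components.

      from sklearn.datasets import load_breast_cancer
      from sklearn.preprocessing import StandardScaler
      from sklearn.decomposition import PCA

      X = load_breast_cancer().data

      raw = PCA(n_components=2).fit(X)
      scaled = PCA(n_components=2).fit(StandardScaler().fit_transform(X))

      print(raw.explained_variance_ratio_)     # roughly [0.98, 0.02]: one scale dominates
      print(scaled.explained_variance_ratio_)  # more evenly spread after scaling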

  • @jayeshkumar2604
    @jayeshkumar2604 3 years ago

    awesome content... amazing

  • @KVishya
    @KVishya 4 years ago +1

    Hi Krish, great video. I have one question: how do you decide the n_components value? Is there an ideal value, or should it be decided based on the initial number of features?

    • @swathys7818
      @swathys7818 4 years ago +8

      Step 1: Do PCA with n_components = None.
      Step 2: Now view the explained_variance_ratio_ for, say, the default 10 PCA components.
      Step 3: Find the number of components whose summed variance reaches the level you want; that is your n_components.
      Example: say you have 5 components (0.70, 0.10, 0.08, ...). Your first 3 components can explain 88% of your total variance, hence you can decide PCA(n_components = 3).

    • @socially_apt
      @socially_apt 4 years ago +5

      There is something called a scree plot. Read about it. You pick the PC number where the explained variance levels off.
      It is drawn with cumulative explained variance on the y-axis and PC number on the x-axis. Some people also use eigenvalues instead of explained variance; they give you the same thing.
      @krishnaik wanna comment?

    • @socially_apt
      @socially_apt 4 years ago +2

      My question is: why don't you consider PCA an ML technique? I have used PCA for unsupervised clustering and achieved amazing results.

    • @swathys7818
      @swathys7818 4 years ago +2

      @@socially_apt Yes, true, but it purely depends on the data.

    • @chillbro2432
      @chillbro2432 3 years ago

      @@swathys7818 You might have explained it well, but I didn't get what you said. Could you please elaborate a bit more? Thanks in advance.

  • @prachinainawa3055
    @prachinainawa3055 3 years ago

    How will I know what to set as the value of n_components? How do you decide to reduce 30 features to just 2?

  • @sakshamshivhare2474
    @sakshamshivhare2474 2 years ago

    Is there any math behind selecting the number of components in PCA?

  • @Nifty1976
    @Nifty1976 4 years ago

    PCA is a statistical technique first introduced in 1901 by Karl Pearson.

  • @monikajain7803
    @monikajain7803 3 years ago

    Sir, as per your explanation, can I say that while transforming an image we should select PC1 only, as there is less data loss in that?

    • @bharathkumar5870
      @bharathkumar5870 3 years ago

      For visualization in 2D, the first two PC lines are enough.

    • @abhijeetjain8228
      @abhijeetjain8228 2 years ago

      @@bharathkumar5870 But how do we decide the PC1 line in the first place?

  • @swethakulkarni3563
    @swethakulkarni3563 4 years ago +2

    Why don't you start a community on Slack?

  • @mallarapubharath
    @mallarapubharath 4 years ago

    Could you please explain PCA much more mathematically, e.g. explaining eigenvectors and eigenvalues?

  • @nehamanpreet1044
    @nehamanpreet1044 4 years ago

    What is the difference between PCA and SVD?

  • @midhileshmomidi2434
    @midhileshmomidi2434 5 years ago

    I have a doubt: whenever we have many features in a dataset, is it compulsory to use PCA?

  • @tanaygupta2865
    @tanaygupta2865 3 years ago

    Did we remove the target/output feature from the dataset before applying PCA?
    @krish

  • @prakashdwivedy5739
    @prakashdwivedy5739 4 years ago

    Can PCA be used with multiple linear regression?

  • @sachinx30
    @sachinx30 5 years ago

    In this playlist, the previous video is private. Can we have it?

  • @harishvijay8490
    @harishvijay8490 5 years ago

    Clear explanation; keep uploading videos like this.

  • @manjunath.c2944
    @manjunath.c2944 5 years ago

    Hi Krish, kindly do a video on the ARIMA model.

  • @saurabhbarasiya4721
    @saurabhbarasiya4721 4 years ago

    Great sir

  • @equbalmustafa
    @equbalmustafa 3 years ago

    How to decide n_components?

  • @sathvikjoel1525
    @sathvikjoel1525 4 years ago

    Which system are you using?

  • @florinaling2902
    @florinaling2902 4 years ago

    Can you do PCA for a regression problem? I'm curious how different it is to implement PCA in supervised learning, as I saw in some articles.

    • @manusingh9007
      @manusingh9007 4 years ago

      I'm thinking the same.

    • @florinaling2902
      @florinaling2902 4 years ago

      @@manusingh9007 I have seen the mathematics side of it and it seems quite confusing. I hope somebody does more videos about this.

  • @ShahzadQureshii
    @ShahzadQureshii 3 years ago

    PCA vs feature selection?

  • @bharteshtandon5095
    @bharteshtandon5095 4 years ago

    Why don't we use PCA in every project to reduce the dimensions? When should we apply PCA?

    • @rehansiddique1875
      @rehansiddique1875 4 years ago

      No, we cannot always use PCA; we only use it when we have too many features or variables.

    • @kulamanisahoo4785
      @kulamanisahoo4785 4 years ago

      PCA also uses some computation in the background; if there are only a few features it will not give much benefit.

  • @mdashad1582
    @mdashad1582 4 years ago

    How do I find the actual dataset?

  • @pmanojkumar5260
    @pmanojkumar5260 5 years ago

    Thanks Bhai..

  • @manjunath.c2944
    @manjunath.c2944 5 years ago

    Super, Krish!

  • @ankitac4994
    @ankitac4994 2 years ago

    Good crisp explanation but not detailed.

  • @thepresistence5935
    @thepresistence5935 3 years ago

    Bro, I think you know Tamil; if possible, please explain just one video in Tamil. - Aravind

  • @debatradas1597
    @debatradas1597 3 years ago

    Thanks