Data Analysis 6: Principal Component Analysis (PCA) - Computerphile

  • Published 24 Nov 2024

Comments • 123

  • @Computerphile
    @Computerphile  5 years ago +14

    Check out the full Data Analysis Learning Playlist: th-cam.com/play/PLzH6n4zXuckpfMu_4Ff8E7Z1behQks5ba.html

    • @7177YT
      @7177YT 5 years ago +3

      awesome, thank you!!

    • @injeel_ahmed
      @injeel_ahmed 3 years ago +1

      FINALLY!!! I watched like 20 videos before this to understand PCA ( intuition ) and no one could explain it like you. THANKS A LOT MAN.

    • @dmarsub
      @dmarsub 3 years ago +1

      It is data reduction if you only plot PC1 and PC2 as a two-dimensional graph,
      which is very common.
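
    A minimal R sketch of that reduction, using the built-in USArrests data rather than the dataset from the video: keeping only PC1 and PC2 turns four columns into a two-dimensional plot.

        # PCA on a small built-in dataset; scale. = TRUE standardises each column
        pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

        # Plotting only the first two principal components is the usual 2-D reduction
        plot(pca$x[, 1], pca$x[, 2], xlab = "PC1", ylab = "PC2",
             main = "USArrests projected onto PC1 and PC2")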

  • @skydrow4523
    @skydrow4523 5 years ago +218

    Thank you Dr. Mike. I showed this to my neighbors and they told me it totally changed their life. My village also greatly appreciated PCA.

    • @sebastianx21
      @sebastianx21 2 years ago +19

      Did you show it to your parents as well? Do they still love you?

    • @dexterdev
      @dexterdev 1 year ago +13

      Did PCA transform your village?

  • @AwesomeCrackDealer
    @AwesomeCrackDealer 5 years ago +169

    Holy shit, this PCA explanation was just what I needed all this time

    • @zerokelvin3626
      @zerokelvin3626 5 years ago +4

      Same for me

    • @nicholaselliott2484
      @nicholaselliott2484 8 months ago +2

      Yep, it boggles the mind how formalism can completely obscure intuition. I guess the formal stuff works for the academic types

  • @heyandy889
    @heyandy889 5 years ago +50

    pretty dope. here I was laboring away in 223 dimensions. now I can put food on the table for my family with the time saved by removing 100 dimensions. thank u dr mike pound and computerphile

  • @adamtarnawski
    @adamtarnawski 5 years ago +81

    Dr Mike provided the best explanation of PCA to non-experts that I have ever seen. A very enjoyable and insightful video overall.

    • @nomen385
      @nomen385 2 years ago +2

      Yea. Everything he explains feels that way

  • @kanewilliams1653
    @kanewilliams1653 8 months ago +1

    Why even have lectures? This fella explained why we "maximize the variance" so clearly in the first 5 minutes.. Lecturers should just make us watch this video in class... great stuff!

  • @mrcoomber9085
    @mrcoomber9085 5 years ago +44

    He's such a great presenter. Thank you for such wonderful videos.

  • @nitika9769
    @nitika9769 10 months ago +1

    I finally get it!! It's people like you that keep me motivated for my work !

  • @manuarteteco6153
    @manuarteteco6153 4 years ago +11

    Best PCA explanation I found so far, and I searched for days. Thanks man!

  • @harpercfc_
    @harpercfc_ 1 year ago +1

    I gotta say I enjoyed this video so much and kinda started to understand what PCA is and what it is used for. Totally a new and different angle to look at this concept. Thank you again Dr. Mike.

  • @jsraadt
    @jsraadt 5 years ago +7

    I recommend doing a parallel analysis before extracting principal components. This will tell you how many PCs explain more variance than can be explained at random.
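
    One way to run that parallel analysis in R is fa.parallel() from the psych package; this is an illustrative sketch (assuming psych is installed), not something shown in the video.

        # install.packages("psych")   # if not already installed
        library(psych)

        # Compares each component's eigenvalue with eigenvalues from random data;
        # components sitting above the random curve are the ones worth keeping
        fa.parallel(USArrests, fa = "pc")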

  • @Zilfalon
    @Zilfalon 3 years ago +2

    Thank you Dr. Pound, finally someone who can explain PCA in easy words. Really helpful in my thesis - and by a strange accident I ended up writing both my theses about PCA. The first time, in my Bachelor's, I used it for data reduction; this time I am using it to categorize data.

  • @OmarMohammed-fy2he
    @OmarMohammed-fy2he 3 years ago +20

    Dude, you're better at explaining this than our uni professor :""D
    please keep doing what you're doing.
    Thank you.

    • @andrei642
      @andrei642 2 years ago +3

      Well Omar, he too is a university professor...

    • @OmarMohammed-fy2he
      @OmarMohammed-fy2he 2 years ago

      @@andrei642 I didn't know that at the time. I googled him and he turned out to be quite the expert. Regardless, he has a simple way of explaining things. Not many others do.

  • @adityapatel3535
    @adityapatel3535 4 years ago +3

    This is brilliantly explained. One can only simplify if one truly understands it. Thanks

  • @HitAndMissLab
    @HitAndMissLab 5 months ago

    Thank you for this brilliant video. In less than half an hour I developed intuition that would have taken me a month to get from a book.

  • @tlniec
    @tlniec 3 years ago +1

    Upon first hearing the phrase "principal component analysis", I thought it sounded very analogous to finding principal stress axes in a body under load. As Dr. Pound gave a more detailed explanation later, I realized that is exactly what it is - just expanded to take place in n-dimensional space instead of 3D space. May be a helpful way to visualize for any mechanical engineers out there.

  • @ErickMarkevich
    @ErickMarkevich 4 years ago

    I really struggled to grasp the concept of PCA before, but thanks to your video it is now clear to me. Thank you

  • @brandonbracho5898
    @brandonbracho5898 3 years ago +3

    best explanation for PCA I could find, thank you!

  • @699ashi
    @699ashi 3 years ago +2

    I am just happy to see him using R for this example

  • @Flourish38
    @Flourish38 5 years ago +4

    This video was EXACTLY what I needed right now. Thank you so much!!!

  • @gzuzchuy505
    @gzuzchuy505 2 years ago

    What a simple way to explain PCA! Thank you so much for the video.

  • @sepidet6970
    @sepidet6970 4 years ago

    Finally I learnt what PCA is and what it does, thank you very much.

  • @__Wanderer
    @__Wanderer 5 years ago

    Dr. Mike your explanations are brilliant.

  • @tellefsolberg5698
    @tellefsolberg5698 4 years ago +1

    Fricking loved that it was applied in R!

  • @asgharbeigi9718
    @asgharbeigi9718 2 years ago

    Dr. Mike, you are a genius.

  • @Eternity4Evil
    @Eternity4Evil 3 years ago

    Best explanation I've come upon as of yet. Thanks!

  • @0000000854
    @0000000854 4 years ago +2

    Summary:
    (1) draw a line that maximizes the spread
    (2) minimize the accumulated squared error
    (3) project the data onto the axis that maximizes the dataset's variance
    (see the sketch after this thread)

    • @PLAYERSLAYER_22
      @PLAYERSLAYER_22 3 years ago +1

      hence, “axial reprojection”

    • @0000000854
      @0000000854 3 years ago

      @@PLAYERSLAYER_22 thanks
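
    Those three steps are essentially what a single prcomp() call does; a minimal sketch on the built-in iris measurements (not the video's dataset):

        X <- iris[, 1:4]                         # numeric columns only
        pca <- prcomp(X, center = TRUE, scale. = TRUE)

        summary(pca)   # proportion of variance captured by each principal component
        head(pca$x)    # the data projected onto the new, variance-maximising axes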

  • @man.h
    @man.h 4 years ago

    the best explanation I have seen so far. thank you so much!

  • @9785633425657
    @9785633425657 9 months ago

    Thank you for explaining this! Very good video quality.

  • @muzzamilnadeem3104
    @muzzamilnadeem3104 4 years ago

    Great video. The understanding is very relevant to a lot of feature selection etc in data sciences

  • @demonblood8841
    @demonblood8841 2 years ago

    I'm late to the party but this playlist is gold. Thanks guys :)

  • @simaykazc1508
    @simaykazc1508 3 years ago

    It is very pleasant to listen to you. Thanks!

  • @8eck
    @8eck 3 years ago

    So the idea behind it is finding the right angle from which to look at all the data, where we can clearly see all the data points and the distances between them. It looks a bit like a support vector machine (SVM), where we increase dimensionality to fit a separating line in some other dimension.

  • @omerahmaad
    @omerahmaad 4 years ago

    Probably the best explanation

  • @TAP7a
    @TAP7a 3 years ago

    Careful when scaling if you’re producing a model which will make predictions on unseen data - the mean that you will be subtracting and the standard deviation that you’re dividing by better be the same between the training set, the test set and the production sets!
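
    A sketch of that point in R (illustrative split on iris): if the PCA is fitted with prcomp() on the training data, predict() reuses the stored training mean and standard deviation when projecting new data instead of re-estimating them.

        train <- iris[1:100, 1:4]
        test  <- iris[101:150, 1:4]

        pca <- prcomp(train, center = TRUE, scale. = TRUE)

        # predict() centres and scales `test` with the TRAINING statistics stored in `pca`
        test_scores <- predict(pca, newdata = test)

        # By contrast, scale(test) would use the test set's own mean/sd and disagree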

  • @alexandros27.
    @alexandros27. 3 years ago +1

    I agree with most of what is being taught in this video. Using a new basis to maximize variance or minimize the projection error is why PCA is used. What I can't agree with, however, is the lecturer saying that PCA is used to cluster data. I don't think this is necessarily true. PCA clusters those features which are highly correlated together; it doesn't cluster the data points when they are represented using the new basis vectors. I hope I am not wrong

    • @jagaya3662
      @jagaya3662 3 years ago

      PCA clusters features by creating new axes, which can help to identify correlations for feature engineering.
      However, you can still do actual clustering along the new axes, and that wouldn't be affected by PCA at all, because the data still has exactly the same relative positions in the hyperdimensional space; only the axes are shifted.
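
    A small check of that last point (assuming the usual centred, unscaled PCA): the scores are just a rotation of the centred data, so pairwise distances between points are unchanged and clustering on all the PCs is equivalent to clustering on the original data.

        X   <- as.matrix(iris[, 1:4])
        pca <- prcomp(X)                     # centred, not scaled

        # Distances in the rotated space match the original ones (up to rounding)
        max(abs(dist(pca$x) - dist(X)))      # ~0

        set.seed(1)
        kmeans(pca$x, centers = 3)$cluster   # equivalent to clustering the centred X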

  • @rijzone
    @rijzone 4 years ago

    I seriously watch these videos for fun

  • @frobeniusfg
    @frobeniusfg 5 years ago +1

    A Dutch angle is highly appropriate for this topic :) Well done, cameraman :)

  • @GoatzAreEpic
    @GoatzAreEpic 5 years ago +2

    Beautiful explanation with the minimization of error

  • @erw103
    @erw103 5 years ago

    As I shall mention in my blog, There is a Method to Dr Mike's Madness. Brilliant!

  • @ejkitchen
    @ejkitchen 3 years ago +1

    Great explanation. THANK YOU!

  • @kirar2004
    @kirar2004 1 year ago

    A very nice explanation! Thanks!

  • @ec92009y
    @ec92009y 3 years ago

    Congratulations again for a great video. Thank you!

  • @paull923
    @paull923 2 years ago

    Ridiculously understandably explained! Thank you very much!

  • @sander_bouwhuis
    @sander_bouwhuis 5 years ago +1

    Outstanding explanation. Thank you, thank you, thank you!

  • @summy291987
    @summy291987 4 years ago

    Best explanation I've come upon so far!!

  • @VG-bi9sw
    @VG-bi9sw 3 years ago

    Very nice explanation. I almost never subscribe but you got me. Thank you.

  • @TheHamzawasi
    @TheHamzawasi 2 years ago

    Thanks Dr. Mike, really helpful!

  • @annprong5052
    @annprong5052 2 years ago

    Great video. I also enjoyed the throwback stripey dot-matrix printer paper :)

  • @tapanbasak1453
    @tapanbasak1453 1 year ago

    Genius explanation

  • @djstr0b3
    @djstr0b3 11 months ago

    Excellent video

  • @shivammishra2524
    @shivammishra2524 5 years ago

    Great Video. I guess I would never forget PCA

  • @juanluisbaldelomar1617
    @juanluisbaldelomar1617 3 years ago

    You saved me! Excellent video!!!

  • @7177YT
    @7177YT 5 years ago +1

    Extra points for using R! Very much approved! Lovely! (:

  • @trafalgarlaw9919
    @trafalgarlaw9919 3 years ago

    Thank you for the explanation.

  • @melikaelwadany4524
    @melikaelwadany4524 2 years ago

    Thank you for this video.

  • @pavanagarwal6753
    @pavanagarwal6753 5 years ago +5

    I wonder how Mike learned so much. Could Computerphile point us to a book from which we can extend our horizons?

  • @nomen385
    @nomen385 2 years ago

    "A new principal component is gonna come out orthogonal to the ones before, until you run out of dimensions and you can't do it anymore."
    - poetry

  • @m22d52
    @m22d52 2 years ago

    5:25 Why have you not constructed the centre of the data? Project the points onto both the X and Y axes, calculate the two averages, and then draw perpendiculars; where these averages intersect is the centre of the dataset.
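
    That centre is exactly what prcomp() subtracts before rotating; a tiny sketch (column means of a numeric matrix X):

        X <- as.matrix(iris[, 1:4])

        centre <- colMeans(X)      # the per-axis averages described above
        pca    <- prcomp(X)        # center = TRUE is the default

        all.equal(pca$center, centre)   # TRUE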

  • @4.0.4
    @4.0.4 5 years ago

    This is great content. It genuinely makes me want to pick up RStudio and try to learn data analysis.

  • @samalkayedktaishat9927
    @samalkayedktaishat9927 3 years ago

    Thank you, this made life easier... I love your accent

  • @astropgn
    @astropgn 5 years ago +7

    What if you take these new axes (PC1, PC2, PC3...) and do a PCA again? Will they spread even more, or will they give the exact same result?

    • @f4614n
      @f4614n 5 years ago +12

      You'd get the exact same result, as with the constraints given in PCA, the solution is unique.

    • @ryadbelhakem1944
      @ryadbelhakem1944 5 years ago +1

      The solution is not unique: since PCA was already applied, the new axes are uncorrelated, so applying PCA again could at best flip an axis, replacing an axis by its negative.
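
    A quick way to check both replies, as a sketch: running prcomp() again on the scores changes nothing except possible sign flips (assuming the component variances are distinct).

        pca1 <- prcomp(iris[, 1:4], scale. = TRUE)
        pca2 <- prcomp(pca1$x)                  # PCA of the PC scores

        round(abs(pca2$rotation), 3)            # approximately the identity matrix

        max(abs(abs(pca2$x) - abs(pca1$x)))     # ~0: same scores up to sign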

  • @pablobiedma
    @pablobiedma 4 years ago

    Great video Peter Parker

  • @BjarkeHellden
    @BjarkeHellden 5 years ago

    Great explanation

  • @proprius
    @proprius 3 years ago

    brilliant, thanks!

  • @kimiaebrahimi5346
    @kimiaebrahimi5346 3 years ago +1

    amaziiiing

  • @frankietank8019
    @frankietank8019 4 years ago

    Brilliant, thanks!

  • @hasan0770816268
    @hasan0770816268 5 years ago +2

    Well that escalated quickly!

  • @passingthetorch5831
    @passingthetorch5831 5 years ago +2

    SVD when? Mike might also consider mentioning SVD approximation for convolutions, neural networks, etc.

    • @f4614n
      @f4614n 5 years ago +3

      If you are using PCA, in all likelihood you were applying SVD at some point (maybe without realizing it).
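
    A sketch of that relationship: prcomp() is itself built on the SVD of the centred data matrix, so the two agree up to column sign flips.

        X  <- as.matrix(iris[, 1:4])
        Xc <- scale(X, center = TRUE, scale = FALSE)    # centre only

        s   <- svd(Xc)                                  # Xc = U D V'
        pca <- prcomp(X)

        # Right singular vectors are the principal axes; U %*% D gives the scores
        max(abs(abs(s$v) - abs(pca$rotation)))          # ~0
        max(abs(abs(s$u %*% diag(s$d)) - abs(pca$x)))   # ~0

        # Standard deviations are the singular values scaled by sqrt(n - 1)
        all.equal(s$d / sqrt(nrow(X) - 1), pca$sdev)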

  • @isabellabihy8631
    @isabellabihy8631 5 years ago

    If I remember multivariate statistics correctly, the name "factor analysis" comes to mind. Indeed, I like PCA better.

  • @breadandcheese1880
    @breadandcheese1880 26 days ago

    How do you get the column names of the 133 features that make up PC1, to submit them as a data frame for k-means?
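
    A hedged sketch of one way to do that in R (the data and names here are illustrative, not the 133-feature set from the question): the per-feature contributions live in the rotation matrix, while k-means is normally run on the scores rather than the loadings.

        pca <- prcomp(iris[, 1:4], scale. = TRUE)

        # Feature names ordered by their absolute loading on PC1
        pc1_loadings <- sort(abs(pca$rotation[, "PC1"]), decreasing = TRUE)
        names(pc1_loadings)

        # For clustering, feed the scores (here the first two PCs) to kmeans
        km <- kmeans(as.data.frame(pca$x[, 1:2]), centers = 3)
        table(km$cluster)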

  • @RAINE____
    @RAINE____ 5 years ago

    Thanks for this

  • @fakhermokadem11
    @fakhermokadem11 5 years ago +5

    Why does minimizing the error mean maximizing the variance?

    • @Kasenkow
      @Kasenkow 5 years ago

      I think you're minimizing the error when you're fitting a line (which will be the new axis) to existing data points from two previous dimensions. Thus, this error is (as it was mentioned in the video) the summed squared differences between each actual data point and the line that you're trying to fit.

    • @Hexanitrobenzene
      @Hexanitrobenzene 5 years ago +1

      Judging by his sketch, PCA tries to maximize variance along PC1 axis, while at the same time minimizing error along all the axes orthogonal to PC1, then does the same for PC2 and so on.

    • @willd0g
      @willd0g 5 years ago

      Recall his fists; the line of best fit would pierce these two data points and introduce the axis that can directionally pivot the data to reveal greater variance (spread) as observed by the space between his hands as he turned them along that newly introduced axis
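
    The replies above can be made precise with one line of algebra. For centred data points x_i and a unit direction w, Pythagoras splits each point into its projection onto w plus a residual, so a fixed total splits into variance along w plus squared projection error; minimising one term necessarily maximises the other:

        \|x_i\|^2 = (w^\top x_i)^2 + \|x_i - (w^\top x_i)\,w\|^2

        \sum_i \|x_i\|^2
          = \underbrace{\sum_i (w^\top x_i)^2}_{\text{variance along } w}
          + \underbrace{\sum_i \|x_i - (w^\top x_i)\,w\|^2}_{\text{squared projection error}}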

  • @leksa8845
    @leksa8845 2 years ago

    I fell in love :D

  • @Rockyzach88
    @Rockyzach88 1 year ago

    Good stuff. Is the "weighted sum" the Frobenius norm, or related to it? I'm following a book and I'm trying to compare how it teaches this to how it is explained in other forms of media like YouTube videos.

  • @ControlTheGuh
    @ControlTheGuh 3 years ago

    That maximizes the variance = R²? Because it seems like PC1 was there to minimize the variance between the line and the points, no?

  • @framm703
    @framm703 8 months ago

    Cool 😎

  • @Centhihi
    @Centhihi 3 years ago

    And what is the benefit of doing PCA? Are we training our neural network quicker, or why would I do this? I still have to collect all the variables, so what is the point?

  • @RamakrishnaSalagrama1
    @RamakrishnaSalagrama1 5 years ago

    Could not find the dataset. Could you please give a Dropbox or Drive link?

  • @sdeitym
    @sdeitym 3 years ago

    5:34 Why, when we rotate the axis, does the data also split out into 2 clusters?

    • @timowesterdijk5840
      @timowesterdijk5840 3 years ago

      It is partly a coincidence, but not really. PC1 gives you the axis that spreads out and separates your data the most (greatest variance). Because your data (from two dimensions) is now separated along one dimension, you can see whether there are data points that correlate with each other.

  • @tear728
    @tear728 5 years ago

    What about Exploratory Factor Analysis?

  • @whyzed603
    @whyzed603 4 years ago

    Why does minimizing the distance of the data points from the principal axis ensure the maximum spread along that axis? Can someone explain, or maybe I got something wrong?

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w 2 years ago

    But how do we make use of the principal components afterwards, given that we can't interpret the components since they no longer represent the original variables? Without interpretability, can PCs still be useful? What can a PC still tell us?

    • @amineaboutalib
      @amineaboutalib 2 years ago

      They do represent the original variables. What you have to do is go through the weights and try to make sense of what kind of hidden variable the PC is representing.

  • @TeamRomeroJacobs
    @TeamRomeroJacobs 5 years ago +1

    Hey, quick question for anyone out there. I'm failing to see if there's a difference between principal component 1 and linear regression. It seems to me they are the same thing. It is my understanding that
    Btw sorry for the bad English, not a native speaker.

    • @ryadbelhakem1944
      @ryadbelhakem1944 5 years ago

      Really not the same, but clearly there is a link between the two: one could transform the PCA optimization problem into a special regression using the Frobenius norm and basic algebra.
      Performing PCA you look for uncorrelated axes; this is simply not the case for regression.
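
    A small R sketch of that difference on simulated data (illustrative only): least squares minimises vertical errors in y, while PC1 minimises perpendicular distances, so the two slopes generally differ.

        set.seed(42)
        x <- rnorm(200)
        y <- x + rnorm(200, sd = 0.8)          # noisy linear relationship

        lm_slope <- coef(lm(y ~ x))[2]          # regression: vertical errors only

        rot      <- prcomp(cbind(x, y))$rotation
        pc_slope <- rot["y", "PC1"] / rot["x", "PC1"]   # PC1: perpendicular errors

        c(lm = unname(lm_slope), pca = pc_slope)        # close, but not equal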

  • @pranayyanarp4118
    @pranayyanarp4118 5 years ago +1

    What does 'foggin all' mean? ... at 8:47 in the video

    • @jfagerstrom
      @jfagerstrom 5 years ago +4

      He's saying 'orthogonal', meaning the second principal component is going to be at a 90 degree angle to the first one. Orthogonal is used since it describes this relationship without ambiguity for higher than 2 dimensions as well. It simply means that the two axes are completely uncorrelated.

    • @pranayyanarp4118
      @pranayyanarp4118 5 years ago

      @@jfagerstrom You mean he is pronouncing orthogonal as 'foggin all'? ... It's in the subtitles also

    • @jfagerstrom
      @jfagerstrom 5 years ago +3

      @@pranayyanarp4118 it's just his accent. The person who wrote the subtitles probably heard it the same way you did. He is for sure saying orthogonal though, it's the only thing that makes sense

    • @pranayyanarp4118
      @pranayyanarp4118 5 years ago +1

      @@jfagerstrom thanx man

  • @donfeto7636
    @donfeto7636 1 year ago

    Don't watch the video if you know nothing about PCA; come back after you learn what it is from StatQuest or other channels.

  • @willw4096
    @willw4096 1 year ago

    11:58

  • @Hamromerochannel
    @Hamromerochannel 1 year ago

    @ 9:45 the R part starts

  • @asifkhaliq9086
    @asifkhaliq9086 4 years ago

    Dr. Mike can you teach me privately please. . .

  • @charlieangkor8649
    @charlieangkor8649 3 years ago

    "sponsorship from by Google" - was this piece of English generated by Google's AI?

  •  5 years ago +2

    Dude, please use data.table::fread() instead of read.csv() for larger data
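
    A minimal sketch of that swap (the file name is hypothetical):

        library(data.table)

        # fread() auto-detects separators and reads large CSVs far faster than read.csv()
        dat <- fread("large_dataset.csv")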

  • @pexfmezccle
    @pexfmezccle 4 years ago +1

    “Orffogonal”

  • @brunomartel4639
    @brunomartel4639 4 years ago

    auto-generated subs pleaseeee!!!!!

  • @DEVSHARMA-zp8xv
    @DEVSHARMA-zp8xv 5 years ago

    It was nice, but it could have been better and longer if the maths were included.