Factor Analysis and Probabilistic PCA

  • Published on Jun 7, 2024
  • The machine learning consultancy: truetheta.io
    Want to work together? See here: truetheta.io/about/#want-to-w...
    Factor Analysis and Probabilistic PCA are classic methods to capture how observations 'move together'.
    SOCIAL MEDIA
    LinkedIn : / dj-rich-90b91753
    Twitter : / duanejrich
    Enjoy learning this way? Want me to make more videos? Consider supporting me on Patreon: / mutualinformation
    SOURCES
    [1] was my primary source since it provides the algorithm used in Scikit-Learn's Factor Analysis implementation (which is what I use). Since it walks through the derivation of the fitting procedure, it is quite technical. Ultimately, that level of detail came in handy for this video.
    [2] and [4] were my go-to for Probabilistic PCA. A primary reason is that Christopher Bishop is one of the originators of PPCA. That came with a lot of thoughtful motivation for the approach. The discussion there includes a lot of advantages of PPCA over PCA.
    [3] was my refresher on this subject when I first decided to make this video. Like many of us, I'm a fan of Andrew Ng, so I was curious how he'd explain the subject. He emphasized that this model is particularly useful in high-dimension, low-data environments - something I carry forward in this video.
    [5] is an excellent overview of FA and PPCA (as long as you're comfortable with linear algebra and probability). In fact, Kevin Murphy's entire book is like that for every subject and that's why it's my absolute favorite text.
    ---------------------------
    [1] D. Barber, Bayesian Reasoning and Machine Learning, Cambridge University Press, 2012
    [2] C. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
    [3] A. Ng, "Lecture 15 - EM Algorithm & Factor Analysis | Stanford CS229: Machine Learning (Autumn 2018)", • Lecture 15 - EM Algori...
    [4] M. Tipping and C. Bishop, "Mixtures of Probabilistic Principal Component Analysers", MIT Press, 1999
    [5] K. P. Murphy. Probabilistic Machine Learning (Second Edition), MIT Press, 2021
    CONTENTS
    0:00 Intro
    0:21 The Problem Factor Analysis Solves
    2:27 Factor Analysis Visually
    5:52 The Factor Analysis Model
    10:56 Fitting a Factor Analysis Model
    14:13 Probabilistic PCA
    15:43 Why is it Probabilistic "PCA"?
    16:59 The Optimal Noise Variance
    TOOLS
    If you'd like to apply Factor Analysis, I'd recommend Scikit-Learn: scikit-learn.org/stable/modul...
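    For a quick start, here is a minimal sketch (my own, not from the video) of fitting the model with Scikit-Learn; the synthetic data and the size choices below are arbitrary and purely for illustration:

```python
# Minimal sketch (not from the video): fit scikit-learn's FactorAnalysis on
# synthetic data drawn from the FA model x = W z + mu + eps, eps ~ N(0, Psi).
# All sizes (n, D, L) are arbitrary illustration choices.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n, D, L = 500, 10, 3                          # samples, observed dims, latent dims

W_true = rng.normal(size=(D, L))              # loading matrix
psi_true = rng.uniform(0.1, 0.5, size=D)      # diagonal noise variances
Z = rng.normal(size=(n, L))                   # latent variables z ~ N(0, I)
X = Z @ W_true.T + rng.normal(size=(n, D)) * np.sqrt(psi_true)

fa = FactorAnalysis(n_components=L).fit(X)

print(fa.components_.shape)                   # estimated W (transposed), shape (L, D)
print(fa.noise_variance_)                     # estimated diagonal of Psi
Z_hat = fa.transform(X)                       # posterior means of z given each x
```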
    Social Media
    Twitter : / duanejrich
    Patreon : / mutualinformation

Comments • 90

  • @sasakevin3263
    @sasakevin3263 1 year ago +32

    The only reason this guy's video didn't go viral is that only 0.01% of the audience is interested in such complex statistics and formulas. But what he made is really awesome!

    • @Mutual_Information
      @Mutual_Information  1 year ago +5

      That 0.01% are the cool kids - that's who I'm going for!

    • @ruizhezang2657
      @ruizhezang2657 11 months ago

      @@Mutual_Information awesome job! Best video in this area I've ever watched!

  • @pifibbi
    @pifibbi 2 years ago +20

    Please don't stop making these!

  • @divine7470
    @divine7470 2 years ago +8

    Thanks for covering this topic. I learned about FA and PCA, and how to use them, in bootcamps, but the way you dive into the internals makes it all so easily digestible.

  • @MikeOxmol_
    @MikeOxmol_ 2 years ago +21

    It's criminal that you don't have at least 50k subs. Please don't stop making videos, even though they don't have that many views right now, there are people like me who appreciate the videos very much. Certain topics can seem very daunting when you read about them, especially in such "dense" books as Bishop's PRML or Murphy's PML. However, if I start digging into a topic by watching your video and only then do I read the chapter, the ideas seem to connect more easily and I have to spend less time until it "clicks" if you know what I mean.
    On another note, if you're looking for ideas for future vids (of which I'm sure you already have plenty), Variational Inference would be a cool topic

    • @Mutual_Information
      @Mutual_Information  2 years ago +4

      Thanks, extremely nice of you! Yea, this channel is for people like you/me, who want to understand those intense details in those books. I know I would have loved a channel like this if it was around when I was learning. I'm glad it's doing that job for you.
      And yes VI is coming! Thanks for your support! And please don’t hesitate to share the channel with people doing the same as you :)

  • @Nightfold
    @Nightfold 2 years ago +4

    This sheds some light on what I'm doing with PPCA, but I still deeply resent my lack of training in statistics during my degree.

  • @mCoding
    @mCoding 2 years ago +2

    Always love to hear your explanations!

    • @Mutual_Information
      @Mutual_Information  2 years ago +1

      Thanks man - fortunately there’s always a mountain of topics to cover. Plenty to learn/explain :)

  • @fenncc3854
    @fenncc3854 2 years ago +2

    Great video, really informative, easy to understand, good production quality, and you've also got a great personality for this style of video.

    • @Mutual_Information
      @Mutual_Information  2 years ago

      Thank you! These comments mean a lot. Happy to have you :)

  • @jakubm3036
    @jakubm3036 1 year ago +1

    great video, understandable explanations and cool format!

  • @MrStphch
    @MrStphch 2 years ago +1

    Really really nice videos!! Love your way of explaining.

  • @Kopakabana001
    @Kopakabana001 2 years ago +4

    Another great video!

  • @mainakbiswas2584
    @mainakbiswas2584 6 months ago +1

    Had been looking for this piece of information for quite a long time. I understood FA by sort of re-discovering it after seeing the sklearn documentation. From that point onward I wanted to know why it related to PCA. This gave me the intuition and the resources to look up. ❤❤❤

  • @xy9439
    @xy9439 2 years ago +3

    Very interesting video, as always

  • @quitscheente95
    @quitscheente95 1 year ago +1

    Damn, I spent so much time going through 5 different books to understand PPCA, and here you are, explaining it in an easy, comprehensible, visual manner. Love it. Thank you :)

    • @Mutual_Information
      @Mutual_Information  1 year ago

      Awesome - you are the exact type of viewer I'm trying to help

  • @alan1507
    @alan1507 8 months ago +1

    Thanks for the very clear explanation. I was doing my PhD under Chris Bishop when Bishop and Tipping were developing PPCA - good to get a refresher!

    • @Mutual_Information
      @Mutual_Information  8 months ago

      Wow, it's excellent to get your eyes on it - very cool!

  • @user-kn4wt
    @user-kn4wt 2 years ago +1

    these videos are awesome!

  • @AdrianGao
    @AdrianGao 9 months ago +1

    Thanks. This is brilliant.

  • @enx1214
    @enx1214 1 year ago +1

    True old-school best techniques, still in use - I've used them since 2004. They can save you, as you can build amazing models from nowhere with them.

  • @EverlastingsSlave
    @EverlastingsSlave 2 years ago +1

    Man, how good are your videos - I am amazed at the perfection

  • @user-lx7jn9gy6q
    @user-lx7jn9gy6q 11 months ago +1

    Underrated channel

    • @Mutual_Information
      @Mutual_Information  11 months ago

      You're not going to believe this.. but I agree

  • @juliafila5709
    @juliafila5709 1 month ago +1

    Thank you so much for your content!

  • @jonastjepkema
    @jonastjepkema 2 years ago +1

    Amazing! Hope your channel will explode soon!

    • @Mutual_Information
      @Mutual_Information  2 years ago

      Lol thanks - it honestly doesn’t need to for me to keep going. This is a very enjoyable hobby

    • @Mutual_Information
      @Mutual_Information  2 years ago

      But if you want to tell all your friends, I won’t stop you 😉

  • @wazirkahar1909
    @wazirkahar1909 2 years ago +2

    Please please please keep doing this :)

  • @saeidhoseinipour3101
    @saeidhoseinipour3101 2 years ago +1

    Another nice video. Thanks 🙏
    Please cover data science topics such as Clustering and Classification, or applications like Text Mining, Recommender Systems, Image Processing and so on, from both a statistics perspective and a linear algebra perspective.

  • @Blahcub
    @Blahcub 9 months ago +1

    This was a super helpful video thank you so much. I love this material and find it super fun.

    • @Mutual_Information
      @Mutual_Information  9 months ago

      Excellent - this one is a doozy so it's nice to hear when it lands

    • @Blahcub
      @Blahcub 9 months ago

      @@Mutual_Information There's a level of background information that takes a while to process, and even though you say it slowly, some extra detail may be warranted. I had to pause a lot, think, and rewind to fully grasp the details.

    • @Mutual_Information
      @Mutual_Information  9 months ago

      @@Blahcub That's good to know.. you are my ideal viewer :) thank you for your persistence

    • @Blahcub
      @Blahcub 9 months ago

      @@Mutual_Information In simpler terms, can't we just say PPCA is just PCA, but we model a distribution over the latent space and sample from that distribution?

    • @Mutual_Information
      @Mutual_Information  9 months ago

      @@Blahcub In all cases, we are considering a distribution over the latent space. PPCA is distinct in that we assume a constant noise term across dimensions, and that gives it some closed form results.
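      For reference, the closed-form results being alluded to are the PPCA maximum-likelihood solution from Tipping & Bishop [4] (a sketch in the video's notation, where λ_j are the eigenvalues of the sample covariance, U_L its top-L eigenvectors, Λ_L the diagonal matrix of those eigenvalues, and R an arbitrary L×L rotation):

      $$\Psi = \sigma^2 I, \qquad W_{\mathrm{ML}} = U_L\left(\Lambda_L - \sigma^2 I\right)^{1/2} R, \qquad \sigma^2_{\mathrm{ML}} = \frac{1}{D-L}\sum_{j=L+1}^{D} \lambda_j$$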

  • @michaelcatchen84
    @michaelcatchen84 1 year ago

    Around 10:35 you skip over the posterior inference of p(z_i | x_i, W, mu, psi) and that it is also a normal distribution because the normal is a conjugate prior for itself. Would love to see this covered in a separate video
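    For readers following along, the posterior being referred to has a standard closed form (a sketch in the video's notation; see e.g. Bishop [2] or Barber [1]):

    $$p(z_i \mid x_i, W, \mu, \Psi) = \mathcal{N}\!\left(z_i \,\middle|\, G\, W^{\top} \Psi^{-1} (x_i - \mu),\; G\right), \qquad G = \left(I + W^{\top} \Psi^{-1} W\right)^{-1}$$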

  • @tommclean9208
    @tommclean9208 2 years ago +1

    is there any code that supplements your videos? I always find I learn easier by looking at and playing around with code :)

    • @Mutual_Information
      @Mutual_Information  2 years ago +2

      Not in this one unfortunately. For this case, I'd check out the use case for FA from sklearn : scikit-learn.org/stable/modules/decomposition.html#fa
      If you look one level deep into their .fit() method, you'll see the SVD algorithm I reference in the vid.
      I have plans for more code examples in future vids
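      In the meantime, here is a small self-contained sketch (not from the video, assumptions and data entirely illustrative) that fits sklearn's FactorAnalysis and checks its implied covariance against the sample covariance:

```python
# Sketch (not from the video): the implied FA covariance W^T W + Psi, as returned
# by get_covariance(), should roughly match the sample covariance when the data
# really does come from a low-rank-plus-diagonal-noise model.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)
n, D, L = 2000, 6, 2
W = rng.normal(size=(D, L))
X = rng.normal(size=(n, L)) @ W.T + rng.normal(size=(n, D)) * 0.3   # Psi = 0.09 * I

fa = FactorAnalysis(n_components=L).fit(X)
print(np.round(fa.get_covariance() - np.cov(X, rowvar=False), 2))   # residuals near 0
```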

  • @matej6418
    @matej6418 8 months ago +1

    elite content, imho after the introduction I would love to see the content mainly, dunno if staying on screen makes the delivery better? what's the objective here?

    • @Mutual_Information
      @Mutual_Information  8 months ago

      Appreciate the feedback. It's effectively a cheaper way of keeping the video lively without having to create animations, which take a long time. If I'm not on screen and I leave the level of animation the same, it's a lot of audio over still text, which I've separately heard makes people 'zone out'.
      This is also an older video. I really don't like how I did things back then. In the future, I'd like to mature into a more dynamic style.

  • @taotaotan5671
    @taotaotan5671 1 year ago +1

    Hi DJ, awesome content as always!!
    I find I can follow your notations much better than textbook notations. At 8:12, I believe the matrix W is shared across all individuals, while z is specific to each sample. It makes intuitive sense to call matrix W common factors, and call z loadings. However, the textbook (Penn State Stat505 12/12.1) seems to call W (in their notation L) factor loadings, while calling z (in their notation f) common factors.
    I am a little confused and I will appreciate it if you can take a look. Thank you again for the tutorial!

    • @Mutual_Information
      @Mutual_Information  1 year ago +1

      Hey Taotao! I just checked this against Barber's book. It appears Stat505 is correct - W is called the "factor loading". I actually recall being confused by this too (and why I had to double check just now).. and all I can say is.. yea the naming is confusing. For me, I avoid the naming in general by just thinking of z as latent variable and W as parameters. I agree, this "factor loading" name is shit.

    • @taotaotan5671
      @taotaotan5671 1 year ago

      @@Mutual_Information Thanks so much DJ! That clarifies.

  • @horacemanfred354
    @horacemanfred354 2 years ago +1

    Great video. Could you cover the use of Energy Functions in ML?

    • @Mutual_Information
      @Mutual_Information  2 years ago

      Maybe one day, but no concrete plans. Fortunately, there's an excellent YouTuber who covers energy models really well: th-cam.com/video/y6WNrHskm2E/w-d-xo.html - that would probably be a primary source if I were to cover the topic

    • @horacemanfred354
      @horacemanfred354 2 years ago +1

      @@Mutual_Information Thanks. So funny, in the video Alfredo Canziani says it has taken him years to understand energy functions. It appears it is about the manifold of the cost function, and I understand it better now.

  • @prodbyryshy
    @prodbyryshy 3 months ago

    amazing video, I feel like I understand each individual step but I'm sort of missing the big picture

  • @timseguine2
    @timseguine2 11 months ago +1

    One question that came to mind: if you are trying to do factor analysis using an iterative method, are the PPCA ML estimates a good initial value?

    • @Mutual_Information
      @Mutual_Information  11 months ago +1

      Possibly.. but if you're going to accept the costs of computing those initial estimates, you might as well just do the FA routine? I don't think it would be worth it

  • @taotaotan5671
    @taotaotan5671 2 years ago +1

    Might restricted maximum likelihood, a technique that is often used in mixed effects models, also apply to factor analysis?

    • @Mutual_Information
      @Mutual_Information  2 years ago +1

      I don't know much about restricted max likelihood, but from what I've (just) read, it appears flexible enough to accommodate the FA assumption. Anytime you're estimating variance/covariance, you could use a low-rank approximation.

  • @muesk3
    @muesk3 2 years ago +1

    Quite funny, the difference in how FA is explained in statistics vs machine learning. :)

    • @gordongoodwin6279
      @gordongoodwin6279 2 years ago +1

      I was literally thinking this. I honestly thought he was clueless for the first minute then realized it’s just a really interesting and different way to look at factor analysis than what it was originally intended to do (and the way it’s taught in most statistics and psychometrics texts). Great video

  • @njitnom
    @njitnom 2 years ago +1

    i love you bro

  • @siddharthbisht1287
    @siddharthbisht1287 2 years ago +1

    For anyone who is wondering about the Parameter formula :
    2D + D(D-1)/2 = D + D + D(D-1)/2
    D : dimension of the Mean Vector
    D : diagonal of the Covariance Matrix (Variance of every Random Variable)
    D(D-1)/2 : Covariance between any two distinct Random Variables di and dj
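    Written out, that count for a full-covariance Gaussian is:

    $$\underbrace{D}_{\text{mean}} + \underbrace{D}_{\text{variances}} + \underbrace{\tfrac{D(D-1)}{2}}_{\text{covariances } (d_i \neq d_j)} = 2D + \frac{D(D-1)}{2}$$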

  • @Blahcub
    @Blahcub 9 months ago +1

    Isn't it a problem that factor analysis, PCA, and any dimensionality reduction done here assume a linear relationship?

    • @Mutual_Information
      @Mutual_Information  9 months ago +1

      Yea, definitely. The nonlinear versions of this rely on the manifold hypothesis, which is akin to saying the assumptions of FA hold, but only *locally* and after nonlinear transformations.. and that essentially changes everything. None of the analytic results you see here hold, and we have to resort to other things, like autoencoders.

  • @siddharthbisht1287
    @siddharthbisht1287 2 years ago +1

    I have a couple of questions,
    1. What do you mean by "averaging out"?
    2. What difference does it make to switch the Covariance matrix from Psi to a Full Covariance Matrix WW* + Psi?
    Great video though!!

    • @Mutual_Information
      @Mutual_Information  2 years ago +3

      Hey Siddharth, nice to hear from you. For “averaging out”, that was a bit of a hand wave to avoid mentioning the integration p(x) = integral of p(x|z)p(z)dz.. the way I think about that is it’s the distribution over x if you were to rerun the data generation process infinitely and ignore the z’s and ask what distribution over x that would create.
      For your second question, Psi is a diagonal matrix. So WW* + Psi isn’t diagonal but Psi is.
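      Written out, the integral mentioned in the first answer above is the standard marginal-likelihood result (a sketch; see e.g. Barber [1] or Bishop [2]):

      $$p(x) = \int \mathcal{N}\!\left(x \mid W z + \mu,\; \Psi\right)\, \mathcal{N}\!\left(z \mid 0, I\right)\, dz = \mathcal{N}\!\left(x \mid \mu,\; W W^{\top} + \Psi\right)$$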

    • @siddharthbisht1287
      @siddharthbisht1287 2 years ago

      @@Mutual_Information I wanted to understand the difference the change in covariance makes, why are we changing the covariance matrix?

    • @Mutual_Information
      @Mutual_Information  2 years ago +2

      Hmm, if I understand the question, it's because one way involves a lot fewer parameters. If you say your covariance matrix is WW* + Psi, then that covariance matrix is determined by D*L + D parameters. If it's just the regular covariance matrix of a typical multivariate normal, then the number of parameters in the covariance is D + D*(D-1)/2.
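      As a worked comparison (with D observed dimensions and L latent dimensions):

      $$\underbrace{DL + D}_{\text{params in } W W^{*} + \Psi} \;\ll\; \underbrace{D + \tfrac{D(D-1)}{2}}_{\text{params in a full covariance}} \qquad \text{when } L \ll D$$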

    • @siddharthbisht1287
      @siddharthbisht1287 2 years ago +1

      ​@@Mutual_Information Oh okay. So then L

  • @InquilineKea
    @InquilineKea 2 years ago +1

    Is this like Fourier decomposition?

    • @Mutual_Information
      @Mutual_Information  2 years ago +1

      Eh, I'd say not especially. It's only similar insofar as things are decomposed as the sum of scaled vectors/functions. Fourier series is specific to the complex plane, sin/cos. I don't see much of that showing up. Maybe there is a connection since the normal distribution concerns circles.. but I don't see it.

    • @abdjahdoiahdoai
      @abdjahdoiahdoai 2 years ago

      I think you are thinking of Fourier decomposition as decomposition onto a Fourier basis, in that sense. Maybe an SVD is what you are looking for

  • @DylanDijk
    @DylanDijk 1 year ago +1

    Is the log likelihood a concave function of \psi and w?

    • @Mutual_Information
      @Mutual_Information  1 year ago +1

      If you fix w, then the function is a concave func of psi.. and if you fix psi.. yes I bet it's also a concave function (because it's like doing linear regression). I'm fairly sure of this but not 100%.

    • @DylanDijk
      @DylanDijk 1 year ago

      @@Mutual_Information Ok thanks, I was asking because I wanted to know whether the EM algorithm is guaranteed to converge to the MLE of the FA model. As the EM algorithm is guaranteed to increase the log likelihood at each step, I would assume that if it is concave then we should converge to the MLE. But from reading around, it seems that getting the MLE for the FA model using EM is not guaranteed.
      Btw your videos are great!

    • @DylanDijk
      @DylanDijk 1 year ago

      I guess an important point is that if a function is concave in each of its variables keeping the rest fixed, the function is not guaranteed to be jointly concave. So using what you said, we don't know if the log likelihood is a concave function of \psi and w.

  • @mrcaljoe1
    @mrcaljoe1 10 months ago +1

    1:18 when you have to use spit instead

  • @akhileshchander5307
    @akhileshchander5307 2 years ago +1

    I came to this channel from your comment on another channel; I checked a one-two minute video and found this channel interesting. My request is: please make videos on the "mathematical notation" you are using, because in my personal experience there are many who don't understand these symbols, e.g. what is the meaning of {x_i^T}_{i=1}^N? Thanks

    • @Mutual_Information
      @Mutual_Information  2 years ago +1

      Hey Akhilesh, I see what you're saying. I'm thinking of creating a notation guide - something like a 1-pager linked to in the description which would go over the exact notation.
      To answer your question {x_i^T: i = 1, ... N} just refers to the set of row vectors (x_i is assumed to be a column vector, so x_i^T is a row vector). The {...} is just set notation. It's just a way of saying.. this is the set of N row vectors.. and we'll indicate each one using the index i. So x_3^T is the third of N row vectors.
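      Rendered as an equation, the set being described is simply:

      $$\{x_i^{\top}\}_{i=1}^{N} = \{x_1^{\top},\, x_2^{\top},\, \dots,\, x_N^{\top}\}$$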

  • @gwho
    @gwho 10 months ago

    When discussing personality theory, the Big 5 (aka OCEAN) is superior to MBTI (Myers-Briggs Type Indicator) because the Big 5 uses factor analysis, whereas MBTI presupposes its 4 dimensions.
    Then when comparing MBTI to astrology, just laugh astrology out of the room

  • @EverlastingsSlave
    @EverlastingsSlave 2 years ago +1

    you are doing such good work, therefore I invite you to read the Quran so that you are saved in the afterlife
    stay blessed