Small error at 18:20. In the performance-improved version, the responsibilities are wrong: I forgot to multiply them by the class_probs. The code on GitHub has been updated.
Thanks to @Anuj shah for pointing it out.
Are you a genius? The way you coded this from scratch and explained it is amazing!
Haha :D I appreciate the feedback.
Of course, I prepared the code. I did not code it from scratch. There was a finished version on my second monitor - but I did come up with the code by myself; I just prepared it before the video.
Great vid, super easy to follow and code along! Thanks
Thanks a lot for the kind words 😊 Appreciate it.
Many thanks for this work. I love you.
You're welcome :)
I'm glad you enjoy the videos. That motivates me a lot.
Thank you very much. Many thanks (:
Your content is awesome. Keep it up.
Many thanks for your feedback :)
Best regards from Germany ;)
@Anuj shah made a contribution with a bivariate axis-aligned Gaussian Mixture Model, check it out: github.com/Ceyron/numeric-notes/blob/main/english/probabilistic_machine_learning/contrib/em_gmm_multivariate.py
If you also have some nice extensions of the script, feel free to create a pull request on GitHub and I will merge it into the repo :)
Hi, I tried the code in em_gmm.py.
Here, in the calculation of the responsibilities, the class_probs are not used. Am I missing something? Thanks.
responsibilities = tfp.distributions.Normal(loc=mus, scale=sigmas).prob(
    dataset.reshape(-1, 1)
).numpy()
Hi, after adding these lines
for c in range(n_classes):
    responsibilities[:, c] = responsibilities[:, c] * class_probs[c]
I am getting the correct answer.
Yes, you are correct. That's indeed missing.
But it should only be wrong in the faster version where I avoided the double for loop.
Thanks for pointing it out :)
I will update the file on GitHub later.
Code is already updated on GitHub. I used a similar fix but with a vectorized multiplication outside the for loop.
Thanks again for pointing it out :)
@@MachineLearningSimulation Right, it was missing only in the vectorized form, thanks.
@@MachineLearningSimulation That's great, thanks for the update!
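A minimal sketch of the vectorized fix discussed in this thread. It uses plain NumPy instead of tfp.distributions.Normal, and the helper names normal_pdf and e_step as well as the example values are assumptions for illustration; they are not taken from em_gmm.py. The point is only that multiplying the (n_samples, n_classes) likelihood matrix by class_probs broadcasts over the class axis, so no per-class loop is needed:

```python
import numpy as np


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2), evaluated elementwise with broadcasting."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def e_step(dataset, class_probs, mus, sigmas):
    """Responsibilities of shape (n_samples, n_classes) for a univariate GMM."""
    # Likelihood of every data point under every component.
    likelihoods = normal_pdf(dataset.reshape(-1, 1), mus, sigmas)
    # Weight by the mixing coefficients; class_probs broadcasts over the rows.
    weighted = likelihoods * class_probs
    # Normalize so that the responsibilities of each data point sum to 1.
    return weighted / weighted.sum(axis=1, keepdims=True)


# Made-up example values (hypothetical, for illustration only).
dataset = np.array([2.0, 2.6, 4.5, 5.1])
class_probs = np.array([0.3, 0.7])
mus = np.array([2.5, 4.8])
sigmas = np.array([0.6, 0.3])

responsibilities = e_step(dataset, class_probs, mus, sigmas)
```

Without the multiplication by class_probs, the rows would still sum to 1 after normalization, which is probably why the missing factor is easy to overlook.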
Amazing video! I have a question … and maybe I am missing something … but at the end, you basically rediscover the parameters for your Gaussian - but you already know these a priori. So what’s the point? The GMM mixture component has the Gaussian parameters embedded into it, so it feels trivial that you would converge to these parameters when using data sampled from the GMM.
Sure, the fact we kind of "rediscover" the parameters is purely educational. By creating a dataset out of a known GMM, I can be certain that it is pure.
For a real-world application, you would want to swap this dataset with a real one. A simple testing example in 2D would be the Old Faithful eruption dataset (gist.github.com/curran/4b59d1046d9e66f2787780ad51a1cd87 ), which is also featured in Bishop's book. Once you start using real data, you also have to select hyperparameters wisely. In the case of the GMM, this is the number of clusters (= the number of Gaussians) you want to mix. That's not trivial, especially in higher dimensions.
@@MachineLearningSimulation thank you so much for this. I went to your GitHub page and got the code, and I think there may be something wrong.
When I shift the true mu values from 2.5 and 4.8 to 250 and 480, the normalization of the responsibilities fails with ‘nan’.
I feel like I should be able to shift the centers of the true data without consequence.
I think we just need to improve the initial guess for this case, as it is between 0 and 1 now.
@@nickelandcopper5636 Great observation. Indeed, the chosen implementation is numerically unstable if the initial guess is far away from the ground truth. That is because we are not using the log-probability to evaluate the responsibilities.
You can check @Екатерина К's comment for more detail.
One way to mitigate this issue is sieving, which is a heuristic from global optimization, check my video here: th-cam.com/video/Vj4b4xojPMw/w-d-xo.html
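A sketch of what evaluating the responsibilities via log-probabilities could look like. It assumes plain NumPy and a hand-written log-sum-exp; log_normal_pdf and e_step_log are hypothetical names, and this is not the code from the repository. The example reuses the situation from this thread: data around 250 and 480 with initial mus between 0 and 1.

```python
import numpy as np


def log_normal_pdf(x, mu, sigma):
    """Log-density of N(mu, sigma^2), evaluated elementwise with broadcasting."""
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2.0 * np.pi)


def e_step_log(dataset, class_probs, mus, sigmas):
    """Responsibilities computed in log space to avoid underflow."""
    # log(pi_c) + log N(x_n | mu_c, sigma_c), shape (n_samples, n_classes).
    log_weighted = np.log(class_probs) + log_normal_pdf(
        dataset.reshape(-1, 1), mus, sigmas
    )
    # Log-sum-exp trick: subtract the row maximum before exponentiating,
    # so that the largest term in every row is exp(0) = 1.
    row_max = log_weighted.max(axis=1, keepdims=True)
    log_norm = row_max + np.log(
        np.exp(log_weighted - row_max).sum(axis=1, keepdims=True)
    )
    return np.exp(log_weighted - log_norm)


# Shifted data with a poor initial guess (made-up values for illustration).
dataset = np.array([249.0, 251.0, 479.0, 481.0])
class_probs = np.array([0.5, 0.5])
mus = np.array([0.2, 0.8])
sigmas = np.array([1.0, 1.0])

responsibilities = e_step_log(dataset, class_probs, mus, sigmas)
```

With these values the responsibilities stay finite and each row sums to 1, even though the individual densities are far too small to represent directly. A bad initial guess can still lead to a poor local optimum; this only removes the NaNs.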
For this method, you need to do a grid search to find the best number of clusters.
Do you have the Bayesian version?
Yes, in more realistic scenarios you probably have to do this. However, it is equally important to vary the initial guesses to not get stuck in local optima (the optimization problem is non-convex), see also my video on sieving: th-cam.com/video/Vj4b4xojPMw/w-d-xo.html
This video is of course somewhat contrived, since we have "perfect data" and know the exact number of clusters.
Regarding the Bayesian version: No, there is no video on this yet. I have it on my to-do list, but I think other topics are more interesting to do first. Maybe in half a year or so, there could be one on it :) Stay tuned.
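To illustrate the idea of varying the initial guesses, here is a rough sketch using plain random restarts: run EM from several random initial mus and keep the run with the highest log-likelihood. This is an assumption about what a simple variant could look like, not the sieving procedure from the linked video, which may differ. All function names, the small floors (1e-12, 1e-3), and the synthetic data are hypothetical choices.

```python
import numpy as np


def log_normal_pdf(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2.0 * np.pi)


def log_sum_exp_rows(a):
    """Numerically stable log(sum(exp(a))) along each row."""
    row_max = a.max(axis=1, keepdims=True)
    return row_max + np.log(np.exp(a - row_max).sum(axis=1, keepdims=True))


def fit_gmm(dataset, initial_mus, n_iterations=200):
    """Plain EM for a univariate GMM, started from the given means."""
    x = dataset.reshape(-1, 1)
    n_classes = len(initial_mus)
    mus = np.asarray(initial_mus, dtype=float)
    class_probs = np.full(n_classes, 1.0 / n_classes)
    sigmas = np.full(n_classes, dataset.std())
    for _ in range(n_iterations):
        # E-step (in log space).
        log_weighted = np.log(class_probs) + log_normal_pdf(x, mus, sigmas)
        responsibilities = np.exp(log_weighted - log_sum_exp_rows(log_weighted))
        # M-step; the small constants guard against empty or collapsed components.
        class_counts = responsibilities.sum(axis=0) + 1e-12
        class_probs = class_counts / class_counts.sum()
        mus = (responsibilities * x).sum(axis=0) / class_counts
        variances = (responsibilities * (x - mus) ** 2).sum(axis=0) / class_counts
        sigmas = np.maximum(np.sqrt(variances), 1e-3)
    log_weighted = np.log(class_probs) + log_normal_pdf(x, mus, sigmas)
    log_likelihood = log_sum_exp_rows(log_weighted).sum()
    return class_probs, mus, sigmas, log_likelihood


def fit_with_restarts(dataset, n_classes, n_restarts, rng):
    """Run EM from several random initial means and keep the best run."""
    best = None
    for _ in range(n_restarts):
        initial_mus = rng.uniform(dataset.min(), dataset.max(), size=n_classes)
        candidate = fit_gmm(dataset, initial_mus)
        if not np.isfinite(candidate[3]):
            continue  # skip runs that broke down numerically
        if best is None or candidate[3] > best[3]:
            best = candidate
    return best


rng = np.random.default_rng(42)
# Synthetic data from two Gaussians (parameters made up for this sketch).
dataset = np.concatenate([rng.normal(2.5, 0.6, 300), rng.normal(4.8, 0.3, 700)])
class_probs, mus, sigmas, log_likelihood = fit_with_restarts(
    dataset, n_classes=2, n_restarts=10, rng=rng
)
```

The same loop could in principle be wrapped in an outer loop over n_classes for the grid search mentioned above, but comparing raw log-likelihoods across different numbers of clusters favors larger models, so that would need a separate selection criterion.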
Hi, based on your univariate base code, I tried to replicate Expectation Maximization for a multivariate (bivariate Gaussian in this example) Gaussian mixture model, and it seems to converge well.
If you don't mind, I can commit this script to your GitHub; it may help others.
Can you share your email address, if that's OK?
Hey, that's amazing! :) I love it.
Yes, I would definitely like to add it to the GitHub repo. Can you create a pull request on GitHub? Then I can merge it.
@@MachineLearningSimulation Sure, will do that.
@@MachineLearningSimulation I created a pull request for it :)
@@anujshah645 It should be merged now :) Thanks again for the great work.
@@MachineLearningSimulation No Problem :)
Thanks a lot for this video! It's a pleasure to watch it)
But I ran into some strange behaviour in the code:
- when the dataset has fewer than 10000 items, it works well,
- but with more than 10000 items, at the 3rd iteration class_responsibilities turns into an array of NaNs, and (class_probs, mus, sigmas) become NaNs as well.
Do you have any thoughts about this?
Hey :) many thanks for your feedback, that motivates me a lot.
First: Nice that you played around with the code. That's amazing in order to better understand it. I really appreciate it.
I tried to reproduce the problem, see here: user-images.githubusercontent.com/27728103/114820294-131e4f80-9dbf-11eb-88ec-703eec6aa5ce.gif
But it seems to work for me.
I hope I understood you correctly: you changed "n_samples" to something above 10_000? What NumPy version do you use? (Check either with "pip freeze" or with "conda list".) Mine is 1.19.2.
@@MachineLearningSimulation Oh, thank you for answering! :) Yes, you understood my issue correctly. "conda list" shows 1.20.2. I also ran the code, as you did, with a dataset generated by tfp.distributions.MixtureSameFamily, and I got numbers (not NaNs).
Comparing the generated data with my data shows that my data contains larger numbers than the generated data. For example, if I divide my data by 3, I no longer get NaNs. So it turns out the issue was not n_samples but the initialisation of mus. If I provide mus that are close to the modes of the data, the code does not return NaNs.
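One possible way to get initial mus that are close to the data, sketched under the assumption that evenly spaced quantiles are a reasonable starting point. This is a simple heuristic chosen for illustration, not what the video's code does; init_from_data and the synthetic data are hypothetical.

```python
import numpy as np


def init_from_data(dataset, n_classes):
    """Data-driven initial guess for a univariate GMM (heuristic sketch)."""
    # Evenly spaced interior quantiles, e.g. 0.25 and 0.75 for two classes.
    quantile_levels = (np.arange(n_classes) + 0.5) / n_classes
    mus = np.quantile(dataset, quantile_levels)
    # Start every component with the overall spread of the data.
    sigmas = np.full(n_classes, dataset.std())
    # Start with equal mixing coefficients.
    class_probs = np.full(n_classes, 1.0 / n_classes)
    return class_probs, mus, sigmas


rng = np.random.default_rng(0)
# Synthetic data on a larger scale (parameters made up for this sketch).
dataset = np.concatenate(
    [rng.normal(250.0, 10.0, 500), rng.normal(480.0, 20.0, 500)]
)
class_probs, mus, sigmas = init_from_data(dataset, n_classes=2)
```

Because the initial values are derived from the data itself, rescaling or shifting the dataset rescales or shifts the initial guess along with it.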
@@ЕкатеринаК-х8ъ7т Good job figuring it out. Also, nice that you tried out the algorithm on your own data.
Our problem was a little synthetic since we generated the dataset artificially.
To answer your initial question: Mathematically, EM for the GMM is guaranteed to converge (I will upload a video on this in the following weeks), but this only holds in infinite-precision arithmetic, i.e., when not using floating-point numbers. You correctly discovered that the class_responsibilities are NaN, which is caused by the responsibilities being NaN. If your initial mus are very far away from the actual modes, then your data becomes highly unlikely at first. By this I mean probabilities on the order of 1.0e-300 or below. You will therefore encounter floating-point problems. TensorFlow Probability will set these values to NaN for you. The actual process is a little more complicated and has to do with the numerically stable way to evaluate normal distributions (I will also upload a video on this topic in the future).
Long story short: As in most iterative methods, a good initial guess is extremely valuable. We will even see that EM converges to many different local optima (a video on this will also be up in the coming weeks). A remedy for this would be the use of "sieving".
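The floating-point problem described above can be reproduced in a few lines. This is a sketch in plain NumPy with float64, not the TensorFlow Probability code from the video, so the exact point at which NaNs appear there may differ. normal_pdf is a hypothetical helper, and the values are made up to mimic data that lies far away from the initial mus. Here the densities underflow to exactly 0.0, and the normalization then computes 0/0:

```python
import numpy as np


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2), evaluated elementwise with broadcasting."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


# Data far away from the initial means (made-up values for illustration).
x = np.array([250.0, 480.0])
mus = np.array([0.2, 0.8])
sigmas = np.array([1.0, 1.0])
class_probs = np.array([0.5, 0.5])

# exp(-0.5 * 250**2) is far below the smallest positive float64,
# so every entry of the weighted likelihood matrix becomes exactly 0.0.
weighted = normal_pdf(x.reshape(-1, 1), mus, sigmas) * class_probs

# Normalizing divides 0.0 by 0.0, which yields NaN in IEEE arithmetic.
with np.errstate(invalid="ignore"):
    responsibilities = weighted / weighted.sum(axis=1, keepdims=True)
```

Once the responsibilities are NaN, every quantity computed from them in the M-step (class_probs, mus, sigmas) becomes NaN as well, which matches the behaviour reported in this thread.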
@@ЕкатеринаК-х8ъ7т But again, thanks for the nice question :)
Here is the new video on sieving and analyzing the convergence: th-cam.com/video/Vj4b4xojPMw/w-d-xo.html