Disclaimer: This is more of an introduction to VAEs and disentanglement than a discussion of the experimental part of the paper.
Excellent video as always, thank you.
I know I'm swimming against the tide here, but if features were truly disentangled, we would have far less need for ML in the first place - 'algorithms' can already detect/generate size, color, rotation etc. pretty well. It is precisely when features are entangled that ML becomes so useful. So for me the holy grail isn't having tweakable *independent* properties - e.g. if I turn the dial to 'make the jaw bigger', I want that to affect the mouth and indeed the whole face, but in the right way. As I understood the paper (caveats here!), it shows that you could have jaw size perfectly on a dial, but once you introduce mouth shape it will intertwine with the jaw and change the model into a new one. Great! Done well (more caveats!), that's exactly what I would want. Maybe use disentangled VAEs, but write unpenalized hints right into the latent space (e.g. measured jaw size) - maybe the model uses this 'free information' to mostly encode that feature there, while the rest of the latent space still reacts to and controls it. (I'm experimenting with this kind of thing now, and assume I'll run into many of the problems they state in the paper that I don't yet understand - yes it's inefficient, but it's still a path to understanding :)
Anyway, I've learned a *ton* watching your videos. Obviously I still have a ways to go, but thank you again.
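The 'unpenalized hint' idea above could be sketched roughly like this (hypothetical, not from the paper - the function name, the `hint_dim` parameter, and the jaw-size supervision are all my own illustration): tie one latent dimension's mean to a measured attribute with a supervision term, and exclude that dimension from the KL penalty so the model can store the feature there 'for free'.

```python
import numpy as np

def hinted_vae_loss(recon_err, mu, logvar, jaw_size, hint_dim=0):
    # Hypothetical composite loss: dimension `hint_dim` of the latent
    # mean is pulled toward a measured attribute (e.g. jaw size) and is
    # skipped by the KL term, so only the remaining dims pay the
    # Gaussian-prior penalty.
    hint_err = np.mean((mu[:, hint_dim] - jaw_size) ** 2)
    keep = np.arange(mu.shape[1]) != hint_dim
    # Closed-form KL( N(mu, sigma^2) || N(0, I) ) over penalized dims only.
    kl = 0.5 * np.sum(np.exp(logvar[:, keep]) + mu[:, keep] ** 2
                      - 1.0 - logvar[:, keep])
    return recon_err + kl + hint_err

# Toy check: a standard-normal posterior with a perfectly matched hint
# and zero reconstruction error gives zero total loss.
loss = hinted_vae_loss(0.0, np.zeros((1, 2)), np.zeros((1, 2)), np.zeros(1))
print(loss)  # 0.0
```

Whether the decoder actually leaves jaw size alone in the other dimensions is exactly the open question the comment raises.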
nice points, thanks for commenting :)
Excellent summary man! Thanks!
Very informative, thank you!
I think I’m missing something here. What gets learned? The encoder to vector of means and vars is one thing right? The decoder from the samples to the reconstruction is another, right? Are the distributions learned also while simultaneously encouraging them to be gaussian? Something is being encouraged to be gaussian by the KL term, right? Confused ... 🤯
The Gaussian form is hard-coded; the encoder only outputs the means and variances. So yes - the encoder and decoder are the two learned pieces, and the KL term pushes the encoder's output distributions toward a standard normal.
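To make the division of labor concrete, here is a minimal numpy sketch (my own illustration, not the paper's code) of a VAE forward pass: the weights producing the means and log-variances are what gets learned, the diagonal-Gaussian form is fixed, and the KL term measures how far those learned parameters are from N(0, I).

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(x, W_mu, W_logvar):
    # Learned: W_mu, W_logvar. The encoder outputs per-dimension means
    # and log-variances of a fixed-form diagonal Gaussian posterior.
    return x @ W_mu, x @ W_logvar

def reparameterize(mu, logvar):
    # Sample z ~ N(mu, sigma^2) as z = mu + sigma * eps, eps ~ N(0, I),
    # so gradients can flow through mu and logvar during training.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def kl_to_standard_normal(mu, logvar):
    # Closed-form KL( N(mu, sigma^2) || N(0, I) ), summed over latent
    # dims: the term that encourages the posteriors to look Gaussian
    # and standard-normal.
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar)

# Toy shapes: 4-dim input, 2-dim latent. The weight matrices are the
# learned quantities; a decoder (omitted) is learned the same way.
x = rng.standard_normal((1, 4))
W_mu = rng.standard_normal((4, 2)) * 0.1
W_logvar = np.zeros((4, 2))

mu, logvar = encoder(x, W_mu, W_logvar)
z = reparameterize(mu, logvar)
print(kl_to_standard_normal(np.zeros((1, 2)), np.zeros((1, 2))))  # 0.0: already standard normal
```

So three things are trained jointly: encoder weights, decoder weights, and (implicitly through the encoder) the per-input means and variances - but the Gaussian shape itself is never learned.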
Thanks for the video! Was just reading the paper.
thank you for this video!
Thanks! :)
awesome !