Thanks, Hugo! Of the many videos available on this topic, yours is the most lucid and easy to follow. A big help for me!
Thanks for your kind words!!
It would be interesting if you showed an execution example of the RBM on a small dataset. Anyway, thank you for the explanation. Keep up the great work! =)
Thanks a lottt Hugo! I have become a great fan of your works!
really nice vids! i like your slide style
Many thanks for the detailed explanation!
Very good explanation. Thanks a lot!
Hi Hugo!
Amazing video
Can you please help me with the derivation at 4:42? Is there a supporting video or document?
Thanks
Thank you for the video it is very helpful!
Hi, Could you provide the HDBRM experiment code in the paper named "Classification using DRBM"? I try to recreat experiment and get stuck in HDRBM.
The step at 11:20 seems a little hand-wavy. I don't see how it follows that the fraction is p(h_j | x) just because the fraction describes a probability distribution. How do I know it describes the distribution p(h_j | x)? You say it *must* be so. But why?
Good question! It's hard to give the derivation here, but we know that p(h_j=1|x) = sigmoid(W_{j,.} x + b_j) and that p(h_j=0|x) = 1 - sigmoid(W_{j,.} x + b_j). Then, if you do the exercise of calculating the expression I highlight at 11:20 for h_j = 1 and for h_j = 0, you'll see that they match.
Hope this helps!
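For anyone who wants to do that exercise numerically: here is a quick sketch (the weights, input, and bias below are made-up values for illustration) comparing the highlighted fraction exp(h_j * a) / (exp(0) + exp(a)), where a = W_{j,.} x + b_j, against the sigmoid for both values of h_j:

```python
import numpy as np

rng = np.random.default_rng(0)
w_row = rng.normal(size=5)       # W_{j,.}: hypothetical weights for hidden unit j
x = rng.integers(0, 2, size=5)   # a binary visible vector
b_j = 0.3                        # hypothetical bias
a = w_row @ x + b_j              # pre-activation of hidden unit j

sigmoid = 1.0 / (1.0 + np.exp(-a))

def fraction(h_j):
    # exp(h_j * a) / sum over h'_j in {0, 1} of exp(h'_j * a)
    return np.exp(h_j * a) / (np.exp(0.0 * a) + np.exp(1.0 * a))

assert np.isclose(fraction(1), sigmoid)        # matches p(h_j=1|x)
assert np.isclose(fraction(0), 1.0 - sigmoid)  # matches p(h_j=0|x)
```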
thank you Hugo, it helps a lot.
would you also do some cast in RNN?
Unfortunately no :-( Maybe I'll make some someday. In the meantime, I'd consider reading this:
www.iro.umontreal.ca/~bengioy/dlbook/rnn.html
Hugo Larochelle thanks for the tip. I'll definitely check that out.
I really enjoyed watching your lectures; you have this extraordinary gift of explaining things clearly.
So sad I can't speak French.
Ji Feng Thanks, I really appreciate your kind words :-)
The link to the RNN resource is down... any other suggestions?
that's very very detailed.
which book did you follow?
Thanks for your kind words! I didn't follow any book actually :-)
@@hugolarochelle thanks to you man!
Hello, can you please explain how the hidden layer is updated to 0 or 1 after obtaining the probability (by applying the activation function)?
Sure! Once you've computed the value of the sigmoid (let's call that value p), you sample the value of the corresponding unit by sampling a real number between 0 and 1 from a uniform distribution, and if that number is smaller than p, then you set the unit to 1. Otherwise, you set it to 0.
Hope this helps!
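The procedure above can be sketched in a few lines of NumPy (the function name and the example weights are made up for illustration):

```python
import numpy as np

def sample_hidden(x, W, b, rng):
    """Sample binary hidden units given a visible vector x.

    p[j] = sigmoid(W[j] @ x + b[j]) is p(h_j = 1 | x); unit j is set
    to 1 exactly when a uniform draw in [0, 1) falls below p[j].
    """
    p = 1.0 / (1.0 + np.exp(-(W @ x + b)))
    u = rng.uniform(size=p.shape)  # one independent uniform draw per unit
    return (u < p).astype(int)

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))      # 3 hidden units, 4 visible units
b = np.zeros(3)
x = np.array([1, 0, 1, 1])
h = sample_hidden(x, W, b, rng)  # a binary vector of 0s and 1s
```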
Is this explanation based on the assumption that both h and x are either 1 or 0? Am I right that they can't take values between 0 and 1?
Good question! Yes, the explanation is specific to the case where the values of x and h are 0 or 1. But it would be possible to derive a version where x and h take any continuous value between 0 and 1 (the derivation is just a bit more complicated, requiring integrals).
Hope this helps!
@@hugolarochelle thanks heaps for answering, and so quickly!
Hi Hugo Larochelle, thanks for your video. I am confused about the difference between the sum over j (numerator) and the sum over h'_j (denominator). Could you explain it?
never mind, I figured it out lol
Good job! :-)
OMG THANK YOU!!!!!
My pleasure :-)
@16:38 The denominator is a bit confusing. I mean, why do we use the neighbors of z instead of z' in the denominator's factor function?
There's no "reason". This is simply a statement of what the local Markov property is. In other words, this is how it is defined.
Thank you :)
Can you explain, at 7:00, how the nested sum over the hidden units arises?
I strongly recommend you check out his lectures on CRFs. But basically, you have to sum over all possible joint configurations of the hidden units, because the marginalization over h involves every one of them.
@6:44 In the first line, the denominator should probably be sum(p(x|h')), not sum(p(x, h')).
Nope, it's indeed \sum p(x,h'). That is because we have p(h|x) = p(x,h) /p(x), and p(x) = \sum_{h'} p(x,h').
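If it helps, this can be checked numerically on a tiny RBM (the sizes and parameter values below are arbitrary), using the standard binary RBM energy E(x, h) = -h^T W x - b^T h - c^T x, so that p(x, h) is proportional to exp(-E(x, h)) and p(x) sums that over all 2^H hidden configurations h':

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
H, D = 3, 4                        # a tiny RBM: 3 hidden, 4 visible units
W = rng.normal(size=(H, D))
b = rng.normal(size=H)             # hidden biases
c = rng.normal(size=D)             # visible biases
x = np.array([1, 0, 1, 1])

def p_joint_unnorm(x, h):
    # exp(-E(x, h)), proportional to p(x, h); the constant Z cancels below
    return np.exp(h @ W @ x + b @ h + c @ x)

# p(x) = sum_{h'} p(x, h'): sum over all 2^H hidden configurations
configs = [np.array(hp) for hp in itertools.product([0, 1], repeat=H)]
px_unnorm = sum(p_joint_unnorm(x, hp) for hp in configs)

h = np.array([1, 0, 1])
p_h_given_x = p_joint_unnorm(x, h) / px_unnorm  # p(h|x) = p(x,h) / p(x)

# sanity check: p(h|x) factorizes into per-unit sigmoids
sig = 1.0 / (1.0 + np.exp(-(W @ x + b)))
assert np.isclose(p_h_given_x, np.prod(np.where(h == 1, sig, 1.0 - sig)))
```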
cool!
Hey,
could you upload the presentations?
Sure! Everything is here: info.usherbrooke.ca/hlarochelle/neural_networks/content.html
Hugo Larochelle
Thank you !
Hugo,
Can you also share the assignments/exams associated with the course ? It would help me calibrate how much of the material I have correctly assimilated.
you'll find 3 assignments here: info.usherbrooke.ca/hlarochelle/neural_networks/evaluations.html
thank you !
45 minutes of math and proofs, no exercises, no code so far.
Not saying I could do better, but maybe someone could.
You should understand this. I don't think any particular language or framework will hold up in the future, but if you know this material, you won't have to worry.
You can skip these lectures if you know this material already. Otherwise, you cannot code anything on your own. You can always look for implementations by others though, but most probably you won't get a deep understanding of the subject that way.