Neural networks [7.7] : Deep learning - deep belief network

  • Published on 26 Oct 2024

Comments • 46

  • @affankhan5525
    @affankhan5525 9 years ago +6

    Thanks for taking time to prepare and put it here for everyone.

  • @ledilnudil4256
    @ledilnudil4256 9 years ago +3

    Thanks for the video, it is by far the clearest explanation I could find on internet on DBN.

    • @hugolarochelle
      @hugolarochelle  9 years ago +2

      Ledil Nudil Oh wow! Thanks for your very nice words :-)

  • @sungsoolim8066
    @sungsoolim8066 8 years ago

    Hello, Mr. Hugo.
    I have two quick questions.
    1. After initializing the weights of the first two layers with an RBM, we initialize the weights of the next two layers with an RBM as well.
    In this case, when we train the weights of the second RBM, do the weights of the first two layers stay fixed or do they change?
    2. When we do fine-tuning (supervised learning between the last hidden layer and the output layer), do all the weights of the previous layers stay fixed or are they updated?
    Thank you for the high quality lectures!

  • @Arslanqadri
    @Arslanqadri 8 years ago

    In the first step the first two layers (input-h1) are trained as an RBM, in the next step we train h1-h2 as an RBM, while input-h1 now behaves as an SBN, and so on. When you say you are generating samples of observations (in the first half of the video, when you talk about Gibbs sampling), do you mean that the input is being reconstructed as part of this network's training? Thanks.

    • @hugolarochelle
      @hugolarochelle  8 years ago

      Good question! The process I'm describing is only for sampling from a DBN; it's not part of the training algorithm.

  • @saisagar3245
    @saisagar3245 9 years ago +1

    Hi Hugo,
    I am confused about the purpose of the directed part of the network. Why does the DBN follow this specific structure (top-level RBM, remaining layers directed)? Can you explain this?

    • @hugolarochelle
      @hugolarochelle  9 years ago +4

      Hi! The motivation came from the RBM stacking, pretraining procedure. As I explain in the second part of the video, it can be understood as improving the prior over the current top hidden layer each time we add another layer. When we do this, e.g. when we split into p(x|h^1) p(h^1) as we stack the 2nd hidden layer, we're now assuming that we have a model for p(x|h^1) and another model for p(h^1). This implies that we have a directed model for the interaction between x and h^1.
      Hope it's a bit more clear!
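      (Spelling out the reply above in equations: the first RBM models p(x) = sum_{h^1} p(x|h^1) p(h^1). Stacking a second hidden layer keeps p(x|h^1) as a fixed, directed model and replaces the prior p(h^1) with the new top RBM's marginal sum_{h^2} p(h^1, h^2), which is what "improving the prior over the top hidden layer" amounts to.)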

  • @saisagar3245
    @saisagar3245 10 years ago

    Hi Hugo,
    If you are still taking questions on this video, I have a doubt.
    If we are training DBNs via RBMs, wouldn't that make it a deep Boltzmann machine? What is the exact difference between DBNs and DBMs? Can you give a use case for a DBN so that I can distinguish it from a deep Boltzmann machine?

    • @hugolarochelle
      @hugolarochelle  10 years ago +5

      You can stack RBMs to pre-train a deep Boltzmann machine (DBM) too, however, the pre-training procedure is slightly different (I suggest you look at the DBM paper to see how exactly: www.cs.toronto.edu/~fritz/absps/dbm.pdf).
      More importantly, DBNs and DBMs differ in how data is assumed to be generated, under each model.
      A DBN is somewhat of a weird beast: it's a hybrid between a directed and undirected graphical model. It can be understood as a sigmoid belief network, whose prior over the top hidden layer is an RBM model (itself containing its own hidden layer). Specifically, we generate data from a DBN by first doing several steps of Gibbs sampling over the DBN's top two layers (i.e. the layers of the top RBM), and then sampling down towards the input layer, in one pass.
      On the other hand, to sample from a DBM, we must perform Gibbs sampling over *all* hidden layers. That's because it's entirely an undirected graphical model. One way of doing that is to alternate between sampling the odd-numbered and even-numbered layers of the network. This means that sampling also involves bottom-up processes.
      I hope this helps!
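      As a minimal illustration of the DBN sampling procedure described above, here is a NumPy sketch for a 2-hidden-layer DBN (a sketch only, not code from the course; the parameter names W1, b1, W2, b2, c2 and the layer sizes are assumptions made for the example):

      import numpy as np

      def sigm(a):
          return 1.0 / (1.0 + np.exp(-a))

      def sample_dbn(W1, b1, W2, b2, c2, n_gibbs=1000, seed=0):
          # Top RBM over (h1, h2): p(h2|h1) = sigm(c2 + W2 h1), p(h1|h2) = sigm(b2 + W2' h2).
          # Directed sigmoid belief net layer: p(x|h1) = sigm(b1 + W1' h1).
          rng = np.random.default_rng(seed)
          h1 = rng.integers(0, 2, size=W2.shape[1]).astype(float)   # arbitrary starting state

          # 1) Gibbs sampling in the top (undirected) RBM only.
          for _ in range(n_gibbs):
              h2 = (rng.random(W2.shape[0]) < sigm(c2 + W2 @ h1)).astype(float)
              h1 = (rng.random(W2.shape[1]) < sigm(b2 + W2.T @ h2)).astype(float)

          # 2) A single top-down (ancestral) pass through the directed part, towards the input.
          x = (rng.random(W1.shape[1]) < sigm(b1 + W1.T @ h1)).astype(float)
          return x

      # e.g. with randomly initialized (untrained) parameters, just to show the shapes:
      rng = np.random.default_rng(0)
      x = sample_dbn(W1=rng.standard_normal((50, 784)), b1=np.zeros(784),
                     W2=rng.standard_normal((25, 50)), b2=np.zeros(50), c2=np.zeros(25),
                     n_gibbs=100)

      For a DBM, by contrast, the Gibbs loop would have to resample every layer (e.g. alternating the odd and even layers), including x, since all connections are undirected.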

  • @rcerri
    @rcerri 8 years ago

    Hi Hugo! Very good lectures!
    I got a little bit confused. You say at 3:38 that sampling from the model requires a lot of Gibbs sampling at the top two layers. But you get h^3 only after stacking the previous two RBMs, right? So is this Gibbs sampling performed during the training phase, only after training the two previous RBMs?
    Also, if I understood correctly, you are defining a DBN as a model to reconstruct a sample, right? Given that, how is the reconstruction performed? Sigmoids from x to h^1, then from h^1 to h^2, then reconstruction using the top RBM, then a sigmoid back from h^2 to h^1, and from h^1 to x?
    Thanks,
    Ricardo

    • @hugolarochelle
      @hugolarochelle  8 years ago +3

      Good questions!
      Actually, what I'm describing here isn't about training. It's about the assumptions this model is making regarding how the data was generated. In other words, I'm describing how to generate observations from it, given its assumptions about the distribution over all layers.
      Also, a DBN isn't a "model to reconstruct a sample". It's a generative model that makes certain assumptions about the form of the data distribution p(x). More fundamentally, how you train a model is a separate matter from *what* the model is (e.g. for a given model, one could imagine multiple ways of training it). In this video, I describe one way of training it, by greedily stacking RBMs. Better methods exist in the literature, however (e.g. the contrastive wake-sleep algorithm).
      Hope this helps!

    • @rcerri
      @rcerri 8 years ago

      Thank you again Hugo!

  • @nawarhalabi710
    @nawarhalabi710 10 years ago

    Thanks Hugo
    A question: why the sigmoid? I have found answers that talk about the sigmoid being analytical and allowing non-linear boundaries. Are there other, more specific reasons why the sigmoid is chosen?

    • @hugolarochelle
      @hugolarochelle  10 years ago +1

      For the RBM on top, the sigmoid just falls out of the definition of the joint. You could define other types of joints which would give other forms of conditionals, but I'm not aware of any good reason to use something else in general. If you wanted to have units that are in {-1,1} however, you might define a different RBM which would end up having conditionals that look like tanh units.
      As for the sigmoid belief net part, there really is no fundamental reason to use the sigmoid rather than any other activation bounded in [0,1] as the model of the conditional probability of the units. You might be able to argue that it corresponds to a maximum entropy distribution. Also, gradients with binary sigmoidal units are simple. But that's pretty much it.
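      For reference, here is how the sigmoid "falls out of the definition of the joint" for a binary RBM (a standard derivation; the parameter names W, b, c are just the usual energy parameters, not notation taken from this thread). The joint is p(x, h) proportional to exp(h' W x + c' h + b' x), and since the hidden units are conditionally independent given x, summing out the other hidden units gives

      p(h_j = 1 | x) = exp(c_j + W_j x) / (1 + exp(c_j + W_j x)) = sigm(c_j + W_j x),

      where W_j is the j-th row of W. With units taking values in {-1, 1} instead of {0, 1}, the same computation yields a tanh-shaped conditional, as mentioned above.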

  • @po-yupaulchen166
    @po-yupaulchen166 9 years ago

    Hi Hugo,
    One question about the joint distribution of a DBN. In the two-hidden-layer DBN case, why is the joint distribution p(x, h^1, h^2) = p(h^1, h^2) p(x|h^1)?
    I mean that p(x, h^1, h^2) = p(h^1, h^2) p(x|h^1, h^2) without considering conditional independence. But why are x and h^2 independent conditioned on h^1, such that p(x|h^1, h^2) = p(x|h^1), in the DBN case? In a fully directed model, I know I can use d-separation to find the conditional independencies. However, in a mixed directed and undirected model (a DBN), how do I find the conditional independencies? Thank you.
    Paul

    • @hugolarochelle
      @hugolarochelle  9 years ago +1

      Po-Yu Paul Chen Good question!
      First, let me mention that D-separation provides rules that apply to directed graphical models only, so it can't be used here to infer conditional independence.
      Now, p(x, h^1, h^2) = p(h^1, h^2) p(x|h^1) is essentially what we start with, i.e. how the joint distribution of a DBN was defined from the very beginning. The graph was drawn based on that assumption.
      I think the best way to understand how p(x, h^1, h^2) = p(h^1, h^2) p(x|h^1) is reflected in the model is to think about the generative story behind the model. When generating x, we saw that we only use h^1. So just from that, you can conclude that p(x|h^1,h^2) = p(x|h^1).
      Hope this helps!
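      (Spelled out, the 2-hidden-layer DBN's joint is defined directly in the factored form

      p(x, h^1, h^2) = p(h^1, h^2) p(x | h^1),   with   p(x | h^1) = prod_k p(x_k | h^1),

      where p(h^1, h^2) is the top RBM's joint and each p(x_k | h^1) is a sigmoid conditional. Since h^2 never appears in the second factor, p(x | h^1, h^2) = p(x | h^1) holds by definition rather than by a d-separation argument.)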

    • @po-yupaulchen166
      @po-yupaulchen166 9 years ago

      Thanks for your explanation, Hugo.
      It helps a lot.

  • @zx3215
    @zx3215 5 years ago

    Thanks for the video. Very impressive, but I guess it is still too complicated for me, yet. I'm just working on simple deep FF nets and dropout...

  • @yassinemajana6426
    @yassinemajana6426 7 years ago

    Thanks for the video. I want to ask you a question: what are the advantages and disadvantages of DBNs?

  • @TheBigeStache
    @TheBigeStache 9 years ago

    Hey there,
    I just wanted to ask how the higher layers are trained. These directed edges are confusing me. Do the RBMs in higher layers still train on top of the data representations of the RBMs in the previous layers, and is the fine-tuning thereafter done with the up-down algorithm?
    So the sigmoid belief network in conjunction with the RBM is something like the data-generation mode of the DBN?
    Thanks,

    • @hugolarochelle
      @hugolarochelle  9 years ago

      Nima Mousavi Yep, you got it right! Good job :-)

    • @TheBigeStache
      @TheBigeStache 9 years ago

      Thanks to your good lectures. This is a big help for me. Somehow you manage to explain all these concepts and mathematical derivations in a way that makes it easy to follow.

  • @minh1391993
    @minh1391993 8 years ago

    Could you please tell me why we should use a DBN?
    Is it better than a CNN in terms of supervised learning?
    Once we have finished training a DBN, what can we do further with the trained DBN? Autoencoding and reconstruction?

    • @hugolarochelle
      @hugolarochelle  8 years ago +1

      Good question. DBNs and CNNs are about 2 different ideas. CNNs are about exploiting the spatial topology of pixels and incorporating invariances that make sense for images. DBNs are about using unsupervised learning to learn better features. So you could even imagine combining both ideas, and have a convolutional DBN (in fact, there are papers about that).
      All of this said, currently, for image data, CNNs are usually much better than DBNs. Indeed, the parameter sharing and translation invariance of CNNs seems to be very effective at improving generalization, compared to unsupervised learning.

    • @hugolarochelle
      @hugolarochelle  7 years ago

      The same as the difference between an RBM and a DBN, essentially :-)

  • @MLDawn
    @MLDawn 4 years ago

    Unfortunately, this needs more breaking down ...

  • @eqtidarma4726
    @eqtidarma4726 1 year ago

    Thank you

  • @guoguozhangqing
    @guoguozhangqing 9 years ago

    Hi Hugo, it would be great if you could incorporate some coding examples in your next version of the lectures! I think you could simply use the code from the tutorials on deeplearning.net, walk through the parts that are covered by the particular lecture, and run a demo. It would help us understand the material better, and pick up Theano.

    • @hugolarochelle
      @hugolarochelle  9 years ago

      Thanks for the suggestion! In the live course I give, there is an assignment that covers stacked RBMs for instance (the assignments can be consulted here: info.usherbrooke.ca/hlarochelle/neural_networks/evaluations.html).
      But I take good note of your suggestion!

    • @guoguozhangqing
      @guoguozhangqing 9 years ago

      Hugo Larochelle Thanks for pointing me to the materials! It looks very helpful!

  • @anuragharidasu5746
    @anuragharidasu5746 8 years ago

    Hi Hugo, while training an RBM, do we need to provide the feature values as binary, or can they be ordinary integers?

    • @hugolarochelle
      @hugolarochelle  8 years ago +1

      +anurag haridasu If you are talking about the input layer, and for a binary RBM, it is common practice to use the real values of the inputs, normalized between 0 and 1.
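      A tiny illustration of that convention (a sketch; the 8-bit pixel array below is made up for the example, not data from the course):

      import numpy as np

      images = np.random.randint(0, 256, size=(100, 784))    # hypothetical 8-bit grayscale images
      rbm_input = images.astype(np.float64) / 255.0           # real values in [0, 1], fed to the binary RBM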

  • @dhawalarora207
    @dhawalarora207 8 years ago

    Hey, thanks for a great series. Just one question: where do I find a good/simple explanation of the up-down algorithm (I've seen Hinton's fast learning algorithm paper), or could you do a related video?

    • @hugolarochelle
      @hugolarochelle  8 years ago

      +Dhawal Arora That's a good question. Unfortunately I am not aware of any such resources... I think the reality is that people aren't really using this algorithm now...

    • @dhawalarora207
      @dhawalarora207 8 years ago

      +Hugo Larochelle Okay. Is it the up/down algorithm that isn't used, or is fine-tuning in DBNs not used in general? If it's used, how is it done, with backpropagation?

    • @hugolarochelle
      @hugolarochelle  8 years ago

      +Dhawal Arora I mean that Up/Down isn't used in the context of initializing and ultimately fine-tuning a deep network for supervised learning. Going straight from layer-wise pretraining to supervised fine-tuning with backprop works fine.
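      As a rough sketch of that pipeline (illustrative only, not Hugo's code; the CD-1 updates, data, layer sizes, and hyperparameters are all assumptions): greedy layer-wise pretraining of stacked RBMs, whose weights then initialize a feed-forward network for supervised fine-tuning with backprop.

      import numpy as np

      def sigm(a):
          return 1.0 / (1.0 + np.exp(-a))

      def train_rbm_cd1(data, n_hidden, lr=0.05, n_epochs=5, seed=0):
          # One binary RBM trained with contrastive divergence (CD-1).
          rng = np.random.default_rng(seed)
          n_visible = data.shape[1]
          W = 0.01 * rng.standard_normal((n_hidden, n_visible))
          b = np.zeros(n_visible)            # visible biases
          c = np.zeros(n_hidden)             # hidden biases
          for _ in range(n_epochs):
              for x in data:
                  ph = sigm(c + W @ x)                       # positive phase
                  h = (rng.random(n_hidden) < ph).astype(float)
                  px = sigm(b + W.T @ h)                     # one Gibbs step down...
                  ph_neg = sigm(c + W @ px)                  # ...and back up (negative phase)
                  W += lr * (np.outer(ph, x) - np.outer(ph_neg, px))
                  b += lr * (x - px)
                  c += lr * (ph - ph_neg)
          return W, b, c

      # Greedy layer-wise pretraining: each RBM is trained on the hidden
      # representation (here, the hidden probabilities) produced by the layer below.
      rng = np.random.default_rng(0)
      X = (rng.random((200, 784)) < 0.1).astype(float)   # made-up binary training data
      pretrained, reps = [], X
      for n_hidden in (500, 200):
          W, b, c = train_rbm_cd1(reps, n_hidden)
          pretrained.append((W, c))
          reps = sigm(c + reps @ W.T)

      # The (W, c) pairs in `pretrained` then initialize the hidden layers of an ordinary
      # feed-forward network, which is fine-tuned for the supervised task with plain
      # backprop (the fine-tuning itself is not shown here).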

    • @dhawalarora207
      @dhawalarora207 8 years ago

      Okay. Thanks anyway! Your notes are great too :)

  • @harshitagarwal5188
    @harshitagarwal5188 7 years ago

    If a deep belief network only contains an RBM in its first two layers, then why do we call it a stack of RBMs? Also, here (deeplearning.net/tutorial/DBN.html) the deep belief network is described as a combination of RBM layers.

    • @hugolarochelle
      @hugolarochelle  7 years ago +1

      DBNs are pre-trained by stacking RBMs. Some people sometimes don't fine-tune the DBN and just use the weights from the stacking procedure, sure. But even on the page you mention, you'll notice in the diagram that the bottom layers aren't RBMs, as they have different weights that go both top-down (the DBN weights) and bottom-up (in dotted lines, which are the approximate inference weights).
      Hope this helps!

  • @saurabhmehta7681
    @saurabhmehta7681 8 years ago

    Hi Hugo,
    I am confused between deep autoencoders and deep belief networks. If you go to this link: www.cs.toronto.edu/~rsalakhu/code.html, where Geoffrey Hinton and Ruslan Salakhutdinov have provided their Matlab implementations, and follow the "Deep Belief Networks" link, you see the implementation of deep autoencoders. Also, in the slides provided by Prof. Tom Mitchell of CMU here: www.cs.cmu.edu/~tom/10701_sp11/slides/DimensionalityReduction_03_31_2011_ann.pdf, you will find that he shows the same diagram of deep autoencoders that you have, but under the heading "Deep Belief Networks". I am confused about how to go about implementing a deep belief network (my aim is to train it on speech, for learning purposes). Your suggestion here will be of significant importance to me.
    Thanks.

    • @hugolarochelle
      @hugolarochelle  8 years ago

      The reason is that Geoff and Russ have used the same RBM-based pretraining procedure to also train deep autoencoders. That is otherwise pretty much the only relationship between DBNs and deep autoencoders.
      Hope this helps!

    • @saurabhmehta7681
      @saurabhmehta7681 8 years ago

      +Hugo Larochelle Thanks a lot for the clarification! So which one would you prefer for speech recognition?

    • @hugolarochelle
      @hugolarochelle  8 years ago

      DBNs (actually just the stacked RBM pretraining part) have been successful for speech. You can learn more about that here:
      static.googleusercontent.com/media/research.google.com/en//pubs/archive/38131.pdf
      That said, I think more recently people have found it unnecessary to do pretraining for speech, essentially because there's so much data that the regularization offered by RBMs isn't needed.
      Also, you might want to check out CNTK:
      thenextweb.com/microsoft/2016/01/26/microsofts-deep-learning-toolkit-for-speech-recognition-is-now-on-github/#gref

    • @saurabhmehta7681
      @saurabhmehta7681 8 years ago

      +Hugo Larochelle Thanks a lot!

  • @MartinThoma
    @MartinThoma 10 years ago +5

    The link in the video is www.cs.toronto.edu/~hinton/adi/