Neural networks [5.1] : Restricted Boltzmann machine - definition

  • Published on Oct 26, 2024

Comments • 78

  • @peterd3125
    10 years ago +5

    great lecture Hugo, thanks for putting all this hard work into it - very well taught too!

  • @gautamkarmakar3443
    7 years ago +4

    Used this lecture to understand the lecture given by Geoffrey Hinton in the NN course on Coursera. Thanks a ton, you saved me again.

  • @TheReficul
    8 years ago +2

    Thanks for the explanation on the energy function. Everything suddenly starts to make sense.

  • @yifanli2673
    8 years ago

    I really enjoyed watching this video. As I'm working on a project about DBN, this video is very useful for me. Thanks.

    • @hugolarochelle
      8 years ago

      +Yifan Li Thanks for your kind words!

  • @JaydeepDe
    8 years ago +2

    Best lecture on RBM....Thanks Prof.

  • @RudramurthyV
    10 years ago

    @Jim O' Donoghue The numerator can be rewritten using exp(b+c) = exp(b)·exp(c) (the exponential of a sum is the product of the exponentials). When you apply this to the numerator, it turns into the product shown at 10:26.
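
    For anyone else stuck on that step, here it is written out as a sketch in the lecture's notation, with W_{j.} denoting the j-th row of W (the symbols are assumed from the slides):

    ```latex
    % The exponential of a sum is a product of exponentials:
    %   exp(a_1 + ... + a_n) = exp(a_1) * ... * exp(a_n).
    % Applied to the numerator of p(h | x) at 10:26:
    \[
    \exp\Big( \sum_j h_j \,\big(b_j + W_{j\cdot}\, x\big) \Big)
      \;=\; \prod_j \exp\!\big( h_j \,(b_j + W_{j\cdot}\, x) \big)
    \]
    ```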

  • @valken666
    9 years ago +6

    You're awesome for doing this.

  • @keghnfeem4154
    9 years ago +33

    Sorry, I do not understand.

  • @nigeldupaigel
    6 years ago

    b transpose and c transpose are the biases for the hidden and visible nodes, respectively
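
    For reference, the energy function from this lecture (as also spelled out by Hugo in the replies further down, with b the hidden biases and c the visible biases):

    ```latex
    % RBM energy function (b: hidden biases, c: visible biases):
    \[
    E(x, h) \;=\; -\,h^\top W x \;-\; c^\top x \;-\; b^\top h
            \;=\; -\sum_{j,k} W_{j,k}\, h_j x_k \;-\; \sum_k c_k x_k \;-\; \sum_j b_j h_j
    \]
    ```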

  • @osamahabdullah3715
    3 years ago

    Thank you so much, your lectures are awesome

  • @igorjouravlev2643
    4 years ago

    Very good explanation! Thanks a lot!

  • @dombat44
    1 year ago

    Many thanks for the lecture, I found it really useful. I'm a bit confused about the notation on the slide entitled Markov Network View, though. Firstly, have you split the equation onto multiple lines just to make it more readable, or is it significant that the unary factors are on different lines from the pairwise factors? Secondly, from my understanding of MNs, a distribution can be written as a product of the potentials defined by the cliques of the graph (up to a normalizing constant). Since it's a pairwise MN, I can see that the pairwise factors are represented in the graph, but I can't see where the unary factors are represented. What am I missing?

    • @hugolarochelle
      1 year ago

      Thanks for your kind words!
      Indeed, I split it across multiple lines just for readability; the line on which each term appears doesn't matter.
      I agree that the MN representation doesn't make unary factors explicit, and I think that's the main benefit of the factor graph representation, which illustrates all potentials explicitly.
      Hope this helps!

    • @dombat44
      1 year ago

      @hugolarochelle Great, thanks for clearing that up.
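
    For concreteness, the factorization discussed in this thread can be written as a product of pairwise and unary potentials (a sketch of the standard pairwise-MN form; the grouping into named potentials is illustrative, not taken from the slide):

    ```latex
    % RBM as a pairwise Markov network: one pairwise potential per
    % (h_j, x_k) edge, plus unary potentials on each node, up to Z.
    \[
    p(x, h) \;=\; \frac{1}{Z}
      \prod_{j,k} e^{\,W_{j,k} h_j x_k}
      \prod_{j} e^{\,b_j h_j}
      \prod_{k} e^{\,c_k x_k}
    \]
    ```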

  • @王国鑫-m6t
    9 years ago

    What an awesome job! Thanks for your lecture

  • @stivstivsti
    7 years ago

    Please give a link to the next video, so we can understand what this is all for

  • @JimODonoghue
    10 years ago +1

    Don't really get why the numerator turns into a product around 10:26...

  • @anchitbhattacharya9125
    5 years ago

    Awesome lecture! What software did you use to make this video?

    • @hugolarochelle
      5 years ago

      Thanks! I used Camtasia for Mac for the recording. For my slides, I used Keynote and a free app that supports drawing on the screen (the one I used then isn't available anymore, but there are other equivalents available).

  • @XiaosChannel
    9 years ago +1

    I think the use of single-letter symbols in formulas really obfuscates the meaning. We have much larger screens now and nobody does these calculations by hand, so why can't we just use the full word, or at least shorten it in a more meaningful way? For instance, instead of Bj and Ck, use Bhi (ith bias of hidden unit) and Bvj (jth bias of visible unit), or something like what we do in programming: vu[i].bias and hu[j].bias
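
    In code, nothing stops you from doing exactly that; a minimal sketch of such a parameter container (all names here are hypothetical, not from the lecture):

    ```python
    import numpy as np

    class RBMParams:
        """RBM parameters with descriptive names; the single-letter
        symbols from the slides are noted in comments."""
        def __init__(self, n_visible, n_hidden, seed=0):
            rng = np.random.default_rng(seed)
            self.weights = 0.01 * rng.standard_normal((n_hidden, n_visible))  # W
            self.hidden_bias = np.zeros(n_hidden)    # b (one per hidden unit)
            self.visible_bias = np.zeros(n_visible)  # c (one per visible unit)
    ```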

  • @simple_akira
    8 years ago

    love it !! good job Hugo :)

  • @brunocosta8974
    9 years ago

    Thanks for the lecture! Recommended!

  • @mahmoudalbardan2730
    6 years ago

    Thank you professor for this video. I have two questions:
    1- How do we compute the distribution of an input vector x, for instance p(1,0,1)?
    2- Is it possible to feed the RBM a multi-valued input vector, i.e. where the possible values for each visible unit are {0,1,2,3}?
    Thank you in advance

    • @hugolarochelle
      6 years ago

      Hi!
      For 1, see video 5.3 for computing p(x) (th-cam.com/video/e0Ts_7Y6hZU/w-d-xo.html)
      For 2, it is indeed possible to have units that aren't binary (e.g. categorical or multinomial). There is more than one way of doing this.
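
    For question 1, a brute-force sketch for a tiny RBM, using the lecture's energy E(x,h) = -h^T W x - c^T x - b^T h (function and variable names are hypothetical; this is only feasible for a handful of units):

    ```python
    import itertools
    import numpy as np

    def energy(x, h, W, b, c):
        """E(x,h) = -h^T W x - c^T x - b^T h, as defined in the lecture."""
        return -(h @ W @ x + c @ x + b @ h)

    def p_x(x, W, b, c):
        """p(x) = sum_h exp(-E(x,h)) / Z, by exhaustive enumeration."""
        n_h, n_v = W.shape
        hs = [np.array(h) for h in itertools.product([0, 1], repeat=n_h)]
        xs = [np.array(v) for v in itertools.product([0, 1], repeat=n_v)]
        Z = sum(np.exp(-energy(v, h, W, b, c)) for v in xs for h in hs)
        return sum(np.exp(-energy(np.asarray(x), h, W, b, c)) for h in hs) / Z

    # Example: p(1,0,1) for a random RBM with 3 visible and 2 hidden units.
    rng = np.random.default_rng(0)
    W = 0.1 * rng.standard_normal((2, 3))
    print(p_x([1, 0, 1], W, b=np.zeros(2), c=np.zeros(3)))
    ```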

  • @MLDawn
    1 year ago

    p(x) is intractable, isn't it?

    • @hugolarochelle
      1 year ago

      It is if both the input layer and the hidden layer are large. But if one of them is small (e.g. ~20 units), then it turns out we can compute the partition function in a reasonable amount of time.
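
    In symbols, summing out one binary layer analytically leaves only the exponential sum over the other layer, which is feasible when that layer is small (a standard identity for binary units, assumed here rather than taken from the video):

    ```latex
    % Partition function with the binary hidden layer summed out analytically:
    \[
    Z \;=\; \sum_{x} \sum_{h} e^{-E(x,h)}
      \;=\; \sum_{x} e^{\,c^\top x} \prod_{j} \big( 1 + e^{\,b_j + W_{j\cdot}\, x} \big)
    \]
    % By symmetry, one can instead sum out the visible layer and enumerate
    % the ~2^20 hidden configurations when the hidden layer is the small one.
    ```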

  • @louatimohamedkameleddine6857
    4 years ago

    Thank you.

  • @rafaellima8146
    9 years ago

    Thank you very much.

  • 6 years ago

    How can I decide on a cut-off point for RBM results in the case of unsupervised learning?

    • @hugolarochelle
      6 years ago

      Great question! Unfortunately, there is no universal answer; it depends on what you are doing. For instance, if you are training features for classification, then you should periodically check how discriminative the features are for your task, even if only on a small subset of the data.
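
    A hedged sketch of such a periodic check, using scikit-learn's BernoulliRBM features with a logistic-regression probe (the dataset and hyperparameters are placeholders, not from this thread):

    ```python
    from sklearn.datasets import load_digits
    from sklearn.linear_model import LogisticRegression
    from sklearn.neural_network import BernoulliRBM
    from sklearn.pipeline import Pipeline

    # Binarize the digits so they suit a binary-visible RBM.
    X, y = load_digits(return_X_y=True)
    X = (X / 16.0 > 0.5).astype(float)

    # RBM features followed by a linear probe: if the probe's accuracy
    # stops improving as RBM training lengthens, that is a natural cut-off.
    model = Pipeline([
        ("rbm", BernoulliRBM(n_components=64, learning_rate=0.05,
                             n_iter=20, random_state=0)),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    model.fit(X, y)
    print("probe accuracy:", model.score(X, y))
    ```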

  • @randywelt8210
    8 years ago

    10:07, I have trouble deciding where to put Bayesian networks and HMMs. Do they belong to unsupervised learning as in the video above, or to supervised learning, or do they simply form their own category in machine learning?

    • @hugolarochelle
      8 years ago +1

      +Randy Welt Good question! Bayesian Networks and HMMs are in the family of directed graphical models, as opposed to undirected graphical models like the RBM. That's the main distinction.
      Note also that we could do either supervised or unsupervised learning, with either undirected or directed graphical models.

  • @janvonschreibe3447
    5 years ago

    I can't see what the vectors *c* and *b* are.
    I watched the videos of the series on autoencoders first and I understood them, but I didn't watch the videos preceding this one. Did I miss something?

    • @hugolarochelle
      5 years ago +1

      c and b are vectors of parameters, exactly like in autoencoders. In RBMs, they will be used differently than in autoencoders, but in both cases they are vectors of parameters.
      Hope this helps!

  • @MatthewKleinsmith
    8 years ago

    Thank you, Hugo. Do you recommend any books on neural networks?

    • @hugolarochelle
      8 years ago +3

      Oh, definitely check out Goodfellow, Bengio and Courville's Deep Learning book: www.deeplearningbook.org/

  • @pi5549
    8 years ago +1

    I am attempting to understand deep autoencoders. I've followed chapters 1 and 2. Can I omit chapters 3 and 4 (on CRFs)?

    • @hugolarochelle
      8 years ago

      Yes, you should be fine without 3 and 4.

  • @zejiazheng1573
    10 years ago +2

    Good lecture! Can you post the slides somewhere? Thx :)

    • @hugolarochelle
      10 years ago +3

      All the slides, and the whole course in fact (with suggested readings and assignments) are available here:
      info.usherbrooke.ca/hlarochelle/neural_networks/content.html

    • @zejiazheng1573
      10 years ago +1

      Hugo Larochelle Got it. Thanks again.

    • @liltlefruitfly
      9 years ago

      Hugo Larochelle Hi Hugo, I keep getting a timeout error for the link

    • @hugolarochelle
      9 years ago

      Yeah, my university is doing some maintenance this weekend. It will be back up on Monday, at the latest.

  • @quranicscience9631
    5 years ago

    good content

  • @minh1391993
    8 years ago

    Dear Hugo. I am implementing an RBM, but I find the energy function of the joint probability confusing (6:10):
    E(x,h) = -( sum over j,k of W_{j,k} h_j x_k + sum over k of c_k x_k + sum over j of b_j h_j ).
    Therefore, how can we define Z, since all of the values of the visible and hidden units have already been used in E(x,h)?

    • @hugolarochelle
      8 years ago +1

      Z is defined as the sum of E(x,h), but over all possible values of x and h. There's otherwise no relationship between the x and h in E(x,h) and Z.

    • @minh1391993
      8 years ago

      @Hugo: Assuming that I train a network with X1(0,1,1,1) and X2(0,0,0,1), and I get H1(0,0,0,0,1) and H2(0,1,0,0,0) respectively, is it right that Z = E(x1,h1) + E(x2,h2)?

    • @minh1391993
      8 years ago

      btw I am using C++ to implement RBM, so I am sorry if I ask so many questions and bother you :)

    • @hugolarochelle
      8 years ago +1

      Ah, no! Z is the sum of *the exponential of -E(x,h)* over *all possible values of x and h*, not just the values of x seen in your training set. This is why we can't compute Z exactly in practice for even moderately large RBMs.

    • @minh1391993
      8 years ago

      According to "A Practical Guide to Training RBMs", the partition function Z is given by summing over all possible pairs of visible and hidden vectors: Z = sum over (x,h) of exp(-E(x,h)). So how should I understand "all possible values of x and h", since with a given training set we only have certain vectors x?
      I noticed that "the reconstruction error is actually a very poor measure of the progress of learning", so if we can't compute Z exactly in practice, what kind of measure can we use?
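
    To make "all possible values of x and h" concrete: with n_v binary visible units and n_h binary hidden units, Z sums 2^(n_v + n_h) terms, no matter how small the training set is. A sketch (names hypothetical):

    ```python
    import itertools
    import numpy as np

    def partition_function(W, b, c):
        """Z = sum over ALL 2^(n_v + n_h) binary configurations (x, h)
        of exp(-E(x,h)); the training set plays no role here."""
        n_h, n_v = W.shape
        Z = 0.0
        for x_bits in itertools.product([0, 1], repeat=n_v):
            for h_bits in itertools.product([0, 1], repeat=n_h):
                x, h = np.array(x_bits), np.array(h_bits)
                Z += np.exp(h @ W @ x + c @ x + b @ h)  # exp(-E(x, h))
        return Z
    ```

    Since this blows up exponentially, practitioners track proxies instead; pseudo-likelihood (what scikit-learn's BernoulliRBM.score_samples reports) and annealed-importance-sampling estimates of Z are common choices in the literature.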

  • @jimmytsai8069
    11 years ago +1

    good job!

  • @bowindbox1132
    4 years ago

    How is the energy function at 6:10 derived?

    • @hugolarochelle
      4 years ago

      Good question! For this video, we only provide the formula. But in the following videos, we show some properties that can be derived thanks to this particular formulation of the energy function. So hopefully that'll help you understand why this particular energy function is used. Hope this helps!

    • @bowindbox1132
      4 years ago

      @hugolarochelle Thanks. I was wondering if it is the general outcome of a simple MRF calculation of the joint probability. Considering the bipartite nature of the graph, we can assume there are two types of connections: hidden node to observed node, and output from hidden or input to hidden. So the rest of the random variables are independent, and by sum of products, the product terms would only encode the joint probabilities. Does this make sense?

    • @hugolarochelle
      4 years ago

      @bowindbox1132 Indeed, an RBM is a type of MRF: one that follows a bipartite graph, with binary random variables, pairwise potentials corresponding to h_j x_k W_{j,k}, and unary potentials h_j b_j and x_k c_k.
      Hope this helps!

    • @bowindbox1132
      4 years ago

      @hugolarochelle So we might assume the potential of a node is just the value at that node. For example, on the input side, in a forward pass say we are going from x_k to h_j via edge W_{k,j}: P(x_k, h_j) = potential on x_k \times potential to jump to h_j from x_k \times potential on h_j = x_k \times W_{k,j} \times h_j. The relation would be reversed when we calculate the total energy of the system backward (that is, while backpropagating). Is this correct?

    • @hugolarochelle
      4 years ago

      @bowindbox1132 Actually, the potentials can't directly give you these probabilities. Indeed, potentials in an MRF aren't necessarily normalized, as probabilities need to be.
      I can't write much here on how RBMs and MRFs are related, but I've found this Medium article that seems to discuss the relationship, and which will perhaps be useful to you: medium.com/datatype/restricted-boltzmann-machine-a-complete-analysis-part-2-a-markov-random-field-model-1a38e4b73b6d

  • @chiru6753
    5 years ago

    What is a data-dependent regularizer?

    • @hugolarochelle
      5 years ago +1

      Good question! A "normal" regularizer (like L2 weight decay) does not depend on the input data distribution (L2 weight decay penalizes the sum of the squared parameter values). In contrast, a data-dependent regularizer penalizes certain parameter values through a term that depends on the input distribution (i.e. the set of x^{(t)} in your dataset).
      Hope this helps!
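
    As a concrete contrast (the contractive-style penalty below is one illustrative example of a data-dependent term, not one named in the video):

    ```latex
    % Data-independent: L2 weight decay depends only on the parameters.
    \[ \Omega_{\mathrm{L2}}(\theta) \;=\; \lambda \,\lVert \theta \rVert_2^2 \]
    % Data-dependent: a penalty evaluated on the training inputs x^{(t)},
    % e.g. one penalizing the sensitivity of the hidden representation h(x):
    \[ \Omega_{\mathrm{data}}(\theta) \;=\; \lambda \sum_{t}
       \Big\lVert \tfrac{\partial\, h(x^{(t)})}{\partial x} \Big\rVert_F^2 \]
    ```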

  • @bahareh_mtl3472
    7 years ago

    Hi Hugo,
    Your way of teaching in these lectures is fascinating, so thanks a lot.
    I was wondering if you also have some source code implementing RBMs in Python.
    I know that scikit-learn already provides an example of how to use it (scikit-learn.org/stable/auto_examples/neural_networks/plot_rbm_logistic_classification.html), but I am looking for examples using Theano, Keras, or a TensorFlow backend. I would really appreciate it if you shared some examples of implementing an RBM.
    Thank you

    • @hugolarochelle
      7 years ago

      Thanks for your kind words!!
      For Theano: deeplearning.net/tutorial/rbm.html
      For TensorFlow, I don't have any particular recommendation, but I'm sure that by googling "TensorFlow RBM" you'll find plenty :-)

  • @abdirahmanhashi5800
    6 years ago

    E = energy? I thought it was an error function. Helped me a lot though.

    • @MrCmon113
      4 years ago

      In this case it's the same, because we want to minimize the energy.

  • @osamahabdullah3715
    4 years ago

    Where can I find these slides, please?

    • @hugolarochelle
      4 years ago

      Here: www.dmi.usherb.ca/~larocheh/neural_networks/content.html
      Cheers!

  • @ghosh5908
    4 years ago

    The actual content starts at 3:14.

    • @nmana9759
      3 years ago

      thank you

  • @bingeltube
    6 years ago

    Recommendable

  • @rayeric6323
    6 years ago

    Can anyone tell me what the energy function is?

    • @hugolarochelle
      6 years ago +2

      Don't get bogged down by the name. It's just a function. The only reason we call it an energy function is due to the analogy from physics. In physics, a configuration of the environment that has high energy will have low probability of being observed (and this is probably not a super accurate statement on my part... I'm not a physicist :-) ).
      In an RBM, it's the same: configurations of x and h which have a high energy (i.e. E(x,h) is high), will have low probability under the RBM model.
      Hope this helps!
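
    In symbols, the relation described here is just the Boltzmann form of the RBM's distribution (from the lecture's definition):

    ```latex
    % Higher energy means exponentially lower probability:
    \[
    p(x, h) \;=\; \frac{e^{-E(x,h)}}{Z},
    \qquad
    Z \;=\; \sum_{x'} \sum_{h'} e^{-E(x',h')}
    \]
    ```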

  • @ahmedabdelfattah443
    9 years ago

    Thanks,
    I really learned a lot from you. I hope you focus more on definitions and use less math.