Deep Belief Nets - Ep. 7 (Deep Learning SIMPLIFIED)

  • Published Oct 26, 2024

Comments • 95

  • @SanyamAgarwal94
    @SanyamAgarwal94 6 years ago +46

    It's not clear why it's a solution to the vanishing gradient problem

    • @22mashups22
      @22mashups22 5 years ago +6

      I'm guessing it's because every two layers in the DBN form an RBM in their own right, taking both the output of the previous RBM and the global data into account, whereas the forward-propagation method only takes the global data into account once and then works meticulously on the results of each layer.

    • @jakevikoren
      @jakevikoren 5 years ago +3

      By training the hidden layer to recreate the input layer, the system guarantees clean backwards data flow.

    • @vijayk7387
      @vijayk7387 5 years ago +7

      The vanishing gradient problem occurs because training with backpropagation leads to very small gradients in the early layers. Since a DBN only trains two layers at a time, it largely sidesteps the issue (see the sketch further down this thread).

    • @andresfelipemosqueramarin8160
      @andresfelipemosqueramarin8160 4 years ago

      You probably already know, but for those who might still have the doubt: in short, it does so by providing another method to train deep nets, different from gradient descent via backprop (even though you can still apply backprop to the last layers of a DBN for fine-tuning in the supervised learning part).
      If you need more details, you could watch this video first th-cam.com/video/i64KpxyaLpo/w-d-xo.html&feature=share and then, if needed, start your own investigation.
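
      To make the layer-wise idea in this thread concrete, here is a minimal sketch of greedy pretraining using scikit-learn's BernoulliRBM. The layer sizes, hyperparameters, and data are illustrative assumptions, not values from the video:

      ```python
      # Greedy layer-wise pretraining: each RBM is trained on its own pair
      # of layers, so no gradient ever travels back through the whole stack.
      import numpy as np
      from sklearn.neural_network import BernoulliRBM

      def pretrain_dbn(X, layer_sizes=(256, 128, 64)):
          rbms, data = [], X
          for n_hidden in layer_sizes:
              rbm = BernoulliRBM(n_components=n_hidden, learning_rate=0.05,
                                 n_iter=10, random_state=0)
              rbm.fit(data)               # unsupervised, local to two layers
              data = rbm.transform(data)  # hidden activations feed the next RBM
              rbms.append(rbm)
          return rbms

      # Example with random binary "pixel" data
      X = (np.random.default_rng(0).random((500, 784)) > 0.5).astype(float)
      stack = pretrain_dbn(X)
      ```

      Because each fit only ever touches one visible/hidden pair, there is no long chain of layers for a gradient to vanish through.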

  • @DeepLearningTV
    @DeepLearningTV  8 years ago +20

    This is one of my favorite nets - a DBN is a good choice for general classification problems with 1000s of input features. Enjoy :-)!

    • @DeepLearningTV
      @DeepLearningTV  8 years ago +4

      +DeepLearning.TV Also FYI, the next one is about Convolutional Nets, which I will post tomorrow.

    • @minhhoale07
      @minhhoale07 8 years ago +4

      Your series is totally spot on in terms of explaining the technical concepts in Deep Learning. This video is no exception, but saying a DBN is a solution to the vanishing gradient problem is a fundamental mistake!
      It was LSTM units that targeted the vanishing gradient problem. The key idea was to incorporate new nodes with linear operations (i.e. the gates) into the main architecture, so that information (the gradient) can flow back via the gates if it gets stuck at a node in the main net. We see this recurring idea in the ResNet architecture, which allows a CNN stacking over 100 layers to achieve state-of-the-art performance at ILSVRC 2015.
      A DBN, however, serves as a good initializer. Concretely, there are actually 2 major issues in training deep nets: vanishing gradients AND a highly non-convex objective function with lots of local optima. The best training phase will find weights & biases which are solutions to the problem of maximizing/minimizing that objective function. However, a not-so-good initialization at the beginning of the training phase, coupled with vanishing gradients, leads to "bad" solutions most of the time. It was not until the DBN that there was a workaround to this problem. The weights and biases learnt by a DBN are effectively good starting values for training the supervised net properly (from the DL review by Bengio).

    • @DeepLearningTV
      @DeepLearningTV  8 years ago +2

      The vanishing gradient causes gradients to decay as backprop works back through the net. The practical problem this creates is that nets take too long to train and still end up inaccurate. By side-stepping backprop for most of the training (except for supervised fine-tuning), the DBN does solve that practical problem: the long training time and the inaccuracy.

    • @ashwinr007
      @ashwinr007 8 years ago +2

      A great and informative series :)
      Thank You!

    • @tiphainechampetier651
      @tiphainechampetier651 6 years ago +1

      I would say it does not solve the problem in itself, because during fine-tuning you will still have vanishing gradients. However, it bypasses the problem in the first part, and as a result it treats the symptoms of the vanishing gradient, which are, as you said, long training time and inaccuracy.
      I have the impression that ReLU units solve the problem more directly, as there is no, or less, gradient vanishing than with a sigmoid activation function.

  • @wiz7716
    @wiz7716 5 years ago +10

    Still not really clear why the DBN was the solution for the vanishing gradient!

  • @abeaumont10
    @abeaumont10 6 years ago +1

    Great series of videos

  • @sayajujur2565
    @sayajujur2565 7 years ago +2

    I have trained a DBN on my biometric data. I used DeepLearning4J, a Java package for deep learning. Compared to TensorFlow, Keras, and Theano, which all have complex installation instructions, I found this Java package quite handy. One problem with my test is that classification accuracy varies from run to run. For example, the first run shows 84% accuracy; a second attempt shows 92%; another try, 76%. And it seems an SVM with a kernel produces results much faster and more accurately.

    • @DeepLearningTV
      @DeepLearningTV  7 years ago

      That's good to know about DL4J - in what way is it handy compared to the others?

    • @sayajujur2565
      @sayajujur2565 7 years ago +1

      1.) Installation of DL4J is easy compared to TensorFlow, Keras, and Theano.
      I installed Ubuntu to get those libraries, yet failed to install Keras (it requires so many prerequisites); I installed TensorFlow, but operating in Ubuntu is just not convenient for me.
      2.) Those libraries require a 64-bit operating system and also require a GPU. DL4J you can run on 32- or 64-bit systems, and it does not need a GPU.
      3.) In DL4J, you can adjust hyperparameters to make it work for your own data. And the DL4J code keeps improving, according to the people who maintain it.

    • @DeepLearningTV
      @DeepLearningTV  7 years ago +1

      Got it. You can use TensorFlow and Theano without a GPU, except it's gonna be very slow. Also, adjusting hyperparameters is about how you design your code, as opposed to which library you use. Having said that, DL4J is indeed easier to set up and use compared to the others.

    • @sayajujur2565
      @sayajujur2565 7 years ago +1

      I must use a GPU; time is important in my research. My goal is to identify individuals by their ECG. Some researchers have predicted that ECG will be used as a supplement to fingerprint and palm-vein identification; even more studies assume it will be used as a single, de facto identification method in the future. No matter what, it will be used at gates and entrances, and I guess people cannot wait there 30-40 seconds to be granted access.
      Yeah, DL4J is easier. I hope the code there becomes more stable (showing the same result in multiple executions, the way this can of Coca-Cola tastes the same as that can).

    • @DeepLearningTV
      @DeepLearningTV  7 years ago +1

      Right - well - in that case, item #2 in your list is not that important to you. In general, #2 is not really that big an issue, as pretty much every practical application of deep nets requires parallel processing, and one of the main ways to get that is to use a GPU.
      About stability, do you mean a different score for a coke can in each evaluation run, or each training run?

  • @punkntded
    @punkntded 6 years ago +2

    How does an RBM resolve the vanishing gradient problem?

  • @paragyadav1268
    @paragyadav1268 5 years ago

    Thank you so much for covering a lot of topics in a compact series.
    That said, the sound at the start and end of the video is unpleasant, especially if you are watching at 1.5x.

  • @tudortolciu1396
    @tudortolciu1396 6 years ago +2

    Hello DeepLearningTV! Thanks for the series, it's really helpful. I have a question regarding RBMs. How do you train them to reconstruct input? Say, for an image-recognition net, would you simply pass images as input to the net and adjust the RBMs step by step? If so, how do you do that?

    • @DeepLearningTV
      @DeepLearningTV  6 years ago

      Did you see the prior episode? Training RBMs is covered in detail there.
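
      For readers who want the gist without rewatching: below is a minimal contrastive-divergence (CD-1) training step in NumPy. It is a sketch assuming binary visible units and sigmoid activations; the sizes, learning rate, and variable names are illustrative, not the video's exact procedure:

      ```python
      import numpy as np

      rng = np.random.default_rng(0)
      sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

      # Toy binary data: 100 samples, 6 visible units
      X = rng.integers(0, 2, size=(100, 6)).astype(float)

      n_visible, n_hidden = 6, 4
      W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
      b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)
      lr = 0.1

      for epoch in range(50):
          for v0 in X:
              # Positive phase: hidden probabilities given the data
              p_h0 = sigmoid(v0 @ W + b_h)
              h0 = (rng.random(n_hidden) < p_h0).astype(float)
              # Negative phase: reconstruct the visibles, re-infer the hiddens
              p_v1 = sigmoid(h0 @ W.T + b_v)
              p_h1 = sigmoid(p_v1 @ W + b_h)
              # CD-1 update: data statistics minus reconstruction statistics
              W += lr * (np.outer(v0, p_h0) - np.outer(p_v1, p_h1))
              b_v += lr * (v0 - p_v1)
              b_h += lr * (p_h0 - p_h1)
      ```

      The update pushes the model to reconstruct its input, which is exactly the "recreate the input" training the prior episode describes.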

  • @itachi-senpaii
    @itachi-senpaii 6 years ago +1

    nice series ... thanks

  • @chirazbenabdelkader7294
    @chirazbenabdelkader7294 6 years ago +3

    1:00 "... their shallow counterparts ... " I assume you are talking about MLPs here. But are MLPs necessarily shallow?!

    • @DeepLearningTV
      @DeepLearningTV  6 years ago

      Actually we are not referring to MLPs - MLPs are by definition deep. We are referring to how neural nets were before they became deep (and have been for the last decade or so).

  • @PraneethJujhavarapu
    @PraneethJujhavarapu 5 years ago +1

    does this come under semi-supervised learning??

  • @aliboudjema78
    @aliboudjema78 4 years ago +2

    I am looking for simple DBN code in Python; where can I find it?

    • @DeepLearningTV
      @DeepLearningTV  4 years ago

      The RBM/DBN are Geoff Hinton's brain-children, so maybe look for implementations associated with him. I am sure you would also find many GitHub repos with DBN implementations if you googled it.

    • @aliboudjema78
      @aliboudjema78 4 years ago +1

      @@DeepLearningTV does it work on non-binary data?

    • @DeepLearningTV
      @DeepLearningTV  4 years ago

      @@aliboudjema78 "No reason to think it wouldn't work, but hard to tell if it would."
      In quotes because that's generally true for most ML problems, and it is certainly true for your question. You don't know if you don't try it.
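
      For a quick Python starting point, here is a hedged sketch that stacks scikit-learn's BernoulliRBM layers under a logistic-regression read-out. Note that BernoulliRBM expects inputs scaled to [0, 1], which is one practical answer to the non-binary-data question above; the dataset and hyperparameters are illustrative assumptions:

      ```python
      from sklearn.datasets import load_digits
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import train_test_split
      from sklearn.neural_network import BernoulliRBM
      from sklearn.pipeline import Pipeline

      # Scale pixel values into [0, 1] for the Bernoulli units
      X, y = load_digits(return_X_y=True)
      X = X / 16.0
      X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

      # Two RBMs pretrained greedily, then a supervised classifier on top:
      # the DBN recipe in miniature.
      model = Pipeline([
          ("rbm1", BernoulliRBM(n_components=128, learning_rate=0.05,
                                n_iter=15, random_state=0)),
          ("rbm2", BernoulliRBM(n_components=64, learning_rate=0.05,
                                n_iter=15, random_state=0)),
          ("clf", LogisticRegression(max_iter=1000)),
      ])
      model.fit(X_train, y_train)
      print("test accuracy:", model.score(X_test, y_test))
      ```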

  • @vasanthc55
    @vasanthc55 1 year ago

    How can I use a DBN for image classification with supervised learning? I have images labeled with 5 classes, and each label has 1000 images of size 224x224.

  • @patrickmatimbe18
    @patrickmatimbe18 5 years ago +1

    Thank you so much for the tutorial; it is really self-explanatory.

  • @hmzmrzx
    @hmzmrzx 8 years ago +3

    When we use an RBM to autoencode an input with the same number of nodes (as in the video), shouldn't we get an identity mapping?

    • @DeepLearningTV
      @DeepLearningTV  8 years ago +2

      +Hamza Merzic You would, if the weights and biases were the same. However, they are always randomly initialized, which means they end up different after training.

    • @hmzmrzx
      @hmzmrzx 8 years ago +1

      +DeepLearning.TV Thank you for the answer and for the work you put in. Do you have some references for this video, or something to serve as a good first read?

    • @DeepLearningTV
      @DeepLearningTV  8 years ago

      +Hamza Merzic Glad you like the work! Unfortunately, for this clip we don't have other references... outside of Geoff Hinton's videos on TH-cam, which are heavily mathematical.

  • @ilevakam316
    @ilevakam316 8 years ago +1

    Awesome job!

  • @kkochubey
    @kkochubey 8 years ago +13

    Amazing! That is the solution. It totally makes sense: that is how humans learn. They see many things, and that forms a huge amount of unlabeled patterns in the head. Later they read about or name things (even once), and there you are :) it is knowledge.
    This is the moment I got it. Sure, there is more to it, since the input data from human sensors is different, so many different network tricks are to be expected, but the DBN is definitely the basics.

    • @Existentialkev
      @Existentialkev 8 years ago +3

      +Kirill Kochubey I've been reading this book called The Master Algorithm by P. Domingos; he explains that ultimately the issue with all neural nets is that after training they continue to spit out the same fixed function over and over... They lack compositional ability.

    • @DeepLearningTV
      @DeepLearningTV  8 years ago +1

      +Kevin Moturi Interesting! Would you please explain further?

    • @Existentialkev
      @Existentialkev 8 years ago +4

      +DeepLearning.TV Basically, neural nets are incapable of combining pieces of information that may have never been seen together before. For humans, composition is needed for commonsense reasoning. If I told you "Mary ate her shoe," you would know that is most likely false, because humans don't eat non-edible things, even though you may never have heard the notion of a person eating a shoe. Other ML systems can just chain the relevant rules, but nets continue to churn out the same responses after training is complete.

    • @DeepLearningTV
      @DeepLearningTV  8 years ago +2

      +Kevin Moturi Yeah, that's right - history repeats itself - that's a fundamental rule in neural nets (and machine learning in general). Encounter new patterns, or new combinations of existing patterns, and they won't know what to do.

  • @yuanhuang2489
    @yuanhuang2489 7 years ago +1

    Thanks a lot; your video is quite inspiring! :-) Yet you said that the reasons why the DBN works so well are highly technical. I would like to know these reasons. Is there any material related to this?

    • @DeepLearningTV
      @DeepLearningTV  7 years ago +1

      Check out Geoff Hinton's MOOC on Coursera; he explains DBNs.

    • @yuanhuang2489
      @yuanhuang2489 7 years ago +1

      Thanks a lot :-)

  • @cronosddd
    @cronosddd 8 years ago +1

    Deep Belief Networks seem to be more practical to use, but I still don't see them used much. Any thoughts why?
    Or am I looking in the wrong places? :)

    • @DeepLearningTV
      @DeepLearningTV  8 years ago +1

      +Abhishek Dikshit Mmmm - there are some instances where they are used. For example, I believe that Hinton's work with Google involves really large DBNs for image classification. For classification problems other than digital images, they are a great fit cuz you can get away with just a small labelled data set, but I am not aware of specific examples. But you are right in that DBNs are not very popular.
      One reason is backprop with ReLU - with the Rectified Linear Unit, the vanishing gradient is no longer an issue, and so you don't need to implement a DBN. That alone decides nothing, because the two approaches are similar in effort/complexity. The difference comes to light when you consider current deep-net architectures in practice; many of them have 100s of layers, each with different functions (convolution, dropout, pooling, activation, etc.), so layer-wise setup/pretraining becomes way more complex. Using backprop+ReLU means you set the whole thing up layer by layer and train it in one shot.

  • @bharat_arora
    @bharat_arora 6 years ago +1

    Yup, this is my favorite.

  • @samohitovi4081
    @samohitovi4081 7 years ago +1

    Could the autoencoding part work with more than two layers at a time? That is, could you stack the full network and let it autoencode itself? Is the problem again the vanishing gradient and local optima?

    • @DeepLearningTV
      @DeepLearningTV  7 years ago +1

      Yes - check out the episode on Autoencoders - you don't need a forward and a backward pass; you can do it with 2 or more forward passes.

  • @sbylk99
    @sbylk99 8 years ago

    Thanks! Great video!

  • @Freeak6
    @Freeak6 6 years ago +2

    If I have a database of 1000s of images of cats and dogs, is a DBN gonna be able to identify my kitty from a new image (given at least one picture of her is already in the db), or is it just gonna tell me "this is a cat"?

    • @DeepLearningTV
      @DeepLearningTV  6 years ago +2

      It's gonna tell you it's a cat.

    • @Freeak6
      @Freeak6 6 years ago

      Ok, thanks. Is there any approach that could identify an object or a person with only a few samples per label?

    • @DeepLearningTV
      @DeepLearningTV  6 years ago +1

      Check out Facebook research; they must have some papers on the topic. They can label pictures with names.

  • @neeraj7193
    @neeraj7193 8 years ago

    While training DBMs, are the stacked RBMs trained in parallel? Ideally, the training should be sequential in each pass, since each RBM progressively gets better at regenerating the output of the previous RBM; right?

    • @DeepLearningTV
      @DeepLearningTV  8 years ago

      Well, you need the output of one RBM as input for the next. So they need to be trained in sequence.

    • @mauricet910
      @mauricet910 8 years ago

      They can be trained in parallel. The way Hinton describes it, imagine you have n RBMs. You're able to train n/2 of them in parallel by doing the following: train the 1st, 3rd, 5th, etc. in parallel, then train the 2nd, 4th, 6th, etc. in parallel.
      According to Hinton, that's the right way of doing DBMs. He explains it in his lectures, which you can find on coursera.com or TH-cam. Check out the lectures about Boltzmann Machines.

  • @tonychen860
    @tonychen860 7 years ago +1

    Thanks for your video! I have a question: I want to verify whether the facial image from a video and the facial image on an ID card belong to the same person. Should I use a DBN or a CNN? Note that I only have 256 pairs of labeled images.

    • @DeepLearningTV
      @DeepLearningTV  7 years ago +1

      Hard to say - for decisions like this, yours might be the project where conventional wisdom does not apply. I would say try both and let the metrics decide.

    • @tobiemmanuel8306
      @tobiemmanuel8306 2 years ago

      5 years into the future 😁
      How did it go?
      IMO, I'd suggest trying a DBN or a convolutional network, Sir.

  • @hassanhamdoun
    @hassanhamdoun 8 years ago

    Thanks, very nice series. A DBN is a stack of RBMs, so how would this work for an unsupervised learning application such as clustering? Could you please give examples of how a DBN could be used for clustering or classification in unsupervised learning, without labels?

    • @mauricet910
      @mauricet910 8 years ago

      For 2D clustering, try learning a layer with 2 neurons and using them as (x, y) coordinates, as in the sketch below.
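
      A hedged sketch of that idea, assuming scikit-learn and matplotlib (the digits dataset and hyperparameters are illustrative):

      ```python
      import matplotlib.pyplot as plt
      from sklearn.datasets import load_digits
      from sklearn.neural_network import BernoulliRBM

      X, y = load_digits(return_X_y=True)
      X = X / 16.0  # scale to [0, 1] for the Bernoulli units

      # Learn a 2-unit hidden layer and use its activations as coordinates
      rbm = BernoulliRBM(n_components=2, learning_rate=0.05,
                         n_iter=20, random_state=0)
      coords = rbm.fit_transform(X)

      plt.scatter(coords[:, 0], coords[:, 1], c=y, s=5, cmap="tab10")
      plt.xlabel("hidden unit 1")
      plt.ylabel("hidden unit 2")
      plt.show()
      ```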

  • @satishjasthi2500
    @satishjasthi2500 7 years ago +1

    Thanks a lot

  • @DLSMauu
    @DLSMauu 8 years ago

    Hello, could you show a paper or some reference stating that the difference between an MLP and a Deep Belief Network is only in the training? Thank you very much, awesome channel!

    • @DeepLearningTV
      @DeepLearningTV  8 years ago +5

      Here is Geoff Hinton's DBN paper: www.cs.toronto.edu/~hinton/absps/fastnc.pdf

    • @DLSMauu
      @DLSMauu 8 years ago +1

      Thank you!

  • @harshitagarwal5188
    @harshitagarwal5188 7 years ago +1

    At 0:32, shouldn't it be "as an alternative to random initialisation of parameters" instead of "as an alternative to back propagation"?

    • @chirazbenabdelkader7294
      @chirazbenabdelkader7294 6 years ago

      I don't think so. DBNs are an alternative to classical neural nets that are trained using Back Propagation. DBNs are trained using a clever training algorithm, as explained later in the video ...

  • @santhanalakshmism22
    @santhanalakshmism22 6 years ago

    Is a DBN a type of deep learning algorithm?

    • @DeepLearningTV
      @DeepLearningTV  6 years ago

      It's one type of deep learning model.

    • @santhanalakshmism22
      @santhanalakshmism22 6 years ago

      DeepLearning.TV why do we go for DBNs specifically?

    • @santhanalakshmism22
      @santhanalakshmism22 6 years ago

      DeepLearning.TV Can a DBN be used in an unsupervised or a supervised way?

    • @DeepLearningTV
      @DeepLearningTV  6 years ago

      Cuz they help address the vanishing gradient problem. Take a look at videos 5 and 6, where this is explained further.

    • @DeepLearningTV
      @DeepLearningTV  6 years ago

      Actually, have you watched this video fully? That is explained here.

  • @张建伟-h7q
    @张建伟-h7q 8 years ago +1

    thanks

  • @gik25
    @gik25 7 years ago +1

    Mmm... this is where the video starts to become unclear.

  • @jingwenchen1774
    @jingwenchen1774 6 years ago

    good!

  • @zes7215
    @zes7215 6 years ago

    nst as sufficx or not, can say any nmw

  • @nzorach
    @nzorach 5 years ago

    It doesn't tell us much to show us the same diagram of connected dots over and over again in every slide/frame.

  • @SergioArroyoSailing
    @SergioArroyoSailing 8 years ago

    Start here

  • @malharjajoo7393
    @malharjajoo7393 5 years ago +1

    The explanation is really unclear!

  • @LeoHusss
    @LeoHusss 7 years ago +1

    Subtitles in Brazilian Portuguese, please!

    • @DeepLearningTV
      @DeepLearningTV  7 years ago

      Unfortunately, we don't have any staff who can do Portuguese subtitles. However, we accept community contributions, if you are willing to help out :-)

    • @chirazbenabdelkader7294
      @chirazbenabdelkader7294 6 years ago

      This should be an automatable task by now with state-of-the-art, deep-learning-based speech recognition software (but I understand, it's probably not available for free...)

  • @mtiffany71
    @mtiffany71 5 years ago

    "Have you ever X before? If so, please comment and share your experiences." That is getting really tedious.

  • @shairozsohail1059
    @shairozsohail1059 2 years ago

    This isn't a very good video; I got next to no information about the functionality, structure, or even the training inputs and outputs of a DBN from it.

  • @sreeshyamc.a.1749
    @sreeshyamc.a.1749 7 years ago

    Don't always say to comment; it's highly irritating.