An Old Problem - Ep. 5 (Deep Learning SIMPLIFIED)

  • Published on Oct 26, 2024

Comments • 144

  • @DeepLearningTV
    @DeepLearningTV  9 years ago +16

    This clip explains why deep neural nets are so hard to train. If you've used backprop before, you'll relate to this. Enjoy :-)!

    • @er1shivam_sings
      @er1shivam_sings 7 years ago

      Can you tell us how you make these presentations? Which software do you use?

    • @jillting1852
      @jillting1852 3 years ago

      You mentioned a paper - can you share the link?

  • @Foogly117
    @Foogly117 6 years ago +2

    Thank you so much. I'm currently taking an intro to deep learning, covering the basics of supervised and unsupervised networks. The instructor explaining this kept rambling on and confusing me. This is very helpful!

  • @albertoferreiro1989
    @albertoferreiro1989 8 years ago +11

    Thanks for this channel!! I really appreciate your simplified approach to grasping the core concepts.
    Looking forward to the next videos.
    Keep up the good work!!!

  • @ajayshaan8573
    @ajayshaan8573 6 years ago +2

    This channel is AMAZING! I love it when something's so neatly explained that even my grandma can understand. Great job fellas! :D

  • @VinBhaskara_
    @VinBhaskara_ 7 years ago +1

    Excellent explanation! I don't know why complicated-sounding lectures miss these points.

    • @DeepLearningTV
      @DeepLearningTV  7 years ago

      Sorry about the late reply and glad you like it!

  • @priyangkumarpatel9317
    @priyangkumarpatel9317 5 years ago +1

    Your explanation is very accurate. It was a good point that during backpropagation each gradient is built from the derivatives of the layers after it, so the gradients become flatter and flatter and eventually vanish. This affects the updates of the weights and biases at the crucial initial layers, whose deltas become negligible. These weights and biases are important because they act directly on the input features. This causes inaccuracy and slow learning in deep nets.
    It poses a fundamental as well as paradoxical problem for deep neural nets with many hidden layers!
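    A rough numeric sketch of that compounding effect (plain Python, with made-up per-layer factors between 0 and 1):

      # Backprop multiplies in one local factor per layer as it works backwards.
      per_layer_factors = [0.9, 0.7, 0.8, 0.6, 0.9, 0.5, 0.7, 0.8, 0.6, 0.7]  # assumed values
      gradient = 1.0                       # gradient arriving at the output layer
      for f in per_layer_factors:
          gradient *= f
      print(gradient)                      # ~0.03 after 10 layers; the earliest layers barely move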

  • @hoangtrunghieu287
    @hoangtrunghieu287 5 years ago +3

    Wow, this video is very useful. Hope to watch more of your videos!

  • @cykkm
    @cykkm 3 years ago +1

    The learning rate was also worth mentioning. The deeper the net, the more prone it is to jumping over the cost minima. I remember how in the '80s people invented all sorts of tricks, such as adding noise to the weights, to remedy this problem. You can find a sweet spot where the convergence improves, but... that was the '80s, so we all know how it turned out back then.

  • @sillfsxa1927
    @sillfsxa1927 7 years ago +1

    From another point of view, I also think that training deep ANNs with backprop runs into something we already knew as the curse of dimensionality. In a way, you have explained it very well.

  • @rameshbabuy9254
    @rameshbabuy9254 6 years ago +1

    Very good video on the vanishing gradient.

  • @tenshihsr
    @tenshihsr 6 years ago +1

    Thanks for these videos, they're simple and very informative!

  • @davidbuchacaprats1454
    @davidbuchacaprats1454 8 years ago

    Your videos are cool!
    One thing to note: you don't train anything using backprop itself. Backprop is an efficient method to compute the gradient of the cost with respect to the parameters; what you then do with the gradient is your training method. You can do SGD, SGD + momentum, Adagrad, RMSprop, etc. All of the previously mentioned methods use backprop in a neural net.
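    A minimal sketch of that separation (pure Python; the gradient function and numbers are made up, and the update rule is the interchangeable part):

      def backprop(params):
          # Stand-in for backprop: return d(cost)/d(param) for each parameter.
          return [2.0 * p for p in params]          # hypothetical gradients

      def sgd_step(params, grads, lr=0.01):
          return [p - lr * g for p, g in zip(params, grads)]

      def momentum_step(params, grads, velocity, lr=0.01, beta=0.9):
          velocity = [beta * v - lr * g for v, g in zip(velocity, grads)]
          return [p + v for p, v in zip(params, velocity)], velocity

      # Backprop only supplies the gradients; SGD, SGD + momentum, Adagrad, RMSprop, etc.
      # are different ways of turning those gradients into parameter updates.
      params, velocity = [0.5, -1.2], [0.0, 0.0]
      grads = backprop(params)
      params = sgd_step(params, grads)                            # plain SGD
      params, velocity = momentum_step(params, grads, velocity)   # SGD + momentum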

  • @joesgarage618
    @joesgarage618 8 years ago +5

    at 3:05 should it say "it starts with the right ?"

  • @amritashsingh5080
    @amritashsingh5080 3 years ago

    Awesome explanation - great job !!!

  • @farisalasmary6909
    @farisalasmary6909 7 years ago +1

    Oh man! I'm facing this problem right now. It has taken me more than 12 hours and the cost is still around 0.48.

  • @ark6588
    @ark6588 7 years ago +1

    great videos, thanks !

  • @amizan8653
    @amizan8653 8 years ago +1

    Perfect explanation, thanks!

  • @IsabelsChannel
    @IsabelsChannel 8 years ago +16

    Wow I really sounded like I might cry before a year of voice and speech 😂

    • @DeepLearningTV
      @DeepLearningTV  8 years ago

      Actually now that you mention it - yea. But I had to really listen for it.

    • @IsabelsChannel
      @IsabelsChannel 8 years ago +5

      +DeepLearning.TV thank you college acting classes 🙏🏻 and it'll just continue to get better 😊

    • @mgruu
      @mgruu 7 years ago

      thanks for the quality commentary, it really makes a difference to the production value.

    • @IsabelsChannel
      @IsabelsChannel 7 years ago

      Thank you for the nice comment!

    • @IsabelsChannel
      @IsabelsChannel 7 years ago +4

      Yeah? Which ones did you see? :) Unfortunately I didn't have any prior knowledge of DNN for these videos, though narrating them for these courses (and for IBM's Big Data University) has given me a little better understanding. I think it would be really interesting to learn all of this stuff, but I don't think I could listen to my own voice teaching me how to do it haha :)

  • @4XLibelle
    @4XLibelle 6 years ago +1

    Excellent video; thank you for producing and sharing. Can you please link, or name, the three papers you reference? Many thanks in advance.

    • @DeepLearningTV
      @DeepLearningTV  6 years ago

      www.iro.umontreal.ca/~pift6266/H10/notes/deepintro.html. Scroll all the way to the bottom

  • @dineshprasadgupta4625
    @dineshprasadgupta4625 7 years ago +1

    Hi there, your videos are good and easy to follow, thanks. Please clarify a small doubt: I believe forward prop is for training, whereas backprop is not for training. It is only for measuring the gradient from the back, and it helps to reduce the cost - and eventually the cost of training. So I doubt that backprop is for training, as you mentioned in the video.

    • @DeepLearningTV
      @DeepLearningTV  7 years ago

      Any time you run forward prop, you get the output of the net, which is its best estimate of a class or a value. Once a net is trained, you use forward prop to run inference on new data points.
      During training, backprop lets you measure the gradient, which is then scaled and subtracted from (or added to) each parameter. The idea is that you want to change the parameters until the estimated output closely matches the actual output, with as little cost as possible. Backprop allows you to do this, and so helps train the net.
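      A toy single-weight sketch of that loop (illustrative numbers only, not the video's net): forward prop produces the estimate, the scaled gradient corrects the parameter during training, and afterwards forward prop alone runs inference.

        w, lr = 0.0, 0.1                         # one weight, assumed learning rate
        x, target = 2.0, 1.0                     # a single training example
        for _ in range(50):                      # training loop
            estimate = w * x                     # forward prop: the net's best estimate
            grad = 2 * (estimate - target) * x   # gradient of the squared-error cost w.r.t. w
            w -= lr * grad                       # subtract the scaled gradient
        print(w * 3.0)                           # inference on a new point: forward prop only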

  • @dan10ds
    @dan10ds 4 years ago +1

    Helpful !

  • @HusseinMohammedHamburg
    @HusseinMohammedHamburg 8 years ago +1

    Thank you for these simple yet helpful videos :). I have a request, though: can you please provide links to the publications of the three major contributors to training methods in 2006 and 2007? Just the titles of the publications would be enough as well. Thank you in advance.

    • @DeepLearningTV
      @DeepLearningTV  8 years ago +1

      www.iro.umontreal.ca/~pift6266/H10/notes/deepintro.html.
      Scroll down to the bottom of the page.

    • @HusseinMohammedHamburg
      @HusseinMohammedHamburg 8 years ago +1

      Thank you so much :)

  • @nguyenxuanhung236
    @nguyenxuanhung236 5 years ago +1

    Nice lecture and nice voice. What tool did you use for this lecture ?

    • @DeepLearningTV
      @DeepLearningTV  5 years ago

      Prezi (but also hired a professional illustrator for the graphics)

  • @Freeak6
    @Freeak6 6 years ago +1

    Really nice videos!! Though I felt the explanation of the vanishing gradient problem was not really clear. I understand that multiplying numbers in [0,1] yields an ever-smaller number; however, I'm not sure I understand how the backpropagation is calculated (and thus where this multiplication comes from).
    Otherwise, very good job ;)

  • @MoeMoe-nu4ep
    @MoeMoe-nu4ep 8 years ago

    This is mainly a recurrent neural network problem due to the linearity of backpropagation (it's as if you used a summed value instead of a sigmoid function); it doesn't really affect regular networks unless they are insanely large. Networks such as the LSTM (invented in 1997) don't have this issue because backpropagation is not linear in those models.

  • @ChingMavis
    @ChingMavis 5 years ago +1

    Hi. Is forward propagation characterized as a training method for the neural network, or is it just a way for the neural network to classify the input data?

    • @DeepLearningTV
      @DeepLearningTV  5 years ago

      Forward prop is used to classify - or in other words as you might hear - run inference.

  • @CraigHollabaugh
    @CraigHollabaugh 8 years ago

    love the Sly image, nice touch

    • @DeepLearningTV
      @DeepLearningTV  8 years ago

      You mean the video thumbnail?

    • @CraigHollabaugh
      @CraigHollabaugh 8 years ago

      +DeepLearning.TV the image in the conflict bubble.

  • @KirillBerezin
    @KirillBerezin 8 years ago

    Yes, for some tasks I found that thin but deep networks train faster than fat, shallow ones with the same accuracy.

  • @tamiloreorojo4678
    @tamiloreorojo4678 7 years ago

    Hey, this video was so helpful. I am trying to understand the vanishing gradient problem and its implications for SRNs. Do you have any videos on how the MRN, since its introduction, has helped to tackle the vanishing gradient problem to some extent? And does the MRN have sluggish state memory?

  • @EranM
    @EranM 8 years ago +1

    When you're talking through numbers and doing calculations orally, it's better if you can also show them graphically as you go along.

  • @ashishjohnsonburself
    @ashishjohnsonburself 6 years ago +1

    I have a doubt...
    In the video at 2:30 it is stated that if the "early layers get it wrong, the result built up by the net will be wrong as well." My point is: how can we make this statement? If the gradient is small, it only means that the rate of learning some features at a particular layer will be lower; it should not affect the accuracy of the features learned at that layer.

    • @DeepLearningTV
      @DeepLearningTV  6 years ago +1

      The problem is that you are not training each layer individually, but the net as a whole in a single training pass.

  • @gitanjalinair2013
    @gitanjalinair2013 7 years ago +9

    why are the gradient values between 0 and 1?

    • @ajaxrich3821
      @ajaxrich3821 5 years ago

      This is because a single output is a constant function that outputs 1 all the time and the gradient of a constant function is 0.

    • @ajaxrich3821
      @ajaxrich3821 5 years ago

      The gradient descent algorithm, specifically, updates the weights by the negative of the gradient multiplied by some small scalar value (between 0 and 1) - the learning rate. If the number of iterations is too small for certain deep neural nets, we will get inaccurate results. If the number is too large, the training duration becomes infeasibly long.

    • @shabir301
      @shabir301 5 years ago

      Because we can't leave them unbounded on (0, infinity): it is a big headache to store very large values - in some cases extremely large ones... you'd need infinite storage.

  • @rakeshmallick27
    @rakeshmallick27 7 years ago +2

    I have a question: is the voiceover for this Deep Learning video series produced by a deep learning text-to-speech converter?

    • @juggrnaut76
      @juggrnaut76 7 years ago +1

      Haha! No - the voiceover is a person. Check credits in the description please!

  • @this1yt
    @this1yt 7 years ago +1

    Great videos! I have a question: while the deep net is training, does it calculate the error for each single input and update the weights and biases before the next input, or does it calculate the average error over one batch of training inputs and update the weights and biases at the next batch iteration?

    • @DeepLearningTV
      @DeepLearningTV  7 years ago

      It does it by mini-batch - when you have millions of training examples, processing by mini-batch is sometimes the only practical option.
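      A short sketch of the mini-batch idea (pure Python, made-up data): the error gradient is averaged over each small batch, so there is one update per batch rather than one per example.

        import random

        data = [(x / 100, 2.0 * (x / 100)) for x in range(100)]   # hypothetical (input, target) pairs
        w, lr, batch_size = 0.0, 0.5, 16
        for epoch in range(20):
            random.shuffle(data)
            for start in range(0, len(data), batch_size):
                batch = data[start:start + batch_size]
                grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
                w -= lr * grad                                     # one update per mini-batch
        print(w)                                                   # approaches 2.0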

  • @kkochubey
    @kkochubey 8 years ago +1

    Yes, that is the problem I was stuck on for some time. Since learning the magic of backpropagation, I was under the impression that it was the solution and that we only needed more computing power. Later I heard it does not work well, and I finally learned what the problem is by going through an online course on Coursera. Unfortunately, that course does not yet offer a solution.
    I watched lectures from Geoff Hinton and Oxford and was not able to grasp a solution.
    Finally, this video matches my current state, and the next video gives me an idea of how it can be solved. I still haven't tried it myself, but at least I got an idea of the solution and it feels right.
    +1
    Thanks again

  • @asenakrk2236
    @asenakrk2236 8 years ago

    I really like your videos, thank you :) But I also want to do some exercises. What can I do to get a visible result in a simple way?

    • @DeepLearningTV
      @DeepLearningTV  8 years ago +1

      Glad you like the content! "Visible result in a simple way" - please correct me if this is wrong - I am taking that to mean you want to simplify the problem and make the solution simple. A simple solution to the vanishing gradient is to use the ReLU activation along with backprop. ReLU(x) = 0 if x < 0 and x otherwise, so its gradient is 1 for positive inputs, which means it does not vanish as you work back through the network.
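      A tiny comparison of the local derivatives involved (plain Python; the factor backprop multiplies in is at most 0.25 for the sigmoid but exactly 1 for ReLU on positive inputs):

        import math

        def sigmoid_grad(x):
            s = 1.0 / (1.0 + math.exp(-x))
            return s * (1.0 - s)                 # never larger than 0.25

        def relu_grad(x):
            return 1.0 if x > 0 else 0.0         # exactly 1 on the active side

        print(sigmoid_grad(0.0) ** 10)           # 0.25**10 ≈ 1e-6: vanishes over 10 layers
        print(relu_grad(3.0) ** 10)              # 1.0: survives the trip back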

  • @mehdisafar8565
    @mehdisafar8565 8 years ago

    Thank you for the videos; I like your teaching style. It's the first time I've learned something from a female instructor ;)

    • @DeepLearningTV
      @DeepLearningTV  8 years ago +1

      Glad you like the content, and not sure what I should say to that!

  • @gast1243568790
    @gast1243568790 8 years ago

    Could you please share the link or the title of the 3 breakthrough papers? Thank you in advance!

    • @DeepLearningTV
      @DeepLearningTV  8 years ago +1

      Here is the link - scroll down to the bottom - www.iro.umontreal.ca/~pift6266/H10/notes/deepintro.html

  • @DeepLearningTV
    @DeepLearningTV  8 years ago +3

    Hey, yeah - backprop with ReLU is a solution that beats the vanishing gradient. Check it out and let us know what you find :-)

    • @sivatejan1909
      @sivatejan1909 6 years ago

      I am unable to understand the concept of the gradients, biases, and weights here. Can you briefly explain them?

  • @srinivasvalekar9904
    @srinivasvalekar9904 8 years ago

    Thanks for the videos! This is a fantastic series for learning neural networks. Just one quick question: suppose that I have millions of user records, and these records could be repeating. Among these records I would like to extract the best user record. Which neural network should I choose for this scenario?

    • @DeepLearningTV
      @DeepLearningTV  8 years ago

      Well - what are your criteria for the best user record? If they are a handful of fixed rules for which data is available, you may want to just build a decision tree (or even an if-then statement). Also, is there only a single record out of these that's best, and would it repeat?

    • @srinivasvalekar9904
      @srinivasvalekar9904 8 years ago +1

      Among the millions of records, some might be repeating. Of those few repeating records, I need to find the best one.
      So the neural network should work in such a way that it gives me a new record by choosing the best data from those few records.

    • @DeepLearningTV
      @DeepLearningTV  8 years ago

      Define "best". Let's say there are 5 records, A through E, that repeat. What would make A the best one? Or C? Or D?
      If the criteria are not known and you choose C as the best record (seemingly arbitrarily), then any time the net encounters C in the training data, or as a new data point, it will tell you that the best record has been found.

  • @mununulucky7594
    @mununulucky7594 8 years ago

    Many thanks to you, @DeepLearning.TV - I have gained some good ideas from your videos. After reading some papers and watching your videos I have an idea, but I am in doubt as to whether it is possible: could deep convolutional neural networks be used to build a content-based movie recommender system? Thanks - I don't have many skills in DL yet, but I like it and I am learning. Blessings.
    AM.

    • @DeepLearningTV
      @DeepLearningTV  8 years ago

      Mununu Alex Why convolutional? What makes you pick that? Also, what do you mean by content-based?

    • @mununulucky7594
      @mununulucky7594 8 years ago +1

      @DeepLearning.TV Thanks for your reply. I suggest a convolutional NN with reference to how it was applied to music/audio recommendation. By content I mean a recommender that includes choices for content representation (concepts) and user profiling (positive and negative feedback). Thank you.

    • @DeepLearningTV
      @DeepLearningTV  8 years ago +1

      Ok, understood. It will depend on complexity - content could be represented by just a handful of simple tags, or be complex, involving many, many tags and/or natural language. The same is true for user input, though it typically tends to be simple.
      Either way, if it is simple, a neural net is overkill. If it is complex (and/or involves natural language), use a neural net. A CNN could do the job, but CNNs are complex models that require lots of data with good-quality labels.

    • @mununulucky7594
      @mununulucky7594 8 years ago +1

      Many thanks for your detailed explanations. I picked up good ideas from this!!

  • @superjaykramer
    @superjaykramer 8 years ago

    With the vanishing gradient problem, you never talked about locking the weights in prior to training the next layers. Can you please explain?

    • @DeepLearningTV
      @DeepLearningTV  8 years ago

      With backprop, weights and biases are modified globally - meaning every layer is affected in each pass. So, by locking - please correct me if I am wrong - I believe you are referring to layer-wise pre-training. This is discussed in the next couple of videos.
      Also, I may have accidentally deleted the other comment you made; would you please re-enter it? Sorry about that.

  • @brandomiranda6703
    @brandomiranda6703 8 years ago

    It seems you didn't address how to fix this issue. Do you have this on a later video?

    • @DeepLearningTV
      @DeepLearningTV  8 years ago

      That's correct - Ep. 6 and 7 deal directly with addressing the problem of the vanishing gradient.

    • @brandomiranda6703
      @brandomiranda6703 8 years ago +1

      Thanks!

  • @kevintan5088
    @kevintan5088 6 years ago +1

    I'm a bit confused. Gradient values don't necessarily have to be between 0 and 1, right? One of the arguments made was that the gradients at earlier layers become smaller and smaller because of the compounding multiplication of values between 0 and 1. Can anyone help me understand?

    • @DeepLearningTV
      @DeepLearningTV  6 years ago +2

      Good question - no, gradients don't necessarily have to be between 0 and 1. However, if they are greater than 1, you get the opposite problem - the exploding gradient. Gradients get larger and larger as you work backwards, so the earlier layers train in much bigger jumps than the later layers. What you want is a gradient that is close to 1, but what you really want is a way around the effect of multiplying gradients.
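      A two-line illustration of both regimes (made-up per-layer factors):

        print(0.8 ** 20)   # factors below 1: ~0.012, the gradient vanishes
        print(1.2 ** 20)   # factors above 1: ~38, the gradient explodes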

  • @donaldhobson8873
    @donaldhobson8873 8 years ago

    If the gradients shrinking to nothing is a problem, then just find the RMS of the gradients and divide by it to make them larger. That way you can control the total amount of change.

    • @DeepLearningTV
      @DeepLearningTV  8 years ago +1

      Ok - that could mitigate the problem but not make it go away. The best answer for beating the vanishing gradient is the use of ReLU, whose derivative is 1 for positive inputs. So, no vanishing :-)
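      For what it's worth, a tiny sketch of the rescaling idea suggested above (pure Python, illustrative numbers; not the video's method):

        import math

        def rms_normalize(grads, eps=1e-8):
            # Rescale a layer's gradients by their root-mean-square so tiny values come back to order 1.
            rms = math.sqrt(sum(g * g for g in grads) / len(grads)) + eps
            return [g / rms for g in grads]

        print(rms_normalize([1e-6, -2e-6, 3e-6]))   # roughly [0.46, -0.93, 1.39]

      In spirit this is close to what per-parameter optimizers such as RMSprop do, which is why it mitigates the symptom rather than removing the underlying multiplication effect.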

  • @alexcipriani6003
    @alexcipriani6003 6 years ago

    This is a nice video series, but stopping in the middle of the video, which breaks the train of thought, is a no-no! I'm referring to the announcements that request comments; other than that, great presentations. Thx

  • @VividPagan
    @VividPagan 8 years ago

    So, basically, every node makes the amount of memory necessary to record the previous gradients up to that point grow exponentially, and also allows a possibly wider margin for error when improperly trained...? Am I getting that right?

    • @DeepLearningTV
      @DeepLearningTV  8 years ago

      With the exception of a recurrent net, the number of gradients for a given net is fixed. As you work back through the net during backprop, you use more and more of that number in a calculation, but those don't need to be stored separately in memory. You store them once for a backward pass and use them many times in the calculation. For a given node, you only store one value for a gradient - its own. And that too is stateless unless you tell it, as part of the code, to store it.
      The error happens because the value of the gradient gets smaller and smaller. Backprop is essentially an optimization algorithm whose objective is to find the point of least cost, or minimum. The metaphor to use is hills and valleys, with the landscape representing the math function that describes the problem. To find the point of least cost, you've got to climb down to the valley. This is not an exact analogy, but let's say each node represents a climber. If they all climb down at the same rate, they'll reach the valley together if they start together. If not, by the time some of them arrive, others will still be on their way down. This is the problem with the vanishing gradient: some nodes are still training, and will likely never be done, while others have reached the optimum point.

    • @VividPagan
      @VividPagan 8 years ago +1

      DeepLearning.TV So it's not so much of a memory storage issue, it's more a problem of the literal time needed for training the entire net when some nodes can't ever really hit "zero"? (Or at least not fast enough to be useful.)

    • @DeepLearningTV
      @DeepLearningTV  8 years ago +1

      +Blink Pink Correct. This means many of them would never get there, which results in large errors.

  • @JamesAnderson-wg8pz
    @JamesAnderson-wg8pz 8 years ago +1

    I don't understand what the "gradient" is. Gradient = rate at which cost changes. From what I understood, cost can only be calculated at the output of the network. How can you calculate cost that early in the network if that node is not assigned to a specific class?

    • @DeepLearningTV
      @DeepLearningTV  8 years ago +2

      +James Anderson You can calculate a cost at any time, even at the beginning when weights and biases are randomly set. At that time, the class assigned will just not be the right one(s) and the cost will be high. This is possible because there is a math function that lets backprop calculate the cost.

  • @ozgurakpinar1710
    @ozgurakpinar1710 8 years ago

    Hey. Awesome work. I have a different piece of advice, though: you sound like you are about to cry. Maybe you should train your voice a little. Other than this, I really admired your work.

    • @MichaelBuergerArt
      @MichaelBuergerArt 8 years ago

      Yeah, at higher volumes I can hear the same thing

  • @rameshmaddali6208
    @rameshmaddali6208 8 years ago

    Thanks

  • @Jabrils
    @Jabrils 7 years ago +1

    Hi. I just wanted to know that i love you. That is all. Goodbye :)

  • @mohamedgamea9170
    @mohamedgamea9170 1 year ago

    At 3:21, "this edge uses the gradient at that node and the next": in the last network layer, why does the highlighted edge use the gradient of the other node in that layer? Shouldn't it use only the node it is connected to? The other neuron in the last layer is not connected to the highlighted edge at 3:21.
    I think the second highlighted edge at 3:22 uses only the gradients of its connected nodes.
    ------------------------------------------------------------------------------------
    Or do you mean "this edge uses the gradient at that node, and the next edge also uses the gradient of its connected node"?

  • @TheDevelopmentChannel
    @TheDevelopmentChannel 8 years ago +1

    Question (maybe I missed the point): to solve the vanishing gradient problem, isn't it enough to just use the ReLU (rectified linear unit) activation function?
    en.wikipedia.org/wiki/Rectifier_%28neural_networks%29

    • @DeepLearningTV
      @DeepLearningTV  8 years ago

      +The Development Channel Sorry would you please clarify? Are you asking whether the RELU is sufficient to address the vanishing gradient issue?

    • @TheDevelopmentChannel
      @TheDevelopmentChannel 8 years ago +1

      +DeepLearning.TV I was thinking that if you use ReLU, z = max(0, x), activation functions in your neural network, the problem of the vanishing gradient would be solved. Am I mistaken about this?

    • @DeepLearningTV
      @DeepLearningTV  8 years ago

      +The Development Channel Nope, you're not mistaken - the ReLU activation function is a good solution for the vanishing gradient.

  • @Tarnov95
    @Tarnov95 5 years ago +7

    Yo breath. You sound like you did a long run befor every single sentence :D

    • @computerguycj1
      @computerguycj1 5 years ago +2

      Some people don't breath the same as others, possibly for medical reasons. How would you you like it if you had a medical condition and someone told you to act normal? Think! Also, put an 'e' on your "before", you ignoramus.

    • @Tarnov95
      @Tarnov95 5 years ago +1

      @@computerguycj1 Some people have to tell others what they think, possibly for medical reasons. How would you like it if you have a medical condition and someone told you to act normal? Think! Also, remove one 'you' from your second sentence, you ignoramus.

    • @computerguycj1
      @computerguycj1 5 years ago

      @@Tarnov95 Classic "I know you are, but what am I" response. You clearly missed the point.

    • @Tarnov95
      @Tarnov95 5 years ago

      @@computerguycj1 gosh, like i care about your feelings. Everyone understand what you try to say. Aren't you gonna get tired of all that triggering? Sometimes people make jokes or remarks about others. If that's why you get moral and shake your head, then you're just a naive kid in your head. Relax and let the world loosen up a bit.

    • @computerguycj1
      @computerguycj1 5 years ago

      @@Tarnov95 Just saying. You shouldn't dish out unwarranted criticism, especially if you can't take it yourself. And yes, you do clearly care, because you let my responses get to you. Now you know what it feels like to be mistreated online.

  • @김균우
    @김균우 8 years ago

    I still don't get what a gradient is... Could someone please explain it to me?

    • @DeepLearningTV
      @DeepLearningTV  8 years ago +1

      +김균우 A gradient is the rate at which something changes with respect to a change in something else. Speed is the gradient of distance with respect to time, for example. If you are familiar with calculus, taking the gradient is the same as taking the derivative - you are measuring how one thing changes with respect to a change in a related thing.

  • @souravmaharana866
    @souravmaharana866 7 years ago +1

    What is a gradient?

    • @DeepLearningTV
      @DeepLearningTV  7 years ago +1

      A gradient is the rate at which the cost will change when a particular parameter (a weight or a bias) changes. Mathematically, you get it by differentiating the loss function with respect to that parameter. Most deep learning libraries have built-in functions to compute the gradient, which is super helpful!
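      As an illustration, a minimal sketch with one such library (PyTorch here, purely as an example): differentiate a toy squared-error loss with respect to a single weight and sanity-check it with a finite difference.

        import torch

        w = torch.tensor(1.5, requires_grad=True)   # one weight
        x, target = 2.0, 1.0
        loss = (w * x - target) ** 2                # toy squared-error cost
        loss.backward()                             # the library computes d(loss)/dw
        print(w.grad)                               # analytic gradient: 2*(w*x - target)*x = 8.0

        eps = 1e-4                                  # finite-difference check of the same value
        fd = (((1.5 + eps) * x - target) ** 2 - ((1.5 - eps) * x - target) ** 2) / (2 * eps)
        print(fd)                                   # ≈ 8.0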

  • @dr_flunks
    @dr_flunks 8 years ago +1

    Great video series! Why yes, I've seen the vanishing gradient. Sadly, I learned the hard way, after building a neural net notebook that defines the net's shape through a simple Python list - e.g., [10,100,200,300,5] would be 10 inputs, 3 hidden layers with 100, 200, and 300 nodes, and a 5-node output layer. I noticed that I could get maybe 3 or 4 layers, but if I went deeper, accuracy would rapidly begin to suffer. I thought I had gotten the backprop algorithm wrong, but it was doing too well on shallow 2- and 3-layer nets. However, 6+ layer networks would not work no matter what I did... sounds like the vanishing gradient explains it. Here's the code if anybody wants to see this concept play out in action: github.com/krautt/Python-based-Multilayer-Neural-Network-based-on-Andrew-Ng-Intro-to-Machine-Learning-Course
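    For anyone curious what such a shape list implies, a small sketch (NumPy, random weights only; not the code from the linked repo):

      import numpy as np

      shape = [10, 100, 200, 300, 5]        # 10 inputs, three hidden layers, a 5-node output
      weights = [np.random.randn(n_in, n_out) * 0.01
                 for n_in, n_out in zip(shape[:-1], shape[1:])]
      for W in weights:
          print(W.shape)                    # (10, 100), (100, 200), (200, 300), (300, 5)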

    • @DeepLearningTV
      @DeepLearningTV  8 years ago

      Yea - sounds like your gradients are vanishing! Thanks for the contribution :-)

  • @ShirishJadav162
    @ShirishJadav162 8 years ago +1

    Most of these problems would be solved if (as I see it) there were a GPU-like analog computer architecture. I am pretty sure people are working on this: analog computing cells (a single cell being a single neuron with an input mux and output mux), and some kind of magnetic memory (not sure about this) to store floating-point numbers. This kind of architecture would give faster computation and a degree of parallelism that is not possible with the current architecture. I would like to work on this development, but I am stuck in a company where I write firmware for controllers.

    • @DeepLearningTV
      @DeepLearningTV  8 years ago

      +Shirish Jadav Oh wow - I like that! How much faster? Please tell us more :-)

    • @ShirishJadav162
      @ShirishJadav162 8 years ago +1

      +DeepLearning.TV Well, I am not sure how much faster than digital computers, but it would surely be faster and more parallel (by orders of magnitude, because it would not depend on a clock pulse; the bottleneck would be the propagation of electrical signals, which depends on the length of the wires). It might also be very sensitive to outside noise - working with analog voltages in a small range may be problematic because of noise, but it could work well in a 5V range... I am trying to make an abstract architecture for it. I would love to work in this field, but I am not a perfect electrical engineer; I know analog electronics is not my specialty, but I know the fundamentals of it and of computation. I have been dreaming of this kind of computer for about 4 years. I only have a blurry idea of how it would work, but I know that what I want to do with analog computers could be very useful for NNs. I have studied NNs since the first year of my B.Tech out of my own interest, but currently I am working on an IoT project.

    • @ShirishJadav162
      @ShirishJadav162 8 years ago +1

      +Shirish Jadav Recently I see great potential in NNs, so my interest is returning to them, but I know that current hardware limits certain things I would consider necessary for NNs to be truly close to biological ones.

    • @DeepLearningTV
      @DeepLearningTV  8 years ago

      +Shirish Jadav Ok - well - do you have any resources on Analog computing and how neural nets apply to that framework?

    • @ShirishJadav162
      @ShirishJadav162 8 years ago +1

      I had none until I searched just now... I thought I was alone in thinking about this :( But here is something that might interest you: www.eetimes.com/document.asp?doc_id=1138111

  • @fosheimdet
    @fosheimdet 4 years ago +1

    How are the author names of those three papers spelled please? I promise it's for research (and not for masterbaiting)

    • @DeepLearningTV
      @DeepLearningTV  4 years ago

      Scroll to the bottom -
      www.iro.umontreal.ca/~pift6266/H10/notes/deepintro.html

  • @gurkiratsingh9788
    @gurkiratsingh9788 5 years ago

    What is a gradient?

    • @DeepLearningTV
      @DeepLearningTV  5 years ago +1

      Assuming that you had calculus in school, a gradient is what you get when you differentiate a function. Here, the gradient is obtained by differentiating the loss function.

  • @agentanakin9889
    @agentanakin9889 5 years ago

    Would be better without the breathing into the microphone.

  • @pocketman5510
    @pocketman5510 7 years ago

    Why does it sound like you're laughing when you randomly ask, "let me know below"?

  • @AmirhoseinHerandy
    @AmirhoseinHerandy 8 years ago +6

    You sound like you are crying at the end!!

  • @qew89
    @qew89 6 years ago +2

    Voice sounds like this woman wants to cry...

  • @omenworks
    @omenworks 7 years ago

    I love the videos, but the person behind the voice needs to take a few deep breaths and just relax :) You seem to be stressed, and your voice stutters a bit because of it.

  • @nicomeyer919
    @nicomeyer919 7 years ago

    "Did you ever have this issue while training your network?" wtf... it's like asking kids in first grade what their favorite math functions are.

    • @DeepLearningTV
      @DeepLearningTV  7 years ago

      First, the sentiment is permissible; the language is not. Second, do you realize this channel is used by both beginners and experienced people?

  • @zes7215
    @zes7215 6 years ago

    no such thing as popux or not, not everythx, nonerx

  • @wiz7716
    @wiz7716 5 years ago

    I can't help but notice the trembling in your voice!