DeepBean
  • 20
  • 195 600
Vanishing Gradients: Why Training RNNs is Hard
Here, we run down how RNNs are trained via backpropagation through time, and see how this algorithm is plagued by the problems of vanishing and exploding gradients. We present an intuitive and mathematical picture by flying through the relevant calculus and linear algebra (so feel free to pause at certain bits!)
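As a rough illustration of the linear algebra perspective, the sketch below tracks the norm of a gradient vector as it is pushed back through many time steps. It is a simplified toy under stated assumptions, not the derivation from the video: the activation derivative is ignored, so each step back is just a multiplication by the transpose of the recurrent matrix. The 2x2 matrix, the starting gradient, and the names `make_recurrent_matrix` and `backprop_norms` are all invented for illustration.

```python
import math


def make_recurrent_matrix(radius, angle=0.3):
    """Build a 2x2 symmetric matrix W = Q diag(radius, radius / 2) Q^T.

    Q is a rotation by `angle`, so the largest eigenvalue of W is `radius`.
    """
    c, s = math.cos(angle), math.sin(angle)
    l1, l2 = radius, radius / 2
    return [
        [c * c * l1 + s * s * l2, c * s * (l1 - l2)],
        [c * s * (l1 - l2), s * s * l1 + c * c * l2],
    ]


def backprop_norms(radius, steps=50):
    """Gradient norm after each step back through a linearised RNN."""
    W = make_recurrent_matrix(radius)
    grad = [1.0, 1.0]  # arbitrary starting gradient (assumption)
    norms = []
    for _ in range(steps):
        # One step back in time: grad <- W^T grad.
        grad = [
            W[0][0] * grad[0] + W[1][0] * grad[1],
            W[0][1] * grad[0] + W[1][1] * grad[1],
        ]
        norms.append(math.hypot(grad[0], grad[1]))
    return norms


vanishing = backprop_norms(radius=0.9)  # largest eigenvalue below 1
exploding = backprop_norms(radius=1.1)  # largest eigenvalue above 1
print(f"radius 0.9: norm after 50 steps = {vanishing[-1]:.3e}")
print(f"radius 1.1: norm after 50 steps = {exploding[-1]:.3e}")
```

With the largest eigenvalue below 1 the norm shrinks geometrically, and with it above 1 the norm grows geometrically, which is the behaviour the terms "vanishing" and "exploding" refer to.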
Timestamps
--------------------
00:00 Introduction
00:46 RNN refresher
03:42 Gradient calculation of W
06:50 Exploding and vanishing gradients
07:35 Linear algebra perspective
12:20 Solutions
Links
---------
- Papers on vanishing and exploding gradients
www.bioinf.jku.at/publications/older/2304.pdf
ieeexplore.ieee.org/document/279181
arxiv.org/abs/1211.5063
- Long short-term memory paper: www.bioinf.jku.at/publications/older/2604.pdf
- RNN paper (Elman networks): onlinelibrary.wiley.com/doi/10.1207/s15516709cog1402_1
Views: 340

Videos

Vector-Quantized Variational Autoencoders (VQ-VAEs) | Deep Learning
1.6K views · 2 months ago
The Vector-Quantized Variational Autoencoder (VQ-VAE) forms discrete latent representations by mapping encoding vectors to a limited-size codebook. But how does it do this, and why would we want to do it anyway? Link to my video on VAEs: th-cam.com/video/HBYQvKlaE0A/w-d-xo.html Timestamps 00:00 Introduction 01:09 VAE refresher 02:42 Quantization 04:46 Posterior 06:09 Prior 07:06 Learned prior...
Disentanglement with beta-VAEs | Deep Learning
519 views · 2 months ago
Link to my VAE video for a refresher: th-cam.com/video/HBYQvKlaE0A/w-d-xo.html In this video, we explore how and why modifying the VAE loss function enables us to achieve disentanglement in the latent space, with different latent variables corresponding to different semantic features of the data. We take a look at the original beta-VAE formulation, as well as controlled capacity increase, and t...
Convolutional Neural Networks (CNNs) | Deep Learning
1.8K views · 5 months ago
CNNs are a go-to deep learning architecture for many computer vision tasks, from image classification to object detection and more. Here, we take a look at the basics, and see how they use biologically-inspired hierarchical feature extraction to do what they do. Timestamps Introduction 00:00 Kernel convolutions 00:41 Common kernels 02:30 Why flipping? 03:30 Convolution as feature extraction 04:...
Understanding Variational Autoencoders (VAEs) | Deep Learning
9K views · 6 months ago
Here we delve into the core concepts behind the Variational Autoencoder (VAE), a widely used representation learning technique that uncovers the hidden factors of variation throughout a dataset. Timestamps Introduction 0:00 Latent variables 01:53 Intractability of the marginal likelihood 05:08 Bayes' rule 06:35 Variational inference 09:01 KL divergence and ELBO 10:14 ELBO via Jensen's inequalit...
The Geiger-Marsden Experiments | Nuclear Physics
1.4K views · a year ago
In 1908-13, nuclear physics was born as Hans Geiger and Ernest Marsden embarked on the experiments that would discover the atomic nucleus and revolutionise our understanding of atomic structure. Here we explore why and how they carried out the famous gold-leaf experiment, as well as how Ernest Rutherford arrived at his startling conclusions. CHAPTERS Introduction 00:00 Alpha Particles 00:20 The...
Dijkstra's Algorithm: Finding the Shortest Path
881 views · a year ago
Dijkstra's algorithm is a neat way of finding the minimum-cost path between any two nodes in a graph. Here we see briefly how we can use it to optimize our path through a graph, and also explore why it performs as well as it does. Feel free to like, comment and subscribe if you appreciate what I do!
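For readers who want to see the idea in code, here is a minimal sketch of Dijkstra's algorithm using a binary heap. It is not taken from the video: the function name `dijkstra`, the adjacency-list format, and the example graph are assumptions made for illustration, and edge costs are assumed to be non-negative.

```python
import heapq


def dijkstra(graph, source):
    """Return the minimum cost from `source` to every reachable node.

    `graph` maps each node to a list of (neighbour, edge_cost) pairs.
    Edge costs are assumed to be non-negative.
    """
    dist = {source: 0}
    queue = [(0, source)]  # min-heap ordered by tentative cost
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > dist.get(node, float("inf")):
            continue  # stale entry: a cheaper route was already found
        for neighbour, edge_cost in graph.get(node, []):
            new_cost = cost + edge_cost
            if new_cost < dist.get(neighbour, float("inf")):
                dist[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour))
    return dist


# Hypothetical example graph, invented for illustration.
graph = {
    "A": [("B", 4), ("C", 1)],
    "B": [("D", 1)],
    "C": [("B", 2), ("D", 5)],
    "D": [],
}
costs = dijkstra(graph, "A")
print(costs)
```

The heap always hands back the cheapest unsettled node next, which is why each node's cost is final the first time it is popped with an up-to-date entry.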
Einstein's Ladder Paradox; Simply Explained
12K views · a year ago
In special relativity, the ladder paradox (or "barn-pole" paradox) occurs due to the symmetry of length contraction. Here we explore how this apparent paradox can be solved using the relativity of simultaneity. If you're interested in more special relativity content, check out the series below! SPECIAL RELATIVITY SERIES I. The Michelson-Morley Experiment (th-cam.com/video/DFQtVFEp_3E/w-d-xo.h...
Solving Einstein's Twin Paradox
3.2K views · a year ago
Many solutions have been proposed to Einstein's twin paradox, but many of them miss the vital reasons why the Earth twin is correct and the Spaceship twin is wrong. Here we condense the solution of the twin paradox to its essentials, and also discuss why applying general relativity to the problem is unnecessary. CHAPTERS What is the Twin Paradox? 00:00 Time Dilation 00:43 The "Paradox" 01:17 Th...
Relativistic Velocity Addition | Special Relativity
4.8K views · a year ago
Here, we briefly derive the equation for relativistic velocity addition, using only the Lorentz transformation equations we derived back in Part 3. Please like, subscribe and leave a comment if you appreciate what I do! SPECIAL RELATIVITY SERIES I. The Michelson-Morley Experiment (th-cam.com/video/DFQtVFEp_3E/w-d-xo.html) II. Time Dilation and Length Contraction (th-cam.com/video/bArTzG3Mkmk/w-...
What is Spacetime? | Special Relativity
2.4K views · a year ago
In this fifth video, we explore how space and time can be treated as one interrelated entity, and how the Lorentz transformations can be given a geometric interpretation. We also explore how paradoxes are avoided by the preservation of causality. Please like, comment and subscribe if you appreciate what I do! SPECIAL RELATIVITY SERIES I. The Michelson-Morley Experiment (th-cam.com/video/DFQtVFE...
Deriving the General Lorentz Transformation | Special Relativity
8K views · a year ago
In this fourth video of the Special Relativity series, we derive the general (matrix) form of the Lorentz transformations for an arbitrary boost velocity in 3D space. Please like, comment and subscribe if you appreciate what I do! SPECIAL RELATIVITY SERIES I. The Michelson-Morley Experiment (th-cam.com/video/DFQtVFEp_3E/w-d-xo.html) II. Time Dilation and Length Contraction (th-cam.com/video/bAr...
Deriving the Lorentz Transformations | Special Relativity
19K views · a year ago
In this third video of the Special Relativity series, we derive the Lorentz transformations, which map events in one reference frame to another reference frame that moves at a constant relative velocity. We also demonstrate how these transformations can be used to derive the phenomena of time dilation and length contraction that we explored more informally in the last video. Please like, commen...
Time Dilation and Length Contraction | Special Relativity
6K views · a year ago
Here we explore how Einstein's postulates imply that moving clocks tick slower and moving trains become shorter. Please like, comment and subscribe if you appreciate what I do! SPECIAL RELATIVITY SERIES I. The Michelson-Morley Experiment (th-cam.com/video/DFQtVFEp_3E/w-d-xo.html) II. Time Dilation and Length Contraction III. Deriving the Lorentz Transformations (th-cam.com/video/FvqutkaPmas/w-d...
The Michelson-Morley Experiment | Special Relativity
29K views · a year ago
The Michelson-Morley experiment contributed to the crisis of ether theory that eventually gave way to Einstein's special theory of relativity. In this first episode of the Special Relativity series, we summarize the motivation, results and implications of that pivotal experiment. Please like, comment and subscribe if you appreciate what I do! SPECIAL RELATIVITY SERIES I. The Michelson-Morley Ex...
The Physics of Nuclear Weapons
1.9K views · a year ago
How YOLO Object Detection Works
39K views · a year ago
Optimization for Deep Learning (Momentum, RMSprop, AdaGrad, Adam)
48K views · a year ago
Backpropagation: How Neural Networks Learn
3.4K views · a year ago
Transformers, Simply Explained | Deep Learning
4.3K views · a year ago

Comments

  • @HaiderAli-l5z1c · 6 hours ago

    Confusing; there must be a simpler derivation.

  • @nabinbk1065 · 14 hours ago

    thanks

  • @gilrosario7224 · 7 days ago

    I’m here because of Lord Jamar. His interview on the Godfrey Comedy channel was very interesting….

  • @PepysFlora-t8p · 8 days ago

    Williams John Young Gary Clark Jessica

  • @Kir-f4j · 10 days ago

    Really great videos, very interesting and easy to follow. Translating them just isn't always easy, though. Greetings from Russia ❤

  • @fzigunov · 13 days ago

    You're the best explanation out there in my opinion. I appreciate you!!

  • @lambda4931 · 17 days ago

    Why wouldn’t going against the aether be the opposite of going with it? They should cancel out.

  • @raihanpahlevi6870 · 20 days ago

    The predicted Ci is calculated with IoU if the cell has an object; how, then, is the predicted Ci calculated if the cell doesn't have an object?

  • @BradleyJohnson-t2e · 23 days ago

    Ortiz Mountains

  • @oinotnarasec · 24 days ago

    Beautiful video. Thank you

  • @everythingisalllies2141 · 24 days ago

    Your error is at th-cam.com/video/FvqutkaPmas/w-d-xo.html If the spherical wave is centred and expanding from K origin, it can't also be expanding from a different center at location K prime's origin which is also in motion. Your whole explanation has failed at this point.

  • @bradleymorris161 · 26 days ago

    Thank you so much for this, really cleared up how VAEs work

  • @nielsniels5008 · 27 days ago

    Thank you so much for these videos

  • @everythingisalllies2141 · 28 days ago

    This is all BS. For Jack, his ladder doesn't shrink, because he can say the barn is doing the moving, so the barn is not as big as it was before. So the ladder certainly can't fit. Now that we know it doesn't fit for two reasons, it's not going to fit if you invent a third option. Your error is in your stupid simultaneity example. Make up your mind; light can't do two things at once. The centre of an expanding sphere of light can't have two different origins, one not moving and the other moving. That is where you make the error of simple logic and simple physics.

  • @fullerholiday2872 · a month ago

    Martin Jessica Moore Carol Taylor Dorothy

  • @KwangrokRyoo · a month ago

    this is amazing 🤩

  • @Chachaboyz · a month ago

    By far the best resource I've found on VAEs, after _lots_ of reading and video watching. This puts it all together intelligently and clearly. Thank you!!

  • @chhotiverma5019 · a month ago

    Wow wonderful explanation ❤️ ⭐⭐⭐⭐⭐

  • @jeffburton1326 · a month ago

    There is a difference between a paradox and BS. This is BS ........ not a paradox.

  • @truthbetold818 · a month ago

    I think the Aether does exist

  • @qualquan · a month ago

    confusing

  • @romansate2854 · a month ago

    Zero point energy (bare vacuum) needs to be included in this.

  • @HojjatMonzavi · a month ago

    As a junior AI developer, this was the best tutorial on Adam and other optimizers I've ever seen. Simply explained, but not so simply as to be a useless overview. Thanks

  • @ligezhang4735 · a month ago

    This video is amazing! I like how you bring the "reparametrization trick" into the picture by first calculating the gradient separately to show the potential issue. Super clear!

  • @ayahouassim4095 · a month ago

    Earth does not move!

  • @sokrozayeng7691 · a month ago

    Great explanation! Thank you.

  • @mehdizahedi2810 · a month ago

    Awesome explanation; it contains a lot of information that is missing from the paper. Thank you

  • @ancientseed2607 · 2 months ago

    Yeah, I don’t know about that.

  • @rafa_br34 · 2 months ago

    Very interesting indeed, I feel kinda stupid cuz I barely understand the math tho lol.

    • @deepbean · 2 months ago

      I appreciate the comment! Yeah, the theory of VAEs can get a bit heavy at times... though I hope some points are conveyed well enough without the need for equations

  • @tahmidislamtasen1602 · 2 months ago

    Finally, that's the counterargument that came to my mind.

  • @deepbean · 2 months ago

    Note on 16:38. The classifier doesn't directly classify the ground-truth factor corresponding to each latent variable; it classifies the factor that was kept constant in each input data pair. However, the structure of the problem, and the limitation of a linear classifier, ensures that it can only do this by mapping latent variables to ground-truth factors.

  • @isiisorisiaint · 2 months ago

    why don't i see such detailed explanation videos as PDFs? do you guys think you'll get rich from a couple hundred views on youtube??? make a PDF (including ALL your explanations during the video!). having only an online version, plus having to click stop and juggle with the video position slider and never having a continuous presentation in front of my eyes is just plain nonsense for this kind of material. basically, it seems (from what i've been able to follow) to be a very well done presentation in terms of content, but totally useless in the form the content is delivered. PS: never heard this so clearly stated before: "a VAE maps a point in X to a distribution in Z by pushing the distributions towards the prior of Z which is a unit Gaussian, which encourages the distributions to overlap and fill all the space of the prior of Z". just brilliant.

  • @stephen7774 · 2 months ago

    The sheer arrogance of Michelson and Morley in ignoring the obvious fact that gravity is ether flow totally bewilders me to this day.

    • @jojojo9240 · 7 days ago

      I think you are a little confused. These experiments should be sufficient.

  • @EFCK555 · 2 months ago

    Good work, man; it's the best explanation I have ever seen. Thank you so much for your work.

  • @HesitantOne · 2 months ago

    Thanks a lot. Really clear one.

  • @emmyzhou9552 · 2 months ago

    Best video on VAEs I found on YouTube :D Thanks a lot!

  • @sandpaper7781 · 2 months ago

    It is a paradox, because it has a solution. Without a solution, it is just a contradiction.

  • @webdozor · 2 months ago

    Great explanation. Thank you. Are you familiar with any algorithms for solving the following task: let's say we have an empty battery with infinite capacity that is being charged at W Watts/Hour, and B number of buttons; each button uses X amount of stored energy, but increases the charging rate by Y Watts/Hour. Each button can be pressed only once. We need to find the sequence that will allow us to reach the maximum charging rate (i.e. press all the buttons) in the shortest time.

  • @JuergenAschenbrenner · 2 months ago

    Excellent stuff, who needs Netflix with this ;-)

  • @homakashefiamiri3749 · 2 months ago

    This video was a masterpiece

  • @michaelzoran · 2 months ago

    Excellent explanation. You kept this video short and sweet while also being informative. I wish more videos were like this.

  • @deepbean · 2 months ago

    NOTES:
    - Slight typo at 15:20. Where it says "ELBO" within the integral, it's just the difference of logs, whereas the ELBO is actually the expectation of the difference of logs over q(z|x).
    - At 20:36 (the Gaussian case), I've phrased it in univariate terms (even though the Gaussian is generally multivariate); however, we would still recover the result that what we're calculating is a Euclidean distance between x and mu, given that our covariance matrix is an identity matrix.

  • @nikiiliev8062 · 3 months ago

    You put everything together working in the video, that's really helpful. Thank you!

  • @blower05 · 3 months ago

    Why are the forward and backward distances travelled by the light the same? They are not the same.

    • @deepbean · 2 months ago

      This is assuming the light rays have constant speed.

    • @blower05 · 2 months ago

      @deepbean Thanks! But scientists have used lasers in huge optical-fiber bundles over significant distances to try to detect the Earth's rotation from the interference produced by the distance the laser light travels. This makes me confused.

    • @krzysztofciuba271 · a month ago

      @deepbean Tell me, what is the wave number, wavelength, and wave period for the perpendicular direction to the parallel mirror (your T(t) time; this direction has the angle). A typical definition of wave number in all 3D (dimension)!

  • @carlosmontalban687 · 3 months ago

    Hello, I can't understand the step at 15:20. You expand the expectation, you bring the gradient into the integral, but why do you substitute the difference of logs with the ELBO? The ELBO is the expectation of the difference of logs over q, isn't it?

    • @deepbean · 2 months ago

      Hi Carlos, thanks for your comment! Yes, it looks like a little typo, and I'll make sure to add a comment clarifying this.

    • @carlosmontalban687 · 2 months ago

      @deepbean Thank you very much for all your work.

  • @carlosmontalban687 · 3 months ago

    Thank you very much for your work, it has been decisive for me to understand the basis of this type of models and it will surely be of great help for me to understand the functioning of others.

  • @georgewootten4428 · 3 months ago

    At 8:07, why would the scale factor be different for light going in the opposite direction?

  • @420_gunna · 3 months ago

    I just came off of about three hours of lectures on VAEs in the Stanford "Deep Generative Models" course, and they didn't do as good of an explanation as you did here 🤷‍♂

  • @peta1001 · 3 months ago

    Please, comment! My problem with this experiment, as well as with the "Hafele and Keating experiment", is that it was conducted in and affected by the Earth's gravitational field. Is there an experiment that was (or is being) conducted in space, far away from the Sun and/or any planet's gravitational field? Thanks

    • @deepbean · 3 months ago

      Hello! Not that I'm aware of. But yes, that would eliminate the effect of gravity and/or the proposed "aether drag" of the Earth.

  • @BigBeniir · 3 months ago

    My favorite video on VAEs, the derivation of the ELBO is much clearer than in other resources I've found online. Awesome resource.