Deep Foundations
Vector Quantization and Multi-Modal Models
Views: 131

Videos

Contrastive Coding
233 views · 4 months ago
This lecture describes contrastive coding, a fourth fundamental loss function of deep learning.
Guidance for Diffusion Models
325 views · 4 months ago
In practice, diffusion models need a poorly understood alteration of their objective function called "guidance". This talk gives a timeline of the rapid development from the introduction of guidance in May 2021 to DALL-E 2 in March 2022.
Diffusion1
244 views · 4 months ago
This is an introduction to the mathematics of diffusion models.
VAE
199 views · 4 months ago
This lecture describes the ELBO loss function that defines variational autoencoders (VAEs) as a third fundamental equation (together with the cross-entropy loss and the GAN adversarial objective).
GANs
113 views · 5 months ago
The formulation of GANs plus a variety of applications of GANs and discriminative loss.
Lecture 8b
187 views · 5 months ago
The Occam Guarantee (The Free Lunch Theorem), the PAC-Bayes Theorem (real-valued model parameters and L2 regularization guarantees), Implicit Regularization, Calibration, Ensembles, Double Descent, and Grokking.
SDE
246 views · 5 months ago
Gradient Flow, Diffusion Processes (Brownian Motion), Langevin Dynamics, and the Stochastic Differential Equation (SDE) model of SGD.
SGD
290 views · 5 months ago
A presentation of vanilla SGD, momentum, and Adam, with an analysis based on understanding temperature and its relationship to hyperparameter tuning.
Transformer
276 views · 6 months ago
Language Modeling, Self-Attention, and the Transformer.
Some Fundamental Architectural Elements
232 views · 6 months ago
This describes the motivation for ReLU, initialization methods, normalization layers, and residual connections.
history to 2024
750 views · 6 months ago
This is an overview of the history of deep learning. It reviews the history starting from the introduction of the neural threshold unit in 1943, but focuses mainly on the "current era", which starts in 2012 with AlexNet.
Lecture 5: Language Modeling and the Transformer
289 views · 1 year ago
Lecture 5: Language Modeling and the Transformer
Lecture 4: Initialization, Normalization, and Residual Connections
327 views · 1 year ago
Lecture 4: Initialization, Normalization, and Residual Connections
Lecture 3 Einstein notation and CNNs
416 views · 1 year ago
Lecture 3 Einstein notation and CNNs
Lecture 1: A Survey of Deep Learning
771 views · 1 year ago
Lecture 1: A Survey of Deep Learning
More Recent Developments
547 views · 2 years ago
More Recent Developments
Vector Quantized Variational Auto-Encoders (VQ-VAEs).
8K views · 2 years ago
Vector Quantized Variational Auto-Encoders (VQ-VAEs).
Progressive VAEs
463 views · 2 years ago
Progressive VAEs
Gaussian Models and the Perils of Differential Entropy
415 views · 2 years ago
Gaussian Models and the Perils of Differential Entropy
Variational Auto-Encoders (VAEs)
823 views · 2 years ago
Variational Auto-Encoders (VAEs)
VAE Lecture 1
463 views · 3 years ago
VAE Lecture 1
SGD Lecture 1
275 views · 3 years ago
SGD Lecture 1
2021 Developments
518 views · 3 years ago
2021 Developments
Back-Propagation
836 views · 3 years ago
Back-Propagation
Back-Propagation with Tensors
1.1K views · 3 years ago
Back-Propagation with Tensors
The Educational Framework (EDF)
1.1K views · 3 years ago
The Educational Framework (EDF)
Minibatching
564 views · 3 years ago
Minibatching
Trainability
527 views · 3 years ago
Trainability
Einstein Notation
501 views · 3 years ago
Einstein Notation

Comments

  • @nserver109 · a month ago

    Wonderful!

  • @DorPolo-x5g · 2 months ago

    Great video.

  • @ees7416 · 3 months ago

    Fantastic course. Thank you.

  • @saikalyan3966 · 4 months ago

    Uncanny

  • @stevecaya · 5 months ago

    The moment at minute 16 is priceless. The teacher is not sure whether there have been any big advances in AI other than this small thing called GPT-3. Ha ha ha ha. Nothing big about that model, other than that it would turn out to be the fastest consumer app in history to reach 100 million users and usher in the AI age for the general public. Dude, how did you miss that one… ouch.

  • @moormanjean5636 · 5 months ago

    This is such a helpful video, thank you.

  • @martinwafula1183 · 6 months ago

    Very timely tutorial

    • @solomonw5665 · 5 months ago

      *released 3 years ago 🫠

    • @quickpert1382 · 5 months ago

      @@solomonw5665 Timed for showing up in YouTube recommendations after KANs were released. For him it was quite timely; for me, it was already too late.

  • @shivamsinghcuchd · 6 months ago

    This is gold!!

  • @aditya_a · 9 months ago

    Narrator: deep networks were NOT saturated lol

  • @K3pukk4 · a year ago

    What a legend!

  • @yorailevi6747 · a year ago

    Thanks, I was just searching for this idea.

  • @zeydabadi · a year ago

    Could you elaborate on “… j ranges over neurons at that position …” ?

  • @verystablegenius4720 · a year ago

    Terrible exposition - he doesn't seem to understand it himself either. "We should do the verification" - even your notation is not clear. Also: a "unary potential" is called a BIAS. Just read a stat. mech. book before making these videos, sigh.

  • @andrewluo6088 · a year ago

    After watching this video, I finally understand.

  • @stupidoge · 2 years ago

    Thanks for your interpretation. I have a clear understanding of how this equation works. (If possible, I would still appreciate some detailed teaching on each part of the equation.) All in all, thanks for your help!!!

  • @AmitKumarPradhan57 · 2 years ago

    I understand that when Ps = Pop the contrastive divergence goes to zero, since the distributions of Y_hat and Y are the same. It's not clear to me why the gradient also goes to zero. Thank you in advance. PS: I took this course last quarter.
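
A worked note on this question, under the assumption (mine, not necessarily the lecture's exact setup) that the model is an energy-based distribution P_s(y) = exp(-E_Theta(y)) / Z(Theta) trained with the cross-entropy objective. Differentiating that objective gives

    \nabla_\Theta \, \mathbb{E}_{y \sim \mathrm{Pop}}\!\left[-\log P_s(y)\right]
      \;=\; \mathbb{E}_{y \sim \mathrm{Pop}}\!\left[\nabla_\Theta E_\Theta(y)\right]
      \;-\; \mathbb{E}_{\hat{y} \sim P_s}\!\left[\nabla_\Theta E_\Theta(\hat{y})\right],

so when P_s = Pop the two expectations are taken over the same distribution and cancel: the gradient, not just the divergence, is zero.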

  • @Jootawallah · 3 years ago

    Another question: why is the gate function G(t) not just an independent parameter between 0 and 1? What do we gain from making it a function of h_t-1 and x? In the end, SGD would find good values for G(t) even if it were an independent parameter.

  • @Jootawallah · 3 years ago

    Is there an explanation for why the three gated RNN architectures here differ in performance? Why is the LSTM, the architecture with the most parameters, not the most effective one? In fact, neither is the simplest one the most effective. It's the intermediate one that takes the gold medal. But why?

    • @davidmcallester4973 · 3 years ago

      A fair comparison uses the same number of parameters for each architecture --- you can always increase the dimension of the hidden vectors. Some experiments have indicated that at the same number of parameters all the gated RNNs behave similarly. But there is no real analytic understanding.

  • @Jootawallah · 3 years ago

    I don't understand, what is the benefit of using a gated, i.e. residual, architecture? You talk about gates allowing forgetting or remembering, but why would we want to forget anyway? Also, whether G is zero or one, we always remember the previous state h_t-1 in some way! So I don't get it ...

    • @davidmcallester4973 · 3 years ago

      A vanilla RNN just after initialization does not remember the previous hidden state because the information is destroyed by the randomly initialized parameters. Vanilla RNNs could probably be improved with initializations that are better at remembering the previous state, but the structure of a gated RNN seems to cause SGD to find parameter settings with better memory than happens by running SGD on vanilla RNNs.

    • @Jootawallah · 3 years ago

      @@davidmcallester4973 So is this again just a matter of residual architectures providing a lower bound on the gradient, and thus preventing it from vanishing?
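
A minimal numpy sketch of the contrast discussed in this thread: a vanilla RNN update versus a gated (GRU-style) update in which a sigmoid gate interpolates between copying the previous state and writing a new one. The weight names and the exact gating form are illustrative assumptions, not the lecture's equations.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def vanilla_rnn_step(h_prev, x, Wh, Wx, b):
    # The previous state is pushed through randomly initialized weights and a
    # nonlinearity, so early in training its information is easily destroyed.
    return np.tanh(Wh @ h_prev + Wx @ x + b)

def gated_rnn_step(h_prev, x, Wgh, Wgx, bg, Wch, Wcx, bc):
    # The gate G in (0, 1) interpolates between copying h_prev unchanged
    # (remembering) and writing a new candidate state, which makes long-range
    # memory easy for SGD to find.
    G = sigmoid(Wgh @ h_prev + Wgx @ x + bg)
    candidate = np.tanh(Wch @ h_prev + Wcx @ x + bc)
    return G * h_prev + (1.0 - G) * candidate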

  • @Jootawallah · 3 years ago

    On slide 11, shouldn't it be self.x.addgrad(self.x.grad*...) ? self.grad isn't defined, right?
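
For readers without the slides, here is a self-contained illustration of the backward/addgrad pattern this comment refers to. It is a hypothetical sketch written in the spirit of EDF, not the actual EDF code from slide 11: each node keeps its own accumulated grad, and it is that grad, scaled by the local derivative, that gets passed to the parent.

import numpy as np

# Illustrative only: hypothetical nodes in the style referenced above.

class Value:
    """A leaf node holding a value and an accumulated gradient."""
    def __init__(self, value):
        self.value = np.asarray(value, dtype=float)
    def forward(self):
        self.grad = np.zeros_like(self.value)
    def addgrad(self, delta):
        self.grad += delta   # gradients from all consumers accumulate here
    def backward(self):
        pass

class Sigmoid:
    """Elementwise sigmoid of a parent node."""
    def __init__(self, x):
        self.x = x
    def forward(self):
        self.value = 1.0 / (1.0 + np.exp(-self.x.value))
        self.grad = np.zeros_like(self.value)
    def addgrad(self, delta):
        self.grad += delta
    def backward(self):
        # Chain rule: this node's own accumulated grad (self.grad), times the
        # local derivative of the sigmoid, is pushed down to the parent.
        self.x.addgrad(self.grad * self.value * (1.0 - self.value))

# Usage: forward pass in topological order, seed the output grad, backward pass.
x = Value([0.0, 1.0]); y = Sigmoid(x)
for node in (x, y): node.forward()
y.addgrad(np.ones_like(y.value))   # d(loss)/d(y) = 1 for illustration
for node in (y, x): node.backward()
print(x.grad)                      # equals y * (1 - y), the sigmoid derivative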

  • @Jootawallah · 3 years ago

    Illuminating!

  • @Jootawallah · 3 years ago

    So if KL(p|q) = 0, does it mean that p = q up to a constant? Or are there other symmetries to take into account?
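
A short note on this question, using only the standard definition of KL divergence (nothing specific to the lecture):

    \mathrm{KL}(p \,\|\, q) \;=\; \mathbb{E}_{y \sim p}\!\left[\log \frac{p(y)}{q(y)}\right] \;\ge\; 0,

with equality if and only if p(y) = q(y) for (almost) every y. Because p and q are both normalized probability distributions, there is no "up to a constant" slack: KL(p||q) = 0 implies p = q exactly.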

  • @jonathanyang2359 · 3 years ago

    Thanks! I don't attend this institution, but this was an extremely clear lecture :)

  • @addisonweatherhead2790 · 3 years ago

    The intuition around the 13 minute mark was really helpful! I've been trying to understand this paper for a few days now, and this has really made its goal and reasoning more succinct. Thanks!

  • @bernhard-bermeitinger · 3 years ago

    Thank you for this video, however, please don't call your variable ŝ 😆 (or at least don't say it out loud)

  • @kaizhang5796 · 3 years ago

    Great lecture! May I ask how to choose the conditional probability of node i given its neighbors in a continuous case? Thanks!

    • @davidmcallester4973 · 3 years ago

      If the node values are continuous and the edge potentials are Gaussian then the conditional probability of a node given its neighbors is also Gaussian.

    • @kaizhang5796 · 3 years ago

      David McAllester thanks! If each node has d-dimensional features and node i has k neighbors, how do we determine the parameters of the Gaussian p(i | k neighbors)?

    • @kaizhang5796 · 3 years ago

      Should I multiply k Gaussians together, where the means of the Gaussians are the k neighbors?
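
A worked note for the thread above, using standard Gaussian graphical-model facts (the notation is mine, not the lecture's). If the joint distribution over all nodes is Gaussian with mean \mu and precision (inverse covariance) matrix \Lambda, then, since \Lambda_{ij} = 0 whenever i and j are not neighbors,

    x_i \mid x_{N(i)} \;\sim\; \mathcal{N}\!\Big(\mu_i - \tfrac{1}{\Lambda_{ii}} \textstyle\sum_{j \in N(i)} \Lambda_{ij}\,(x_j - \mu_j),\; \Lambda_{ii}^{-1}\Big).

Multiplying k Gaussian factors centered at the neighbor values, as suggested above, corresponds to edge potentials of the form exp(-(x_i - x_j)^2 / (2\sigma_j^2)): the precisions add and the mean is the precision-weighted average of the neighbors,

    \prod_{j=1}^{k} \mathcal{N}(x_i;\, x_j,\, \sigma_j^2) \;\propto\; \mathcal{N}(x_i;\, m,\, \sigma^2),
    \qquad \frac{1}{\sigma^2} = \sum_{j=1}^{k} \frac{1}{\sigma_j^2},
    \qquad m = \sigma^2 \sum_{j=1}^{k} \frac{x_j}{\sigma_j^2}.

With d-dimensional node features the same statements hold with precision matrices in place of scalar precisions.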

  • @LyoshaTheZebra · 3 years ago

    Thanks for explaining that! Great job. Subscribed!

  • @sdfrtyhfds · 3 years ago

    Also, what if you skip the quantization during inference? Would you still get images that make sense?

    • @davidmcallester4973 · 3 years ago

      Do you mean "during generation"? During generation you can't skip the quantization because the pixel-CNN is defined to generate the quantized vectors (the symbols).

    • @sdfrtyhfds · 3 years ago

      @@davidmcallester4973 I guess that during generation it wouldn't make much sense; I was thinking more in the direction of interpolating smoothly between two different symbols.

  • @sdfrtyhfds · 3 years ago

    Do you train the pixel CNN on the same data and just not update the VAE weights while training?

    • @davidmcallester4973 · 3 years ago

      Yes, the vector quantization is held constant as the pixel CNN is trained.
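
A minimal numpy sketch of the quantization step these replies refer to: each continuous encoder output is snapped to its nearest codebook entry, and the resulting discrete symbols are what the separately trained pixel CNN models, with the VQ-VAE weights held fixed. Function and variable names are mine, not the lecture's.

import numpy as np

def quantize(z_e, codebook):
    # z_e: (n, d) continuous encoder outputs; codebook: (K, d) learned embeddings.
    # Squared distance from every encoder vector to every codebook entry.
    d2 = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    symbols = d2.argmin(axis=1)   # discrete codes: what the pixel CNN is trained on
    z_q = codebook[symbols]       # quantized vectors fed to the decoder
    return symbols, z_q

# Example: 5 encoder vectors, a codebook of 8 entries, dimension 4.
rng = np.random.default_rng(0)
symbols, z_q = quantize(rng.normal(size=(5, 4)), rng.normal(size=(8, 4)))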

  • @bastienbatardiere1187 · 3 years ago

    You are not even taking the example of a graph with loops, which is the whole point of LBP. Moreover, please introduce the notation a little; we should be able to understand even if we did not watch your previous videos. Nevertheless, it's great that you teach this method.

  • @mim8312 · 3 years ago

    Now the cutting-edge scientists are working on the future AIs: creating an AI from a combination of multiple AIs, which reportedly is similar to how our brain functions, with different portions performing specific functions, and which can then understand and perform a set of completely different tasks better than humans. What could go wrong?

  • @mim8312 · 3 years ago

    I think too many people are focusing on the game, which I also follow, as if this were an ordinary player. Since I have significant knowledge, and since I believe that Hawking and Musk were right, I am really anxious about the self-taught nature of this AI. This particular AI is not the worrisome thing, albeit it has obvious potential applications in military logistics, military strategy, etc. The really scary part is how fast this was developed after AlphaGo debuted. We are not creeping up on the goal of human-level intelligence. We are likely to shoot past that goal amazingly soon without even realizing it, if things continue progressing as they have. The early, true AIs will also be narrow and not very competent or threatening, even if they become "superhuman" in intelligence. They will also be harmless idiot savants at first.

    The upcoming threat to humanity: the scary thing is that computer speed (and thereby, probably eventually, AI intelligence) doubles about every year, and will likely double faster when super-intelligent AIs start designing chips, working with quantum computers as co-processors, etc. How fast will our AIs progress to such levels that they become indispensable -- while their utility makes hopeless any attempt to regulate them or retroactively impose restrictions on beings that are smarter than their designers? At first they may have only base functions, like the reptilian portion of our brain. However, when will they act like Nile crocodiles and react to any threat with aggression? Ever gone skinny dipping with Nile crocodiles? I fear that very soon, before we realize it, we will all be doing the equivalent of skinny dipping with Nile crocodiles, because of how fast AIs will develop by the time the children born today reach their teens or middle age. Like crocodiles that are raised by humans, AIs may like us for a while. I sure hope that lasts. As the announcer on Jeopardy said long ago about a program that was probably not really an advanced AI: I, for one, welcome our future AI overlords.

  • @zv8369 · 3 years ago

    5:50 The reference was meant to be Poole et al. rather than Chen et al.? arxiv.org/abs/1905.06922

  • @zv8369 · 3 years ago

    Could you please provide a reference for your statement at 11:18: *"the cross-entropy objective is an upper bound on the population entropy"*?

    • @zv8369 · 3 years ago

      I think I got around to understanding why this is the case. Entropy, H(X), is the minimum number of bits required to represent X. Cross-entropy is minimized when q matches the true distribution p (the minimum cross-entropy value is the entropy, obtained by using the true distribution); otherwise the cross-entropy is larger than the entropy. Therefore, the cross-entropy is an upper bound on the entropy! I didn't do a good job describing this, but I hope it helps!

    • @siyaowu7443 · a year ago

      @@zv8369 Thanks! It helps me a lot!
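
The identity behind this exchange, written out with standard definitions (the notation is mine): for a population distribution Pop and a model distribution Q_\Theta,

    H(\mathrm{Pop}, Q_\Theta) \;=\; \mathbb{E}_{y \sim \mathrm{Pop}}\!\left[-\log Q_\Theta(y)\right]
      \;=\; H(\mathrm{Pop}) + \mathrm{KL}(\mathrm{Pop} \,\|\, Q_\Theta) \;\ge\; H(\mathrm{Pop}),

since KL \ge 0. So the cross-entropy is an upper bound on the population entropy, with equality exactly when Q_\Theta = Pop.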