Neural Ordinary Differential Equations

  • Published Sep 26, 2024

Comments • 47

  • @nathancooper1001
    @nathancooper1001 5 years ago +24

    Best explanation I've found so far on this. Good job!

  • @jordibolibar6767
    @jordibolibar6767 3 years ago +9

    After reading and watching various articles and videos, I must say this is the clearest explanation I've found so far. Thanks!

  • @siclonman
    @siclonman 5 years ago +15

    I spent 3 hours yesterday trying to figure out what the hell was happening in this paper, and I wake up to this...THANK YOU

  • @KMoscRD
    @KMoscRD 9 months ago

    This series explaining deep learning papers, so good.

  • @cw9249
    @cw9249 1 year ago +1

    Wow, this seems by far the most distinctive type of network in deep learning. Everything else kind of falls into a few categories, but can all be conceptually interconnected in some way. This is not even close.

  • @zyadh2399
    @zyadh2399 3 years ago +1

    This is my video of the year, thank you for the explanation.

  • @chrissteel7889
    @chrissteel7889 3 years ago +2

    Really great explanation, very clear and concise.

  • @DonHora
    @DonHora 4 years ago +4

    So clear now, many thanks! +1 follower

  • @ericstephenvorm
    @ericstephenvorm 3 years ago +2

    Cheers! That was an excellent video. Thanks so much for putting it together!

  • @peterhessey7732
    @peterhessey7732 3 years ago +2

    Super helpful video, thanks!

  • @SuperSarvagya
    @SuperSarvagya 5 years ago +3

    Thanks for making this video. This was really helpful

  • @shorray
    @shorray 3 years ago +2

    Great video, thanks! I didn't get the part with the encoder; where is the video you talked about?
    I mean Figure 6: are they supposed to work with a NODE, or... hmm...
    Would love it if somebody could explain.

  • @handokosupeno5425
    @handokosupeno5425 10 months ago

    Amazing explanation

  • @zitangsun8688
    @zitangsun8688 3 years ago

    Please see the caption of Fig. 2: "If the loss depends directly on the state at multiple observation times, the adjoint state must be updated in the direction of the partial derivative of the loss with respect to each observation."
    Why do we need to add an offset for each observation?

  • @liangliangyan7528
    @liangliangyan7528 2 years ago

    Thank you for this video, it may be useful for me.

  • @Alex-rt3po
    @Alex-rt3po 1 year ago

    How does this relate to liquid neural networks? That paper is also worthy of a video from you I think

  • @albertlee5312
    @albertlee5312 4 years ago +2

    Thank you!
    I am trying to understand the implementation in Python, but I am confused about why we still need 2~3 Conv2D layers with activation functions... if we consider the hidden layers as a continuous function that can be solved by ODE solvers.
    Could you please help me with this?

    • @YannicKilcher
      @YannicKilcher  4 years ago +4

      The network doesn't represent the continuous function, but is a discrete approximation to the linear update equation of ODEs.
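
For readers puzzled by the same thing: a minimal sketch, assuming the torchdiffeq package released alongside the paper (class names here are illustrative, not from any specific repository). The Conv2D layers do not act as discrete hidden layers; they parameterize the derivative dz/dt that the solver integrates.

    import torch
    import torch.nn as nn
    from torchdiffeq import odeint  # black-box ODE solver

    class ODEFunc(nn.Module):
        # The conv layers define the dynamics f(z, t) = dz/dt,
        # not a stack of discrete hidden layers.
        def __init__(self, channels=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.ReLU(),
                nn.Conv2d(channels, channels, 3, padding=1),
            )

        def forward(self, t, z):  # torchdiffeq calls f with (t, z)
            return self.net(z)

    class ODEBlock(nn.Module):
        # Stands in for a stack of residual blocks: z(1) = z(0) + integral of f(z, t) dt
        def __init__(self, func):
            super().__init__()
            self.func = func
            self.times = torch.tensor([0.0, 1.0])

        def forward(self, z0):
            # odeint returns the solution at each requested time; keep z at t = 1
            return odeint(self.func, z0, self.times)[-1]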

  • @wasgeht2409
    @wasgeht2409 5 years ago +3

    u are fucking good my friend

  • @hdgdhdhdh
    @hdgdhdhdh 3 years ago

    Hi, thanks for the crisp explanation. However, is there any forum or link I can join for ODE-related issues/tasks? Actually, I have just started working on ODEs and would appreciate some help or discussions related to the topic. Thanks!

  • @moormanjean5636
    @moormanjean5636 2 years ago

    Amazing video.. new subscriber for sure

  • @ClosiusBeg
    @ClosiusBeg 1 year ago

    OK... and how do we find the adjoint equation? What is it, what does it mean, and why can we do it?

  • @alekhmahankudo1051
    @alekhmahankudo1051 5 years ago +1

    I could not understand why we need to compute dL/dZ(0); don't we just need dL/d{theta} for updating our parameters? I would appreciate it if anybody could answer my query.

    • @wujiewang8781
      @wujiewang8781 4 years ago +1

      They are related through the augmented dynamics; you could look at the appendix of the paper.
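
For context, a sketch of the augmented dynamics from the paper's appendix (omitting the time components): the state z(t), the adjoint a(t) = ∂L/∂z(t), and the running parameter gradient are stacked into one augmented state and integrated backwards in time by a single solver call, which is why dL/dz(0) comes out of the same computation that produces dL/dθ:

    \frac{d}{dt}
    \begin{bmatrix} z(t) \\ a(t) \\ \partial L / \partial \theta \end{bmatrix}
    =
    \begin{bmatrix}
      f(z(t), t, \theta) \\
      -a(t)^\top \, \partial f / \partial z \\
      -a(t)^\top \, \partial f / \partial \theta
    \end{bmatrix}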

  • @-mwolf
    @-mwolf 1 year ago

    Thanks!

  • @sarvagyagupta1744
    @sarvagyagupta1744 3 years ago +1

    Thanks for making this video. I had questions while reading this paper and you covered those topics. But I still don't understand how we got equation 4. Also, when we go from eq. 4 to 5, the integration is over a very small time step, right? It's not over the whole period as shown in the diagram. Let me know.
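
For anyone stuck on the same point, and assuming the equation numbers refer to the adjoint ODE and the parameter gradient from the paper, the two expressions are:

    \frac{da(t)}{dt} = -a(t)^\top \frac{\partial f(z(t), t, \theta)}{\partial z},
    \qquad
    \frac{dL}{d\theta} = -\int_{t_1}^{t_0} a(t)^\top \frac{\partial f(z(t), t, \theta)}{\partial \theta}\, dt

The integral in the second expression runs over the whole interval from t1 back to t0; the ODE solver evaluates it with many small internal steps, so the small step size belongs to the solver, not to the definition of the gradient.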

  • @herp_derpingson
    @herp_derpingson 5 years ago

    Let me try to summarize. Tell me if I understood it right.
    There is a neural network which tries to predict the hidden activations at each layer (in a continuous space) of another neural network.
    So the integral of the outputs of this entire neural network should be the activations of the final layer (x1) of the neural network we are trying to predict. Similarly, the input should be the initial activations (x0).
    Therefore, the loss is the deviation between the ground truth and the integration of the first neural network from x0 to x1.
    The integration is done with some numerical ODE solver like the Euler method. It must be continuous and differentiable.
    t is a hyperparameter: an arbitrarily chosen "depth" of the neural network which we are trying to predict.

    • @YannicKilcher
      @YannicKilcher  5 years ago +3

      Yes that sounds legit. The main contribution of the paper is the way they implement backpropagation without having to differentiate through the ODE solver.
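
To make the summary above concrete, a tiny sketch (plain NumPy, illustrative only) of the forward pass with the simplest possible solver: a fixed-step Euler solve of dz/dt = f(z, t), where each update looks exactly like a residual block. The paper's contribution is that the backward pass uses the adjoint method instead of backpropagating through these steps.

    import numpy as np

    def euler_solve(f, z0, t0=0.0, t1=1.0, steps=100):
        # Fixed-step Euler integration of dz/dt = f(z, t).
        # Each update z <- z + h * f(z, t) mirrors a residual block.
        z = np.asarray(z0, dtype=float)
        h = (t1 - t0) / steps
        t = t0
        for _ in range(steps):
            z = z + h * f(z, t)
            t += h
        return z

    # Example: dz/dt = -z with z(0) = 1 approaches exp(-1) ≈ 0.368 at t = 1
    print(euler_solve(lambda z, t: -z, 1.0))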

  • @daveb4446
    @daveb4446 1 year ago

    What is the scientific term for “oh crap okay”?

  • @keeplearning7588
    @keeplearning7588 1 year ago

    What does the t mean? The number of layers?

  • @conjugategradient5498
    @conjugategradient5498 3 years ago

    I suspect that the primary reason the RNN has such jagged lines is ReLU. I'm curious to see what the results would look like with tanh.

    • @moormanjean5636
      @moormanjean5636 2 years ago

      RNNs are a time-discrete analog of Neural ODEs, which could also contribute to the jerkiness.

  • @2bmurders
    @2bmurders 5 years ago

    I think I'm misunderstanding something from the paper (maybe it's the comparison with residual networks). Is this concept using the idea of having a fixed size neural network that is approximating an unknown differential equation that would then be numerically integrated across to some arbitrary time step (the prediction) in the future for the output? That would make sense to me. But the paper seems to sort of hint at still having x layers as the approximating intermediate steps of the approximated differential equation when the paper references the hidden states of the network. That part is what's throwing me off.

    • @YannicKilcher
      @YannicKilcher  5 years ago +1

      Not sure I understand your problem, but your first statement is correct. And the "arbitrary" time step in the future is a fixed time, I think. I guess easiest would be to always go from 0 to 1 with t. The question of the paper is, if this whole procedure approximates some neural network with h hidden layers, how big is h?

    • @2bmurders
      @2bmurders 5 years ago

      Thanks for the follow-up. After letting this paper sink in for a bit, I think it's finally clicked for me... and now I feel a little dumb for not getting it the first time because it's pretty straightforward (must have been overthinking it). I'm curious whether it's possible to parameterize the width dynamics over time in tandem with depth via a neural network (probably as a PDE at that point). Regardless, this paper is really exciting.

  • @iqraiqra6627
    @iqraiqra6627 4 years ago +1

    Hey guys, can anyone help me write a research proposal on ODE topics?

  • @bertchristiaens6355
    @bertchristiaens6355 4 years ago

    Fantastic videos and channel!!
    PS: I noticed that in your "DL architectures" playlist a few videos are duplicated (e.g. Linformer appears 7 times).

    • @YannicKilcher
      @YannicKilcher  4 years ago

      thanks. I know my playlists are a mess :-S

    • @bertchristiaens6355
      @bertchristiaens6355 4 years ago

      @@YannicKilcher but your videos are 👌 though

  • @bhanusri3732
    @bhanusri3732 4 years ago +1

    Where does the da(t)/dt equation come from?

    • @YannicKilcher
      @YannicKilcher  4 years ago

      It comes from the ODE literature. I think to really understand this paper you might need to dive into that.

    • @bhanusri3732
      @bhanusri3732 4 years ago

      @@YannicKilcher How do we know that da(t)/dt is directly proportional to -a(t) and df/dz? Is it through experimenting? I'm a noob, sorry if my doubt is too basic.

    • @bhanusri3732
      @bhanusri3732 4 years ago

      @@YannicKilcher Do we apply the chain rule? How does it work in this particular equation?

    • @wujiewang8781
      @wujiewang8781 4 years ago

      @@bhanusri3732 You can look at the SI of the paper; the derivation is not too bad.
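
A compressed sketch of the chain-rule argument from the paper's supplementary material: define the adjoint a(t) = ∂L/∂z(t). The chain rule relates the adjoint at two nearby times through the sensitivity of the state,

    a(t)^\top = a(t+\varepsilon)^\top \, \frac{\partial z(t+\varepsilon)}{\partial z(t)}

and since z(t+\varepsilon) = z(t) + \varepsilon f(z(t), t, \theta) + O(\varepsilon^2), taking the limit of (a(t+\varepsilon) - a(t)) / \varepsilon as \varepsilon \to 0 gives

    \frac{da(t)}{dt} = -a(t)^\top \frac{\partial f(z(t), t, \theta)}{\partial z}

So the relation is exactly the chain rule applied to an infinitesimal step of the dynamics; nothing is determined by experiment.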

  • @ethansmith7608
    @ethansmith7608 1 year ago

    diffusion, before it was cool