1 - Basic neural network theory | 8:30
2 - "Residual" neural network theory | 12:40
3 - Ordinary Differential Equations (ODEs) | 17:00
4 - ODE Networks | 22:20
5 - Euler's Method to Optimize an ODENet | 27:45
6 - Adjoint Method for ODENet Optimization | 29:15
7 - ODENets Applied to time series data | 30:50
8 - Future Applications of ODENets | 33:41
Thanks broo
Np !
Only a PyTorch implementation as of now? rtqichen's torchdiffeq on GitHub.
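For anyone who wants to try that library, here is a minimal usage sketch on a hand-written ODE (dy/dt = -y); the API details are from memory and may have drifted, so treat this as an assumption rather than the official example.

```python
import torch
from torchdiffeq import odeint  # rtqichen's PyTorch ODE-solver library

# Dynamics function: dy/dt = f(t, y). Here a hand-written ODE, dy/dt = -y.
def f(t, y):
    return -y

y0 = torch.tensor([1.0])                 # initial value y(0) = 1
t = torch.linspace(0.0, 2.0, steps=20)   # times at which we want the solution
y_t = odeint(f, y0, t)                   # numerical solution, approx. exp(-t)
```

In the paper's setting the hand-written f is replaced by a small neural network, and there is also an odeint_adjoint variant for the memory-efficient backward pass.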
I have faith in humanity because of people like you 👏🙏
Is there a popular term that TH-cam people use for a list of video bookmarks like this one?
The "inputs times weights, add a bias, activate" song is brilliant and should be used in elementary schools
This made me fall in love with AI and ML again. Thank you so much. I was going through a slump, but while watching this I couldn't stop smiling throughout the entire video.
Only channel on TH-cam that motivates me to study Maths..
I love how you're always excited about what you're talking about. It's infectious.
You've gotten way better than the last time I checked you out. That was 4 years ago, lol, so I guess that's just normal. But great, man! Loved it! Absolutely amazing content.
Thank you Siraj for putting in the effort to reach a much larger, broader audience. Everyone benefits from this.
This looks more and more to me like consciousness is simply a sophisticated set of mathematical operations. This neural network architecture is able to optimize its own structure, like how many layers it has, in order to best solve a given problem. The set of equations looks a lot like the equations used in optimal control theory, where an observed state is compared to a desired state to give an error state, which is then scaled by a gain and fed back into the system so as to move it an order of magnitude closer to the desired state.
Siraj dropped the most fire freestyle of 2019 in this video.
@@marketsmoto3180 wait 10 hours for my next video
Siraj Raval I cant eat or sleep until I get these new bars Siraj!
About a week back, I started working as a Teaching Assistant for an undergrad Differential Equations course. While reading the text, I remembered that I had learned all this theory myself in my freshman year but very rarely used differential equations after the course, and I wondered if I could use them in machine learning (my area of interest). I am really excited after watching your video.
This is an incredible research paper.
Thank you! I watched many videos on ODE with ResNet and yours is the best!!!
I regularly watch Siraj’s videos and this is one of the best I’ve seen... got my adrenaline pumping when I saw that list of topics to be covered at 8:30!
Last night when I was going to sleep I had a great idea for a self-evolving non-parametric neural-network. I was wondering for the longest time how I can get the integral of a function of the learning rate with multiple variables. Today I saw this, thank you.
Siraj... please tell me that you have travelled back in time to help us catch up with the future. I am just flabbergasted by the volume & intensity you handle!
I have no words to comment, just a dropped jaw in pure awe!!!😘
Thank you for the video. One thing I find frustrating is when you try to solve a differential equation and you don't have any initial value, because then it results in a family of functions, not just one. Watching this video I just realized you already have those initial values: they are simply the data you use to train the network!
This could be interesting for me as someone who spent many years during his PhD looking at nonlinear ODEs. Now, as an ML guy, it would be great to relate this back to my original work. One caveat I was not clear on: ODEs come with stability conditions, and it was not clear from the paper how they treat this.
I am feeling even more happy and proud now for having Mathematics as my favourite subject.
Another interesting reason to explore AI more and more...
Thanks, Siraj :)
I agree, I'm studying maths at university and it is awesome to see differential equations pop up in AI.
Ramesh is that you?
To whoever is pointing out that he's speaking and going too fast: this video is not a course in deep learning, and you shouldn't expect to be able to actively apply notions starting from here; it's a (very good, imho) panoramic view of the subject just to give you a taste. If you want to get somewhere, you first need to study: some linear algebra, some probability theory, some multivariable calculus, dedicated deep learning libraries in whatever programming language you want to use and, last but not least, some books about deep learning.
I've really appreciated this video. I come from "pure mathematics" (even if I don't really like this term), and I had just an intuitive idea of how deep learning is implemented, but now my understanding is a lot less fuzzy. Thank you very much.
fuzzy logic?
The code shown at the end of the video doesn't include the ODE definition block. I mean, where is the ODE actually specified, apart from the solver? Without defining the ODE, how is it possible to solve dx/dt or d²x/dt²?
"ODE block" is not really a block. Shameless plug, here is my explanation of this paper: th-cam.com/video/uPd0B0WhH5w/w-d-xo.html
I'm only half way through the video and I can already tell this is my favorite one of 2019, and possibly my favorite research paper ever! Thanks, Siraj!
Even though I'm good at math, I would have never imagined myself using differential equations again after high school... and here I am.
Really interesting research, AI is moving so fast right now. There are so many doors about to be opened: modelling more complicated functions while still keeping memory in check. Amazing stuff, your videos are first class!
I like that you said "I know that sounds complicated but don't go anywhere."
Please keep posting such videos for new interesting papers. It feels like something is right under our noses with math, and we just need to notice it to completely solve AI in an unexpectedly simple way. Delicious thing to watch. WTG.
This is awesome, you're killing it mate!
Ohh come on!
I needed this for my differential equations project last semester :/
such an interesting topic!
Awesome siraj. You made my day.
I like this style of video where you talk freely, just like your livestreams.
Excellent video. It may be self-evident, but it's important to conceptualize these improvements from both a mathematical and a programming understanding. You tackled a tough concept beautifully!!! Good job, mate
Programmer: This function has too many conditionals to write. Can't be done.
Data-scientist: Have you tried using Stochastic Gradient Descent to write them?
*DNNs are born*
Programmer: This function needs too many layers to generate. Can't be done.
Data-scientist: Have you tried Stochastic Gradient Descent to pick the right number of layers?
*ResNets are born*
Programmer: Each feature of this function needs a dynamic, potentially non-integer number of non-linearities added in order to be generated. Can't be done.
Data-scientist: Have you tried Differential Calculus to just generate the function?
*ODEs are born*
Programmer: This function is nowhere-differentiable. Can't be done.
Data-scientist: Uh... *Pulls out box-counting*
Programmer: This function can't be described by its fractal dimension. Can't be done.
Data-scientist: Oh god... *Pulls out Neural Multifractal Analysis*
Programmer: This function can't be described by its singularity spectra. Can't be done.
Data-scientist: *Pulls out Neural Multifractal Analysis, but harder this time*
Programmer: This function can't be described by its singularity spectra. Can't be done.
Data-scientist: [Maximum Call-Stack Exceeded]
God-tier lulz lad, bravo
I have no fucking clue what this means...
...but it's fucking hilarious and I like it.
one day ill come back to this and understand...
Awesome, Thanks Siraj! The physics community is going to love this! Looking forward to you making more videos on this when this research expands!
Great video Siraj! Thanks and keep up the great work!!
Nice job, bruv. Keep making the diaspora proud!
Your videos are a continuous stream of super high quality learnings about new computing mechanisms! Thank you!
Hey, siraj! Please make a video on Spiking Neural Networks!
Hey mota bhai..... I think in this video you really tried to make things simpler, oh... yeah. Thanks for considering my suggestion.
Keep rocking bro , keep educating the people.
You are such a clever brain. Great work man thanks.
Thank you for the great effort you put in
I haven't finished watching yet, but this type of video is what makes Siraj shine in the world of AI teaching. The latest AI paper explained in a very exciting and motivational way. He is very right when he says that you cannot find this type of lecture anywhere else.
You're a real triple og for doing this
Cant thank you enough! Thank you very much man, your channel is the best!
Really glad I studied math and CS in college.
When we're predicting timestep t+h is it that we just forecast this in one step, or do we subdivide the gap (between t and h) into lots of sub-timesteps where the output is evaluated and passed into the algorithm again (almost like autoregression)?
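For what it's worth, my understanding is that the subdivision happens inside the solver: the interval [t, t+h] is split into many small steps (adaptively for solvers like dopri5, uniformly for fixed-step Euler), and the dynamics function is re-evaluated at each sub-step, a bit like autoregression but over solver steps rather than data points. A rough fixed-step sketch, with made-up names:

```python
def euler_forecast(f, y_t, t, h, n_steps=100):
    """Integrate dy/dt = f(t, y) from t to t+h by subdividing into n_steps Euler steps."""
    dt = h / n_steps
    y = y_t
    for i in range(n_steps):
        y = y + dt * f(t + i * dt, y)  # each sub-step feeds its output into the next
    return y
```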
Breakthroughs like this are why AGI is closer than we think !
the movement of ur hands always inspire me ;p
thank you so much Siraj, I think you just opened my eyes on my next paper title.
Awesome video. Hoping you cover more new research papers in that simple way. I really enjoyed it even though I'm not a mathematician.
Thank you Siraj, I've been reading over this paper for the last two weeks, seeing how I can use it for my Forex predictions
Thanks Siraj, you're doing a great job!
One of your best videos.
+Siraj Raval
I tried (and failed) to implement ODE nets on a GNN just before the end of the year. It was difficult not only because of the data source structure (ML in graph DBs is still in its infancy) but also due to the relative dearth of info on this technique.
Your explanations were helpful and (maybe even more important) your enthusiasm inspired me to go back and tackle it again; I'd forgotten why ODEnets are so appealing in the first place. Thank you!
Very interesting~ The way the math (derivative, integral, partial derivative) is illustrated is intuitive. I will spend time on Euler's method, which I'm still not very clear on. Thank you for uploading such a great introduction which is both profound and intuitive.
This is amazing. You are amazing. Thank you.
Interesting that more and more abstract concepts are being added to the deep learning mix, which was once considered more of a bottom-up idea. Besides GANs, which I see as adding the higher concept of minimax to lower-level neural networks, there are also developments in structuring networks from the point of view of abstract algebra, and now via these ODEs. It's good to get an overview of the developing flow ....
I'm an artificial intelligence enthusiast, please bring some more videos like this. It'll help a lot!
TH-cam listed this video to me this morning just after I spent time looking for information on Liquid Neural Networks.
Raj understood the importance of ODEs 4 years ago! Well done
Did Siraj post this paper anywhere? I know the original documentation is there for download, but I'm trying to find the one which he is scripting off of.
Thanks for this good intro into this topic!
so this paper essentially makes vertical wormholes for marbles to skip specific air current layers, then digs valleys so the marble has more time to fall into the appropriate grouping.
Siraj bhai, Happy Uttarayan.
Sir, you are a great teacher, math simplified. 👌👌
Wow, something more interesting than capsule networks
I'm excited to see how this will merge with orthogonal polynomials
Thank you for making these videos!
Awesome breakdown of very involved topics, Siraj. Keep it up!
Thank you Siraj!! Your videos are awesome
Thanks for explaining all these concepts
Amazing explanations 😮
13:37 when Siraj is about to drop some hardcore ML knowledge
HAHAHAHA, shit is getting serious
Wow..... at some parts I wondered whether I'd accidentally enabled 1.5x mode. Slow down at the essential parts, Siraj. Anyways.... will try this out right now. I always come to your channel for inspiration and I get energised by the end of your videos.
Really badass presentation.
This is huge---Thanks
At ~11:00 "That, in essence, is how deep learning research goes. Let's be real, everybody."
You just won LeInternet for today ;-)
Thanks for your great work!
Thanks for explaining this. Genius
Next Einstein ?
Amazing video!
Feeding the next layer plus the input reminds me of Mandelbrot's fractals, f(z) = z^2 + c. Here the input and output are complex numbers though
"infinitesimally small and infinity big" once again Leibniz's monad is still schooling the world.
Thank you for the attempt; my suggestion is that you should use the time in the video more efficiently. This is a pretty advanced paper, and no one who doesn't know the basics of neural networks or what a differential is will attempt/succeed to understand it.
30:29 You know shit is about to get serious when Siraj takes on a ninja posture
This is amazing
Thanks siraj for your work
that was really great, thanks a lot!
Thank you so much for putting this together
I didn't get the "Network can decide how deep it needs to be" part at 14:02. How? Can anybody eli5?
Normally, the input of each layer is the output of the previous layer (with some activation applied). If you therefore "stack" 10 layers, all of them will transform the data in a certain way before eventually forming the network output.
In a ResNet, the input of each layer consists not only of the output of the previous layer, but also of the *input* to the previous layer. What this means is that if a layer learns to simply "do nothing", it will pass its input on to the next layer. While you therefore have a network with 10 layers, it could be the case that all layers after the 2nd one simply pass on the computed value; it doesn't necessarily need to be modified any further.
So, you can specify any number of layers, but if it turns out that a small number of layers would work better than actually using all of these layers, the network can just learn to "ignore" some of its layers.
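A minimal sketch of the skip connection being described, in illustrative PyTorch (not the paper's or video's code):

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, h):
        # If self.f learns to output roughly zero, the block "does nothing"
        # and simply passes its input h through unchanged.
        return h + self.f(h)
```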
@@EctoMorpheus Thank you so much for taking the time to answer my question, I appreciate it! the whole thing makes much more sense now.
Very interesting stuff! However, what I don't quite understand is how the ODEs fit in with gradient descent. If the layers of the network can be represented as an ODE at some time t, and algorithms like Euler's method can be used to solve such equations, why is gradient descent necessary?
Or if I understood incorrectly and Euler's method is used for computing the gradients rather than the weights, what is the benefit of this compared to using current methods? Does it allow for non-differentiable activation functions?
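My reading (an assumption on my part, sketched with invented names): the solver and gradient descent do different jobs. Euler's method (or any other solver) replaces the forward pass, carrying the hidden state from t0 to t1, while gradient descent still updates the parameters θ of the dynamics network; the adjoint trick is just a memory-efficient way of getting those gradients. Roughly:

```python
import torch

theta = torch.randn(2, 2, requires_grad=True)   # parameters of the dynamics network

def f(t, h):
    return torch.tanh(h @ theta)                 # learned dynamics dh/dt = f(t, h)

def euler_solve(h0, t0=0.0, t1=1.0, n=50):
    dt = (t1 - t0) / n
    h = h0
    for i in range(n):
        h = h + dt * f(t0 + i * dt, h)           # forward pass = numerical integration
    return h

opt = torch.optim.SGD([theta], lr=1e-2)
x, y = torch.randn(8, 2), torch.randn(8, 2)
for _ in range(10):
    loss = ((euler_solve(x) - y) ** 2).mean()
    opt.zero_grad()
    loss.backward()                              # gradients of the loss w.r.t. theta
    opt.step()                                   # gradient descent updates theta
```

Here the gradients come from ordinary backprop through the solver's steps; the paper's adjoint method obtains the same gradients by solving a second ODE backwards in time, without storing every intermediate step.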
Math is awesome, I like that bru, and it's my first time ever hearing about reinforcement learning.
I would like to learn more about the code starting from 30:50 though.
But I love this video! Thanks for sharing.
I just wanna keep staring at the evolving convolutional layer output with this one. Must be fun! :)
Nice explainer. Thanks
Amazing video
Thank you, excellent explanation.
Correct me if I'm wrong, but the main ODE tool they used was Euler's method as an approximation. I was wondering if you can apply any of the other tools taught for solving ODEs to neural networks.
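As far as I can tell, yes: the framework isn't tied to Euler, and the paper itself uses adaptive Runge-Kutta solvers. As a sketch, a classical RK4 step can be dropped in wherever an Euler step would go:

```python
def rk4_step(f, t, y, dt):
    """One classical Runge-Kutta 4 step for dy/dt = f(t, y); a drop-in replacement for an Euler step."""
    k1 = f(t, y)
    k2 = f(t + dt / 2, y + dt * k1 / 2)
    k3 = f(t + dt / 2, y + dt * k2 / 2)
    k4 = f(t + dt, y + dt * k3)
    return y + (dt / 6) * (k1 + 2 * k2 + 2 * k3 + k4)
```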
Could you post a video on using the adjoint method to solve ODEs? I would really appreciate a concise presentation. All of the material I have found on it is hard to digest.
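Until such a video exists, the core of the adjoint method from the paper (Chen et al., 2018) fits in a few lines: define an adjoint state a(t), integrate it backwards in time alongside the original ODE, and accumulate the parameter gradient along the way.

```latex
a(t) = \frac{\partial L}{\partial z(t)}, \qquad
\frac{d a(t)}{dt} = -\, a(t)^{\top} \frac{\partial f(z(t), t, \theta)}{\partial z}, \qquad
\frac{dL}{d\theta} = -\int_{t_1}^{t_0} a(t)^{\top} \frac{\partial f(z(t), t, \theta)}{\partial \theta}\, dt
```

All three quantities can be computed with one extra call to the same ODE solver, run backwards from t1 to t0, which is why no intermediate activations need to be stored.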
I have been exploring differential equations and am so happy I found this video, it puts the calculus in a context that is really interesting and applicable!!
Your vids are always of super high quality, often the topic is completely new to me yet you explain it in simple and easy to understand terms with clear examples. Well done!
Wow, I was looking for a crisp overview of Neural ODEs, and this is to the point. Students like us need the old Siraj back.
"You can do 99 things for someone and all they'll remember is the one thing you did bad"
I'm so glad you don't stop rapping from time to time, man
I have read the paper: arxiv.org/pdf/1806.07366.pdf It seems that in a ResNet the parameters are not the same in each layer, while in the ODE fitting problem the parameters are the same. This clearly reduces the degrees of freedom in choosing the parameters. ODE parameter fitting is not new; there are even some limited references in the paper. It seems that now one can use standard machine learning libraries, too.
I am also confused by this, since every layer would have to have the same weights?
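One way to picture the difference (an illustrative sketch, not code from the paper): a ResNet carries a separate weight set per block, whereas the ODE net reuses a single dynamics function everywhere, and the continuous time t it receives as input plays the role that layer depth used to play.

```python
import torch
import torch.nn as nn

# ResNet-style: ten blocks, each with its own parameters.
resnet_blocks = nn.ModuleList([nn.Linear(64, 64) for _ in range(10)])

# ODE-net-style: one set of parameters, reused at every time t;
# feeding t in as an extra feature still lets behaviour vary with "depth".
class SharedDynamics(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Linear(dim + 1, dim)

    def forward(self, t, h):
        t_col = t * torch.ones(h.size(0), 1)          # broadcast scalar time to a column
        return self.net(torch.cat([h, t_col], dim=1))
```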