The best explanation of the topic on YouTube. All you need to know is explained in less than 15 minutes! Thank you very much
Truly a hidden gem. I've missed out on a lot by not knowing about this channel, and of course Luis is the best math teacher I've ever had.
Thanks!
@khaledal-utaibi2049 thank you so much for your very kind contribution! I really appreciate it ☺️
One of the finest videos on KAN network
and this lesson is free? What a time to be alive!
Thank you, my payment is kind comments like yours. :)
@@SerranoAcademy I'm actually looking for other architectures right now since my models can't get past the 88% AUC ROC ceiling. Hopefully I can use this to get to that sweet 95-ish%. Thank you again. Please be kind with the maths in your next video. lol
Your delivery was brilliant! You gave the right amount of detail. I’m really a fan now. Can’t wait for the next one, and I’m going to watch your other playlists. Thank you.
I can't thank you enough, Luis. You make all this stuff look very simple.
Great delivery! I wish every math teacher was like Luis.
Thank you so much for your efforts to put out such informative videos.
Thank you very much for the explanations. As always, the best ones on YouTube.
Thank you very much, I'm glad you like them! :)
12/5. Checked the channel for #2. Eagerly waiting. Great Video.
Very well done video, as usual! Great and interesting work!
@@skydiver151 thank you! I’m glad you liked it!
Clearly explained and illustrated. Thank you.
Thank you! I'm glad you liked it!
Amazing video by amazing teacher
Thank you! :) The next one is coming up soon, and I'm having a lot of fun making it. :)
Very innovative concept.
excellent explanation. thank you so much
This was a really good explanation of KANs. 🥳
Eagerly waiting for the second part!!!❤
Great timing! The second part just came out! :) th-cam.com/video/nS2hnm0JRBk/w-d-xo.html
@@SerranoAcademy thank u sir, u r awesome!!!🎉
At first look, a KAN requires more parameters to train than an MLP, but the paper claims that a KAN can compete with, if not outperform, an MLP using a smaller network, and therefore fewer layers. I can't wait to watch the next video; I would like to understand how the initial splines are chosen. For instance, if we go with B-splines, which ones do we take, and how many? Are there other parameters to learn in addition to the knots?
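For anyone curious, here is a minimal sketch of one learnable KAN edge activation, roughly following the paper's parametrization phi(x) = w_b·silu(x) + w_s·Σ c_i B_i(x), where the spline grid (knots) stays fixed and the trainable parameters are the spline coefficients plus scale weights. It assumes a uniform clamped grid and SciPy's BSpline; the grid size, degree, and initialization below are illustrative choices, not the paper's exact ones.

```python
# Rough sketch of one learnable KAN edge activation, assuming a uniform,
# clamped B-spline grid and SciPy's BSpline. Grid size G, degree k, and the
# initialization below are illustrative assumptions, not the paper's exact setup.
import numpy as np
from scipy.interpolate import BSpline

G, k = 5, 3                                  # grid intervals, spline degree
grid = np.linspace(-1.0, 1.0, G + 1)
knots = np.concatenate([[grid[0]] * k, grid, [grid[-1]] * k])
num_basis = G + k                            # number of B-spline basis functions

rng = np.random.default_rng(0)
c = rng.normal(scale=0.1, size=num_basis)    # trainable spline coefficients
w_b, w_s = 1.0, 1.0                          # trainable scale weights

def silu(x):
    return x / (1.0 + np.exp(-x))

def edge_activation(x):
    """phi(x) = w_b * silu(x) + w_s * sum_i c_i B_i(x); the knots stay fixed."""
    spline = BSpline(knots, c, k, extrapolate=True)
    return w_b * silu(x) + w_s * spline(x)

print(edge_activation(np.array([-0.7, 0.0, 0.4])))
```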
Thank you so much, Luis. Can't wait for the next chapter 😍
Thank you! :) Yes, super excited for that one, it's coming up soon!
@@SerranoAcademy 😍😍
This is fantastic! Thank you
Thanks prophet.
Great video! Very well explained, Peace!
@@Pedritox0953 thank you so much, I’m glad you liked it! Peace! 😊
Why does this video only have 813 views after 4 hours? Subscribed instantly :D
thank you sir
Amazing video! Thanks :)
Thank you, I'm glad you liked it!
Seems to me that, instead of training weights that lead to activation functions, KANs are training weights (knot vectors) that lead to splines. Interested to learn more about the tradeoffs between the two.
Sir: Around 11:28 in your video you show quadratic B-splines (my question applies to any spline approximation): three splines that approximate the function of interest. I was unclear on how these will be used. They will not be used as weights in a linear dot product, right? The three splines connecting to x1 will be evaluated to determine what each outputs, right? If x1's value is 0.3, then the middle one will output 0.3 and the other two will output 0. Am I right? I am confused about how you can use them as weights in the regular sense.
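In case a worked number helps, here is a small illustrative evaluation (using SciPy's BSpline with made-up coefficients, not the video's exact setup) of quadratic basis functions at x1 = 0.3. Several of them are nonzero at the same input, and the edge output is the coefficient-weighted sum of those values, so the basis functions act more like features whose learned coefficients are the weights than like classic dot-product weights on the raw input.

```python
# Illustrative only (hypothetical coefficients, not the video's setup):
# evaluate quadratic B-spline basis functions at x1 = 0.3 and combine them
# with learned coefficients. Several overlapping basis functions are nonzero
# at the same input; the edge output is their coefficient-weighted sum.
import numpy as np
from scipy.interpolate import BSpline

degree = 2                                   # quadratic
grid = np.linspace(0.0, 1.0, 4)              # 3 bins on [0, 1]
knots = np.concatenate([[grid[0]] * degree, grid, [grid[-1]] * degree])
num_basis = len(knots) - degree - 1          # = 3 bins + degree = 5

x1 = 0.3
# i-th basis function = spline whose coefficient vector is one-hot at index i
basis_values = np.array([
    BSpline(knots, np.eye(num_basis)[i], degree)(x1) for i in range(num_basis)
])
coeffs = np.array([0.1, -0.4, 0.8, 0.2, -0.3])   # hypothetical learned values

print(basis_values)            # several nonzero entries; they sum to 1
print(basis_values @ coeffs)   # the edge's output at x1 = 0.3
```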
I think they reinvented the wheel with this one. Existing NNs are already KANs. What they think is new is a misunderstanding of these concepts.
Mathematical beauty is enough to motivate me to watch the rest of this series. But the practical question is whether these networks can perform as well as neural networks on benchmarks, given equal “compute.” That is probably more an empirical question than a mathematical one.
Thank you! I fully agree. I really liked the mathematical beauty, so that is what caught my interest. From what I understand, they perform well compared to regular NNs. But it could go either way; they could become huge, or not. However, my hope is that either way, they'll inspire new architectures coming from the theory of representation of functions, as this is a beautiful field that has remained (until now) unexplored in ML.
great video😊
Thank you! :)
Typo in the KAN depiction: output should be f1(x1) + f2(x2).
Oh thanks! Yeah you're right, the w's should be x's.
You are the best!!!
Thank you so much! :)
The next logical step is to dynamically change the accuracy of each spline per problem. And then train both weights and functions by having a third function which determines which part to train. And then we'll end up with something closer to how a biological neuron works.
why do we require 4 basis functions to approximate any linear/quadratic function with 3 bins?
The second part is out, on the Kolmogorov-Arnold Theorem!
th-cam.com/video/nS2hnm0JRBk/w-d-xo.htmlsi=ym6OsCVKFgiHhtne
Could you please share the slide/keynotes?
OK, this is great. However, doesn't it also demonstrate that KANs and MLPs are equivalent? The spline sections are equivalent to the activation levels, and the choice of B-splines is equivalent to the choice of functions. So aren't the two theories, and the entire architectures, potentially equivalent? Is this just a choice of how to get the same function-approximation system into memory?
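For what it's worth, a toy side-by-side sketch (not from the video or the paper) that makes the structural difference concrete: an MLP layer learns a linear map and then applies a fixed nonlinearity, while a KAN-style layer learns a separate 1-D function on every edge and sums the results. Both are universal approximators, so the "equivalence" question is mostly about how the same function-approximation budget is parametrized. The Gaussian bumps below stand in for B-splines purely for illustration.

```python
# Toy contrast between one MLP layer and one KAN-style layer. Illustrative
# sketch only: the bump basis, sizes, and initialization are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out, num_basis = 3, 2, 8

# --- MLP layer: y_j = sigma( sum_i W[j, i] * x_i + b[j] ), sigma is fixed ---
W = rng.normal(size=(n_out, n_in))
b = rng.normal(size=n_out)
mlp_layer = lambda x: np.maximum(W @ x + b, 0.0)        # ReLU is not learned

# --- KAN-style layer: y_j = sum_i phi_{j,i}(x_i), each phi is learned ---
# Each edge function is a weighted sum of fixed "bump" basis functions;
# only the coefficients C are trained, so the nonlinearity itself is learned.
centers = np.linspace(-1.0, 1.0, num_basis)
C = rng.normal(size=(n_out, n_in, num_basis)) * 0.1     # learnable

def kan_layer(x):
    # basis[i, b] = value of the b-th bump at input x_i
    basis = np.exp(-((x[:, None] - centers[None, :]) ** 2) / 0.1)
    return np.einsum("jib,ib->j", C, basis)

x = np.array([0.3, -0.5, 0.9])
print(mlp_layer(x), kan_layer(x))
```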
🎉
@@carolinalasso ❤️
❤
❤️🧠
😊💪
It is the same as CNN + dense layers, so what is its advantage? Could you please give an example of its advantage?
HiPPO-KAN is a new KAN family; it performs better than the KAN model.