The fact this video is free is incredible
You're welcome 🤗
Your videos are literally the only 1hr+ videos I would ever watch on YouTube. Keep going, mate, extremely high-quality content 👏🏽👏🏽
Thanks a lot for making this accessible for people outside the field, for which reading and understanding these papers is quite tough. Thanks to you I'm able to stay slightly more up to date with the crazy quick developments in ML!
Thank you for bringing me into the world of neural network. Your videos always make difficult topics become easier by interconnecting relevant concepts that greatly enhance the understanding to follow your mindset. I hope I can learn more knowledge from you and apply them into my life goal some day.
Incredibly clear explanations, the flow of the video is also really smooth. It’s almost like you’re telling a story. Please keep making content!!
Clearly explained and very valuable content as always Umar. Thank you!
The intro with basic linked-up linear layers was so well done and really makes this introduction friendly!
I love that this research area develops fast enough that we need dedicated channels to explain new developments.
Your videos help me (a grad student) really understand difficult, often abstract concepts. Thank you so much... I'll always support your stuff!
You're on a mission to make the best and friendliest content to consume deep learning algorithms and I am all in for it.
Thanks Umar for such a wonderful tutorial! I've been eyeing this paper for a while!
Wow, this was a super clear and on-point explanation. Thank you, Umar.
I just read the short bio on your channel; I hope it's not offensive to say that now I understand why your excellent English nonetheless sounded so familiar to me.
In any case, thank you enormously for your contribution: you explained all the theory in a way that is, in my opinion, extremely clear and above all engaging.
Please keep it up; once again a huge thank you, and congratulations on your contribution to science.
Thank you for visiting my channel! I hope to publish more often, although producing quality content takes weeks of study and preparation. In any case, I hope to see you again soon! Have a good weekend
@@umarjamilai You had already earned a subscriber; now you've earned a fan.
Ahahahahah
Best explanation of splines I have seen. Legit 100%
Amazing content, thanks! I'm very excited about the continual learning properties of these networks.
I don't comment on YT, but man oh man, this man is love. Too good of an explanation.
This is life changing, in my opinion. Thank you for the efforts on the videos!
It is an amazing resource for KANs. Thank you so much 🙂
One of the best math videos I've watched on YouTube
Hello Umar, this video is my best birthday gift I have ever received, thanks a lot :)
Extremely clear explanation and content here! Very helpful. I am happy that you came from PoliMI as well :) keep it up!
What's funny is that I predicted your next video would be on KAN after I saw you on GitHub.
I WILL WATCH THIS VIDEO, AS I FEEL THIS WILL BE THE FUTURE OF NEURAL NETWORKS. THANK YOU FOR YOUR WORK AND CONTENT ❤
Thanks for including prerequisites
Thanks for the crystal-clear explanation!!
I saw this paper on papers with code, and thought to myself I wonder if Umar Jamil will cover this.
Thanks for your effort and videos!
Very clear, well explained, top notch!
Crazy that it took me an hour-long video to understand that it's the control points being trained on the spline graph, versus weights with MLPs and CNNs. Thank you!
Amazing! Thank you very much for this.
Great explanation, and underrated. Also waiting for an "Implementation of KAN from scratch" video.
Your explanations are the best, thank you so much😘🤗
Thank you for your excellent explanations
🤩🤩🤩🤩
Thank you for such great and detailed explanation.
I think KAN will be the catalyst of a significant tipping point in science.
I want to apply this to power system grids and replace existing dynamic models with ones built from PMU data using KAN.
Having such a good teacher is adorable; I wish I could be one of your students.
Not at all! Thank you for the kind words!
@@umarjamilai Amazing, you even speak Chinese 👍
@@seelowst I just came back from China; I lived there for 4 years and have now returned to Europe.
@@umarjamilai I've never left my city; I hope to be like you one day 👍
High-quality explanations. Thanks.
Thanks for the amazing explanation!
that is very useful, informative and interesting! Thanks a lot!
You are savior, without you mortals like me would be lost in the darkness!!!
Thank you for what you do, you are amazing.
Fantastic explanation!
Awesome explanation!!!
Awesome, easy to follow even for someone who doesn't know anything :)
Amazing video! Thanks a lot !
Can't wait to watch this, saved! Will comment again when I actually watch it..😅
Hey @Umar, great content as always. Looking forward to a KAN implementation video from scratch. Also, I think at 31:01 there is a minor language mistake: it should be a quadratic B-spline curve rather than a quadratic Bézier curve.
Thank you so much for explaining the paper; it is so easy to understand now. BTW, can you also make a hands-on video with the kan package developed by MIT, which is based on PyTorch?
Much Thanks for this video
Thank you, Jamil, what a cool video
It was fantastic. Keep going, my friend.
awesome explanation
You are amazing, thank you!
Thanks Umar. Very nice explanation. Just 2 questions:
1 - Does it mean we can specify different knots per edge?
2 - I don't understand how backpropagation will work. Let's say we calculate the gradient from h1: it will update phi 1,1 and phi 1,2, but how does the learning process move the knots toward the desired values?
Sir, you are great..💙💙
Thank you so so much for this amazing content.
This is really great! Power to you!!🚀
Good video, quality content.
Thank you for making this video!
Amazing explanation!
Excellent video, thanks! At the end, I _really_ wanted to see an illustration of the relatively "non-local" adaptation of MLP weights. Can that be found somewhere?
Amazing! Just wanted to ask if I should expect an implementation of this concept on this channel?
An implementation video will be awesome
Hats off, what an awesome video!!!
Excellent.
Thanks man. Next xLSTM please.
Umar bhai, you are the greatest
This is awesome!
Hi Umar. First of all, thank you so much for your work; the way you present topics makes you an endless source of knowledge.
I watched this whole video and I have some doubts. At the beginning, when you introduce B-splines, control points are described as points given as input, for which a curve is created that passes close to them according to the basis functions. Later, when the network is introduced, it is said that what gets trained are the functions, and in particular the control points. What does this mean? Aren't the control points the inputs we give to the model, i.e. the data we want to fit with a function?
I would be grateful if you could clarify this concept for me.
Thanks a lot and keep up the good work :)
The only parameter you define yourself is the number of control points (which determines the granularity, i.e. how "precise" the interpolation should be). The job of a neural network is to "learn" the parameters of a complex function in order to reduce a cost function (the loss function). Which parameters get trained? The positions of the control points, not their number, which is instead fixed in advance.
It's like when you try to interpolate points with a polynomial: first you choose the degree of the polynomial (how many powers of x), then you use some algorithm to "train" the coefficient of each power.
I hope it's clearer now.
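To make that analogy concrete, here's a minimal sketch in PyTorch (my own toy illustration, not code from the video or the paper): the polynomial degree is fixed up front, and only the coefficients are registered as learnable parameters, just as a KAN fixes the number of control points and learns their values.

```python
import torch

# Toy data: noisy samples of an unknown 1D function we want to approximate.
x = torch.linspace(-1, 1, 100)
y = torch.sin(3 * x) + 0.1 * torch.randn(100)

degree = 5                                             # chosen a priori, like the number of control points
coeffs = torch.zeros(degree + 1, requires_grad=True)   # the only learnable parameters

powers = torch.stack([x ** i for i in range(degree + 1)], dim=1)  # shape (100, degree+1)
optimizer = torch.optim.Adam([coeffs], lr=1e-2)

for step in range(2000):
    y_hat = powers @ coeffs                            # polynomial evaluated at every x
    loss = torch.mean((y_hat - y) ** 2)
    optimizer.zero_grad()
    loss.backward()                                    # gradients w.r.t. the coefficients
    optimizer.step()                                   # moves the coefficient values, never their count
```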
Thanks for the video.
For the first feature x0,1 we have 5 functions for the same input x0,1; how are the outputs going to be different, given that they use the same input, grid size, degree, and knot vector?
Sir, I have been a huge fan of your videos and have watched all of them. I am currently in my second year of BTech and really passionate about learning ML. Sir, if possible, could I work under you? I don't want any certificate or anything; I just want to observe and learn.
Phenomenal! Thank you :)
I loved it, sir.
Is the explicit form of the obtained functions accessible after training the model and performing L1 regularization?
Is there a repository and code for it already?
You’re fantastic, mate.
Please do post more! Please do more videos!
Great Content !!
THANK YOU
brilliant video!
bruh so good. Keep it up!
this video is so amazing!!!!!!!
awesome👍
I was just hoping for this!
Looking forward to your feedback 😇
❤ Great content. Have you considered covering inverse RL? ❤
Thank you for the great video! Can you (or anyone) help me understand why you need to introduce the basis function b(x) in the residual activation functions?
Could you please next explain multimodal LLMs and techniques like LLaVA, LLaVA-Plus, and LLaVA-NeXT?
I'm waiting for that day too
Check my latest video!
@@umarjamilai Yeah, checking it out; you are, as usual, the G.O.A.T.
thank you
Hi, can you please make a video on multimodal LLMs and fine-tuning them on a custom dataset...
Check my latest video!
amazing
One thing I didn't catch: how are the functions tuned? If each function consists of points in space, and we move the points around to move the B-spline, how do we decide how to move the points? It doesn't seem like backprop would work in the same way.
The same way we move weights for MLPs: we calculate the gradient of the loss function w.r.t the parameters of these learnable functions and change them in the opposite direction of the gradient. This is how you reduce the loss.
We are still doing backpropagation, so nothing changed on that front compared to MLPs.
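As a rough sketch of what that looks like in practice (my own toy example, using fixed Gaussian bumps as a simpler stand-in for the B-spline basis; a real KAN uses B-spline basis functions, but the training mechanics are the same): the coefficients playing the role of control points are ordinary tensors with requires_grad=True, so autograd gives their gradients exactly as it does for MLP weights.

```python
import torch

centers = torch.linspace(-1, 1, 8)            # fixed grid of basis centers (analogous to the knot grid)

def basis(x):
    # Gaussian bumps standing in for the B-spline basis functions B_i(x).
    return torch.exp(-((x[:, None] - centers[None, :]) ** 2) / 0.1)

coeffs = torch.randn(8, requires_grad=True)   # the "control points" of the learnable function phi

x = torch.linspace(-1, 1, 64)
target = x ** 2                               # function we want phi to approximate

opt = torch.optim.SGD([coeffs], lr=0.1)
for _ in range(500):
    phi_x = basis(x) @ coeffs                 # phi(x) = sum_i c_i * B_i(x)
    loss = torch.mean((phi_x - target) ** 2)
    opt.zero_grad()
    loss.backward()                           # d(loss)/d(coeffs) via backpropagation
    opt.step()                                # move the control points against the gradient
```

The same loop with the full B-spline basis is how a KAN's edge functions get updated; nothing about backpropagation itself changes.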
Wow! 🙏
There are points in the spline that are continuous but not differentiable, right? How do you handle those?
awesome bro.
Time to implement it
FWIW, I took an MLP solution for MNIST, substituted KAN layers for the MLP layers, and no matter what I did (adding dimensions, etc.) it couldn't solve it. My intuition is that KANs only work well for approximating linear-ish functions, not irregular, highly discontinuous ones like image classification would need. But perhaps I just screwed it up :D
It seems the results on the MNIST dataset cannot be improved, but are achieved with fewer parameters; maybe the best scenario for KAN is tasks that need interpretability.
@@haowu7916 Yeah, with adjustments I got it to solve it, but not with fewer parameters.
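For anyone who wants to reproduce that experiment, here's a rough sketch of what the swap might look like, assuming the MIT pykan package (the `KAN(width=..., grid=..., k=...)` constructor and its behaviour as a plain `nn.Module` are assumptions on my part; check the repository for the exact API):

```python
import torch
from torch import nn
from kan import KAN  # pip install pykan; exact API assumed, check the repo

# MLP baseline for MNIST: 28*28 inputs, one hidden layer, 10 classes.
mlp = nn.Sequential(nn.Flatten(), nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 10))

# KAN replacement with the same layer widths: learnable spline functions on the
# edges instead of fixed activations on the nodes.
# grid = number of spline intervals, k = spline degree (assumed parameter names).
kan = KAN(width=[784, 64, 10], grid=5, k=3)

def train_step(model, images, labels, optimizer):
    # images: (batch, 1, 28, 28) MNIST tensors; labels: (batch,) class indices.
    logits = model(images.view(images.size(0), -1))
    loss = nn.functional.cross_entropy(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Note that each edge carries its own set of spline coefficients, so with identical widths the KAN version typically ends up with more parameters than the MLP, which matches the observation in the replies above.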
At 2:21 you mentioned the documentation. Where can I find it?
Great explanation. What app do you use to create the slides?
PowerPoint + a lot a lot a lot a lot a lot of patience.
Thanks!
Thanks
In search of gold, I found a diamond.
Can you make a tutorial video on models like Perplexity that use live web search?
But what about wavelet Kolmogorov-Arnold networks?
Please explain DSPy