Kolmogorov Arnold Networks (KAN) Paper Explained - An exciting new paradigm for Deep Learning?

  • Published on 1 Dec 2024

Comments • 118

  • @avb_fj
    @avb_fj  6 months ago +18

    At 7:10 there is a correction. The notations aren't consistent with the matrix shown at 5:44.
    x_1 should pass through phi_{11}, phi_{21}, ..., phi_{51}; and x_2 should pass through phi_{12}, phi_{22}, ..., phi_{52}.
    Basically, the activation functions should be labeled in this order: phi_{11}, phi_{21}, phi_{31}, phi_{41}, phi_{51}, phi_{12}, phi_{22}, phi_{32}, phi_{42}, phi_{52}
    Credit to @bat.chev.hug.0r for pointing it out!

  • @jayd8935
    @jayd8935 6 months ago +25

    Even as a person who isn't great at math, your explanation was clear and helped me a lot in understanding this quite exciting paper! Thank you :)

  • @foramjoshi3699
    @foramjoshi3699 6 months ago +28

    2:54 The example really helps me understand... this is an amazing and simple-to-understand explanation of KANs. Kudos to you!

  • @sergiorubio6797
    @sergiorubio6797 12 days ago

    Great explanation! Let's see how they perform in different applications. Kudos to you!

  • @AurobindoTripathy
    @AurobindoTripathy 6 months ago +25

    This is an excellent explanation of the paper (now I can ease into reading the paper). Learnable activations are new and exciting, and most researchers would be kicking themselves saying, "why didn't I think of that?" The next step (for the authors of the paper) may be to work with "attention", because as far as we know, that's "all you need".

    • @avb_fj
      @avb_fj  6 months ago +8

      Agreed! In theory, they could probably do some attention stuff when aggregating the outputs of the activation functions at each layer. Instead of a regular addition, just do an (attention-)weighted addition. It'll be interesting to see for sure - Kolmogorov Arnold Attention Networks (KAAN) has a nice ring to it.
      That said, I think they should prioritize making it highly parallelizable and fast first.
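
      To make that concrete, here is a minimal, hypothetical sketch in PyTorch (not from the paper): the edge functions are small MLP stand-ins rather than B-splines, and the "attention" is a learned softmax-normalized weight per edge rather than a full input-dependent attention mechanism.

          import torch
          import torch.nn as nn

          class WeightedKANLayer(nn.Module):
              """KAN-style layer where the per-edge activation outputs are combined
              with learned softmax weights instead of a plain sum."""
              def __init__(self, in_dim, out_dim):
                  super().__init__()
                  # one tiny univariate function phi_{ij} per edge (output i, input j)
                  self.edge_fns = nn.ModuleList([
                      nn.ModuleList([nn.Sequential(nn.Linear(1, 8), nn.Tanh(), nn.Linear(8, 1))
                                     for _ in range(in_dim)])
                      for _ in range(out_dim)])
                  # per-edge logits, normalized over the inputs feeding each output node
                  self.attn_logits = nn.Parameter(torch.zeros(out_dim, in_dim))

              def forward(self, x):                                    # x: (batch, in_dim)
                  weights = torch.softmax(self.attn_logits, dim=-1)    # (out_dim, in_dim)
                  outs = []
                  for i, fns in enumerate(self.edge_fns):
                      # phi_{ij}(x_j) for every input j, then a weighted sum over j
                      edge_out = torch.cat([fn(x[:, j:j + 1]) for j, fn in enumerate(fns)], dim=1)
                      outs.append((weights[i] * edge_out).sum(dim=1, keepdim=True))
                  return torch.cat(outs, dim=1)                        # (batch, out_dim)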

    • @dead_d20dice67
      @dead_d20dice67 6 months ago

      There have been attempts to make activation functions learnable. In my opinion, one of the most successful attempts is the radial basis function neural network. It's quite an interesting mechanism, but it is now considered outdated.

  • @tamalmajumder4832
    @tamalmajumder4832 23 days ago

    Glad I came across your profile. Amazing stuff you're putting out here dada.

    • @avb_fj
      @avb_fj  22 days ago

      Thanks!!

  • @soumilyade1057
    @soumilyade1057 6 months ago +3

    Simple and to-the-point explanation. You avoided the mathematical jargon cleverly.

  • @736939
    @736939 6 months ago +6

    This is what I call the democratization of math. A true scientist can explain the hardest things in math in simple terms.

  • @NasrinAkbari-ge7pm
    @NasrinAkbari-ge7pm 5 months ago +1

    Wow, that was a great explanation. You make the concepts very easy to understand. Thank you!

  • @darkhydrastar
    @darkhydrastar 6 months ago +4

    Great work bud. I also appreciate your high quality sound and gentle voice.

  • @johnandersontorresmosquera1156
    @johnandersontorresmosquera1156 6 months ago +6

    Excellent explanation, and great examples, thanks for sharing your knowledge !

  • @alexeypankov8180
    @alexeypankov8180 6 months ago +8

    this reminds me of harmonics in sound, where the function is one-dimensional (the strength of the sound depends on time), but we can say that a sound wave is also a complex function that consists of simpler functions, namely different frequencies or harmonics of the sound wave. I have this analogy in my head

    • @avb_fj
      @avb_fj  6 months ago +5

      I think that's a fair analogy. I saw some stuff on Hacker News (news.ycombinator.com/item?id=40219205) where someone tried to implement a KAN layer in PyTorch with Fourier coefficients (github.com/GistNoesis/FourierKAN/).
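
      Roughly, the idea there (a hypothetical sketch, not the FourierKAN repo's actual code) is to make each edge function a truncated Fourier series whose coefficients are the learnable parameters:

          import torch
          import torch.nn as nn

          class FourierKANLayer(nn.Module):
              """Each edge function is phi_{ij}(x_j) = sum_k a_{ijk} cos(k x_j) + b_{ijk} sin(k x_j);
              as in a KAN layer, output i is the sum of phi_{ij}(x_j) over the inputs j."""
              def __init__(self, in_dim, out_dim, num_freqs=5):
                  super().__init__()
                  self.coeffs = nn.Parameter(torch.randn(2, out_dim, in_dim, num_freqs) * 0.1)
                  self.register_buffer("freqs", torch.arange(1, num_freqs + 1).float())

              def forward(self, x):                          # x: (batch, in_dim)
                  angles = x.unsqueeze(-1) * self.freqs      # (batch, in_dim, num_freqs)
                  # sum over inputs j and frequencies k for each output i
                  return (torch.einsum("bjk,ijk->bi", torch.cos(angles), self.coeffs[0])
                          + torch.einsum("bjk,ijk->bi", torch.sin(angles), self.coeffs[1]))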

  • @mrpocock
    @mrpocock 6 months ago +4

    I get an itch in the back of my brain that KANs should be able to use some support-vector tricks. In particular, there should be a sub-set of training examples that support the learned splines, with the others being hit "well enough" by interpolation. It's kind of like learning the support vectors + kernel at the same time. It perhaps should be possible to train an independent KAN per minibatch with a really restricted number of free params, and use this to a) drop out the non-supporting training examples, and b) concat/combine the learned parameters recursively.

  • @AliKamel2004
    @AliKamel2004 6 months ago +5

    Finally !!
    A clear explanation.
    Thanks bro 🇮🇶

  • @saichaithanya4
    @saichaithanya4 5 months ago

    Awesome explanation. The approach taken to understand a paper is really good. Solid job, mate.

  • @braineaterzombie3981
    @braineaterzombie3981 6 months ago +3

    Wow. I am sold bro. This explanation was really good.

  • @pladselsker8340
    @pladselsker8340 6 months ago

    This is the best explanation of the theorem I've found so far. I think I understood most of it when going through the paper, but this has really solidified and clarified what the proof is about.

  • @RiteshBhalerao-wn9eo
    @RiteshBhalerao-wn9eo 6 months ago +2

    Love the simplicity of the explanation!

  • @VURITISAIPRANAYCSE2021VelTechC
    @VURITISAIPRANAYCSE2021VelTechC 2 months ago

    Very well explained. Thanks a lot. Keep doing this stuff.

    • @avb_fj
      @avb_fj  2 months ago

      Thanks!!

  • @jeankunz5986
    @jeankunz5986 6 months ago +1

    Great and simple explanation. Worthy of A. Karpathy
    😀

  • @PeterWauyo
    @PeterWauyo 4 months ago +1

    This is an excellent explanation of KANs

  • @federicocolombo8761
    @federicocolombo8761 5 months ago

    Such an amazing work. Thank you for the video!

  • @fatau_sertaneja
    @fatau_sertaneja 6 months ago +1

    I cannot believe I actually understood this! Thank you very much ❤️👏👏👏👏🇧🇷🇧🇷🇧🇷🇧🇷

  • @ajk251
    @ajk251 6 months ago +2

    Amazing video! Great explanation & visuals. I tried to read the paper, but couldn't fully grasp it. Your video really helped my understanding.

  • @EmirSyailendra
    @EmirSyailendra 5 months ago +1

    Thank you for such a great explanation!

  • @maxheadrom3088
    @maxheadrom3088 2 months ago

    Kolmogorov - the most important unknown mathematician ever!

  • @Foba_Bett
    @Foba_Bett 3 months ago

    Amazing work! Thank you!

  • @MengqiShi-el6cv
    @MengqiShi-el6cv 6 months ago +1

    Thank you! Really good explanation, and it helps me a lot!

  • @StratosFair
    @StratosFair 5 months ago

    Very nice explanation, thank you!

  • @capablancastyle
    @capablancastyle 6 months ago +2

    Thank you very much for the explanation!!!

  • @AdmMusicc
    @AdmMusicc 6 months ago +2

    I loved your mathematical explanations! Thanks for this. Will sub to your patreon :)

    • @avb_fj
      @avb_fj  6 months ago

      Awesome, thank you! Glad you enjoyed it.

    • @AdmMusicc
      @AdmMusicc 6 months ago

      Do you plan on making long mathematical breakdowns and derivations of ML papers at some point in the future? An example of what I mean is the mathematical explanation of the diffusion model by the "Outlier" YouTube channel.
      The suggestion is basically to have 2 versions of some major ML topic: an overview like this video, and another that goes into a deeper dive of the derivations and simplifies them.

    • @avb_fj
      @avb_fj  6 months ago +1

      @@AdmMusicc Thanks for the suggestion, sounds like a good idea. I might consider doing more in-depth math videos in the future. Most of my videos right now focus on the more practical and intuitive aspects of ML algorithms with some visual cues and illustrations.

    • @AdmMusicc
      @AdmMusicc 6 months ago

      @@avb_fj Thank you!

  • @theabc50111
    @theabc50111 6 months ago

    Best explanation video I've watched!

  • @sethjchandler
    @sethjchandler 6 months ago

    Best explanation I’ve seen. Thanks.

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w 6 months ago +2

    This is really well presented

  • @MrKrtek00
    @MrKrtek00 6 months ago

    good explanation and useful details! thanks

  • @SanthoshKammari-ug2gj
    @SanthoshKammari-ug2gj 6 months ago

    Really clear explanation!!

  • @julienroy6561
    @julienroy6561 6 months ago

    Wonderful overview, thanks!

  • @jubaerjami
    @jubaerjami 4 months ago

    Great explanation!

  • @jcugnoni
    @jcugnoni 6 months ago +1

    Thank you for this great, extremely clear video. KAN networks seem to be a much more sensible approach than MLPs for physics, as the basis functions can be selected based on some prior knowledge of the field... But without GPU support it will be complicated to scale to large models.

  • @nikhiljoshi8171
    @nikhiljoshi8171 6 months ago +1

    It was a to-the-point explanation.
    Thanks

  • @NS-ls2yc
    @NS-ls2yc 6 months ago

    Thanks for the easy explanations

  • @epaillas
    @epaillas 6 months ago

    Great explanation, thanks for the video!

  • @Jai-tl3iq
    @Jai-tl3iq 6 months ago

    Good explanation, please continue to make more videos on neural nets.

    • @avb_fj
      @avb_fj  6 months ago

      Thanks!

  • @amirarsalanrajabi5171
    @amirarsalanrajabi5171 6 months ago

    Awesome video! Thanks a lot 🙏

  • @rishiroy2476
    @rishiroy2476 6 months ago

    AVB sir you are awesome.

  • @nagakushalageeru135
    @nagakushalageeru135 6 months ago +1

    Great video !

  • @elonmax404
    @elonmax404 6 months ago +2

    Great explanation

  • @squarehead6c1
    @squarehead6c1 6 months ago

    Great presentation! Impressive!

  • @ezl100
    @ezl100 6 months ago +1

    Great explanation, thanks. I am just a bit confused about what the learnable function at the edge level contains and how these local parameters are updated during the backpropagation phase. Thanks!

  • @MultiCraftTube
    @MultiCraftTube 6 months ago

    Excellent explanation!

  • @ramanShariati
    @ramanShariati 6 months ago +1

    great video! loved it.

  • @taylorkim1243
    @taylorkim1243 5 months ago

    This man is brilliant.

  • @sshaikh8104
    @sshaikh8104 24 days ago

    The theorem is well explained, thanks. I have one question: what's the difference between multiple regression and the KA theorem? It seems the same to me.

  • @christopherc168
    @christopherc168 5 months ago +1

    Can you do one on the Wav-KAN (wavelet KAN) paper?

  • @reji6414
    @reji6414 6 months ago

    Super explanation ❤❤

  • @ps3301
    @ps3301 6 months ago +2

    How do liquid neural networks compare to KAN?

  • @jonclement
    @jonclement 6 months ago

    Very clear. Subscribed. Question though: your b-spline visualization showed it curving under itself -- but wouldn't this make it not a valid function (i.e. more than 1 output per x coordinate)?

  • @ParthivShah
    @ParthivShah 6 months ago +1

    It's really fascinating.

  • @LearnAIWithRJ
    @LearnAIWithRJ 6 months ago

    Awesome video. If possible, make a video explaining B-splines in detail.😅

  • @paaabl0.
    @paaabl0. 6 months ago

    If you want to have an arbitrarily complex b-spline in each node, does it mean you have an unbounded (dynamic) number of parameters?

    • @avb_fj
      @avb_fj  6 months ago +1

      I think the degree of the b-splines is a hyperparameter that is predetermined at the start of training. So no, it's not dynamic during training.
      That said, one can increase or decrease the number of control points of the splines after training (look at Grid Extension in the video or the paper) to create a new model with more/less complexity.
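
      A tiny illustration of the grid-extension idea, using a piecewise-linear stand-in for a proper B-spline (just to show what "more control points, same shape" means):

          import numpy as np

          coarse_grid = np.linspace(-1, 1, 6)          # 6 control points
          coarse_vals = np.sin(3 * coarse_grid)        # pretend these were learned

          def phi(x, grid, vals):                      # evaluate the edge function
              return np.interp(x, grid, vals)

          # grid extension: resample the learned function onto a denser grid, giving a
          # new model with more control points that initially matches the coarse one
          fine_grid = np.linspace(-1, 1, 21)
          fine_vals = phi(fine_grid, coarse_grid, coarse_vals)

          print(phi(0.37, coarse_grid, coarse_vals), phi(0.37, fine_grid, fine_vals))  # ~equal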

  • @pzhao7615
    @pzhao7615 6 months ago

    this is more than good

  • @simonstaro2075
    @simonstaro2075 6 months ago

    Most important is that there is a theoretically proven formula for representing any multidimensional function. The Kolmogorov-Arnold theorem plus the control points of B-splines form a basis for any continuous function. Training, speed and propagation are technical problems which should be solvable.

  • @johnlennon2009nyc
    @johnlennon2009nyc 6 months ago

    Thank you for your very clear explanation
    I have a question
    Please tell me about the formula at the bottom around 3:40.
    In this formula, isn't "Price" the result of adding all the prices in each row above?
    Or is “price” in this expression a vector?

    • @avb_fj
      @avb_fj  6 months ago

      Thanks a lot!
      So in the dataset (like the Boston Housing Dataset) each row stands for one house/property with all of its different features and its price. And the task is to train an ML model that inputs the features (bedrooms, sq footage, etc.) and predicts the price. So basically the model can be used later to predict prices when the price is "unknown" and the features are known. So yeah, we won't be adding up prices from other rows because a) they are all for different houses, and b) they are the quantity we want to predict using those other attributes. Hope that answers the question!
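
      For illustration, a toy version of that setup (made-up numbers rather than the real Boston Housing data, and an ordinary regressor rather than a KAN - the point is just that each row is one house and the model maps features to price):

          import numpy as np
          from sklearn.linear_model import LinearRegression

          X = np.array([[3, 1500.0],    # bedrooms, square footage for house 1
                        [2,  900.0],    # house 2
                        [4, 2100.0]])   # house 3
          y = np.array([450_000.0, 300_000.0, 620_000.0])   # price of each house

          model = LinearRegression().fit(X, y)
          print(model.predict([[3, 1600.0]]))   # predicted price for an unseen house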

    • @johnlennon2009nyc
      @johnlennon2009nyc 6 months ago

      @@avb_fj Thank you for your reply, I am so glad to see it.
      Well, I think the data in the first and second rows of the Boston Housing Dataset are the input data for f1 and f2, respectively.
      Then, what exactly does "Price" on the left side of this equation contain?
      Or is the idea that only one row should be entered for one calculation?

  • @trishitasamanta8107
    @trishitasamanta8107 6 months ago

    Even though I am working on MLPs in my present research work, it may be useful for my next project. Nice explanation 👍😊

    • @avb_fj
      @avb_fj  6 months ago

      Good to know! All the best for your research work!

  • @galporgy
    @galporgy 6 months ago

    So KAR = a kind of Fourier transform for nonperiodic, multivariate functions?

  • @sirinath
    @sirinath several months ago

    How does this stack up with Liquid Neural Networks?

  • @Daydreamers0
    @Daydreamers0 6 months ago

    It's not built on the KA representation theorem but inspired by the KA representation theorem

  • @bat.chev.hug.0r
    @bat.chev.hug.0r 6 months ago +1

    Aren't the indices of the $\phi$ functions inverted at 7:10 when you describe the KAN layer? I think $x_1$ should be passed through $\phi_{11}, \phi_{21},...,\phi_{51}$; and the same goes for $x_2$? Great video anyway! Thanks

    • @avb_fj
      @avb_fj  6 months ago

      Great observation. You are correct. Thanks for pointing that out.

  • @umarfarooque3687
    @umarfarooque3687 6 months ago

    Good explanation. Can you show code with some Kaggle data?

  • @nicholaskomsa1777
    @nicholaskomsa1777 6 months ago

    How do KANs work on MNIST? Since an MLP can do 92% test accuracy with around 20k connections, it should be easy for you to zip MNIST through a KAN and get a result?

  • @automatescellulaires8543
    @automatescellulaires8543 6 months ago +1

    So basically, "promising results" on very simple function approximation. What about classifying the MNIST digits? It's not even considered a meaningful test nowadays, but at least it would show that the method does work (on 28*28 dimensions). It doesn't take much time to test it.

  • @carlbroker
    @carlbroker 6 months ago +1

    Is there a code implementation out there yet for us plebs to play with?

    • @avb_fj
      @avb_fj  6 months ago +2

      Check out: kindxiaoming.github.io/pykan/intro.html and
      github.com/KindXiaoming/pykan
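
      A rough hello-world outline in the spirit of the repo's example (import paths and method names may differ across pykan versions, so treat this as a sketch rather than the exact API):

          import torch
          from kan import KAN, create_dataset

          # 2-input, 1-output KAN with a 5-node hidden layer, cubic splines on a 5-point grid
          model = KAN(width=[2, 5, 1], grid=5, k=3)

          # toy target f(x1, x2) = exp(sin(pi*x1) + x2^2), as in the repo's example
          f = lambda x: torch.exp(torch.sin(torch.pi * x[:, [0]]) + x[:, [1]] ** 2)
          dataset = create_dataset(f, n_var=2)

          model.train(dataset, opt="LBFGS", steps=20)   # fit the splines
          model.plot()                                  # visualize the learned activations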

    • @carlbroker
      @carlbroker 6 months ago +2

      @@avb_fj Thank you so much! And thank you for your breakdown of the math. Math is a HUGE anxiety trigger for me, and your fantastic presentation skills did wonders for that.

    • @avb_fj
      @avb_fj  6 months ago +4

      Thanks a lot for the kind words! Fwiw, I wrestle with math and notations all the time too!

    • @carlbroker
      @carlbroker 6 months ago

      @@avb_fj You truly have a gift, thank you for sharing it with us.

  • @Adventure1844
    @Adventure1844 6 months ago

    "If you want to find the secrets of the universe, think in terms of energy, frequency and vibration." Tesla

  • @JoelGreenyer
    @JoelGreenyer 6 months ago

    Mmh, why is it good that all functions are univariate? Why not make bivariate functions based on Bézier surfaces, or few-variate hyper-surfaces? They could pack more power for combining inputs to a higher degree, while retaining some of the advantages of weight changes having local impact...

    • @avb_fj
      @avb_fj  6 months ago +1

      I'm sure as time goes on, someone will try to squeeze more accuracy and performance out of KANs with multivariate functions. I guess philosophically it makes sense to keep them univariate because, according to the Kolmogorov Arnold theorem, any multivariate function is just a sum of a bunch of univariate functions.

  • @kephas-media
    @kephas-media 6 months ago

    Why does this sound like vector embedding (please note I've only seen small clips about vectors, so I don't know what I'm saying, just an observation)

    • @avb_fj
      @avb_fj  6 months ago +1

      That's true. Any vector representation of a data point is its embedding. The output of a KAN layer is indeed an embedding of the input. It just computes this embedding in a different way than an MLP does.

  • @deliyomgam7382
    @deliyomgam7382 6 months ago

    👍

  • @r_pydatascience
    @r_pydatascience 6 months ago

    So it is just a summation of univariate regression equations and then passing them through an activation function.

    • @avb_fj
      @avb_fj  6 months ago +1

      The univariate functions themselves are the trainable activation functions.

  • @danielcezario2419
    @danielcezario2419 5 months ago

    👏👏👏👏👏👏

  • @clivea99
    @clivea99 6 months ago

    Reminds me of wavelets. But if it doesn't work for high-dimensional datasets, it's not going to be of any practical use.

  • @skn123
    @skn123 6 months ago

    The paper also talks about MNIST. How would a CNN be represented using KAN?

    • @avb_fj
      @avb_fj  6 months ago +1

      For starters, they will probably try to flatten the 28x28 images into a 784-length vector and run the current version of KAN on it, similar to how standard MLPs train on images. To do a CNN-like implementation, they will have to do more stuff like summing up representations over a rolling window/kernel, which will probably come later.
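
      Shape-wise, the "flatten and feed" idea looks like this (a hypothetical sketch; KAN here stands for whatever KAN implementation you use, and the layer sizes are made up):

          import torch

          batch = torch.rand(32, 1, 28, 28)      # a batch of MNIST-like images
          x = batch.view(32, -1)                  # -> (32, 784), exactly as for an MLP
          # model = KAN(width=[784, 64, 10])      # hypothetical KAN with 10 output classes
          # logits = model(x)                     # -> (32, 10) class scores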

    • @braineaterzombie3981
      @braineaterzombie3981 6 months ago

      What was the accuracy of the KAN model on MNIST though?

    • @avb_fj
      @avb_fj  6 months ago

      @@braineaterzombie3981 They didn't train one for the paper… I'm sure someone online must've already tried it after the paper and the repo were published.

  • @tisfu17
    @tisfu17 6 months ago +1

    "Can KAN or can't KAN" -> like and sub 😂

  • @nikbl4k
    @nikbl4k 6 months ago

    It's called "Pascal's triangle"

  • @ratulshahriar-on8pc
    @ratulshahriar-on8pc several months ago

    This took me more than an hour to understand, along with the splines... am I dumb?

    • @avb_fj
      @avb_fj  several months ago

      If you understood splines and KANs in an hour, you are probably very smart.

  • @dexterdev
    @dexterdev 5 months ago

    3:21

  • @llllllllllllllllllllIll
    @llllllllllllllllllllIll 6 months ago

    KANs are cool, but could you please give an example of where this can be used directly in a real-world scenario, based on your experience?

  • @NarkeEmpire
    @NarkeEmpire 6 months ago

    Fffuuuuuuuu so I wasn't the dumb one after all!!!

  • @georgekarniadakis5089
    @georgekarniadakis5089 6 months ago

    KAN NOT beat MLPs... they do not beat SOTA MLPs and they are too slow! MLPs can use adaptive activation functions; see the work of Jagtap et al.

  • @googleyoutubechannel8554
    @googleyoutubechannel8554 6 months ago +1

    I still don't understand why this paper is important. Yes, of course you can replace anything in an existing RNN-style structure with some other type of function, yay. There are innumerable arrangements and styles of things you could do to modify RNNs. Whee. I haven't seen a single example that shows why we should care about this particular arrangement; if KANs were good, they'd have more evidence than the very synthetic and useless examples in the paper.

    • @micknamens8659
      @micknamens8659 6 months ago

      So you've read and fully understood the paper?

  • @petercrossley1069
    @petercrossley1069 6 months ago

    Stop saying “less parameters”. It is “fewer parameters”.

  • @Sunny-dl9yk
    @Sunny-dl9yk 6 months ago

    excellent explanation! thank you!