KAN: Kolmogorov-Arnold Networks

  • Published 22 May 2024
  • Paper: arxiv.org/abs/2404.19756
    Spline Video: • B-Splines
    My notes: drive.google.com/file/d/1twcI...
    00:00 Intro
    00:45 MLPs and Intuition
    05:12 Splines
    19:02 KAN Formulation
    28:00 Potential Downsides to KANs
    32:09 Results

Comments • 100

  • @ahmedheakl5181
    @ahmedheakl5181 18 days ago +8

    Thank you so much for such a great explanation!

  • @MilesBellas
    @MilesBellas 16 days ago +22

    via Pi
    "The Kolmogorov-Arnold Network (KAN) is a fascinating type of neural network that takes a unique approach to learning compared to traditional multilayer perceptrons (MLPs). Here are some key points about KANs:
    * KANs have learnable activation functions on edges (or "weights"), while MLPs have fixed activation functions on nodes ("neurons").
    * This design choice is based on the Kolmogorov-Arnold representation theorem, which states that any continuous multivariate function can be represented as a superposition of continuous functions of a single variable.
    * KANs can achieve better accuracy with fewer parameters compared to MLPs.
    * They also offer advantages in terms of model accuracy and interpretability.
    * The concept of KANs was introduced in the paper "KAN: Kolmogorov-Arnold Networks" and has since been implemented in various libraries and frameworks, including the PyKAN Python package.
    Overall, KANs provide an intriguing alternative to MLPs, with their theoretical foundations in the Kolmogorov-Arnold representation theorem and practical advantages in terms of model performance and interpretability."
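
    For reference, the superposition mentioned above has a standard two-layer form: for any continuous f on [0,1]^n,

    ```latex
    f(x_1, \ldots, x_n) = \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)
    ```

    with continuous univariate functions \Phi_q and \phi_{q,p}. The KAN paper stacks this edge-function pattern into deeper and wider layers, with each univariate function learned as a spline.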

  • @juanfritas_8398
    @juanfritas_8398 14 days ago +1

    Thanks for the explanation, Gabriel!

  • @user-qy9sx7bn1l
    @user-qy9sx7bn1l 15 days ago +3

    Thank you for this clear and comprehensive explanation. Also, your insights about the downsides of the model were quite accurate and helpful.

  • @eddiehead9794
    @eddiehead9794 4 days ago

    Wow! The extractability of solutions really catches my eye.

  • @acasualviewer5861
    @acasualviewer5861 18 days ago +42

    What surprises me is that nobody tried to use splines before (they probably did)... seems like an obvious way to approximate functions.

    • @slavrine
      @slavrine 16 days ago +9

      They have, but they would solve for one symbolic set of splines instead of continuously learning them. The closest thing to this is physics-informed neural networks.

    • @tangomuzi
      @tangomuzi 15 days ago

      @@slavrine Hmmm, no, the Kolmogorov formula has also been used to design neural networks. Just search for Kolmogorov Spline Networks. People generally like to exaggerate things and reinvent the wheel. Nothing very special about KAN.

    • @user-uc2qy1ff2z
      @user-uc2qy1ff2z 13 days ago

      Well, MoE in general aren't so far from splines.

    • @acasualviewer5861
      @acasualviewer5861 13 days ago

      @@user-uc2qy1ff2z you mean Mixture of Experts?

    • @andrewpolar1685
      @andrewpolar1685 13 days ago +3

      There were three successful, published applications of splines before this article. You can find references to those articles on my video; I can't post links, the system filters them. Splines are a representation of the functions; training is a completely different subject. We train models in a completely different way, and the MIT work is different from ours, but splines were used for KANs before.

  • @daehankim2437
    @daehankim2437 9 days ago +3

    Good explanation!
    13:13 btw that order-n Bézier curve recursion is just a linear interpolation of two order-(n-1) Bézier curves.
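
    A minimal sketch of that linear-interpolation recursion in plain NumPy (an illustration, not code from the video):

    ```python
    import numpy as np

    def bezier(points, t):
        """Evaluate a Bezier curve at t via the de Casteljau recursion: an order-n curve
        is the linear interpolation of the order-(n-1) curves on points[:-1] and points[1:]."""
        pts = np.asarray(points, dtype=float)
        if len(pts) == 1:                      # order-0 curve: a single control point
            return pts[0]
        return (1.0 - t) * bezier(pts[:-1], t) + t * bezier(pts[1:], t)

    # Cubic (order-3) example in 2D, evaluated at the midpoint of the curve.
    ctrl = [(0, 0), (0, 1), (1, 1), (1, 0)]
    print(bezier(ctrl, 0.5))                   # -> [0.5  0.75]
    ```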

  • @Armkeyter
    @Armkeyter 9 days ago

    Thank you very much. You explained it very clearly.

  • @AI-kt6iw
    @AI-kt6iw 15 days ago +1

    Great explanation

  • @marsen1106
    @marsen1106 16 days ago +2

    nice explanation

  • @kemoxplus
    @kemoxplus 7 days ago

    thank you for the explanation

  • @RelatedGiraffe
    @RelatedGiraffe 13 days ago +5

    Great explanation! However, instead of "activation matrix," you should probably say "activation function matrix." An activation is the output of an activation function, which is not what this matrix contains.

  • @objio
    @objio 3 hours ago

    Summary: while current neural networks work as universal approximators (probability machines), Kolmogorov networks allow you to leverage a more deterministic function.
    End point: the axiom of the universal approximator as a pillar of current networks could be replaced by networks of variable geometry. Think about the difference between searching or decoding information in 1 dimension versus adding 2 dimensions: a data point vs. a line of data.
    Like going from 480p to 8K resolution.

  • @idiosinkrazijske.rutine
    @idiosinkrazijske.rutine 9 days ago

    Whoever made the acronym knew it would remind us of KAM: Kolmogorov-Arnold-Moser.

  • @SOFTWAREMASTER
    @SOFTWAREMASTER 15 days ago

    Great video! Any chance you can cover xLSTMs? It just released an hour ago lol

    • @gabrielmongaras
      @gabrielmongaras  11 days ago +1

      I can probably cover that one!

  • @ourtubey121
    @ourtubey121 14 days ago

    Thanks for the great video. I have a question: I can't understand how the algorithm updates the c (the learnable terms). Backpropagation, or something else?

    • @andrewpolar1685
      @andrewpolar1685 13 days ago

      It is a tree of functions. All functions are represented via splines. You can initialize the model and compute the predicted Z. It is compared to the actual value, and you have a DELTA Z to propagate. Since all functions are splines, they have derivatives, and you can implement one of many descent methods, such as Newton, gradient, or a mixture; they reference theirs in the article. Imagine z = f[g(x) + q(y)] where f, g, q are splines, x, y are the arguments, and you need to update the splines to modify z.
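
      A minimal sketch of that update idea using plain gradient descent with PyTorch autograd (an illustration only; the bump basis, grid, and optimizer here are assumptions, not the paper's or Andrew's exact method):

      ```python
      import torch

      torch.manual_seed(0)

      def basis(u, centers, width=0.5):
          # Fixed smooth bumps on a grid; a simple stand-in for a B-spline basis.
          return torch.exp(-((u.unsqueeze(-1) - centers) / width) ** 2)

      centers = torch.linspace(-2.0, 2.0, 8)
      # Learnable coefficients c for the three univariate functions g, q, f.
      c_g = (0.1 * torch.randn(8)).requires_grad_()
      c_q = (0.1 * torch.randn(8)).requires_grad_()
      c_f = (0.1 * torch.randn(8)).requires_grad_()

      def uni(u, c):
          return basis(u, centers) @ c            # univariate function: sum_k c_k * B_k(u)

      x = torch.rand(256) * 2 - 1
      y = torch.rand(256) * 2 - 1
      target = torch.sin(3 * x) + y ** 2          # toy target for z

      opt = torch.optim.Adam([c_g, c_q, c_f], lr=0.05)
      for step in range(2000):
          z = uni(uni(x, c_g) + uni(y, c_q), c_f)   # z = f[g(x) + q(y)]
          loss = ((z - target) ** 2).mean()         # the "delta z" that gets propagated
          opt.zero_grad()
          loss.backward()                           # autograd in place of a hand-derived descent
          opt.step()
      print(loss.item())
      ```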

  • @__-de6he
    @__-de6he 15 days ago +5

    "Forest has been lost behind trees"

    • @gabrielmongaras
      @gabrielmongaras  11 days ago

      I haven't ever heard anything more true

    • @whannabi
      @whannabi 6 days ago

      Your forest comment seems quite random, I'll talk about it to my nearest neighbors

  • @akrammohamed8374
    @akrammohamed8374 17 days ago +1

    Thank you for this very helpful video.
    One thing that I’m curious about is whether the assumption of a spline relationship is valid for all use cases.

    • @andrewpolar1685
      @andrewpolar1685 13 days ago

      No. It is only the best way to approximate the functions listed in the articles. Observations of real systems have errors, and they are not small; 10% is the usual case. There are also uncertainties, such as features that are not accounted for or not observed. All of that makes accuracies of 0.1% absurd. I use piecewise-linear models and that is more than enough. I have even used piecewise-constant models and they worked well on approximate data. There is no such thing as an exact solution when you train a model to detect credit card fraud or predict stock prices.
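
      For the piecewise-linear case, one learnable edge function is just interpolation over a fixed grid of knot values. A minimal sketch fitted by least squares (an illustration only, not Andrew Polar's code):

      ```python
      import numpy as np

      knots = np.linspace(-1.0, 1.0, 9)                 # fixed grid
      h = knots[1] - knots[0]
      x = np.random.uniform(-1, 1, 500)
      y = np.sin(3 * x) + 0.1 * np.random.randn(500)    # noisy observations (~10% noise)

      # Hat (piecewise-linear) basis functions on the grid; least-squares fit of knot values.
      A = np.maximum(0.0, 1.0 - np.abs((x[:, None] - knots) / h))
      values, *_ = np.linalg.lstsq(A, y, rcond=None)

      def edge(u):
          # Evaluate the fitted piecewise-linear edge function at u.
          return np.interp(u, knots, values)

      print(np.abs(edge(x) - np.sin(3 * x)).mean())     # mean error of the piecewise-linear fit
      ```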

    • @gabrielmongaras
      @gabrielmongaras  11 days ago +1

      It will be interesting to see if splines generalize to all settings the way MLPs do. They seem to be particularly good at smaller problems, but I'm guessing they are going to run into problems with large models, as splines have more local behavior.

    • @akrammohamed8374
      @akrammohamed8374 11 days ago

      There’s something to be said about general models always seeming to be (shitty) POCs that people end up fine-tuning so that they perform better on specialized tasks.

  •  11 days ago +1

    Is the idea of combining splines for the activation function in any way similar to RBF networks?

    • @gabrielmongaras
      @gabrielmongaras  9 days ago +1

      I've actually seen someone use RBF to approximate B-splines for KANs!
      github.com/ZiyaoLi/fast-kan
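
      Roughly, the trick there is to replace the B-spline basis with Gaussian RBFs on the same kind of grid, which are cheaper to evaluate. A minimal sketch of one such layer (a simplification under my own assumptions, not the repo's exact code):

      ```python
      import torch
      import torch.nn as nn

      class RBFKANLayer(nn.Module):
          """KAN-style layer: each edge is a learnable mix of Gaussian RBF basis functions."""
          def __init__(self, in_dim, out_dim, num_basis=8, grid=(-2.0, 2.0)):
              super().__init__()
              self.register_buffer("centers", torch.linspace(grid[0], grid[1], num_basis))
              self.width = (grid[1] - grid[0]) / (num_basis - 1)
              # One coefficient vector per (input, output) edge, flattened together.
              self.coeffs = nn.Parameter(0.1 * torch.randn(in_dim * num_basis, out_dim))

          def forward(self, x):                           # x: (batch, in_dim)
              feats = torch.exp(-((x.unsqueeze(-1) - self.centers) / self.width) ** 2)
              return feats.flatten(1) @ self.coeffs       # sum the edge functions per output node

      layer = RBFKANLayer(3, 2)
      print(layer(torch.randn(4, 3)).shape)               # -> torch.Size([4, 2])
      ```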

  • @AlanBasishvili
    @AlanBasishvili 16 days ago +4

    MLPs can't learn this, but their method KAN =)

  • @ps3301
    @ps3301 10 days ago

    How about liquid neural networks?

  • @leastofyourconcerns4615
    @leastofyourconcerns4615 18 days ago +13

    Gabriel, if only you could start writing in a readable way, some random postgrad at MIT would have solved AGI by now. Love the vid, big thanks

    • @jaimeberkovich
      @jaimeberkovich 14 days ago +1

      working on it

    • @andrewpolar1685
      @andrewpolar1685 13 days ago

      It is not as complex as it looks. I have working piecewise-linear code modeling a KAN in about 500 lines, without using any library. My coauthor published a spline version (different from MIT's), also without any library and of about the same length, in MATLAB. The MIT code is published. I found the module, navigated the code, everything is open. Where is the problem?

    • @leastofyourconcerns4615
      @leastofyourconcerns4615 13 days ago

      u so smart, we so dumb

    • @andrewpolar1685
      @andrewpolar1685 13 days ago

      @@leastofyourconcerns4615 Install their library, find the module's location with a command, navigate to the code, and you will find it all there. It is 500 lines of Python code using splines and one version of Newton descent; all of it is taught in colleges, and students practice it in labs.

    • @gabrielmongaras
      @gabrielmongaras  11 days ago +3

      😂

  • @thipoktham5164
    @thipoktham5164 16 days ago +1

    It looks like a multi-layer version of GAMs (Generalized Additive Models) to me, which have problems when extrapolating.

    • @andrewpolar1685
      @andrewpolar1685 13 days ago

      All models return garbage when used outside the training range. You can only introduce an uncertainty level and make the model return a confidence interval.

    • @thipoktham5164
      @thipoktham5164 13 days ago

      @@andrewpolar1685 Fair point

  • @TheSupermanMc
    @TheSupermanMc 8 days ago

    Thanks for the video

  • @user-or1rc9ng1j
    @user-or1rc9ng1j 13 days ago

    What is the video about splines?

    • @gabrielmongaras
      @gabrielmongaras  11 days ago

      m.th-cam.com/video/qhQrRCJ-mVg/w-d-xo.html

  • @algoritm3034
    @algoritm3034 14 days ago

    Hello everyone, can you please tell me if I understood correctly: splines are built based on the input data, then the neuron calculates the activation function using these splines (which then go as weights into the matrix), and the result of the calculation is passed on to the other layers?

    • @andrewpolar1685
      @andrewpolar1685 13 days ago

      No, it is a tree of functions. The lower level takes the features as arguments and passes function values to the upper level as arguments. All functions are represented by splines, and training tunes them to the data with a Newton-related method.

    • @andrewpolar1685
      @andrewpolar1685 13 days ago

      I also made a KAN, forgot to mention; you can find my video and code. There are also other KANs, I added a few references. It is not as new as it might look.

    • @algoritm3034
      @algoritm3034 13 days ago

      @@andrewpolar1685 Okay, but I would like to understand where the splines come from. This is the only thing I do not know; I would be glad if you explained it to me in more detail.

    • @algoritm3034
      @algoritm3034 13 days ago

      @@andrewpolar1685 By the way, where can I find your video and code specifically?

    • @andrewpolar1685
      @andrewpolar1685 13 days ago

      @@algoritm3034 That is another video with over 4000 views. Find it yourself; I can't publish a link, the system treats it as an ad and removes the comment. In the description you can find links to the articles and code locations.
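
      On where the splines come from: the knot grid and the spline degree are fixed choices; they determine the basis functions via the Cox-de Boor recursion, and training adjusts only the coefficients that mix them (plus, in the paper, a base activation and occasional grid updates). A minimal sketch (an illustration, not the PyKAN implementation):

      ```python
      import numpy as np

      def bspline_basis(u, knots, degree, i):
          """Cox-de Boor recursion: the i-th B-spline basis function of a given degree at u."""
          if degree == 0:
              return 1.0 if knots[i] <= u < knots[i + 1] else 0.0
          left = right = 0.0
          if knots[i + degree] > knots[i]:
              left = (u - knots[i]) / (knots[i + degree] - knots[i]) \
                     * bspline_basis(u, knots, degree - 1, i)
          if knots[i + degree + 1] > knots[i + 1]:
              right = (knots[i + degree + 1] - u) / (knots[i + degree + 1] - knots[i + 1]) \
                      * bspline_basis(u, knots, degree - 1, i + 1)
          return left + right

      knots = np.linspace(-1.0, 1.0, 11)          # the fixed grid on the input range
      degree = 3                                   # cubic splines
      num_basis = len(knots) - degree - 1          # 7 basis functions on this grid
      coeffs = np.random.randn(num_basis)          # the learnable part of one edge function

      def phi(u):
          # One KAN edge function: a learned mix of fixed B-spline basis functions.
          return sum(c * bspline_basis(u, knots, degree, i) for i, c in enumerate(coeffs))

      print(phi(0.3))
      ```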

  • @andrewpolar1685
    @andrewpolar1685 6 days ago

    I published a C# KAN, which shows about the same performance and accuracy; it does not use any 3rd-party library. Interested?

    • @gabrielmongaras
      @gabrielmongaras  5 days ago

      Same performance and accuracy as an MLP? Sounds cool! Are you open sourcing the code?

  • @pierce8308
    @pierce8308 17 days ago +7

    I am not going to lie, I am unsure about the authors' motivation for using a B-spline, and about the paper as a whole. The authors simply want to replace the linear combination of inputs with a nonlinear combination. So why not just replace a linear layer (with activations) of a PyTorch network with a "recursive linear layer", where every weight of the linear layer is simply another linear layer?
    Much simpler to implement, just a small 5-6 line for loop, and it also fits within the neural network literature perfectly. No need to prove anything with regard to the Kolmogorov-Arnold theorem, because that formulation is simply another neural network. What's more, this relieves the model of the performance issues that the authors mention.
    What am I missing?

    • @richardforster8195
      @richardforster8195 16 days ago +2

      That's actually kind of what Transformers are doing: the key and query vectors decide together how much of the value vector goes out. Runtime-learned weights, sort of.

    • @ultrasound1459
      @ultrasound1459 13 days ago +2

      You forgot to share the code 🧏‍♂️

  • @drdca8263
    @drdca8263 14 days ago

    Hm, suppose you wanted to learn some function in a way that, using some kind of constraint or prior, would be expected to generalize to regions far from a ball bounding the training data (so, not just "far from any point in the training data" as a consequence of the input data being high-dimensional, but far from a ball bounding the data).
    (Not like with language modeling or whatever, where the input values should have bounded norm after embedding.)
    How might one impose constraints that would still apply outside that region?
    Like, I'm imagining using some kind of algebraic property of the nonlinear activation function in order to express some bound on something or other, and putting that into a loss?
    For ReLU, one can say some nice things, since for positive inputs it is just the identity.
    An MLP using ReLU can in principle be expressed as a piecewise-affine function of the input, with finitely many regions (though the number of regions may be exponential in the number of "neurons", I think). The number of these regions is much too large for this to be practical, but, seeing as there are only finitely many of them, the union over those regions that are finite is finite. I wonder how large the number of infinite such regions is.
    Well, I imagine that this depends more on the dimensions of the first layer and the last layer?
    Each of these regions is convex and is defined by a system of finitely many inequalities.
    Any infinite region contains a ray.
    (These regions are subsets of the space of possible inputs to the first layer, to be clear.)
    If we associate to each infinite region the set of directions such that the region contains a ray with that direction, I think the different regions' sets of directions will only overlap at the boundaries.
    For any direction in the interior of the set of associated directions, I think that moving far enough in that direction will always land one in that region?
    If there are few enough of them, maybe the loss could have terms based on the affine behavior in these regions, to encourage them to have particular properties? Like maybe invertibility or something? Idk.

  • @DB-nl9xw
    @DB-nl9xw 12 days ago +1

    Why do you think it won't scale?

    • @gabrielmongaras
      @gabrielmongaras  11 days ago

      The authors mainly show that the spline assumption works at small scale, but at large scale splines may run into issues such as the curse of dimensionality. Also, KANs lose interpretability at scale.

  • @ruokcnn
    @ruokcnn 16 days ago

    quality content

  • @Basant5911
    @Basant5911 18 days ago

    I was thinking the same, that it's going to be as time-complex as it sounds 😅, but they wrote initially that it requires fewer parameters & faster inference.

    • @gabrielmongaras
      @gabrielmongaras  11 days ago

      It's probably possible to create a more optimized version of their code with some type of CUDA kernel, since they say they didn't optimize it very much. Recent implementations are looking pretty good.

  • @TheTruthOfAI
    @TheTruthOfAI 13 days ago +4

    Nice. A real AI channel, for engineers and not for blog-writer script kiddies.

  • @einsteinsapples2909
    @einsteinsapples2909 17 days ago

    You completely lost me on the splines, not necessarily your fault :)

    • @ProAndProudMLG
      @ProAndProudMLG 16 days ago

      I went to ChatGPT to have it explain it to me like I'm a 5-year-old, and it did haha

    • @joelwillis2043
      @joelwillis2043 12 days ago

      @@ProAndProudMLG just a core method that has existed for 70+ years

  • @superfliping
    @superfliping 16 days ago

    This is what's next, show your skills?
    1. CodeCraft Duel: Super Agent Showdown
    2. Pixel Pioneers: Super Agent AI Clash
    3. Digital Duel: LLM Super Agents Battle
    4. Byte Battle Royale: Dueling LLM Agents
    5. AI Code Clash: Super Agent Showdown
    6. CodeCraft Combat: Super Agent Edition
    7. Digital Duel: Super Agent AI Battle
    8. Pixel Pioneers: LLM Super Agent Showdown
    9. Byte Battle Royale: Super Agent AI Combat
    10. AI Code Clash: Dueling Super Agents Edition

  • @meguellatiyounes8659
    @meguellatiyounes8659 15 days ago

    It's an MLP in disguise

  • @Eizengoldt
    @Eizengoldt 8 days ago

    I'm sick of neural networks, we need to shut them down