Excellent presentation, very clear and very interesting
Now I can understand KAN more clearly. Thank you!
Amazing! Can't wait to see all the applications!
Google started working on it this fast. That's crazy
KART seems to work for functions taking inputs in the [0,1] range; how do you deal with that?
how is the activation selection done? Don't you need a lookup/domain of functions to choose from?
splines, got it
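For anyone wondering what "splines" means concretely here: each learnable activation is just a weighted sum of fixed B-spline basis functions, so there is no lookup over a menu of named functions; only the coefficients are trained. A minimal numpy sketch (grid size, spline order, and initialization are illustrative, not the reference pykan code):
```python
import numpy as np

def bspline_basis(x, grid, k=3):
    """Order-k B-spline basis values at scalar x via the Cox-de Boor recursion.
    Returns an array of shape (len(grid) - k - 1,)."""
    # degree-0 basis: indicator of each knot interval
    B = np.array([float(grid[i] <= x < grid[i + 1]) for i in range(len(grid) - 1)])
    for d in range(1, k + 1):
        B = np.array([
            (x - grid[i]) / (grid[i + d] - grid[i]) * B[i]
            + (grid[i + d + 1] - x) / (grid[i + d + 1] - grid[i + 1]) * B[i + 1]
            for i in range(len(grid) - d - 1)
        ])
    return B

# one learnable edge activation: phi(x) = sum_j c_j * B_j(x)
# (the paper also adds a fixed base function such as silu(x); omitted here)
grid = np.linspace(-1.0, 1.0, 12)        # knot grid; size is illustrative
coeffs = np.random.randn(len(grid) - 4)  # trainable coefficients, one per cubic basis function
phi_of_x = coeffs @ bspline_basis(0.3, grid)  # plain multiply-adds, no function lookup
```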
Has it been integrated into common AI frameworks like PyTorch or TensorFlow?
Anything come out of KAN?
Do KANs require fewer GPUs to achieve the same results for certain problems?
KANs Pros and Cons
Pros
- Accuracy
- Interpretability
- Faster neural scaling laws (achieve comparable or better outcomes with fewer parameters)
Cons
- Speed and efficiency (10x slower than MLPs given the same number of parameters)
- Scaling
Since the "activation function" of each edges are different, the current implementation of KAN doesn't work well with GPU but it should be possible to be accelerated by specially designed chips
Thanks for clarifying!
@leosmi1 Seems like they can just buy 10x the amount of GPUs they buy now. There is serious investment now, money isn't the issue, although the economy is something to keep in mind
@Ori-lp2fm Hope it gets better
4:26 🤣😭
Very interesting, thanks
Thanks.
Welcome
Such interesting stuff and not so much time to do anything with it. It should have been my bread and butter, haha
This architecture is not compatible with current hardware due to the need to compute many additional and diverse nonlinear functions.
Not really, the B-splines are just simple multiplications/additions. In the end it's exactly the same type of operations.
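A sketch of that point: if the basis values for each input are precomputed, applying all the per-edge splines and summing over inputs collapses into one tensor contraction, i.e. the same multiply-add primitives a dense matmul uses. Shapes are illustrative, and the basis values here are random stand-ins.
```python
import numpy as np

batch, n_in, n_out, n_basis = 32, 8, 4, 12   # illustrative sizes

# learnable spline coefficients, one vector per (input, output) edge
coeffs = np.random.randn(n_in, n_out, n_basis)

# B-spline basis values for every input of every sample; random stand-ins here,
# in a real layer they come from the Cox-de Boor recursion on the knot grid
basis = np.random.randn(batch, n_in, n_basis)

# applying every per-edge spline and summing over inputs is one contraction:
# y[b, j] = sum_i sum_k basis[b, i, k] * coeffs[i, j, k]
y = np.einsum('bik,ijk->bj', basis, coeffs)
print(y.shape)  # (32, 4)
```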
@xba2007 I wonder if you can have methods to split different parts of the "brain" as you scale it across a lot of different hardware. How would you use, say, 5 computers and merge the data? Would the model need to be split up, and how would it be put back together?
Why is the blue guy blurred? Is he wanted by the FBI?
Imagine an LLM agent interacting with a KAN to do the above. We could let it run autonomously
Okay. Rewriting.
My intuition on this now is: this is MLPs, but with nonlinear terms attached to the weights and no nonlinear activation layer.
In my reflections on this, it sounds like the nonlinear terms are selected by the trainer.
Hm.
I don’t know what this will bring. I feel that introducing the nonlinear terms is almost like biasing the model before training.
Whereas linear terms are much less biased.
But I’m not sure.
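To make that intuition concrete, here is a rough side-by-side of the two layer types, with a toy per-edge cubic standing in for the learnable spline (sizes and names are illustrative):
```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 3, 2
x = rng.standard_normal(n_in)

# MLP layer: linear weights, then one fixed nonlinearity applied to the sum
W, b = rng.standard_normal((n_out, n_in)), rng.standard_normal(n_out)
mlp_out = np.tanh(W @ x + b)

# KAN-style layer: a learnable 1-D function on every edge, outputs just summed,
# no separate activation layer (toy cubic per edge; the paper uses B-splines)
edge_coeffs = rng.standard_normal((n_in, n_out, 4))
powers = np.stack([x**0, x**1, x**2, x**3])             # shape (4, n_in)
kan_out = np.einsum('ijk,ki->j', edge_coeffs, powers)   # y_j = sum_i phi_ij(x_i)
```
The structural difference is where the nonlinearity lives: after the weighted sum in the MLP, but on each edge before the sum in the KAN.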
Isn’t the sigmoid function nonlinear?
MLP in disguise.