How to make your CPU as fast as a GPU - Advances in Sparsity w/ Nir Shavit

  • Published 15 Jun 2024
  • #ai #sparsity #gpu
    Sparsity is awesome, but only recently has it become possible to properly handle sparse models at good performance. Neural Magic does exactly this, using a plain CPU. No specialized hardware needed, just clever algorithms for pruning and forward-propagation of neural networks. Nir Shavit and I talk about how this is possible, what it means in terms of applications, and why sparsity should play a much larger role in the Deep Learning community.
    Sponsor: AssemblyAI
    Link: www.assemblyai.com/?...
    Check out Neural Magic: neuralmagic.com/
    and DeepSparse: github.com/neuralmagic/deepsp...
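
    For context, here is a minimal sketch of what running one of these sparse models with DeepSparse looks like in Python. The API follows their docs at the time; the SparseZoo model stub below is illustrative, so browse sparsezoo.neuralmagic.com for real ones.

    ```python
    # pip install deepsparse  (runs on plain x86 CPUs)
    from deepsparse import Pipeline

    # Hypothetical SparseZoo stub for a pruned + quantized sentiment model;
    # substitute a real stub or a local ONNX file path.
    pipeline = Pipeline.create(
        task="text-classification",
        model_path="zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/pruned90_quant-none",
    )
    print(pipeline(sequences=["Sparsity is awesome"]))
    ```
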
    OUTLINE:
    0:00 Introduction
    1:08 Sponsor: AssemblyAI
    2:50 Start of Interview
    4:15 How Nir's company was founded
    5:10 What is Sparsity about?
    9:30 Link between the human brain and sparsity
    12:10 Where should the extra resource that the human brain doesn't have go?
    14:40 Analogy for Sparse Architecture
    16:48 Possible future for Sparse Architecture as the standard architecture for Neural Networks
    20:08 Pruning & Sparsification
    22:57 What keeps us from building sparse models?
    25:34 Why are GPUs so unsuited for sparse models?
    28:47 CPU and GPU in connection with memory
    30:14 What Neural Magic does
    32:54 How do you deal with overlaps in tensor columns?
    33:41 The best type of sparsity to execute on a CPU
    37:24 What kind of architecture would make the best use out of a combined system of CPUs and GPUs?
    41:04 Graph Neural Networks in connection to sparsity
    43:04 Intrinsic connection between the Sparsification of Neural Networks, Non Layer-Wise Computation, Blockchain Technology, Smart Contracts and Distributed Computing
    45:23 Neural Magic's target audience
    48:16 Is there a type of model where it works particularly well, and a type where it doesn't?
    Links:
    Homepage: ykilcher.com
    Merch: ykilcher.com/merch
    YouTube: / yannickilcher
    Twitter: / ykilcher
    Discord: ykilcher.com/discord
    LinkedIn: / ykilcher
    If you want to support me, the best thing to do is to share out the content :)
    If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
    SubscribeStar: www.subscribestar.com/yannick...
    Patreon: / yannickilcher
    Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
    Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
    Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
    Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
  • Science & Technology

Comments • 164

  • @YannicKilcher
    @YannicKilcher 1 year ago +24


    • @mephilees7866
      @mephilees7866 1 year ago +1

      These talks are especially important. Thank you Yannic for the effort to host them even when you don't have to.

  • @ramakrishna5480
    @ramakrishna5480 1 year ago +59

    We need more people like Nir, with completely different ideas

    • @TheDoomerBlox
      @TheDoomerBlox 1 year ago

      They exist, they're out there, and nobody gives a shit so who cares.

    • @singuto
      @singuto 1 year ago

      With cultural diversity quickly dying off, it's going to be harder and harder to get people like him.

    • @ThinkTank255
      @ThinkTank255 1 year ago

      We're... far... from having that. Way too many people get haircuts. Okay, that's all the jokes I have.

  • @buzztwister
    @buzztwister 1 year ago +10

    Kudos to Yannic for preparing excellent questions. This opened up the discussion and we could hear very interesting, in-depth answers.

  • @handris99
    @handris99 1 year ago +10

    This was amazing. Can't wait for when they do LLMs on CPUs

  • @ikiphoenix9505
    @ikiphoenix9505 1 year ago +7

    Thanks for this one! It deserves more attention!

  • @adityay525125
    @adityay525125 1 year ago +10

    Great interview. Learnt a lot, thanks a ton Yannic

  • @ear4400
    @ear4400 1 year ago +9

    Really interesting stuff and I look forward to seeing the impact this knowledge has in the field.

    • @neuralmagic
      @neuralmagic 1 year ago

      We are trying to productize this knowledge! Give it a try and let us know what you think. The code is on our home page: neuralmagic.com

  • @HappyMathDad
    @HappyMathDad 1 year ago

    Videos like this are why YouTube is so relevant today. You can't get this content in a blog, a paper, or a podcast. Video is the best medium, and YouTube the best platform for it. Kudos!!!!

  • @alexandermedina4950
    @alexandermedina4950 1 year ago +26

    Thank you Yannic and Mr. Nir, this was a really educational interview; you can see the teaching experience the guest has.

  • @jabowery
    @jabowery 1 year ago +30

    It's really maddening that the only reason parameter reduction of large language models is getting attention in the ML community is to mimic the time*space savings of the brain's "sparsity". Really, if academia had its head screwed on straight, it would introduce Solomonoff induction in CS101. The concept of the smallest program generating a dataset also being the most probable model of the world generating that dataset is intuitive enough for an introductory computation course, while imbuing the subject with great relevance.

    • @RaviAnnaswamy
      @RaviAnnaswamy 1 year ago +1

      Good catch!

    • @RaviAnnaswamy
      @RaviAnnaswamy 1 year ago +1

      Solomonoff ranks with the likes of Shannon and Turing, if not higher, and it is a shame most computer scientists have not heard of him.

    • @RaviAnnaswamy
      @RaviAnnaswamy 1 year ago

      Maybe because, besides being strong theoreticians, those two were also engine builders or engineers

    • @RaviAnnaswamy
      @RaviAnnaswamy 1 year ago

      More people could relate to their ideas by seeing the machines they built

    • @frenchmarty7446
      @frenchmarty7446 1 year ago +1

      Those are just two perspectives on the same thing.

  • @Djjustincase-lb9th
    @Djjustincase-lb9th 1 year ago +1

    Thank you for another great interview.

  • @Finkbom1
    @Finkbom1 1 year ago

    Excellent interview and speaker, many thanks Yannic and Nir

  • @KamilCzerski
    @KamilCzerski 1 year ago

    Simply amazing. I am an advocate for sparsity at a larger grain size for ANNs, with PaLM-like models as a starting point! The surprising title turns out not to be clickbait but an absolutely well-defended claim! And the software is already out there. Also, I can't think of better questions to ask. A brief and exhaustive interview.

  • @HappyMathDad
    @HappyMathDad 1 year ago

    Amazing interview!!!

  • @ensabinha
    @ensabinha 1 year ago +8

    Building an optimal sparse graph is an optimization problem in itself on top of everything else... which implies that computing gradients is not so easy... which implies the use of gradient-free optimizers... which implies a lot of parallelization during training anyway.
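
    The chain of implications is quicker than it needs to be (the pruning methods discussed in the video still use gradients on the dense weights), but to make the gradient-free branch concrete, here is a toy hill-climbing search over binary masks on made-up data, plain NumPy only; each candidate evaluation is independent, which is where the parallelism comes in.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy stand-in for a network layer and a loss: mask a weight matrix
    # and score it on random data. No gradients w.r.t. the discrete mask.
    W = rng.normal(size=(32, 32))
    X = rng.normal(size=(128, 32))
    y = rng.normal(size=(128, 32))

    def loss(mask):
        return float(((X @ (W * mask) - y) ** 2).mean())

    # Hill-climb at ~50% sparsity: flip a few connections, keep improvements.
    mask = (rng.random(W.shape) < 0.5).astype(float)
    best = loss(mask)
    for _ in range(200):
        cand = mask.copy()
        idx = rng.integers(0, W.size, size=8)
        cand.flat[idx] = 1.0 - cand.flat[idx]  # toggle 8 connections on/off
        c = loss(cand)
        if c < best:
            mask, best = cand, c
    print(f"masked-layer MSE after search: {best:.4f}")
    ```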

  • @rmajdodin
    @rmajdodin 1 year ago

    Very insightful and inspiring. I think I've got ideas for my next project.

  • @TeodorAngelov
    @TeodorAngelov 1 year ago

    I appreciate the insightful and straight-on answers :)

  • @MikeySalinas
    @MikeySalinas 1 year ago

    Great questions and awesome overall interview

  • @kepplerM
    @kepplerM 1 year ago +1

    So much knowledge! Thank you ;)

  • @Femador
    @Femador 1 year ago

    This!
    Thanks a lot for your work!

  • @Evan490BC
    @Evan490BC 1 year ago

    Absolutely beautiful!

  • @rothn2
    @rothn2 1 year ago +4

    Yannic: *Hints at extremely interesting stuff*
    Nir: "That's the magic in Neural Magic"
    Did they publish?

    • @michaelgoin4760
      @michaelgoin4760 1 year ago +7

      Our ML research, models, and optimization libraries are all open-source!
      Here is our model zoo where you can find the models we've already optimized: sparsezoo.neuralmagic.com/
      Here are some recent papers our teams have published that contain in-depth explanations for our state-of-the-art model optimization methods:
      M-FAC: Efficient Matrix-Free Approximations of Second-Order Information - arxiv.org/abs/2107.03356
      The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models - arxiv.org/abs/2203.07259
      Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning - arxiv.org/abs/2208.11580
      Also some blogs we've written about the research:
      neuralmagic.com/blog/obert/
      neuralmagic.com/blog/bert-large-prune-once-for-distilbert-inference-performance/

  • @coemgeincraobhach236
    @coemgeincraobhach236 10 months ago

    Really interesting discussion

  • @deeplearner2634
    @deeplearner2634 1 year ago

    Interesting forecast and imagination of the future.
    Really hope DL will come up with a new paradigm in the near future

  • @anteconfig5391
    @anteconfig5391 1 year ago +1

    I love this. I had the idea that sequential computation and parallel computation were two sides of the same coin, so you could utilize CPUs instead of GPUs and get the same performance.
    I'm glad someone proved me right.

    • @SimGunther
      @SimGunther 1 year ago

      For certain algorithms that are more branchy than data-flow, or so simple that you could easily compute the answer by hand, a CPU could be as fast as a GPU. You also need to think about the latency of transferring data back and forth between the GPU and the CPU, relative to the data crunching, before deciding which batch of data needs to go to the GPU and whether the transfer was necessary in the first place.

  • @simonkotchou9644
    @simonkotchou9644 months ago

    Well, this guy is for sure on the right track; the KAN paper just shocked us all

  • @carlotonydaristotile7420
    @carlotonydaristotile7420 1 year ago +1

    Super interesting talk.

  • @scottmiller2591
    @scottmiller2591 1 year ago +2

    I put an R library on CRAN 8 years ago that exploits sparsity; one of the demos shows that sparse representations interpolate and extrapolate (generalize ID and OOD) much better than full representations, something that current approaches are poor at.

  • @sean_vikoren
    @sean_vikoren 1 year ago +5

    I noticed for the first time that bad video makes it hard for me to understand what a person is saying.
    Right, so I forget that I am a bit deaf from loud rock and roll.
    But I thought I would mention it, since it probably affects a few percent of your viewers.

  • @black-snow
    @black-snow 1 year ago +6

    Interesting interview. I really hope Doc Ock is gonna bring us excellent sparse models.

    • @NeoShameMan
      @NeoShameMan 1 year ago

      Unless Spider-Man stops him first

    • @aaaa9r
      @aaaa9r 1 year ago

      Doc Ock, I knew he reminded me of someone 🤣 I'm really hyped by this project. Very interesting stuff.

  • @codonology
    @codonology 1 year ago

    Very cool and hopeful system for future-AI-for-everyone

  • @supremebeme
    @supremebeme 1 year ago

    Super trippy conversation

  • @zerothprinciples
    @zerothprinciples 1 year ago

    Many great ideas here

  • @cem_kaya
    @cem_kaya 1 year ago

    Computing NNs depth-wise (in tensor columns) seems very interesting
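
    For anyone wondering what depth-wise execution buys you, here is a toy 1-D convolution sketch of the cache-locality idea (my illustration, not Neural Magic's actual implementation): compute one output tile through both layers at a time, so the intermediate activation stays tiny instead of materializing a full layer.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=4096)
    k1, k2 = rng.normal(size=3), rng.normal(size=3)  # two conv layers

    def conv(v, k):
        return np.convolve(v, k, mode="valid")

    # Layer-by-layer: the full intermediate activation is materialized.
    full = conv(conv(x, k1), k2)

    # "Tensor column" style: each output tile needs only a small input
    # window (tile + halo for the receptive field), so the intermediate
    # stays cache-resident.
    tile, halo = 256, 4  # two width-3 convs -> receptive field 5, halo 4
    out = []
    for start in range(0, len(full), tile):
        stop = min(start + tile, len(full))
        window = x[start : stop + halo]
        out.append(conv(conv(window, k1), k2))
    print(np.allclose(full, np.concatenate(out)))  # True
    ```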

  • @markhampton3614
    @markhampton3614 1 year ago +1

    There is a lot of modularity in brains too; is Neural Magic combining sparsity with modularity? It might be interesting to support RISC-V because it can be customized more easily than ARM. The interview was very well done.

    • @tedarcher9120
      @tedarcher9120 1 year ago +1

      Maybe combine memory and compute on one multi-layer RISC-V chip

  • @mpjstuff
    @mpjstuff 1 year ago

    I was just talking about this idea of using a large database of pre-computed matrix algebra: a good terabyte should be able to cover perhaps 99% of the most common inputs and outputs in a lookup table rather than computing them, along with macros for model shapes (sort of like shape primitives). This would be created by having an NN "watch" an AI and find efficiencies, places where a macro might replace a complex computation. If you abstract a shape or texture from color and lighting, that should help; in fact, it makes sense to keep certain parts of what you do separate until the end, as they will have different optimizations. Coupled with AI that makes "good enough" inferences for 3D visualization, it should be able to take a few samples and "guess" how a large area will look given a few examples: render one in 1,000 light bounces or one in 100 pixels. It should also help with animating between keyframes. For AI-generated imagery, perhaps reduce the vectors in a 3D animation to 16x16, and after computing a few frames, estimate where the vectors would be; this lets you collapse the precision and computation to an 8-bit space. And for AI that uses random noise, why not use estimations, reduced computations, or even compression data to reduce the amount of noise you are creating a delta from? There should be a way to look at inaccuracy in estimates to find ways to reduce computation, as if you were working with noise.

  • @dr.mikeybee
    @dr.mikeybee 1 year ago +6

    I've been amazed by how well large models run on an M1 Mac's CPU cores because of integrated memory.

    • @maloxi1472
      @maloxi1472 1 year ago

      That's only a fraction of what neuromorphic chips will be able to do... and we haven't even begun to scratch the surface of unconventional computing architectures. I remember having to pick my jaw up from the floor when I read Merkle's paper on a 1 cm^3 reversible chip design capable of cranking out a zettaFLOPS of compute per watt (more powerful than the current Bitcoin network for less energy than your smartphone).
      Until I see a concrete implementation of such an energy-efficient design, new chips are gonna look pretty banal to me 😄

  • @MHTHINK
    @MHTHINK 1 year ago

    Fascinating discussion. I agree with almost everything I knew enough about to have an opinion on, but I think neuromorphic computing is more important than Nir seemed to indicate. Since we're looking at roughly a 10^6 efficiency advantage for the human brain at similar practical performance compared with current compute architectures, I figure even a design with 1000x of refinement still to come could be 1000x better than current compute. I wrote a paper on it recently, but I'm still exploring the AI literature as papers get published.

  • @willnutter1194
    @willnutter1194 1 year ago +1

    Maybe a naïve approach, but I was seeing parallels between traffic and NNs, with the ability to turn off roads dynamically to make efficient pathways. Sparsity surely plays some role here.

  • @generationgap416
    @generationgap416 1 year ago

    Sparsity is a promising neural network architecture for blockchain on-chain computation.

  • @TiberiusSkywalker
    @TiberiusSkywalker 1 year ago +6

    Can self-play be used to optimize sparsity?

  • @SimonJackson13
    @SimonJackson13 1 year ago

    Sparse makes a delta from the dense. Does batching a few deltas help add some sparse training? If you remove some of the carry propagation in the multiply, the clock speed can go up. Who cares if the weighting multiply becomes a bit statistically trendy?

  • @blinded6502
    @blinded6502 1 year ago

    It'd be nice to be able to just subdivide the latent space of a neural network into separate modules of a sort. Imagine downloading the concept of what a flower is into your neural network, either from the web or by loading it from disk into RAM, thus minimizing the amount of RAM required overall, since you can unload stuff you don't need.

  • @tedarcher9120
    @tedarcher9120 1 year ago

    Metaneuroscience is, I think, the future: using neural networks to understand neural networks.

  • @DonCat-sc3qo
    @DonCat-sc3qo 1 year ago

    Amazing guy. 😮

  • @NeoShameMan
    @NeoShameMan 1 year ago +5

    It's always cool to see two AGIs talking to each other

    • @neuralmagic
      @neuralmagic 1 year ago

      🤖

    • @player75826
      @player75826 1 year ago

      I see no artificial here, only general intelligence, combined with lots of creativity, ideation, and the combining of concepts from multiple fields, deduced by perception in the real world and then transferred into the AI field; the ideas that were transferred bring insight that will benefit not just a single corporation, but the world.

  • @DavidSaintloth
    @DavidSaintloth 1 year ago

    What is the goal of a shallow and broad (sparse) algorithm? It is to utilize minimum energy to connect or cascade firings between neural nodes that correlate with the features of entities being compared against incoming experience in the sensory modality that the particular network of nodes is mapped to capture. It's basically an energy-minimization algorithm that over time maximizes the probability that a given pattern stored in the network can be matched against patterns in the stream of incoming sensory data. Selection of the pattern is variable, depending on intrinsic low-level attributes known as salience factors; in biological systems these factors are autonomic (food, sex, comfort) and amygdala (emotion) driven.
    I formulated the salience theory of dynamic cognition and consciousness around the above hypothesis around 2008 and published it on my blog 5 years later... Going to implement a POC after I retire from the rat race.

  • @JTMoustache
    @JTMoustache 1 year ago +2

    Boom!

  • @ericpmoss
    @ericpmoss 1 year ago

    So, given the memory-throughput advantages of the Apple M1 Ultra architecture, what kind of match is this software for that CPU? I mean, 128 GB of RAM at 200-400 GB/s must be a dream match among OTS components.

  • @Ride-Tahoe
    @Ride-Tahoe 1 year ago +1

    Can this be applied to CPU gaming models like Microsoft Flight Simulator, where the CPU can be a bottleneck?

  • @KW-jj9uy
    @KW-jj9uy 1 year ago

    Exciting to see large transformer models on CPU. Large parts of the model don't get activated on each run

  • @asrjy
    @asrjy 1 year ago +5

    Is it similar to what TensorFlow Lite does? IIRC it also uses sparsity

    • @mjohnk11
      @mjohnk11 1 year ago +1

      Yes, somewhat! SparseML enables sparsification, leveraging better algorithms for higher sparsity, performance, and recovery. DeepSparse then utilizes that sparsity for inference speedups on x86 CPUs.

  • @Rizhiy13
    @Rizhiy13 1 year ago +2

    Could sparsity help with decentralized training, e.g. Hivemind? If only a part of the network is evaluated depth-wise, can different machines evaluate different paths and train in parallel that way?

  • @ericpmoss
    @ericpmoss 1 year ago +1

    I'm not sure if it was meant that way, but the "10% of the brain" idea is a misreading of the original work. The researchers only understood maybe 10% of the regions of the brain that were lighting up with certain stimuli; it was not that only 10% of the brain was being used.

  • @Phasma6969
    @Phasma6969 1 year ago

    It would be cool if we could combine the compute of the CPU and GPU to calculate the output...

  • @ulamss5
    @ulamss5 1 year ago

    This sounds very closely related to 'weight agnostic neural networks'

  • @dr.mikeybee
    @dr.mikeybee 1 year ago

    Can you sparsify the big BLOOM model and put it on Hugging Face?

  • @thegistofcalculus
    @thegistofcalculus 1 year ago +1

    It is not entirely clear whether he just aggressively prunes, or actually increases the parameter count but only executes a fraction of the network. It should not be too hard to train a large dense network to switch off parts of itself during execution depending on the input, but the real computational savings would be realized with flexible CPU execution.
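
    A sketch of that "switch off parts of itself" idea in the spirit of mixture-of-experts conditional computation; hypothetical PyTorch, not Neural Magic's method. The branchy forward pass is exactly the kind of control flow a CPU handles more gracefully than a GPU.

    ```python
    import torch
    import torch.nn as nn

    class GatedBlock(nn.Module):
        """A tiny gate picks, per input, which expert branch to run."""
        def __init__(self, dim=64, n_experts=4, k=1):
            super().__init__()
            self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
            self.gate = nn.Linear(dim, n_experts)
            self.k = k

        def forward(self, x):                      # x: (batch, dim)
            picks = self.gate(x).topk(self.k, dim=-1).indices
            out = torch.zeros_like(x)
            for e, expert in enumerate(self.experts):
                hit = (picks == e).any(dim=-1)     # rows that chose expert e
                if hit.any():                      # otherwise skip the branch entirely
                    out[hit] = out[hit] + expert(x[hit])
            return out

    print(GatedBlock()(torch.randn(8, 64)).shape)  # torch.Size([8, 64])
    ```

    Training the hard top-k gate needs an extra trick (straight-through estimators, load-balancing losses), which is part of why dense training remains the default.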

  • @andrewowens5653
    @andrewowens5653 1 year ago +3

    Yannic, it's nice to see that you shaved today. 8-)

  • @tedarcher9120
    @tedarcher9120 1 year ago

    What do you think about Numenta?

  • @anteckningar
    @anteckningar 1 year ago

    GPUs can do sparse models; just look at graph networks. The issue is rather the optimization method: we don't get gradients if we don't backprop through every possible connection

  • @neuchesnook4181
    @neuchesnook4181 1 year ago

    Check the volume levels! I had to lower the volume when you spoke and raise it when you didn't. (Just a bit)

  • @ChibatZ
    @ChibatZ 1 year ago

    I was wondering how exactly 176 billion parameters fit on your desktop; is this calculation about right for the argument?
    176 billion params * 2 bytes/param (BF16 format or similar) * 0.05 (5% density) = 17.6 billion bytes ≈ 16.4 GiB,
    and 16.4 GiB fits on a desktop easily.
    Is this calculation fair?
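
    The arithmetic checks out, with one caveat: a real sparse format also stores indices for the nonzeros (CSR-style), so the true footprint is somewhat larger than the raw value count suggests.

    ```python
    params = 176e9         # BLOOM-scale parameter count from the comment
    bytes_per_param = 2    # BF16
    density = 0.05         # keep 5% of the weights
    gib = params * bytes_per_param * density / 2**30
    print(f"{gib:.1f} GiB")  # ~16.4 GiB, before index overhead
    ```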

  • @lizardy2867
    @lizardy2867 1 year ago +2

    Time for evolutionary algorithms!

  • @gleleylo
    @gleleylo 1 year ago

    Thanks for the video! Take a tip on "donate 4 fun"

  • @user-qu2oz2ut2h
    @user-qu2oz2ut2h 1 year ago +2

    You guys might be interested in this:
    "SLIDE: In Defense of Smart Algorithms over Hardware Acceleration for Large-Scale Deep Learning Systems"
    They beat a Tesla V100 with a 44-core processor in unusual settings.

  • @rothn2
    @rothn2 1 year ago

    Well, PyTorch DataLoaders and TensorFlow could probably do some of this pipelining through inversion of control by putting it in a framework...

  • @ClydeCoulter
    @ClydeCoulter 1 year ago

    If the inputs to a part of a layer make no changes to the node values in that part of the layer, then there is an opportunity for optimization.

  • @ramakrishna5480
    @ramakrishna5480 1 year ago

    Scientists like Nir give hope to future researchers who don't have unlimited computing resources

  • @Aldraz
    @Aldraz 1 year ago

    I was excited, but disappointed to find out my Zen 1 CPU is unsupported, lol. Also, it's more suited to Linux right now; I am not sure how well it works on Windows. It probably doesn't, except in Docker or something like that.

  • @Georgesbarsukov
    @Georgesbarsukov 1 year ago +3

    I've always been interested in the idea of training a large model, then training a small model to mimic it. Obviously that creates a bit of a covariate shift, since it would be trying to predict what the larger model outputs instead of what's in the training data. The benefit, though, would be a more easily understandable dataset and a faster model.

    • @Georgesbarsukov
      @Georgesbarsukov 1 year ago +2

      An extension to this idea that I was thinking about applying at my FAANG company was to create a "cheat" model that can see more data than is available, then train another version that can't see the cheating data, using the cheat model to label or generate more data for the non-cheat model.

    • @shibash1
      @shibash1 1 year ago +1

      @@Georgesbarsukov It looks like fine-tuning on a pre-trained model to me :)

    • @revimfadli4666
      @revimfadli4666 1 year ago

      @@shibash1 or bootstrapping à la actor-critic

    • @GoriIIaTactics
      @GoriIIaTactics 1 year ago

      That's literally just the teacher-student model. Maybe give your ideas a quick search on Google before thinking you have a revolutionary idea

    • @Georgesbarsukov
      @Georgesbarsukov 1 year ago

      @@GoriIIaTactics I didn't say it was a revolutionary idea. I mentioned it because I recall a paper from Stanford where students were able to create a "Google Translate" clone model. I don't understand why you need to be antagonistic about it.
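
      What the thread converges on is standard knowledge distillation (teacher-student training). For reference, a minimal sketch of the usual soft-label loss, assuming teacher and student logits are already in hand:

      ```python
      import torch
      import torch.nn.functional as F

      def distill_loss(student_logits, teacher_logits, T=2.0):
          """Match the student's temperature-softened distribution to the teacher's."""
          s = F.log_softmax(student_logits / T, dim=-1)
          t = F.softmax(teacher_logits / T, dim=-1)
          return F.kl_div(s, t, reduction="batchmean") * T * T

      teacher = torch.randn(16, 10)  # stand-ins for real model outputs
      student = torch.randn(16, 10, requires_grad=True)
      print(distill_loss(student, teacher))
      ```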

  • @HappyMathDad
    @HappyMathDad 1 year ago

    Wow. The SQL database story all over again. Column storage!!! This is so crazy. Clearly sparsity is the biggest difference from the brain.

  • @mpjstuff
    @mpjstuff 1 year ago

    I think this is a great example of how the human brain works with very incomplete data -- I myself do a lot of SPARSE thinking. ;-)

  • @dislike__button
    @dislike__button 1 year ago +2

    Would this work on a Stable Diffusion model?

    • @mjohnk11
      @mjohnk11 1 year ago +2

      We're in early research experiments for large generative models. Hopeful to have some results soon, but we haven't seen any neural networks yet that we couldn't sparsify!

    • @dislike__button
      @dislike__button 1 year ago

      @@mjohnk11 sounds promising! Can't wait 😄

  • @IvanGarcia-cx5jm
    @IvanGarcia-cx5jm 1 year ago

    He mentions that the brain is very sparse and uses little computation on input. But I have also heard that the brain spends a lot of energy on visual tasks; as in computers, visual tasks are computationally expensive. I suspect that vision tasks require some level of density because, after all, you are processing a set of images (dense 2-D arrays) with a lot of data. Even when you are paying attention to a small part of the image, the data still has some level of density.

    • @blinded6502
      @blinded6502 1 year ago

      Wasn't that already mentioned in the video?

  • @billykotsos4642
    @billykotsos4642 1 year ago +3

    Inference can work on CPUs, it seems... but training is definitely bespoke-hardware territory

    • @neuralmagic
      @neuralmagic 1 year ago

      It for sure is. We are at an early stage of exploring training on CPUs. Subscribe to our newsletters on our website to stay in the know when we release something!

  • @billykotsos4642
    @billykotsos4642 1 year ago

    Their DeepSparse engine can't be used in production, right?

  • @thet0ast3r
    @thet0ast3r 1 year ago +1

    lol, the audio transcription thing got wrecked by OpenAI Whisper 🤣

  • @novantha1
    @novantha1 1 year ago

    Hm... Sparsity, on reflection, seems like an obvious solution. If you imagine a fairly simple neural network, a Boltzmann machine, it is essentially the opposite of a "sparse" architecture in that everything is completely connected. But if you were to analyze any problem you're dealing with when using it, odds are good that 20% of those connections are doing 80% of the work.
    That kind of makes me wonder if you really have to have "full" connections. If there's a connection that's important to have but isn't important to use 100% of the time, could you fire it, say, 20% of the training runs?
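
    Firing a connection only some fraction of the runs is essentially DropConnect. A minimal NumPy sketch of the idea; the 1/p rescaling keeps the layer's expected output unchanged:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.normal(size=(16, 16))
    x = rng.normal(size=16)

    p_keep = 0.2                          # fire each connection 20% of the time
    mask = rng.random(W.shape) < p_keep   # fresh random mask per forward pass
    y = (W * mask / p_keep) @ x           # masked layer, expectation-preserving
    print(y.shape)
    ```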

  • @deepfakestudio7776
    @deepfakestudio7776 1 year ago

    interesting

  • @yes-vy6bn
    @yes-vy6bn 1 year ago +1

    wen sparse stable diffusion 👀

  • @billykotsos4642
    @billykotsos4642 1 year ago +1

    The comments about them targeting the edge are really interesting... imagine you get a Jetson Nano and you get better performance out of your AI by using the CPU cores rather than NVIDIA's GPU... insane stuff...

  • @jahcane3711
    @jahcane3711 1 year ago

    It would have been really nice if you had turned up his volume or reduced yours so that I could hear him more clearly without being blasted by your voice :)

  • @michealkinney6205
    @michealkinney6205 1 year ago

    I've been thinking about this a bit lately, and I think we are talking about conceptual knowledge (sparse) versus repetitive/regurgitated knowledge (dense, memorization, often taught in schools in order to take "quizzes" to "prove knowledge"), which is easily cheated with a transcript of the learning material now that everything is "taught" online. Conceptual, and therefore sparse, knowledge, with select dense key knowledge (like tokenized phrases, dates/times, etc.), I think would be the way forward for AGI (much the way applicable knowledge in human brains works).

  • @HappyMathDad
    @HappyMathDad 1 year ago

    The story of the interview is that the GPU creates base models and the CPU executes them. The CPU + GPU question, for now, seems to be how much model evolution the application has.

  • @victorsmirnov876
    @victorsmirnov876 1 year ago

    Sparsity vs in-memory computing?

    • @neuralmagic
      @neuralmagic 1 year ago

      The combination of both is the "magic" behind Neural Magic 😇

  • @mattizzle81
    @mattizzle81 1 year ago

    Unfortunately I didn't hear much on whether this currently works for applications like GANs or pix2pix, which have a hard time running in real time on a CPU. Object detection and image classification already run in real time.

  • @wadahadlan
    @wadahadlan 1 year ago

    As above, so below: sparsity

  • @4.0.4
    @4.0.4 1 year ago

    Imagine if you could buy VRAM in sticks. That's a real limit on GPU compute.

  • @AviPars
    @AviPars 1 year ago

    Skip to 2:05

  • @djmips
    @djmips 1 year ago +3

    But can't you just use sparsity on the GPU, and you're back to square one?

    • @neuralmagic
      @neuralmagic 1 year ago

      We've seen NVIDIA show that the A100 GPU gets a 1.25x speedup with sparsity on a BERT-Large model. We were able to replicate these BERT-Large results but haven't been able to see this performance for any other models. CPU execution shows the benefits of sparsity across many models at more than 5x improvements. See our website for ways to test this out and replicate it with your own data!

  • @kostian8354
    @kostian8354 1 year ago

    At 40:00 he is describing the Apple M1 family of chips

  • @sophontec2822
    @sophontec2822 1 year ago

    To my understanding, current DL models with layers are already sparse.

  • @w4hns1nnn
    @w4hns1nnn 1 year ago +7

    I miss your paper videos. The interviews are way too sparse in information.

    • @dislike__button
      @dislike__button 1 year ago +3

      hehe

    • @neuralmagic
      @neuralmagic 1 year ago

      Pun intended? Check out our website for all the papers that back this research up, including the code to replicate the results on CPUs!

    • @w4hns1nnn
      @w4hns1nnn 1 year ago

      @@neuralmagic Yep, the pun was intended. I like your research and intentions. I know the pain of deploying AI models and speeding them up. I've used them all: Coral TPUs, OpenVINO, NCNN, TensorRT, NVIDIA Jetson, but in my opinion none of them is really mature enough. In addition, with all these libs, sparse models bring no inference speedups, just model compression. So it's great to see people working on it! My point is that Yannic would be able to explain all the content of this 50-minute video much more clearly, more entertainingly, and in more detail in less time. The information density of his paper-review videos is quite high, while the information density of most of the interviews is quite low.

  • @barrettvelker198
    @barrettvelker198 1 year ago +12

    Imagine if this were on Huggingface

    • @mjohnk11
      @mjohnk11 1 year ago +4

      We have integrations with HuggingFace! Check them out here:
      github.com/neuralmagic/sparseml/tree/main/integrations/huggingface-transformers
      github.com/neuralmagic/deepsparse/tree/main/src/deepsparse/transformers

  • @synthoelectro
    @synthoelectro 1 year ago +2

    Optimization should have been the key from the beginning.

    • @synthoelectro
      @synthoelectro 1 year ago +1

      @@things_leftunsaid Should one agree with it, though? Case in point: MS loves to release broken software. They've been doing it for years, and yet lately it's even more broken. Seems counterproductive to me.

    • @sunorcio3901
      @sunorcio3901 1 year ago

      @@things_leftunsaid I suspect that means nothing

  • @AustinGlamourPhoto
    @AustinGlamourPhoto 2 months ago

    You guys really need to figure out that the brain isn't digital. It works with analog waves, with complex data flowing over carrier waves.

  • @vast634
    @vast634 1 year ago

    There are many operations that are faster on a CPU than on a GPU when they don't fit the way GPUs process data.

  • @sathyanarayanankulasekaran1674
    @sathyanarayanankulasekaran1674 1 year ago

    Moore's law: we will have more and more computation in the coming years, so this problem might not be as tedious as it seems today... but it's a great intention, making AI more democratic.

  • @zackbrandigan9209
    @zackbrandigan9209 1 year ago +2

    How realistic is training on CPUs? If you want to get rid of GPUs, you need to tackle sparse training on CPUs.