Meet Sohu, the fastest AI chip of all time.

  • Published Jun 24, 2024
  • Unleash the power of transformers! Introducing Sohu, the world's first chip built specifically for blazing-fast transformer models like ChatGPT and Stable Diffusion 3. Sohu offers unmatched speed and efficiency, but focuses solely on transformers - it can't run other AI models. This is a gamble on the future of AI, but if transformers are here to stay, Sohu will revolutionize the game. See why we took the leap!
    Tell us what you think in the comments below!
    Twitter Announcement: x.com/Etched/status/180562569...
    Founder Interview (w/ John Coogan): x.com/johncoogan/status/18056...
    Etched Blog: www.etched.com/announcing-etched
    Groq Rebuttal 1: x.com/GroqInc/status/18056522...
    Groq Rebuttal 2: wow.groq.com/12-hours-later-g...
  • Science & Technology

Comments • 130

  • @RonMar
    @RonMar 13 days ago +22

    I think this makes 100% sense, because GPUs are not built for inference-specific purposes - they were just co-opted because they were at hand.
    The idea of bespoke hardware for a given model is intriguing.
    These optimized chips will be much faster and more energy-efficient than the more generalized GPUs. Also much cheaper and smaller, I expect.
    Eventually, I think local inference will dominate almost all usage. Only extreme edge cases will require a cloud data center.

    • @MrDragonFoxAI
      @MrDragonFoxAI 13 days ago +2

      I think that is very much wishful thinking - the money is made in DCs and the hardware is built for it, at least for the foreseeable future, and with all the regulation hitting right now... well, you get my drift.
      While I agree that this is a win for transformers, it's not uncommon - Groq did similar stuff with their custom fabric, but uses SRAM. HBM3e is sold out for something like two years ahead, and is slow in I/O versus direct on-die; once you go off-chip, this won't beat Groq once optimized - it can't. And Groq stopped selling hardware :)

    • @paulmuriithi9195
      @paulmuriithi9195 13 days ago

      Agree. I'm rooting for models specific to biomolecular processes and their custom hardware. Attachment-based recurrent NNs, for example, would run great on Sohu. What do you think?

  • @coolnhotstorage
    @coolnhotstorage 12 days ago +8

    I think just having tokens that fast opens the door to massively recursive chain of thought. You can get an even greater level of accuracy if you let the models refine their thoughts rather than just getting them to write an essay on the spot. That's what I'm most excited for.

  • @CookieDoughFantasies
    @CookieDoughFantasies 13 days ago +7

    We need this to make Open Source models viable for the average consumer.

  • @AaronALAI
    @AaronALAI 13 days ago +16

    Frick....shut up and take my money!
    This will make local hosting much more common.

    • @aifluxchannel
      @aifluxchannel 13 days ago +6

      I would be willing to mortgage my house for one of these... in three years when they start showing up on eBay ;). Do you use GPUs or groq with open source models?

    • @AaronALAI
      @AaronALAI 13 days ago +3

      ​@@aifluxchannel I would pay a hefty fee for one of these(2x plus would be heart stopping). I use 7x 24gb gpus on a xeon system.... sometimes I trip the breaker.

    • @mackroscopik
      @mackroscopik 12 days ago +2

      I only need one of these! If anyone needs a kidney or testicle hit me up, I have an extra one of each.

  • @vishalmishra3046
    @vishalmishra3046 13 days ago +6

    *There were no transformers (Attention abstraction) before 2017*
    Etched is incorrectly assuming that transformers (unlike RNN/LSTM etc.) will *NEVER* be replaced by an entirely new innovation (one using an abstraction that's far more effective than *Attention*). The moment such a mechanism appears, all LLMs, including Vision Transformer based models, will migrate to the new (better-than-attention) mechanism, and Sohu will need to be replaced by another special-purpose ASIC. Meanwhile, nVidia is incorporating a version-3 transformer engine in its post-Blackwell (*Rubin*) architecture.

    • @biesman5
      @biesman5 10 days ago

      It's obvious that transformers will eventually be replaced, and they know it as well. I guess they're betting that it won't happen overnight, so they might as well make some cash in the meantime.

  • @VastCNC
    @VastCNC 13 days ago +11

    Who’s going to buy them? Not the chips, but the company. I could see a major acquisition to pass the bag and build the moat.

    • @aifluxchannel
      @aifluxchannel 13 days ago +4

      At this point I doubt they're looking for a buyer. Frankly, given they're making ASICs (even though they have the best tech in existence for transformer inference), the amount of debt necessary to make that happen is astronomical.
      Do you think nVidia / OpenAI would look to acquire them?

    • @VastCNC
      @VastCNC 13 days ago +1

      @@aifluxchannel I’m hoping Anthropic, but more likely musk, Google or Meta.

    • @GerryPrompt
      @GerryPrompt 13 days ago

      @@VastCNC was also thinking this

    • @brandonreed09
      @brandonreed09 11 days ago

      The AI providers will all invest, none will acquire.

  • @ernestuz
    @ernestuz 13 days ago +4

    Betting everything on transformers is risky. Their main drawback is context size: memory needs grow with the square of the context length, and the result is small contexts and memory-bandwidth starvation, apart from needing zillions of bytes to operate (see the rough sketch after this thread). Even the companies most committed to transformers are looking at other architectures. We will be in a transition period for a few years.

    • @peterparker5161
      @peterparker5161 8 days ago

      It's just the fact that transformers are currently the best usable thing.
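
      A rough back-of-the-envelope sketch of the quadratic memory growth mentioned above, assuming 32 heads and fp16 scores (illustrative numbers only, not figures from the video):

          # Naive attention materializes a (context x context) score matrix per head,
          # so memory grows with the square of the context length.
          def attention_matrix_bytes(context_len, n_heads=32, bytes_per_elem=2):
              return n_heads * context_len * context_len * bytes_per_elem

          for ctx in (4_096, 32_768, 131_072):
              gib = attention_matrix_bytes(ctx) / 2**30
              print(f"{ctx:>7} tokens -> ~{gib:,.0f} GiB of attention scores per layer")
          # 4,096 -> ~1 GiB; 32,768 -> ~64 GiB; 131,072 -> ~1,024 GiB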

  • @user-bd8jb7ln5g
    @user-bd8jb7ln5g 13 days ago +7

    Inference costs will start dropping based on how quickly these guys can deliver their product. Assuming the same price as Nvidia cards and a 20x inference improvement, inference cost should drop by 20x (rough arithmetic sketch after this thread). I bet, however, that their accelerators will be cheaper than Nvidia's.

    • @aifluxchannel
      @aifluxchannel 13 days ago +1

      It's going to get interesting, especially with the already apparent lull in the GPU compute market.
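
      A quick sketch of that arithmetic; the card price, throughputs, and lifetime below are hypothetical placeholders, and only the 20x ratio matters:

          card_cost = 30_000                     # dollars, assumed identical for both cards
          gpu_tps, asic_tps = 25_000, 500_000    # tokens/second, an illustrative 20x gap
          lifetime_s = 3 * 365 * 24 * 3600       # amortize the hardware over ~3 years

          gpu_cost_per_mtok = card_cost / (gpu_tps * lifetime_s) * 1e6
          asic_cost_per_mtok = card_cost / (asic_tps * lifetime_s) * 1e6
          print(gpu_cost_per_mtok / asic_cost_per_mtok)   # -> 20.0: cost per token falls 20x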

  • @Perspectivemapper
    @Perspectivemapper 13 days ago +4

    🐴We should assume that other chip manufacturers will jump on the bandwagon of dedicated LLM chips as well.

  • @DanielBellon108
    @DanielBellon108 13 days ago +4

    I went to their website and there's no place to order chips or software. They're not even on YouTube yet. Is this company really there? 🤔

  • @DigitalDesignET
    @DigitalDesignET 13 days ago +3

    Very interesting, keep us informed. Thank you so much.

  • @MrAmack2u
    @MrAmack2u 13 days ago +6

    So you can't use them to train models? That is where we need the biggest breakthrough at this point.

    • @Alice_Fumo
      @Alice_Fumo 12 days ago +3

      I actually don't think we do. There are a lot of techniques which make models more capable by using a lot more compute on inference. Also, there's reinforcement learning where very few of the inferences might be considered to be very good and could then be used as new training data.

    • @thorvaldspear
      @thorvaldspear 12 days ago +4

      @@Alice_Fumo Yea, prompting techniques like Tree of Thoughts look promising but are very inference hungry, so having lightning fast inference will be super helpful. Also, imagine how valuable inference becomes when proper robotics transformers finally get figured out; lightning fast reflexes, unstoppable killing machines...

    • @gentleman9534
      @gentleman9534 12 days ago +1

      The xtropic chip can train models extremely fast, very cheaply, and with extremely low, almost nonexistent, power consumption.

  • @fontende
    @fontende 9 days ago +1

    It would be interesting to see a real demo of this - that is what my robot needed!

  • @woolfel
    @woolfel 12 days ago +3

    Something will replace transformers. Will they be able to adapt and change fast enough when the transformer architecture evolves into something else? Things are moving fast, and research shows there's lots of room for improvement.

  • @supercurioTube
    @supercurioTube 13 days ago +4

    If gen AI has settled on transformers already, I expect mobile SoCs and laptops to add a transformer ASIC block on each chip,
    just like the video encoder and decoder blocks for individual codecs.

    • @aifluxchannel
      @aifluxchannel 13 days ago

      Makes more sense to have an API backed by an entire datacenter of these asics. Far cheaper.

    • @supercurioTube
      @supercurioTube 13 days ago +1

      @@aifluxchannel maybe combining both? Like Apple's implementation with "Apple Intelligence" on iPhones.
      But with transformer ASIC added to the NPU.

    • @PaulSpades
      @PaulSpades 11 days ago

      @@aifluxchannel LLMs are now capable of serving as a natural language text and voice interface. This has been a goal of HCI for 50 years.
      Other multimedia tasks can be handled via remote compute, but the command interface and interpretation need to be local.

  • @sigmata0
    @sigmata0 13 days ago +1

    It's like any other kind of technology: those who enter first can get superseded by those who follow as technology, techniques and thinking change. It's great because it keeps the pressure on companies to innovate and improve. With the resources at NVIDIA's disposal, they too can observe improvements made by others and include them, with additions, in the next iterations of their products.

  • @gerardolopez9368
    @gerardolopez9368 12 days ago +1

    Nemotron was released a couple days ago, definitely see a race going on 🔥🔥🔥🔥💡💡

  • @nagofaz
    @nagofaz 12 days ago +4

    Hold on a second. We're all getting worked up over this, but has anyone actually seen a real card yet? I can't help but feel we should be a lot more skeptical here. The whole thing just reeks of 'too good to be true'. Look, I'd love to be wrong, but let's face it - this wouldn't be the first time someone's tried to pull a fast one in this industry. These guys aren't blind; they know the market's on fire right now, with money being thrown around like there's no tomorrow. Maybe we should take a step back and think twice before buying into all this hype.

    • @dg-ov4cf
      @dg-ov4cf 10 days ago

      this reads like when i tell claude to act human

  • @elon-69-musk
    @elon-69-musk 12 days ago

    The most compute will be needed for inference to produce synthetic data, so this type of chip might be more important than training chips.

  • @os10v311
    @os10v311 12 days ago +1

    More Than Meets the Eye.

  • @darkreader01
    @darkreader01 12 days ago +1

    Like the Groq LPU, won't they provide a platform for us to try it out ourselves?

  • @issay2594
    @issay2594 12 days ago +1

    Nvidia is trapping itself. It's enough to remember how 3D graphics started: 3dfx created the GPU to handle a specific load, the mesh calculations that made Quake run fast. GPUs were a tool to do a certain task faster, not "everything in the world should forever be done on the GPU", the way it had been done on the CPU before.

  • @falklumo
    @falklumo 12 days ago +1

    The challenge today is still training, not inference.

  • @bpolat
    @bpolat 11 days ago

    One day these kinds of chips will be in computers or mobile devices and will run the largest text or video models without even needing the internet.

  • @novantha1
    @novantha1 13 days ago +2

    I think this is less interesting as a specific product than it appears on the surface. It's interesting certainly for the implications on the industry, but in terms of its direct relevance to "ordinary" people who like to buy a piece of hardware and use it (as opposed to being locked behind APIs) it's way less useful.
    Adding onto that, I think the most interesting thing in inference workloads is ultra-low precision; forget FP8, why aren't they doing Int8? Int4? Int2 (as per BitNet 1.58, I think 1.5-bit should be possible with a dedicated ASIC) could be incredible (see the quantization sketch after this comment).
    Floating-point numbers aren't actually very easy to work with at the hardware level, so it seems like a really weird choice for an inference-only chip, as I'm sure they could have squeezed even more performance out of just using the same number of bits as integers, let alone using low-precision integers. (I think Int2 could potentially be ~4.5x faster, off the top of my head.)
    More to the point, given I'm primarily interested in hardware I can actually buy, if I wanted something like this Tenstorrent's accelerators seem way more interesting, and affordable.
    With all of the nay-saying said, the one thing about these that seems really interesting is that Moore's Law isn't actually dead. We're still getting more transistors. The issue is that it becomes harder and harder to control them all together in a centralized manner (as on a CPU), hence CPU performance scaling has stalled, and even parallelized you see the same leveling-off of improvements over time in GPUs, due to things like cache access and so on.
    I'm no hardware engineer, but it seems to me, that in addition to gaining more performance from removing roadblocks (ie: only using hardware needed to calculate a Transformer network, so no CPU or GPU specific elements are included), Transformer specific hardware should still continue scaling with performance node improvements to a greater degree than existing legacy architectures, very similarly to how bitcoin ASICs were able to do so.
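
    A minimal sketch of the absmax Int8 weight quantization mentioned above (illustrative only; not how Etched or any particular vendor implements it):

        import numpy as np

        # Absmax int8 quantization: store weights as int8 plus one float scale per tensor.
        # 4x smaller than fp32 (2x smaller than fp16), and integer multipliers are cheaper
        # in silicon than floating-point units.
        def quantize_int8(w: np.ndarray):
            scale = np.abs(w).max() / 127.0
            q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
            return q, scale

        def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
            return q.astype(np.float32) * scale

        w = np.random.randn(4, 4).astype(np.float32)
        q, s = quantize_int8(w)
        print(np.max(np.abs(w - dequantize(q, s))))   # small round-off error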

  • @styx1272
    @styx1272 13 days ago +1

    BrainChip Corp's revolutionary spiking neural Akida 2 chip has its own revolutionary new algorithm called TENNs, giving ultra-low power consumption and possibly productivity similar to Etched, while being highly adaptable. And it doesn't require a CPU or memory to operate.

  • @sikunowlol
    @sikunowlol 13 days ago +1

    this is actually huge news..

  • @AltMarc
    @AltMarc 12 days ago

    Excerpt from a Nvidia Press release:
    "DRIVE Thor is the first AV platform to incorporate an inference transformer engine, a new component of the Tensor Cores within NVIDIA GPUs. With this engine, DRIVE Thor can accelerate inference performance of transformer deep neural networks by up to 9x, which is paramount for supporting the massive and complex AI workloads associated with self driving."

  • @pigeon_official
    @pigeon_official 13 days ago +1

    500k T/s is genuinely the most unbelievable, absolutely insane thing I've ever heard in my life, so until we actually see it IRL I will continue not to believe it - but I hope so badly that it's true.

  • @tamineabderrahmane248
    @tamineabderrahmane248 10 days ago

    The AI hardware accelerator race has started!

  • @ps3301
    @ps3301 13 days ago +5

    When a new ai model appears, this startup will tank

    • @aifluxchannel
      @aifluxchannel 13 days ago +4

      Well... the entire point of this hardware is it can run anything transformer based. All SOTA models, closed and open source, are transformer based.

    • @wwkk4964
      @wwkk4964 13 days ago

      @@aifluxchannel They're really hoping that an SSM hybrid will not outcompete a pure transformer-based model.

    • @JankJank-om1op
      @JankJank-om1op 13 days ago

      found jensen's alt

    • @HaxxBlaster
      @HaxxBlaster 12 days ago

      @@aifluxchannel That's what he is saying: new models that will not be transformer-based. But transformers will surely be used until the next breakthrough.

  • @cacogenicist
    @cacogenicist 13 days ago +3

    Of course when we have a model with "average human intelligence," it's not going to have the _knowledge_ of an average human -- it will be vastly more knowledgeable than any human.
    It'll be like an average human only in specific sorts of reasoning, where they are rather poor presently.
    The top models are already quite good at verbal analogical reasoning.

  • @tsclly2377
    @tsclly2377 12 days ago +1

    Watch Microsoft, as they have gone the complete binary AI route and they have a core of users they want to serve: gaming and business. The one thing that is blatantly obvious is that the trend is toward server centralization, when that may not be in the best interests of the user or produce the best results.

  • @theatheistpaladin
    @theatheistpaladin 13 days ago +1

    We need a Mamba ASIC LPU.

  • @DonaldHughesAkron
    @DonaldHughesAkron 12 days ago +1

    More than wanting to buy one.. How do you buy stock in this company? Are they on the market yet?

  • @seefusiontech
    @seefusiontech 11 days ago +1

    Where is the John Coogan video? I looked and couldn't find it.

    • @aifluxchannel
      @aifluxchannel 11 days ago

      Linked in description but I'll put it here as well - x.com/johncoogan/status/1805649911117234474/video/1

    • @seefusiontech
      @seefusiontech 11 days ago

      @@aifluxchannel Thanks! I swear I looked, but I didn't see it. BTW, I watched it, but your vid had way better info :) Keep up the great work!

  • @cacogenicist
    @cacogenicist 13 days ago +3

    Wen consumer home ASIC LLM inference machines for under $4,000? 😊

    • @aifluxchannel
      @aifluxchannel 13 days ago

      On eBay in five years when these start to show up haha.

  • @ScottyG-wood
    @ScottyG-wood 12 days ago +1

    Dead on arrival if data center + inference only is their strategy. The initial sales will be great, but Nvidia, AMD, Intel, and Google are doubling down on merging inference and training. When you're talking data center, the approach Sohu is taking will not make it. Their positioning will make or break them.

  • @maragia2009
    @maragia2009 9 days ago

    What about training? Inference is only one part; training is really, really, really important.

  • @TheReferrer72
    @TheReferrer72 12 days ago +1

    Inference is not the biggest capital outlay for the big players, and I thought the Googles, Microsofts, Metas & Amazons of the world were already making their own GPUs.
    It's making the foundation models that requires big CapEx spend. This is of little threat to Nvidia - or am I missing something?

  • @jonmichaelgalindo
    @jonmichaelgalindo 13 days ago +2

    Can it run a diffuser like SDXL? Or a diffusion-transformer like SD3 / Sora?

    • @jonmichaelgalindo
      @jonmichaelgalindo 13 days ago +1

      They claim SD is a transformer, which is half-true, but the VAE is a convolver. A CNN. They specifically said they can't run CNNs. Are they gaslighting right now? Did they sink everything into text-only hardware that's already out of date now that multimodal is taking over?

    • @aifluxchannel
      @aifluxchannel 13 days ago

      Yep, it's been tested with SD3 and can run any transformer based model (SORA included)

    • @aifluxchannel
      @aifluxchannel 13 days ago +1

      This was also my first thought - but they claim to have already run SD3 on the device.

    • @jonmichaelgalindo
      @jonmichaelgalindo 13 days ago +1

      @@aifluxchannel I bet they ran just the transformer backbone: pass the latent noise in over a bus transfer, run the denoise steps on Sohu's hardware, transfer the latent output back to the GPU with a stop-over in RAM, then run the VAE. That's a lot of hops.
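
      A hypothetical sketch of that split pipeline; every function below is a made-up stand-in (plain NumPy), not an Etched or diffusers API, and only marks where the device-to-device hops would happen:

          import numpy as np

          def asic_denoise_step(latents, t):
              return latents * 0.98      # stand-in for the DiT backbone running on the ASIC

          def gpu_vae_decode(latents):
              return np.tanh(latents)    # stand-in for the convolutional VAE on the GPU

          def generate_image(steps=28):
              latents = np.random.randn(1, 16, 64, 64).astype(np.float32)  # host RAM
              # hop 1: host -> ASIC (a PCIe copy in the real system)
              for t in range(steps):
                  latents = asic_denoise_step(latents, t)   # transformer-only work
              # hop 2: ASIC -> host RAM, hop 3: host -> GPU
              return gpu_vae_decode(latents)                # CNN VAE stays on the GPU

          img = generate_image()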

    • @hjups
      @hjups 13 days ago

      ​@@aifluxchannel Do you have a link to this claim? It seems rather foolish to support SD3 since you are necessarily throwing away algorithmic improvements that come with causal attention - i.e. they may have gotten 1M tk/s if they did not support non-causal attention.

  • @HaxxBlaster
    @HaxxBlaster 12 days ago +1

    Isn't this just theoretical so far? If not, where is the demo?

    • @aifluxchannel
      @aifluxchannel 12 days ago

      No, they've fabbed prototypes at TSMC already - going into production at TSMC within the month.

    • @HaxxBlaster
      @HaxxBlaster 12 days ago

      @@aifluxchannel Thanks for the reply, but I need more to be convinced this can become a consumer product for real. A prototype is one thing, but there could be a lot of other obstacles on the way to a real product. Good luck to these guys, but I get an instant red flag when there's a lot of talk and no real product yet.

  • @user-bd8jb7ln5g
    @user-bd8jb7ln5g 13 days ago +4

    The majority of AI compute over the last year has been used for inference. If these guys can deliver these accelerators and servers quickly with no issues, Nvidia won't know what hit them.

  • @bigtymer4862
    @bigtymer4862 13 days ago +2

    What’s the price tag for one of these though 👀

    • @aifluxchannel
      @aifluxchannel 13 days ago

      More than any of us can afford haha. But I bet it's within striking distance of the nVidia B200 at least per rack.

  • @Wobbothe3rd
    @Wobbothe3rd 13 days ago +2

    Aren't transformers going to be replaced by Mamba/S4 models? And even if that isn't true, is it really a good bet to assume the transformer will be the dominant type of model forever?

    • @aifluxchannel
      @aifluxchannel 13 days ago +1

      Not necessarily, Mamba (SSM) and Eagle (RNN) based models only exist as alternatives to transformers because they help with the problem of scaling compute. If Etched has actually solved this with hardware for Transformers the performance wins of Mamba start to look much less impressive.

    • @loflog
      @loflog 13 days ago

      I wonder if problems like lost-in-the-middle will come to the forefront once hardware scaling is eased.
      I don't think we actually know today whether there are architectural blind spots in transformers that will motivate new architectures to emerge.

  • @obviouswarrior5460
    @obviouswarrior5460 3 days ago

    4000€?
    I have money! I wish to buy one Sohu (and more after)!
    Where can we buy it?!

  • @GerryPrompt
    @GerryPrompt 13 days ago +4

    500,000 tokens/s???? 😂 Groq is TOAST

    • @aifluxchannel
      @aifluxchannel 13 days ago +2

      They certainly have a lot of catching up to do. Glad they're no longer the "first" to enter this space. Do you use Groq with open source models or just run locally with your own GPU?

    • @hjups
      @hjups 13 days ago +2

      Groq's main attribute is latency, not throughput (although they have been marketing for throughput). While latency can still be low in the cloud, Groq is also used for sub-ms applications like controlling particle accelerators (LLMs were more of a "oh and we can do that too").

    • @MrDragonFoxAI
      @MrDragonFoxAI 13 days ago

      Groq is also old - a 14nm process. Once they switch to the new Samsung node it should be vastly different. The big win here is SRAM vs off-die HBM3e.

    • @hjups
      @hjups 13 days ago

      ​@@MrDragonFoxAI 14nm is not "old" in the ASIC world. I don't know if Sohu stated a process node, but I would not be surprised if they did a tapeout at 14nm.
      If I recall correctly, the new LPU2 is moving to 7nm and will also have LPDDR5. But that comes with other challenges since Groq relies on cycle-accurate determinism (which is only possible with SRAM). So it's an engineering tradeoff.
      Also keep in mind that a Sohu die is likely much larger and more power-hungry than a Groq LPU die - I would not be surprised if Sohu was at the reticle limit (again, a tradeoff).

    • @MrDragonFoxAI
      @MrDragonFoxAI 13 days ago

      @@hjups They did - they're aiming for a 4.5nm TSMC node, and that's probably what the dev silicon used too.

  • @lavafree
    @lavafree 13 days ago +2

    Nahhh… Nvidia would just integrate more specialized circuits into its chips.

    • @aifluxchannel
      @aifluxchannel 13 days ago +2

      But their purpose is to ship narrowly general compute devices, not ASICs. The limiting factor is that nVidia GPUs have to rely on batch processing rather than streaming.

  • @sativagirl1885
    @sativagirl1885 10 days ago

    human years to #AI is like dog years to humans.

  • @bozbrown4456
    @bozbrown4456 13 days ago +1

    Nothing about Power usage / price

    • @aifluxchannel
      @aifluxchannel 12 days ago +1

      We currently don't have any information from Etched regarding price / power usage. However, given the size of the die and the rack density, I think it's safe to bet that Sohu is far more power efficient than the nVidia B200.

  • @nenickvu8807
    @nenickvu8807 13 days ago

    There is an issue of shortsightedness in designing chips for inference alone, especially inference of transformer-based models. Transformer-based models are probabilistic in nature, and businesses and individuals can't bet on them alone. Other forms of software and hardware need to pair with this before it really does have long-term value.
    It's a good start, but NVIDIA is still ahead. After all, NVIDIA uses their GPUs not just to generate probabilities that are novel and interesting; they use them to design chips and model physics. Reliability and consistency are the real magic here. And no one - not Meta or Google or OpenAI - has been able to hone AI to the point where it is reliable and consistent. And they will never get there with simple inference chips like these.

    • @aifluxchannel
      @aifluxchannel 13 days ago +1

      Good point, but that stated use case is far larger than what Etched has set out to solve with Sohu. Sohu is intended for one thing and one thing only: model inference with transformer-based models.

    • @nenickvu8807
      @nenickvu8807 13 days ago

      @@aifluxchannel That's the problem. Competitor chips like Sohu may catch up to past use cases, but the industry has already evolved. Retrieval-augmented generation is already the standard. Agentic and AI-team-based approaches are already becoming an expectation. So where is the investment going to come from to pay for this super inference chip, which has limited or no collaborative functionality with other software and hardware, even if it inferences more quickly?
      And there is always the threat that the next big model might not be based on transformers and will require different hardware. And the year after that, and the year after that. After all, software changes quickly; hardware rarely does.

  • @yagoa
    @yagoa 13 days ago +1

    Without making chips that can combine memory and processing on the wafer, you can't create a human-like intelligence.

    • @aifluxchannel
      @aifluxchannel 13 days ago +2

      You just need enough of them networked together ;)

  • @frodenystad6937
    @frodenystad6937 13 days ago +1

    This is real

  • @jameshughes3014
    @jameshughes3014 13 days ago +1

    Wow. Designing and building hardware that can only run transformers, not knowing if something better will come along. But what an amazing payoff if they can be cost-effective at release.

    • @aifluxchannel
      @aifluxchannel 13 days ago

      They have at least 2-3 years to let things play out. At least for now, the only advantage of Mamba and RNNs is their preferable scaling behavior. Transformers still have the upper hand in raw performance.

    • @jameshughes3014
      @jameshughes3014 13 days ago

      @@aifluxchannel Oh I agree. Even if a novel algorithm takes over in a month and is the new hot thing, there's so much code now that uses transformers that it's gonna be useful.

  • @K9Megahertz
    @K9Megahertz 13 days ago +1

    Not sure it's wise to invest in making an AI chip only for transformers when transformers are not what is going to take us to the next level. Transformers are limited regardless of context size or how many tokens you can generate per second.

    • @IntentStore
      @IntentStore 13 days ago +2

      20x faster inference would take us to the next level. This makes AI-ifying everything basically free. It doesn’t have to be the permanent future, it’s just radically improving the usefulness of current models and enabling new applications which couldn’t be done before due to token speeds and latency, like making live voice assistants respond as quickly as a real person, and giving them CD quality voices instead of sounding like a Skype call.

    • @IntentStore
      @IntentStore 13 days ago +2

      It also means coding assistants and coding automation go from being sorta possible to generating a working, tested application with 200,000 lines of code in a few seconds.

    • @IntentStore
      @IntentStore 13 days ago +1

      Of course as soon as a better architecture is proven, they can immediately begin R&D on an asic for that too, because they will have tons of revenue from their previous successful product and investors.

    • @K9Megahertz
      @K9Megahertz 13 days ago +3

      @@IntentStore Faster crap is still crap.
      Transformers will never be able to do that. They fail simple algorithmic coding tests I give them, because they don't know how to code; they only spit out code they've been trained on that aligns with whatever prompt you give them.
      The reason they can't get the answer to one of the problems right is that the answer lies in a single PDF that was on the net 25 years ago. (It's still out there.) I know this because I helped fix a bug in the algorithm back then. It's nothing complicated, maybe 100 lines of code.
      In order for a transformer to spit out a 200,000-line program, it would have had to be trained on 200,000-line programs - multiple, in fact. Which means the code already had to have been written, and at that point you already have the code, so what do you need a transformer AI for?
      Transformers won't get significantly better; they can't. They're limited by their probabilistic text-predictive noodles, which just doesn't work for software.

    • @styx1272
      @styx1272 13 days ago +1

      @@K9Megahertz What about BrainChip's multivariable spiking neural Akida 2 chip, with its unique TENNs algorithm? It may possibly steamroll the competition.

  • @apoage
    @apoage 13 days ago +1

    ouh

  • @bobtarmac1828
    @bobtarmac1828 11 days ago

    Ai overhype. Can we …cease Ai? Or else be… laid off by Ai, then human extinction? Or suffer an… Ai new world order? With swell robotics everywhere, …Ai jobloss is the top worry. Anyone else feel the same?

  • @Zale370
    @Zale370 13 days ago +3

    Finally someone showing how Jensen's marketing BS was just that!

    • @aifluxchannel
      @aifluxchannel 13 days ago +3

      Etched and Jensen make very different products focusing on the same market segment.

  • @Maisonier
    @Maisonier 13 days ago +13

    This is fake news dude...

    • @aifluxchannel
      @aifluxchannel 13 days ago +1

      nVidia will definitely revise their B200 benchmark numbers, but the areas where Etched tweaks their statistics are the same places where Groq and Cerebras have also made claims. Regardless, these chips are currently the most performant when it comes to inference on transformer based models.

    • @HemangJoshi
      @HemangJoshi 13 days ago

      I also think so about the comparison of all the Nvidia GPUs. Here you have shown that Nvidia GPUs are not getting better, but they are in terms of compute per unit of energy. That is not only getting better, it is beating Moore's law.

    • @GilesBathgate
      @GilesBathgate 13 days ago +1

      My question would basically be "how". Transformer attention is basically matmul, and the FFN is also matmul. I can understand how ASICs are better when your algo is sha(sha()) - GPUs are not designed to do that - but how are these gains made when the algos are the same? Memory access patterns?
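
      For reference, a bare-bones single-head attention in NumPy that makes the question's premise concrete - the core ops really are matmuls plus a softmax, so any ASIC gain would presumably come from dataflow and memory movement rather than different math:

          import numpy as np

          # Single-head scaled dot-product attention: matmuls wrapped around a softmax.
          def attention(x, Wq, Wk, Wv):
              q, k, v = x @ Wq, x @ Wk, x @ Wv                   # projections: matmuls
              scores = q @ k.T / np.sqrt(q.shape[-1])            # matmul
              scores = np.exp(scores - scores.max(-1, keepdims=True))
              weights = scores / scores.sum(-1, keepdims=True)   # softmax
              return weights @ v                                 # matmul

          d = 64
          x = np.random.randn(10, d)
          print(attention(x, *[np.random.randn(d, d) for _ in range(3)]).shape)  # (10, 64)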

  • @lb5928
    @lb5928 13 days ago +2

    This clown just said no one is using AMD MI300X. 😂
    Microsoft/Open AI, Oracle, IBM and Amazon AWS and more have announced they are using MI300x.
    Reportedly it is one of the best selling AI accelerators right now.

    • @Manicmick3069
      @Manicmick3069 12 days ago +1

      That's where I stopped listening. AMD engineering is world class. That's who NVIDIA needs to worry about. If ROCM comes with better integration, it's game time.