Meet Sohu, the fastest AI chip of all time.

  • Published on Nov 18, 2024

Comments • 142

  • @RonMar
    @RonMar 4 months ago +28

    I think this makes 100% sense, because GPUs are not built for inference-specific purposes; they were just co-opted because they were at hand.
    The idea of bespoke hardware for a given model is intriguing.
    These optimized chips will be much faster and more energy-efficient than the more generalized GPUs. Also much cheaper and smaller, I expect.
    Eventually, I think local inference will dominate almost all usage. Only extreme edge cases will require a cloud data center.

    • @MrDragonFoxAI
      @MrDragonFoxAI 4 months ago +3

      I think that is very much wishful thinking. The money is made in data centers and the hardware is built for them, at least for the foreseeable future, and with all the regulation hitting right now... well, you get my drift.
      While I agree that this is a win for transformers, it's not uncommon: Groq did similar stuff with their custom fabric, but uses SRAM. HBM3e is sold out for roughly two years ahead, and it is slower in I/O than direct on-die memory. Once you go off-chip, this won't beat Groq once Groq is optimized; it can't. And Groq stopped selling hardware :)

    • @paulmuriithi9195
      @paulmuriithi9195 4 months ago

      Agreed. I'm rooting for models specific to biomolecular processes and their custom hardware; something like attachment-based recurrent NNs would run great on Sohu. What do you think?

    • @netron66
      @netron66 4 months ago

      Another thing: GPUs need really low latency, so they preferably need to be a single chip; that is why Nvidia killed NVLink on RTX cards. But for AI and workstation PCs, that is not the first priority.

  • @coolnhotstorage
    @coolnhotstorage 4 months ago +10

    I think just having tokens generated that fast opens the door to massively recursive chain of thought. You can get an even greater level of accuracy if you let the models refine their thoughts rather than getting them to write an essay on the spot (sketch below). That's what I'm most excited about.
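
    A minimal sketch of that refinement loop, assuming a hypothetical generate(prompt) call to a model; the helper names here are illustrative, not any real SDK:

    ```python
    # Recursive self-refinement sketch: draft, critique, rewrite, repeat.
    # generate() is a hypothetical stand-in for an LLM inference call.

    def generate(prompt: str) -> str:
        raise NotImplementedError  # placeholder for a real model endpoint

    def refine(task: str, rounds: int = 3) -> str:
        draft = generate(f"Answer the following task:\n{task}")
        for _ in range(rounds):
            critique = generate(
                f"Task:\n{task}\n\nDraft:\n{draft}\n\n"
                "List concrete errors or weaknesses in the draft."
            )
            draft = generate(
                f"Task:\n{task}\n\nDraft:\n{draft}\n\nCritique:\n{critique}\n\n"
                "Rewrite the draft, fixing every point in the critique."
            )
        return draft
    ```

    Each round multiplies the token count, which is exactly why cheap, fast tokens make this style of prompting practical.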

  • @ernestuz
    @ernestuz 4 months ago +5

    Betting everything on transformers is risky. Their main drawback is context size: memory needs grow with the square of the context length, so the result is small contexts and memory-bandwidth starvation, apart from needing a zillion bytes to operate (see the sketch after this thread). Even the most committed companies are looking for solutions with other architectures. We will be in a transition period for a few years.

    • @peterparker5161
      @peterparker5161 4 months ago

      It's just the fact that transformers are currently the best usable thing.
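
    A back-of-the-envelope sketch of that quadratic growth; the model shape (32 layers, 32 heads, fp16) is an assumption picked purely for illustration:

    ```python
    # Attention-score memory grows with the square of the context length
    # if the full score matrix is materialized (real kernels such as
    # FlashAttention avoid this, but compute still scales ~n^2).

    BYTES_FP16 = 2
    LAYERS, HEADS = 32, 32  # illustrative model shape

    def attn_score_bytes(context_len: int) -> int:
        # one (n x n) score matrix per head per layer
        return LAYERS * HEADS * context_len * context_len * BYTES_FP16

    for n in (2_048, 8_192, 32_768):
        print(f"{n:>6} tokens -> {attn_score_bytes(n) / 2**30:,.0f} GiB of scores")
    ```

    The KV cache "only" grows linearly with context, but it has to be re-read for every generated token, which is where the memory-bandwidth starvation mentioned above comes from.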

  • @Perspectivemapper
    @Perspectivemapper 4 months ago +5

    🐴We should assume that other chip manufacturers will jump on the bandwagon of dedicated LLM chips as well.

  • @alpha.male.Xtreme
    @alpha.male.Xtreme 4 months ago +10

    We need this to make Open Source models viable for the average consumer.

    • @TridentHut-dr8dg
      @TridentHut-dr8dg 4 months ago

      Dots are connecting, aren't they?

    • @LochlynHansen
      @LochlynHansen 9 days ago

      You got a supercomputer lying around? Or a couple hundred thousand dollars?

  • @AaronALAI
    @AaronALAI 4 months ago +17

    Frick....shut up and take my money!
    This will make local hosting much more common.

    • @aifluxchannel
      @aifluxchannel  4 months ago +6

      I would be willing to mortgage my house for one of these... in three years, when they start showing up on eBay ;). Do you use GPUs or Groq with open-source models?

    • @AaronALAI
      @AaronALAI 4 months ago +4

      @@aifluxchannel I would pay a hefty fee for one of these (2x plus would be heart-stopping). I use 7x 24 GB GPUs on a Xeon system... sometimes I trip the breaker.

    • @mackroscopik
      @mackroscopik 4 months ago +4

      I only need one of these! If anyone needs a kidney or testicle hit me up, I have an extra one of each.

  • @DanielBellon108
    @DanielBellon108 4 months ago +6

    I went to their website and there's no place to place any orders for chips or software. They're not even on YouTube yet. Is this company really there? 🤔

  • @VastCNC
    @VastCNC 4 months ago +12

    Who’s going to buy them? Not the chips, but the company. I could see a major acquisition to pass the bag and build the moat.

    • @aifluxchannel
      @aifluxchannel  4 months ago +6

      At this point I doubt they're looking for a buyer. Frankly, even though they have the best tech in existence for transformer inference, they're making ASICs, and the amount of debt necessary to make that happen is astronomical.
      Do you think Nvidia / OpenAI would look to acquire them?

    • @VastCNC
      @VastCNC 4 months ago +2

      @@aifluxchannel I'm hoping Anthropic, but more likely Musk, Google or Meta.

    • @GerryPrompt
      @GerryPrompt 4 months ago

      @@VastCNC was also thinking this

    • @brandonreed09
      @brandonreed09 4 months ago

      The AI providers will all invest, none will acquire.

  • @MrAmack2u
    @MrAmack2u 4 months ago +7

    So you can't use them to train models? That is where we need the biggest breakthrough at this point.

    • @Alice_Fumo
      @Alice_Fumo 4 months ago +4

      I actually don't think we do. There are a lot of techniques that make models more capable by spending a lot more compute at inference time. There's also reinforcement learning, where the few inferences considered very good can be kept as new training data (see the sketch after this thread).

    • @thorvaldspear
      @thorvaldspear 4 months ago +5

      @@Alice_Fumo Yeah, prompting techniques like Tree of Thoughts look promising but are very inference-hungry, so having lightning-fast inference will be super helpful. Also, imagine how valuable inference becomes when proper robotics transformers finally get figured out; lightning-fast reflexes, unstoppable killing machines...

    • @gentleman9534
      @gentleman9534 4 months ago +1

      The xtropic chip can train models extremely fast and extremely cheaply, with power consumption so low it is almost nothing.
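
    A minimal sketch of the "spend more inference compute, keep only the best outputs" idea from the reply above; generate() and score() are hypothetical stand-ins for a model call and a reward model, not a real API:

    ```python
    # Best-of-n sampling: burn n inference calls, keep the highest-scoring answer.
    from typing import Callable

    def best_of_n(prompt: str,
                  generate: Callable[[str], str],
                  score: Callable[[str, str], float],
                  n: int = 16) -> str:
        candidates = [generate(prompt) for _ in range(n)]      # n x the inference cost
        return max(candidates, key=lambda ans: score(prompt, ans))
    ```

    The top-scoring samples are exactly the "very good inferences" that could be folded back in as training data, which is why cheap inference also helps training indirectly.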

  • @woolfel
    @woolfel 4 months ago +5

    Something will replace transformers. Will they be able to adapt and change fast enough when the transformer architecture evolves into something else? Things are moving fast, and research shows there's lots of room for improvement.

  • @TomM-p3o
    @TomM-p3o 4 months ago +8

    Inference costs will start dropping based on how quickly these guys can deliver their product. Assuming the same price as Nvidia cards and a 20x inference improvement, inference cost should drop by 20x (rough arithmetic after this thread). I bet, however, that their accelerators will be cheaper than Nvidia's.

    • @aifluxchannel
      @aifluxchannel  4 months ago +2

      It's going to get interesting, especially with the already apparent lull in the GPU compute market.
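
    The arithmetic in the parent comment, spelled out; the card price, baseline throughput and service life below are placeholder assumptions, not quoted figures:

    ```python
    # Same hardware price, 20x the throughput => ~1/20 the hardware cost per token
    # (power and hosting ignored). All numbers are placeholders for illustration.

    card_price_usd = 30_000             # assumed identical price for both accelerators
    gpu_tokens_per_sec = 1_000          # assumed baseline throughput
    speedup = 20                        # claimed inference improvement
    lifetime_sec = 3 * 365 * 24 * 3600  # amortized over an assumed 3-year service life

    def usd_per_million_tokens(tokens_per_sec: float) -> float:
        return card_price_usd / (tokens_per_sec * lifetime_sec) * 1e6

    print(usd_per_million_tokens(gpu_tokens_per_sec))            # baseline
    print(usd_per_million_tokens(gpu_tokens_per_sec * speedup))  # exactly 1/20 of it
    ```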

  • @fontenbleau
    @fontenbleau 4 months ago +1

    It would be interesting to see a real demo of this; that is what my robot needed!

  • @gerardolopez9368
    @gerardolopez9368 4 months ago +2

    Nemotron was released a couple of days ago; I definitely see a race going on 🔥🔥🔥🔥💡💡

  • @sigmata0
    @sigmata0 4 months ago +2

    It's like any other kind of technology: those who enter first can get superseded by those who follow as technology, techniques and thinking change. It's great because it keeps the pressure on companies to innovate and improve. With the resources at NVIDIA's disposal, they too can observe improvements by others and include them, with additions, in the next iterations of their products.

  • @DigitalDesignET
    @DigitalDesignET 4 months ago +4

    Very interesting, keep us informed. Thank you so much.

  • @supercurioTube
    @supercurioTube 4 months ago +5

    If gen AI has already settled on transformers, I expect mobile SoCs and laptops to add a transformer ASIC block to each chip,
    just like the video encoder and decoder blocks for individual codecs.

    • @aifluxchannel
      @aifluxchannel  4 months ago

      It makes more sense to have an API backed by an entire datacenter of these ASICs. Far cheaper.

    • @supercurioTube
      @supercurioTube 4 months ago +1

      @@aifluxchannel Maybe combining both? Like Apple's implementation with "Apple Intelligence" on iPhones,
      but with a transformer ASIC added to the NPU.

    • @PaulSpades
      @PaulSpades 4 months ago

      @@aifluxchannel LLMs are now capable as a natural-language text and voice interface. This has been a goal of HCI for 50 years.
      Other multimedia tasks can be handled via remote compute, but the command interface and interpretation need to be local.

  • @pigeon_official
    @pigeon_official 4 months ago +3

    500k T/s is genuinely the most unbelievable, absolutely insane thing I've ever heard in my life, so until we actually see it in real life I will continue not to believe it. But I hope it's true so badly (some quick arithmetic below).
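
    Some quick arithmetic to put 500,000 tokens/s of aggregate throughput in perspective; the per-request split is an assumption for illustration:

    ```python
    # What an aggregate 500,000 tok/s would mean if split across many users.
    aggregate_tps = 500_000
    concurrent_users = 1_000      # assumed
    response_tokens = 2_000       # assumed long reply

    per_user_tps = aggregate_tps / concurrent_users      # 500 tok/s per user
    seconds_per_reply = response_tokens / per_user_tps   # 4 s for a 2,000-token reply

    print(per_user_tps, seconds_per_reply)
    ```

    Even split a thousand ways, that is still far faster than anyone can read, which is why the claim sounds unbelievable.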

  • @vishalmishra3046
    @vishalmishra3046 4 months ago +7

    *There were no transformers (the Attention abstraction) before 2017*
    Etched is incorrectly assuming that transformers (unlike RNNs/LSTMs etc.) will *NEVER* be replaced by an entirely new innovation built on an abstraction that is more effective than *Attention* (the operation itself is recapped after this thread). From that moment on, all LLMs, including Vision Transformer based models, will migrate to the new, better-than-attention mechanism, and Sohu will need to be replaced by another special-purpose ASIC. Meanwhile, Nvidia is incorporating a version-3 transformer engine in its post-Blackwell (*Rubin*) architecture.

    • @biesman5
      @biesman5 4 months ago

      It's obvious that transformers will eventually be replaced, and they know it as well. I guess they are betting that it won't happen overnight, so they might as well make some cash in the meantime.

    • @5omebody
      @5omebody 16 days ago

      Doesn't matter in the slightest. If people are spending $100B+ on training, then a $10M+ investment in new hardware every time there's a new architecture will absolutely be worth it. The Transformer Engine isn't dedicated hardware, so unless they do something really special, this _10x_ improvement will absolutely be worth it even if the chips are used for less than a year.
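
    For reference, the attention abstraction the parent comment dates to 2017 is the scaled dot-product from "Attention Is All You Need":

    ```latex
    % Scaled dot-product attention (Vaswani et al., 2017)
    \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
    ```

    A successor abstraction would replace this one formula, and with it the fixed dataflow that an attention-only ASIC is built around, which is the risk the comment is pointing at.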

  • @elon-69-musk
    @elon-69-musk 4 months ago

    The largest amount of compute will be needed for inference to produce synthetic data, so this type of chip might be more important than training chips.

  • @nagofaz
    @nagofaz 4 months ago +5

    Hold on a second. We're all getting worked up over this, but has anyone actually seen a real card yet? I can't help but feel we should be a lot more skeptical here. The whole thing just reeks of 'too good to be true'. Look, I'd love to be wrong, but let's face it - this wouldn't be the first time someone's tried to pull a fast one in this industry. These guys aren't blind; they know the market's on fire right now, with money being thrown around like there's no tomorrow. Maybe we should take a step back and think twice before buying into all this hype.

    • @DonG-1949
      @DonG-1949 4 months ago

      This reads like when I tell Claude to act human.

  • @issay2594
    @issay2594 4 months ago +2

    Nvidia is trapping itself. It's enough to remember how 3D graphics started: 3dfx created the GPU to handle a specific load, mesh calculations, to make Quake run fast. The GPU was a tool to do a certain task faster, not "everything in the world should forever be done on the GPU", the way it all used to be done on the CPU before.

  • @tamineabderrahmane248
    @tamineabderrahmane248 4 months ago

    The AI hardware accelerator race has started!

  • @falklumo
    @falklumo 4 months ago +1

    The challenge today is still training, not inference.

  • @novantha1
    @novantha1 4 months ago +3

    I think this is less interesting as a specific product than it appears on the surface. It's certainly interesting for its implications for the industry, but in terms of direct relevance to "ordinary" people who like to buy a piece of hardware and use it (as opposed to being locked behind APIs), it's way less useful.
    Adding onto that, I think the most interesting thing in inference workloads is ultra-low precision; forget FP8, why aren't they doing Int8? Int4? Int2 (as per BitNet 1.58, I think 1.5-bit should be possible with a dedicated ASIC) could be incredible (toy sketch after this comment).
    Floating-point numbers aren't actually very easy to work with at the hardware level, so they seem like a really weird choice for an inference-only chip; I'm sure they could have squeezed even more performance out of using the same number of bits as integers, let alone low-precision integers. (I think Int2 could potentially be ~4.5x faster, off the top of my head.)
    More to the point, given that I'm primarily interested in hardware I can actually buy, if I wanted something like this, Tenstorrent's accelerators seem way more interesting and affordable.
    With all that naysaying said, the one thing here that seems really interesting is that Moore's Law isn't actually dead; we're still getting more transistors. The issue is that it becomes harder and harder to control them all in a centralized manner (as on a CPU), hence CPU performance scaling has declined, and even with parallelization you see the same leveling-off of improvements over time in GPUs, due to things like cache access.
    I'm no hardware engineer, but it seems to me that, in addition to gaining performance by removing roadblocks (i.e. including only the hardware needed to compute a transformer network, with no CPU- or GPU-specific elements), transformer-specific hardware should keep scaling with process-node improvements to a greater degree than existing legacy architectures, much as Bitcoin ASICs did.
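
    A toy sketch of that ultra-low-precision idea with BitNet-style ternary weights, using NumPy; the absmean threshold here follows the spirit of BitNet b1.58 but is a simplification, not the actual training recipe:

    ```python
    # Toy ternary (~1.58-bit) weight quantization: weights become {-1, 0, +1} plus one scale.
    import numpy as np

    def quantize_ternary(w: np.ndarray):
        scale = np.abs(w).mean()                                   # one scale per tensor
        q = np.clip(np.round(w / (scale + 1e-8)), -1, 1).astype(np.int8)
        return q, scale

    def ternary_matmul(x: np.ndarray, q: np.ndarray, scale: float) -> np.ndarray:
        # With {-1, 0, +1} weights the "multiplies" collapse into adds, subtracts
        # and skips, which is what makes a dedicated integer ASIC attractive.
        return (x @ q.astype(np.float32)) * scale

    rng = np.random.default_rng(0)
    w = rng.normal(size=(256, 256)).astype(np.float32)
    x = rng.normal(size=(1, 256)).astype(np.float32)
    q, s = quantize_ternary(w)
    print(np.corrcoef((x @ w).ravel(), ternary_matmul(x, q, s).ravel())[0, 1])
    ```

    The numerics are the easy part; the hardware win the comment is pointing at comes from replacing floating-point multipliers with integer adders.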

  • @darkreader01
    @darkreader01 4 months ago +1

    Like the Groq LPU, won't they provide a platform for us to try it out ourselves?

  • @manonamission2000
    @manonamission2000 4 months ago

    self-driving cars could use this

  • @DonaldHughesAkron
    @DonaldHughesAkron 4 months ago +2

    More than wanting to buy one: how do you buy stock in this company? Are they on the market yet?

  • @AltMarc
    @AltMarc 4 months ago

    Excerpt from a Nvidia Press release:
    "DRIVE Thor is the first AV platform to incorporate an inference transformer engine, a new component of the Tensor Cores within NVIDIA GPUs. With this engine, DRIVE Thor can accelerate inference performance of transformer deep neural networks by up to 9x, which is paramount for supporting the massive and complex AI workloads associated with self driving."

  • @sikunowlol
    @sikunowlol 4 months ago +1

    this is actually huge news..

  • @TomM-p3o
    @TomM-p3o 4 months ago +5

    The majority of AI compute over the last year has been used for inference. If these guys can deliver these accelerators and servers quickly, with no issues, Nvidia won't know what hit them.

  • @ps3301
    @ps3301 4 months ago +5

    When a new AI model appears, this startup will tank.

    • @aifluxchannel
      @aifluxchannel  4 months ago +5

      Well... the entire point of this hardware is that it can run anything transformer-based. All SOTA models, closed and open source, are transformer-based.

    • @wwkk4964
      @wwkk4964 4 months ago

      @@aifluxchannel They're really hoping that an SSM hybrid will not outcompete a pure transformer-based model.

    • @JankJank-om1op
      @JankJank-om1op 4 months ago

      Found Jensen's alt.

    • @HaxxBlaster
      @HaxxBlaster 4 months ago

      @@aifluxchannel That's what he is saying: new models which will not be transformer-based. But transformers will surely be used until the next breakthrough.

  • @seefusiontech
    @seefusiontech 4 months ago +1

    Where is the John Coogan video? I looked and couldn't find it.

    • @aifluxchannel
      @aifluxchannel  4 months ago

      Linked in description but I'll put it here as well - x.com/johncoogan/status/1805649911117234474/video/1

    • @seefusiontech
      @seefusiontech 4 months ago

      @@aifluxchannel Thanks! I swear I looked, but I didn't see it. BTW, I watched it, but your vid had way better info :) Keep up the great work!

  • @yagoa
    @yagoa 4 months ago +1

    Without making chips that can combine memory and processing on the wafer, you can't create a human-like intelligence.

    • @aifluxchannel
      @aifluxchannel  4 months ago +3

      You just need enough of them networked together ;)

  • @TheReferrer72
    @TheReferrer72 4 months ago +1

    Inference is not the biggest capital outlay for the big players, and I thought the Googles, Microsofts, Metas and Amazons of the world were already making their own GPUs.
    It's making the foundation models that requires big CapEx spend, so this is of little threat to Nvidia. Or am I missing something?

  • @Wobbothe3rd
    @Wobbothe3rd 4 months ago +2

    Aren't transformers going to be replaced by Mamba/S4 models? And even if that isn't true, is it really a good bet to assume the transformer will be the dominant type of model forever?

    • @aifluxchannel
      @aifluxchannel  4 months ago +1

      Not necessarily. Mamba (SSM) and Eagle (RNN) based models exist as alternatives to transformers mainly because they help with the problem of scaling compute. If Etched has actually solved this with transformer-specific hardware, the performance wins of Mamba start to look much less impressive.

    • @loflog
      @loflog 4 months ago

      I wonder if problems like "lost in the middle" will come to the forefront once hardware scaling is eased.
      I don't think we actually know today whether there are architectural blind spots in transformers that will motivate new architectures to emerge.

  • @twinsoultarot473
    @twinsoultarot473 17 minutes ago

    So... is anyone here investing in the Sohu chip? Is it publicly traded?

  • @bigtymer4862
    @bigtymer4862 4 months ago +2

    What’s the price tag for one of these though 👀

    • @aifluxchannel
      @aifluxchannel  4 months ago

      More than any of us can afford, haha. But I bet it's within striking distance of the Nvidia B200, at least per rack.

  • @styx1272
    @styx1272 4 months ago +1

    BrainChip's revolutionary spiking neural Akida 2 chip has its own revolutionary new algorithm, called TENNs, giving ultra-low power consumption and possibly productivity similar to Etched's, while being highly adaptable. And it doesn't require a CPU or memory to operate.

  • @bpolat
    @bpolat 4 months ago

    One day this type of chip will be in computers or mobile devices and work with the largest text or video models without even needing the internet.

  • @GerryPrompt
    @GerryPrompt 4 months ago +5

    500,000 tokens/s???? 😂 Groq is TOAST

    • @aifluxchannel
      @aifluxchannel  4 months ago +2

      They certainly have a lot of catching up to do. Glad they're no longer the "first" to enter this space. Do you use Groq with open-source models, or just run locally with your own GPU?

    • @hjups
      @hjups 4 months ago +2

      Groq's main attribute is latency, not throughput (although they have been marketing for throughput). While latency can still be low in the cloud, Groq is also used for sub-ms applications like controlling particle accelerators (LLMs were more of a "oh and we can do that too").

    • @MrDragonFoxAI
      @MrDragonFoxAI 4 months ago

      Groq is also old: a 14nm process. Once they switch to the new Samsung node it should be vastly different. The big win here is SRAM vs off-die HBM3e.

    • @hjups
      @hjups 4 months ago

      @@MrDragonFoxAI 14nm is not "old" in the ASIC world. I don't know if Sohu stated a process node, but I would not be surprised if they did a tapeout at 14nm.
      If I recall correctly, the new LPU2 is moving to 7nm and will also have LPDDR5. But that comes with other challenges, since Groq relies on cycle-accurate determinism (which is only possible with SRAM). So it's an engineering tradeoff.
      Also keep in mind that a Sohu die is likely much larger and more power-hungry than a Groq LPU die; I would not be surprised if Sohu were at the reticle limit (again a tradeoff).

    • @MrDragonFoxAI
      @MrDragonFoxAI 4 months ago

      @@hjups They did; they're aiming for a 4.5nm TSMC node, which is probably what the dev silicon was done on too.

  • @jonmichaelgalindo
    @jonmichaelgalindo 4 months ago +2

    Can it run a diffuser like SDXL? Or a diffusion-transformer like SD3 / Sora?

    • @jonmichaelgalindo
      @jonmichaelgalindo 4 months ago +1

      They claim SD is a transformer, which is half-true, but the VAE is a convolver, a CNN. They specifically said they can't run CNNs. Are they gaslighting right now? Did they sink everything into text-only hardware that's already out of date now that multimodal is taking over?

    • @aifluxchannel
      @aifluxchannel  4 months ago

      Yep, it's been tested with SD3 and can run any transformer-based model (Sora included).

    • @aifluxchannel
      @aifluxchannel  4 months ago +1

      This was also my first thought - but they claim to have already run SD3 on the device.

    • @jonmichaelgalindo
      @jonmichaelgalindo 4 months ago +1

      @@aifluxchannel I bet they ran just the transformer backbone: pass the latent noise in over a bus transfer, run the denoise steps on Sohu's hardware, transfer the latent output back to the GPU with a stop-over in RAM, then run the VAE (roughly the flow sketched after this thread). That's a lot of hops.

    • @hjups
      @hjups 4 months ago

      @@aifluxchannel Do you have a link to this claim? It seems rather foolish to support SD3, since you are necessarily throwing away the algorithmic improvements that come with causal attention; i.e. they might have gotten 1M tok/s if they had not supported non-causal attention.
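
    A rough sketch of the split described above; every function name here (sohu_denoise_step, gpu_vae_decode) is a hypothetical placeholder, and the point is only to show where the device-to-device hops would sit:

    ```python
    # Hypothetical SD3-style split: diffusion transformer on the ASIC, VAE decode on a GPU.
    import numpy as np

    def sohu_denoise_step(latent: np.ndarray, t: int, prompt_emb: np.ndarray) -> np.ndarray:
        """Placeholder for one transformer denoise step on the accelerator."""
        return latent  # no-op stand-in

    def gpu_vae_decode(latent: np.ndarray) -> np.ndarray:
        """Placeholder for the convolutional VAE decoder on a GPU."""
        return latent  # no-op stand-in

    def generate_image(prompt_emb: np.ndarray, steps: int = 28) -> np.ndarray:
        latent = np.random.randn(1, 16, 128, 128).astype(np.float32)  # hop 1: host -> accelerator
        for t in reversed(range(steps)):
            latent = sohu_denoise_step(latent, t, prompt_emb)         # stays on the accelerator
        # hop 2: accelerator -> host RAM, hop 3: host RAM -> GPU for the CNN decode
        return gpu_vae_decode(latent)
    ```

    Whether the hops matter depends on how few times the latent crosses the bus relative to the denoise loop that stays on-chip.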

  • @cacogenicist
    @cacogenicist 4 months ago +3

    Wen consumer home ASIC LLM inference machines for under $4,000? 😊

    • @aifluxchannel
      @aifluxchannel  4 months ago

      On eBay in five years when these start to show up haha.

  • @HaxxBlaster
    @HaxxBlaster 4 months ago +1

    Isn't this just theoretical so far? If not, where is the demo?

    • @aifluxchannel
      @aifluxchannel  4 months ago

      No, they've fabbed the prototypes at TSMC already; it goes into production at TSMC within the month.

    • @HaxxBlaster
      @HaxxBlaster 4 months ago

      @@aifluxchannel Thanks for the reply, but I need more to be convinced this can become a real consumer product. A prototype is one thing, but there could be a lot of other obstacles on the way to a real product. Good luck to these guys, but I get an instant red flag when there's a lot of talk and no real product yet.

  • @obviouswarrior5460
    @obviouswarrior5460 4 months ago

    €4,000?
    I have money! I wish to buy one Sohu (and more after)!
    Where can we buy it?!

  • @tsclly2377
    @tsclly2377 4 months ago +1

    Watch Microsoft: they have gone the complete binary AI route, and they have the core of users they want to serve, gaming and business. The one thing that is blatantly obvious is that the trend is toward server centralization, when that may not be in the best interests of the user or give the best results.

  • @cacogenicist
    @cacogenicist 4 months ago +4

    Of course when we have a model with "average human intelligence," it's not going to have the _knowledge_ of an average human -- it will be vastly more knowledgeable than any human.
    It'll be like an average human only in specific sorts of reasoning, where they are rather poor presently.
    The top models are already quite good at verbal analogical reasoning.

  • @ScottyG-wood
    @ScottyG-wood 4 months ago +1

    Dead on arrival if data center plus inference-only is their strategy. The initial sales will be great, but Nvidia, AMD, Intel, and Google are doubling down on merging inference and training. When you're targeting the data center, the model that Sohu is taking will not make it. Their positioning will make or break them.

  • @maragia2009
    @maragia2009 4 months ago

    What about training? Inference is only one part; training is really, really, really important.

  • @bozbrown4456
    @bozbrown4456 4 months ago +1

    Nothing about power usage / price.

    • @aifluxchannel
      @aifluxchannel  4 months ago +1

      We currently don't have any information from Etched regarding price or power usage. However, given the size of the die and the rack density, I think it's safe to bet that Sohu is far more power-efficient than the Nvidia B200.

  • @theatheistpaladin
    @theatheistpaladin 4 months ago +1

    We need a Mamba ASIC LPU.

  • @K9Megahertz
    @K9Megahertz 4 months ago +1

    Not sure it's wise to invest in making an AI chip only for transformers when transformers are not what is going to take us to the next level. Transformers are limited regardless of context size or how many tokens you can generate per second.

    • @IntentStore
      @IntentStore 4 months ago +2

      20x faster inference would take us to the next level. This makes AI-ifying everything basically free. It doesn’t have to be the permanent future, it’s just radically improving the usefulness of current models and enabling new applications which couldn’t be done before due to token speeds and latency, like making live voice assistants respond as quickly as a real person, and giving them CD quality voices instead of sounding like a Skype call.

    • @IntentStore
      @IntentStore 4 months ago +2

      It also means coding assistants and coding automation go from being sorta possible to generating a working, tested application with 200,000 lines of code in a few seconds.

    • @IntentStore
      @IntentStore 4 months ago +1

      Of course as soon as a better architecture is proven, they can immediately begin R&D on an asic for that too, because they will have tons of revenue from their previous successful product and investors.

    • @K9Megahertz
      @K9Megahertz 4 months ago +3

      @@IntentStore Faster crap is still crap.
      Transformers will never be able to do that. They fail simple algorithmic coding tests I give them, because they don't know how to code; they only spit out code they've been trained on that aligns with whatever prompt you give them.
      The reason they can't get the answer to one of my problems right is that the answer lies in a single PDF that was on the net 25 years ago (it's still out there). I know this because I helped fix a bug in the algorithm back then. It's nothing complicated, maybe 100 lines of code.
      In order for a transformer to spit out a 200,000-line program, it would have had to be trained on a 200,000-line program; multiple, in fact. Which means the code would already have had to be written, and at that point you already have the code, so what do you need a transformer AI for?
      Transformers won't get significantly better; they can't. They're limited by their probabilistic text-prediction noodles, which just doesn't work for software.

    • @styx1272
      @styx1272 4 months ago +1

      @@K9Megahertz What about BrainChip's multivariable spiking neural Akida 2 chip with its unique TENNs algorithm? It may possibly steamroll the competition.

  • @lavafree
    @lavafree 4 months ago +2

    Nahhh... Nvidia would just integrate more specialized circuits into its chips.

    • @aifluxchannel
      @aifluxchannel  4 months ago +2

      But their purpose is to ship narrowly general compute devices, not ASICs. The limiting factor is that Nvidia GPUs have to rely on batch processing rather than streaming.

  • @nenickvu8807
    @nenickvu8807 4 months ago

    There is an issue of shortsightedness in designing chips to do inference alone, especially for transformer-based models. Transformer-based models are probabilistic in nature, and businesses and individuals can't bet on them alone. Other forms of software and hardware need to pair with this before it really has long-term value.
    It's a good start, but NVIDIA is still ahead. After all, NVIDIA uses its GPUs not just to generate probabilities that are novel and interesting; it uses them to design chips and model physics. Reliability and consistency are the real magic here. And no one, not Meta or Google or OpenAI, has been able to hone AI to the point where it is reliable and consistent. And they will never get there with simple inference chips like these.

    • @aifluxchannel
      @aifluxchannel  4 months ago +1

      Good point, but that stated use case is far larger than what Etched has set out to solve with Sohu. Sohu is intended for one thing and one thing only: inference with transformer-based models.

    • @nenickvu8807
      @nenickvu8807 4 months ago

      @@aifluxchannel That's the problem. Competitor chips like Sohu may catch up to past use cases, but the industry has already evolved. Retrieval-augmented generation is already the standard. Agentic and AI-team-based approaches are already becoming an expectation. So where is the investment going to come from to pay for this super inference chip that has limited or no collaborative functionality with other software and hardware, even if it inferences more quickly?
      And there is always the threat that the next big model might not be based on transformers and will require different hardware. And the year after that, and the year after that. After all, software evolves quickly; hardware rarely does.

  • @jameshughes3014
    @jameshughes3014 4 months ago +1

    Wow. Designing and building hardware that can only run transformers, not knowing if something better will come along. But what an amazing payoff if they can be cost-effective on release.

    • @aifluxchannel
      @aifluxchannel  4 months ago

      They have at least 2-3 years to let things play out. At least for now, the only advantage of Mamba and RNNs is their preferable scaling behavior. Transformers still have the upper hand in raw performance.

    • @jameshughes3014
      @jameshughes3014 4 months ago

      @@aifluxchannel Oh I agree. Even if a novel algorithm takes over in a month and is the new hot thing, there's so much code now that uses transformers that it's gonna be useful.

  • @sativagirl1885
    @sativagirl1885 4 months ago

    Human years to #AI are like dog years to humans.

  • @frodenystad6937
    @frodenystad6937 4 months ago +1

    This is real

  • @testingvidredactro
    @testingvidredactro 3 months ago +1

    Good luck; I hope they succeed. GPU/TPU/NPU/APU manufacturers and clouds are just burning power with their suboptimal solutions for ML and charging too much for it...

    • @aifluxchannel
      @aifluxchannel  3 months ago +1

      Datacenter acceleration makes sense, but IMO the NPU / TPU route is a waste of time and engineering horsepower.

  • @Zale370
    @Zale370 4 months ago +4

    Finally someone showing how Jensen's marketing BS was just that!

    • @aifluxchannel
      @aifluxchannel  4 months ago +4

      Etched and Jensen make very different products focusing on the same market segment.

  • @bobtarmac1828
    @bobtarmac1828 4 months ago

    AI overhype. Can we... cease AI? Or else be... laid off by AI, then human extinction? Or suffer an... AI new world order? With swell robotics everywhere, AI job loss is the top worry. Anyone else feel the same?

  • @dontwannabefound
    @dontwannabefound months ago

    Total snake oil

  • @MARKXHWANG
    @MARKXHWANG 4 months ago

    And I can make a chip 10000X faster than them

  • @apoage
    @apoage 4 months ago +1

    ouh

  • @Maisonier
    @Maisonier 4 months ago +14

    This is fake news dude...

    • @aifluxchannel
      @aifluxchannel  4 months ago +1

      Nvidia will definitely revise their B200 benchmark numbers, but the areas where Etched tweaks their statistics are the same places where Groq and Cerebras have also made claims. Regardless, these chips are currently the most performant when it comes to inference on transformer-based models.

    • @HemangJoshi
      @HemangJoshi 4 months ago

      I also think so about the comparison of all the Nvidia GPUs: you have shown here that Nvidia GPUs are not getting better, but they are in compute per unit of energy. That is not only getting better, it is breaking Moore's law.

    • @GilesBathgate
      @GilesBathgate 4 months ago +1

      My question would basically be "how": transformer attention is basically matmul, and the FFN is also matmul. I can understand how ASICs are better when your algorithm is sha(sha()), since GPUs are not designed to do that, but how are these gains made when the algorithms are the same? Memory access patterns? (A minimal view of those matmuls follows this thread.)

    • @AltafKhan-qd1tk
      @AltafKhan-qd1tk 4 months ago

      I'm literally working on developing this chip lol

    • @GilesBathgate
      @GilesBathgate 4 months ago

      @@AltafKhan-qd1tk So is it just more die area for floating-point units, by removing the stuff a GPU only has for graphics tasks? Is it basically a TPU?
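
    A minimal NumPy view of the question above: one attention head and an FFN really are just a handful of matmuls, so any ASIC advantage has to come from the fixed dataflow, fusion and memory access patterns around them rather than from different math. This is an illustrative sketch, not Sohu's actual design:

    ```python
    # One attention head plus an FFN, written as plain matmuls.
    import numpy as np

    def softmax(z, axis=-1):
        z = z - z.max(axis=axis, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=axis, keepdims=True)

    def attention(x, Wq, Wk, Wv):
        q, k, v = x @ Wq, x @ Wk, x @ Wv                  # three matmuls
        scores = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # one matmul + softmax
        return scores @ v                                 # one more matmul

    def ffn(x, W1, W2):
        return np.maximum(x @ W1, 0.0) @ W2               # two matmuls + ReLU

    rng = np.random.default_rng(0)
    d, n = 64, 8
    x = rng.normal(size=(n, d))
    Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
    out = ffn(attention(x, Wq, Wk, Wv), rng.normal(size=(d, 4 * d)), rng.normal(size=(4 * d, d)))
    print(out.shape)  # (8, 64)
    ```

    With the operator graph frozen, scheduling, on-chip memory reuse and control logic can all be hard-wired, which is presumably where the claimed gains come from rather than from the arithmetic itself.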

  • @Beeti1
    @Beeti1 2 months ago

    So far nothing but claims and nothing to show for it.

  • @lb5928
    @lb5928 4 months ago +2

    This clown just said no one is using the AMD MI300X. 😂
    Microsoft/OpenAI, Oracle, IBM, Amazon AWS and more have announced they are using the MI300X.
    Reportedly it is one of the best-selling AI accelerators right now.

    • @Manicmick3069
      @Manicmick3069 4 months ago +1

      That's where I stopped listening. AMD engineering is world-class. That's who NVIDIA needs to worry about. If ROCm comes with better integration, it's game time.