I hope these specialized chips completely take over the inference market and that future chips take over training at scale too.
I would like to see sane prices for GPUs again.
Yeah, though hopefully we'll see fab capacity scale to account for both. A lot of the price of a GPU is determined by what a chip of that size on that process node can be sold for. It's one reason AMD doesn't bring Radeon prices down as low as they could: it would make Radeon less profitable than Zen, eat into their margins, and both product lines have to share whatever capacity AMD can get from TSMC. Having more market share as a publicly traded company isn't valuable if it doesn't also mean higher net profit to reinvest into R&D for future performance/feature gains, and AMD already learned that the hard way with Vega.
Pretty sure AI isn't what people are buying those cards for; running things locally is a very, very tiny market, and actually running AI models commercially on consumer hardware is... not economical.
@@bits360wastaken Virtually every tech company is building software and hardware for local AI, what are you talking about?
8:49 lol at Ian playing it off like he was going for a smell and not a bite when she thought that
I like the conversation about how you keep backward and forward compatibility. As a software engineer in the consulting space, compatibility is the blessing and curse of maintaining code.
Yeah, it seems a bit optimistic, especially when the API sits so close to the hardware.
Even for Jim Keller it will be a hard task to catch up on 10 years of CUDA and the whole software stack that rests on top of it. I really hope they succeed. Software-Hardware co-design is really the crucial aspect here.
They have to hit the hobbyist entry point to make a mark.
Can't wait to get one of these, hold it above my head and shout "I HAVE THE POWER!!!!!"
I can literally hear the guitar riffs!
As Cringer transforms into Battle Cat!
AI-Man: "By the power of Grayskull.... I have the power!" 🙃
6:20 Delta makes a 60x20 blower that'd fit the form factor far better. Slim down that unit to properly occupy a low-profile slot in, say, a compact machine like HP's e1000 micro servers. I'd also recommend bringing in a cheap low-power microcontroller to monitor and manage the fan's speed, to reduce overall system noise and allow you to optimize the fan curve.
I think I'm going to gift myself a Grayskull AI Accelerator for my birthday
Facts
Every success with these dev kits that allow developers to get their heads around the hardware and software stacks. The level of transparency and authenticity displayed in all Tenstorrent interviews is very encouraging versus watching a slick marketing pitch to hype up the crowd. Many comments are about the LPDDR size, and perhaps those are from people wanting to plug in a card and run an LLM. The amazing tech in the chip and software stacks, with accessibility, is where the value is, as it is not difficult to place more LPDDR chips. Our application is a multimodal authentication inference engine at the edge, where speed, low power and accuracy are the key figures of merit, so we are looking forward to getting our hands on the dev kit.
Sorry to use this reference, but as SJ used to say, "Great Products Ship". You cannot try things out unless they're manufactured and in your hands. 'Announcements' don't run LLMs. 😸
The "It does fit in your desktop", is such a underrated burn on NVIDIA/AMD haha 🔥
So Grayskull is useful for 2016 workloads?
Wormhole seems barely useful for today's tasks either, with that memory limitation? Maybe image decoders? Does the compilation run completely on the card? I mean, there is a lot of compute on board - so could I run it as a language model server and then use my system for something else?
Or am I supposed to buy 8 of these... and put them into a server board with a server CPU? A single Groq card costs $20k and has no memory.
Perhaps it's a developer kit, but not researcher friendly, it seems.
I want an inference card to run 70B models in my workstation, preferably driven directly via accelerate, so I can write device-agnostic code and load any model from HF at any precision, from fp32 to bf16 to fp16 to quantized models (roughly the flow sketched below). So your roadmap is to be upstreamed into PyTorch 2.0 natively? That is about half a year late, and today we had the release of PyTorch 2.2. Intel is aiming to get their GPUs upstreamed by PyTorch 2.5 in October, which will also be a backend switch to Triton.
Perhaps I should sign up on the website and share my requirements.
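For reference, a minimal sketch of the device-agnostic flow I mean, using Hugging Face transformers with accelerate. The model ID and dtype here are just placeholders, and any backend that gets upstreamed into PyTorch would be picked up automatically through device_map="auto":

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-70b-hf"   # placeholder; any HF causal LM works
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # fp32 / bf16 / fp16 are all valid here
    device_map="auto",            # accelerate spreads layers across whatever devices exist
)

inputs = tok("The capital of France is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=20)
print(tok.decode(out[0], skip_special_tokens=True))
```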
They should forget that 8GB of LPDDR and just give us fast access to one or two NVMes, done. I would never complain about memory again.
@@danielme17 For what? You wouldn't be able to feed that compute with the puny bandwidth of an NVMe.
This is amazing news! Looking forward to ordering one.
Maybe they explain this in the video, but it says "TT-Buda: Run any model right away" and the Grayskull card is only 8GB. Won't you be limited to models under 8GB, or can it leverage your CPU's RAM?
The same question... I'm down to paying that price, but if it's barely an advantage over a similarly priced GPU, then I might as well buy a more flexible GPU.
The latency is too high on the PCIe bus to use CPU RAM for large models with good performance. The only tensor accelerator which I have seen that can effectively run large models fast in shared memory is the Apple M GPU. Apple M can do this because they have a very good unified memory model and high bandwidth internal bus. (I have tried doing this on Ryzen with unified memory but the iGPU is not significantly faster than the CPU for LLM inference. I tested pytorch 2.1 / ROCm 5.7.1, on RDNA2 with Llama2:13b - AMD does not officially support ROCm on this GPU.)
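To see why the iGPU doesn't help much here, a rough sketch of my own (not from the video) that just measures copy bandwidth on the CPU versus the GPU device: with unified memory both pull from the same DRAM, and for memory-bound LLM inference that shared bandwidth, not compute, is the limit. On ROCm builds of PyTorch the iGPU still shows up as "cuda"; the buffer size is arbitrary.

```python
import time
import torch

def copy_bandwidth(device, gib=1, iters=10):
    # Rough read+write bandwidth (GiB/s) of a large tensor copy on one device.
    n = gib * 1024**3 // 4                 # number of float32 elements in `gib` GiB
    src = torch.randn(n, device=device)
    dst = torch.empty_like(src)
    if device != "cpu":
        torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(iters):
        dst.copy_(src)
    if device != "cpu":
        torch.cuda.synchronize()
    return iters * 2 * gib / (time.perf_counter() - t0)   # read + write per iteration

print("cpu :", copy_bandwidth("cpu"))
if torch.cuda.is_available():              # true on ROCm builds as well
    print("gpu :", copy_bandwidth("cuda"))
```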
It's simply not an LLM inference machine. Transformers, though highly hyped right now, are a small subset of machine learning. Also, once you nail the architecture, it might be easier to extend into larger memory (bandwidth is the problem, not the size). Asking for more memory is like asking for more megapixels on a camera, completely forgetting that you need to be able to fill and page that large bucket.
@@dinoscheidt Agreed, and given they are positioned as a dev kit of sorts, this is more than enough memory for someone to get small test builds up and running that will scale to larger pools on future hardware.
Not a bad start. While it might not quite outperform something like a 7900 XT, the pricing is decent, it's smaller and slightly more efficient, and the software support already looks pretty good.
But I think 8GB is going to be a bit limiting. Maybe with two cards installed it could be worth it for bigger models.
Looking at the website, documentation and repos, it all looks rather straightforward to use; the instructions and the structure of the pieces are easy to understand. So it's already ahead of AMD, for example.
I really hope that the tt-kmd driver gets mainlined into the upstream kernel first.
Hopefully these guys knock nVIDIA down a peg in the future. Competition is good.
Not with 8GB of LPDDR... they need VRAM (a lot of VRAM) and as high a bandwidth as possible.
I'm very envious of people that can program FPGAs. I have a master's in Comp Sci and no matter how much I try, I can't get my head wrapped around FPGAs and emacs.
Just start with VHDL on an FPGA with a good GUI studio. If that's still too difficult and you have money for a license, I'd recommend LabVIEW. It can target FPGAs as well as CPUs and is a graphical programming environment (a no-code solution) that is extremely approachable.
I find that they are easier to program if you take them to their natural habitat.... the countryside.
I think they are mainly used by cattle farmers to systematically control a set of access points to the pastures.
Basically... if you are out standing in your field (Computer Science) you will be able to figure it out.
On a serious note though... don't give up trying. With every attempt, although it may not seem like it, you are getting better at it; some things just have crazy steep learning curves.
I am pretty sure that a lot of concepts you learned in CS took a while to sink in, but they did :)
I hope you are able to envy your future self :)
When I studied digital electronics around 2001, we started with basic logic gates and built a traffic light system. I can't remember the name of the software we used, but it was a Xilinx FPGA we worked with, and it was mostly drag-and-drop placement to build up a digital circuit diagram that could be exported to the chip. It was much easier than programming an 8086 microcontroller in assembly language. :)
What? FPGAs can't be programmed in VIM?
@@sailorbob74133 🤣
This reminds me of the Physx add-in cards some 15 years ago. Unfortunately for them, single graphics cards very quickly became fast enough to do in-game physics themselves without requiring a separate card for the purpose. NVIDIA just swallowed Physx whole… as it had done with 3dfx before it. Since then, NVIDIA’s dominance has become all-encompassing. I’ve known NVIDIA almost since its inception…. it’s a hard-nosed company that takes no prisoners. My advice for other A.I. companies is to keep out of NVIDIA’s crosshairs.
I remember those! Always wondered why they disappeared.
I'd like to see how much it can accelerate inference. Some performance numbers would be great.
I'm very interested to know what Tenstorrent's plans are, if any, for getting their Linux drivers upstreamed into the mainline kernel. Having upstreamed drivers would really go a long way in giving me confidence these cards are going to have long term software support, independent of the fortunes of the company which created them.
Doctorate in FPGA, impressive
The logo will be something that will catch the attention of AMD's legal department...
If I were judge or jury in a trial on the IP, I would most certainly see a conflict with AMD's logo.
What things work on the architecture? Do things like TensorFlow work on these chips?
I love the C64 tshirt!
So many start-ups / companies today are built with one and only one goal: to demonstrate something narrow and not sustainable on its own, and finally (the Goal) be sold to big tech. Unfortunately, in the process, they must sell their beta or 'dev-kit' product to customers, basically using them as a free workforce. Competing with Nvidia? Oh, please.
This is presented as a dev kit, but for what purpose will someone invest their energy, hoping that the whole proprietary stack will not die and that it will be able to scale in the future? Basically: give me an example of a real-life use case for this dev kit today, in its current form.
Regardless of the above, it was a pleasure listening to Jasmina and Ian discussing the topic. Good job, Ian. And all the best, Jasmina. Hope Nvidia buys you for billions :)
Grayskull is tagged 2021 on their own roadmap? Isn't it too little too late?
In the development of Windows support, do you consider WSL?
I doubt it personally
I wish them all the best and success.
The dev kit memory seems a bit tiny, doesn't it? It's 8GB with a bandwidth of 118GB/s.
What can you do with that?
Maybe it streams from your system RAM and just caches?
Start developing on it :) It's like how ARM workstations absolutely sucked until stuff like Altra showed up; there just had to be something available to see what works.
@@lbgstzockt8493 RAM has abysmally low bandwidth.
@@lbgstzockt8493 That would be slow... I can tell you from practice, once your model spills outside of VRAM it gets very slow. A small spillover is sometimes not very detrimental, but it still slashes your speed 2x or 3x... of course that's still better than being 20x slower. Nvidia GPUs are literal AI monsters.
Yeah that's a $200 6600XT. Not quite sure what their idea is here, especially when GPUs are already extremely efficient for machine learning
Is this the SDI/3dfx 3D accelerator moment for AI accelerators?
Nah.... not even close.
@@Slav4o911 Reagan's Star Wars and Jurassic Park are coming.
I'm interested. What kind of performance difference do you get with these accelerators compared to Nvidia graphics cards?
I'm assuming it's not as good as a 4090 or something, but it's still probably significantly better than just running on my 16 core CPU.
So like where in that range does this thing sit? Or is it more about the interesting framework that enables more creative development?
Just looking at the bandwidth, it would be about 3x slower than an RTX 3060... for 2x more money... so not good... and I don't believe they have faster tensor cores, but even with faster tensor cores, the limiting factor is the bandwidth, not the tensors.
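Rough math behind that, using public spec-sheet numbers (rounded; my own figures, not from the video). For memory-bound LLM inference, tokens per second scales roughly with memory bandwidth:

```python
# Ballpark memory bandwidth in GB/s
grayskull_lpddr4_bw = 118   # Grayskull dev kit, as mentioned above
rtx_3060_gddr6_bw = 360     # 12GB RTX 3060

print(rtx_3060_gddr6_bw / grayskull_lpddr4_bw)   # ~3.1x, hence "about 3x slower"
```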
Is there any chance we'll see RISC-V laptops and PCs, like Ascalon or anything else? Or will ARM be the only option there?
Now that they have enterprise partnerships, they're getting into the rhythm of shipping. Devs will test, extrapolate use cases, and report feedback. Same as it ever was. Not enough memory? Same for every piece of hardware that gained traction.
Hello, can you stack them into a cluster?
8 GB of memory in 2024 just plain sucks. LLMs are all the rage right now, and even the smallest ones with 7B parameters need at least 16 GB of VRAM (DDR6, not DDR4). I don't see how anyone would be interested in these over the H100, which everyone drools over. At least increase the memory to 128 GB+ to drive some interest.
They're dev kits :) It's in the name.
@@TechTechPotato Would they be looking into using standard DDR RAM as a slower cache, or even an SSD as a direct connection? Have the SSD do a bulk transfer of contiguous memory, load a chunk of the model, then load it into RAM in FILO order, and have it running in a loop.
@@TechTechPotato My Nvidia AGX Orin with 64GB of RAM is also a dev kit :)
@@nadiaplaysgames2550 Unloading to SSD will be very slow, even on the fastest SSD. Even spillover to RAM makes the models very slow... I don't even try to unload to the SSD (and possibly destroy it, because it would be very heavily used). I mean, if a model fits fully inside VRAM and you use the "streaming option", the model starts to answer in around 5 seconds; if there is a small RAM spillover, the answer time progressively slows down to 20-30 seconds; and if the model runs fully in RAM you'll wait 200-300 seconds (depending on context length), which doesn't feel like chat but like sending an e-mail and waiting for an answer... it's possible but not fun at all. If it spills over to the SSD, the answer will probably come after an hour... if the SSD doesn't explode before that.
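For a rough sense of why each tier is that much slower, the bandwidth math (my own ballpark numbers, not from the video) already tells the story: during generation the weights are re-read for every token, so per-token time is roughly model size divided by the bandwidth of wherever the weights live.

```python
# Rough per-token / per-reply estimates for a ~7 GB (quantized 13B-class) model,
# assuming generation is memory-bandwidth-bound. Bandwidth figures are ballpark.
model_gb = 7.0
tiers = {
    "GDDR6 VRAM": 360,        # GB/s, RTX 3060-class card
    "DDR4 system RAM": 50,    # dual-channel desktop
    "NVMe SSD": 3.5,
}
tokens = 100                  # length of a typical chat reply
for name, bw in tiers.items():
    per_token = model_gb / bw # seconds; every generated token re-reads the weights
    print(f"{name:16s} ~{per_token*1000:5.0f} ms/token, ~{per_token*tokens:5.0f} s per {tokens}-token reply")
```

Real numbers are worse than this once PCIe transfers, prompt processing and longer contexts get involved, which is why spillover hurts so much in practice.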
Hilarious how much ignorance one comment demonstrates. How are you comparing an $800 dev kit card to an H100, which is north of $40K? A literal 50x price difference. Not to mention calling VRAM "DDR6, not DDR4" when GDDR6 is generationally aligned with DDR4, just specialized for GPUs.
Why was it branded "Taiwan" if the contract went to a Samsung fab?
This chip was technically GF I think. Packaging likely done in TW.
@@TechTechPotato Wait, Global Foundries...? But GF doesn't even have a 7/10nm fab -- how would these cards be able to even match a 4060 Ti with a process as old as 14/12nm?
Where to buy that?
Groq is way better, right?
Thank you 👍
I haven't heard Jim Keller mentioned since he bailed out of the Ryzen project in 2015. Considering how bad those early CPUs were, I'm guessing AMD didn't listen to his advice. Pretty sure he wouldn't think having the cache speed locked to the RAM speed was a good idea.
I've interviewed him multiple times! Covered his time at Intel, when he left, and his new ventures!
Cool, I'll look through your old videos. I discovered you through an article you wrote interviewing the head of Intel's OC lab in 2020. I'm getting back into overclocking and looking for an edge. He said per-core overclocking was the way forward, but I can't see how that's going to improve any of my CPU or 3DMark scores lol.
What the hell are you talking about? Zen 1 saved AMD; it was widely considered the saving grace for AMD, and even though it wasn't completely killing Intel's Core, it was a viable alternative that everyone celebrated. I also doubt you have any clue how to build a CPU to be commenting on AMD tying the RAM and L3 cache clocks together; cache coherency is literally the hardest problem in computer engineering, and I would gamble that Intel's chiplet designs will do something similar.
Get a consumer card with 48GB or more of memory out there for less than $1500 and you'll make hundreds of billions on edge AI computing. Please free us from the green giant and his little red minion.
That's right. That's right. That's right.
What's the performance like?
Just looking at the bandwidth number, it would be slow, even if their tensor cores are fast. You need fast RAM and a lot of bandwidth to feed the tensor cores, otherwise it's slow. Just look at how much bandwidth Nvidia GPUs have. Even if their tensor cores are faster than Nvidia's (which seems impossible to believe), they would need to feed them. Also, why didn't they put in a lot more RAM, at least 32GB? 8GB is very small; you can buy a 16GB RTX for about that price, which can start working immediately without any hassle.
What can I do with this card?
So Ian, you finally met your match! A solid PhD in FPGAs, really good at the stuff, with elegance and beauty to match. What can I say? A unicorn is so, so rare... yet we are looking at one!
Get a stack of these and revive your Pi calculations to 100 trillion, please!
Doctorate corn: 2 PhD for the price of 1.
But seriously, where will these chips be used from a consumer standpoint?
In pushing Nvidia out of AI so they can return to making graphics cards. :D
For training models used by consumers, probably smaller/more niche ones, given the big software players have their own chips or have the capital for NV. As a consumer you're probably never going to buy your own inferencing card, but maybe much further down the line you could see Tenstorrent IP in your CPU.
that fan placement looks so janky lol
Does anyone have a discount code to share ?
Nice Ian, but the C64 was before your time.
I'm older than I look. The C64 was my first system when I was young.
Source code?
For once real engineering talk instead of pure marketing s*it 👍
By the power of...AI!
Benchmarks?
Can I use it to run a Minecraft server?
are they hiring?
What card do I need for a local LLM?
@user-ef2rv9el9x Yup, I fixed that today, I got a new card today.
@user-ef2rv9el9x People are using Macs and MacBooks because of the unified high-speed memory as well.
Some Nvidia RTX with at least 16GB of VRAM... so definitely not 8GB. I have an RTX 3060 12GB and it's not enough for the bigger models, and once your model spills into regular RAM it becomes slow, so more VRAM is better. Also keep in mind AMD and Intel will not help you; you'll have a hard time running LLM models on them (if you have a problem, almost nobody will help you, because everybody uses Nvidia) and the models are optimized only for Nvidia GPUs.
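As a rough sizing rule of thumb (my own back-of-envelope math, not from the video): weight memory is roughly the parameter count times the bytes per parameter, and the KV cache and activations need headroom on top of that.

```python
# Approximate weight-only footprint in GB per precision (excludes KV cache / activations).
def weights_gb(params_billion, bytes_per_param):
    return params_billion * bytes_per_param   # 1e9 params * bytes, expressed in GB

for p in (7, 13, 70):
    print(f"{p:3d}B  fp16: {weights_gb(p, 2):5.1f} GB   int8: {weights_gb(p, 1):5.1f} GB   int4: {weights_gb(p, 0.5):5.1f} GB")
```

So a 7B model in fp16 is ~14 GB (hence the 16 GB cards), while 4-bit quantization is what makes 7B-13B models workable on 8-12 GB of VRAM.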
@@Slav4o911 Doing some research, the 4060 Ti is the highest you can get without selling an organ. I just hope the split memory bus and 8x lanes won't mess me up.
@@Slav4o911 Anything bigger than 16GB and it's a 4090.
❤
Can you play games on that thing?
Good Commodore t-shirt. ❤
take my money
So this dev board can only run tiny models from a few years ago? Disappointing. Even their bigger boards only have like 12GB.
No need to support Windows; CUDA doesn't do it anymore, so the community has shifted to Linux completely. Support for WSL is quite enough.
Would be nice if you could use it in combination with MATLAB; interesting product. Interesting woman, very eloquent.
Thank you for sharing ;-) We need more women in AI... it's urgent, to balance the outcome of humanity and AI!!
Will they sell AI chips for consumers?
We need someone to save us from Jensen Huang.
You can buy them now
I see
Anyway, I want to see a demo running on Grayskull.
That's right
So many RISC-V cores to process ML? I don't believe it's worth it.
Don't know if you're aware, but those cores implement a ton of custom instructions optimized for AI. That and all the networking etc. is where they get their TOPS/FLOPS.
@@bartios keep custom, remove cores)))
No, these aren't RISC-V cores. They're Tensix cores.
The Tensix cores are supposed to have five control RISC-V cores and a large compute engine. I'm not sure what the RISC-V cores in Grayskull actually are, though (extension-wise).
She mentioned Xilinx and Altera. AMD bought Xilinx and Intel bought Altera, and after that ruined both companies.
8GB of LPDDR4... for $599... bruh 💀. It's an interesting project, don't get me wrong, but I could do better with an off-the-shelf Nvidia GPU.
It's a developer kit.
I was originally very enthusiastic about RISC-V.
But from what I hear and see, it is just not performant and crashes continuously.
I am hopeful for the future, but until it is picked up by a credible company like Qualcomm / Intel / AMD / Nvidia / ARM / Samsung / ..., I doubt it will get to a mature point.
nice hardwarep0rn 🙂
Is Ian flirting? 😂
Pity there's too much BOTOX. She cannot even move her mouth anymore, let alone smile fully. OMGoodness
I feel uncomfortable watching this. Such an awkward thing
OKAY! WHAT IS an AI accelerator again?! Because you're all showing hardware but it's just software!! Why keep showing me a PCIe card when you can literally use USB 2!!
Is it funny to sell free ChatGPT as a new monster graphics chip!!??
I'm not a gamer to be fooled by DLSS & RTX!! YOU'RE TALKING TO AN IT VIEWER, NOT SOME HOME GAMING USER! SO WHO DO YOU WANT TO FOOL WITH THIS??! WHO!!
So pretty and smart..
Saying things like that makes someone feel uncomfortable and is weird
I do love Ian's Commodore shirt yes
I do love some well designed and placed pogo pins myself
@@tuqe But it is true, though I would say smart first.
@@AK-vx4dy nah still comes across as someone who has not spent enough time around women to realize that they are humans