Good, I will now try to run the 405B model on my $50,000 PC.
Is even a 50k PC fast enough? ;-)
You can run the 405B model on a server with at least 256 GB of RAM and a Vega 56 or 64 graphics card.
That specific card can access system RAM as if it were VRAM and, on some platforms, bypass the CPU.
You can also buy Optane PMem for cheap. Either add 4x4 Optane modules via PCIe, RAID 0 them, and add the array to swap (sketched below), or install them as DDR4 DIMMs to add a massive RAM pool.
I'll be experimenting with Vega 20 next week.
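A minimal sketch of that RAID 0 + swap setup, assuming four Optane NVMe drives appear as /dev/nvme0n1 through /dev/nvme3n1 (the device names, and wrapping the commands in Python rather than a plain shell script, are illustrative assumptions; it must run as root and will wipe those drives):

```python
# Illustrative sketch: stripe four Optane NVMe drives into RAID 0 and use the array as swap.
# Device names are assumptions; run as root; this DESTROYS any data on those drives.
import subprocess

devices = ["/dev/nvme0n1", "/dev/nvme1n1", "/dev/nvme2n1", "/dev/nvme3n1"]

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Create the RAID 0 array from the four drives.
run(["mdadm", "--create", "/dev/md0", "--level=0",
     f"--raid-devices={len(devices)}", *devices])

# Format the array as swap and enable it with high priority, so the kernel
# prefers it over any slower swap that is already configured.
run(["mkswap", "/dev/md0"])
run(["swapon", "--priority", "100", "/dev/md0"])
```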
I've not tried that, but I feel it will be too slow or not usable. Let me know how it goes for you.
@SchoolofMachineLearning I have confirmed (a) it is slow, but (b) it works.
Still waiting for my MI60 to come in, and I decided to get a Cascade Lake Xeon Gold system that supports Optane PMem. Gonna post progress on the Level1 forums. Right now memory access time is the major issue. If you want accuracy and don't care about tokens per second, or you just want to tinker, a single Vega 56 is not as bad as you might think: about 10 seconds to first word, which is commendable for such a large model.
Don't do any jailbreak prompts or it straight locks up. No idea why.
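For reference, a minimal sketch of how you could measure that time-to-first-word yourself through Ollama's streaming /api/generate endpoint; the model tag llama3.1:405b and the default localhost port are assumptions, so substitute whatever you actually pulled:

```python
# Illustrative sketch: time-to-first-token against a local Ollama server.
# Assumes Ollama is listening on its default port and the model tag exists locally.
import json
import time
import requests

MODEL = "llama3.1:405b"  # assumed tag; replace with the model you pulled

payload = {"model": MODEL, "prompt": "Explain RAID 0 in one sentence.", "stream": True}
start = time.monotonic()

with requests.post("http://localhost:11434/api/generate", json=payload, stream=True) as resp:
    resp.raise_for_status()
    first_token_at = None
    chunks = 0  # each streamed chunk is roughly one token
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        if chunk.get("response"):
            chunks += 1
            if first_token_at is None:
                first_token_at = time.monotonic()
        if chunk.get("done"):
            break

elapsed = time.monotonic() - start
if first_token_at is not None:
    print(f"time to first token: {first_token_at - start:.1f}s")
print(f"{chunks} chunks (roughly tokens) in {elapsed:.1f}s total")
```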
Is there a single source of information that gives detailed hardware requirements for each of the Llama 3.5 models (i.e. GPU, RAM, memory, cache, etc.)?
I've posted extensive details. The link is in the description box. Meta doesn't officially release any hardware requirements afaik.
Thanks a lot for the detailed explanation in the video! I have a question regarding Ollama. Is it possible to use Ollama and the models available on it in a production environment? I would love to hear your thoughts or any experiences you might have with it. Thank you!
It's not recommended for a production environment; it's aimed more at consumer hardware than production hardware.
um, 405B is like 300 gigs, good luck with THAT lol
definitely not for the average user :P
Have u guys tried the 405B version locally? What are ur PC specs?
Most people prefer to run it in the cloud, as the requirements for a local PC would be incredibly high. I'd also recommend running a quantised version.
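To put rough numbers on the "~300 gigs" and the quantisation advice above, here is a back-of-the-envelope sketch; the ~10% overhead factor for runtime buffers is an assumption, and real quantised downloads will differ somewhat by format:

```python
# Rough weight-size estimate: parameters x bits-per-weight, plus an assumed
# ~10% overhead for runtime buffers. Real quantised files vary by format.
def approx_size_gb(params_billion: float, bits_per_weight: float, overhead: float = 1.10) -> float:
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total * overhead / 1e9

for label, bits in [("FP16", 16), ("Q8", 8), ("Q4", 4)]:
    print(f"Llama 405B @ {label}: ~{approx_size_gb(405, bits):.0f} GB")

# Prints roughly: 891 GB (FP16), 446 GB (Q8), 223 GB (Q4).
```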