How To Run Llama 3.1: 8B, 70B, 405B Models Locally (Guide)

  • Published on 25 Nov 2024

Comments • 14

  • @ecchiRhino99 • 4 months ago • +12

    Good, I will now try to run the 405B model on my $50,000 PC.

    • @DCEntropy • 2 months ago

      Is a $50k PC even fast enough? ;-)

  • @JonVB-t8l • 3 months ago • +5

    You can run the 405B model on a server with at least 256 GB of RAM and a Vega 56 or 64 graphics card.
    That specific card can access system RAM as if it were VRAM and, on some platforms, bypass the CPU.
    You can also buy Optane PMem cheaply: either add 4x4 Optane modules via PCIe, RAID 0 them and put the array behind swap, or install them as DDR4 DIMMs to create a massive RAM pool.
    I'll be experimenting with Vega 20 next week.

    • @SchoolofMachineLearning • 3 months ago • +1

      I haven't tried that, but I suspect it will be too slow to be usable. Let me know how it goes for you.

    • @JonVB-t8l • 3 months ago

      @SchoolofMachineLearning I have confirmed that (a) it is slow, but (b) it works.
      Still waiting for my MI60 to come in, and I decided to get a Cascade Lake Xeon Gold system that supports Optane PMem. I'm going to post progress on the Level1 forums. Right now, memory access time is the major issue. If you want accuracy and don't care about tokens per second, or you just want to tinker, a single Vega 56 is not as bad as you might think: about 10 seconds to the first word, which is commendable for such a large model.
      Don't do any jailbreak prompts or it locks straight up. No idea why.
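
For readers who want to try a similar big-RAM, small-GPU setup, here is a minimal sketch using llama-cpp-python (not something mentioned in the thread; the GGUF path, layer count, and context size are placeholders you would need to adjust). Memory-mapping keeps most of the quantised weights in system RAM or swap, with only a few layers offloaded to the GPU:

```python
# Minimal sketch, assuming llama-cpp-python is installed
# (pip install llama-cpp-python) and a quantised GGUF of the model
# is already on disk. Path and layer/context values are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="/models/llama-3.1-405b-q4.gguf",  # hypothetical file name
    n_ctx=2048,       # keep context small: the KV cache also eats memory
    n_gpu_layers=8,   # offload only a handful of layers to a small GPU
    use_mmap=True,    # memory-map weights so they page in from RAM/swap
)

out = llm("Explain what quantisation does to model weights.", max_tokens=64)
print(out["choices"][0]["text"])
```

With mmap the operating system decides which weight pages sit in RAM versus swap, which is essentially what the Optane-as-swap idea above relies on; expect seconds per token rather than tokens per second.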

  • @deepaksingh9318 • 3 months ago • +3

    Is there a single source of information that gives detailed hardware requirements for each of the Llama 3.1 models (i.e. GPU, RAM, memory, cache, etc.)?

    • @SchoolofMachineLearning • 3 months ago • +3

      I've posted extensive details; the link is in the description box. Meta doesn't officially publish hardware requirements, as far as I know.
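
Since Meta publishes no official spec sheet, a common rule of thumb is parameters × bytes per weight, plus headroom for the KV cache and runtime. A rough back-of-the-envelope sketch in Python (these are my own illustrative numbers, not from the video or from Meta):

```python
# Rough weight-memory estimates for the Llama 3.1 family.
# These ignore the KV cache, activations, and runtime overhead,
# so treat them as lower bounds rather than exact requirements.
PARAMS = {"8B": 8e9, "70B": 70e9, "405B": 405e9}
BYTES_PER_WEIGHT = {"FP16": 2.0, "INT8": 1.0, "~4-bit": 0.5}

for model, n_params in PARAMS.items():
    for quant, nbytes in BYTES_PER_WEIGHT.items():
        gb = n_params * nbytes / 1e9
        print(f"Llama 3.1 {model:>4} @ {quant:<6}: ~{gb:,.0f} GB for weights")
```

By this estimate the 8B model fits on a single consumer GPU (about 16 GB at FP16, about 4 GB at 4-bit), the 70B model needs roughly 35-140 GB depending on quantisation, and the 405B model still needs around 200 GB even at 4-bit, which is why it is usually run in the cloud.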

  • 3 months ago

    Thanks a lot for the detailed explanation in the video! I have a question regarding Ollama. Is it possible to use Ollama and the models available on it in a production environment? I would love to hear your thoughts or any experiences you might have with it. Thank you!

    • @SchoolofMachineLearning • 3 months ago • +1

      It's not recommended for a production environment; it's aimed more at consumer hardware than production hardware.
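
For a sense of what such an integration would even look like: Ollama exposes a local REST API (port 11434 by default), so a minimal call from Python is roughly the sketch below. The model tag and prompt are just examples, and this is illustrative rather than a production recipe; you would still need to handle concurrency, authentication, and model lifecycle yourself.

```python
# Minimal sketch of calling a locally running Ollama server.
# Assumes `ollama serve` is running and the llama3.1 model has been pulled.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.1",   # example tag; use whatever you have pulled
        "prompt": "Summarise Llama 3.1 in one sentence.",
        "stream": False,       # return a single JSON object instead of chunks
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```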

  • @gu9838 • 4 months ago • +7

    Um, 405B is like 300 gigs. Good luck with THAT lol

    • @SchoolofMachineLearning • 3 months ago

      definitely not for the average user :P

    • @ouso3335 • 3 months ago

      Have you tried the 405B version locally? What are your PC specs?

  • @ouso3335 • 3 months ago

    Have you guys tried the 405B version locally? What are your PC specs?

    • @SchoolofMachineLearning • 3 months ago

      Most people prefer to run it in the cloud, as the requirements for a local PC would be incredibly high. I'd also recommend running a quantised version.