
Run ANY LLM Using Cloud GPU and TextGen WebUI (aka OobaBooga)

  • Published Aug 18, 2024
  • In this video, I'll show you how to use RunPod.io to quickly and inexpensively spin up top-of-the-line GPUs so you can run any large language model. It's super easy, and you can run even the largest models such as Guanaco 65b. This also includes a tutorial on Text Generation WebUI (aka OobaBooga), which is like Automatic1111 but for LLMs. Basically, an open-source interface for your LLM.
    Enjoy :)
    Join My Newsletter for Regular AI Updates 👇🏼
    www.matthewber...
    Need AI Consulting? ✅
    forwardfuture.ai/
    Rent a GPU (MassedCompute) 🚀
    bit.ly/matthew...
    USE CODE "MatthewBerman" for 50% discount
    My Links 🔗
    👉🏻 Subscribe: / @matthew_berman
    👉🏻 Twitter: / matthewberman
    👉🏻 Discord: / discord
    👉🏻 Patreon: / matthewberman
    Media/Sponsorship Inquiries 📈
    bit.ly/44TC45V
    Links:
    Runpod (Affiliate)- bit.ly/3OtbnQx
    Runpod The Bloke Template - runpod.io/gsc?...
    HuggingFace - www.huggingfac...
    Guanaco Model - huggingface.co...
    TextGen WebUI - github.com/oob...

Comments • 175

  • @jeremybristol4374 · 1 year ago +21

    I appreciate that you find and post these but also walk us through the setup. Huge time saver! Thank you!

    • @matthew_berman · 1 year ago +2

      My pleasure, Jeremy!

    • @sally60 · 1 year ago

      @@matthew_berman Could you share how I can get an API to use with sillytavern?

  • @wolphiekun · 1 year ago +7

    Would be amazing to have a guide like this specifically for setting up the best model for coding with the largest token context window... for us plebs who don't have access to Anthropic yet 😁 Appreciate your hands-on, get-started-fast kind of flavor here, Matthew!

  • @pollywops9242 · 1 year ago +1

    I'm doing it now; the uncensored model was the push I needed 😁

  • @MrPuschel · 11 months ago +6

    TheBloke is not the author of these models, as stated in the model cards, but provides quantized versions of them.

  • @theresalwaysanotherway3996 · 1 year ago +15

    While 65B models are definitely beyond reasonable consumer hardware, to run 33B models all you need is 8 GB VRAM and 32 GB system RAM. I get ~1.1 tokens a second using an RTX 3070 and R5 3600, meaning you can run a lot of these SOTA models on pretty cheap local hardware.
    Also, a small correction: TheBloke doesn't make those models; he quantizes them to 4/5-bit so that we can all run them. It's super cool that he does that, but he doesn't *make* all those models you've stated there. Eric Hartford and Tim Dettmers are the two big model authors at the moment.
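    These hardware numbers can be sanity-checked with a back-of-the-envelope memory estimate. A minimal Python sketch (my own approximation, not from the video; the ~20% overhead factor standing in for KV-cache and runtime buffers is an assumption):

```python
# Rough memory estimate for a quantized LLM: the weights dominate,
# plus an assumed ~20% overhead for KV-cache and runtime buffers.

def vram_gb(params_billion: float, bits_per_weight: int, overhead: float = 0.2) -> float:
    """Approximate GiB needed to hold the model in memory."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * (1 + overhead) / 2**30

if __name__ == "__main__":
    # A 33B model at 4-bit lands around 18 GiB, hence the commenter's
    # split across 8 GB VRAM plus system RAM via CPU offloading; a 65B
    # model at 4-bit needs roughly 36 GiB, matching the dual-24GB-GPU advice.
    print(f"33B @ 4-bit: {vram_gb(33, 4):.1f} GiB")
    print(f"65B @ 4-bit: {vram_gb(65, 4):.1f} GiB")
```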

    • @NeuroScientician · 1 year ago +1

      Any idea what would be the requirement for 65B model? Do I need like full fat A100?

    • @blablabic2024 · 1 year ago +1

      @@NeuroScientician You would need dual 7900 XTX or dual 4090; each has 24 GB, and in tandem that gives 48 GB, enough to run and train the 65B model. An A100 is an $8,000 GPU, that's second-hand-car price level... you also need a proper CPU to go with it, that's another $5,000... a total of 20k US$ all combined. If you need that kind of firepower, it's better to rent it.

    • @avg_ape · 1 year ago

      @@blablabic2024 Hi - How did you calculate the above req?

    • @adams546 · 1 year ago

      Are you using GGML or GPTQ?

    • @begaxo · 1 year ago

      Can I run 33B models with a 12 GB AMD GPU and 32 GB RAM? If so, how? I'd be really thankful.

  • @joelzola5362 · 1 year ago +4

    I'm surprised you don't have more followers. Keep going!

    • @matthew_berman · 1 year ago +3

      Thank you very much. I hope to continue to grow and educate people on AI topics!

  • @mingyukang6592 · 1 year ago +5

    Does it cost money to train, and then turn off the GPU and then use it again? And is it impossible to download the trained model to a local machine?

  • @surajthakkar3420 · 11 months ago +2

    Hey Matthew,
    Great Video! When can we expect a video about training our own LLM?

  • @jwesley235 · 1 year ago +5

    FWIW Ada is not pronounced "Ay-Dee-Ay;" it's "Ayda," as in Ada Lovelace, acclaimed as the first programmer.

  • @autophile525i · 1 year ago +4

    Would you use this for only prototyping, or could they be left running reliably to be the hardware in a paid service?

  • @Uterr · 1 year ago +2

    You should add an annotation that when setting up the pod you should override persistent storage, because RunPod sets persistent storage to 100 GB by default and it will eat up your budget very fast.

  • @RedShipsofSpainAgain · 1 year ago +5

    I have a question. If you're working with proprietary data or private data (like PII), and you don't want to risk sending that data over the internet to Podman or OpenAI or whatever cloud based model, how would you fine tune your data? Is local training on your own local machine the only option?

    • @manoo2056 · 1 year ago +1

      I hope someone or the author answer you. Great question.

    • @Shallowmind · 7 months ago

      Yes. Or sign a contract with someone who can train it for you.

  • @SirajFlorida · 1 year ago +2

    Heads up, you can click the copy icon to the right of the label; that way you get a clean paste.

    • @matthew_berman · 1 year ago

      Yea...thanks. I tried that but got a weird output the first time. Then I tried it again and it was perfect. I'll be doing that going forward!

  • @aihome242 · 9 months ago +1

    If the model is trained on that pod, can it be saved or downloaded? If the data gets destroyed, what is the point of the training? I see this has been asked here but with no clear answer. Thanks!

  • @dik9091 · 1 year ago +1

    thnx man best vid so far for me and my quest to actually get things done ;)

    • @matthew_berman · 1 year ago

      You're welcome. What are you going to run on it?

    • @dik9091 · 1 year ago

      @@matthew_berman myself

  • @HampusAhlgren · 8 months ago +1

    Quick note: TheBloke isn't actually the author of the models; he just converts existing models to support llama.cpp.

  • @kitrunner6596 · 3 months ago

    Your tech tutorials are bar none the best: clear, concise, with exact troubleshooting fixes. Can you do a tutorial on how to use VS Code with RunPod through SSH? Each time the server tries to connect, it asks for a password. I've gone through their troubleshooting but nothing is working.

  • @Anarchy-Is-Liberty · 8 months ago

    But we need a tutorial on training!! That's what a lot of us need! I want to build my own models for my own business, so I need to figure out how to train the AI to have full understanding and data of what I'm doing. Are there any videos you can point me to so I can start learning how to train this AI?

  • @EAAIO · 6 months ago

    Thanks for your tutorial, it saved me thousands of dollars just to try this. Now I can test.

  • @fangornthewise · 9 months ago +1

    How do we know if we need an RTX 6000 or if the 4090 is enough?
    Those extra cents of USD do stack up after a while for us in "developing" countries.

  • @goldhydride · 1 year ago

    It's the content we deserve 😭 Everything is to the point, love this. I especially love your videos where you show us recent papers.
    Could I ask what computer specs I should have to use a GPU cloud successfully? What built-in CPU and GPU characteristics do I need?

  • @hermysstory8333 · 1 year ago +2

    It seems like TheBloke's template is missing.

  • @moon8013 · 10 months ago +1

    Would you do a video on how to train a model using those steps?

  • @camelCased · 2 months ago

    Wondering if this method would also work with large Llama 3 models? Or is there any better and cheaper version to run them and having an API endpoint? I'm just starting getting familiar with SillyTavern and roleplaying, and that one supports OobaBooga, Ollama and Kobold (which I'm using locally for running small models).
    Also, it would be nice to know how to store the models on Runpod (if that's not too expensive) to avoid waiting while they download every time when I run the pod.

  • @nezukovlogs1122 · 11 months ago +1

    When you say dollars per hour, does that mean GPU processing time, or uptime of the GPU server whether it's being used or not?

    • @d.paradyss8791 · 7 months ago

      It means uptime of the GPU server, I think. For me it's too expensive to generate worse text than ChatGPT and worse images than Midjourney.
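      As the reply above notes, billing here is for pod uptime rather than per request. A trivial sketch of what that implies (the hourly rate is illustrative, not a quoted RunPod price):

```python
# Pod billing sketch: cost depends only on wall-clock uptime,
# not on how many inferences you actually run in that time.

def pod_cost(hourly_rate: float, hours_up: float) -> float:
    """Total charge for keeping a pod up for `hours_up` hours."""
    return round(hourly_rate * hours_up, 2)

# Two inferences per hour or two thousand: same bill if the pod stays up.
print(pod_cost(0.79, 24))  # left running a full day
print(pod_cost(0.79, 2))   # spin up, use it, terminate
```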

  • @zilibabwei · 1 year ago +1

    This is great! Tysm! Your channel is an amazing resource! I fell asleep last night still connected to the RTX 6000 and woke up 7 bucks down! Lol. But it just goes to show that it's really not too expensive to use these resources! It only took me about 5 minutes to download TheBloke's Guanaco 65B? Does that seem right? I was expecting much longer.

    • @zilibabwei · 1 year ago +1

      My main goal is to have a Python-coding assistant with me all the time. Something that is great at generating code based on English prompts. (I'm not so worried about it knowing the US presidents! lol! Or anything else, for that matter; I just want a little AI bot that's obsessed with writing code!) I also want it to remember from one session to the next for longer projects. Does this exist already? Does anyone know? And if it doesn't, how do I train up maybe a barebones model to sort of become that?

  • @SzczepanBentyn · 1 year ago +1

    Is it possible do download a trained model to my local machine?

  • @DarenZammit · 1 year ago +1

    Thanks for your videos, really informative! HuggingFace URL in description goes to the wrong one :P

    • @matthew_berman · 1 year ago

      Lol are you sure I didn't really want to send you to an emoji website? (jk, fixed, thank you)

  • @avg_ape · 1 year ago

    Thanks for the vid. Great find & insight. Can you make a vid that reviews some of the Bloke's models?

  • @Laberding · 4 months ago

    Great video! Can you explain why there usually is a disconnect button but not in this case? If I terminate, do I have to set up the model every time again?

  • @Larimuss · 21 days ago

    The TheBloke template doesn't work anymore. Can you please do a new guide for setting up text gen on RunPod?

  • @cyraq_0x248 · 1 year ago +1

    Do you need to configure a new pod and download the model every time you want to try a LLM?

  • @user-wr4yl7tx3w · 1 year ago +2

    can we fine tune for downstream task using it?

  • @contractorwolf · 11 months ago +1

    great content Matthew, thx

  • @DanRegalia · 9 months ago

    Hey Matthew... Thanks so much for these videos.. I've been binging on them at work and home in my spare time. It's my goal to be running a small local server for the house soon, with a P40 I picked up on Ebay. I saw this, and I wanted to know if you have any videos that show how to setup and run these locally (via oobabooga) and take advantage of setting up the parameters... I'm also curious if it's possible to have multiple models available to use. Thanks again for all this. Digging the MemGPT and Autogen videos you've done. Just amazing.

  • @rayankhan12 · 1 year ago +1

    Nice!! I always wanted to know how to run open source LLMs on cloud services like AWS, Azure and GCP.. but they're so complicated... I've started a GCP course on YT too but it's still difficult to learn

    • @matthew_berman · 1 year ago +1

      Runpod is suuuuper easy. Enjoy :)

    • @zion9142 · 1 year ago

      But you have to terminate your work. There's no point in training if it will be deleted afterward.

  • @jgrayhi · 9 months ago +1

    Thanks!

  • @oolegdan9813 · 10 months ago +1

    Hi, could you please guide me? I'm not able to set up the "Runpod The Bloke Template". When I hit connect and then connect to the HTTP port, I'm redirected to a new window where I see the message "Confirm the character deletion?" Please let me know what I'm doing wrong.

    • @raducamman · 10 months ago

      same here. But it worked a couple days ago. I think something happened to the template.

    • @raducamman · 10 months ago

      so it seems only the interface is a bit messed up. There are some layers of div on top of it and you can just delete them from the browser, until they fix the issue.

    • @oolegdan9813 · 10 months ago +1

      @@raducamman Thanks It loaded for me today and it works just as intended :)

  • @jovialjack · 9 months ago

    Those options aren't coming up for me when I choose "prompt"; it only shows 4 options. I don't know why, but I've tried this a few times AND spent money on it... doesn't seem to work :/

  • @Adamskyization · 1 year ago

    Can you stop the machine to save all of the configuration but stop the consumption of resources so you don't get charged while pod is stopped?
    So you can later just choose a preconfigured pod and run it?
    Without having to download the model again etc...?

    • @user-jp8sj2ch1r · 1 year ago +2

      I was thinking about the same question :)

  • @NeuroScientician · 1 year ago +1

    I am trying to work with 30B models; I would like to have a test inference machine at home. Would a 7900 XTX/4090 do? How about older enterprise stuff like L40/M40? I am thinking of using RunPod or LambdaLabs for training.

    • @blablabic2024 · 1 year ago

      Yes, a 7900 XTX should suffice; you'll wait a little longer than on a 4090 for training time, but you'll save $1,000 that you can spend on a top-spec Ryzen 9 CPU and a nice amount of RAM. You can always (if you have enough funding) get another 7900 XTX and run them as a pair in order to run and train the 65B model. I'll most probably go the same route.

  • @JoseP-cw3je · 1 year ago +2

    Pretty good video, but how do I change the UI to chat mode? I get an error when I try to change it.

    • @matthew_berman · 1 year ago

      What error are you getting?

    • @JoseP-cw3je · 1 year ago

      @matthew_berman thank you for your reply. I get a bad gateway after I choose the chat option in the UI, and then I cannot reconnect to the server unless I restart it.

  • @jamesalexander4411 · 9 months ago

    I've credited my Runpod account, however, following selecting 'the RTX 6000ADA' and the 'The Bloke Template' Runpod gives me this message "There are no longer any instances available with enough disk space". Does this mean there's no available space for me to run an LLM at this time?

  • @JJSleo-bw9fr · 1 year ago

    Can you show how to create a persistent instance so the data is not destroyed but not being charged either? Is there a way to load it onto a SSD to use later (but not be charged for the GPU only SSD space)?

  • @PsyGenLab · 3 months ago

    thebloke template is broken atm
    use this instead

  • @k9clubme · 1 year ago +1

    much appreciated for the info. BTW, which model is the closest to GPT4 at the moment?

    • @Utoko · 1 year ago +1

      Claude, but if you mean open source, then Falcon 40B according to the HuggingFace leaderboard. If you want to run locally, Wizard-Vicuna 13B is really good (on the GPT-3.5 level).

    • @matthew_berman · 1 year ago

      I just dropped a new video about Guanaco, which is def the closest to GPT4.

    • @matthew_berman · 1 year ago +1

      Stefan - I tested Falcon and it's unusably slow. Hopefully that'll be fixed soon. Right now Guanaco is the best.

    • @k9clubme · 1 year ago

      @@matthew_berman Thank you very much for all your efforts

  • @except_fab · 9 months ago

    Got this working with AutoGen as expected. Works great, thanks!
    Running into the error "This model's maximum context length is 2048 tokens. However, your messages resulted in over 2170 tokens."
    I have max_consecutive_auto_reply=30 and "max_model_tokens": 1200 in the llm_config.
    Can't really get anything good out of it other than making it run. Suggestions?
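    That error means the prompt plus accumulated chat history exceeded the model's 2048-token context window. One common workaround (a sketch of mine, not from the video; the 4-characters-per-token ratio is a rough heuristic, not an exact tokenizer count) is to drop the oldest messages before each call:

```python
# Trim chat history to fit a fixed context window, dropping oldest messages.

def estimate_tokens(text: str) -> int:
    """Crude token estimate: ~4 characters per token (heuristic)."""
    return max(1, len(text) // 4)

def trim_history(messages: list[dict], max_tokens: int = 2048, reserve: int = 512) -> list[dict]:
    """Keep the newest messages that fit, reserving room for the model's reply."""
    budget = max_tokens - reserve
    kept = []
    for msg in reversed(messages):  # walk newest-first
        cost = estimate_tokens(msg["content"])
        if cost > budget:
            break                   # this message and everything older is dropped
        budget -= cost
        kept.append(msg)
    return list(reversed(kept))     # restore chronological order
```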

  • @RobertAlexanderRM · 1 year ago

    The downloading of meta-llama/Llama-2-13b-hf via runpod's textgen gives a 403 unauthorized error. How to fix ?

  • @Gorto68 · 1 year ago

    When I try following these instructions, I can not move beyond the save settings for model. It doesn't show saved, rather the same error message on loading. Likewise if I try hitting reload. Nothing happens. Can you please suggest what I might be doing wrong? Note: I only have this problem for the model in this video. I had no problem with the Wizard-Vicuna-30B-Uncensored following the instructions in the other video.

  • @angel1st007 · 1 year ago +1

    @matthew_berman - great job doing these videos. One question though: once I have the model configured, is there a way I can use it via an API interface from that cloud GPU instance?

    • @dik9091 · 1 year ago

      check the serverless tab besides the gpu tab

    • @angel1st007 · 1 year ago

      @@dik9091 If I go with Serverless instead of GPU cloud, will that allow me to run the model with acceptable performance? The use case is basically to run one of those models via API and use it as an OpenAI API alternative. Would that be possible?
      @matthew_berman - if you can make a video on that topic, that would be greatly appreciated. Thanks!

    • @dik9091 · 1 year ago

      @@angel1st007 From what I understand, yes, that is exactly the point of serverless. Anyone could make some serious money with this once there are models that outperform GPT-4 on a private cloud.

    • @angel1st007 · 1 year ago +1

      @@dik9091 - can you by any chance point me to a guide on how such serverless service with the LLM model can be spun up? I really appreciate any help you can provide.

    • @dik9091 · 1 year ago

      @@xlretard send me an invite when you have it setup pls
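    On the API question in this thread: text-generation-webui can expose an HTTP API when launched with its API option, which is how a pod could serve as an OpenAI-style backend. A hedged sketch; the pod URL, port, and endpoint path below are placeholders (check your pod's Connect panel and the WebUI docs for the real values), and the request body shape varies between WebUI versions:

```python
# Sketch of calling a text-generation-webui pod over HTTP from local Python.
# POD_URL is a placeholder, not a real endpoint.
import json

POD_URL = "https://YOUR-POD-ID-5000.proxy.runpod.net"  # placeholder

def build_request(prompt: str, max_new_tokens: int = 200, temperature: float = 0.7) -> dict:
    """Assemble a generation request body (field names may differ by version)."""
    return {
        "prompt": prompt,
        "max_new_tokens": max_new_tokens,
        "temperature": temperature,
    }

# The actual call would look like this (needs the pod up and the API enabled):
#   import requests
#   r = requests.post(f"{POD_URL}/api/v1/generate", json=build_request("Hello"))
#   print(r.json())
print(json.dumps(build_request("Hello"), indent=2))
```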

  • @ktolias · 1 year ago

    Amazing job! Thanks for sharing. I tried to train the same model on RunPod, but I had a difficult time. Can you please make a fine-tuning video? Much appreciated!

    • @dazedandcold · 8 months ago

      Run pod sucks!!

  • @SantyBalan · 1 year ago

    Can the web UI be used by multiple users? Like, I set it up and create a few logins for other users to try out different models? Assuming the HTTP server is accessible.

  • @avi7278 · 1 year ago +1

    Does this install the latest version of textgen web UI? I saw in other videos that RunPod has an old version as default. Also, the end wasn't very clear. Do you have to pay for the machine as if it's running in order to keep your data?

    • @matthew_berman · 1 year ago

      Use TheBloke's version, it's the best implementation with all the right things downloaded already.
      Some machines have a "data only" mode which is much less expensive. Then when you want to use it you spin it back up and pay the regular price with your existing data. But yes, you need to pay to keep your data.

    • @zion9142 · 1 year ago

      Please do a video on this.

  • @Rundik · 1 year ago

    Can I use them for mining? Not crypto, but vanity addresses etc

  • @dirklaubscher2369 · 4 months ago

    i'm getting an HTTP Service [Port 7860] not ready message. what do i do?

    • @Mirza-sb2gl · 4 months ago

      same, have you figured out how to fix it?

  • @kel78v2 · 4 months ago

    I keep seeing HTTP Service [Port 7860] Not Ready no matter what GPU I choose

  • @zeonos · 1 year ago

    Do these providers charge per use, or just for spinning it up and having it idle?

  • @PrincessRedine · 1 year ago

    I am getting this error when running Pygmalion 13B:

    Traceback (most recent call last):
      File "/workspace/text-generation-webui/modules/GPTQ_loader.py", line 17, in <module>
        import llama_inference_offload
    ModuleNotFoundError: No module named 'llama_inference_offload'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/workspace/text-generation-webui/server.py", line 62, in load_model_wrapper
        shared.model, shared.tokenizer = load_model(shared.model_name, loader)
      File "/workspace/text-generation-webui/modules/models.py", line 66, in load_model
        output = load_func_map[loader](model_name)
      File "/workspace/text-generation-webui/modules/models.py", line 262, in GPTQ_loader
        import modules.GPTQ_loader
      File "/workspace/text-generation-webui/modules/GPTQ_loader.py", line 21, in <module>
        sys.exit(-1)
    SystemExit: -1

  • @Syn_Slater · 1 year ago +1

    Handy video, thanks!

  • @ybwang7124 · 1 year ago +1

    So is it as good as GPT-4? The description is confusing.

    • @matthew_berman · 1 year ago

      No, but it’s very close. Very very close.

  • @djryanashton · 1 year ago

    Your videos are very good. One thing I needed to do to get it to run was to edit the pod and increase the volume space.

  • @leonwinkel6084 · 1 year ago +2

    Is there a way to call it all via api?

    • @davidnobles162 · 1 year ago +1

      I have the same question, I'm looking into it right now..

    • @gr8ston · 1 year ago

      @@davidnobles162 Did you figure it out? I'm able to set up the UI template, but I want to access the same via API in my local Python code.

    • @davidnobles162 · 1 year ago

      @@gr8ston I did! super simple to set up. not sure why youtube won't let me post the instructions

    • @davidnobles162 · 1 year ago +1

      @@gr8ston dnobs/runpod-api

    • @davidnobles162 · 1 year ago

      somehow youtube is blocking every comment where I explain ANYTHING. Goodluck..

  • @ecorodri26 · 1 year ago

    Could someone help me? Does the Oobabooga web UI work locally with combined CPU+GPU usage when installed with GPU support?

  • @olafge · 1 year ago +1

    I'd like to store TheBloke's template so that I always have easy access to it. How to do that?

    • @matthew_berman · 1 year ago

      Just click the link and it is stored in runpod!

  • @generichuman_ · 9 months ago +1

    Is there an option to access this via an api endpoint?

    • @paulkiragu8120 · 7 months ago +2

      Use a service like ollama web UI which you can configure to talk to external llm models

    • @waynehawley814 · 4 months ago

      I have a video on my page showing you how to use the Text Generation WebUI with its API extension

  • @eyemazed · 9 months ago

    I'm confused about the pricing. Say I do 2 inferences per hour for a day; is that still 24 hours charged for that day?

    • @mag0b3t0 · 2 months ago

      Only while it's running, until you terminate it (7:34).

  • @MrVbrabble · 1 year ago +1

    Is there a way to read and store files on your CPU or designated drive for this method?

    • @matthew_berman · 1 year ago +1

      Can you clarify what you're trying to do?

    • @MrVbrabble · 1 year ago

      @@matthew_berman Using this setup I would like it to read and write files on my computer. For example, say I have a PDF or text file; I would like it to read the file, understand and summarize it, and save the summary to a txt file. Thank you.

  • @user-fy6de1lq3i · 1 year ago

    It doesn't let me open HTTP; it's making me use JupyterLab. I have no idea how to switch, please help. I have no option for HTTP, and I'm losing money as I type this, please help!

  • @gelbandz · 1 year ago

    Thanks this worked great!

  • @Imran-Alii · 1 year ago

    Loved it... I appreciate your work!!!!

  • @A.M.8181 · 1 year ago

    How do I upload my dataset for training a LoRA model in this cloud?

  • @nikog8326 · 1 year ago

    How do I change the install location when pasting and downloading an LLM into that 'model' section on RunPod?

    • @nikog8326 · 1 year ago

      It keeps saying no space left on device

  • @TheStallion1319 · 3 months ago

    What is the benefit of running a cloud GPU vs locally? Are there any pros to a local GPU?

    • @mag0b3t0 · 2 months ago

      Not having to wait for a model to download and start running every time, and not paying them for their slowness in this respect (in theory you're renting just a GPU, but apparently you're paying for the whole system's uptime, regardless of actual GPU usage).

    • @TheStallion1319 · 2 months ago

      @@mag0b3t0 No, I meant it in a practical way: is there any technical difference? Assuming I'm using a model to develop an application, would my experience running it on the cloud differ from running it locally? Is there something I wouldn't be able to do, for example, or that would be less efficient?

  • @islamicinterestofficial · 1 year ago

    Thanks for the video. Can we use the lmsys/vicuna-33b-v1.3 model in it? Or can we only use models associated with TheBloke? Please answer, much appreciated.

  • @puredingo9348 · 1 year ago

    So does this mean I won't have to install OobaBooga on my PC to run it?

  • @DailyProg · 6 months ago

    Can you post something like this but for GCP or Azure?

  • @thomasalderson368 · 21 days ago

    this doesn't seem to work anymore...

  • @Suro_One · 1 year ago +1

    Cool, thanks!

  • @qbert4325 · 7 months ago

    Is there any free cloud option?

  • @jebathuraijb4374 · 1 year ago

    Can I load the GPTQ model in Colab?

  • @nat.serrano · 10 months ago

    and how can I add an api?

  • @wsy987 · 1 year ago

    I can only load GPTQ, but not GGML, weird.

  • @meworlds8216 · 1 year ago

    This template doesn't have a Jupyter notebook; that sucks, no?

  • @morespinach9832 · 2 months ago

    We don't need a GPU if real-time high performance is not a must.

  • @urmatallatra · 1 year ago +1

    perfect

  • @yyyzzz-k3r · 1 year ago

    My page only shows 1 GPU; it doesn't offer 4 GPUs.

  • @behnamplays · 1 year ago +1

    Not affordable though. If I run it for a day (e.g., 20 hours), I'll be charged the same as the monthly amount I'm paying for ChatGPT 🤔

    • @matthew_berman · 1 year ago +1

      True. But, you can turn it off/on easily. So you can use 20 hours over the course of a month.

    • @gr8ston · 1 year ago

      Use the bidding price. I got a GPU worth $1.79/hour for just 10% of its cost, at $0.179, as a bidding price.

    • @zion9142 · 1 year ago

      @@matthew_berman In your video you said there's a terminate button and then all the data is gone. So how do you turn it off?

  • @michael_gaio · 9 months ago

    that’s awesome

  • @darkbelg · 1 year ago +1

    I feel like you failed to underline that you can get an A100 for 2 dollars an hour!

    • @darkbelg · 1 year ago

      Also TheBloke uses Runpod to make the models.

    • @matthew_berman · 1 year ago +2

      Why is that so important? :)

  • @simonherd1768 · 1 year ago

    Thanks

  • @christopherchilton-smith6482 · 1 year ago +1

    Do one of these for Starcoder :)

    • @matthew_berman · 1 year ago +1

      Absolutely! Now that I know how to run it :)

    • @christopherchilton-smith6482 · 1 year ago

      @@matthew_berman I am salivating with expectation. I can't wait to try Starcoder out! You rock man!

  • @QEDAGI · 1 year ago +6

    You're not the first to post an episode using Runpod -- you're the latest. While it's great to use dedicated Runpod servers, why don't any of y'all post using their less expensive community servers?

    • @dazedandcold · 8 months ago

      Run pod sucks balls!! 👎🏽

  • @Lucasbrlvk · 1 year ago +1

    👍😯

  • @holdthetruthhostage · 1 year ago

    My brother, I hope it works and is affordable.

  • @BarkSaw · 8 months ago

    fuck it's paid holy shit if I am trying to make an uncensored GPT I dont want my credit card linked to it

    • @paulkiragu8120 · 7 months ago

      What do you mean, is it paid? How cheap can you get, expecting a GPU to be given out for free 😮

    • @BarkSaw · 7 months ago

      @@paulkiragu8120 Definitely a “😮” moment

  • @IvanRosaT · 7 months ago

    While this is awesome, is there a way to use Oobabooga to generate images?

  • @klammer75 · 1 year ago +1

    I was able to get the smaller models working but not this guanaco-65b one... basically a config error saying this model doesn't have a file named config.json... then there's a link that goes to a HuggingFace 404 page 🫣🤷🏼‍♂️😔

    • @CodyRiverW · 1 year ago +1

      Same

    • @klammer75 · 1 year ago

      Can’t even get the smaller models to work now🤷🏼‍♂️🤦🏼‍♂️

  • @JosephConroy · 1 year ago

    Thanks!

    • @JosephConroy · 1 year ago

      Hey Matthew - did you ever get a chance to make a video on training models in the RunPod GUI?