Local vs. Cloud LLMs/RAG - Let's FINALLY End this Debate

  • Published Sep 22, 2024

Comments • 17

  • @zdenekjanda2315 · 10 days ago · +6

    Great summary. You could also add one more key point: focus on data. You need all your foundation data in one place so you can run agents over everything at the same time, efficiently. This is key; in my opinion SaaS will die precisely because of this: you can never maximize yield with data scattered across dozens of APIs. With a focus on data it's no longer local vs. cloud, it's owned or not. For example, you can run your whole open-source stack with your own data in a cloud Kubernetes cluster, or even a local Kubernetes cluster at home, and use OpenAI's Strawberry for the hard stuff, Anthropic for coding, and your two RTX 4090s for a bunch of task-specific 24/7 agents that keep busy with simple tasks. With such a stack and APIs, you can use voice-controlled aider to code, test, and run new versions of software today, while driving home 😅

    • @ColeMedin · 10 days ago · +2

      Thank you! And I love the picture you're painting here haha, that would be the DREAM...
      You hit the nail on the head with SaaS - that's actually been the toughest part for me when I try to justify saving time by adopting another product/API/whatever into my suite of tools. Most of the time, it's best to keep the number of services to a minimum and host as much as you can!
      I am actually working on a video showcasing how to use Docker Compose to run a bunch of services locally for LLMs/RAG - that'll be a good start toward something like what you're describing!

    • @zdenekjanda2315 · 10 days ago · +2

      @ColeMedin Then let's make this dream happen! I've sent you an email - let's brainstorm what can be done about it!

  • @lancelot222 · 5 days ago · +2

    Is there a best cloud RAG you'd recommend? If Chatbase is considered a RAG, the things I didn't like about it are:
    - 11 million character limit (would be great to have something that just synced with your Google Drive)
    - It would go outside your documents to answer questions (even when it was told not to)
    The convenience was great though.
    Awesome video, thanks for breaking this down!

    • @ColeMedin · 4 days ago

      For a cloud RAG solution I would highly recommend either Supabase (with pgvector) or Pinecone.
      I have not actually heard of Chatbase before, but it certainly doesn't sound ideal that there's a character limit and that it answers from outside your documents even when told not to...
      I actually have another video on my channel where I build a RAG AI Agent that syncs with a folder in Google Drive! So if you're looking for that and are down to implement something yourself, feel free to check that out:
      th-cam.com/video/PEI_ePNNfJQ/w-d-xo.html
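Whichever vector store you pick (pgvector, Pinecone, or anything else), the retrieval step behind RAG boils down to ranking stored chunk embeddings by similarity to the query embedding. A minimal sketch in plain Python/NumPy, using toy 4-dimensional vectors standing in for real embedding-model output (with pgvector, this same ranking happens in SQL via its distance operators):

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity: 1.0 means identical direction, 0.0 means orthogonal.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_k(query_vec, chunk_vecs, k=2):
    """Return the indices of the k chunks most similar to the query."""
    scores = [cosine_sim(query_vec, v) for v in chunk_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

# Toy "embeddings" - a real system would get these from an embedding model.
chunks = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
])
query = np.array([1.0, 0.05, 0.0, 0.0])
print(top_k(query, chunks))  # [0, 2] - chunks 0 and 2 point the same way as the query
```

The text of the top-ranked chunks is what gets stuffed into the LLM prompt as context.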

  • @anonymousalias69 · 11 days ago · +2

    Can you please make a video on the best ways to monetize AI-based skills? This is still a very early phase of adoption for businesses and most aren't even aware of how to integrate this technology into their operations.
    We'd love to know how you are doing it! 🙂

    • @ColeMedin · 10 days ago

      I appreciate the suggestion! I am certainly open to making a video like this in the near future - just working on building up an audience first so people will care to hear about my strategies!

  • @tecnopadre · 10 days ago · +1

    I think you are very right on this. Congratulations on your videos!

    • @ColeMedin · 10 days ago

      Thank you very much!!

  • @amerrashed6287 · 7 days ago · +1

    I have a MacBook Air M2, and most locally hosted LLMs run very slowly on it. You really need a good machine, so it's not an option for me right now! 😅

    • @ColeMedin · 7 days ago · +2

      Yes, that is true! As a rough rule of thumb, you'll want a GPU with at least 8 GB of VRAM for a quantized 8-billion-parameter model like Llama 3.1 8B, while a quantized 70-billion-parameter model like Llama 3.1 70B needs on the order of 40+ GB (e.g. two 24 GB cards). So not cheap!
      One fantastic option if you want to use open source models but not run them yourself is to go with Groq!
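Those VRAM figures can be sanity-checked with back-of-envelope arithmetic: weight memory is roughly parameter count times bytes per weight, plus some headroom for the KV cache and activations. A rough sketch (the 20% overhead factor here is an assumption, not a measured value - real usage varies with context length and runtime):

```python
def vram_gb(params_billions, bits_per_weight, overhead=1.2):
    """Rough VRAM (GB) needed to hold the model weights, plus ~20%
    headroom for KV cache and activations. A coarse rule of thumb only."""
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# Llama 3.1 8B at 4-bit quantization -> fits on an 8 GB GPU (barely)
print(round(vram_gb(8, 4), 1))   # 4.8
# Llama 3.1 70B at 4-bit quantization -> needs ~40+ GB
print(round(vram_gb(70, 4), 1))  # 42.0
```

The same formula explains why 16-bit (unquantized) weights roughly quadruple these numbers.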

  • @tecnopadre · 10 days ago

    I think the other hard question is when a client asks: how much are input and output tokens going to cost monthly? Of course, it depends. I'm working on a table of users + amount of content, but it would be awesome to have a video about it. Models, quantities, prices...

    • @ColeMedin · 10 days ago

      Yeah great point! So you're looking for a video specifically on how to track token usage/cost when using different models for various use cases?

    • @tecnopadre · 10 days ago

      @ColeMedin No! I already do that with LangSmith. I'm just trying to answer, with some kind of measure, how much a client is going to spend. How would you reply to that question? With a table? Which variables should that table have? Users? Length of the questions? Model used? I know it's not easy.

    • @ColeMedin · 10 days ago · +2

      @tecnopadre Oh okay, sweet! I see what you mean now.
      I would have the client estimate how many requests they think the system will get (put that on them), then estimate the tokens for each request (based on the system message, the number of documents retrieved for RAG + chunk size, what you expect the output to look like, etc.), and then list out in a table the cost based on those tokens for various models. Keep the number of variables to a minimum!

  • @jarad4621 · 6 days ago

    Why not FAISS?

    • @ColeMedin · 6 days ago

      Dude, fantastic question! FAISS is awesome.
      The two biggest reasons are that FAISS isn't something you can plug into n8n super easily, and I wanted to stick to everything that's packaged in this local AI starter kit.
      But I will 100% be making content around FAISS in the near future!
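For context, the core of what FAISS's simplest index (`IndexFlatL2`) does is exhaustive L2 nearest-neighbor search. A NumPy sketch of that same computation on toy 2-D vectors (FAISS adds optimized kernels and approximate index types such as IVF on top of this; this sketch is an illustration, not the FAISS API):

```python
import numpy as np

def search(index_vecs, query, k=2):
    """Brute-force L2 search: the computation behind a flat FAISS index."""
    dists = np.sum((index_vecs - query) ** 2, axis=1)  # squared L2 distances
    ids = np.argsort(dists)[:k]                        # k closest vectors
    return dists[ids], ids

vecs = np.array([[0.0, 0.0], [1.0, 1.0], [0.1, 0.0]])
_, ids = search(vecs, np.array([0.0, 0.1]))
print(ids.tolist())  # [0, 2]
```

Swapping this for FAISS buys speed on large collections without changing the shape of the result.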