Maximize AI Efficiency With Upstash Semantic Cache: Save On Large Language Model Costs! | Gui Bibeau

  • Published on Jan 2, 2025

Comments • 21

  • @abrahamolaobaju1781 3 months ago +1

    useful thanks

    • @guibibeau 3 months ago

      Glad you liked it! I'm working on a follow-up that I will post soon.

  • @codingbyte847 3 months ago

    Please make a step-by-step video tutorial on this, as it is a very good approach to saving costs on LLMs. Thanks a lot 😊

    • @guibibeau 3 months ago

      Hi there! I think this blog post does a good job of showing the details:
      www.guibibeau.com/blog/llm-semantic-cache
      I don't really do step-by-step videos, as I prefer to focus on overviews and new technologies, but I hope this helps!

    • @codingbyte847 3 months ago

      @guibibeau thanks

  • @opeyemiomodara5888 4 months ago

    I like how the video brought to the fore not just the code but its real-life benefits: reduced load time and lower costs for the calls being made. Bit of a tongue twister there with Upstash and semantic cache at 3:53 as well, lol. Great video!

    • @guibibeau 4 months ago +1

      Thanks for watching. That was one of my most advanced videos yet, so I'm unsure if it will find its audience.

    • @opeyemiomodara5888 4 months ago +1

      @guibibeau I am sure it will. Most importantly, it has been created and the knowledge has already been shared. It will always be a resource that can help builders globally, even years from now.

  • @kabirkumar5815 4 months ago +1

    this is useful, thank you

    • @guibibeau 4 months ago

      thanks for the comment!

  • @kabirkumar5815 4 months ago +2

    oh shit, might use this for a product question chatbot.
    hmm, this might also be what an image generator I was using a while ago was doing? I noticed that if I sent the same prompt, it gave back the same image, and if the prompt was just slightly different, I got a slightly different image with everything else pretty much exactly the same, with much less variation than other image generators

    • @guibibeau 4 months ago

      It could also be that the model has its temperature set really low. This would reduce variance in the output. Or the model could be overfit to some specific parameters.
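
To make the temperature point concrete, here is a minimal sketch of how temperature reshapes a sampling distribution. It assumes a model that picks among candidates by sampling from softmax probabilities; the `logits` values and the function name are made up for illustration and do not come from any particular image generator.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate outputs.
logits = [2.0, 1.0, 0.5]

low = softmax_with_temperature(logits, 0.1)   # very low temperature
high = softmax_with_temperature(logits, 1.5)  # higher temperature

print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

At the low setting almost all of the probability lands on the top candidate, so repeated runs look nearly identical. At the higher setting the probability is spread out, so outputs vary more.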

  • @toby_solutions 4 months ago +2

    OMG!! I submitted a CFP on this topic to a conference.

    • @guibibeau 4 months ago

      Hope the repo and video can help! Love that topic! Let me know if your talk is accepted!

    • @toby_solutions 4 months ago

      @guibibeau Sure thing!! I'd love to prep with you. It's gonna be awesome!

  • @sreerag4368 4 months ago +1

    This is great, but what if the data we are storing is user-specific, e.g. their PDF/doc data, which is unique to each user? Does this still work?

    • @guibibeau 4 months ago +1

      There is support for namespaces, which would let you separate your data per user by using their ID as the namespace key.
      I haven't played with this much yet, but I'm planning a video on it.
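
The namespace idea can be sketched with a small in-memory stand-in. This is not the Upstash API: the class, its method names, the `min_proximity` threshold, and the word-count "embedding" are all assumptions made for illustration. A real setup would use an embedding model and a vector store.

```python
import math
from collections import Counter, defaultdict

def embed(text):
    """Toy embedding (word counts); stands in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class NamespacedSemanticCache:
    """Hypothetical cache that keeps each namespace's entries separate."""

    def __init__(self, min_proximity=0.8):
        self.min_proximity = min_proximity
        self.entries = defaultdict(list)  # namespace -> [(embedding, value)]

    def set(self, namespace, key, value):
        self.entries[namespace].append((embed(key), value))

    def get(self, namespace, key):
        # Only entries stored under the same namespace are searched.
        query = embed(key)
        best_score, best_value = 0.0, None
        for vector, value in self.entries.get(namespace, []):
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_value = score, value
        return best_value if best_score >= self.min_proximity else None

cache = NamespacedSemanticCache()
cache.set("user-1", "what is in my contract pdf", "Summary of user 1's contract")

hit = cache.get("user-1", "what is in my contract pdf please")
miss = cache.get("user-2", "what is in my contract pdf please")
print(hit, miss)
```

Because the user ID is the namespace key, a second user asking a near-identical question never sees the first user's cached answer.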

    • @sreerag4368 4 months ago +1

      @guibibeau Oh that'll be really great

  • @raimondszakis8337 4 months ago +1

    lol this just sounds like copying the LLM's DB into a cache, wouldn't it be easier to run your own LLM?

    • @guibibeau 4 months ago

      You can run this with your own LLM too! The idea is that inference is expensive either way: in money if you use a third party, or in compute if you run it on your own hardware.
      Either way, this will save on compute or money.
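
The saving described here comes from skipping the model call when a similar prompt was already answered. Below is a rough sketch of that pattern. `fake_model`, `CachedModel`, the `threshold` value, and the word-count similarity are hypothetical placeholders, not the implementation from the video or the Upstash library.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding (word counts); a real cache would use an embedding model."""
    return Counter(text.lower().replace("?", "").split())

def similarity(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class CachedModel:
    """Wraps any prompt -> answer callable, hosted or self-run."""

    def __init__(self, model, threshold=0.8):
        self.model = model
        self.threshold = threshold
        self.store = []       # list of (embedding, answer)
        self.model_calls = 0  # how many times inference actually ran

    def ask(self, prompt):
        query = embed(prompt)
        for vector, answer in self.store:
            if similarity(query, vector) >= self.threshold:
                return answer  # cache hit: no inference cost
        self.model_calls += 1  # cache miss: pay for one inference
        answer = self.model(prompt)
        self.store.append((query, answer))
        return answer

def fake_model(prompt):
    # Stand-in for an expensive LLM call.
    return f"answer to: {prompt}"

cached = CachedModel(fake_model)
first = cached.ask("What is the capital of France?")
second = cached.ask("what is the capital city of France?")
third = cached.ask("How do I bake bread?")
print(cached.model_calls)
```

Three prompts come in, but the second is close enough to the first to be served from the cache, so the model runs only twice. The same wrapper works whether `model` calls a paid API or local hardware.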

    • @raimondszakis8337 4 months ago +1

      @guibibeau Well yeah, makes sense, cache is king. Does that apply to everything, as long as the data we actually care about is not changing too frequently?