useful thanks
Glad you liked it! Working on a follow-up that I will post soon.
Please make a step by step video tutorial on this, as it is a very good approach in saving costs on LLMs. Thanks a lot😊
Hi there! I think this blog post does a good job of showing the details!
www.guibibeau.com/blog/llm-semantic-cache
I don't really do step by step videos as I prefer to focus on overviews and new technologies but hope this helps!
@guibibeau thanks
I like how the video brought to the fore not just the code but its real-life benefits: reduced load time and lower costs from the calls being made. Bit of a tongue twister there with Upstash and semantic cache at 3:53 as well, lol. Great video!
Thanks for watching. That was one of my most advanced videos yet, so I’m unsure if it will find its audience.
@guibibeau I am sure it will. Most importantly, it has been created and the knowledge has already been shared. It will always be a resource that can help builders globally, even years from now.
this is useful, thank you
thanks for the comment!
oh shit, might use this for a product question chatbot.
hmm, this might also be what an img gen I was using a bit ago was doing? noticed that if I sent the same prompt, it gave back the same image, and if the prompt was just slightly different, it gave a slightly different image with everything else pretty much exactly the same, with much less variation than other img gens
It could also be that the model has its temperature set really low. This would reduce variance in the output. Or the model could be overfit to some specific parameters.
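For anyone curious about the temperature point, here is a minimal sketch of how temperature changes a sampling distribution. It assumes a model that picks outputs by sampling from a softmax over scores; the scores below are made up, and `softmax_with_temperature` is a hypothetical helper, not taken from any particular image generator or library.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for three candidate outputs (illustrative only).
logits = [2.0, 1.0, 0.5]

low = softmax_with_temperature(logits, 0.1)   # low temperature
high = softmax_with_temperature(logits, 1.0)  # default temperature

print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

At the low temperature almost all of the probability lands on the top-scoring candidate, so repeated runs with the same prompt would tend to pick the same output. That looks similar to a cache hit from the outside, even though no cache is involved.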
OMG!! I submitted a CFP on this topic to a conference.
Hope the repo and video can help! Love that topic! Let me know if your talk is accepted!
@guibibeau Sure thing!! I'd love to prep with you. It's gonna be awesome!
This is great, but what if the data we are storing is user-specific, e.g. their PDF/doc data, which is unique for each user? Does this still work?
There is support for namespaces which would allow you to separate your data per user using their id as a namespace key.
I’ve not played a lot yet with this but I’m planning a video on it.
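A rough sketch of the per-user namespace idea, using a plain in-memory dictionary. This is not the Upstash API: `NamespacedCache` and its methods are hypothetical names for illustration, and the exact-match lookup stands in for whatever semantic lookup the real cache performs.

```python
from collections import defaultdict

class NamespacedCache:
    """Toy in-memory cache that keeps each user's entries in a separate namespace."""

    def __init__(self):
        # namespace (here: a user id) -> {prompt: cached answer}
        self._store = defaultdict(dict)

    def set(self, namespace, key, value):
        self._store[namespace][key] = value

    def get(self, namespace, key):
        # Only the caller's namespace is searched, so users never see each other's data.
        return self._store.get(namespace, {}).get(key)

cache = NamespacedCache()
cache.set("user-1", "summarize my pdf", "Summary of user 1's document")

hit = cache.get("user-1", "summarize my pdf")   # same user: cached answer
miss = cache.get("user-2", "summarize my pdf")  # other user: nothing cached
print(hit, miss)
```

The design choice is that the namespace is part of every read and write, so an identical prompt from a different user is a miss rather than a leak of someone else's document.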
@guibibeau Oh that'll be really great
lol this just sounds like copying the LLM's DB into a cache, wouldn't it be easier to run your own LLM?
You can run this with your own LLM too! The idea is that inference is expensive either way: monetarily if you use a third party, or computationally if you run it on your own hardware.
In both cases a cache hit saves you money or compute.
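To make the saving concrete, here is a minimal sketch of a semantic cache sitting in front of a model call. Everything in it is an assumption for illustration: `embed` is a toy word-count embedding (a real setup would use an embedding model), `FakeLLM` stands in for either a hosted API or a self-hosted model, and the 0.8 threshold is arbitrary.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: lowercase word counts. Stand-in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

class SemanticCache:
    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer) pairs

    def get(self, prompt):
        """Return the answer of the most similar cached prompt, or None below the threshold."""
        query = embed(prompt)
        best_score, best_answer = 0.0, None
        for vector, answer in self.entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def set(self, prompt, answer):
        self.entries.append((embed(prompt), answer))

class FakeLLM:
    """Stand-in for an expensive model call; counts how often it is invoked."""
    def __init__(self):
        self.calls = 0

    def __call__(self, prompt):
        self.calls += 1
        return f"answer to: {prompt}"

def ask(cache, llm, prompt):
    cached = cache.get(prompt)
    if cached is not None:
        return cached          # cache hit: no inference needed
    answer = llm(prompt)       # cache miss: pay for inference once
    cache.set(prompt, answer)
    return answer

cache, llm = SemanticCache(threshold=0.8), FakeLLM()
first = ask(cache, llm, "what is the capital of France")
second = ask(cache, llm, "what is the capital of france ?")  # near-duplicate prompt
third = ask(cache, llm, "how do I bake bread")               # unrelated prompt
print(llm.calls)
```

The near-duplicate prompt is served from the cache, so the model only runs for the first and third questions. The same wrapper works whether `llm` bills per call or burns your own GPU time.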
@guibibeau well yeah, makes sense, cache is king. Does that apply to everything, as long as the data we actually care about is not changing too frequently?