Comments •

  • @Dillonvu · months ago +5

    Very excited it's for Flash too! This'll help a lot at work for certain features!

  • @deeplearning7097 · months ago +2

    Brilliant Sam, as always, thank you very much.

  • @rluijk · months ago

    Thanks! Nice explainer. Will integrate this part in my setup!

  • @danangjeffry · months ago

    Very useful and easy to understand. Thank you!

  • @gemini_537 · months ago

    This is super useful! ❤

  • @ylazerson · months ago

    great video - thanks!

  • @darshank8748 · months ago

    Great video!

  • @miriamramstudio3982 · months ago

    Very useful. Thanks

  • @SwapperTheFirst · months ago +3

    I can immediately see how to use this for quite cheap similarity search. Assuming you have 1M strings to match, you can put all of them into the context window and then ask the model each time to find a similar string. It will be quite slow, but with caching (storing tokens in RAM or on SSD) it won't be expensive. This doesn't scale to 1B strings, though, and a RAG approach is also not feasible.
    Sam, maybe you have some advice on how to solve similarity search at scale? At a smaller scale you can solve this character-wise, using rapidfuzz or dedupe.
    But how can you solve it at scale? This business problem is known as "entity matching" or "fuzzy entity matching".
    For example, you want to match "Microsoft corp" to "Microsoft corporation" to "MSFT", and you also want to cluster similar strings under the same unique umbrella - "Microsoft corporation" in this example.
    You could also use regular vector search, but then there's the clustering problem: how do you "shuffle" through 1B rows to create a reliable index and then keep it updateable in real time?

    • @samwitteveenai · months ago +2

      Very interesting comment. AFAIK the most commonly used models for entity matching (combining accuracy, cost, and efficiency) are encoder-based BERT/RoBERTa-style models. LLMs can certainly do it; they just end up being slow. It would be interesting to see if you could do it with a long-context model, perhaps storing the list of entities in the prompt and only having it give new ones as the output. The challenge is it would still be way too slow for anything real-time. This is an interesting challenge; let me think about it a bit more and look for a dataset to test on.
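
      A minimal sketch of that encoder-embedding approach (the model name, threshold, and toy catalog below are illustrative assumptions, not something from the video):

        # Entity-matching sketch: embed entity strings with a small
        # encoder model and match new mentions by cosine similarity.
        from sentence_transformers import SentenceTransformer, util

        model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative encoder

        catalog = ["Microsoft Corporation", "Apple Inc.", "Alphabet Inc."]
        catalog_emb = model.encode(catalog, normalize_embeddings=True)

        def match(mention: str, threshold: float = 0.75):
            """Return the best catalog entity for a mention, or None."""
            emb = model.encode([mention], normalize_embeddings=True)
            scores = util.cos_sim(emb, catalog_emb)[0]
            best = int(scores.argmax())
            return catalog[best] if float(scores[best]) >= threshold else None

        print(match("Microsoft corp"))  # should match "Microsoft Corporation"

      For the 1B-row case you would put the same embeddings behind an ANN index (FAISS, ScaNN, etc.) rather than scanning the whole catalog. Pure text embeddings also won't catch ticker-style aliases like "MSFT"; those usually need an alias table on top.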

    • @JayanaKalansuriya · months ago

      Hi, if possible I would love to get some advice from you guys!
      There's a requirement where I have a master catalog of over 5,000 products with images and product text.
      I want to build a similarity-matching solution: if I upload an image and there's a similar product in the master catalog, I want to find it.
      Basically we need to do image similarity matching, and the most similar image to the input image should be shown.
      How can I work on this? Any advice would be much appreciated!

    • @SwapperTheFirst · months ago

      @JayanaKalansuriya At a scale of 5K or so you don't need anything complex.
      Just grab some multimodal (visual is enough) embedding model and then set up a vector search.
      It's just a regular RAG app with a small twist: you create embeddings for the images as well and serve images in the RAG response, as in the sketch below.
      Or use some managed solution from Google Cloud.
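
      A minimal sketch of that setup (CLIP embeddings via sentence-transformers plus a FAISS index; the model name and image paths are illustrative):

        # Image-similarity sketch: embed catalog images with a CLIP model,
        # index them, and look up the nearest item for a query image.
        import faiss
        from PIL import Image
        from sentence_transformers import SentenceTransformer

        model = SentenceTransformer("clip-ViT-B-32")  # multimodal encoder

        catalog_paths = ["shoe.jpg", "shirt.jpg", "mug.jpg"]  # hypothetical files
        embs = model.encode([Image.open(p) for p in catalog_paths],
                            normalize_embeddings=True)

        index = faiss.IndexFlatIP(embs.shape[1])  # inner product = cosine here
        index.add(embs)

        query = model.encode([Image.open("query.jpg")], normalize_embeddings=True)
        scores, ids = index.search(query, 1)  # top-1 match
        print(catalog_paths[ids[0][0]], scores[0][0])

      At 5K items a flat index like this is plenty; you only need an approximate index once the catalog gets much larger.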

    • @SwapperTheFirst · months ago

      @samwitteveenai Thanks a lot, Sam. You're right about BERT/RoBERTa; these are used in spaCy's "transformer" models for named-entity extraction. And spaCy always uses the latest and greatest, unlike NLTK, which is very conservative and keeps lots of legacy code for compatibility.

    • @eddiehaug · months ago

      @JayanaKalansuriya - Not sure what your specific use case is, but you might want to take a look at Google's Recommendations AI, and there's a Ranking API as well.

  • @matty-oz6yd · months ago

    Any idea how this works under the hood? I am trying to work out whether to use an index of relevant context or the context cache feature. The details seem to be a closely guarded secret, which means the only way to decide between the two is to test both.
    The use cases seem very similar:
    Option 1 - Give Google a bunch of context, hope it's good, and then run queries against it.
    Option 2 - Index my context and add information as needed using RAG.
    The RAG approach would use more tokens, but at least I know how it works, so I can set my expectations. The Google approach would be cheaper, but I don't know how the context has been processed, so I can't intentionally format my data for optimal performance.

  • @GamingClubGermany · 14 days ago

    First off, thanks a lot for the video! But why is your voice so wobbly? Are you using a TTS model or something like that?
    Update: Okay, I don't know what you use, but it's pretty awesome! Do you mind sharing info on what voice "thing" you use?

  • @micbab-vg2mu · months ago

    Great:)

  • @gen_ai_explorer · 22 days ago

    How does this benefit us? We can already store the information in a vector DB and use only the relevant chunks at a time, right? How does Google's caching help?

  • @Maisonier · months ago

    I'd love to know how to do this with Open WebUI and a local model on a single GPU. Do we need to use FAISS, or what RAG setup?

  • @IdPreferNot1 · months ago

    A great video example of this would be processing a repo or API docs to help with programming where the libraries have changed significantly since the cutoff date. I still can't believe that gpt-4o can't get the endpoints and structure right for its own current OpenAI API when you ask it to build code that works with GPT.

  • @RD-learning-today · months ago

    How do I use it in Vertex AI?

  • @WillJohnston-wg9ew · months ago

    Any thoughts on how this would apply to real-time video? I am trying to create something that does real-time video sentiment analysis.

    • @samwitteveenai · months ago

      Real time probably wouldn't work just yet. There are some hacks/techniques they use to do it in real time with Project Astra etc., but I'm not sure when that will be available to us externally.

  • @RD-learning-today · months ago

    Can I use it with Vertex AI?

  • @guanjwcn · months ago

    Thank you, Sam!! Does Llama 3 have this too?

    • @samwitteveenai · months ago

      I think if you served it with vLLM you could do it; vLLM has prefix caching.
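
      A rough sketch of what that looks like (the flag below is vLLM's automatic prefix-caching option; the model name and prompts are illustrative):

        # Prefix-caching sketch with vLLM: requests that share a long
        # prefix reuse its KV cache instead of recomputing it.
        from vllm import LLM, SamplingParams

        llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct",
                  enable_prefix_caching=True)

        long_context = open("big_doc.txt").read()  # hypothetical shared prefix
        params = SamplingParams(max_tokens=128)

        # The second call hits the cached KV values for the shared prefix.
        out1 = llm.generate([long_context + "\n\nQ: Summarize this."], params)
        out2 = llm.generate([long_context + "\n\nQ: List the key entities."], params)
        print(out2[0].outputs[0].text)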

  • @SrikanthCSE-mi9jm · months ago +1

    How do I use it with LangChain?

    • @samwitteveenai · months ago

      I am not sure if they support this yet or not. My guess is the Google LangChain package and the Vertex LangChain package will have to add support for it.

  • @johnrperry5897 · 15 days ago

    Wait what is cayche

  • @MistikBBQ · months ago

    Any way of doing this locally with something like Ollama? This would actually be amazing to use in some local/edge cases.

    • @samwitteveenai · months ago +1

      You could do it with a local model via vLLM (see the sketch above). AFAIK it's currently not possible in Ollama, but they could certainly add it.

    • @MistikBBQ · months ago

      @samwitteveenai Thanks a lot for the reply!

  • @TheRcfrias · months ago

    I thought this video was about caching client-side to avoid passing around huge payloads for function calling and so on 😢

  • @mrnakomoto7241 · months ago +1

    Aussie accent, living in the USA, what's going on?

    • @samwitteveenai · months ago +1

      actually back living in Singapore for the time being 😀

    • @mrnakomoto7241 · months ago

      @samwitteveenai Out of curiosity, do you own a house in every country you go to?

  • @ahmaddajani3639 · months ago

    Why use context like this instead of using a vector store and chunking the content?

    • @SwapperTheFirst · months ago +1

      You will not get this with a vector store and RAG. Here you have all the content available to the model.

    • @eddiehaug · months ago

      Because depending on the use case, you may want to use one technique vs. the other. Adding all the info as context to an LLM is not the same as using RAG, where your results may vary greatly depending on the chunk size, the ranking engine, etc.

    • @ahmaddajani3639 · months ago

      @eddiehaug Yes, correct, it depends on the use case, but if you want to save money in the case of question answering, RAG is better.

    • @eddiehaug · months ago

      @ahmaddajani3639 - yes, agree 👍

  • @vicovico · months ago +2

    What's going on with the pronunciation of "cached"?

    • @ScottVanKirk · months ago +1

      It is pronounced Kash. The e is vestigial, like our appendix 😁

    • @jamiek2039 · months ago

      😂

    • @matthewwalker7063 · months ago +1

      Engagement baiting

    • @samwitteveenai · months ago +1

      lol I was waiting for someone to say something 😀

    • @ariganeri · months ago

      It's what Aussies call English.

  • @otty4000 · months ago

    Functionally, isn't this quite similar to NotebookLM?

    • @samwitteveenai · months ago

      No, this is more than just uploading the docs/video etc. It's having a lot of the values precomputed in the model.
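
      For reference, a minimal sketch of what using the feature looks like with the google-generativeai Python SDK (the model name, file, and TTL are illustrative, and the API may have changed since this was written):

        # Context-caching sketch: upload the big content once, cache it,
        # then run multiple cheap queries against the cached tokens.
        import datetime
        import google.generativeai as genai
        from google.generativeai import caching

        genai.configure(api_key="YOUR_API_KEY")  # placeholder

        doc = genai.upload_file("big_doc.pdf")  # hypothetical large document

        cache = caching.CachedContent.create(
            model="models/gemini-1.5-flash-001",
            system_instruction="Answer questions about the attached document.",
            contents=[doc],
            ttl=datetime.timedelta(minutes=30),  # how long the cache lives
        )

        model = genai.GenerativeModel.from_cached_content(cached_content=cache)
        print(model.generate_content("Summarize the key points.").text)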