Try this Before RAG. This New Approach Could Save You Thousands!

  • Published Oct 29, 2024

Comments • 55

  • @IntellectCorner 2 months ago +2

    *Timestamps by IntellectCorner*
    0:00 - Introduction to Prompt Caching with Claude and Gemini API
    1:12 - Gemini API Capabilities and PDF Processing
    2:45 - Considerations When Uploading PDFs to Gemini API
    4:40 - Comparison of Gemini API with RAG Pipelines
    6:46 - Processing PDF Files with and without Context Caching
    9:50 - Testing Gemini API's Multimodal Capabilities
    15:29 - Using Context Caching with Gemini API
    18:31 - Conclusion and Practical Use Cases for Context Caching vs. RAG

  • @unclecode 2 months ago +2

    Amazing combo you introduced in this video, truly impressive! Honestly, it feels like we've finally cracked one of the century's biggest challenges: understanding PDFs. We all know how ugly PDFs can be, but now this solution is unbelievable.
    I've got a thought on RAG, though: within a single document, it doesn't seem to make much sense anymore. But for a collection of documents? Absolutely. Imagine we've got 100 or even thousands of PDFs. We could create a short summary for each one and store those in a vector database. When a user asks a question, we could use an embedding model to identify which documents to load into something like Gemini's context caching.
    This approach would make RAG more about indexing and directing the system to the right documents. So, even though chunking might become less important, all the other strategies still hold.
    This could be a great topic for your next video: handling thousands of PDFs, creating summaries, building a vector database, and then using RAG to select relevant content, which you then pass to Gemini to cache before processing user queries.
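The "RAG as a router" idea above can be sketched in a few lines. This is a hypothetical, self-contained toy: `embed()` is a bag-of-words stand-in for a real embedding model, and the document names and summaries are made up; in practice you would swap in a real embedder and pass the selected PDFs to Gemini's context caching.

```python
# Toy sketch: route a query to the most relevant document via summary similarity,
# then (in a real system) load only those documents into Gemini's context cache.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a word-count vector. Swap in a real embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def route(query: str, summaries: dict, top_k: int = 1) -> list:
    # Rank documents by similarity between the query and each stored summary.
    q = embed(query)
    ranked = sorted(summaries, key=lambda d: cosine(q, embed(summaries[d])), reverse=True)
    return ranked[:top_k]

summaries = {
    "hr_policy.pdf": "employee leave vacation policy benefits",
    "api_guide.pdf": "rest api authentication endpoints tokens",
}
best = route("how do I authenticate against the api", summaries)
# The selected PDFs would then be uploaded and cached (e.g. with the
# google-generativeai SDK's caching.CachedContent.create) before answering.
```

Chunking drops out exactly as described: retrieval only has to pick whole documents, and the long-context model handles everything inside them.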

    • @engineerprompt 2 months ago +5

      That's actually a great idea, and more practical for a large corpus of docs. I will definitely look into it. It will be a fun weekend project, and I agree it could be a very valuable video. Thanks for this great idea!

    • @unclecode 2 months ago

      Then I'll wait for the coming Monday haha

  • @LikhithV02 2 months ago

    Great work! This is what I needed.

  • @turbo2ltr months ago

    Would this work for asking Gemini to write code against a private COM interface by passing the COM documentation via context caching? I've been trying to do this with a custom GPT and haven't been able to get it working very well, mostly because of the limits on knowledge files for GPTs.

  • @uwegenosdude 2 months ago

    Thanks for the very interesting video. Would it be enough to store the name of the cache (name='cachedContents/hash-value') to be able to use it later for the next request to my bot?
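For what it's worth, that resource name is exactly the handle you persist between requests. A minimal sketch, assuming the google-generativeai Python SDK (0.7+): the `reuse_cache` function is shown unexecuted since it needs an API key, while the name-format helper is runnable as-is.

```python
# Sketch: persist the cache's resource name and re-attach to it later.
# Cached-content names look like "cachedContents/<id>".

def is_cache_name(name: str) -> bool:
    # Quick sanity check on a stored cached-content resource name.
    prefix = "cachedContents/"
    return name.startswith(prefix) and len(name) > len(prefix)

def reuse_cache(saved_name: str):
    # Unexecuted sketch: look the cache up by name and build a model from it.
    import google.generativeai as genai
    from google.generativeai import caching
    cached = caching.CachedContent.get(name=saved_name)
    return genai.GenerativeModel.from_cached_content(cached_content=cached)

saved = "cachedContents/hash-value"   # e.g. stored in a DB between requests
assert is_cache_name(saved)
# model = reuse_cache(saved)  # works until the cache's TTL expires
```

The main caveat is the cache's TTL: once it expires, the name no longer resolves and you have to recreate the cache.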

  • @RickySupriyadi 2 months ago +1

    So... in RAG, the data is stored on SSD/HDD, while context-caching data is stored in RAM?

  • @CryptoMaN_Rahul 2 months ago

    Hey, I want to build an AI size-chart generator for apparel sellers:
    1) it should cluster similar users by body measurements and successful purchase/return data, then
    2) recommend a size to the seller.
    Is this possible with RAG, or will I have to use ML?

    • @engineerprompt 2 months ago

      You can do it with RAG or topic clustering.

  • @darkmatter9583 2 months ago

    I want to upload tons of .txt or .json files, around 5 GB (text only) for a specific field. The data is on GitHub to download. What can I do?

    • @engineerprompt 2 months ago +1

      @unclecode mentioned a really good idea. Create a summary for each one of them and do normal RAG on the summaries. This will return only the documents that seem most relevant, which you then send to Gemini or another long-context model.

  • @saxtant 2 months ago +1

    Why not use results caching at home? I don't want to waste calls to paid services.

    • @engineerprompt 2 months ago +1

      That's a good idea, you could do it. Might cover it in the next video.

    • @Kaalkian 2 months ago +1

      Is there documentation for this?

  • @NoidoDev 2 months ago

    APIs are okay for some assistants, but I want to run things locally.

    • @engineerprompt 2 months ago +1

      check out the localgpt project.

  • @yesweet 2 months ago

    Why didn't you mention the lifetime of the cache this time?

    • @engineerprompt 2 months ago

      I have a dedicated video on context caching with Gemini and covered it in a lot more detail: th-cam.com/video/KvwJtleXCtU/w-d-xo.html
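For quick reference, the cache lifetime is set with a TTL at creation time. A minimal sketch, assuming the google-generativeai Python SDK (0.7+); the model name is only an example, and `create_cache` is shown unexecuted since it needs an API key:

```python
# Sketch: set a cache lifetime (TTL) when creating a Gemini context cache.
import datetime

TTL = datetime.timedelta(hours=2)   # cache is evicted after this long

def create_cache(contents):
    # Unexecuted sketch of creating a cache with an explicit TTL.
    from google.generativeai import caching
    return caching.CachedContent.create(
        model="models/gemini-1.5-flash-001",  # example model name
        contents=contents,
        ttl=TTL,
    )
```

If no TTL is given, the Gemini API applies a default of one hour; note that cached tokens also incur storage cost for as long as the cache lives.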

  • @wtcbretburstjk3726 2 months ago +1

    RIP RAG. 2021-2024

  • @ahmadzobairsurosh8966 2 months ago

    Thanks for the effort! Appreciate it.
    @Prompt Engineering

  • @darkmatter9583 2 months ago

    Min 4: what is that website called?

    • @engineerprompt 2 months ago

      It's called excalidraw.com

  • @darkmatter9583 2 months ago

    Please do a workflow to automate solving LeetCode with the Claude API ($9), and I will pay you the $9 back on Patreon. I want to learn, like the guy who solved 600 problems in 24h.

  • @NLPprompter 2 months ago

    An insider said a 10-million context length is possible.

    • @oryxchannel 2 months ago +1

      aka Eric Schmidt

    • @NLPprompter 2 months ago

      @@oryxchannel Ha, spot on! Nothing can hide from internet transparency... they tried to take down the video, but it became more viral. No wonder many powerful politicians want censorship and "alignment".

  • @ahishverma181 2 months ago

    Can I get your LinkedIn?

  • @imaspacecreature 2 months ago +13

    I'm hating all of this API talk. Do people really think using APIs from mega-company artificial intelligence is a good idea? Surely they are using the data exchanged through the API as training material for their AI. People really need to build their own from scratch. Most people can't build their own home or car, but fortunately and unfortunately, artificial intelligence needs to be user-developed. It's the safest way and prevents any doomsday scenarios.

    • @washedtoohot 2 months ago

      Eh, it’s both. For some use cases it makes sense to use the mega corporation API. For other cases not.

    • @two-jay3475 2 months ago

      If you can solve your customers' hair-on-fire problems at a reasonable cost, why not?

    • @mysticlunala8020 2 months ago +1

      If by creating our own you mean using models locally, then yeah, that's safe, especially in industries where some level of confidential data needs to be provided to the GenAI. Other than that, for users who just want answers to simple questions and daily basic use, no, there's no need to run your own model.

    • @two-jay3475 2 months ago +1

      I think you should talk to us like that only after giving us a ton of GPUs to train an LLM.

    • @i2c_jason 2 months ago

      I was just thinking the same thing! It's "free" because they're absorbing gobs of new training data and probably have carte blanche to do whatever. BUT - I'll take the API calls if my users want the end result and are ok with copies of their data going into the abyss.

  • @ivanlaws622 2 months ago

    What language is being spoken?

    • @Lemure_Noah 2 months ago +1

      Welsh.