Check out the RAG Beyond Basics Course: prompt-s-site.thinkific.com/courses/rag
It'd be excellent if you could test GPT-4o and Flash against your RAG setup and show the results like you did in this video. That would be a nice demonstration of the different capabilities and results, of course with a local LLM in the mix.
Yes!
That would be great
In scientific papers, tables are usually in text format. LaTeX just uses fancy text formatting to make tables, so table-content extraction is not a test of a model's visual capabilities.
Thanks
Thank you 😊
One Q that I missed: when making API calls with our PDF, does our private data become publicly available in any way? Another amazing vid. Really appreciate all the work you put into making great content.
For the free API, Google does say they can use it for training. For the paid API, that doesn't seem to be the case. Now, just like with the other API providers, it really comes down to your own comfort level and how much you trust their word :)
What if Gemma 2 is also able to do this? How could we test that?
Hi, can you do a video on this:
In a typical AI workflow, you might pass the same input tokens over and over to a model. Using the Gemini API context caching feature, you can pass some content to the model once, cache the input tokens, and then refer to the cached tokens for subsequent requests. At certain volumes, using cached tokens is lower cost than passing in the same corpus of tokens repeatedly.
Context caching needs a 30k-token minimum, which is useless since most contexts generally come in under 30k. It might work for very long codebases or 10-15 long PDFs.
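For anyone who wants to try it anyway, here is roughly what context caching looks like with the google-generativeai Python SDK (a sketch; the API key, file path, model version, and TTL are placeholders):

```python
import datetime
import google.generativeai as genai
from google.generativeai import caching

genai.configure(api_key="YOUR_API_KEY")

# Upload the large document once via the File API.
doc = genai.upload_file(path="big_corpus.pdf")

# Cache the input tokens; the cached content must exceed the minimum size.
cache = caching.CachedContent.create(
    model="models/gemini-1.5-flash-001",
    display_name="big-corpus-cache",
    contents=[doc],
    ttl=datetime.timedelta(minutes=30),
)

# Later requests reference the cached tokens instead of resending them.
model = genai.GenerativeModel.from_cached_content(cached_content=cache)
response = model.generate_content("Summarize the main findings.")
print(response.text)
```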
Hi. Can you show us how to get to the UI?
Here are some considerations for using RAG with this LLM: large corpus and token cost. I believe one can bring the total token cost down by an order of magnitude. Say directly using PDFs takes 30k tokens on average; doing the same with RAG will cost around 2k on average. That was my heuristic for a single 15-20 page PDF.
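A quick back-of-the-envelope check of that claim (the token counts are the commenter's estimates; the price per token is a placeholder, so check current pricing):

```python
# Rough per-query cost comparison, using the commenter's estimates.
FULL_PDF_TOKENS = 30_000   # feeding the whole PDF every query
RAG_TOKENS = 2_000         # sending only the retrieved chunks
PRICE_PER_1M = 0.075       # placeholder $/1M input tokens

full_cost = FULL_PDF_TOKENS / 1_000_000 * PRICE_PER_1M
rag_cost = RAG_TOKENS / 1_000_000 * PRICE_PER_1M
ratio = FULL_PDF_TOKENS / RAG_TOKENS
print(f"full PDF: ${full_cost:.6f}/query, RAG: ${rag_cost:.6f}/query, {ratio:.0f}x cheaper")
```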
Thanks for your videos and course. You said at the beginning that Gemini 1.5 is only good for small docs; what would you recommend for a large corpus of multi-modal PDF requirements? Would an agentic approach work, breaking the PDFs up into buckets with a single agent to combine the responses?
You would need to use more traditional approaches: chunking, indexing, retrieval (parent document retrieval is a good approach). It is A LOT of work to take this to production (trust me, we know!), so you have to love it, ha.
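For reference, parent document retrieval looks roughly like this in LangChain (a sketch; the file path and embedding model are placeholders, and the chunk sizes are just common defaults):

```python
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_chroma import Chroma
from langchain_community.document_loaders import PyPDFLoader
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

docs = PyPDFLoader("paper.pdf").load()  # placeholder path

# Small child chunks are embedded for precise search; the larger parent
# chunks they belong to are what actually get returned to the LLM.
retriever = ParentDocumentRetriever(
    vectorstore=Chroma(embedding_function=OpenAIEmbeddings()),
    docstore=InMemoryStore(),
    child_splitter=RecursiveCharacterTextSplitter(chunk_size=400),
    parent_splitter=RecursiveCharacterTextSplitter(chunk_size=2000),
)
retriever.add_documents(docs)

results = retriever.invoke("What does the paper say about evaluation?")
```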
Impressive model. Thank you for the video.
I think the main benefit of classic RAG so far, for me, has been citations and clear sourcing (where the LLM can return which page it is using for information). How well does Gemini Flash return this kind of info?
I haven't tested it on multiple files yet but I suspect that should be possible. I will put together a new tutorial on it when I get a chance.
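One way to probe this is to just ask for page-level citations in the prompt and spot-check them (a sketch; how reliably Flash grounds the page numbers is exactly the open question):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")

doc = genai.upload_file(path="paper.pdf")  # placeholder path
response = model.generate_content([
    doc,
    "Answer the question and cite the page number for every claim as [p. N]. "
    "Question: Which datasets were used for evaluation?",
])
print(response.text)
```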
What about using Gemini Flash to parse the PDFs into markdown, optimally structure it for LLMs, and then embed it for RAG?
Pursuing this idea
@@wesleymogaka report back once you do it. Maybe send the YouTuber a link so he can also review it and give you some exposure.
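If anyone wants to try that pipeline, the first half might look something like this (a sketch; the prompt, model names, and naive chunking are assumptions, and you'd swap in your own vector store at the end):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")

# Step 1: have Flash convert the PDF (text, tables, figures) to markdown.
doc = genai.upload_file(path="report.pdf")  # placeholder path
markdown = model.generate_content([
    doc,
    "Convert this PDF to clean markdown. Preserve tables as markdown "
    "tables and describe each figure in a blockquote.",
]).text

# Step 2: naively chunk the markdown and embed it for RAG.
chunks = [markdown[i:i + 2000] for i in range(0, len(markdown), 2000)]
embeddings = [
    genai.embed_content(model="models/text-embedding-004", content=c)["embedding"]
    for c in chunks
]
# Store (chunk, embedding) pairs in your vector DB of choice.
```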
Your Colab link doesn't work. It doesn't open.
Love the Meta paper choice to scan.
Thank you so much for this video!
I wanted to build a previous-year paper analysis system for my college (engineering). There are 7 departments in total, and all subjects come to 7*6*8. Can you advise: fine-tuning or RAG?
For this, my recommendation would be to use RAG.
Cool, thanks @@engineerprompt
Why test Gemini Flash? Doesn't Gemini Pro work better?
Pro is better but has more limitations for free usage.
Great video.
Thank you!
Great, I will test it :)
Let me know how it goes
A small number of PDFs means how many? What's your assumption?
As long as they fit in the context window, which is 1M tokens, although I would suggest using only about 50-70% of that. Using more can result in lost-in-the-middle issues.
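You can check the fit programmatically before sending anything (a sketch; the 70% budget is just the rule of thumb above, and the path is a placeholder):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")

doc = genai.upload_file(path="docs.pdf")  # placeholder path
n_tokens = model.count_tokens([doc]).total_tokens

BUDGET = int(1_000_000 * 0.7)  # stay at ~70% of the 1M window
print(f"{n_tokens} tokens; fits within budget: {n_tokens <= BUDGET}")
```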
I don't like using libraries to parse my PDF files. I found it to be more complex and less robust than writing the parsing services myself. I will definitely give Flash a try though.
Agree, it's worth a shot.
Is there demand for RAG in the market?
RAG is the only real application of GenAI at the moment that businesses are actually widely using.
Please run an ad campaign for your channel, as it has the potential to get 500k subscribers in an hour.
This review is basically pointless. You're running it on one PDF. The whole PDF can easily be dumped into the context (the OpenAI default is 20 x 1,000-token chunks). You should be testing it on much larger datasets.
Gemini 1.5 Pro also has this new feature, I think.
Yes, it does. It's relatively more expensive, though, if you put it in production.
RAG in general has been slowly dying as context-window increases combine with cost decreases. On top of that, folks are getting better at compression, database use (LLMs understand SQL, etc.), and agentic flows.
The speed loss and cost of maintaining a vector database just isn't always worth it when I can simply task a flow with semantic search itself and feed the results to whatever needs them.
RAG is not dying. It merely depends on the use case. It was even mentioned several times in this video that this is not a replacement for RAG when there is a large corpus of information (millions of docs). It certainly is evolving, however, and quite rapidly. I would love to get to the point where I can avoid having to parse PDFs and documents completely, and instead just feed docs to a vision model and have the chunks stored directly in a DB. But getting rid of RAG completely? Nah. Not yet. I would say RAG would only go away if model training reaches the point where you can throw docs directly at the LLM itself rather than feeding them into a vector DB.
Why would you want to pay for a cloud GPT?! Do it yourself.
Check out localGPT for that :)
As usual, I will wait for third parties to verify which of Google's claims are real and which are just another scam.
No, it's slow.