👉More on LLMs: th-cam.com/play/PLz-ep5RbHosU2hnz5ejezwaYpdMutMVB0.html
Thank you Talebi. No one explains the subject like you
Thanks :) Glad it was clear!
Agreed
Thank you Shaw. I like your clear and very powerful approach: your video is a TOP video for any Data Scientist, I think 🙂
Take care!
Just subscribed, content is awesome and your delivery is great, thank you!
That was a very clear and concise explanation. I am learning Data Science and find this very useful in understanding RAG. Thank you so much!
Wow. This was such an amazing explanation of the topic. I know very little of LLMs, but understood this very clearly. Thank you!
This is a great video, super helpful, thank you so much! Also love the helpful links you provide in the description. Honestly, great content, I'm glad I found your videos, going to watch some more now :D
Thanks for the great feedback :)
This is so helpful! Thanks Shaw, you never miss!
Glad it was helpful!
That was killer Shaw! You are a damn fine teacher.
Glad it was clear :)
Thanks!
Thank you! Glad it was helpful 😁
Finally completed, thank you so much for this content, waiting for the agents video
Congratulations for this clear explanation!
Super nice! Great practical content.
Good work Shaw, appreciate it
Nice lecture, very informative! I didn't watch the video related to fat tails, but I noticed N.N. Taleb's influence, my favorite author. :-)
Same here! I actually did his summer school recently: medium.com/the-data-entrepreneurs/i-spent-2-995-on-nassim-talebs-risk-taking-course-here-s-what-i-learned-c442a55a2c64
Thank you for the clear, visually appealing, and easy-to-understand information.
Thank you so much. Becoming a fan of yours!
Please do a video on RAG with LlamaIndex + Llama 3 if it's free and not paid.
Great suggestion. That's a good excuse to try out Llama3 :)
Awesome video, thanks! I'm wondering if instead of using top_k documents/batches one could define a threshold/distance for the used batches?
Superb explanation, Shaw! 😍
Can you do a general cost analysis between a fine-tuned model and a base model with RAG? Also, you should check it against the base model with RAG so its impact can be seen more clearly. You can still have your response style specified in the system prompt.
At inference, a fine-tuned model and its corresponding base model will have equivalent costs. The key cost difference will come from the fine-tuning process. However, the fine-tuning cost may be negligible depending on the use case.
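For a rough feel of that trade-off, here is a back-of-envelope sketch; every number below is a made-up assumption for illustration, not actual provider pricing:

```python
# Back-of-envelope cost comparison (all numbers are hypothetical assumptions).
finetune_cost = 25.00      # one-time fine-tuning cost, $
cost_per_call = 0.002      # per-request inference cost, $ (same for base and fine-tuned)
calls_per_month = 10_000

monthly_inference = cost_per_call * calls_per_month     # identical for both models
amortized_finetune = finetune_cost / calls_per_month    # extra cost per call, month 1

print(f"Inference: ${monthly_inference:.2f}/month for either model")
print(f"Fine-tuning adds ${amortized_finetune:.4f}/call in the first month, then ~$0")
```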
Thank you for the valuable content - clear, concise
great video as always 👍
Does a reranker improve the quality of the output for a RAG approach? That way we could take the output directly from the reranker, right? What is your experience with rerankers?
Great questions! That's the idea. A reranker is typically applied to the top-k (say k=25) search results to further refine the chunks. The reason you wouldn't use a reranker directly on the entire knowledge base is that it is (much) more computationally expensive than the text embedding-based search described here. I haven't used a reranker in any use case, but it seems to be most beneficial when working with a large knowledge base.
This video may be helpful: th-cam.com/video/Uh9bYiVrW_s/w-d-xo.html&ab_channel=JamesBriggs
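To make the top-k-then-rerank flow concrete, here is a minimal sketch using a cross-encoder from sentence-transformers; the model name is a real pretrained reranker, but the query and candidate chunks are placeholders:

```python
from sentence_transformers import CrossEncoder

query = "What is fat-tailedness?"
# In practice: the top-k (say k=25) chunks returned by the embedding-based search
candidate_chunks = [
    "Fat-tailed distributions assign far more probability to extreme events.",
    "Embedding models map text to dense vectors for similarity search.",
    "Under fat tails, the sample mean converges slowly, if at all.",
]

# A cross-encoder scores each (query, chunk) pair jointly -- more accurate but
# much more expensive than embedding search, hence only applied to the top-k.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = reranker.predict([(query, chunk) for chunk in candidate_chunks])

# Keep the highest-scoring chunks for the prompt
reranked = [c for _, c in sorted(zip(scores, candidate_chunks), key=lambda t: -t[0])]
print(reranked[:2])
```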
Happy Nowruz, kheyli khoob (very good)! Question: how would you propose to evaluate a document on the basis of certain guidelines? I mean, to see how far it complies with the guidelines or regulations for writing a certain document. Is RAG any good? Shall we just embed the guidelines in the prompt right before writing? Or shall we store the guidelines as a separate document and do RAG? Or ...?
Happy New Year!
That's a good question. It sounds like you want the model to evaluate a given document based on some set of guidelines. If the guidelines are static, you can fix them into the prompt. However, if you want the guidelines to be dynamic, you can house them in a database which is dynamically integrated into the prompt based on the user's input.
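As a toy illustration of that static-vs-dynamic distinction (the guideline store, document types, and guideline text below are all invented for the example):

```python
# Hypothetical guideline store; in practice this could live in a database
# or vector store and be retrieved based on the user's input.
GUIDELINES = {
    "grant proposal": "Use active voice. State the budget explicitly. ...",
    "lab report": "Include methods, results, and limitations sections. ...",
}

def build_eval_prompt(document: str, doc_type: str) -> str:
    guidelines = GUIDELINES[doc_type]  # fetched dynamically per request
    return (
        "Evaluate how well the following document complies with these guidelines.\n\n"
        f"Guidelines:\n{guidelines}\n\nDocument:\n{document}"
    )

print(build_eval_prompt("We spent the funds on ...", "grant proposal"))
```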
Great video! Thanks for sharing
Hey!
Thanks a lot for the great detailed content :)
Why did you choose to use a fine-tuned model and not just the base model? Does applying RAG without fine-tuning also work? I guess it depends on the case, but just out of curiosity
Great question! I used a fine-tuned model here so that the model would respond in my likeness. One could also use the base model and it would work well (only the response style would change).
Thank you for the useful content. Where could we find the example code for soft prompt and prefix tuning as shown in your video? 😊
Example code is available here: github.com/ShawhinT/TH-cam-Blog/blob/main/LLMs/rag/rag_example.ipynb
Great content! One question, please: in your example, is everything local and private, or does the data leave your execution environment?
This example code doesn't make any external API calls, so it can run entirely locally. However, I ran it on Google Colab since the quantized model I used cannot run on a Mac (I only have Apple machines).
Awesome, thank you for the video!
Is it possible to select a "dynamic" chunk size?
I want to be able to separate documents into chunks of varying sizes; this is because I want to chunk specific sections in the documents that have varying sizes.
Yes definitely! Chunking docs in this way can lead to better performance than blindly chunking across sections.
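One simple way to get those variable-size chunks is to split on structural markers instead of a fixed length; this sketch assumes markdown-style '#' headings, which you'd swap for whatever delimits sections in your documents:

```python
import re

def chunk_by_section(text: str) -> list[str]:
    # Split at newlines that are immediately followed by a heading marker,
    # so each chunk is a full section (heading + body), whatever its size.
    sections = re.split(r"\n(?=#)", text)
    return [s.strip() for s in sections if s.strip()]

doc = "# Intro\nA short overview.\n# Methods\nA much longer methods section..."
print(chunk_by_section(doc))
# ['# Intro\nA short overview.', '# Methods\nA much longer methods section...']
```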
Very helpful, thanks! I found that getting an answer to a prompt takes quite a long time though (2-3 minutes), also using the T4 GPU from Colab. Is there a way to reduce this?
Good question! There are a few ways. A simple one is to reduce the chunk size. Alternatively, you could try a smaller LLM.
Very good explanation 👏 👌
What if I have data collected in JSON format?
How should I proceed with creating chunks of the data?
Please reply.
Very nice. Thank you for explaining in detail.
Thanks for the content!
Quick question:
When setting up the knowledge base in your example code, you process the Medium articles to exclude specific chunks. How much of a difference does this actually make in your output? I only bring this up because, say you were going to use RAG to build an LLM application where the input documents don't follow the same concrete structure as Medium articles; it would then be pretty challenging to identify all the useless chunks you would want to exclude, right? Do those embeddings make a significant difference in the quality of your output?
Great question. How you chunk documents can make a big difference in the quality of your RAG system. Doing this right will require data exploration so you can define a pre-processing strategy for your specific use case. I often find that this isn't as challenging as it might seem at the outset.
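As a sketch of what that pre-processing can look like (the heuristics here, a minimum length and a boilerplate blocklist, are assumptions you'd tune after exploring your own data):

```python
BOILERPLATE = ("subscribe", "follow me on", "originally published")

def keep_chunk(chunk: str) -> bool:
    if len(chunk.split()) < 20:  # drop fragments too short to carry meaning
        return False
    if any(phrase in chunk.lower() for phrase in BOILERPLATE):
        return False  # drop calls-to-action and other filler
    return True

raw_chunks = [
    "Follow me on Medium and subscribe for more articles like this one!",
    "Retrieval-augmented generation improves an LLM's responses by retrieving "
    "relevant chunks from a knowledge base and injecting them into the prompt, "
    "grounding the model's answers in your own documents.",
]
chunks = [c for c in raw_chunks if keep_chunk(c)]  # keeps only the second chunk
```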
So we get the top 3 similar chunks from RAG, right? And we add those 3 chunks to the prompt template?
Yes exactly!
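For anyone skimming the comments, that step looks roughly like this (chunk contents and template wording are illustrative, not copied from the video):

```python
# Top 3 chunks returned by the similarity search (placeholders here)
top_chunks = [
    "Chunk about what fat-tailedness means ...",
    "Chunk about measuring tail heaviness ...",
    "Chunk about why it matters for risk ...",
]
question = "What is fat-tailedness?"

prompt = (
    "Use the following context to answer the question.\n\n"
    "Context:\n" + "\n\n".join(top_chunks) + "\n\n"
    f"Question: {question}"
)
```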
Nice video, any ideas for doing this on PowerPoints? I want to build a kind of knowledge base from previous projects, but the graphics are a problem. Even GPT4V is not always interpreting them correctly. 😢
If GPT4V is having issues you may need to either 1) wait for better models to come out or 2) parse the knowledge from the PPT slides in a more clever way.
Feel free to book office hours if you want to dig into it a bit more: calendly.com/shawhintalebi/office-hours
Hey Shaw, thanks so much for such a helpful video.
I'd love to seek your advice on something :)
Currently we are using OpenAI to build out a bunch of insights that will be refreshed using business data (e.g. X users land on your page, Y make a purchase).
Right now we are doing a lot of data preparation and feeding the specific numbers into the user/system prompt before passing to OpenAI, but we have had issues with output consistency and incorrect numbers.
Would you recommend a fine-tuning approach for this? Or RAG? Or would the context itself be small enough to fit into the context window, given it's a very small dataset we are adding to the prompt?
Thanks in advance 🙂
Glad it was helpful! Based on the info provided here, it sounds like a RAG system would make the most sense. More specifically, you could connect your data preparation pipeline to a database which would dynamically inject the specific numbers into the user/system prompt.
If you have more questions, feel free to email me here: www.shawhintalebi.com/contact
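A rough sketch of that setup; the sqlite store, table name, and columns are all assumptions standing in for whatever the data-prep pipeline actually writes to:

```python
import sqlite3

def fetch_latest_metrics(db_path: str = "metrics.db") -> dict:
    # The data-preparation pipeline is assumed to keep this table up to date.
    with sqlite3.connect(db_path) as conn:
        row = conn.execute(
            "SELECT page_visits, purchases FROM daily_metrics "
            "ORDER BY day DESC LIMIT 1"
        ).fetchone()
    return {"visits": row[0], "purchases": row[1]}

m = fetch_latest_metrics()
system_prompt = (
    f"You are an analytics assistant. Today's figures: {m['visits']} users "
    f"landed on the page and {m['purchases']} made a purchase. "
    "Use ONLY these numbers when answering."
)
```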
Thank you very much!
Hi Talebi. Thanks for all you show us. But one question: I ran your code with my own database, without the fine-tuning, and it works; the answers come very quickly but the content is poor. Is that the point of fine-tuning, to make better answers?
It sounds like you may need to do some additional optimizations to improve your system. I discuss some finer points here: towardsdatascience.com/how-to-improve-llms-with-rag-abdc132f76ac?sk=d8d8ecfb1f6223539a54604c8f93d573#bf88
good work!
Well explained
I found good content here. Are you Iranian?
Thank you! Yes I am :)
RAG is great for semi-static or static content as a knowledge base, but which path do you use for dynamic, time-relevant data like current sales from a database?
That's a great question. The short answer is RAG can handle this sort of data (at least in principle). The longer answer involves taking a step back and asking oneself "why do I want to use RAG/LLMs/AI for this use case?" This helps get to the root of the problem you are trying to solve and hopefully give more clarity about potential solutions.
@ShawhinTalebi It's a common use case at work to know how sales have been improving during the current day or week. It would be nice to know how to link the LLM to the corporate database for current information and reporting.
Thanks Shaw!
Really great
I tried to run the code but it gives this error "ValueError: Directory articles does not exist.". What should I do?
Make sure the articles folder is in the same path as the code you are running!
Can we connect this with a Rasa chatbot? I'm building a Rasa chatbot to ask customized questions from users and provide output according to their responses. Can I integrate this model with my chatbot?
While I haven't used Rasa before, it seems they support RAG: rasa.com/docs/rasa-pro/building-assistants/chat-with-your-docs/
Great video! What is fat-tailedness?
😉 th-cam.com/video/Wcqt49dXtm8/w-d-xo.htmlsi=E_R7A7IrkbAUVaOs
Any recommendations or experience on which embeddings database to use?
Good question! Performance of embedding models will vary by domain, so some experimentation is always required. However, I've found the following 2 resources helpful as a starting place (see the comparison sketch after this list).
HF Leaderboard: huggingface.co/spaces/mteb/leaderboard
SentenceTransformers: www.sbert.net/docs/pretrained_models.html
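To trial candidates from those lists, you can swap model names and compare retrieval on your own data; a tiny sketch (both model names are real pretrained sentence-transformers, while the corpus and query are toys):

```python
from sentence_transformers import SentenceTransformer, util

corpus = [
    "Fat tails mean rare events dominate the statistics.",
    "Gaussian tails decay quickly, so extremes are negligible.",
]
query = "Why do extreme events matter so much?"

for name in ["all-MiniLM-L6-v2", "multi-qa-mpnet-base-dot-v1"]:
    model = SentenceTransformer(name)
    scores = util.cos_sim(model.encode(query), model.encode(corpus))
    print(name, scores)  # compare which model ranks the relevant text higher
```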
Very useful indeed
Thank you so much sir :)
Great 🙏
Solid video
What do you mean by "not to scale"? Isn't the book the size of the Earth?
LOL 😂
8:20 Large language models only understand text? They can recognize images and all, right?
Great question. Language models only understand language (text). However, we see products like ChatGPT and Claude handle them just fine.
There are two ways to do this (a sketch of the first approach follows the list below).
1) Pass image to img-to-text then pass it to a language model
2) Create a multi-modal model (e.g. GPT-4o) which can take text, images, and audio as input
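A minimal sketch of approach (1); the captioning model is a real Hugging Face checkpoint, while the file name and prompt wording are placeholders:

```python
from transformers import pipeline

# Step 1: turn the image into text with a captioning model
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
caption = captioner("chart.png")[0]["generated_text"]

# Step 2: hand the caption to any text-only LLM
prompt = f"An image was uploaded, described as: '{caption}'. Explain what it likely shows."
```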
Hello, do you have a video showing how to make a dataset and upload it to Hugging Face?
Not currently, but the code to do that is available on GitHub: github.com/ShawhinT/TH-cam-Blog/blob/main/LLMs/qlora/create-dataset.ipynb
Super helpful!
Great channel, subscribed!
How do you protect a company's information with this technology?
Great question! One can approach data security with RAG the same way as in other contexts. In other words, you can set up a permissions layer so that the LLM can only access information consistent with the user's permissions.
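As a minimal sketch of such a permissions layer (the metadata schema and group names are assumptions), filter chunks by the user's access before retrieval ever touches them:

```python
docs = [
    {"text": "Public product FAQ ...", "access": "public"},
    {"text": "Internal salary bands ...", "access": "hr"},
]

def retrieve(query: str, user_groups: set[str]) -> list[str]:
    # Only chunks the user is allowed to see enter the search at all,
    # so restricted text can never leak into the LLM's prompt.
    allowed = [d for d in docs if d["access"] == "public" or d["access"] in user_groups]
    # ...run the usual embedding search over `allowed` here...
    return [d["text"] for d in allowed]

print(retrieve("What are the salary bands?", user_groups={"engineering"}))
# Only the public FAQ is eligible; the HR document is filtered out.
```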
tailEDEDness! can't unhear it 18:44
LOL made up words can be hard to pronounce 😂
Vector retrieval is quite shite. Trust me. To improve accuracy of retrieval, you need to use multiple methods.