Google Colab Notebook: colab.research.google.com/drive/1C1Epju1lVkXTQi2jBq1njrOrmkfg0eQS?usp=sharing
Slides: www.canva.com/design/DAF8HpUMTLQ/HKyd4ajjIgCR2Y5tjyT8Eg/edit?DAF8HpUMTLQ&
These materials help us understand better before our class. I'm glad I found you guys!
You guys are the best when it comes to the intricacies of working with LLMs. Your presentations are simple and your explanations are cool and very easy to understand.
Great presentation as always, guys! Small note: those of us in the hearing-impaired community are very grateful to folks like Jithin who invest in a decent mic setup. People with normal hearing generally don't notice much difference between a MacBook mic and a dedicated one, but when your hearing is marginal it can make the difference between intelligible speech and mush that's hard for us to process. Thanks!
Glad to hear that we're coming through loud and clear @csmac3144a!
Great tool and commitment to serving GenAI dev community needs... thanks to all involved!
Thanks for the shoutout AI_by_AI! And we agree - the RAGAS team really built a great framework to accelerate RAG app development!
Great presentation
RAGAS starts at 20:15; before that is just an overview of LangChain and the RAG QA pipeline.
Thanks for the timestamp here MrTulufan!
Thanks for the video. It was really helpful. I was looking for how we can automate the RAG evaluation process.
Super nice. Thank you folks! It would be good to also discuss a bit more about the eval data: size, distribution, how closely it should mimic real data, etc., but awesome stuff nevertheless 🎉
Great feedback! We like the idea of deep-diving on the eval data generation piece 🤔... definitely a piece worth adding as we keep iterating on this content!
@@AI-Makerspace For sure! I have learned the hard way that upsampling my data with GPT-4 was great and gave me a highly accurate (according to the stats) model, but the generated quality was too good and completely out of the distribution of my messy data, and in production the model was not so great ;-/
Hi
I have a use case for text-to-SQL with RAG using LangChain. Is there any example or guide for evaluating the SQL result? Are the metrics the same as for regular text RAG? Thanks in advance.
Thank you sooo much, I learned a lot from this channel. I did my experiments and I was wondering how I can evaluate the RAG performance.
You'll want to create a dataset of question/answer/context triplets to evaluate through RAGAS!
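For anyone who wants a concrete starting point, here is a minimal sketch of what that dataset and the evaluate() call might look like. The column names and metric imports follow the RAGAS 0.1.x docs as we recall them, and the example rows are made up for illustration, so double-check against your installed version:

from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_precision, context_recall

# Each row pairs a question with your pipeline's answer, the retrieved
# contexts, and (optionally) a ground-truth reference answer.
eval_rows = {
    "question": ["What are the major changes in v0.1.0?"],
    "answer": ["v0.1.0 stabilizes the core API and splits integrations into partner packages."],
    "contexts": [[
        "LangChain v0.1.0 stabilizes the core abstractions and moves integrations into partner packages."
    ]],
    "ground_truth": ["The release stabilizes the core API and separates integrations into partner packages."],
}

dataset = Dataset.from_dict(eval_rows)
result = evaluate(dataset, metrics=[faithfulness, answer_relevancy, context_precision, context_recall])
print(result)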
Thank you for the video:)
At 00:24:15 you give a formula for faithfulness; I think it is a bit flawed. It should be (#claims from the answer which exist in the context) / (#claims in the answer). Otherwise the result could be > 1.
Can you be more specific about what the flaw is? Also, why do you choose the word "exist" rather than "inferred from"?
Here's what appears to be true from the documentation:
"To calculate this a set of claims from the generated answer is first identified. Then each one of these claims are cross checked with given context to determine if it can be inferred from given context or not."
Three steps to the calculation:
1. Break generated answer into statements
2. For each statement, verify if it can be inferred
3. Calculate Faithfulness!
It seems that the condition "if (and only if) it can be inferred from the context" will keep the faithfulness calculation from going higher than 1.0
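Spelled out, the formula as we read the docs is:

faithfulness = (number of claims in the generated answer that can be inferred from the given context) / (total number of claims in the generated answer)

Since the numerator counts a subset of the claims counted in the denominator, the score stays between 0 and 1; for example, 2 supported claims out of 3 gives 2/3 ≈ 0.67.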
@AI-Makerspace you might be right, but at the point referenced in the video, it talks about the context, not the generated answer. So a context like "Paris is a bustling French capital and center of culture and art" could contain 2-3 claims, but the answer to "what is the capital of France" may contain one claim, "Paris is the Capital of France". The faithfulness would be 3/1 in that case if they were not related to the golden truth answer. I may be missing something! Great video though, thanks!
Ah I get it now, duh - the element of the formula "number of claims that can be inferred from the given context" I was reading as the number of claims that can be inferred from the context alone. It's really the number of claims in the generated answer which can be inferred from the given context.
"It's really the number of claims in the generated answer which can be inferred from the given context."
Nice follow-up @Mark! Let's gooo!
We find it helpful to look directly at the prompted examples in the src code here: github.com/explodinggradients/ragas/blob/7d051437a1a5d8e9ad5c42252bf1debf51679140/src/ragas/metrics/_faithfulness.py#L52
You can see how FaithfulnessStatements turn into SentencesSimplified with an example and in general via the instruction given in NLIStatementPrompt as "Your task is to judge the faithfulness of a series of statements based on a given context. For each statement you must return verdict as 1 if the statement can be directly inferred based on the context or 0 if the statement can not be directly inferred based on the context."
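In code terms, the per-statement verdicts then reduce to a simple average. Here is a simplified sketch of that aggregation (not the library's actual implementation; the verdicts would come from the NLI-style LLM judgment quoted above):

def faithfulness_score(verdicts):
    # Each verdict is 1 if the corresponding claim from the generated answer
    # can be directly inferred from the retrieved context, else 0.
    if not verdicts:
        return 0.0
    return sum(verdicts) / len(verdicts)

# Example: answer broken into 3 claims, 2 of which are supported by the context.
print(faithfulness_score([1, 1, 0]))  # 0.666...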
I am getting an OpenAI key error 😢😢😢
How can I perform batch-wise RAGAS evaluation so that my evaluation time will decrease?
I'm not sure that is currently implemented, but I'll update the notebook when it is!
Does RAGAS work only with OpenAI models? Which models can I use for the test set generator as the critic and generative models? Please help me out.
You can use any LLM that has OpenAI API compatibility. This means most closed-source models, as well as open-source options through certain hosting strategies (NIMs, vLLM, etc.).
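As a quick sketch of what that looks like in practice, you can point the standard LangChain OpenAI client at your own OpenAI-compatible server. The endpoint URL, API key, and model name below are placeholders, not real values:

from langchain_openai import ChatOpenAI

# Hypothetical self-hosted endpoint, e.g. a vLLM or NIM server that speaks the OpenAI API.
local_llm = ChatOpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed-for-local",
    model="your-hosted-model",
)

print(local_llm.invoke("Say hello").content)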
@@AI-Makerspace I don't have an OpenAI API key.
Can I use models from Hugging Face?
Please help me out.
I want to generate the test set data using models other than Hugging Face,
like the generator model and the critic model.
I am using LlamaIndex and tried some example code, but it doesn't work. Is Ragas integrated with LlamaIndex? Ragas seems very promising for evaluation and I would really like to use it. The error I am getting is on the "from ragas.llama_index import evaluate" line; it cannot find the ragas.llamaindex module. I gave up for the time being, assuming that Ragas is not integrated with LlamaIndex.
I believe they are integrated, but they're making adjustments to their library very often (as is LlamaIndex). I would submit an issue on their repo with your specific error traceback!
Just curious: how was the test showing improvements in Faithfulness but a degradation in Correctness? Could you perhaps help me understand how that might be?
I would interpret the results as follows:
We're staying closer to our retrieved context, but we're straying away from the answer provided by the original ground-truth model.
I would want to test further to see why the responses are being marked as less "correct" and see which cases the pipeline failed on to provide more insight.
Hey, how can I generate a test set (and ground truth) with an open-source LLM? I'm using Mixtral. Please assist.
You'd want to just create a loop that generates those responses!
As for the ground truths - you'd need to generate those manually or use a larger language model.
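Here is a rough sketch of that loop, assuming Mixtral is served behind an OpenAI-compatible endpoint. The endpoint, model name, chunks, and prompts are placeholders, and a stronger model can be swapped in for the ground-truth step:

from langchain_openai import ChatOpenAI

generator_llm = ChatOpenAI(
    base_url="http://localhost:8000/v1",  # hypothetical Mixtral endpoint
    api_key="none",
    model="mixtral-8x7b-instruct",
)

document_chunks = [
    "LangChain v0.1.0 stabilizes the core abstractions and moves integrations into partner packages.",
]

test_set = []
for chunk in document_chunks:
    # Ask the model for a question grounded in the chunk, then an answer to use as ground truth.
    question = generator_llm.invoke(
        "Write one question that can be answered using only this passage:\n\n" + chunk
    ).content
    ground_truth = generator_llm.invoke(
        "Answer the question using only the passage.\n\nPassage:\n" + chunk + "\n\nQuestion: " + question
    ).content
    test_set.append({"question": question, "ground_truth": ground_truth, "contexts": [chunk]})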
What is the difference between ground truth and response? What does ground truth actually mean with respect to evaluation? @@AI-Makerspace
@@anuvratshukla7061 "Ground Truth" is simply referring to a label on LLM responses that label as the "Truth." In general, it would be ideal to have all of these "Truths" written, verified, and optimized by humans. Since this is hardly ever actually done, what's more common is that a more powerful LLM is used to generate the "Ground Truth" on which we can run these analyses. In our case here, GPT-4 is used to create the Ground Truths and GPT-3.5-turbo is used to generate Responses.
It's important to keep in mind that, at the end of the day, the initial absolute values are much less important than the change in these metrics as you make improvements to your system! In other words, Metrics-Driven Development doesn't require that your Ground Truth data is perfect to begin with!
@@AI-Makerspace Great thanks :)
Hello. It is a great job. Yesterday the code was working; today it gives an error on the line "response = retrieval_chain.invoke({"input": "What are the major changes in v0.1.0?"})". Can you tell me how to fix this one?
Could you provide your notebook so I can troubleshoot? I'm not running into that specific error on my end.
Can we use RAGAS without an OpenAI key?
If you set up LLMs and pass them as the Critic/Generator/etc., yes!
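As a sketch of that wiring (the wrapper class and keyword arguments below match the RAGAS versions we have used, but the library changes quickly, so verify against what you have installed; the endpoint and model name are placeholders):

from datasets import Dataset
from langchain_openai import ChatOpenAI
from ragas import evaluate
from ragas.llms import LangchainLLMWrapper
from ragas.metrics import faithfulness

# Any LangChain chat model can act as the judge; no OpenAI key is required
# if it points at your own OpenAI-compatible endpoint.
judge_llm = LangchainLLMWrapper(
    ChatOpenAI(base_url="http://localhost:8000/v1", api_key="none", model="your-hosted-model")
)

dataset = Dataset.from_dict({
    "question": ["What is the capital of France?"],
    "answer": ["Paris is the capital of France."],
    "contexts": [["Paris is the bustling capital of France and a center of culture and art."]],
})

result = evaluate(dataset, metrics=[faithfulness], llm=judge_llm)
print(result)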