Retrieval-Augmented Generation chatbot, part 1: LangChain, Hugging Face, FAISS, AWS

Julien Simon

มุมมอง 26 026

เพิ่มลงใน
- เพลย์ลิสต์ของฉัน
- ดูภายหลัง
แชร์

แชร์

ฝัง

ขนาดวิดีโอ:

แสดงแผงควบคุมโปรแกรมเล่น

เล่นอัตโนมัติ

เล่นใหม่

เผยแพร่เมื่อ 27 ม.ค. 2025

ความคิดเห็น • 63

@AaronWacker ปีที่แล้ว ⁺³
The RAG chatbot you demonstrate is an excellent lesson with HuggingFaceEmbeddings. Regarding how to do it outside GPT being generic enough to have your own vectorDB on demand for any model I had wondered how that was done. Thanks for covering this really great stuff!
@juliensimonfr ปีที่แล้ว ⁺¹
Glad it was helpful!
@jacehua7334 ปีที่แล้ว ⁺⁴
always making great and timely videos.
@juliensimonfr ปีที่แล้ว ⁺¹
Glad you like them!
@TheRealMikeD 10 วันที่ผ่านมา
Very interesting tutorial. Seems like a great, lower-budget, and lower-dev-time alternative to building directly off of Open AI API's. And I am also a fan of European metal!
@juliensimonfr 10 วันที่ผ่านมา
Absolutely. Rock on 🤘
@caiyu538 ปีที่แล้ว ⁺²
Thank you for your lectures.
@juliensimonfr ปีที่แล้ว ⁺¹
You are very welcome
@devilliersduplessis7904 ปีที่แล้ว ⁺¹
Hey Julien, Thanks for an insightful talk last night at the AWS center!
@juliensimonfr ปีที่แล้ว ⁺¹
You're welcome. Thanks for coming!
@justwest ปีที่แล้ว ⁺¹
thanks julien, one can learn so much from these!
@juliensimonfr ปีที่แล้ว
That's the idea 😀
@iAkashPaul ปีที่แล้ว ⁺³
Hey Julien, great job with the video. For QnA on corpus I'd recommend to generate hypothetical questions for each paragraph & ingesting them as well since those would have better similarity to the user input which is usually a question & can also help constrain the model to answer only closed domain questions.
@juliensimonfr ปีที่แล้ว
Yes, that's a nice trick. I tried to keep things simple here ;)
@DCTekkie 9 หลายเดือนก่อน ⁺¹
Thank you, gonna check it out tomorrow!
@juliensimonfr 9 หลายเดือนก่อน
Have fun!
@kuzeyiyidiker1344 9 หลายเดือนก่อน
Thanks for this clear explanation.
@juliensimonfr 9 หลายเดือนก่อน
Glad it was helpful!
@edinsonriveraaedo292 ปีที่แล้ว
Hi Julien, thanks for your video, pretty clear explained ;-)
@juliensimonfr ปีที่แล้ว
Glad it was helpful!
@VenkatesanVenkat-fd4hg ปีที่แล้ว
Superr video, Thanks for trying using open source solutions...
@juliensimonfr ปีที่แล้ว
Glad you liked it
@badbaboye 8 หลายเดือนก่อน
Thanks for the video!
@juliensimonfr 8 หลายเดือนก่อน ⁺¹
You're welcome!
@SebastienStormacq ปีที่แล้ว ⁺²
Thank you Julien - this is super useful and comes at the right time during my writing season (you know what I'm talking about :-) ) As someone else mentioned in the comment, I also received an error when calling Textract. I solved it by adding `pip install amazon-textract-textractor -qU` - hope it might help others
@juliensimonfr ปีที่แล้ว
Ok, good to know. Thanks Seb and good luck with the writing ;)
@SebastienStormacq ปีที่แล้ว
also 'pip install pip install faiss-cpu' :-)
@anserali551 10 หลายเดือนก่อน
Sagemaker with langchain streaming option is generating output
@ComFomeTo ปีที่แล้ว
Thanks a lot! It was very, very helpful.
@juliensimonfr ปีที่แล้ว
You're welcome.
@coolcurly9736 ปีที่แล้ว
It throws : KeyError: 'Blocks'
after running the cell with boto3.client('textrac') thrown by the loader.load(), from parser in langchain
@jingqiwu2865 ปีที่แล้ว ⁺¹
Thanks Julien! very nice video. very curious if there are some compare between bge-small with ada-002 when used in RAG.
@juliensimonfr ปีที่แล้ว
Hi, please check our embeddings leaderboard at huggingface.co/spaces/mteb/leaderboard. ada-002 is #15, bge-small is #8 :)
@krishnasunder9491 8 หลายเดือนก่อน
thanks it was really informative, can do demonstrate fine tuning LLM's with lora and Qlora? In your experience, RAG has better performer over fine tuning ?
@juliensimonfr 8 หลายเดือนก่อน
Llama2 fine-tuning with Qlora: th-cam.com/video/Zev6F0T1L3Y/w-d-xo.html. IMHO RAG and fine-tuning solve different problems and are complementary. RAG lets you access fresh company data and gives you some domain adaptation. Fine-tuning gives you better domain adaptation and lets you customize guardrails and tone of voice.
@GeigenAkademie ปีที่แล้ว ⁺¹
Thanks Julien, for the good tutorial! Some use pinecone, do you see differences/advantages of using faiss over pinecone? Thank you
@juliensimonfr ปีที่แล้ว
FAISS is a simple lightweight open source solution. Pinecone is a fully managed, closed source DB running in the cloud. Depends what you're looking for, and how much work you want to do on managing the solution :)
@aishwaryakumar6504 ปีที่แล้ว
Hi Julien,
Thank you for this video. It's helping me learn a lot. I was trying to run the code. When I attempt the zero shot example, my output is quite different from whats shown in the video. I tried to split it, but I get something like this - [answers:
* 1) The trend is to invest more in solar energy in China.
* 2) The trend is to invest less in solar energy in China.
* 3) The trend is to invest the same amount of money in solar energy in China.
* 4) The trend is to invest more in solar energy in the United States.
* 5) The trend is to invest less in solar energy in the United States. ] Can you please explain why this is happening and how it can be fixed?
@Invincible615 ปีที่แล้ว
Thanks for the tutorial.In my case,i can't use Mistral somehow due to some restrictions on AWS test account.I have used FLAN-T5 but it is giving this error.ValueError: Error raised by inference endpoint: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (422) from primary with message "Failed to deserialize the JSON body into the target type: missing field `inputs` at line 1 column 503".
@juliensimonfr ปีที่แล้ว ⁺¹
The input format for T5 is quite different, so sending a Mistral-formatted message won't work. Not sure what restriction you're facing, but maybe TinyLlama would work? I think you would only have to adapt the prompting format in the content handler.
@Thirumalesh100 ปีที่แล้ว
Great video, But what if user question is related to chat history and it may contain short cuts like he/she/that/it etc then how to handle such cases
@juliensimonfr ปีที่แล้ว
Langchain has different ways to handle this, e.g. python.langchain.com/docs/modules/memory/types/buffer
@Thirumalesh100 ปีที่แล้ว
Thanks@@juliensimonfr ,
Basically it is question rephrase request by passing entire chart history, tried this approach which has cost and token limit problem
Looking for other alternative for the same
@Abhisekgev 10 หลายเดือนก่อน
I want to embed large data. In this case, if I want to embed document without a GPU notebook ml.t3.medium, is it possible to deploy the embedding model as well in some ml.g5.large GPU instance to make the processing faster?
@juliensimonfr 10 หลายเดือนก่อน
Sure, it's what you would do for production.
@rnronie38 9 หลายเดือนก่อน
can you tell how to get the key for sagemaker to work here?
@juliensimonfr 9 หลายเดือนก่อน
Not sure what you mean. Are you looking for a SageMaker tutorial ? See docs.aws.amazon.com/sagemaker/latest/dg/gs.html
@kevinngo3722 ปีที่แล้ว
Hi Julien. The code is not working when I try to run it. I think the error I am getting is related to Sagemaker credentials. I made an account just now but don't know where to get information where I can plug into your code to make this work.
@juliensimonfr ปีที่แล้ว ⁺¹
Start here: docs.aws.amazon.com/sagemaker/latest/dg/howitworks-create-ws.html. Create a notebook instance and make sure its IAM role includes the SageMakerFullAccess and TexttractFullAccess managed policies. Once you've done that, the notebook will run as is.
@kevinngo3722 ปีที่แล้ว
Thanks for your reply! It seems that this leads me to make a Jupyter notebook. How do I integrate this to do what you're showing on Colab in the tutorial?@@juliensimonfr
@Azazello1482 7 หลายเดือนก่อน
Seems like a great video, but I can't move from the starting line. You seem to be skipping over very important details about how to deal with the HuggingFace tokens, AWS security keys, regional compatibility settings with Sagemaker, etc. For example, when running the copied SageMaker code, I get "ValueError: Must setup local AWS configuration with a region supported by SageMaker", but no region seems I try seems to work. Did you cut all the authentication code from your demo? Obviously you don't want to disclose security keys, but at least show/explain that part of the setup code and simply redact the sensitive information.
@juliensimonfr 7 หลายเดือนก่อน
How about going through Hugging Face 101 and SageMaker 101 first?
@Azazello1482 7 หลายเดือนก่อน
@@juliensimonfr Yes, clearly I'll need to do this! Nonetheless, as an educator myself, I think my point is still useful. It's helpful to learners to mention parts that you skip over. You don't have to teach it in this video, but it would be helpful to mention that there are steps one must perform that are not shown in this video.
@ccc_ccc789 10 หลายเดือนก่อน
Thanks!
@juliensimonfr 10 หลายเดือนก่อน
You bet!
@da-bb2up ปีที่แล้ว ⁺¹
Thx for the video :) can you update your vector database by a few lines ( if you want to add data to your knowledge base) automatically by running a python script or something like that?
@juliensimonfr ปีที่แล้ว
Sure, you can keep adding embeddings anytime you want.
@da-bb2up ปีที่แล้ว ⁺¹
oh thats nice :) thx for the answer@@juliensimonfr
@rnronie38 9 หลายเดือนก่อน
how can I call onto my react frontend?
@juliensimonfr 9 หลายเดือนก่อน
A SageMaker endpoint is an HTTPS API, so you can plug it in anything. You should be able to find lots of examples out there.
@debojitmandal8670 ปีที่แล้ว
Y r u deploying first in sage maker
@juliensimonfr ปีที่แล้ว
Because I don't want to manage any infrastructure :)
@whemmakatatt5311 ปีที่แล้ว
godlike

ต่อไป

เล่นอัตโนมัติ

Retrieval-Augmented Generation chatbot, part 2 - LangChain, Hugging Face, OpenSearch, AWS