4-Langchain Series-Getting Started With RAG Pipeline Using Langchain Chromadb And FAISS
- Published Oct 6, 2024
- RAG is a technique for augmenting LLM knowledge with additional data.
LLMs can reason about wide-ranging topics, but their knowledge is limited to the public data up to a specific point in time that they were trained on. If you want to build AI applications that can reason about private data or data introduced after a model’s cutoff date, you need to augment the knowledge of the model with the specific information it needs. The process of bringing the appropriate information and inserting it into the model prompt is known as Retrieval Augmented Generation (RAG).
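The retrieve-then-augment flow described above can be sketched in plain Python. This is a toy illustration only: the video itself uses LangChain with Chroma and FAISS, and the `embed` function below is a hypothetical bag-of-words stand-in for a real embedding model.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: bag-of-words counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# 1. Index: embed each private document chunk and store the vectors.
docs = [
    "Our refund policy allows returns within 30 days.",
    "The office is closed on public holidays.",
]
index = [(doc, embed(doc)) for doc in docs]

# 2. Retrieve: embed the query and pick the most similar chunk.
query = "How many days do I have to return an item?"
best_doc, _ = max(index, key=lambda pair: cosine(embed(query), pair[1]))

# 3. Augment: insert the retrieved context into the model prompt.
prompt = f"Answer using this context:\n{best_doc}\n\nQuestion: {query}"
```

A real pipeline swaps `embed` for an embedding API, the list for a vector store, and sends `prompt` to an LLM; the three steps stay the same.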
github: github.com/kri...
---------------------------------------------------------------------------------------------
Support me by joining membership so that I can upload these kinds of videos
/ @krishnaik06
-----------------------------------------------------------------------------------
►Fresh Langchain Playlist: • Fresh And Updated Lang...
►LLM Fine Tuning Playlist: • Steps By Step Tutorial...
►AWS Bedrock Playlist: • Generative AI In AWS-A...
►LlamaIndex Playlist: • Announcing LlamaIndex ...
►Google Gemini Playlist: • Google Is On Another L...
►Langchain Playlist: • Amazing Langchain Seri...
►Data Science Projects:
• Now you Can Crack Any ...
►Learn In One Tutorials
Statistics in 6 hours: • Complete Statistics Fo...
Machine Learning In 6 Hours: • Complete Machine Learn...
Deep Learning 5 hours : • Deep Learning Indepth ...
►Learn In a Week Playlist
Statistics: • Live Day 1- Introducti...
Machine Learning : • Announcing 7 Days Live...
Deep Learning: • 5 Days Live Deep Learn...
NLP : • Announcing NLP Live co...
---------------------------------------------------------------------------------------------------
My Recording Gear
Laptop: amzn.to/4886inY
Office Desk : amzn.to/48nAWcO
Camera: amzn.to/3vcEIHS
Writing Pad:amzn.to/3OuXq41
Monitor: amzn.to/3vcEIHS
Audio Accessories: amzn.to/48nbgxD
Audio Mic: amzn.to/48nbgxD
Sir, we request you to kindly share the document screen too so we can learn many more things ...
The fact that you start from scratch with dependencies and libraries, is just excellent, you save beginners so much headache and confusion, Thank you very much!
You have done so much for the data science community. Some of your videos are worth more than $1000
Krish i started to watch your video for the past 3 days you are just amazing man
And you dropped this 👑
Thanks sir for continuing the Langchain series. Really helpful for me
Thanks Krish, these videos help to crack interviews !!
Thanks for bringing back-to-back helpful content sir, I am really loving your new series by heart
Hi Krish,
I have been following you for the last 1 year.
Joined PwSkills, understood your ML and DL videos; you have explained everything in simple words so that a non-tech person can also understand it very well.😊
Need to know more about Gen AI: how to evaluate Gen AI models?
Make some videos on the evaluation part of Gen AI LLM models,
Also make some videos on fine-tuning LLMs.
Thank you for the help in advance !!!
Hi sir, I understood the syntax for using Ollama and OpenAI is almost similar, but I think it would be better if you used Ollama whenever possible, because many of us won't be using OpenAI for learning purposes (also, you might be able to fix any Ollama-related errors we may face during our learning phase).
Amazing Krish.. I love the way you explain things from very basic and try to cover all the concepts, Libraries, Imports in your module with their use.. Thanks for making this series .. It was much needed one 😊😊
Great content Great way of teaching loved it ❤ Thanks Krish for this wonderful content really needed it
Thank you krish , this video is very helpful for me to include RAG in my project
This is SO good!! Have been waiting!!!!
I'm very excited about the next upcoming video
Great videos! They helped me learn so much in a better way
Amazing man! Really Appreciated! Looking forward to watching next video!!!!!
This is extremely helpful! - Thank you very much
Going great so far. Thank you.
Krish, please try to explain the steps you are performing and why they are required. Please do this and you would be the best teacher.
Great Video from Mexico
Thank you so much Krish
super content krish , i appreciate lots
Love and respect!!
Absolutely fabulous tutorial. It really helped clarify some things for me. Very clear and concise, straightforward. Learned a lot -- keep your awesome videos coming! Do you mentor people by any chance?
Great explanation Krish and great content
Thanks Krish Sir
Sir please make a separate videos on chroma db or any kind of vector databases
Thanks a lot Krish !!!!!!!!!!!!!
Hi Krish thanks for the videos pls create a video about Langchain memory.
Good work Krish. Keep it up
excellent video
The interview questions that the interviewer asked me on GenAI are:
1. Why overlapping while converting into chunks?
2. Suppose the overlapping words are 50; how would you know that 50 words are overlapped?
3. What is indexing vectors?
4. Where would you store these vectors?
5. How would you split the text or PDF text into chunks?
6. What is RAG, Ollama, Cloudea?
Why overlapping while converting into chunks:
There are a few reasons to use overlapping chunks when processing text:
Capture Context: Overlapping chunks ensure that sentences don't get split at awkward points, preserving context between chunks. This can be important for tasks like information retrieval or question answering.
Reduce Boundary Issues: When searching for specific phrases, overlapping chunks can avoid missing matches that fall on chunk boundaries.
Improve Retrieval: Overlaps allow for more flexibility in retrieving relevant information, especially when dealing with complex or ambiguous queries.
How to know 50 words are overlapped:
There are a few ways to determine the number of overlapping words between chunks:
Maintain a Counter: During chunking, keep track of the number of words processed so far. When creating a new chunk, compare this counter with the previous chunk's ending position to calculate the overlap.
Use Fixed Overlap Size: Define a fixed number of words for overlap (e.g., 50 words) and adjust chunk boundaries accordingly.
Character-Level Processing: If you're working with character-level models, simply count the number of overlapping characters between chunks.
What are indexing vectors:
Indexing vectors are dense numerical representations of text documents or terms. These vectors encode the meaning and relationships within the text using techniques like Word2Vec or GloVe.
Where to store indexing vectors:
Indexing vectors can be stored in various ways depending on the application:
In-Memory Storage: For smaller datasets and real-time applications, vectors can be kept in memory for fast access.
Library/Database Storage: For larger datasets, efficient similarity-search libraries like FAISS or Annoy (or a vector database like Chroma) can store and manage vectors for retrieval tasks.
Distributed Storage: Large-scale systems might use distributed file systems like HDFS or cloud storage solutions like Amazon S3 for vector storage.
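For the in-memory option, the lookup amounts to a brute-force nearest-neighbour scan over the stored vectors, which is essentially what FAISS's flat index does at scale. A dependency-free sketch (the 3-d vectors and ids below are made up; real embeddings have hundreds of dimensions):

```python
import math

# In-memory "vector store": document ids mapped to dense embedding vectors.
store = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.0, 0.8, 0.2],
    "doc_c": [0.1, 0.1, 0.9],
}

def l2_distance(u, v):
    """Euclidean distance, the metric behind a flat L2 index."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def nearest(query_vec, k=1):
    """Exhaustively compare the query vector against every stored vector."""
    ranked = sorted(store, key=lambda doc_id: l2_distance(store[doc_id], query_vec))
    return ranked[:k]

print(nearest([1.0, 0.0, 0.0]))  # closest stored vector is doc_a
```

Libraries like FAISS replace this linear scan with quantization and clustering so that search stays fast at millions of vectors.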
How to split text or pdf text into chunks:
There are multiple ways to split text or PDF documents into chunks:
Fixed-Size Chunks: Divide the text into chunks of a predetermined size (e.g., 500 words). This is simple but might not be optimal for capturing context.
Sentence-Based Chunks: Split the text at sentence boundaries. This ensures that each chunk represents a complete thought.
Word-Based Chunks: Split the text based on word boundaries. This offers more granular control but might break up context.
Sliding Window: Use a sliding window approach with a fixed chunk size and defined overlap to create overlapping chunks as mentioned earlier.
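The sliding-window approach can be sketched as a small word-level splitter. This is a simplified stand-in for LangChain's text splitters (which work on characters and respect separators); the chunk sizes are made-up examples.

```python
def sliding_window_chunks(text, chunk_size=100, overlap=50):
    """Split text into word chunks where each chunk repeats the last
    `overlap` words of the previous chunk, so the overlap size is
    known by construction."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

text = " ".join(f"w{i}" for i in range(250))
chunks = sliding_window_chunks(text, chunk_size=100, overlap=50)
# Consecutive chunks share exactly 50 words: the tail of one chunk
# equals the head of the next.
assert chunks[0].split()[-50:] == chunks[1].split()[:50]
```

This also answers the "how would you know that 50 words are overlapped" question: with a fixed step of `chunk_size - overlap`, the overlap is guaranteed rather than measured after the fact.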
RAG, Ollama, Cloudea:
RAG (Retrieval-Augmented Generation): This is a technique for combining retrieval systems with large language models (LLMs) to improve the factual accuracy and informativeness of the LLM's outputs.
Ollama: This is a tool for running open-source LLMs (such as Llama 2 or Mistral) locally on your own machine. It simplifies interacting with local LLMs and building RAG applications without a hosted API.
Cloudea: This term is not commonly used in the context of LLMs or text processing. It might be a misspelling of "CloudML" (Google Cloud Machine Learning Engine) or a reference to a specific, less popular service.
Asked Google's Gemini model
@@shankarpentyala1660 Super Thanks 👍
Legend!❤
Thanks for uploading good content videos. But one suggestion: before writing any line of code, it would be better if you explained the reason why you are writing that line. Just watching your left screen and writing the same on the right screen will not be helpful for the audience.
superb
I saw the Reka Ai model. It is an open source model. Please do some tutorials on how we can build applications based on that API.
I have been following your tutorial. It has some amazing content. Curious to know how accurate the results will be when the dataset is huge?
Gr8 video
Waiting for next video
Nice video
Thanks it was great!
Can you make the same with an Excel file? It would be great if we could see some examples of it.
thank you
Thanks Krish. I have a question please: How can I define an optimal chunk size and chunk overlap? Also, could you please use the Milvus database for another project?
Hello sir can you create some video related Ray lib with ml dl, transformers etc.....
Hello Sir, I have a doubt about the RAG service in Azure Cosmos DB vCore: is this service available or not? Can we create an Index, Data Source and Indexer in the Azure Cognitive Search service?
Hi @krish, thank you for providing such a beautiful tutorial. When going through the RAG pipeline, OpenAI embeddings are used for creating the embeddings, but I don't have OpenAI access; as an alternative you mentioned Ollama embeddings. When I try to import the Ollama embeddings, I am unable to find them. Could you please assist with this?
Bro, did you check the result for the query "Who are the authors of the Attention Is All You Need research paper?" Along with the author names, it comes with some extra lines from the PDF. You did not mention how to avoid that.
@Krish, getting an error for the vector embedding code even though I have all required modules ...
## Vector Embedding and Vector Store
from langchain_community.embeddings import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
db = Chroma.from_documents(documents[:1], OpenAIEmbeddings())
Note: I have installed the "onnxruntime" library but still get the same error.
Error:
ValueError: The onnxruntime python package is not installed. Please install it with `pip install onnxruntime`
Can we use a prompt as a query with the OpenAI LLM model and get answers to it? Any project involving prompts and embeddings?
I am not watching the videos currently as my exams are going on, but can you please give an idea of when the series will end?
I had a question Krish, while querying the vector database using similarity search function, is the embedding APIs used to generate embedding for the query and then comparing it to existing vectors to provide results?
Hello sir can you create advance concept of Gen AI
Can we perform RAG on Tabular Data ?
yes
@@aryansalge4508 will that answer any kind of query related to tabular data ?....like mean, median, correlation, unique categories, groupby ?
@@tharunps8048 try PandasAi
Does using HTML pages enable us to read images?
Is it chunking visualisations also?
What about pinecone database. I think a lot have changed recently with pinecone. I find it difficult doing retrieval
Stuck at this line:
from langchain_community.embeddings import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
db = Chroma.from_documents(documents[:5], OpenAIEmbeddings())
Error:- ImportError: cannot import name 'run_in_executor' from 'langchain_core.runnables.config'