For the very first time in my mechanical engineering life, I think i learned something in detail about software engineering! Thank you!
🎯 Key points for quick navigation:
00:02 *📹 The second video in the RAG from Scratch series focuses on indexing, a crucial component of RAG pipelines.*
00:28 *🔍 The goal of indexing is to retrieve documents related to a given question using numerical representations of documents.*
00:53 *📊 Numerical representations of documents are used for easy comparison and search, with approaches including sparse vectors and machine learning-based embedding methods.*
01:08 *💡 Embedding methods compress documents into fixed-length vectors that capture their meaning, allowing for efficient search and retrieval.*
02:03 *📈 Documents are split into smaller chunks to accommodate embedding models' limited context windows, and each chunk is compressed into a vector representation.*
Made with HARPA AI
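The chunking step mentioned at 02:03 can be sketched in a few lines of plain Python. This is a minimal fixed-size character splitter for illustration only; real pipelines typically use token-aware splitters (e.g. LangChain's RecursiveCharacterTextSplitter), and the chunk/overlap sizes here are arbitrary.

```python
def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping fixed-size chunks so that each chunk
    fits within an embedding model's limited context window."""
    chunks = []
    step = chunk_size - overlap  # advance less than chunk_size to keep overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks

doc = "word " * 100  # a 500-character stand-in document
chunks = split_into_chunks(doc)
```

Each chunk would then be passed to the embedding model and compressed into one fixed-length vector, as the summary describes.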
Amazing series! Thank you Lance!
Thanks. Nice video, short and clear. But why do you need to store the embeddings in a vector DB?
For efficient similarity searches.
When performing RAG, you need to encode your input data into embeddings because that is what the LLM understands; it is from these embeddings that the model decodes the output and gives you the result you asked for. These embeddings need to be stored somewhere, and they can be small or very large. Some vector DBs are free, open source, and in-memory, like FAISS and Chroma; others are paid and hosted, like Pinecone.
@horyekhunley thanks for the insight
@@bald_ai_dev the process is a little different.
The LLM doesn't touch the embeddings. The embeddings are used to convert the documents into a form that can be compared to the question (which is also converted to an embedding) more quickly and accurately. This is done by an embeddings model (in this example, an embeddings model from OpenAI, referred to as OpenAIEmbeddings() in the code, is used). These embeddings and their associated documents need to be stored somewhere (in this example, Chroma is used). This is the indexing phase.
After comparing the embedding of the question against the stored documents, a subset of the documents with high similarity (in embedding space) to the question is given to the LLM. This is the retrieval phase.
Finally, the LLM uses the returned documents and its own knowledge to reason and give an answer to the user. This is the generation phase.
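The indexing/retrieval split described in that comment can be sketched in plain Python. This is a toy illustration, not the video's actual code: a bag-of-words `Counter` stands in for a real embedding model such as OpenAIEmbeddings, and a plain list stands in for a vector store like Chroma or FAISS.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: bag-of-words counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Indexing phase: embed every document once, up front,
# and store each embedding next to its document.
docs = ["cats are small pets", "the stock market fell today", "dogs are loyal pets"]
index = [(embed(d), d) for d in docs]

# Retrieval phase: embed the question and return the k most similar documents.
def retrieve(question: str, k: int = 2) -> list[str]:
    q = embed(question)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[0]), reverse=True)
    return [d for _, d in ranked[:k]]
```

The generation phase would then pass `retrieve(question)` together with the question to the LLM. The key point is that the indexing work happens once, before any question arrives; only the cheap similarity comparison happens per query.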
@@sepia_tone Still didn't quite get what the role of indexing is here. Relevant documents are retrieved based on the embeddings of the split documents and the embedding of the question. So what is indexing doing here?
Still didn't understand what indexing is.
hey Lance, would you consider 'RAG' frameworks to be fairly Machiavellian?
Langchain, Python, Jupyter notebooks... I'm tired of people getting more and more abstracted from real knowledge of the systems they use.
You guys are GONE.
Your systems are so high-level, but you have no idea what you are using or doing. Guinea pigs... everything you work on is accessible, like you're in a glass house.
Don't you feel any remorse for supporting/advocating for systems that have no evident connection to what they are doing?
Honestly, children shouldn't even be given access to a smartphone until they can understand what is in their hands.
Same goes for all the child-adults here who are glossy-eyed over their shiny new high-level tools. You know how to use them, but have no idea how they work, or why it matters. How can you verify your data is safe with all of these layers upon layers of products/frameworks/toys/services?
Getting so far lost in all this without advocating for LOCAL, OPEN SOURCE, PRIVATE, MINIMALIST anything is lowkey disgusting. I hope more people start malding, because otherwise we will live in a society where one in a million people think for themselves... and everyone else just blindly follows shills.
They do have support for local LLMs, and they also support FAISS. Coming from search ads and machine learning, I'm just here to learn what this fuss is all about. Will let you know if there is anything new here that search ads doesn't already do. :)