End to end RAG LLM App Using Llamaindex and OpenAI- Indexing and Querying Multiple pdf's

Krish Naik

Views: 47,967

Published on 28 Jan 2024
In this video we will create a retrieval-augmented generation (RAG) LLM app using LlamaIndex and OpenAI. We will index and query multiple PDFs using LlamaIndex and OpenAI.
github code: github.com/krishnaik06/Llamin...
----------------------------------------------------------------------------------------------
Support me by joining the membership so that I can upload these kinds of videos
/ @krishnaik06
----------------------------------------------------------------------------
►Data Science Projects:
• Now you Can Crack Any ...
►Learn In One Tutorials
Statistics in 6 hours: • Complete Statistics Fo...
Machine Learning In 6 Hours: • Complete Machine Learn...
Deep Learning 5 hours : • Deep Learning Indepth ...
►Learn In a Week Playlist
Statistics: • Live Day 1- Introducti...
Machine Learning : • Announcing 7 Days Live...
Deep Learning: • 5 Days Live Deep Learn...
NLP : • Announcing NLP Live co...
---------------------------------------------------------------------------------------------------
My Recording Gear
Laptop: amzn.to/4886inY
Office Desk : amzn.to/48nAWcO
Camera: amzn.to/3vcEIHS
Writing Pad: amzn.to/3vcEIHS
Monitor: amzn.to/3vcEIHS
Audio Accessories: amzn.to/48nbgxD
Audio Mic: amzn.to/48nbgxD

Comments • 120

@krishnaik06 4 months ago ⁺¹⁵
Fixed the issue and reuploaded the video again
@vinayyadav6522 3 months ago
Hi Krish, please make a complete video on Bedrock Llama 2 chat covering the further steps: exposing the output as an API to the front end, or testing it in Postman by passing inputs and inference parameters, using FastAPI or Django.
@shantanusharma5617 3 months ago
I love your videos. Before starting the setup, could you make sure that your code is future proof by sharing the Python/Conda version you are using?
Preferably, start with the `pyenv install` command. Could you also commit the `requirements.txt` file with the version number you used? Thank you 🙏
@user-jx3wy6fe4s 3 months ago ⁺³
Hi sir, I am getting the error "cannot import name 'VectorStoreIndex' from 'llama_index' (unknown location)". Can you help me with this?
@balvendarsingh9905 2 months ago
Same issue for me @@user-jx3wy6fe4s
@user-gk7ox3of4b 2 months ago
Can you also use an OCR model to read images in PDFs?
@faqs-answered a month ago ⁺¹
I really love the way you teach these hard concepts with so much enthusiasm that it sounds so easy. Thank you so so much.
@Venom-gt3hi a month ago ⁺¹
You are amazing and your videos taught me more than any of my graduate professors could. Thank you
@ajg3951 4 months ago ⁺²
This session is fantastic! It would be great if you could also demonstrate how to change the default embedding, specify which embedding the model is using, and explain how to switch between different models such as GPT and LLM. Additionally, it would be helpful to cover how to utilize this dataset to answer specific questions.
@phanindraparashar8930 4 months ago ⁺¹
Much-awaited series. It would be nice to have even more complex RAG applications.
@ariondas7415 4 months ago ⁺⁵
please use open source LLMs
As a student, it's difficult to come up with budgets for openai api key
btw just wanted to thank you for everything you're doing!!
@1murali5teja 4 months ago ⁺¹
Thanks for the video, I have been constantly learning from your videos
@narsimharao8565 4 months ago ⁺²
Hi Krish sir, thank you very much for this video❤
@bernard2735 3 months ago ⁺¹
Thank you - this was a great tutorial. Liked and subscribed.
@jcneto25 4 months ago ⁺¹
Excellent Tutorial. Thanks
@bawbee27 17 days ago
I love how verbose this is. Thank you!
@deepak_kori 2 months ago
Thank you sir for making such videos, these are amazing videos🤩🤩
@akshatapalsule2940 4 months ago ⁺¹
Thank you so much Krish!
@khalidal-reemi3361 2 months ago ⁺¹
Eagerly waiting for a video that includes databases.
@summa7545 2 months ago
Hello Krish, first of all, I'd like to thank you for all your guidance. Your videos are my main source of study. Now, my query related to this video: the code has changed from what you are showing. Most of it remains the same with the addition of core to the library path. But I couldn't find anything for VectorIndexAutoRetriever, mainly the keywords to be used inside. Currently it asks for a VectorStoreInfo in addition to index and similarity_top_k.
@lixiasong3459 3 months ago ⁺²
Thank you very much, Sir. Your LlamaIndex playlist says five videos so far, but 2 unavailable videos are hidden. Do I have to pay and become a member to be able to see the full playlist? Thanks again for the amazing videos!
@pavankonakalla4668 4 months ago ⁺²
So is it more powerful than Azure AI Search? Or does it do the same thing as AI Search (Azure Cognitive Search)?
@bevansmith3210 3 months ago ⁺²
Great channel Krish! Is it possible to create a RAG/LLM model to interact with a database to ask statistical type questions? what is the max, min, median, mean? basically to create a chatbot for non-technical users to interact with spreadsheets
@aravindraamasamy9453 4 months ago ⁺²
Hi Krish, I have a doubt regarding the project I am doing. From a PDF file I need to create an Excel file with 5 columns, and the information in the Excel file is to be filled in from the PDF. Can I get an approach to solve this problem using an LLM? I am looking forward to hearing from you.
@user-bj2gw4bu2w 3 days ago
In your playlists, you have been using the llm with some API key, but where is the RAG here?
@alfatmiuzma 4 months ago ⁺¹
Hello Krish, thanks for this informative video on RAG and LlamaIndex. I have one doubt: when you query "what is attention is all you need", the source with a 0.78 similarity score is chosen as the Final Response instead of the source with a 0.81 similarity score. Why?
@user-jo3kt2hv9f 4 months ago ⁺²
Thanks Sir.
May I know where we used OpenAI here? Can we use any open-source model like Llama-2?
@seanrodrigues1842 3 months ago ⁺¹
Since we are using OpenAI, does it mean we are using one of the GPT models? There were no parameters in the code to choose which LLM to use. How do we select a particular OpenAI model?
@RanjitSingh-rq1qx 4 months ago ⁺¹
Wow sir, I was waiting for this video ❤
@user-fj4ic9sq8e 2 months ago ⁺¹
Hello,
thank you so much for this video.
I have a question about summary-style questions over documents with an LLM.
For example, with a vector database holding thousands of documents that have a date property, I want to ask the model how many documents I received in the last week.
@rizwanat7496 3 months ago ⁺¹
I am using the Mistral open-source model, and I want to store the relevant documents that are retrieved. How do I do it?
@sravan9253 a month ago
Instead of using an LLM to generate embeddings of input data, we are using LlamaIndex here to embed and index the same?
@shobhitagnihotri416 4 months ago ⁺²
We can do the same thing in LangChain, so what is the difference?
@ashusoni6448 2 months ago
Sir, as you know, libraries like LlamaIndex are still undergoing various changes, so please mention the exact library versions in the requirements.txt file.
@amritsubramanian8384 25 days ago
Gr8 Video
@StutiKumari-yn5ws 11 days ago
Hi Krish, if there is an option to store the index on the hard disk, then why do we need a vector store like ChromaDB?
@achukisaini2797 4 months ago
Sir, I need your help. I am using LlamaIndex and saving the embeddings in Pinecone using a sentence transformer, but I am not able to connect to Pinecone.
@AniketGupta-et7zw 4 months ago
Hi Krish, can you also make a roadmap video for data engineers?
@ArunkumarMTamil 4 months ago
Teach about Direct Preference Optimization
@tonydavis2318 22 days ago
Out of curiosity, why are you using python 3.10 instead of the current stable version 3.12?
@udaysharma138 4 months ago ⁺¹
Can you please create a video on how we can summarize a long PDF with Mistral or Llama-2 to get a very efficient output? With OpenAI we have a large context length, but with these open-source LLMs we are restricted when summarizing a large PDF.
@Dream-lp7km 3 months ago
Sir, in companies, do people work in Google Colab or Jupyter Notebook?
@kamitp4972 4 months ago
Sir can you please make an implementation video on TableGPT?
@awakenwithoutcoffee 17 days ago
Hi there Krish, amazing tutorial once again, but I'm running into the issue that the "maximum context length is 8192 tokens". How can we best chunk per PDF page/chapter if the PDF size > 8k tokens?
*EDIT*: Our use case is the following: we want to retrieve 100% accurate text from a page or chapter. Is this possible, or does the AI only know how to summarize?
@karmicveda9648 4 months ago
🔥🔥🔥
@harik5591 4 months ago
Can you create an application that indexes images and builds a prompt with similarity search for a given image's content?
@pranavgaming7634 3 months ago ⁺⁴
Amazing energy, Krish. I am your student in the Master GenAI class. I am trying this project, but I am getting an ImportError while loading VectorStoreIndex, SimpleDirectoryReader from llama_index. I have tried loading only one of them, but the result is the same. Could you please guide me to fix it?
@linalogvina6001 a month ago
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
@shashankag5361 a month ago
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
@kunalbose6360 4 months ago
Can we have some content on fine-tuning, as well as feedback or advanced RAG for Q&A ❤❤ or the triplets approach for RAG?
@piyush-A.I.-IIT 4 months ago ⁺³
Thanks! Just a quick question:
For indexing the documents, does it call the OpenAI API internally? I understand that for retrieval it calls the OpenAI API to formulate the final answer, but I am unclear whether it calls the API for indexing. I need to index a 10,000-page document, so I have to account for the cost if it calls the OpenAI API.
@vivekshindeVivekShinde 4 months ago
As per my knowledge, for Indexing it doesn't use OpenAI. For retrieval it does.
Correct me if I am wrong
@ajg3951 4 months ago
@@vivekshindeVivekShinde we are using indexing with the API because by default we are utilizing OpenAI's text embedding. Indexing involves embedding the text into respective folders. However, you have the freedom to change the embedding method from OpenAI to any other open-source option available for this purpose.
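To make the cost question in this thread concrete, here is a minimal sketch of what building a vector index amounts to: each document is split into chunks and every chunk is passed to an embedding function once. Nothing here is LlamaIndex code. `split_into_chunks`, `build_index`, and `CountingEmbedder` are hypothetical names, the word-based splitting stands in for token-based splitting, and the toy embedder stands in for whatever embedding service (hosted or local) is configured.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def split_into_chunks(text: str, chunk_size: int = 50, overlap: int = 10) -> List[str]:
    """Split text into overlapping word-based chunks (stand-in for token-based splitting)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def build_index(
    documents: List[str],
    embed: Callable[[str], Vector],
    chunk_size: int = 50,
    overlap: int = 10,
) -> List[Tuple[str, Vector]]:
    """Embed every chunk of every document and keep (chunk, vector) pairs."""
    index: List[Tuple[str, Vector]] = []
    for doc in documents:
        for chunk in split_into_chunks(doc, chunk_size, overlap):
            index.append((chunk, embed(chunk)))
    return index


class CountingEmbedder:
    """Hypothetical stand-in for an embedding service; counts how often it is called."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, text: str) -> Vector:
        self.calls += 1
        # Toy vector: text length and vowel count. A real model returns a dense embedding.
        return [float(len(text)), float(sum(text.count(v) for v in "aeiou"))]


if __name__ == "__main__":
    embedder = CountingEmbedder()
    index = build_index(["word " * 120], embedder)
    print(f"{len(index)} chunks indexed, {embedder.calls} embedding calls")
```

The number of embedding calls grows with the number of chunks, so for a large corpus the indexing cost depends on which embedding model is configured. Real libraries typically batch these calls, which this sketch does not model.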
@nefelimarketou1892 2 months ago
Thank you!
@srishtisdesignerstudio8317 4 months ago
Krish Ji, hi. We are into the stock market and use ML, mainly LSTM and Weka, and to some extent KNIME and RapidMiner, for building simple models on moderately sized data sets: 4,000 to 7,000 instances, possibly up to 10,000, with 8-10 features, so not very big models by the standards of serious ML. We saw in one of your videos that you built an LSTM with TF on, we guess, your GTX 1650 laptop. We had been training our models on CPU only till now, and it consumed a lot of time. However, we have recently started working on a sentiment library and wish to implement it in our models to make some auto-trading bots. Could you please guide us on our laptop purchase? Will a 1650 be good enough, or do we need to invest heavily? We have shortlisted some budget gaming laptops in the 70-80k range with an RTX 4050 or 3050. Your valuable suggestions will be of great help. We don't want to waste our money, and we think you are quite well versed in the subject. Your suggestions, please.
@Chuukwudi 24 days ago
Thank you Krish.
Important Notes:
llama_index doesn't support python 3.12
If you decide to use Python 3.11, you will need to import using `from llama_index.core.`.
@fatimazehra5962 19 days ago
Which location is to be added in method_location?
@nelohenriq 2 months ago
What about doing all this but using only open source models from HF?
@saumyajaiswal6585 4 months ago
Thank you for the awesome video. Can you please tell us the best approach for a multi-PDF chatbot? The PDFs can have text, images, and tables. The answer should contain text, images, and tables (or draw answers from them) from the PDFs themselves wherever required.
@vivekshindeVivekShinde 4 months ago
Facing a similar issue. Let me know if you find something. It'll be helpful.
@saumyajaiswal6585 3 months ago
@@vivekshindeVivekShinde Sure... did you find any solution?
@sanjaykrish8719 21 days ago
Can LlamaIndex be used with Llama? Why did OpenAI name it after Meta's Llama?
@abax_ 4 months ago ⁺¹
Sir, can you please use an open-source model in the next video, such as Google PaLM? I tried using the PaLM model, but VectorStoreIndex constantly demands an OpenAI API key. I even took help from the docs, but I am only able to get a response without chaining the PDF.
@Munnu-hs6rk 5 days ago
You should make the same video using open-source LLMs. If we can build the project for free, why should we pay? Also, make an end-to-end Streamlit app.
@lordclayton 9 days ago
Can anyone link the llamaindex playlist that Krish Naik has started? I can't seem to find it somehow
@RahulAthreyaKM 4 months ago
Can we use Gemini with LlamaIndex?
@bindupriya117 6 days ago
Can you make a hands-on video on how to do RAFT with RAG?
@user-wo2mu6jl3g 4 months ago
Where can I find the PDFs used in the project?
@keepguessing1234 19 days ago
My requirements.txt fails to install... It throws an error.
@Innocentlyevil367 4 months ago
Hey Krish, can you do an end-to-end project on model fine-tuning?
@ishratsyed77 3 months ago
The llama-index installation is giving errors. Any suggestions?
@yetanotheremail 4 months ago
Finally
@allaboutgaming836 3 months ago
Getting error for importing CohereRerank
ImportError: cannot import name 'CohereRerank' from 'llama_index.core.postprocessor'
Causing the error while importing SimilarityPostprocessor
from llama_index.core.indices.postprocessor import SimilarityPostprocessor
@ambarpathakkilbar 4 months ago
Very basic question: is LlamaIndex using the OpenAI API key you initialized in the OS environment?
@ambarpathakkilbar 4 months ago
Also, where exactly did you use OpenAI? I am not able to understand it.
@vivekshindeVivekShinde 4 months ago ⁺¹
No. I think LlamaIndex is not using the OpenAI API key. Also, he didn't use it anywhere in the project. Like he said, in the future we will create more complex conversational bots; maybe at that time he'll use it. He just added the OpenAI part for the sake of maintaining the future flow. I might be wrong. Feel free to correct me.
@subhamjyoti4189 3 months ago
@@vivekshindeVivekShinde 'VectorStoreIndex' is using openai internally for generating embeddings.
@omerilyas7347 2 months ago
@@subhamjyoti4189 I don't think VectorStoreIndex uses OpenAI's embeddings.
@qzwwzt 3 months ago ⁺¹
Sir, congrats on your lessons! I'm from Brazil. I tried other PDFs in Portuguese. At the end of the response, the text came in English: "These are the enteral nutritional requirements for preterm infants weighing less than 1500g."
Is it possible to get everything in Portuguese?
Thanks a lot
@marcelobeckmann9552 2 days ago
Did you try asking the LLM to translate the response to Portuguese?
@harshsingh7842 3 months ago
How do I create an OpenAI API key? Please help me with this doubt.
@surendra1764 4 months ago
How to work with tabular data?
@prasanthV-ji1ub 2 months ago ⁺¹
We are getting an error that says:
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
c:\Users\wwwdo\Desktop\LLAMA_INDEX\Llamindex-Projects\Basic Rag\test.ipynb Cell 1 line 4
1 ## Retrieval augmented generation
3 import os
----> 4 from dotenv import load_dotenv
5 load_dotenv()
ModuleNotFoundError: No module named 'dotenv'
This happens even though I added python-dotenv to requirements.txt.
@rishabkhuba2663 a month ago
Same here
Did you find a workaround?
@bhanu866 a month ago
How can we start this in Colab?
@Decoder_Sami 2 months ago
from llama_index import VectorStoreIndex,SimpleDirectoryReader
documents=SimpleDirectoryReader("data").load_data()
ImportError Traceback (most recent call last)
Cell In[20], line 1
----> 1 from llama_index import VectorStoreIndex,SimpleDirectoryReader
2 documents=SimpleDirectoryReader("data").load_data()
ImportError: cannot import name 'VectorStoreIndex' from 'llama_index' (unknown location)
How can I fix this issue? Any suggestions, please!
@Decoder_Sami 2 months ago ⁺²
Yes, I got it.
The correct code should be like this:
from llama_index.core import VectorStoreIndex,SimpleDirectoryReader
documents=SimpleDirectoryReader("data").load_data()
@tonydavis2318 9 days ago
@@Decoder_Sami That little tidbit took me about 3 hours to figure out. Thanks for posting!
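The fix in this thread comes down to where the package exposes its classes: the comments report that `from llama_index import ...` fails while `from llama_index.core import ...` works. For code that has to run against either layout, one option is to try a list of candidate module paths in order. The sketch below is a generic helper, not part of LlamaIndex; `import_first_available` is a hypothetical name, and the usage lines at the bottom assume the package is installed.

```python
import importlib
from types import ModuleType
from typing import Iterable, List


def import_first_available(module_names: Iterable[str]) -> ModuleType:
    """Return the first module in module_names that imports successfully."""
    errors: List[str] = []
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
            errors.append(f"{name}: {exc}")
    raise ImportError(
        "none of the candidate modules could be imported:\n" + "\n".join(errors)
    )


# Usage sketch (assumes llama-index is installed; newer layout is tried first):
# li = import_first_available(["llama_index.core", "llama_index"])
# VectorStoreIndex = li.VectorStoreIndex
# SimpleDirectoryReader = li.SimpleDirectoryReader
```

Pinning the library version in requirements.txt, as several commenters ask for, avoids needing a fallback like this at all.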
@AkshayKumar-nh4fv 2 months ago
ERROR: Failed building wheel for greenlet
Failed to build greenlet
ERROR: Could not build wheels for greenlet, which is required to install pyproject.toml-based projects
Getting this error while installing packages from requirements.txt
@sachinborgave8094 2 months ago ⁺¹
I don't see the PDFs.
@kamalakantanayak3250 4 months ago
How is this different from the embedding technique?
@kishanpayadi8168 4 months ago
As far as I understand, RAG is based on embeddings for similarity search. LlamaIndex is just a framework to build applications on top of it.
@chinnibngrm272 4 months ago
Hi sir
Previously I have tried with Gemini pro
In that project while extracting text from pdf of 32 pages it's not extracting all text...
That's why I am not able to get perfect answers..
What I have to do sir...
Please help me to solve
@krishnaik06 4 months ago ⁺²
Use this technique, it will work
@chinnibngrm272 4 months ago
@@krishnaik06
Sure sir
@chinnibngrm272 4 months ago
@@krishnaik06
Thank you soo much sir for helping lot of students....
You are Amazing😍
Waiting for more projects.
And also one request from my side sir... Please share some project ideas to us as assignments.
It will help us to do it on our own
Please sir... Please share some application ideas
@akj3344 4 months ago
@@chinnibngrm272 omg stop begging.
@surendra1764 4 months ago
How to fine-tune LLMs?
@khanmahmuna 27 days ago
Please can you build a document summarisation app using RAG and an LLM, without downloading the LLM locally and without using a GPT model? It would be very helpful. If any of the viewers can guide me through this, that would be very helpful too.
@MirGlobalAcademy 3 months ago
Why don't you use VS Code for writing code? Why are you using PyCharm inside VS Code?
@DataDorz 3 months ago ⁺¹
What is the need for OpenAI in this video?
@thespritualhindu a month ago
For response synthesis. Once the relevant nodes are retrieved, they are passed as context to the LLM (OpenAI model), and the LLM then provides a much better answer to the user's query.
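The retrieve-then-synthesize flow described in this reply can be sketched without any library: score stored chunks against the query vector, keep the best few above a cutoff, and paste them into the prompt that goes to the LLM. This is an illustration under stated assumptions, not LlamaIndex's implementation. The function names, the two-dimensional toy vectors, and the prompt wording are all hypothetical, and the final LLM call is left out.

```python
import math
from typing import List, Sequence, Tuple

Node = Tuple[str, Sequence[float]]  # (chunk text, embedding vector)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(
    query_vec: Sequence[float],
    nodes: List[Node],
    top_k: int = 2,
    cutoff: float = 0.0,
) -> List[Tuple[float, str]]:
    """Return up to top_k (score, text) pairs with score >= cutoff, best first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in nodes]
    kept = [item for item in scored if item[0] >= cutoff]
    kept.sort(key=lambda item: item[0], reverse=True)
    return kept[:top_k]


def build_prompt(question: str, retrieved: List[Tuple[float, str]]) -> str:
    """Assemble the context-plus-question prompt that would be sent to the LLM."""
    context = "\n\n".join(text for _, text in retrieved)
    return (
        "Context information is below.\n"
        f"{context}\n\n"
        f"Given the context, answer the question: {question}"
    )


if __name__ == "__main__":
    nodes = [
        ("attention paper", [1.0, 0.0]),
        ("yolo paper", [0.0, 1.0]),
        ("transformers", [0.9, 0.1]),
    ]
    hits = retrieve([1.0, 0.0], nodes, top_k=2, cutoff=0.5)
    print(build_prompt("What is attention?", hits))
```

Because the LLM writes the answer from all retrieved chunks together, the final response need not quote the single highest-scoring source.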
@manzoorhussain5275 3 months ago
ERROR: Could not install packages due to an OSError: [WinError 5] Access is denied: 'c:\\programdata\\anaconda3\\lib\\site-packages\\__pycache__\\typing_extensions.cpython-39.pyc'
Consider using the `--user` option or check the permissions.
WARNING: Ignoring invalid distribution -umpy (c:\programdata\anaconda3\lib\site-packages)
WARNING: Ignoring invalid distribution - (c:\programdata\anaconda3\lib\site-packages)
WARNING: Ignoring invalid distribution -umpy (c:\programdata\anaconda3\lib\site-packages)
WARNING: Ignoring invalid distribution - (c:\programdata\anaconda3\lib\site-packages)
WARNING: Ignoring invalid distribution -umpy (c:\programdata\anaconda3\lib\site-packages)
WARNING: Ignoring invalid distribution - (c:\programdata\anaconda3\lib\site-packages)
I am getting these errors and warnings when I try to install the packages from the requirements.txt file.
Kindly help.
@GaganaMD a day ago
Sad that openAI api is no longer free
@vijjapuvenkatavinay8207 4 months ago
I'm getting a rate limit error, sir.
@kishanpayadi8168 4 months ago
Then just use Gemini, my brother.
@svdfxd 4 months ago
With all due respect, the speed with which you are posting videos makes it very difficult to keep up with the learning pace.
@ImranKhan-jn6zh 2 months ago
Hello Krish,
I have not seen any video on how to evaluate an LLM.
Can you please upload videos on how to evaluate an LLM and which evaluation metrics can be used for specific use cases, like Q&A, summarization, etc.?
I am getting this question in every interview and am not able to answer it.
@MangeshSarwale 4 months ago ⁺¹
Sir, I don't have a paid OpenAI key, so while running the code I am getting the error (RateLimitError : You have exceed your current quota) at the line index=VectorStoreIndex.from_documents(documents,show_progress=True)
Please tell me how to solve this.
@kishanpayadi8168 4 months ago
Either create a new account and get free but limited access for 30 days or use gemini pro
@shravaninevagi5729 a month ago
Did you find any alternative? I am stuck here as well.
@allinoneofficial5300 14 days ago
@@shravaninevagi5729 In the source file .venv\Lib\site-packages\llama_index\core\embeddings\utils.py, make the change below to use GooglePaLMEmbedding, which is working in my case.
Install llama-index-embeddings-google with the command "pip install llama-index-embeddings-google".
"""Embedding utils for LlamaIndex."""
import os
from typing import TYPE_CHECKING, List, Optional, Union
if TYPE_CHECKING:
    from llama_index.core.bridge.langchain import Embeddings as LCEmbeddings
from llama_index.core.base.embeddings.base import BaseEmbedding
from llama_index.core.callbacks import CallbackManager
from llama_index.core.embeddings.mock_embed_model import MockEmbedding
from llama_index.core.utils import get_cache_dir
from llama_index.embeddings.google import GooglePaLMEmbedding
EmbedType = Union[BaseEmbedding, "LCEmbeddings", str]
def save_embedding(embedding: List[float], file_path: str) -> None:
    """Save embedding to file."""
    with open(file_path, "w") as f:
        f.write(",".join([str(x) for x in embedding]))
def load_embedding(file_path: str) -> List[float]:
    """Load embedding from file. Will only return first embedding in file."""
    with open(file_path) as f:
        for line in f:
            embedding = [float(x) for x in line.strip().split(",")]
            break
    return embedding
def resolve_embed_model(
    embed_model: Optional[EmbedType] = None,
    callback_manager: Optional[CallbackManager] = None,
) -> BaseEmbedding:
    """Resolve embed model."""
    from llama_index.core.settings import Settings
    try:
        from llama_index.core.bridge.langchain import Embeddings as LCEmbeddings
    except ImportError:
        LCEmbeddings = None
    # Check if embed_model is 'default' or not specified
    if embed_model == "default" or embed_model is None:
        # Initialize Google PaLM embedding
        google_palm_embedding = GooglePaLMEmbedding()
        embed_model = google_palm_embedding
    return embed_model
@uniqueavi91 5 days ago
Same for this:
from llama_index.core.response.pprint_utils import pprint_response
@PASChildAbuse 10 days ago
Getting the error "ValueError: Unknown encoding cl100k_base. Plugins found: ['tiktoken_ext.openai_public']" at index = VectorStoreIndex.from_documents(documents,show_progress=True)
