Build a PDF Document Question Answering System with Llama2, LlamaIndex

Bhavesh Bhatt

Views: 171,677

Published on 11 Nov 2024

Comments • 43

@davidpratr 11 months ago ⁺¹⁴
And by the way, the shared code is not using the DB at all. Try running the notebook without the part that connects to the DB and it will work in the same way. The whole RAG process is built in RAM with LlamaIndex.
@evm6177 11 months ago ⁺¹²
😎 Cool stuff and a well-paced presentation. Interesting, we now have an offline alternative to uploading the entire PDF to the web and running a browser-based AI chatbox to run a query. By the way, thank you for introducing us to Llama 2. Waiting for more videos on this. 👍👍
@darkloxgamer 11 months ago ⁺¹
How can I embed this in my website?
@fabsync 10 months ago ⁺²
Love your teaching style! It would be great to see another video without the use of Gradient or anything else that is a paid service. It gets pretty expensive quickly...
@bhattbhavesh91 10 months ago
Great suggestion!
@SharadShukla-e7s 8 months ago ⁺¹
Hi Bhavesh, can you please tell me which version of llama-index you used for this project? I am having issues with the imports, as new versions of llama-index seem to have changed a few things. Please reply!
@thecraftssmith1599 8 months ago
Same problem man
@mohdshafi7558 11 months ago ⁺²
Long live the engineer, brother!
@robin4neetandmbbsstudents 9 months ago ⁺²
Sir, as a medical student I want to create a system like this for asking questions from medical textbook PDFs, but they are large, around 600 MB and nearly 2,000 pages. Is it possible to do so?
Currently I use a website named ChatPDF, which allows asking questions by uploading PDFs of no more than 10 MB, and we can upload only 2 PDFs daily.
@ishavmahajan 9 months ago ⁺¹
Suppose I only have unstructured or semi-structured tabular data in a PDF. What approach can we use other than the RAG framework? My task is to summarise the tabular data.
@rizwanat7496 8 months ago
What if I want to get the retrieved documents? How do I do that?
@sunilkumarpradhan.4376 10 months ago
I tried running your Streamlit code, but it fails due to some st.cache() deprecation. Any fixes for it?
@PANDURANG99 9 months ago
I am trying to solve a similar problem for my organisation, so I have some questions. I did some research into using OpenAI in the background with LangChain.
1. I don't want to use OpenAI for security reasons.
2. Can we use these models without internet, using Python?
@_ADITYASATAPATHY 8 months ago
cannot import name 'ServiceContext' from 'llama_index'
This is the error that shows up every time I run the file in Colab. Please help me out.
@sumitchopra3319 5 months ago
For Gradient, use:
%pip install llama-index-llms-gradient
%pip install llama-index-llms-openai
%pip install llama-index-readers-file pymupdf
%pip install llama-index-finetuning
!pip install llama-index gradientai -q
from llama_index.llms.gradient import GradientBaseModelLLM
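A note on the import errors reported in this thread: as far as I know, llama-index 0.10 moved the top-level names into the `llama_index.core` namespace and deprecated `ServiceContext` in favour of a global `Settings` object, which would explain why older notebooks fail on `from llama_index import ServiceContext`. The sketch below uses only the standard library to check which layout an installed version likely follows. The helper names (`parse_version`, `core_import_root`, `installed_llama_index_version`) are hypothetical and exist only for illustration; the 0.10 cut-off is my assumption and should be checked against the library's own migration notes.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '0.10.3'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def core_import_root(version: str) -> str:
    """Guess the package root for core names (assumption: the split happened at 0.10)."""
    if parse_version(version) >= (0, 10):
        return "llama_index.core"
    return "llama_index"


def installed_llama_index_version() -> Optional[str]:
    """Return the installed llama-index version, or None if it is not installed."""
    try:
        return metadata.version("llama-index")
    except metadata.PackageNotFoundError:
        return None
```

In a notebook, printing `installed_llama_index_version()` first and then choosing the import root from `core_import_root(...)` makes it easier to tell whether the code or the environment is out of date. Pinning the version in a requirements file avoids the guesswork entirely.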
@SharadShukla-e7s 8 months ago ⁺¹
Hi Bhavesh, kindly don't rely heavily on third-party sites for creating a vector DB for small use cases, when it can be handled easily in RAM using LanceDB and other similar packages. Currently, 3 months after your upload, hardly any of the llama imports work. Most things are deprecated, which seems natural, but with the version clashes it's hard to get anything running. I request you to please provide a dependency file too, so that versions can be compared when referring to your projects.
@davidpratr 11 months ago ⁺²
It would be good to test the same thing with a local Llama 2 instead of using Gradient and see if the quality of the answers is the same.
@abhaykrishnag4734 9 months ago
I have created a database, but I can't see my code. Kindly guide me where to check the code sir.
@musicxx6075 10 months ago ⁺¹
Can I use this for my philosophy research paper readings, to understand and chat with the PDFs? Or do you know any other PDF chat site?
@bhattbhavesh91 10 months ago
Yes!
@satyagurucharan4455 11 months ago ⁺¹
What would the steps be when the content is web-based? The Q&A chatbot has to answer questions based on the content of a given website. We are using Llama 2 7B and it is not giving accurate answers: the answers have to come from the website, but sometimes it gives additional information that is not part of the website. How would we fine-tune and train it? It would be helpful if you could share some suggestions or do a video on that.
@davidpratr 1 year ago ⁺⁵
Have you tried testing with multiple papers on the same topic? Is the retriever then getting all the related data from each PDF? And is Llama 2 then able to summarise all the document fragments well when there are many more of them? Actually, I'm building an equivalent setup with videos and using Weaviate, but I'm facing the above issues. Even with Weaviate, the LLM part is only compatible with paid APIs now.
@wilsonmartin8954 10 months ago
Please make some manual testing tutorials, or tell me which is the best manual testing channel, because I am starting to learn. Please help me.
@bhattbhavesh91 10 months ago
Sure 👍
@sanketmahakalkar2674 11 months ago ⁺⁵
1) What are the minimum system requirements? 😅
2) Sir, I want to create a chatbot from PDFs for 6 subjects, around 1,000 pages; the PDFs are not OCR'd.
3) Is it completely open source, or is there a subscription?
@dipaksavaliya8222 10 months ago
Can anyone tell me how I can extract specific content from the PDF and save it in JSON format?
The content is multiple-choice questions. I want to extract only those questions with their options, but there is other content that should be excluded.
@bhattbhavesh91 10 months ago
Jsonify the response!
@dipaksavaliya8222 10 months ago
@bhattbhavesh91 But there is also other content that I don't want in the response, and with my code I am getting it. I got the response in JSON, but not the one I wanted.
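One possible way to approach the question above, without involving the LLM at all, is to filter the extracted PDF text with regular expressions before serialising it. The sketch below is only an illustration under strong assumptions: the text has already been extracted from the PDF, each question starts with a number such as "1." or "2)", each option starts with a letter A to D such as "A)" or "(b)", and every question and option fits on a single line. The names `extract_mcqs` and `SAMPLE_TEXT` are hypothetical.

```python
import json
import re

# Assumed layout: "1. Question text" followed by "A) option" / "(b) option" lines.
QUESTION_RE = re.compile(r"^\s*(\d+)[.)]\s+(.*\S)\s*$")
OPTION_RE = re.compile(r"^\s*\(?([A-Da-d])[.)]\s+(.*\S)\s*$")

SAMPLE_TEXT = """Chapter 3: Cell Biology
Some introductory paragraph.

1. Which organelle produces ATP?
A) Nucleus
B) Mitochondrion
C) Ribosome
D) Golgi body

Further reading: see page 12.
2) What is the basic unit of life?
(a) Atom
(b) Cell
"""


def extract_mcqs(text):
    """Collect numbered questions and their lettered options, skipping other text."""
    questions = []
    current = None  # the question currently collecting options, if any
    for line in text.splitlines():
        question = QUESTION_RE.match(line)
        option = OPTION_RE.match(line)
        if question:
            current = {
                "number": int(question.group(1)),
                "question": question.group(2),
                "options": {},
            }
            questions.append(current)
        elif option and current is not None:
            current["options"][option.group(1).upper()] = option.group(2)
        elif line.strip():
            # Any other non-blank line is treated as unrelated content
            # and closes the current question.
            current = None
    # Numbered lines with fewer than two options are assumed not to be MCQs.
    return [q for q in questions if len(q["options"]) >= 2]


mcqs = extract_mcqs(SAMPLE_TEXT)
mcq_json = json.dumps(mcqs, indent=2)
```

Real PDFs often wrap questions across lines or use other numbering styles, so the two patterns would need adjusting to the actual document.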
@HemangJoshi 11 months ago
Nice video🎉
@bhattbhavesh91 11 months ago
Thank you so much 😀
@nazarmohammed5681 10 months ago
Also make a version where, when the query is not related to the PDF, the LLM responds on its own.
@bhattbhavesh91 10 months ago
Sure
@naijilaji9022 7 months ago
Anyone here who got Unauthorised Exception while running the base model slug, max token line?
@Purvi-k4k 10 months ago
Where is the JSON file?
@borntodoit8744 8 months ago ⁺¹
It's using a vector DB to store the unstructured data (PDFs) as structured data (nodes and a node index).
The LLM (Llama 2) then retrieves the structured data from the vector DB.
Once that is retrieved and merged with the LLM,
the LLM is ready to apply:
- input prompt P
- evaluate P (using the model), then
- output completion C (using all the training data of Llama 2 and the PDFs)
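The flow described in the comment above can be sketched in plain Python. This is a toy illustration, not the code from the video: it assumes a bag-of-words "embedding" in place of a real embedding model, keeps the index in a Python list in memory, and stops at building the prompt rather than calling Llama 2. In this sketch the retrieval step runs before the model is involved, and the retrieved chunks are merged into the prompt text. All function names here are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks):
    """Store each text chunk (node) alongside its vector, entirely in memory."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index, query, top_k=2):
    """Return the top_k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


def build_prompt(query, context_chunks):
    """Merge the retrieved chunks with the question into the prompt P."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


chunks = [
    "Llama 2 is a family of large language models released by Meta.",
    "LlamaIndex builds an index of nodes from documents.",
    "Streamlit is a Python library for building web apps.",
]
index = build_index(chunks)
query = "What does LlamaIndex build from documents?"
top_chunks = retrieve(index, query, top_k=1)
prompt = build_prompt(query, top_chunks)
```

The resulting `prompt` string is what would be sent to the model to obtain the completion C; swapping `embed` for a real embedding model and the list for a vector store changes the quality and scale, not the overall shape of the pipeline.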
@unitemaster8183 11 months ago ⁺²²
I hope you don't get popular, and that you stay with us and keep us ahead of the others ❤
@yt_souvik 11 months ago ⁺⁶
Sorry I found it 🙂
@Nishandh_Mayiladan 10 months ago
😂
@ajayvikrant108 10 months ago
LOL me too 😂😂 @yt_souvik
@zues_ezy 9 months ago ⁺²
Insecurities are popping up 😂
@honor9lite1337 9 months ago ⁺¹
But why? 😮
