Langchain: PDF Chat App (GUI) | ChatGPT for Your PDF FILES | Step-by-Step Tutorial
- Published May 18, 2023
- "Build a ChatGPT-Powered PDF Assistant with Langchain and Streamlit | Step-by-Step Tutorial"
In this comprehensive tutorial, you'll embark on a project-based journey where we leverage Langchain and Streamlit to develop an interactive ChatGPT for your PDF documents. With the power of an LLM (Large Language Model) such as OpenAI's ChatGPT, we'll create an application that enables you to ask questions about PDFs and receive accurate answers.
🦾 Discord: / discord
☕ Buy me a Coffee: ko-fi.com/promptengineering
🔴 Patreon: / promptengineering
💼Consulting: calendly.com/engineerprompt/c...
📧 Business Contact: engineerprompt@gmail.com
Become Member: tinyurl.com/y5h28s6h
💻 Pre-configured localGPT VM: bit.ly/localGPT (use Code: PromptEngineering for 50% off).
Link to the code: pastebin.com/mcHG4cY4
Learn how to harness the power of Langchain, an open-source Python (and Javascript) framework, to create intelligent applications. Discover Langchain's capabilities in training GPT models on your data and generating personalized LLMs. Explore text embeddings and their integration with Langchain using OpenAI's API.
In this tutorial, we'll guide you through building a fully functional Streamlit application. Train GPT on PDF documents and fine-tune it to your specific use case. Experience the seamless user interface as you upload PDFs, ask questions, and receive prompt answers from the LLM.
Unleash Langchain's versatility in chatbots, document analysis, and more. Automate tasks and improve efficiency using Langchain with Streamlit.
Take your natural language processing skills to the next level. Start building powerful applications with Langchain today!
All Interesting Videos:
Everything LangChain: • LangChain
Everything LLM: • Large Language Models
Everything Midjourney: • MidJourney Tutorials
AI Image Generation: • AI Image Generation Tu... - Science & Technology
If you are interested in learning more about how to build robust RAG applications, check out this course: prompt-s-site.thinkific.com/courses/rag
Wow, amazing! Now this retrofitted with privateGPT would be gold!
Great stuff for explaining the flow of the architecture so well. I like the pace and also how details regarding each part of the architecture are explained with just enough details not to overwhelm the viewer.
Glad you enjoyed it!
Learnt a lot from just this one video. Appreciate all the hard work that went into making these fantastic tutorials. Thanks very much.
Thank you :)
Same here! Please teach us more end-to-end projects; I really learned things here @engineerprompt
This is my first time hearing about Streamlit... That's amazing!!! Thanks for sharing. Your step-by-step video is ultra clear... Thanks a lot.
Glad it was helpful!
Thank you for this clear, concise explanation. It’s definitely the best I’ve found so far. New to Langchain and quite excited. Thanks for your help.
You're very welcome!
Tried the whole video, really functional, and you have great insight into different concepts. Thanks a lot for this detailed video. One of the best videos I've watched in recent times.
glad you found it useful.
Thank you for these in depth videos. Learning so much. Thanks again!!
Great content, very well explained. Really appreciate the long videos with elaborate explanations.
Thank you 😊
It was one of the coolest videos on LLMs. Thanks for sharing such knowledge; I learned a lot from this video.
The pace, contents and explanation are all great. Thank you for a great video. Always checking your channel for great content.
Thank you for your kind words 🙏
Thank you for a beautiful step-by-step learning walkthrough. I like how you articulate each area clearly and precisely. Good on you, and keen to see more videos, and well done.
Glad it was helpful!
Thank you so much. I've been self-taught on AI and this video has really helped me... God bless.
Keep up the great work. I love your detailed videos.
Great video, and you explained it really well too. Earlier I was confused about why Langchain is used, but your video helped me understand!
Thank you !
I'm halfway through the video and I am so excited at how easy you are making it to learn… kudos to you.
Glad you are finding this useful
Does it require buying ChatGPT API access to build this project? @engineerprompt
Thank you for this video. Just signed up as a patron. Keep up the great work and all the best to you.
Thank you for your support!
Extremely useful. Thanks for making this awesome useful project
Well done, this is just what I'm looking for. Now I just need to find out how to clean old scanned PDF files so I can chat with them. 👍
One day you will hit a million! Please keep making these amazing videos on new technologies daily to enhance our knowledge. Thank you so much, brother!!!
Best videos! Instruction is so clear. I learn so much from you. Thanks
Thank you!
Loved this, learned so much today. Thank you
Great work! Thanks for sharing. Keep it up :)
Thank you so much man this was of great help cheers!
and you my friend are a true hero!
What would change if we decide to use an open source LLM, what lines should be changed and how? Would be a good idea for a future video
Also a good idea for the wallet 😎
I agree too
This, plus I want to load in directories with PDFs as entire topical collections to query
For Open source options:
Check this out for embeddings: th-cam.com/video/ogEalPMUCSY/w-d-xo.html
Check this out for LLMs: th-cam.com/video/wrD-fZvT6UI/w-d-xo.html
Thanks for this wonderful video.
Please make more videos of chain_type, as well as explore more configuration for OpenAI and other LLMs.
Thanks❤
Thank you so much!
excellent, thanks
What if you have a semi-structured Markdown document that contains its own semantic structure (like H1 headings), and you want to chunk the document based on them? The Langchain docs for the MarkdownTextSplitter mention splitting into chunks by sentences and recombining... but how?
Thanks so much bro, you're the best fr
Great video man!! Can you help me on the changes that needs to be made if I don’t want to use historical data or documents please?
Thank You !
Thankyou buddy.
That is a very beautiful walkthrough. Just one question: can you please tell me how to change the language of the conversation to another language, like Hindi?
Very nice video! Thank you very much,
but I have a problem:
- sometimes answers are very long, so they get cut off,
- and how do I build a chat like ChatGPT, where previous queries and responses are used so there is context?
Thank you!!
Great video from a genius, man. Hats off. Can you please record a short video using the Google PaLM LLM and Instructor embeddings, and also cover hosting this application on a VPS?
Great video. Can we do the same using Llama 2 with open-source embeddings (sentence-transformers)?
thanks for sharing
thanks buddy🥺❤❤❤❤
Very Informative tutorial .Thanks for educating us
Glad it was helpful!
@rajithkumar3424 Can you explain it to me? ..... Waiting for your reply.
Based on your workflow, if you ask a question it ends up re-uploading and re-processing the PDF document, the text splitter, and all that. Is there a way to prevent that? It happens because Streamlit re-runs the Python code top-down every time something changes on the page.
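A minimal sketch of one way around this, assuming Streamlit's `st.cache_resource` decorator (the `extract_text` / `split_into_chunks` helpers below are hypothetical stand-ins for the tutorial's PDF-reading and text-splitting steps): key the expensive work on the file contents so Streamlit's top-down re-run skips it.

```python
import hashlib

def file_fingerprint(data: bytes) -> str:
    """Stable cache key for an uploaded file's bytes."""
    return hashlib.sha256(data).hexdigest()

# In the Streamlit app itself (needs streamlit/langchain, so not run here):
#
# @st.cache_resource                      # re-runs only when inputs change
# def build_vectorstore(pdf_bytes: bytes):
#     text = extract_text(pdf_bytes)      # hypothetical: PDF text extraction
#     chunks = split_into_chunks(text)    # hypothetical: text splitter
#     return FAISS.from_texts(chunks, embedding=OpenAIEmbeddings())
#
# vs = build_vectorstore(pdf.getvalue())  # cached across re-runs
```

`st.session_state` is another option for keeping the vector store alive across reruns within one session; the fingerprint helper is useful if you want to cache per distinct file rather than per filename.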
Thank you very much for the video!!!
How can I implement an azure openai key with its respective uri?
Can we use a PDF that has images?
Will the images also be shown in the answers?
Thank you sir
This is a great video and I learned a lot! Does anyone know if the FAISS library has changed, though? I tried implementing the same code and am getting an error on the following line: docs = VectorStore.similarity_search(query=query, k=3) - "'list' object has no attribute 'similarity_search'"
Thank you for this!
How about a tutorial on making my own chatbot, with only the information I have provided it, and then using it in my own app?
I tried this and it works really well. While testing, I found that if I ask questions that require knowledge outside the PDF, the AI is still able to answer them. This is surprising because the only knowledge source should be the PDF and nothing else, right?
Want to connect?
💼Consulting: calendly.com/engineerprompt/consulting-call
🦾 Discord: discord.com/invite/t4eYQRUcXB
☕ Buy me a Coffee: ko-fi.com/promptengineering
🔴 Join Patreon: Patreon.com/PromptEngineering
▶ Subscribe: www.youtube.com/@engineerprompt?sub_confirmation=1
The Pastebin link is not working. Can you share the Git repo link for this? That would be a big help.
What is the benefit of using the OpenAI API and calling it local?! The whole point is to make it safer by not sending any data out, or to cut costs by not paying for the API! Without those objectives, it is a useless tutorial.
@engineerprompt
Would love an open source version of this that we can swap in different embedding and llm models as the space evolves.
Coming soon :)
It's too slow for a book.
Thank you for the tutorial. One question: what OpenAI model is used here, and why, when I ask "Could you extract the product name and return a JSON representation of the answer", is the given answer "No, I cannot extract the product name and return a JSON representation of the answer."? Any help would be appreciated.
Great stuff, thank you for spending time recording this video! Question - if I would like to do the embeddings calculation and semantic search on my server, which model would you recommend?
Check out my localGPT project; there I am using Llama 2 for the LLM part along with Instructor embeddings.
@@engineerprompt thanks! And you have earned yourself a subscriber!
How would you integrate image processing for technical diagrams, drawings, circuits, etc?
Really an amazing video, glad you shared this with us!
Thanks for your great video, as usual. I noticed that you place the focus on treatment of a given file, about things like summarizing or QnA on the example file. Have you ever tried to: 1. Add multiple documents on a similar subject in order to, 2. Request a cohesive consolidation of information about that subject. In other words, to add, say 20 docs, all talking about Langchain (just to give an example) and containing overlaps in different sections, etc. To then, ask for a consolidated document on the subject that would aggregate points from the different documents?
That's an interesting application. I think it will be possible with careful prompt engineering. It is a good idea and I will explore it further for sure. Thank you.
Hey, I have a doubt. In the vector store you are storing embeddings, but when you upload another file and ask a question about that document, it will also use the embeddings of the previous PDF because you are not deleting that data. How do you resolve that problem?
Can you make it go through a "somename" folder, use all the PDFs there, and then ask it questions about those files?
thank you
Thanks!
Thank you, appreciate the support 🙏
Superb video
Thank you so much 😀
Do you have any idea why, when we send one question against the chunked embeddings, we get:
2 requests to text-embedding-ada-002-v2,
1 request to gpt-3.5-turbo-16k-0613?
Why 2 embedding requests and not just one?
How can we restrict the output when the answer is not present in the document we provided? Have you made any tutorial on that?
Great video, any advice on integrating images in the pdf and not just text when splitting them into chunks?
Can you share your code with me? I'm getting an error.
Amazing video! I never learned this much in such a short time.
I just have one question: how many documents can I upload and use at the same time without running into issues?
For this specific code, it's limited to one PDF, but you can extend it easily to multiple PDFs. The limiting factor in that case is going to be the hardware you use for computing embeddings, as well as the vector store for information retrieval.
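A sketch of the multi-PDF extension described above (the `merge_chunks` helper name is made up; the FAISS call mirrors the tutorial's): tag each chunk with its source file before indexing, so one index can serve all documents and answers remain traceable.

```python
def merge_chunks(per_pdf_chunks: dict) -> tuple:
    """Flatten {filename: [chunks]} into parallel texts/metadatas lists,
    tagging every chunk with the PDF it came from."""
    texts, metadatas = [], []
    for name, chunks in per_pdf_chunks.items():
        for chunk in chunks:
            texts.append(chunk)
            metadatas.append({"source": name})
    return texts, metadatas

# Then index everything at once (needs langchain + an OpenAI key, not run here):
# VectorStore = FAISS.from_texts(texts, embedding=OpenAIEmbeddings(),
#                                metadatas=metadatas)
```

With metadata attached, retrieved chunks carry their `source`, which also helps answer the "which PDF did this come from" questions elsewhere in this thread.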
Great video. Thank you very much 👏👏👏
Can it deal with multiple pdfs?
Yes, you will need to modify the code a little, but it can manage it.
I get this error when I load a PDF: AttributeError: module 'openai' has no attribute 'error'. Can you please help? I have been debugging for 2 hours.
I have a use case with different types of documents. I can parse them using Langchain's document loaders, but there are also images in these documents. I want to store the images as metadata, and if an answer is generated from a context chunk, show its image as well. Please help.
Really nice explanation. Could you help me out with the OpenAI API key? It only works if you have a paid OpenAI account; is there any alternative?
I would like to try to implement this on my laptop. I am new to GCP and Azure. Does this need Google Cloud Platform to implement?
What should I do to show tables and figures and formulas from PDF?
Thank you very much! well explained!
Is there a way to print out the cost of the embeddings step, the same way you did for the query?
It would be great to know how much it costs to load the files without opening an account on their site.
I haven't looked at it. Let me check.
I want to translate a 5 MB PDF file at once with this library, so that all the pictures and the order of the paragraphs are preserved. Does this library have this capability?
I have some pdf scans, that I'd like to try Chat-GPT4 Vision on to see if it can OCR the tables and export to a csv.
Do you think the ChatGPT4's Assistant API pricing is better ( "Code interpreter $0.03 / session Retrieval $0.20 / GB / assistant / day (free until 03/01/2024) ) than PDFChat? Perplexity Pro (Also $20/mo)?
Or perhaps free AIs tools that might work with API and image analysis? For example, Google's CopyFish or Gemini Pro Vision? Also doesn't Bing's CoPilot use ChatGPT4?
In the video, constitution.pdf is uploaded. There were only 13 states back then, so why is 50 returned as the answer? Is the PDF really used as the data source when we run the query?
Thanks a lot. Though it was a long video, I was able to complete it and create a fully working application. Anyone who is reading this comment and has yet to watch the full video: please go ahead, there is lots of learning and it's a complete tutorial. One question, if you can answer: will Langchain or OpenAI store the PDF on their servers? I am asking to understand whether I can upload confidential documents.
Thank you for the kind words, and glad you found it useful. If you have confidential documents you are not comfortable sharing, I wouldn't recommend using OpenAI. I would recommend checking out my localGPT project; everything runs locally on your own hardware. You will need to use a bigger model than the default one.
Which service offers largest pdf size/functionality ratio?
Great content, but I'm having an error: RecursiveCharacterTextSplitter is not importing from langchain.text_splitter. Any solution, please?
How do you create the suggested questions that ChatPDF shows when a file is loaded?
How can I make it so that the user can repeatedly ask questions on the document like ChatGPT with the AI having history of previous conversations?
What if we want to upload the PDF ourselves, not the user, so users just come and ask questions about it? In other words, give the PDF information to the model before the user interacts with it?
Hello, can you explain how to use a model from Hugging Face, without an OpenAI token? Thanks.
*Can you do the same with Falcon 7B or 40B ?*
I was wondering, and I know it's a dumb question, but how do you create the file or folder structure you've shown in the video in VS Code?
I actually have another question: how do I set VS Code to point to the environment interpreter? It is now using the global Python, but I want to use the localGPT environment's Python.
Thank you very much for making this video. It has been very useful for me to implement my own solution based on chatgpt with my own documents.
The chatPDF tool you base it on and show at the beginning of the video, for each answer it tells you what part of the pdf the information was taken from. Would you know how I can implement something similar?
check this out: python.langchain.com/en/latest/modules/chains/index_examples/qa_with_sources.html
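Building on the linked qa_with_sources approach: once retrieved chunks carry metadata (Langchain retrievers return document objects with a `.metadata` dict), citations can be rendered next to the answer. A rough sketch, with a hypothetical formatting helper:

```python
def format_sources(docs) -> str:
    """Turn retrieved documents into deduplicated 'file p.N' citations,
    so they can be printed under the answer like ChatPDF does.
    Each doc is assumed to expose a .metadata dict with 'source'/'page'."""
    seen = []
    for d in docs:
        label = f'{d.metadata.get("source", "?")} p.{d.metadata.get("page", "?")}'
        if label not in seen:
            seen.append(label)
    return "; ".join(seen)

# In the app (not run here):
# docs = VectorStore.similarity_search(query=query, k=3)
# st.write("Sources: " + format_sources(docs))
```

This only works if page numbers were stored at indexing time, e.g. by splitting the PDF page by page and attaching `{"source": name, "page": n}` to each chunk.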
AWESOME. 🎉 Keep up the good -> great work. 😊
Thank you 🙏
I'm facing a rate-limit error while using the OpenAI API key, even though it hasn't been used once.
Every time I use FAISS, I am getting a deserialization error. Please advise.
Is there a max character limit? I would like to put as much text as possible into the PDF because I am working with lengthy projects. Also, will it read code just the same if the code is put into a PDF?
You will be limited by the context window for the LLM part. For the data-retrieval part, any length can work; just make sure the chunk size is not too big.
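To make the chunk-size point concrete, here is a pure-Python toy splitter. Langchain's RecursiveCharacterTextSplitter additionally prefers paragraph and sentence boundaries; this sketch just slices characters with an overlap, which is the core idea.

```python
def split_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list:
    """Fixed-size character chunks; consecutive chunks share `overlap`
    characters so retrieval doesn't lose context at chunk edges."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - overlap, 1), step)]

# split_text("abcdefghij", chunk_size=4, overlap=1)
# -> ["abcd", "defg", "ghij"]
```

Smaller chunks fit more retrieved passages into the LLM's context window but carry less context each; the overlap keeps sentences that straddle a boundary recoverable from at least one chunk.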
Hello, teacher. Your videos are excellent. I'm getting an error when trying to insert the OPENAI_API_KEY, and I'm not sure where to place it. Could you help me?
This specific version uses a .env file to store the OpenAI key. Create a file called ".env" (make sure it's not a text file; the extension must be env). Then put your OpenAI API key in there like this:
OPENAI_API_KEY=YOUR_KEY
Then run the above code. Hope this helps.
Alternatively, you can put your API key in the code file itself if you are not sharing it with anyone. In that case, after you import all the packages, add this line to your code:
OPENAI_API_KEY="YOUR_KEY"
Great tutorial!!
I was unable to get past the "TypeError: cannot pickle _thread.RLock object" error, so I removed the if/else statements and ran the model without pickling the VectorStore.
Thanks again, keep up the good work!
Hey, can you please share the code? I need it for a project
@@kdbarchives
# embeddings
store_name = pdf.name[:-4]
st.write(f'{store_name}')
embeddings = OpenAIEmbeddings()
VectorStore = FAISS.from_texts(chunks, embedding=embeddings)
Essentially, just get rid of the if/else and the with statement.
@@kdbarchives I created a GitHub repo with the code: langchain_pdf_chat_app. Cannot post the full link; apparently it's not allowed.
Sir, I'm facing an error, "EOFError: Ran out of input". Can you please help me solve it? I have followed the same approach you showed, but I'm still facing this error.
Thanks for your tutorials! When it comes to the LLM and context, could you please also include questions that are not related to the vector-store data (e.g. "My name is Anna. What's your name?"), so we can see how the LLM responds? I get strange replies to normal questions after indexing the PDF data. I am able to control the questions via the prompt template plus prompt engineering and get much better answers, but not always...
That's a good insight and will be adding that.
Can you please tell me how to get page-number references for the output?
It was working properly until two or three days ago, but now when I run the same code it shows an incorrect-API-key error, even though I am providing the right key.
Can you share the Excalidraw link? Very cool.
Thanks
Thank you!
How can you make sure the open source model does not hallucinate or make numbers up in its answer? Is there a video where you show how to improve the answers of open source models using prompt engineering?
will be making one soon on the topic.
Could you please tell me how to deploy the application as well?
I'm getting this error:
TypeError: cannot pickle '_queue.SimpleQueue' object
Can someone help?