ChatGPT for YOUR OWN PDF files with LangChain

  • Published on Jul 11, 2024
  • If you're looking to harness the power of large language models for your own data, this is the video for you. In this tutorial, you'll discover how to use LangChain with OpenAI text embeddings to extract valuable information from your PDFs. Step by step, we'll guide you through setting up LangChain to communicate with your PDF files, enabling efficient and effective information retrieval. By the end of this video, you'll have the skills you need to apply advanced language-processing technology to your own data analysis.
    ▬▬▬▬▬▬▬▬▬▬▬▬▬▬ CONNECT ▬▬▬▬▬▬▬▬▬▬▬
    ☕ Buy me a Coffee: ko-fi.com/promptengineering
    |🔴 Support my work on Patreon: Patreon.com/PromptEngineering
    🦾 Discord: / discord
    ▶️️ Subscribe: www.youtube.com/@engineerprom...
    📧 Business Contact: engineerprompt@gmail.com
    💼Consulting: calendly.com/engineerprompt/c...
    ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
    Links:
    Google Colab Notebook: colab.research.google.com/dri...
    LangChain Documentation: python.langchain.com/en/lates...
    OpenAI API: platform.openai.com/account/b...
    GPT4All Technical Report: s3.amazonaws.com/static.nomic...
    #LargeLanguageModel #LangChain #InformationRetrieval #PDF #OpenAITextEmbeddings #StepbyStep #DataAnalysis #LanguageProcessingTechnology #AI #MachineLearning #NaturalLanguageProcessing #NLP #Tutorial #HowTo
  • Science & Technology
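The description boils down to a pipeline: extract the PDF text, split it into chunks, embed the chunks, and retrieve the most relevant ones for each question. As a rough illustration of the splitting step only, here is a minimal pure-Python sketch of fixed-size chunking with overlap, similar in spirit to LangChain's `CharacterTextSplitter`. The function name and the default `chunk_size=1000` / `chunk_overlap=200` values are illustrative assumptions, not taken from the video's notebook.

```python
def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into character chunks of at most chunk_size.

    Consecutive chunks share chunk_overlap characters, so text cut at a chunk
    boundary also appears, with some surrounding context, at the start of the
    next chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once this chunk reaches the end, so the tail is not emitted twice.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    sample = "abcdefghij" * 3  # 30 characters of toy text
    for chunk in split_with_overlap(sample, chunk_size=12, chunk_overlap=4):
        print(repr(chunk))
```

Larger overlaps reduce the chance that an answer is split across two chunks, at the cost of embedding (and paying for) more repeated text.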

Comments • 426

  • @engineerprompt · 1 year ago · +3

    Want to connect?
    💼Consulting: calendly.com/engineerprompt/consulting-call
    🦾 Discord: discord.com/invite/t4eYQRUcXB
    ☕ Buy me a Coffee: ko-fi.com/promptengineering
    |🔴 Join Patreon: Patreon.com/PromptEngineering
    ▶ Subscribe: www.youtube.com/@engineerprompt?sub_confirmation=1

  • @besarthysniu1230 · 1 year ago · +21

    Very clear, thorough, well paced and learner-centered. What an amazing educator!

  • @martynas-al · 1 year ago

    A very clear explanation. Before this video, I was confused about the purpose of embeddings and how the actual answers are produced, and the video explained it very well.
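A toy illustration of the retrieval idea discussed here: each chunk and the question are turned into vectors, and the chunks whose vectors are most similar to the question's (here by cosine similarity) are the ones handed to the LLM to write the answer. In this sketch a simple bag-of-words count vector stands in for `OpenAIEmbeddings`, which returns dense learned vectors instead; the function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks whose vectors are most similar to the query vector."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


if __name__ == "__main__":
    chunks = [
        "Paris is the capital of France.",
        "The authors are listed on the first page.",
        "Bananas are rich in potassium.",
    ]
    # Only the best-matching chunk(s) would be sent to the LLM as context.
    print(top_k("Who are the authors?", chunks, k=1))
```

Real embeddings also match synonyms and paraphrases, which word counts cannot; the ranking step itself works the same way.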

  • @nickstaresinic9933 · 1 year ago · +22

    Well done. You filled in several important holes in my understanding of how to code something like this for my domain.

  • @calabisan · 1 year ago · +4

    Great work! Thanks! Works out of the box. It couldn't be shorter or clearer 🙂

  • @ricksegalCanada · 1 year ago · +20

    Excellent video. In three minutes, I learned more about how AI works in general than from hundreds of other videos. Well done, sir.

    • @sicfxmusic · 6 months ago

      Let me see your watch history 🤣🤣

  • @jrs999999 · 1 year ago · +4

    Really interesting and helpful! Thanks for taking the time to put this video together.

  • @helter2K10 · 1 year ago · +2

    Nice work - very clearly explained and you addressed the code fragments really well - look forward to more vids!!

  • @gybeturkey107 · 1 year ago

    Very well laid out, and everything was answered. Thank you.

  • @arthur...barros · 1 year ago · +2

    Excellent educator. Loved the well-paced video. Thanks for sharing your knowledge and findings

  • @andresmontoya4870 · 1 year ago · +1

    Mindblowing! Very clear and your explanation is excellent! Thanks ;)

  • @oryxchannel · 1 year ago · +25

    OMG, someone took the time to talk about usage costs. No one has yet rounded up use-case scenarios and their costs across the major AI vendors. Thanks for your consideration in this area.

  • @lynnqi6451 · 1 year ago

    Your explanation is very clear! Love it! Thank you very much!

    • @engineerprompt · 1 year ago

      Glad you found it useful. Appreciate the kind words.

  • @AIEinstein · 1 year ago · +2

    AWESOME video! Apps like this are really good :)) they improve the workflow a lot

  • @ianabrahams5434 · 1 year ago · +1

    Thanks for a very instructive video; I learned quite a bit from your step-by-step guide. I much appreciate the effort you put in, and you have inspired me to keep expanding my knowledge in this area. Thank you.

  • @JavArButt · 1 year ago · +1

    Very nice content - thank you for that introduction

  • @maamardli · 1 year ago

    Great tutorial! thank you very much!

  • @indianmonk8746 · 11 months ago

    Awesome, I really liked your to-the-point video. Thank you

  • @VastIllumination · 1 year ago

    I love you. thank you for making this so easy!

  • @peterthegreat7125 · 1 year ago · +2

    Super useful, this is what I have been looking for, ❤ love it!

    • @dongnguyenanh7282 · 1 year ago

      Hello, how do you get the location of the PDF files on the drive?

    • @peterthegreat7125 · 1 year ago

      @@dongnguyenanh7282 "/content/gdrive/My Drive/" is the root directory of your Google Drive; append your file's path in Drive after this root directory. You can treat it as a real folder and use 'ls' to find out where your file is.
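A small sketch of the approach described in this reply: once Drive is mounted at `/content/gdrive`, list every PDF under the Drive root so the exact path can be copied into `PdfReader(...)`. The `google.colab` lines only work inside Colab, so they are shown as comments; `find_pdfs` is a hypothetical helper, not part of the video's notebook.

```python
from pathlib import Path

# Inside Colab, mount Google Drive first:
#   from google.colab import drive
#   drive.mount('/content/gdrive')
DRIVE_ROOT = Path("/content/gdrive/My Drive")


def find_pdfs(root: Path = DRIVE_ROOT) -> list[Path]:
    """Return every file ending in .pdf (any case) under root, searched recursively, sorted."""
    return sorted(p for p in root.rglob("*") if p.is_file() and p.suffix.lower() == ".pdf")


if __name__ == "__main__":
    if DRIVE_ROOT.exists():
        for pdf in find_pdfs():
            print(pdf)  # pass str(pdf) to PdfReader
    else:
        print(f"{DRIVE_ROOT} not found; is Google Drive mounted?")
```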

  • @thecutestcat897 · 1 year ago

    Thanks, this helps me a lot!

  • @port7421 · 1 year ago · +1

    It was a very helpful guide. Thanks! Great that I was able to test it quickly thanks to your notebook link.

    • @dongnguyenanh7282 · 1 year ago

      Hello, how do you get the location of the PDF files on the drive?

    • @iamjustahair1315 · 1 year ago · +1

      @@dongnguyenanh7282 This is the default path you should use: /content/gdrive/My Drive/data/2023_GPT4All_Technical_Report.pdf.
      I would suggest making a folder named 'data' and placing your PDF file in it. It worked for me.

    • @port7421 · 1 year ago

      @@dongnguyenanh7282 Hi, I uploaded my own file to my Google Drive. You must allow access to the drive while signed in to your Google account. For me it looks like this:
      reader = PdfReader('/content/gdrive/My Drive/my.pdf')

  • @user-yj9qp1kj5o · 10 months ago

    Thank you!

  • @scottrogers8100 · 1 year ago

    Great video! Thank you!

  • @maxpiau4004 · 1 year ago

    Thanks, this was on my to-do list for this afternoon.

  • @billk6512 · 1 year ago

    Thank you! Fantastic stuff.

  • @andre-le-bone-aparte · 1 year ago · +2

    Just found your channel, Excellent Content! - Another sub for you sir!

  • @sm4849 · 1 year ago

    Brilliant tutorial mate

  • @cybersamurai99 · 1 year ago

    Magnificent!!

  • @faststart6409 · 1 year ago

    Thank you for the good video 😊

  • @not-a-weasel · 1 year ago · +1

    Thanks for sharing!

  • @chinmaybhalerao5062 · 1 year ago · +1

    Excellent video!

  • @lokash · 1 year ago

    Thank you. Very interesting

  • @tchrapko · 1 year ago · +12

    At this point it doesn't get any easier than that! I was able to drop in a technical document that makes my eyes bleed when I read it and just start asking questions of it instead. Great job! If someone would bundle this up into a nice little application and let me aim it at directories full of documents I think they could make a boatload of money.

    • @blockchainbrudda3051 · 1 year ago

      What do you mean 'aim it at directories' ?

    • @tchrapko · 1 year ago · +1

      @@blockchainbrudda3051 aka "folders"
      Like D:/Technical Documents/
      I can't wait for the day when SharePoint has AI assistance built in so a company can ask natural language questions of their business content and get back Chat-GPT style answers with links to the source material. It'll be a revolution for content management and productivity.

    • @tommasterplus · 1 year ago

      ChatPDF

    • @adi2soni · 11 months ago

      Working on it

  • @wernershintaku6104 · 1 year ago

    Very good and clear.

  • @mathewpelletier5348 · 1 year ago

    Great info. thanks! keep it up:)

  • @dealersagent · 1 year ago

    Very good video. Thank you

  • @italoaguiar · 1 year ago

    Excellent!! 🎉

  • @user-dp7lr5qh6o · 6 months ago

    great video

  • @the_video_freak · 1 year ago

    Best thing ever seen ❤

  • @adrianogferreira · 1 year ago

    Thanks for sharing. Very interesting.

  • @user-gp6ix8iz9r · 1 year ago · +3

    Thanks for this video, it's really helpful. Could you make a video on how to do embeddings with GPT4All and LangChain on Colab? It would be cool to be able to run your own models and have your own extra datasets.

  • @TheMarComplex · 1 year ago

    Thank you so much!

  • @saeedbello · 7 months ago

    Well explained. Thank you for sharing your knowledge with us. I want to ask whether it is possible to get the response to a query from the vector database as well as from outside the vector database.

  • @MVergaraQ · 1 year ago

    Man, I love your tutorials! Do you have any advice on converting scanned PDFs to text for this same application? What tools would you recommend?

  • @rolandowise · 11 months ago

    Thanks so much, this was very helpful! You mentioned doing a version that can take in multiple files within a folder; what changes are required? Will the embeddings retain a correlation to the rest of their respective file? (E.g., if I ask who the authors of a particular quote somewhere in the middle of a paper are, how will it know that the quote relates to the names right at the beginning if multiple different papers are embedded?)
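One common answer to the second question, sketched under assumptions: the embedding of a mid-paper chunk does not by itself "know" which paper it came from, so each chunk carries its source file as metadata (mirroring the `page_content`/`metadata` split of LangChain's `Document`). Here each chunk's text is also prefixed with the first lines of its file (often the title and authors) so that context ends up inside the embedded text. The header-prefix idea, the `header_lines=3` default and all names are illustrative assumptions, not the video's code.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str                                     # what gets embedded
    metadata: dict = field(default_factory=dict)  # e.g. source file, chunk index


def chunk_documents(docs: dict[str, str], chunk_size: int = 1000,
                    chunk_overlap: int = 200, header_lines: int = 3) -> list[Chunk]:
    """Chunk several documents, tagging every chunk with the file it came from."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    out: list[Chunk] = []
    for source, text in docs.items():
        # First lines of the file (often title and authors), repeated in every chunk.
        header = " ".join(text.splitlines()[:header_lines])
        for index, start in enumerate(range(0, len(text), step)):
            body = text[start:start + chunk_size]
            out.append(Chunk(f"[{source}] {header}\n{body}",
                             {"source": source, "chunk": index}))
            if start + chunk_size >= len(text):
                break
    return out


if __name__ == "__main__":
    docs = {
        "paper_a.pdf": "Paper A\nAlice Smith, Bob Jones\n" + "Body text of paper A. " * 40,
        "paper_b.pdf": "Paper B\nCarol White\n" + "Body text of paper B. " * 40,
    }
    for chunk in chunk_documents(docs, chunk_size=200, chunk_overlap=40, header_lines=2)[:3]:
        print(chunk.metadata, chunk.text[:60].replace("\n", " | "))
```

The `source` metadata also lets answers cite which file they came from, or restrict a search to one file.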

  • @TylerKlug · 1 year ago · +3

    Fantastic video. I'm sure someone has made a follow-up somewhere, but can you help me understand how to wrap everything into my own UI where I can pass a parameter through to the search query so it can effectively act as a chatbot?

  • @user-wt6mj6df7j · 1 year ago

    Nice work! If I want to process multiple PDFs, can we do this by adding more inputs?

  • @snaky1310 · 1 year ago · +1

    That was a great video, thanks!
    But in the end, how do you then output the ChatGPT message outside of Langchain into your apps?

  • @javshah105 · 1 year ago

    Thanks a lot

  • @KOREAyoungwoo · 1 year ago

    I'm looking forward to multiple-file reading, thanks a lot!

  • @vilztord1367 · 1 year ago

    Great video! How can I select another GPT model instead of the one you used for the tutorial? thanks!

  • @sasangaabeywickrama1303 · 1 year ago

    Good stuff! How much would the vector DB cost for the demonstrated operations?

  • @Alice_Fumo · 1 year ago · +2

    Wow, I stared at that opening graph for like 10 minutes being in awe, realizing the implications and uses, marveling at the elegance. This is insanely similar to an approach I thought of to extract new information during conversation, but this is more elegant.
    I should start making graphs of my approaches, since they do tend to get pretty complex and sometimes I lose track of what I'm doing or trying to do.

  • @xevenau · 1 year ago

    Thank you! Is there a way to adjust the token size of the output? I would like to add more context to the output. Also, at minute 9 you mention changing the AI model. How exactly do I do that?

  • @miguelcabaero5843 · 8 days ago

    Hello, if I have a diagram, graph, chart, or any kind of graphic organizer in the PDF, is it possible for that to be input too? Thank you so much, by the way, for the excellent video.

  • @MAButh · 1 year ago

    Nice video! I assume that DeepL uses a similar approach to translate PDFs. I used it but encountered some problems. For example, if a sentence does not end on the same page, it can cause problems and return nonsense. Could this be the reason for the "overlap"? So I rewrote some 250-page documents to eliminate any sentences that run over from one page to the next. (From now on, I will compare translating a text to making queries, since both require a comparable amount of "work" from GPT.) This helped a lot, but not always.
    In my opinion, the reason for the occasional issues is that it is difficult to predict the number of tokens required for each page. If the text, as in my case, is complex scientific or technical content, GPT will need more tokens for the same number of characters than it would for, say, a fairy tale. Therefore, with a technical or scientific document, you may run out of tokens very quickly. Whether it's translating or making queries, I believe this problem will arise.
    Perhaps we need to wait until GPT's maximum number of tokens is 2-3 times what it is now before it can handle any kind of text. For now, you could shrink your page format so that each page carries less (con)text.
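The token-budget point above is one reason the retrieval approach sends only a few chunks per question rather than whole pages. A minimal sketch of packing ranked chunks into a fixed token budget, using the common rule of thumb of roughly 4 characters per token for English prose; as the comment notes, dense technical text often needs more tokens per character, so `chars_per_token` can be lowered for it. Exact counts would need a real tokenizer (e.g. OpenAI's `tiktoken`); the names and budget values here are assumptions.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count (a heuristic, not a tokenizer)."""
    return math.ceil(len(text) / chars_per_token)


def pack_context(ranked_chunks: list[str], budget_tokens: int,
                 chars_per_token: float = 4.0) -> list[str]:
    """Keep the highest-ranked chunks, in rank order, until the next one would exceed the budget."""
    picked, used = [], 0
    for chunk in ranked_chunks:
        cost = estimate_tokens(chunk, chars_per_token)
        if used + cost > budget_tokens:
            break
        picked.append(chunk)
        used += cost
    return picked


if __name__ == "__main__":
    ranked = ["most relevant chunk " * 20, "second chunk " * 20, "third chunk " * 20]
    # A lower chars_per_token (denser text) means fewer chunks fit in the same budget.
    for cpt in (4.0, 2.5):
        print(cpt, len(pack_context(ranked, budget_tokens=250, chars_per_token=cpt)))
```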

  • @login2video · 7 months ago

    Very nice... explained at the right pace... keep up the good work... it would be even more helpful if you maintained a repo...

    • @engineerprompt · 7 months ago

      Thank you, that’s a great idea

  • @rodrigodefariacustodio7458 · 1 year ago

    Hi, thanks for the video, great content. Liked and subscribed. I have a question: I implemented the same code, and the results are very different. For instance, the tool doesn't find the authors; for some reason it thinks the authors are those of the second reference. I also asked it "what is the title of the article", and it consistently misses, saying it's "Llama ...". Have you come across this, or do you know why it behaves this way?

  • @yuong8139 · 1 year ago

    Hey man, good stuff, thanks a lot!

  • @adytech5788 · 1 year ago · +1

    Hello, how would you handle the same process with lots of files from my own company's database? I have a few gigabytes of files that I would need to scan and chunk to create my own database, then connect it to GPT4All to ask questions about my company, give it some tasks, etc.
    Thanks for the heads-up

  • @antarikshverma8999 · 1 year ago

    Thank you for this awesome content! I have a question: I am trying to ingest a huge PDF, say 1,000 pages, and the ingestion is failing. I am using Azure OpenAI for this. Could you share some thoughts on how a huge PDF can be used in this scenario?

  • @asepmulyana9085 · 1 year ago

    Thanks for your video! How can I load the PDF file from a URL instead of Google Drive?
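A minimal sketch, assuming the PDF is publicly reachable at a plain URL: download it to a local file with the standard library, then pass that path to `PdfReader` where the notebook used the Google Drive path. `download_pdf` is a hypothetical helper name.

```python
import shutil
import tempfile
import urllib.parse
import urllib.request
from pathlib import Path
from typing import Optional


def download_pdf(url: str, dest_dir: Optional[str] = None) -> Path:
    """Download the file at url into dest_dir (a fresh temp dir by default) and return its local path."""
    # Reuse the file name from the URL path, falling back to a generic name.
    name = Path(urllib.parse.unquote(urllib.parse.urlparse(url).path)).name or "download.pdf"
    dest = Path(dest_dir or tempfile.mkdtemp()) / name
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest
```

Usage, with a hypothetical URL: `reader = PdfReader(str(download_pdf("https://example.com/paper.pdf")))`; the remaining steps should stay the same as with the Drive file. Some servers reject Python's default user agent, in which case a `urllib.request.Request` with a custom `User-Agent` header may be needed.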

  • @billblackmon7704 · 1 year ago

    What Dev tool are you using? Great work❤

  • @YugKhatri-ht8kd · 1 year ago · +2

    What is the approximate API cost if I use a university textbook with 1,000 pages? I mean the cost of embedding the PDF data and also the search cost for questions. Can you give the cost in terms of API pricing or tokens?
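The cost question reduces to arithmetic: a one-off embedding cost for the whole book, plus a per-question cost for the prompt (retrieved chunks plus the question) and the generated answer. When the vector store runs locally (as with an in-memory store such as FAISS), the similarity search itself has no API cost. A minimal sketch; every number in it (tokens per page, tokens per question, and especially the prices) is a placeholder assumption to be replaced with OpenAI's current pricing.

```python
def embedding_cost_usd(pages: int, tokens_per_page: int, price_per_1k_tokens: float) -> float:
    """One-off cost of embedding every page once."""
    return pages * tokens_per_page / 1000 * price_per_1k_tokens


def query_cost_usd(questions: int, prompt_tokens: int, completion_tokens: int,
                   prompt_price_per_1k: float, completion_price_per_1k: float) -> float:
    """Cost of answering questions: prompt = retrieved chunks + question, completion = answer.

    Embedding each question is billed too, but it is tiny next to the prompt.
    """
    per_question = (prompt_tokens / 1000 * prompt_price_per_1k
                    + completion_tokens / 1000 * completion_price_per_1k)
    return questions * per_question


if __name__ == "__main__":
    PAGES = 1000
    TOKENS_PER_PAGE = 700        # assumption: a dense textbook page
    EMBED_PRICE_PER_1K = 0.0001  # placeholder price; check current pricing
    CHAT_PRICE_PER_1K = 0.002    # placeholder price; check current pricing
    print(f"Embedding once: ${embedding_cost_usd(PAGES, TOKENS_PER_PAGE, EMBED_PRICE_PER_1K):.2f}")
    print(f"100 questions:  ${query_cost_usd(100, 3000, 300, CHAT_PRICE_PER_1K, CHAT_PRICE_PER_1K):.2f}")
```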

  • @user-xf4yg3qo2c · 11 months ago

    Which part of the Colab code do you change to add multiple PDFs?

  • @MirGlobalAcademy · 8 months ago

    thanks

  • @MohitKumar-gp6nr · 1 year ago · +1

    I have some JSON files that I want to use as a chatbot data source. How do I store the JSON information in Chroma DB using embeddings and then retrieve it based on the user query? I googled a lot but did not find any answers.
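The main JSON-specific step is deciding what text gets embedded and what becomes metadata. A minimal sketch, assuming each file holds a JSON array of flat-ish objects: chosen fields are rendered as "key: value" lines for embedding, and the remaining scalar fields become metadata. The Chroma calls are shown only as comments and assume the `chromadb` package; `json_to_documents` and the field names are hypothetical.

```python
import json


def json_to_documents(raw: str, text_fields: list[str]) -> list[dict]:
    """Convert a JSON array of objects into records with an id, the text to embed, and metadata."""
    records = json.loads(raw)
    documents = []
    for i, record in enumerate(records):
        # The embedded text: the chosen fields rendered as "key: value" lines.
        text = "\n".join(f"{k}: {record[k]}" for k in text_fields if k in record)
        # Remaining scalar fields become metadata; nested values are dropped because
        # vector stores such as Chroma generally expect flat metadata.
        metadata = {k: v for k, v in record.items()
                    if k not in text_fields and isinstance(v, (str, int, float, bool))}
        doc_id = str(record["id"]) if "id" in record else f"row-{i}"
        documents.append({"id": doc_id, "text": text, "metadata": metadata})
    return documents


if __name__ == "__main__":
    raw = ('[{"id": 7, "question": "How do I reset my password?", '
           '"answer": "Use the Forgot password link.", "category": "account"}]')
    docs = json_to_documents(raw, ["question", "answer"])
    print(docs)
    # With chromadb, these could then be stored and queried roughly like:
    #   collection.add(ids=[d["id"] for d in docs],
    #                  documents=[d["text"] for d in docs],
    #                  metadatas=[d["metadata"] for d in docs])
    #   collection.query(query_texts=[user_question], n_results=3)
```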

  • @xxasadekx · 1 year ago

    Hi, very good video. My question to you: what is the maximum size permitted? Can we upload 2-10 GB of files, or even more, with this procedure? Or do we have to develop a different type of architecture? Best regards

  • @hideonbush3252 · 1 year ago

    good, thanks

  • @DavidG2P · 1 year ago

    How does this compare to simply asking BingBot in the Edge Browser's sidebar about a currently displayed PDF document?

  • @ziga1998 · 1 year ago · +1

    I have a question. What if I want to combine the general knowledge of a ChatGPT model that I specify with the added information from the PDF file? How is this achievable?
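One way to get both, sketched as an assumption rather than the video's method: retrieval stays the same, and only the instruction in the prompt changes so the model may fall back on its general knowledge when the retrieved PDF context does not cover the question. LangChain's QA chains accept custom prompts, but the exact parameter depends on the chain and version, so this sketch just builds the prompt string. The wording is hypothetical, and models do not always follow such instructions.

```python
def build_prompt(question: str, context_chunks: list[str],
                 allow_general_knowledge: bool = True) -> str:
    """Build a QA prompt from retrieved chunks; optionally let the model use its own knowledge."""
    context = "\n\n".join(context_chunks)
    if allow_general_knowledge:
        rule = ("Use the context below when it is relevant. If it does not contain the answer, "
                "answer from your general knowledge and say that you did so.")
    else:
        rule = ("Answer only from the context below. "
                "If the answer is not in the context, say that you don't know.")
    return f"{rule}\n\nContext:\n{context}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    print(build_prompt("Who wrote the report?", ["...retrieved chunk 1...", "...retrieved chunk 2..."]))
```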

  • @kevennguyen3507 · 8 months ago

    How can I combine the RetrievalQAWithSourcesChain from your other tutorial with this code? Basically, I want to provide references that return the page number or numbers within the PDF document where the answer is found. Please help.
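Page-level references mostly come down to keeping the page number attached to each chunk. A minimal sketch, assuming the page texts were already extracted, e.g. with `[page.extract_text() for page in PdfReader(path).pages]`; LangChain's `PyPDFLoader` similarly stores a `page` entry in each document's metadata, which a sources-style chain can then report. Chunking page by page means no chunk spans a page break, which is a trade-off for sentences that run over a page. Names are hypothetical.

```python
def chunk_pages(pages: list[str], chunk_size: int = 1000,
                chunk_overlap: int = 200) -> list[dict]:
    """Chunk each page separately so every chunk keeps its 1-based page number."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for page_no, text in enumerate(pages, start=1):
        for start in range(0, len(text), step):
            chunks.append({"text": text[start:start + chunk_size], "page": page_no})
            if start + chunk_size >= len(text):
                break
    return chunks


def cited_pages(retrieved: list[dict]) -> list[int]:
    """Page numbers of the retrieved chunks, de-duplicated, in order of first appearance."""
    return list(dict.fromkeys(chunk["page"] for chunk in retrieved))


if __name__ == "__main__":
    pages = ["Title and authors.", "Methods section text.", "Results section text."]
    chunks = chunk_pages(pages, chunk_size=10, chunk_overlap=2)
    retrieved = [chunks[-1], chunks[0], chunks[-2]]  # pretend these came from the similarity search
    print("Sources: pages", cited_pages(retrieved))
```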

  • @ianabrahams5434 · 1 year ago

    Thanks!

  • @nottyverseOfficial · 1 year ago · +2

    ChatPDF is also good. I have used it, and it's free for 120 pages, 3 PDFs/day, and 50 questions/day... one can pay $5 per month to get very good upgrades

  • @RonanMcGovern · 1 year ago

    Very informative video. How would you recommend doing summarisation of a long pdf. Do you think recursive summarisation OR semantic clustering, sampling from clusters, and then summarisation of clusters, would be best?

  • @cstan2381 · 1 year ago · +1

    Thanks! Is there a cost associated with calling OpenAIEmbeddings()? Can I run a local LLM to answer the query?

    • @engineerprompt · 1 year ago

      Check this out: th-cam.com/video/MlyoObdIHyo/w-d-xo.html

  • @GimbaGoyo · 1 year ago · +1

    Nice. I don't have the basic coding skills, and I feel that's a must. I would like to challenge you, though, to create an app that can compare two or more documents and discover whether there is copy-and-paste or plagiarism between them without running a search across the whole internet. Is this doable?
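This particular check does not need an LLM at all. A minimal sketch that compares every pair of documents by the overlap of their word 5-grams ("shingles"), measured with Jaccard similarity, a classic near-duplicate technique swapped in here rather than anything shown in the video. It catches copy-paste; catching paraphrase would need something like the embedding similarity from the video. The `n=5` and `threshold=0.2` values are arbitrary assumptions to tune.

```python
import re


def shingles(text: str, n: int = 5) -> set[tuple[str, ...]]:
    """Set of consecutive n-word sequences in the text (lower-cased)."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_report(docs: dict[str, str], n: int = 5,
                   threshold: float = 0.2) -> list[tuple[str, str, float]]:
    """Return (doc_a, doc_b, score) for every pair whose shingle overlap reaches threshold."""
    names = sorted(docs)
    sets = {name: shingles(docs[name], n) for name in names}
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            score = jaccard(sets[a], sets[b])
            if score >= threshold:
                flagged.append((a, b, round(score, 3)))
    return flagged


if __name__ == "__main__":
    docs = {
        "essay1.txt": "the quick brown fox jumps over the lazy dog near the river bank",
        "essay2.txt": "the quick brown fox jumps over the lazy dog near the old mill",
        "essay3.txt": "an entirely different essay about gardening in the spring season",
    }
    print(overlap_report(docs))
```

A high score flags a pair for human review rather than proving plagiarism.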

  • @sauravmukherjeecom · 1 year ago

    Can we use LLaMA-based LLMs like Alpaca, Vicuna, etc. for this?

  • @tanyaalexander1460 · 1 year ago

    Amazing video with concise and clear explanations! Question: Is there a way for me to use Azure and One Drive to do this? I'm a noob and am not sure how but your video makes me willing to try. My organization (healthcare) has mountains of PDFs with gold we cannot mine in them.

    • @engineerprompt · 1 year ago · +1

      Yes, you can do that. It doesn't matter where the data is stored. You can run this on Azure.

  • @kashanasim7903 · 6 months ago

    The model used by default is text-davinci-003, and it is now deprecated, so what should we do now?
    Is there any updated code for the above project?

  • @manjubishnoi8328 · 1 year ago

    Very good explanation, loved it. I just wanted to know what tool you are using to make the diagrams?

    • @python360 · 1 year ago · +1

      excalidraw

  • @Passe1811 · 1 year ago

    Is there any difference between using PyPDFLoader (from LangChain) and PdfReader?

  • @yousufleads · 1 year ago · +1

    I assume there is no one-click .exe file (yet) or a clear GUI?

  • @nitingoswami1959 · 1 year ago

    Amazing

  • @hiutuanting4643 · 1 year ago · +2

    Is it possible to feed an entire GitHub project to GPT and ask it to explain or give ideas on how to modify the code?

  • @user-om3zo6wr1p · 1 year ago

    Is there any way to save the generated embeddings to a file and later load them from disk, to avoid generating the embeddings again and again? If possible, could you please give me a sample?
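A minimal, library-free sketch of caching: store the chunk texts and their vectors in a JSON file, and reuse them on the next run as long as the texts have not changed. Within LangChain itself, vector stores offer persistence (for example `FAISS.save_local` / `FAISS.load_local`, or a persist directory for Chroma; check the docs for your version). `embed_fn` is a placeholder for whatever embedding call is used, and the helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import Callable


def save_embeddings(path: str, texts: list[str], vectors: list[list[float]]) -> None:
    """Write texts and their vectors to a JSON file."""
    if len(texts) != len(vectors):
        raise ValueError("texts and vectors must have the same length")
    Path(path).write_text(json.dumps({"texts": texts, "vectors": vectors}))


def load_embeddings(path: str) -> tuple[list[str], list[list[float]]]:
    """Read back what save_embeddings wrote."""
    data = json.loads(Path(path).read_text())
    return data["texts"], data["vectors"]


def get_or_build(path: str, texts: list[str],
                 embed_fn: Callable[[str], list[float]]) -> list[list[float]]:
    """Reuse vectors cached at path if they were built from exactly these texts; otherwise embed and cache."""
    if Path(path).exists():
        cached_texts, vectors = load_embeddings(path)
        if cached_texts == texts:
            return vectors
    vectors = [embed_fn(t) for t in texts]  # the expensive (billed) step
    save_embeddings(path, texts, vectors)
    return vectors
```

Comparing the full list of texts is the simplest way to detect a changed PDF; a hash of the file would be cheaper for large documents.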

  • @sportscardvideos · 1 year ago

    What's the best video for someone with little to no Python experience who wants to use LangChain?

  • @taznainfathima · 1 year ago · +1

    How do you load multiple PDFs in LangChain?

  • @jonnythrive · 1 year ago · +2

    Thanks for the videos! They've helped me understand a lot about GPT stuff. But how do I change the language model?

  • @jukebox1209 · 1 year ago

    Hi, what's the tool you used for the flowchart at the beginning of the video? Thanks!

  • @kaini8635 · 1 year ago · +1

    Thanks for the video. I just wonder how you do extraction if the PDF page contains mixed text and images/charts.

    • @engineerprompt · 1 year ago · +2

      The file I tested has tables and images, but this approach ignores them. It can only do text-based information retrieval.

  • @prazyraj1735 · 2 months ago

    I have a use case with different types of documents. I can parse them using LangChain's document loaders, but these documents also contain images. I want to store the images as metadata and, if an answer is generated from a context chunk, show the image as well. Please help.

  • @kicheko4980 · 9 months ago

    You sir I am buying you a coffee

  • @leadhelix · 1 year ago · +1

    Great video! Could I use this with multiple PDFs? I'm using Zapier to automate a client's task, and for one of them I need to parse individual PDFs, each at its own time. So could it read multiple files from Google Drive, unlike the single one you selected?

    • @engineerprompt · 1 year ago

      Check out this video for multiple files; you can integrate it with Zapier: th-cam.com/video/s5LhRdh5fu4/w-d-xo.html

  • @GooberStudios · 9 months ago

    Great simplified video explanation. In the part where you choose the text-ada model, can you replace it with the model ID of an OpenAI fine-tuned model we created? That way we could use the fine-tuned model to talk to the PDF.

    • @engineerprompt · 9 months ago

      Yes, you should be able to do that easily.

    • @GooberStudios · 9 months ago

      @@engineerprompt So basically, if I wanted, I could have a fine-tuned model that speaks like Thor read my PDF knowledge base and answer in the manner of Thor. Is this correct?

  • @vincentalcala5893 · 1 year ago

    Thank you very much for the video, excellent content and production!
    I have a question: how can I swap the text-ada model for a davinci one?

    • @engineerprompt · 1 year ago · +2

      OpenAI(model="text-davinci-002", temperature=0.7): you can change the model here.

  • @shinycaroline3722 · 1 year ago

    I am passing the entire document and can retrieve all the details I need in a single prompt, but the response time goes up. Conversely, if I use multiple prompts, the response time is lower, but since I need to pass the input document every time, token usage goes up. I am building an application in DRF and don't need any user interface for this; I just need to call OpenAI once to get relevant results from the document and send them as a JSON response. Any solutions?

  • @ardenkuyumcu715 · 1 year ago · +2

    Hi! Thanks for the enlightening content. What tool are you using for this sketch of the idea?

    • @engineerprompt · 1 year ago · +2

      Thank you: Tool: excalidraw.com/

    • @Peter-oz1oo · 1 year ago

      In this video you use colab, is the code transferrable to a script on my PC?

  • @M-ABDULLAH-AZIZ · 10 months ago

    For a chatbot that provides information about an application, which is better: keeping the data in a file and computing embeddings in real time, or storing the embeddings in a database?