Python RAG Tutorial (with Local LLMs): AI For Your PDFs

  • Published May 22, 2024
  • Learn how to build a RAG (Retrieval Augmented Generation) app in Python that lets you query/chat with your PDFs using generative AI.
    This project covers some more advanced topics: how to run RAG apps locally (with Ollama), how to update a vector DB with new items, how to use RAG with PDFs (or any other files), and how to test the quality of AI-generated responses.
    👉 Links
    🔗 GitHub: github.com/pixegami/rag-tutor...
    🔗 Basic RAG Tutorial: • RAG + Langchain Python...
    🔗 PyTest Video: • How To Write Unit Test...
    👉 Resources
    🔗 Document loaders: python.langchain.com/docs/mod...
    🔗 PDF Loader: python.langchain.com/docs/mod...
    🔗 Ollama: ollama.com
    📚 Chapters
    00:00 Introduction
    01:06 RAG Recap
    03:22 Loading PDF Data
    05:08 Generate Embeddings
    07:16 How To Store and Update Data
    10:46 Updating Database
    11:45 Running RAG Locally
    15:12 Unit Testing AI Output
    20:29 Wrapping Up

Comments • 228

  • @tinghaowang-ei7kv • months ago • +12

    It's hard to find such high-quality videos on China's Bilibili, but you've done it, thank you so much for your selflessness. Great talk, looking forward to the next video. Thanks again, you did a great job!

    • @pixegami • months ago

      Thank you! Glad you enjoyed it!

  • @frederichominh3152 • 24 days ago • +12

    Best tutorial I've seen in a long time, maybe ever. Timing, sequence, content, logic, context... everything is right in your video. Thank YOU and congrats, you are smart as hell.

    • @heesongkoh • 12 days ago

      agreed.

    • @pixegami • 11 days ago

      Wow, thanks for your comment. I really appreciate it, and I'm glad you liked the video.

  • @denijane89 • 27 days ago • +4

    That was the most useful video I've seen on the topic (and I've watched quite a lot). I didn't realise that the quality of the embedding is so important. I have working code for a local PDF AI, but I wasn't very impressed by the results, and that explains why. Thank you for the great content. I'd love to see other uses of local LLMs.

    • @pixegami • 11 days ago

      Glad you liked it! Thanks for commenting and for sharing your experience.
      And absolutely - when building apps with LLMs (or any kind of ML/AI technology), the quality of the data and the index is non-negotiable if you want high-quality results.

  • @musiitwaedmond1426 • 24 days ago • +3

    This is the best RAG tutorial I have come across on YouTube, thank you so much man💪

    • @pixegami • 11 days ago • +1

      Thank you! I appreciate it!

  • @JaqUkto • 20 days ago • +2

    Thank you very much! I've started my RAG using your vids. Of course, much of your code needed to be updated, but it was simple even given my zero knowledge of Python.

    • @sergiovasquez7686 • 18 days ago • +1

      Maybe you could share the updates with us 😅

    • @pixegami • 11 days ago

      Nice work, glad you got it working!

  • @nachoeigu • months ago • +4

    Your content is amazing! Keep it going. I would like to see a continuation of this video covering how to deploy and automate the workflow in the cloud (AWS), and how to integrate the chat interface with a Telegram bot.

    • @pixegami • months ago • +2

      Glad you liked it, and thanks for the suggestions. My next video will be focused on how to deploy this to the cloud - but I hadn't thought about the Telegram bot idea before, I will look up how to do that.

  • @paulham.2447 • months ago • +3

    Very, very useful and so well explained! Thanks.

    • @pixegami • months ago

      Thank you!

  • @user-xk3tj5cj8p • months ago • +3

    Recently discovered your channel 🎉, subscribed 😊 Keep up the awesome content!

    • @pixegami • months ago • +1

      Thank you! Welcome to the channel!

  • @nascentnaga • months ago • +3

    Suuuuuper helpful. I need to test this for a work idea. Thank you!

    • @pixegami • months ago

      You're welcome!

  • @fabsync • 15 days ago • +2

    Oh man... by far the best tutorial on the subject... finally someone using PDFs and explaining the entire process! You should do a more in-depth series on this...

    • @pixegami • 11 days ago • +1

      Thank you for the feedback :) Given the interest this topic has received, I'm definitely keen to dive into it a bit deeper.

    • @fabsync • 11 days ago • +1

      One of the questions I was asking myself with PDFs: do you clean the PDF before doing the embeddings, or is that something you can resolve by customizing the prompt?
      What would be a good way to do semantic search after using pgvector? I am still struggling with those answers.

    • @pixegami • 10 days ago

      @@fabsync Yeah, I've had a lot of people ask about cleaning the PDFs too. If you have PDFs with certain structural challenges, I'd recommend finding a way to clean/augment them for your workflow.
      An LLM prompt can only go so far, and cleaning noise from the data will always help.

  • @joxxen • months ago • +3

    Very nice, I wish I had this guide a few weeks ago; I had to learn it the hard way xD

    • @pixegami • months ago

      You got there in the end :)

  • @NW8187 • 17 days ago • +3

    Simplifying a complex topic for a diverse set of users requires an amazing level of clarity of thought, knowledge, and communication skill, which you have demonstrated in this video. Congratulations! Here are some items on my wish list for when you can get to them. 1. The ability for users to pick from a curated list of open-source LLMs, a list that users can keep updated. 2. A local RAG application for getting insights from personal tabular data stored in multiple formats, e.g. Excel/Google Sheets and PDF tables.

    • @pixegami • 11 days ago

      Thanks for your comment, I'm really glad to hear it was helpful. I appreciate you sharing the feedback and suggestions as well, I've added these items to my list of ideas for future videos :)

  • @jial.5245 • months ago • +4

    Thank you so much for the content👍🏼 Very well explained! It would be great to see a use case of the AutoGen multi-agent approach to enhance RAG responses.

    • @pixegami • months ago • +1

      Glad you liked it, thank you! And thanks for the suggestion and project idea :)

  • @zhubarb • 27 days ago • +1

    Crystal clear. Great video.

    • @pixegami • 10 days ago

      Thank you! Glad to hear that :)

  • @AlexandreBarbosaIT • months ago • +3

    Smashed the Subscribe button! Awesome content! Looking forward to the next ones.

    • @pixegami • months ago

      Thank you! Glad you enjoyed it, and welcome!

  • @basselkordy8223 • months ago • +3

    High-quality stuff. Thanks!

    • @pixegami • months ago

      Glad you liked it!

  • @muhannadobeidat • months ago • +1

    Great video and nicely scripted. Thanks for the excellent effort.
    I find that nomic 1.5 is pretty good for embedding and lightweight as well. I didn't do a formal metrics-based analysis, but in practice its recall and precision are pretty impressive with only 768 dimensions.

    • @pixegami • months ago

      Thank you! Glad nomic text worked well for your use case :)

  • @gustavojuantorena • months ago • +2

    Great content as always!

    • @pixegami • months ago

      Thanks for watching!

  • @KrishnaKotabhattara • months ago • +3

    For evaluation, use RAGAS and LangSmith.
    There is also an SDK for Azure which does the same things as RAGAS and LangSmith.

    • @pixegami • months ago

      Oh, thanks for the recommendation. I'll have to take a look into that.

  • @RasNot • 29 days ago • +1

    Great content, thanks for making it!

    • @pixegami • 28 days ago

      Glad you enjoyed it!

  • @elvistolotti45 • 6 days ago • +1

    great tutorial

  • @mrrohitjadhav470 • months ago • +8

    After searching 100s of videos, my journey ends here. 😍 Would you please make a tutorial on building a knowledge graph using Ollama?

    • @pixegami • months ago • +2

      Thanks, glad your journey came to an end :) Thanks for the suggestion - I've added the idea to my list :)

    • @mrrohitjadhav470 • 29 days ago

      @@pixegami Aweeeeeeeesome! I just want to slightly change the knowledge graph based on PDF/TXT (my own data). Sorry for not elaborating, but having too much of my own data makes it difficult to find connections between many sources.

  • @kozark875491 • 8 days ago

    Very high quality video! Thank you!!!
    What are the minimum requirements to download and run Llama 3 locally?

  • @pampaniyavijay007 • 15 days ago • +1

    Superb bro 🤩

    • @pixegami • 11 days ago

      Thank you!

  • @iainhmunro • months ago • +2

    This is pretty good. I was wondering how I could integrate this with my current Python scripts for my AI Calling Agent, so that if someone called the number, they could chat with the PDF.

    • @pixegami • 28 days ago

      I think that should certainly be possible, but it's quite complicated (I haven't done anything like that before myself).
      You'd probably need something to hook up a phone number/service to an app that can transcribe the speech in real time (like what Alexa or Siri does), then have an agent figure out what to do with that interaction, and eventually hook it up to the RAG app.
      After that, you'll need to seriously think about guard-rails for the agent, otherwise it could end up getting your business into trouble. An example of this is when Air Canada's chatbot promised a customer a discount that wasn't available: www.bbc.com/travel/article/20240222-air-canada-chatbot-misinformation-what-travellers-should-know

  • @JohnBoen • 9 days ago

    He he he... tests are easy. I was wondering how to do those.
    Prompt:
    State several facts about the data and construct a question that asks for each fact.
    Create tests that look for the wrong answer...
    Give me 50 of each...
    Give me some examples of boundary conditions...
    Formatting...
    In an hour I will have a fat stack of tests that would normally take a day to create.
    This is awesome :)

  • @mehmetkaya4330 • months ago • +1

    Great tutorial! Could you please also do a tutorial on how to detect when data within the documents (PDF, CSV, etc.) has changed?

    • @pixegami • 28 days ago

      Thanks for the suggestion :) That's a good idea, I think I'll have to plan it...

  • @mingilin1317 • 28 days ago • +1

    Great video! Successfully implemented RAG for the first time, so touching. Subscribed to the channel already!
    In the video, you mentioned handling document updates. Do you have plans to cover this topic in the future? I'm really interested in it!
    Also, do "ticket_to_ride" and "monopoly" share the same database in the example code? What if I don't want them to share? Is there a way to handle that?

    • @pixegami • 10 days ago

      Awesome! Glad to hear about your successful RAG project, well done!
      I've had a lot of folks ask about vector database updates, so it's something I definitely want to cover.
      If you want to store different pieces of data in different databases, I recommend putting another layer of logic on top of the document loading (and querying): have each folder use a different database (named after the folder), then add another LLM layer to interpret the question and map it to the database it should query.
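      A minimal sketch of the per-folder part, assuming Langchain's Chroma wrapper; the folder layout and function name are hypothetical, not from the video:

        from langchain_community.vectorstores import Chroma

        def get_db_for_folder(folder_name: str, embedding_function):
            # One Chroma database per source folder, persisted side by side on disk.
            return Chroma(
                persist_directory=f"chroma/{folder_name}",
                embedding_function=embedding_function,
            )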

  • @sergiovasquez7686 • 18 days ago • +1

    I just subscribed to your channel… very high-quality vids on YouTube.

    • @pixegami • 11 days ago

      Thank you! Welcome.

  • @60pluscrazy • months ago • +1

    Excellent 🎉🎉🎉

    • @pixegami • 28 days ago

      Thank you! Cheers!

  • @maikoke6768 • 9 days ago

    The issue I have with RAG is that when I ask about something that I know doesn't exist in a document, the AI still provides a response, even though I would prefer it not to.

  • @ishadhiwar7636 • 2 days ago • +1

    Thank you for the fantastic tutorial! It was incredibly helpful and well-explained. I was wondering if you have any plans to release a video on fine-tuning this project using techniques like RLHF? It would be great to see your insights on that aspect as well.

    • @pixegami • 16 minutes ago

      Thank you! Glad you enjoyed the video. I've noted the suggestion about fine-tuning. I hadn't considered it yet, but thanks for sharing that idea with me.

  • @ayoubfr8660 • months ago • +4

    Great stuff as usual! Could we have a video about how to turn this RAG app into a proper desktop app with a graphical interface? Cheers mate.

    • @pixegami • months ago • +1

      Good idea, thanks! I'll note it down as a video idea :)

    • @ayoubfr8660 • months ago • +1

      @@pixegami Thank you for the reply and responsiveness! Have a nice day!

    • @J3R3MI6 • 23 days ago • +1

      @@pixegami I subbed for the advanced RAG content.

  • @AiWithAnshul • 26 days ago • +1

    This is an impressive setup! I'm currently using Weaviate as my vector DB along with OpenAI models, and it's working really well for handling PDFs, Docs, PPTs, and even Outlook email files. However, I've been struggling to integrate Excel and CSV files into my knowledge base. For small Excel files the vector approach seems fine, but it's challenging for larger ones. I'd love your input on how to build a system that incorporates Excel files along with the other formats. I've considered using something like PandasGPT for the Excel and CSV files and the traditional RAG approach for the remaining file types (PDFs, Docs, etc.). Perhaps adding an agent as the first layer to determine where to direct the query (to the RAG model or PandasGPT) would be a good idea? What are your thoughts on this?

    • @pixegami • 11 days ago

      Thanks for your comment and for sharing your challenges and ideas. If you are mixing free-form text (like documents) and something more traditionally queryable (like a DB), it does make sense to engineer more modality into your app (like what you suggested).
      I haven't explored that far myself, so I can't share anything useful yet. But I'll be sure to keep it in mind for future videos. Good luck with your project!

  • @JorgeGil-qf6zy • 16 days ago • +1

    Thank you for the video, pixegami. Question: how would you implement follow-up questions based on the last answer?

    • @pixegami • 11 days ago

      You'd probably need a way to store/manage memory and use it as part of the next prompt. I haven't explored this much myself, but it's a topic I'm interested in looking into as well, so thanks for the comment :)

  • @nickmills8476 • 5 days ago

    To update the Chroma data for PDF chunks whose content has changed, store a hash of the chunk's contents in the metadata field. In addition to adding IDs that don't already exist, select records whose metadata hash has changed and update those records using collection.update().
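    A minimal sketch of that idea against the raw chromadb client; the collection name, chunk IDs, and the MD5 helper are assumptions, not code from the video:

      import hashlib
      import chromadb

      client = chromadb.PersistentClient(path="chroma")
      collection = client.get_or_create_collection("docs")

      def upsert_chunk(chunk_id: str, text: str):
          new_hash = hashlib.md5(text.encode("utf-8")).hexdigest()
          existing = collection.get(ids=[chunk_id])
          if not existing["ids"]:
              # Brand-new chunk: add it along with its content hash.
              collection.add(ids=[chunk_id], documents=[text], metadatas=[{"hash": new_hash}])
          elif existing["metadatas"][0].get("hash") != new_hash:
              # Content changed since last run: overwrite the document and hash.
              collection.update(ids=[chunk_id], documents=[text], metadatas=[{"hash": new_hash}])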

  • @danielcomeon • 26 days ago • +1

    Thanks a lot. Great video!!! I want to know how to add new data to the existing database with new unique IDs.

    • @pixegami • 11 days ago

      Thanks! Glad you liked it. If you just want to **add** new data, the chapter on updating the database should already cover this. You just need to add new files to the folder and run the `populate_database` command again. Any pages/docs not already in the database will be added.
      But if you meant updating existing pages/segments in the existing data, then yes, I'll have to make a video/tutorial about that :)
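      Roughly, that add-only flow looks like the sketch below; `db` is assumed to be a Langchain Chroma instance, and chunk IDs follow a "source:page:index" convention like the one used in the video:

        def add_new_chunks(db, chunks_with_ids):
            # Fetch only the IDs already stored, then add the chunks we haven't seen.
            existing_ids = set(db.get(include=[])["ids"])
            new_chunks = [c for c in chunks_with_ids if c.metadata["id"] not in existing_ids]
            if new_chunks:
                db.add_documents(new_chunks, ids=[c.metadata["id"] for c in new_chunks])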

  • @user-xz1jh9qv1k • 13 days ago • +1

    Very good content, thanks for making it. I liked your validation idea, but how about evaluating descriptive answers? Do pytest and a prompt work together for that? Also, did you make the tutorial on how to update the vector database when file content changes?

    • @pixegami • 11 days ago

      Thanks, I appreciate it! I think you can try the same evaluation strategy for descriptive answers too. I've also seen other commenters mention frameworks for evaluating LLM responses, so those might be worth looking into as well.

  • @LaptopiaLTD • 19 days ago • +1

    Thank you for making this concise, clear, and helpful video. Is there a limit on the quantity and size of PDF files? I'm currently using ChatRTX; however, it has limitations due to being in beta. Perhaps a video exploring this question and ChatRTX's limitations would be useful?

    • @pixegami • 11 days ago • +1

      Glad you enjoyed it! I haven't had a look at ChatRTX yet, but thanks for the suggestion.

  • @HimSecOps • 27 days ago • +1

    Amazing, bro! Thank you. Could you show how to connect the prompt part to a UI?

    • @pixegami • 11 days ago • +1

      Thanks for the suggestion :) This is on my list to work on as well, stay tuned!

  • @edgarallik9995 • 25 days ago • +1

    Thank you for the video! Would there be an advantage in an LLM pre-processing the user query before it is embedded and matched against the database of documents? I'm thinking perhaps embedding the more useful parts of the user question to help in the matching process.

    • @pixegami • 11 days ago

      Yes, there's actually quite a bit of research on "query transformation" to improve RAG results (and it's effective, too). E.g. there's a technique that uses an LLM to create a "fake answer" to the question, then uses that fake answer to query (instead of the question).
      Here's a paper on that: boston.lti.cs.cmu.edu/luyug/HyDE/HyDE.pdf
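      A minimal sketch of that HyDE-style transformation, assuming Langchain's Ollama wrapper and an existing `db` vector store; the prompt wording is illustrative:

        from langchain_community.llms import Ollama

        def hyde_search(db, question: str, k: int = 5):
            llm = Ollama(model="mistral")
            # Ask the model to hallucinate a plausible answer, then embed
            # THAT text instead of the raw question.
            fake_answer = llm.invoke(f"Write a short passage that answers: {question}")
            return db.similarity_search_with_score(fake_answer, k=k)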

  • @rob679 • 21 days ago • +1

    On model param size, 7B models are enough. Not related to this video, but I'm using Llama 3 8B with OpenWebUI's RAG and it works, but it sometimes has trouble referring to the correct document while giving a correct answer (it will hallucinate the document name); that's just how its RAG implementation is.

    • @pixegami • 11 days ago

      Interesting, I haven't tried this with the 7B models yet. Thanks for sharing!

  • @chhil • 6 days ago

    Thank you for your content and access to your GitHub repo. I tried modifying the code to read Java files from the folder and keep getting the error "Unsupported mime type: text/x-java-source". I use the GenericLoader to load the Java files. Any pointers on where to look for a solution?

  • @beatsofbinary • 10 days ago • +1

    I love your video. For the problem of updating a chunk, would a timestamp make sense? For example, save the last-modified date of the PDF file to each chunk.
    Then, after reading the documents again, compare whether it's newer than the one in the database -> overwrite the chunks?

    • @pixegami • 9 days ago

      I'm not sure how that would work - if you had a PDF that splits into 100 chunks, and you updated chunk 57, how does your system know that the chunk was updated? If you *knew* it was updated you could store a timestamp, but how do you know the chunk was updated in the first place?
      I think you're quite close - rather than a timestamp, you can consider using a hash of the chunk (e.g. MD5), then use that to figure out whether the chunk has changed or not.

    • @beatsofbinary • 9 days ago

      @@pixegami I thought I could save metadata such as Last Modified directly when importing the data. If the vector database already contains a PDF document that has the same name but is older, it gets overwritten. If it is the same age, the PDF is skipped. I would therefore query the database to see which documents are already there. But then all chunks of the older PDF would have to be deleted, you're right. The idea with the hash is good: read in the PDF, split it, create a hash, save it as metadata, and then compare the hashes of the chunks. But that leads to further problems, doesn't it? If only a few words change within a chunk, but the overlap changes as a result, won't several or all chunks change at the same time?

  • @EduardoJGaido • 13 days ago • +1

    Hello! Thank you for a great video. I ask you or the community about a hard problem I'm trying to solve: I want to make a chatbot using a local LLM with RAG (keep reading please!) BUT I want to use it for my business, so the clients of my physiotherapy clinic can ask it questions and it responds ONLY with the information that it is fed. Otherwise, it says "Oh, I don't know, wait please" so the secretary can answer instead. Just with that, I would be happy. I have a lot of FAQs listed with answers (in a JSON-friendly format). I can't find this answer anywhere. Any information would be appreciated. Cheers from Argentina.

    • @pixegami • 11 days ago

      I see. If it's an FAQ, I'd structure each question or piece of information as a separate file and just use a directory loader to turn it into Langchain documents. Then you can try using the technique in this video to see if it works.
      To get it to respect the answer boundary, you can probably write an explicit prompt or an evaluation step to say "I don't know" if the answer isn't clear from the retrieved results.
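      One way to sketch that "answer only from context" guard; the prompt wording and fallback phrase are illustrative, not from the video:

        from langchain.prompts import ChatPromptTemplate

        PROMPT_TEMPLATE = """
        Answer the question using ONLY the following context. If the answer is
        not in the context, reply exactly: "Oh, I don't know, wait please."

        {context}

        ---

        Question: {question}
        """

        prompt = ChatPromptTemplate.from_template(PROMPT_TEMPLATE)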

  • @edwardtse8631 • months ago • +2

    This is a very good tutorial. How do you solve the problem of editing a data file? By storing the SHA-1 of the file?

    • @pixegami • months ago

      Exactly right - you can use an even simpler hash function like MD5 just to check that the content of each chunk hasn't changed.
      You'll still have to loop through all the chunks to calculate the hashes and compare them. That should be fine for 100s or 1000s of chunks, but it might not scale too well beyond that.
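      That loop is only a few lines - a sketch, assuming Langchain Documents with an "id" in their metadata and a `stored_hashes` map of chunk ID -> last known hash:

        import hashlib

        def find_changed_chunks(chunks, stored_hashes: dict):
            changed = []
            for chunk in chunks:
                digest = hashlib.md5(chunk.page_content.encode("utf-8")).hexdigest()
                if stored_hashes.get(chunk.metadata["id"]) != digest:
                    changed.append(chunk)  # new or modified chunk
            return changed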

  • @nickmills8476 • 5 days ago • +1

    Using a local embedding model (mxbai-embed-large) got me similar results to your monopoly answer.

    • @pixegami • 3 minutes ago

      Thanks for sharing! I hadn't tried that one yet.

  • @NicolaRomano • 14 days ago • +1

    Thanks for the great video! This might be a silly question, but say I want to extract some information from a bunch of PDF files. For example, in your case of manuals, I have manuals for 100 different games and want to know, for each of them, how many players the game is for. Is there a difference between processing all of the documents at once, putting them in a vector store, and then querying it, versus processing one document at a time? I can see the advantage of having a vector store in case you want to ask another question, so you don't have to reprocess all of the documents, but aside from that? Also, can you somehow limit what context the LLM uses? Say I want to ensure the LLM uses file1.pdf but not file2.pdf - how would I go about that?

    • @REINOSO195 • 13 days ago • +1

      Friend, that's thesis material, lol.

    • @pixegami • 11 days ago • +1

      Thanks for watching and for the great questions. You are asking about 3 different use-cases, I think, so it's probably best to tackle them one at a time as separate problems.
      1) If I wanted a very specific piece of information from all my documents, and I needed it to be accurate, then I'd probably write purpose-specific logic to extract it rather than make it a general-purpose thing.
      2) If you still wanted to run general queries across all the knowledge, that could be a separate project (maybe using a knowledge graph instead of just a vector DB).
      3) You could store separate DBs, or use tags/metadata to filter out pages that aren't from the one you want.
      These are all very basic solutions to your problems, but you get the idea - a good solution will depend on your use-case and how robust you need it to be.

    • @NicolaRomano • 8 days ago

      @@pixegami Thanks, I've been looking into metadata filtering and it seems to be a good direction indeed! I'm currently testing it and it seems to do a good job!

  • @derekpunaro2422 • months ago • +2

    Hi Pixe! I was wondering how you would write the get_embedding_function for OpenAI's models?

    • @pixegami • months ago

      My first RAG project actually uses OpenAI embeddings: th-cam.com/video/tcqEUSNCn8I/w-d-xo.html
      Here is the documentation and code examples from Langchain: python.langchain.com/docs/integrations/text_embedding/openai/
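      A minimal sketch, assuming the langchain-openai package is installed and an OPENAI_API_KEY environment variable is set; the function name mirrors the one used in the video:

        from langchain_openai import OpenAIEmbeddings

        def get_embedding_function():
            # text-embedding-3-small is one of OpenAI's low-cost embedding models.
            return OpenAIEmbeddings(model="text-embedding-3-small")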

  • @Queenmaya99 • 28 days ago • +2

    Hello! Do you know if there is a way to filter db.similarity_search_with_score based on metadata? For example, if you had a query and you wanted it to only reference the monopoly PDF to answer it.

    • @pixegami • 28 days ago • +1

      I think it might be possible. I haven't tried it, but it looks like Chroma has an underlying API for filtering (and Langchain should surface it too): docs.trychroma.com/usage-guide#querying-a-collection

        collection.query(
            query_embeddings=[[11.1, 12.1, 13.1], [1.1, 2.3, 3.2], ...],
            n_results=10,
            where={"metadata_field": "is_equal_to_this"},
            where_document={"$contains": "search_string"},
        )
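      Through Langchain's Chroma wrapper, a similar filter can be passed directly - a sketch; the "source" metadata key is the one the PDF loaders typically set:

        results = db.similarity_search_with_score(
            "How much money does a player start with?",
            k=5,
            filter={"source": "data/monopoly.pdf"},
        )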

  • @shaigrustamov5115 • 23 days ago • +1

    Great videos 👍
    Do you think RAG is better than fine-tuning for invoice data extraction?
    If I have 5000 invoices and want to train a model for data extraction, do you know if I need to prepare both OCR output and labels? Do the labels also need to contain bounding boxes? Isn't this very time consuming? Are there models that can be trained without bounding boxes?

    • @pixegami • 11 days ago • +1

      I don't have a ton of experience with fine-tuning LLMs, so I can't compare the approaches or advise on your use-case. But in general, I prefer not to fine-tune, since it is quite a slow/expensive process and it binds your use-case tightly to the model (a RAG approach lets you switch LLMs more easily).
      For the OCR issue, I think you'll probably want to separate that process out as a different problem. You might want a computer-vision process to first extract the data from your documents and standardize it somehow for your RAG workflow later.

  • @gianmarco-lr7wc • months ago • +1

    Thank you! It would be great to see how to deploy this to the cloud, for example AWS!

    • @pixegami • months ago

      That's a great idea!

  • @E.X.P.JP_roblox • 20 days ago • +1

    If my Word documents contain mostly tables instead of text, and the format of the tables is all over the place and may be difficult to work with (with merged cells and sub-tables), would the chunking and embedding steps stay the same? Or would you recommend cleaning up the Word documents to "flatten" the contents and perhaps make it easier for the AI model to understand? However, manually cleaning up the files doesn't sound like a scalable solution.

    • @pixegami • 11 days ago • +2

      Very good question; that's a real challenge I've run into in other projects as well. It's hard to answer: the chunking/prompting strategy really depends on the data, and the best way to know is to just test different strategies.
      Of course, you probably don't want a "manual" solution, but it might make sense to do things manually for a small part of the data just to see what works.
      With tables, I'd probably attempt to pre-process them somehow into a format that can be embedded more easily. For example, I might normalize the headers/columns of the table into every row. E.g. instead of "30", I might change it to "price: 30", because most table formats have the headers at the top, and those are cut off if the table is split into two or more chunks.
      Complex problem! Let me know if you find something that works.
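      A toy sketch of that header-normalization idea: bake the column names into every row before chunking, so a split table row stays self-describing (the helper name is made up):

        def flatten_row(headers, row):
            return ", ".join(f"{h}: {v}" for h, v in zip(headers, row))

        print(flatten_row(["item", "price"], ["top hat", 30]))
        # -> "item: top hat, price: 30"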

  • @MaliciousCode-gw5tq • 6 days ago

    Question: will the RAG method work even if the data is very large? For example, 1k pages per file?

  • @maxi-g • 21 days ago • +1

    Hey, I have a question. I tried to load a fairly large PDF (100 pages) into the database (approx. 400 documents). However, the add_to_chroma function seems to be excruciatingly slow. The output from Ollama shows that the embeddings only get requested once every two seconds or so. There is also no CPU or GPU load on my system when this process is running. Is there any way to improve this? Thanks already!

    • @pixegami • 11 days ago • +1

      This is almost certainly because of the time it takes to embed each page (since you mentioned embeddings get requested once every two seconds). Your Ollama model might not be able to fully leverage your hardware, which is potentially why you don't see your CPU/GPU load rise.
      You could experiment by switching to an online embedding API (like OpenAI or AWS Bedrock) and see if it's faster. Or you could double-check that Ollama is using your GPU correctly (github.com/ollama/ollama/blob/main/docs/gpu.md).

  • @vidfan1967 • months ago • +2

    I want to verify an existing PowerPoint document against a PDF, e.g. with an updated law text. I want to find out whether the statements on each PPT slide are still true.
    Challenge: each slide contains one or more statements, which should be verified against the PDF; for example, you might have 6 or 10 bullet points on one slide. To use RAG, I cannot use them all in one query, as they might be quite diverse and would retrieve knowledge from the PDF that doesn't specifically match any one point but all of them together.
    Also, the context on the slide should be considered together with each statement, e.g. the title of the slide, an intro text above the bullet-point list, or the upper-level information for a statement that sits in a sub-structure of bullet points.
    I guess I would somehow need to split the statements in the PPT into logical chunks, preserving the context. Is there a Python function I could use? Or can this be done with AI (e.g. few-shot) after the slide text has been extracted?
    If this is of wider interest, I would appreciate a video on this 🙂

    • @pixegami • months ago

      Thanks for sharing your use case. Yup, this is definitely something you need to solve during the "chunking" phase of the process.
      For example, there are some experimental "chunking" functions from Langchain you could try: python.langchain.com/docs/modules/data_connection/document_transformers/semantic-chunker/
      You could also "bake" custom information into each chunked document itself. E.g. something like (each variable name is made up):
      chunk.page_content = title_of_doc_str + context_str + actual_page_text_str
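      As a slightly fuller (still hypothetical) sketch of that idea, prepending the slide title and context to each chunk before it is embedded:

        def contextualize(chunk, doc_title: str, slide_context: str):
            # Prefix the chunk text so the embedding "sees" its surrounding context.
            chunk.page_content = f"{doc_title}\n{slide_context}\n{chunk.page_content}"
            return chunk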

  • @DCW09 • 28 days ago • +1

    Cursory glance says: add a hashing function to the chunk metadata. This way the chunk has a unique identifier (MD5, SHA, etc.); if anything changes, the hash will also change. Then it's just simple logic to validate the current chunk's page/index against the existing one's hash. If it's different, overwrite. If it's not, don't waste the cycles.
    In practice, I am not 100% sure that this would be the approach, but at least the theory here should be pretty on point for identifying changes with few compute cycles.

    • @pixegami • 28 days ago • +1

      Yup! I think that's probably the way I'd do it too. If there are too many documents and you need to scale it, then I guess you can hash the entire document first as well, to narrow the search space each time.

    • @MichaelTanOfficialChannel • 22 days ago • +1

      @@pixegami I just want to add that I would apply the hash to the entire page and not the chunk. A page can be edited in a way where the content is shorter than the previous version, thereby causing the number of chunks to be less than what it previously was. And I would also remove all chunks belonging to the said page before adding new chunks, so as not to leave an orphaned chunk from the previous, lengthier page.

  • @uwegenosdude • months ago • +1

    Thank you very much for the really helpful video. I also had a bad experience when not using OpenAI to create embeddings. Can you recommend a free embedding model that produces good results and that I could run locally? I tried to use AnythingLLM, but the RAG results were really bad. I'm not sure if the reason might be the language of my documents; they are all German.

    • @pixegami • months ago

      Thanks, glad you found it helpful! For local embedding models, I haven't tried anything other than nomic-embed-text so far. Have you tried the other embedding models on ollama.com/library? There are a couple there.
      For non-English text, I heard Mistral (based in France) is quite good. I don't know if it's available on Ollama (officially), but it's open source so you should be able to get a copy: docs.mistral.ai/capabilities/embeddings/

    • @uwegenosdude • months ago

      @@pixegami Thanks for your help. I will try this out.

  • @mohsenghafari7652 • months ago • +1

    Hi dear friend. Thank you for your efforts.
    How can I use this tutorial with PDFs in another language (for example, Persian)? What would the approach be?
    I have made many efforts and tested different models, but the results when asking questions about the PDFs are not good or accurate!
    Thank you for the explanation.

    • @mohsenghafari7652 • 28 days ago

      Help please 🙏🙏

    • @pixegami • 27 days ago

      Answered this in a separate comment, but I think it just boils down to finding a model that works well with the language. E.g.: huggingface.co/MaralGPT/Maral-7B-alpha-1

    • @mohsenghafari7652 • 27 days ago

      Thanks dear 🌹

  • @mo3x • months ago • +10

    So it is just an advanced Ctrl+F?

    • @pixegami • months ago • +2

      Yes, that's one way to think about it. Still, incredibly powerful.

  • @natemiles4951 • 20 days ago • +1

    Maybe a dumb question, but I'm curious why the embeddings would be any different if using the same model to generate them locally vs. in the cloud (i.e. Bedrock)?

    • @pixegami • 11 days ago • +1

      The same model should generate the same embeddings, so if your local model is EXACTLY the same as the cloud model, it should work.
      In this video I'm using different models: 1) Titan (via AWS Bedrock) for the embeddings and 2) Mistral for the LLM agent.

  • @nihatdemir2000 • months ago • +2

    Could you make a video of the same project with Llama 3?

    • @pixegami • months ago • +1

      You can do it here! Just change the Ollama model to Llama 3: ollama.com/blog/llama3
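      With the Langchain wrapper used in this project, that's a one-line change - a sketch, assuming you've already run `ollama pull llama3`:

        from langchain_community.llms import Ollama

        model = Ollama(model="llama3")
        print(model.invoke("Why is the sky blue?"))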

  • @TrevorDBEYDAG • 1 day ago • +1

    Thank you for the tutorial. I guess there should be a better way to create chunks than only by character count, because it cuts paragraphs in a disruptive way. Maybe another library could split on paragraphs, or at least at the end of a sentence?

    • @pixegami • 25 minutes ago

      The RecursiveCharacterTextSplitter actually attempts to do what you suggest: python.langchain.com/v0.1/docs/modules/data_connection/document_transformers/recursive_text_splitter/
      "It tries to split on them in order until the chunks are small enough. The default list is ["\n\n", "\n", " ", ""]."

  • @albertozacchini3388 • 19 days ago • +1

    Hi! Thanks so much for the video, it was super useful. However, giving all the context to the LLM is a big problem for my laptop; it maxes out my CPU. Do you know any APIs for free LLMs instead of a local one?

    • @pixegami • 11 days ago • +1

      I haven't used any free LLM APIs myself, but the OpenAI and AWS Bedrock ones are pretty cheap if you use the turbo models. Otherwise, I found this Reddit thread discussing free LLM APIs: www.reddit.com/r/deeplearning/comments/1350qtu/what_are_some_small_llm_models_or_free_llm_apis/

    • @albertozacchini3388 • 10 days ago

      @@pixegami Yeah, I gave up on searching, but I found the Mistral API really nice for embeddings - still cheap and super fast to use. By the way, thank you so much!

  • @williamroales709 • 28 days ago • +1

    Awesome! Thanks! How can I use a GPU to make local inference faster?

    • @pixegami • 28 days ago

      I think it's probably a setting in Ollama. Have a look at: github.com/ollama/ollama/blob/main/docs/gpu.md

  • @MrAtomUniverse • 15 days ago • +1

    How do you display your code that way for the video? I would love to do that for my work.

    • @pixegami • 11 days ago

      I actually use a lot of custom tooling to generate the slides, but you can use this to do something very similar: carbon.now.sh/

  • @xspydazx • 19 days ago • +1

    Question: once a vector store is loaded, how can we output a dataset from the store to be used for fine-tuning? This is the most important part: producing the data, from the documents that will be used to fine-tune the model, since the RAG will have the domain documents etc. embedded into the vector store.
    As you know, retrieval time will be reduced after fine-tuning the data into the model (after deciding whether the data is good enough for the update, of course). Hence, should all historical chat also be uploaded into the RAG at the end of the chat?

    • @pixegami • 11 days ago • +1

      It really depends on what you want your training data to look like. But honestly, I haven't done a lot of work on LLM fine-tuning myself, so I can't really give advice on this question yet.

    • @xspydazx • 8 days ago

      @@pixegami In truth, the beauty of Langchain is its document loaders: they enable you to create datasets from your documents or sites, which just need to be saved to JSON; otherwise your RAG will continue to grow. As we know, the RAG is there to fill the training gap!
      Once we have converted our docs, we can run a fine-tuning session and essentially update the model, so only large or new docs need to stay in the RAG, served via augmented queries through your chains.

  • @JoseLuisCornejoRivas • 12 days ago • +1

    Now, how do you choose the correct document and not mix contexts from different documents? Thinking about a large document database.

    • @pixegami • 11 days ago

      Good question. I haven't looked into this myself, but I suspect you'd need to upgrade how you store/query data, maybe using something like a knowledge graph instead (e.g. neo4j): python.langchain.com/v0.1/docs/use_cases/graph/

  • @roxtonjhon • 22 days ago

    How do you retain context information in multi-turn conversations?

  • @rude_people_die_young • months ago • +3

    Great work 🎉❤

    • @pixegami • months ago

      Thank you!

  • @ai26prasad.p25 • 27 days ago • +1

    Hello sir, can you make a video on how to create a dataset to train an LLM?
    I am getting too much data loss when trying to train the model on my data, so please make a video about how to create data for an LLM and how to train it.

    • @pixegami • 11 days ago

      Thanks for the suggestion. Fine-tuning an LLM isn't on my radar yet, but I'll add it to my list since a couple of people have already asked.

  • @gowthamkrishna6283 • months ago • +1

    Can we hash the page content to only update the modified chunks? Can anyone share a link on how to do the same? Thanks!

    • @pixegami • 27 days ago

      Yup, that's the strategy I'd probably use to detect sections of pages being updated too. I don't have a video on it yet, but it's a great idea I'll add to my list of potential tutorials!

  • @nirmalkumar007 • 29 days ago • +2

    Which part of the code makes the API call to the Ollama server? Kindly help.

    • @pixegami • 28 days ago • +1

      The Langchain Ollama wrapper class (e.g. python.langchain.com/docs/integrations/text_embedding/ollama/ for the embedding) wraps all the code that calls Ollama for you.
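      A sketch of the wrapper in question; each call to embed_query or embed_documents below hits the local Ollama server's REST API under the hood:

        from langchain_community.embeddings import OllamaEmbeddings

        embeddings = OllamaEmbeddings(model="nomic-embed-text")
        vector = embeddings.embed_query("How do I build a hotel in Monopoly?")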

  • @rahulsharmaah • months ago • +3

    Can we get an API for this and use it in our application?

    • @pixegami • 28 days ago • +1

      Yup! That's going to be the plan for my next video (hosting a RAG app in the cloud)

  • @johnharryduavis3414 • 28 days ago • +1

    I'm wondering if you can deploy it on Hugging Face so I can use it for my mobile app?

    • @pixegami • 28 days ago

      I'm not sure, I haven't used Hugging Face much myself. But I know it's definitely possible to deploy it as an API using some of the other cloud platforms (AWS, Azure, etc.).

  • @devmit2071 • months ago • +2

    How do you do this NOT running it locally? I.e. using the AWS cloud for pretty much everything (PDF in a vector database, Langchain, Bedrock, etc.)?

    • @pixegami • 10 days ago

      You'd have to change all the LLM functions to be cloud-based (e.g. AWS Bedrock or OpenAI), wrap the app in an API (like FastAPI) and Docker, and deploy it to the cloud (probably as a Lambda function).
      I'm working on a video about that now, so stay tuned :)

    • @devmit2071 • 10 days ago

      @@pixegami Thanks. I'll drop you an email with some ideas

  • @phizicks • 11 days ago • +1

    MD5 the data for the index; if it matches, no update is needed. Unless I didn't understand the requirement.

    • @pixegami • 11 days ago

      Yup, I think that's probably how you'd go about indexing/updating specific chunks. You might also need a tree structure (e.g. hash the entire document or categorize them first) if you want to work at the scale of 10k+ documents.

  • @nvajay3829 • months ago • +1

    Brother, this works well for PDFs, but it can't handle PDFs with tables or JSON files, even if I use a JSON loader and modify the code. Can you make a video for it?
    Local RAG for large JSON files.

    • @pixegami • 27 days ago • +1

      Ahh yes, any weird formatting inside a PDF will be challenging. I'd probably approach it by seeing if I can first parse it into Markdown or HTML, because data in those formats is a little easier to work with.
      If you have any examples of PDFs you'd like to parse, please feel free to share them here and I'll see if there's enough interest for me to cover it in a future video. Thank you!

  • @P4jMepR • 24 days ago • +1

    Would this solution be suitable for evaluating the best candidates from 100 resumes, for example? How would you approach such a project?

    • @pixegami • 11 days ago • +1

      I can't really say if it's a good solution for your use case. I'm sure some companies use LLMs as an initial filter for things like candidates/resumes. But for such a high-judgement, important decision, it's best to really validate it against human output.

  • @siswanto4045 • 2 days ago • +1

    Wow, this is what I wanted. But can it run on Open WebUI? Like running Ollama locally with a web UI? Thank you.

    • @pixegami • 12 minutes ago

      Glad to hear that! I do plan to do a tutorial later on how to build a web UI for your app, so stay tuned.

    • @siswanto4045 • 5 minutes ago

      @@pixegami Can't wait to watch the video!

  • @Nabeel27 • 12 days ago • +1

    How do you deploy to the cloud so you can share the application?

    • @pixegami • 11 days ago

      I'm actually working on that idea for my next video, so stay tuned!

  • @prateeksaxena7808 • 15 days ago • +1

    Hi, please let us know how to deploy this.

    • @pixegami • 11 days ago • +1

      This will be the topic of my next video, so stay tuned!

  • @Ammarsays • 6 days ago • +1

    I am just a layman, and I want to know whether text splitters count characters, words, or sentences for a given chunk size? And can text splitters identify sentences or paragraphs in text?

    • @pixegami • 1 minute ago

      Yup, the splitter attempts to do that. Here's the documentation: python.langchain.com/v0.1/docs/modules/data_connection/document_transformers/recursive_text_splitter/
      "It is parameterized by a list of characters. It tries to split on them in order until the chunks are small enough. The default list is ["\n\n", "\n", " ", ""]."

  • @ChauNguyen-tt9pz • 10 days ago • +1

    Hi. Thank you for your great videos here; I have learned a lot from them. I'm currently trying to run the code locally, but I'm running into a server problem. My error is "ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it." Could you help me debug this error?

    • @pixegami • 9 days ago • +1

      Hmm, which command/script are you running when you see that error? From what little information I have, it sounds like an issue connecting to the local Ollama server. Have you started the Ollama server successfully on your machine?

    • @ChauNguyen-tt9pz • 8 days ago

      @@pixegami Hi. Thank you for your reply. I hadn't set up the Ollama server before, so that's why it was giving that error.

    • @ChauNguyen-tt9pz • 8 days ago

      @@pixegami However, I'm getting this error when I'm using the Bedrock embedding function: "ValueError: Error raised by inference endpoint: An error occurred (UnrecognizedClientException) when calling the InvokeModel operation: The security token included in the request is invalid." Do I need to set up some kind of local AWS configuration as well?

  • @beatsofbinary • 11 days ago • +1

    How do I add memory? For example, if I want to ask contextual follow-up questions?

    • @pixegami • 11 days ago • +1

      Good question. I haven't explored this myself yet, but many LLM libraries (including Langchain) have a concept of a "memory" module: python.langchain.com/v0.1/docs/modules/memory/
      That'd probably be where I'd start.
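      A minimal sketch using that memory module; the history is simply folded back into the next prompt, since the LLM calls themselves are stateless:

        from langchain.memory import ConversationBufferMemory

        memory = ConversationBufferMemory()
        memory.save_context({"input": "How much money do I start with?"}, {"output": "$1,500."})
        history = memory.load_memory_variables({})["history"]
        prompt = f"{history}\nAnswer the follow-up question based on the conversation so far: ..."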

    • @beatsofbinary • 11 days ago • +1

      @@pixegami Thanks! I've seen another tutorial about that. The guy who created the script collected the chat history in a Python list and added it to the prompt. Basically, the LLM was given the questions and answers and asked to reformulate them into a history, which was then added to the prompt, something like: "Please answer the human's question based on this chat history."

    • @pixegami • 10 days ago

      @@beatsofbinary Ah yup, even if you use a memory module, at the end of the day all LLMs are stateless, so you pretty much always have to add the history back into the prompt :)

  • @prateemnaskar1656 • 18 days ago • +1

    I want to execute the instructions in my PDF directly. How can I do that?

    • @pixegami • 11 days ago

      You'd probably have to structure the app more like an agent, so it can look for information but also take actions and execute on them: python.langchain.com/v0.1/docs/modules/agents/

  • @pm1234 • months ago • +6

    I successfully ran it locally with Ollama embeddings, but it pretends its answer is from the PDF while I'm 100% sure it's from elsewhere. Let me explain: my Monopoly PDF rules are in French, my LLM is in French (Vigostral), and my question is in French (a translation of the test question: How much total money does a player start with in Monopoly?), but the reply states $1500, an amount and a currency that are NOT in the French PDF rule book (150'000Frs, very old rules), and it cites the sources: ['data/1d-monopoly-regle.pdf:5:1', 'data/1d-monopoly-regle.pdf:3:4', ... Asking the same question in French directly to Ollama (run), without the PDF, gives the actual amount (2000€). So it makes me wonder: does it really work reliably (model info vs. doc info)? Why does it cite sources while giving an answer that is not related to those sources, and how do I identify what is not working in this case?

    • @Yakibackk • months ago

      Maybe it's related to the embedding? Which one are you using?

    • @thaslim7869 • months ago

      Does this work offline?

    • @pm1234 • months ago

      @@Yakibackk Ollama offers (only) 3 embeddings; I tested nomic and mxbai.

    • @pm1234 • months ago

      @@thaslim7869 Ollama works locally.

    • @juryel3561 • months ago • +1

      You can explicitly tell it not to use information outside of the provided documents through its instructions; since it's RAG, you can do that.

  • @user-sx3mm5ii5u • 28 days ago • +1

    Does it deal with images inside the PDFs?

    • @pixegami • 28 days ago

      No, I don't think so. You'd probably need some additional logic to parse/describe the images into text first (maybe using a multi-modal model to do that).

    • @user-sx3mm5ii5u • 28 days ago

      @@pixegami Do you have any ideas?
      Some PDFs have text-based content... some PDFs may have scanned medical reports, etc.

  • @bachirafik8040 • months ago • +3

    Hi sir, I got an AWS error.

    • @pixegami • 27 days ago

      What error did you get? Did you set up the AWS CLI and enable the embedding models in your AWS account as well?

  • @gonzalocueva6631 • 14 days ago • +1

    Hardware requirements?

    • @pixegami • 11 days ago

      This will probably depend on the model. For example (ollama.com/library/llama2):
      7B models generally require at least 8 GB of RAM
      13B models generally require at least 16 GB of RAM
      70B models generally require at least 64 GB of RAM

  • @bitcoinstacker • 23 days ago • +1

    Try phi-3-mini-128k

    • @pixegami • 11 days ago

      Thanks for the suggestion, I hadn't tried this one yet.

  • @Yakibackk • months ago • +8

    We want llama3

    • @mohsenghafari7652 • months ago • +3

      We can replace the model with Llama 3.

    • @pixegami • 28 days ago

      Sure! Just update your Ollama server to download and run Llama3: ollama.com/library/llama3

    • @jack_s • 5 days ago

      Can Python connect with Llama 3?

  • @codyross489 • 27 days ago • +2

    Why not show this in an IDE?

    • @pixegami • 10 days ago • +2

      Hmm, I find that with the slides I can move through the content faster (to save you time) and focus on the parts of the code that matter.
      I do use IDE screen-caps when I demo the app, and the project is fully available on GitHub too. But if my viewers prefer IDE recordings to follow along, I'm happy to do more of that :)

  • @Jordan-tr3fn • 29 days ago • +2

    I didn't find this useful.
    To follow your "tutorial", we need to know what you are talking about, but if we know what you are talking about, we don't actually need your help to build this.
    It would have been way better to code along with us, like "Tech With Tim" does; that's way easier to follow than just watching some code screenshots.

    • @pixegami • 28 days ago • +1

      Thanks for the feedback. This tutorial is a little advanced, and it does require understanding of the first chapter (th-cam.com/video/tcqEUSNCn8I/w-d-xo.html) to follow in real time.
      Which was the first concept/code that lost you in this video? I can reflect on how to better structure the project next time.

    • @Jordan-tr3fn • 27 days ago

      @@pixegami I have some experience with Python and Langchain; I already built some projects, but that was a few weeks ago. I wanted to try RAG and local LLMs (Llama 3), and I thought I would code along with you :).
      I tried to follow the code but noticed it was mainly snippets (?), so I checked the GitHub repo and saw a few files without a README; I didn't want to pull the code, and I was a bit confused by the tutorial.
      To be fair, I also should have checked your previous video; I think I did miss that part...
      The quality of your videos is good. I just like the idea of coding along with the video because I have some difficulty structuring my code (which part of the code should go above or below, etc.). I find tutorials like those by "Nicholas Renotte" very helpful, where we build a project together.

  • @TheBratMaria • 26 days ago • +1

    Hello! Amazing video! I was wondering, can we use something like this for get_embeddings_function()?

      embeddings = HuggingFaceEmbeddings(model_name="/yi34b/Yi-34B-Chat-8bits")

    Can the code also be used with quantized models?

    • @pixegami • 11 days ago

      Yup, you should be able to switch the embedding model to any embedding supported by Langchain (incl. Hugging Face).
      I haven't looked into quantized models yet, so I can't say whether it will work or if there's any difference.

  • @user-lq1md3dw9z • 1 day ago • +1

    Your video is REALLY good. For embedding, could you please update your code to make it work with BertTokenizer.from_pretrained('bert-base-uncased') and model = BertModel.from_pretrained('bert-base-uncased')?

    • @pixegami • 29 minutes ago

      Thank you. Have you tried using this tokenizer and model yourself? In theory, you should just be able to replace the embedding lines I used in the video.
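      A hedged sketch of how that might look as a Langchain-compatible embedding class; mean pooling over the last hidden state is one common (assumed) way to turn BERT's token states into a single vector:

        import torch
        from transformers import BertModel, BertTokenizer
        from langchain_core.embeddings import Embeddings

        class BertEmbeddings(Embeddings):
            def __init__(self, name: str = "bert-base-uncased"):
                self.tokenizer = BertTokenizer.from_pretrained(name)
                self.model = BertModel.from_pretrained(name)

            def _embed(self, text: str):
                inputs = self.tokenizer(text, return_tensors="pt", truncation=True)
                with torch.no_grad():
                    hidden = self.model(**inputs).last_hidden_state  # (1, seq_len, 768)
                return hidden.mean(dim=1).squeeze(0).tolist()  # mean-pool to 768 floats

            def embed_documents(self, texts):
                return [self._embed(t) for t in texts]

            def embed_query(self, text):
                return self._embed(text)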