RAG in Production - LangChain & FastAPI

  • Published 8 Jul 2024
  • In this video, I'll demonstrate the compelling reasons for integrating LangChain with FastAPI to effectively bring your application into production.
    Code: github.com/Coding-Crashkurse/...
    Timestamps:
    0:00 Biggest mistakes
    1:40 API Code
    7:10 Digest for row-wise update
    8:10 Asynchronous vs. Synchronous
    10:30 Making requests to the endpoints

Comments • 62

  • @Challseus
    @Challseus 4 months ago +2

    Thankful for channels like this that go above and beyond the standard tutorials 💪🏾

  • @saurabhjain507
    @saurabhjain507 4 months ago +1

    Another helpful video! Please create more videos on LangChain in production.

    • @codingcrashcourses8533
      @codingcrashcourses8533  4 months ago

      Next Monday I will release one about monitoring with Langfuse.

  • @Canna_Science_and_Technology
    @Canna_Science_and_Technology 4 months ago

    When processing a file for RAG, I save its name, metadata, and a unique ID in a structured database. This unique ID is also assigned to each chunk in the vector database. If a file needs updating or deleting, the unique ID in the database is used to modify or remove the corresponding entries in the vector database.
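    The scheme described above can be sketched with in-memory stand-ins for the two stores; all names here are hypothetical, not taken from the video's code:

```python
import uuid

# In-memory stand-ins for the commenter's two stores: a structured DB with
# one row per file, and a vector DB whose chunk entries carry the file's ID.
files_db = {}    # file_id -> {"name": ...}
vector_db = []   # list of {"chunk": ..., "file_id": ...}

def ingest(name, chunks):
    """Register a file and tag every chunk with its unique ID."""
    file_id = str(uuid.uuid4())
    files_db[file_id] = {"name": name}
    for chunk in chunks:
        vector_db.append({"chunk": chunk, "file_id": file_id})
    return file_id

def delete_file(file_id):
    """The shared ID lets us remove every chunk that came from this file."""
    files_db.pop(file_id, None)
    vector_db[:] = [e for e in vector_db if e["file_id"] != file_id]
```

    With a real vector store, the same effect is usually achieved by putting the ID in each chunk's metadata and deleting by metadata filter.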

    • @codingcrashcourses8533
      @codingcrashcourses8533  4 months ago

      Yes, very robust solution :)

    • @RobertoFabrizi
      @RobertoFabrizi 4 months ago

      Just to see if I understood you right, let's assume you have a file (product catalog, functional software specification, your pick) that is a doc with 100 pages. You use a document loader to load it, then split it with a recursive character text splitter with a chunk size of 1000 and an overlap of 100, then embed those chunks and store them in a vector DB, saving thousands of chunks all created from the ingested file. Then a single line near the start of that file changes, but that has repercussions for all later chunks: even though they are technically the same data, they are partitioned differently than before (assuming the change before them caused the chunking process to create different chunks, e.g. because the modified row is longer than before). How do you efficiently update your vector DB in this scenario? Thank you!

    • @codingcrashcourses8533
      @codingcrashcourses8533  4 months ago +1

      @@RobertoFabrizi You won't just read a whole catalog into memory at once. You should keep each page separate as raw data. Then you split each page into smaller chunks. I would even argue against a fixed chunk size, but this is something I will cover in my next (small) video.
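      The page-wise idea in this reply can be sketched as follows; the naive fixed-size splitter stands in for a real one (e.g. LangChain's recursive character splitter), and the function names are made up:

```python
def split_page(text, chunk_size=1000, overlap=100):
    # Naive fixed-size splitter with overlap, standing in for a real splitter.
    chunks, start = [], 0
    step = chunk_size - overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks

def ingest_document(pages):
    # Chunk per page: an edit on one page only invalidates that page's
    # chunks, instead of shifting every chunk boundary in the whole document.
    return {page_no: split_page(page) for page_no, page in enumerate(pages)}
```

      Because chunk boundaries never cross a page, the cascade the question describes stops at the edited page.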

  • @say.xy_
    @say.xy_ 4 months ago

    Best best best!!!

  • @picklenickil
    @picklenickil 2 months ago

    Just came to comment that maintaining a backend for this will be hard!

  • @pvrajanrk
    @pvrajanrk 3 months ago

    Great video.
    Can you add your thoughts on state management for maintaining the chat window across different chat sessions? This is another area I see as a gap in LangChain production.

    • @codingcrashcourses8533
      @codingcrashcourses8533  3 months ago

      I did another video about this and cover it in my Udemy course. The answer for me is Redis, where you set key-value pairs: the key is the conversation ID and the value is the stringified conversation.
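      That key-value approach can be sketched like this. `InMemoryStore` is a dict-backed stand-in exposing the same `get`/`set` shape a `redis.Redis` client does; the `chat:` key prefix is an assumption:

```python
import json

class InMemoryStore:
    """Dict-backed stand-in; swap in a redis.Redis client in production."""
    def __init__(self):
        self._data = {}
    def set(self, key, value):
        self._data[key] = value
    def get(self, key):
        return self._data.get(key)

def save_conversation(store, conversation_id, messages):
    # Key = conversation ID, value = the stringified conversation.
    store.set(f"chat:{conversation_id}", json.dumps(messages))

def load_conversation(store, conversation_id):
    raw = store.get(f"chat:{conversation_id}")
    return json.loads(raw) if raw else []
```

      The API itself stays stateless: each request carries a conversation ID, and the history lives only in the store.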

  • @daniel_avila
    @daniel_avila 3 months ago

    Hi, thanks for this! I have a question about the digest specifically.
    I understand it would be a great way to compare page_content for changes, but I'm not sure where to do this programmatically, or where to inspect whether it is happening already. As far as I know it is not, and more on this would be helpful to someone new to pgvector.
    Following how documents are added, it seems embeddings are created regardless.
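    The comparison the commenter is asking about can happen before any embedding call; a minimal, store-agnostic sketch (function names are hypothetical):

```python
import hashlib

def digest(page_content):
    # Deterministic fingerprint of a chunk's text.
    return hashlib.sha256(page_content.encode("utf-8")).hexdigest()

def needs_embedding(page_content, stored_digests):
    # Embed only chunks whose content has not been stored before.
    return digest(page_content) not in stored_digests
```

    The set of stored digests would come from wherever chunk metadata is persisted, e.g. a column alongside the pgvector embeddings.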

    • @codingcrashcourses8533
      @codingcrashcourses8533  3 months ago

      There is the indexing API to do this. Or do you mean visually, like a git diff?

    • @daniel_avila
      @daniel_avila 3 months ago

      @@codingcrashcourses8533 I was unaware this would involve the indexing API, but that makes sense. However, there's no official async pgvector implementation for the indexing manager: langchain-ai/langchain, issue #14836

  • @swiftmindai
    @swiftmindai 4 months ago

    As always, excellent content. I learned from your previous content about the LangChain indexing API (SQLRecordManager). Now I've learned about using a hashing function (generate_digest). I believe both serve the same purpose. I'm wondering which one would be better, since I don't see a way to measure the performance of either methodology. I'd appreciate your suggestion.

    • @codingcrashcourses8533
      @codingcrashcourses8533  4 months ago +1

      Thank you! I think it's just important to understand WHY LangChain introduces something like that and to learn about the limitations. I found the indexing API hard to use when there is a large number of documents.
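      Conceptually, the indexing API keeps a record of (source, content digest) keys and diffs each ingest run against them: unchanged chunks are skipped, new ones added, stale ones deleted. A loose, library-free sketch of that diff (not LangChain's actual implementation):

```python
import hashlib

def record_key(source, text):
    # (source id, digest of content) uniquely identifies a chunk version.
    return (source, hashlib.sha256(text.encode("utf-8")).hexdigest())

def plan_index_update(existing_keys, incoming):
    """existing_keys: set of record keys already indexed.
    incoming: list of (source, text) pairs from the current run.
    Unchanged chunks appear in neither returned set."""
    incoming_keys = {record_key(src, text) for src, text in incoming}
    to_add = incoming_keys - existing_keys
    to_delete = existing_keys - incoming_keys  # stale records to clean up
    return to_add, to_delete
```

      This is the same idea whether the record manager lives in SQL (as with SQLRecordManager) or anywhere else that can hold a set of keys.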

    • @swiftmindai
      @swiftmindai 4 months ago

      It took me literally a few days to understand and implement the indexing API concept. I even had to switch to PGVector from the vector store provider I was using earlier, since the indexing API was only applicable to SQL-based vector stores. But now I love PGVector more than any other. Thanks a lot for your production implementation video, as I literally use it as the basis of my latest project.

  • @DePhpBug
    @DePhpBug 1 month ago

    Still new to all the concepts here. I saw in the video that you put an API on top of the model's API, i.e. an abstraction layer on top of the model. Is this correct?
    Am I correct to say my model needs to sit on, let's say, server A, and then I need to create the API on server B to connect to A?

    • @codingcrashcourses8533
      @codingcrashcourses8533  1 month ago +1

      Exactly. Adding one layer is normally crucial. Adding more can, but does not have to, make sense for your use case.

    • @DePhpBug
      @DePhpBug 1 month ago

      @@codingcrashcourses8533 thanks

  • @alchemication
    @alchemication 4 months ago

    Very nice. Did you consider LangServe before trying an in-house solution? Just curious.

    • @codingcrashcourses8533
      @codingcrashcourses8533  4 months ago

      LangServe is more about prototyping, in my opinion :)

    • @alchemication
      @alchemication 4 months ago

      Interesting take on it. I think they promote it as a prod-env API, but as usual, without actually trying it for real you won't know 😅 Best!

    • @codingcrashcourses8533
      @codingcrashcourses8533  4 months ago

      @@alchemication Well, I am quite good with FastAPI and have used it for a very long time, so in general I would prefer not to add an abstraction layer on top of it. My first glance at it was like "OK, that's quick, but robust code is something different".

  • @omaralhory8065
    @omaralhory8065 4 months ago

    Hi,
    I am following your codebase, and I really like it.
    I am still unsure why we need to update the data via an API if we can have an ETEL (Extract, Transform, Embed, Load) data pipeline that runs on a schedule when new data comes in.
    Why do we give such access to the client, and why is it an API that gives access to deleting records?
    What would you do differently here? Would you develop a CMS to maintain the relationship between the client and the DB?

    • @codingcrashcourses8533
      @codingcrashcourses8533  4 months ago +1

      You could also do it that way, but in this repo I don't have a pipeline or anything. There is more than one way to do it :-). I currently have no good solution for updating data without the additional API layer.

    • @omaralhory8065
      @omaralhory8065 4 months ago

      @@codingcrashcourses8533 Thank you for being responsive! Your channel is a gem, btw.
      Usually RAG data sources aren't predictable; a data lake (Delta Lake by Databricks) can be quite beneficial here. You can use PySpark for the pipeline, and it works great when connected to Airflow, for example, for scheduling.

  • @yazanrisheh5127
    @yazanrisheh5127 4 months ago

    Can you show us how to implement memory with LCEL and, if possible, caching responses? Thanks

    • @codingcrashcourses8533
      @codingcrashcourses8533  4 months ago

      The memory classes from LangChain are not a good fit for production; they are just for prototyping. In real-world apps you probably want to handle all of that in Redis.

  • @alexandershevchenko4167
    @alexandershevchenko4167 4 months ago

    Thank you for the videos! Can you please make a video about tools that can be used for both performance measurement and accuracy tracking? Basically, how to build a test environment for a bot before releasing it to production.

    • @codingcrashcourses8533
      @codingcrashcourses8533  4 months ago +1

      RAG performance? Performance of the service with a load test? What would interest you?

    • @alexandershevchenko4167
      @alexandershevchenko4167 4 months ago

      @@codingcrashcourses8533 Performance of the service with a load test would be super cool!

    • @alexandershevchenko4167
      @alexandershevchenko4167 4 months ago

      @@codingcrashcourses8533 RAG performance will be cool!

    • @alexandershevchenko4167
      @alexandershevchenko4167 4 months ago

      @@codingcrashcourses8533 RAG performance will be great!

    • @alexandershevchenko4167
      @alexandershevchenko4167 4 months ago

      @@codingcrashcourses8533 Attempting to respond to your question for the 101st time: please make a video about RAG performance (please TH-cam don't ban my reply). I am developing a bot that can answer questions based on the transcript of a video lecture and other course materials, to speed up the learning process. If I am not mistaken, the first one will be more relevant?
      How can I deduce that my bot is ready for production? Thank you :)

  • @zendr0
    @zendr0 4 months ago

    Have you thought about a caching implementation in RAG-based systems? Curious.

    • @codingcrashcourses8533
      @codingcrashcourses8533  4 months ago

      Yes, but currently it is one of the less prioritized topics: when you use the whole conversation history, the cache cannot be hit very often. Have you worked with caching before?

    • @zendr0
      @zendr0 4 months ago

      @@codingcrashcourses8533 I have used an in-memory cache. Could we do something like this: use the cache to store the embeddings, then compute the cosine similarity between the new input query's embedding and the ones in the cache. If the score is above a threshold, it is fairly obvious that the query has been asked previously, so we just use the cache to answer it. What do you think?
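      The commenter's semantic-cache idea, sketched in plain Python; the 0.95 threshold and the list-based cache are illustrative assumptions:

```python
import math

def cosine(a, b):
    # Cosine similarity of two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def cached_answer(query_embedding, cache, threshold=0.95):
    """cache: list of (embedding, answer) pairs. Returns a stored answer
    if some earlier query is similar enough, else None (i.e. call the LLM)."""
    best_answer, best_score = None, threshold
    for embedding, answer in cache:
        score = cosine(query_embedding, embedding)
        if score >= best_score:
            best_answer, best_score = answer, score
    return best_answer
```

      In production, a vector store would replace the linear scan, but the hit/miss logic stays the same.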

  • @mcdaddy42069
    @mcdaddy42069 4 months ago

    Why do you have to put your vector store in a Docker container?

    • @codingcrashcourses8533
      @codingcrashcourses8533  4 months ago

      Containers are just the way to go. You don't have to, but it makes everything so much easier.

  • @sskohli79
    @sskohli79 4 months ago

    Great video, thanks! Can you please also add a requirements.txt to your repo?

    • @codingcrashcourses8533
      @codingcrashcourses8533  4 months ago +1

      I can add that today, yes :)

    • @sskohli79
      @sskohli79 4 months ago

      Thanks!

    • @sskohli79
      @sskohli79 4 months ago

      @@codingcrashcourses8533 not there 😞

    • @omaralhory8065
      @omaralhory8065 4 months ago

      Can you add it please? I checked and it's not there.

    • @codingcrashcourses8533
      @codingcrashcourses8533  4 months ago +1

      @@omaralhory8065 Really sorry, I forgot about it.

  • @Canna_Science_and_Technology
    @Canna_Science_and_Technology 4 months ago

    Will Gemini 1.5 and beyond kill RAG?

    • @codingcrashcourses8533
      @codingcrashcourses8533  4 months ago

      I highly doubt that with Gemini 1.5, but beyond, hopefully. Currently, answers are still bad when your context is larger than 20 documents or so.

    • @xiscosmite6844
      @xiscosmite6844 4 months ago

      @@codingcrashcourses8533 Curious why you think answers are bad beyond that size and how Gemini could solve that in the future. Thanks for the great video!

    • @codingcrashcourses8533
      @codingcrashcourses8533  4 months ago

      @@xiscosmite6844 I don't trust Gemini after I tried it myself :)

  • @dswithanand
    @dswithanand 2 months ago

    How do I integrate LangChain chat memory history with FastAPI?

    • @codingcrashcourses8533
      @codingcrashcourses8533  2 months ago

      You don't. You normally want your API to be stateless.

    • @dswithanand
      @dswithanand 2 months ago

      @@codingcrashcourses8533 I understand that. I am working on a SQL bot and using FastAPI along with it, but the bot is not able to retrieve the context memory. Can you help with that? LangChain has a ChatMemoryHistory library which can be used for this.

  • @no-code-po-polsku
    @no-code-po-polsku 4 months ago

    Blur your OpenAI API key from .env