Building End-to-End RAG Pipelines Locally on Kubernetes

  • Published Sep 28, 2024
  • Watch the replay for a deep dive into my KubeCon Paris demo. Tune in if you're interested in learning how to run an end-to-end RAG pipeline on cloud-native infrastructure; a rough sketch of the core loop follows below.
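
As a rough illustration of what such a pipeline does, here is a minimal sketch of the retrieve-then-generate loop at the heart of RAG. The embedding model name, the toy corpus, and the prompt format are assumptions for the example, not the exact stack used in the demo; in the demo, the equivalent pieces would run as services inside the cluster.

```python
# Minimal sketch of the retrieve-then-generate loop a RAG pipeline runs.
# Model name, corpus, and prompt format are illustrative assumptions.
from sentence_transformers import SentenceTransformer
import numpy as np

embedder = SentenceTransformer("BAAI/bge-small-en-v1.5")  # assumed embedding model

corpus = [
    "KEDA scales Kubernetes workloads based on event metrics.",
    "Quantized LLMs trade a little accuracy for much lower VRAM usage.",
    "A RAG pipeline retrieves relevant context before generation.",
]
corpus_vecs = embedder.encode(corpus, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k corpus chunks most similar to the query."""
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = corpus_vecs @ q  # dot product == cosine on normalized vectors
    top = np.argsort(scores)[::-1][:k]
    return [corpus[i] for i in top]

query = "How do I cut GPU memory for my model?"
context = "\n".join(retrieve(query))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
# In the cluster, `prompt` would be sent to the LLM serving endpoint.
print(prompt)
```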

Comments • 4

  • @AIWithShrey • 5 months ago

    Any reason why you chose BAAI and not another embedding model? What are the impacts of mixing and matching the embedding model and the LLM? My current app works just fine with GPT4ALL Embeddings and Gemma 1.1 7B.
    Another note: deploying a quantized LLM significantly reduces VRAM usage; Gemma 7B Q8_0 takes up 12 GB of VRAM for me. Implementing KEDA and using quantization in tandem will be a game-changer.
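
In general, the embedding model and the LLM in a RAG pipeline are decoupled: the embedding model only ranks documents for retrieval, and the LLM only sees the retrieved text, so mixing vendors typically works, as the commenter observes. On the quantization point, here is a rough sketch of loading a Q8_0 GGUF model with llama-cpp-python; the file path and parameter values are assumptions, and KEDA would sit outside this code, scaling the serving deployment on request or queue metrics.

```python
# Rough sketch of serving a Q8_0-quantized model with llama-cpp-python,
# in the spirit of the commenter's setup. The GGUF path and parameter
# values are assumptions; substitute your own quantized weights.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/gemma-7b-it.Q8_0.gguf",  # hypothetical local path
    n_gpu_layers=-1,  # offload all layers to the GPU
    n_ctx=4096,       # context window
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize RAG in one sentence."}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])
```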

  • @venkatathota633 • 5 months ago

    Could you please provide a git repo for the above code?