Developing and Serving RAG-Based LLM Applications in Production

  • Published Nov 30, 2024

Comments • 14

  • @junaidiqbal4104 · 1 year ago · +5

    🎯 Key Takeaways for quick navigation:
    00:05 🚀 Initial Motivation and Project Start
    - Started building LLM applications to gain firsthand experience and improve user experience.
    - Developed a RAG application, focusing on making it easier for users to work with their products.
    - Emphasized the importance of the underlying documents and user questions in building such applications.
    01:31 🌐 Community Engagement and Insights
    - Encouraged sharing insights and experiences on building RAG-based applications.
    - Acknowledged the community's early stage and the value of diverse perspectives.
    - Welcomed external input to enrich the collective understanding of RAG applications.
    03:07 🧩 Experimentation with Data Chunking
    - Explored different strategies for efficient data chunking, moving beyond random chunking.
    - Utilized HTML document sections for precise references and better understanding of content.
    - Aimed for a generalizable template, potentially open-sourcing a solution for various HTML documents.
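    The section-based chunking idea above can be sketched with Python's standard-library HTML parser: instead of splitting text at arbitrary offsets, emit one chunk per heading-delimited section and keep its anchor so answers can cite a precise reference. This is a minimal illustration, not the talk's actual implementation; the class and tag choices are assumptions.

    ```python
    from html.parser import HTMLParser

    # Illustrative section-aware chunker: each h1/h2/h3 starts a new chunk,
    # and the heading's id attribute is kept as a citable anchor.
    class SectionChunker(HTMLParser):
        def __init__(self):
            super().__init__()
            self.chunks = []            # list of (anchor, text) pairs
            self.current_anchor = None
            self.buffer = []

        def handle_starttag(self, tag, attrs):
            if tag in ("h1", "h2", "h3"):   # a new section begins
                self._flush()
                self.current_anchor = dict(attrs).get("id")

        def handle_data(self, data):
            if data.strip():
                self.buffer.append(data.strip())

        def _flush(self):
            if self.buffer:
                self.chunks.append((self.current_anchor, " ".join(self.buffer)))
                self.buffer = []

        def close(self):
            super().close()
            self._flush()               # emit the final section

    chunker = SectionChunker()
    chunker.feed('<h2 id="setup">Setup</h2><p>Install ray.</p>'
                 '<h2 id="train">Train</h2><p>Run the trainer.</p>')
    chunker.close()
    # chunker.chunks:
    # [("setup", "Setup Install ray."), ("train", "Train Run the trainer.")]
    ```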
    05:14 🗃️ Vector Database and Technology Choices
    - Chose Postgres as the vector database, emphasizing familiarity and compatibility.
    - Highlighted the growing number of specialized vector databases for LLM applications.
    - Advised selecting a database based on team familiarity, while exploring new options for specific features.
    06:10 🔄 Retrieval Workflow and Database Query
    - Described the retrieval process, including embedding queries and calculating distances.
    - Discussed pros and cons of building the vector DB on Postgres versus using dedicated solutions.
    - Addressed potential limitations based on document scale and the flexibility of different databases.
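    The retrieval step described above (embed the query, rank chunks by distance) can be sketched in pure Python with a toy in-memory index; with Postgres plus pgvector the same ranking is a single SQL query using the `<=>` cosine-distance operator. The `retrieve` helper and the toy vectors are illustrative assumptions, not the talk's code.

    ```python
    import math

    # Cosine distance: 1 - cosine similarity between two vectors.
    def cosine_distance(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return 1.0 - dot / (na * nb)

    # Rank stored chunks by distance to the query embedding, return top k.
    def retrieve(query_vec, index, k=2):
        ranked = sorted(index, key=lambda item: cosine_distance(query_vec, item[1]))
        return [text for text, _ in ranked[:k]]

    # Toy index of (chunk_text, embedding) pairs; a real system would store
    # these in a vector column and query them in SQL.
    index = [("chunk about training", [1.0, 0.0]),
             ("chunk about serving",  [0.0, 1.0]),
             ("chunk about tuning",   [0.7, 0.7])]
    print(retrieve([0.9, 0.1], index, k=2))
    # ['chunk about training', 'chunk about tuning']
    ```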
    08:20 📏 Considerations for Context Size and Token Limits
    - Acknowledged token limits in LLM context windows and model-specific variations.
    - Encouraged experimenting with different chunk sizes, possibly using multiple embeddings for longer chunks.
    - Highlighted the importance of adapting to the LLM's limitations and exploring diverse experimental setups.
    09:29 🔍 Evaluation Metrics and Component-wise Assessment
    - Introduced the two major components for evaluation: the retrieval workflow and LLM response quality.
    - Explained the evaluation process, including isolating each component for focused assessment.
    - Shared insights into the challenges and considerations of scoring LLM responses.
    11:32 📊 Evaluator Selection and Quality Assessment
    - Used GPT-4 as an evaluator based on empirical comparison and understanding of the application.
    - Discussed the limitations of available LLMs and potential biases in self-evaluation.
    - Advocated for iterative improvement and potential collaboration with external LLM development communities.
    15:13 📈 Iterative Evaluation and System Trust Building
    - Illustrated the iterative evaluation process, starting with trusting an evaluator.
    - Demonstrated the evaluation flow, using different configurations and trusting the chosen LLM's outputs.
    - Emphasized the importance of building trust in each component before assessing the overall system.
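    The component-wise evaluation described above can be sketched as two separate scores: a retrieval hit rate (did the gold chunk come back?) and an averaged judge score for answer quality (the talk uses GPT-4 as the judge; here a stand-in lambda). Field names and the 1-5 scale are assumptions for illustration.

    ```python
    # Score retrieval in isolation: fraction of examples whose gold chunk
    # appears in the retrieved set.
    def retrieval_score(examples):
        hits = sum(1 for ex in examples if ex["gold_chunk"] in ex["retrieved"])
        return hits / len(examples)

    # Score generation in isolation: average evaluator score over answers.
    def quality_score(examples, judge):
        return sum(judge(ex["question"], ex["answer"]) for ex in examples) / len(examples)

    examples = [
        {"question": "How do I scale?", "answer": "Use Ray.",
         "gold_chunk": "scaling", "retrieved": ["scaling", "intro"]},
        {"question": "How do I tune?", "answer": "Use Tune.",
         "gold_chunk": "tuning", "retrieved": ["intro", "serving"]},
    ]
    fake_judge = lambda q, a: 4          # stand-in for a GPT-4 judge call
    print(retrieval_score(examples))             # 0.5
    print(quality_score(examples, fake_judge))   # 4.0
    ```

    Keeping the two scores separate is what lets you build trust in each component before trusting the end-to-end system.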
    17:04 ❄️ Cold Start Strategy and Bootstrapping
    - Presented a cold start strategy using chunked data to generate initial questions.
    - Addressed noise reduction by refining generated questions and encouraging creativity.
    - Described the bootstrapping cycle from clean slate to using generated data for further annotations.
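    The cold-start bootstrap above can be sketched as: with no real user questions yet, ask an LLM to write one question per chunk and keep the (question, source chunk) pairs as a first evaluation set. `generate()` is a stand-in for a real LLM call, and the trailing-"?" check is a crude illustrative noise filter.

    ```python
    # Bootstrap an initial eval set from chunked data alone.
    def bootstrap_eval_set(chunks, generate):
        dataset = []
        for chunk in chunks:
            question = generate(f"Write one question answerable from: {chunk}")
            if question.strip().endswith("?"):   # drop malformed generations
                dataset.append({"question": question, "source_chunk": chunk})
        return dataset

    # Stand-in for an LLM; a real call would go to a chat-completion API.
    fake_llm = lambda prompt: "What does this section cover?"
    data = bootstrap_eval_set(["chunk A", "chunk B"], fake_llm)
    # data has one (question, source_chunk) pair per chunk
    ```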
    18:38 🔄 Continuous Learning and Evaluation Scaling
    - Responded to questions about the number of examples for cold start and overall evaluation.
    - Advocated for a balance of quantity and diversity in examples for comprehensive evaluations.
    - Stressed the importance of continuous learning, adaptation, and leveraging automated pipelines for scaling evaluations.
    19:49 📈 Chunk Size Impact on Retrieval and Quality
    - Retrieval score increases with chunk size but starts tapering off.
    - Quality continues to improve even as chunk sizes increase.
    - Code snippets benefit from longer context or special chunking logic.
    21:30 🧩 Number of Chunks and Context Size
    - Increasing the number of chunks improves retrieval and quality scores.
    - Larger context windows for LLMs show a positive trend.
    - Experimentation with techniques like RoPE for extending context.
    22:30 🛠️ Fixing Hyperparameters During Tuning
    - Fixing hyperparameters sequentially: context size, chunk size, embedding models.
    - Experimentation with spread and fixing parameters once optimized.
    - Illustrates a pragmatic approach to hyperparameter tuning.
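    The sequential tuning described above can be sketched as: sweep one knob at a time, fix the best value, then move to the next, instead of searching the full grid. `evaluate()` stands in for running the whole pipeline end to end; parameter names are illustrative.

    ```python
    # Greedy, one-parameter-at-a-time hyperparameter tuning.
    def tune_sequentially(search_space, evaluate, defaults):
        config = dict(defaults)
        for param, values in search_space.items():
            best = max(values, key=lambda v: evaluate({**config, param: v}))
            config[param] = best        # fix the winner before the next sweep
        return config

    # Toy objective with a known optimum at chunk_size=300, num_chunks=5.
    toy_eval = lambda c: -abs(c["chunk_size"] - 300) - abs(c["num_chunks"] - 5)
    best = tune_sequentially(
        {"chunk_size": [100, 300, 500], "num_chunks": [1, 5, 9]},
        toy_eval,
        {"chunk_size": 100, "num_chunks": 1},
    )
    print(best)  # {'chunk_size': 300, 'num_chunks': 5}
    ```

    The trade-off versus a full grid search is speed against the risk of missing interactions between parameters, which is why the talk frames it as a pragmatic choice.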
    23:12 🏆 Model Selection and Benchmarking
    - The GTE-base embedding model outperformed larger embedding models on their use case.
    - Emphasizes the importance of evaluating models based on specific use cases.
    - Benchmarked against OpenAI's text embedding model and chose a smaller, performant one.
    23:56 💰 Cost Analysis and Hybrid LLM Routing
    - Cost analysis comparing different LLMs.
    - Introduction of a hybrid LLM routing approach for cost-effectiveness.
    - Consideration of performance, cost, and hybrid routing for optimal results.
    25:10 🤖 Classifier vs. Language Model for Routing
    - A classifier is used for routing decisions due to speed considerations.
    - Mention of training the classifier on a labeled dataset for routing.
    - Potential transition to LLM-based routing as LLM inference speeds improve.
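    The routing idea above can be sketched as: a small, fast classifier decides per query whether a cheap open-source model will suffice, and only escalates hard queries to GPT-4. The keyword rule below is a stand-in for the trained classifier, and the model names are labels, not API calls.

    ```python
    # Route each query to a cheap or expensive model based on a classifier.
    def route(query, classify):
        label = classify(query)
        return "gpt-4" if label == "hard" else "oss-llm"

    # Stand-in classifier: flag code-heavy or multi-part queries as hard.
    # A real router would be a small supervised model trained on labels.
    def keyword_classifier(query):
        return "hard" if ("code" in query or " and " in query) else "easy"

    print(route("show me code to deploy", keyword_classifier))  # gpt-4
    print(route("what is ray?", keyword_classifier))            # oss-llm
    ```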
    27:17 🔄 Future Developments and System Integration
    - Integration of components into larger systems, citing Anyscale's doctor application.
    - Anticipation of more developments and applications in the future.
    - Acknowledgment of the importance of iteration in building robust systems.
    Made with HARPA AI

  • @jzziesing · 1 year ago · +4

    I would love to see an hour long presentation on this!

  • @TymonVideos · 1 year ago · +2

    Really enjoyed this talk - found a lot of value in it. Both speakers are clearly so knowledgeable, and I love the extra little details the chap in the blue hoodie gave throughout. Would love to connect & share!

    • @TymonVideos · 1 year ago

      ...just realised the "chap in blue" is a co-founder! No disrespect meant :) awesome

  • @ndamulelosbg8887 · 4 months ago

    Great presentation. Just one question: What is relevance_score in this case? Is it an aggregation of grounding metrics for all reference examples?

  • @rohvir2615 · 9 months ago · +1

    goated video no cap

    • @mumcarpet109 · 8 months ago · +1

      on god, we making out the hood with this one 💯

  • @noosfera_It · 1 year ago

    amazing work! thank you

  • @JavierTorres-st7gt · 5 months ago

    How do you protect a company's information with this technology?

  • @victorhenriquecollasanta4740 · 10 months ago

    Top gs

  • @noosfera_It · 1 year ago · +7

    when accents swap.

    • @charlesthompson8938 · 9 months ago

      😂😂 Awesome content though.

  • @tunglee4349 · 5 months ago

    great content! thanks a lot!