3 Effective steps to Reduce GPT-4 API Costs - FrugalGPT

  • Published Jan 21, 2025

Comments • 34

  • @SamuelJunghenn · 1 year ago · +11

    Their logic missed one critical point when calculating the cost: if you are serving 15,000 people a month with customer service, your wage bill is much, much higher than $21,000, even if you hire people in developing countries. Carry on, your content is awesome 😎

    • @1littlecoder · 1 year ago · +3

      Thanks for sharing this!

    • @hrutuselar8839 · 11 months ago

      I wanna be this guy, reading between the lines

  • @miinyoo · 11 months ago

    This is a great model of how to separate out expensive queries from common ones. Like a local FAQ. Really cool idea.
    I can see a real use case for this when SORA hits the actual market, because when it does, it's going to be very expensive. Get the heavy lifting done in bursts as short as possible, and then use a lighter model based on SORA locally to extrapolate from what SORA responded with and fill in the gaps. For example, you only need one response for any given scene, say a master shot, maybe a couple: wide angle and complete. Then, outside of SORA, take those responses and knowledge of the "protocols" and put that data through a lightweight open-source model with a cache and some kind of convolution to generate the close-ups, cutaways, alternate takes, etc. SORA isn't production-ready yet by any means, but it will get there eventually, up to and including full dialog, personality archetypes, everything in the box.

  • @samgomarezsmith3887 · 1 year ago · +2

    Very good method for using GPT-4 in combination with cheaper models

  • @tarun4705 · 1 year ago · +3

    But I think that, per OpenAI's policy, we cannot use its responses to fine-tune another model, so I am not sure we can follow option d) model fine-tuning for commercial usage. Also, in option 5, how is the score calculated to check whether the answer is correct? Since we can't rely on open-source LLMs to calculate the score, we might need GPT-3 or GPT-4 to do it, right? In that case, instead of making an API call to GPT-4 to calculate the score, why not just send it the query itself?

  • @anuragmishra-yu2yx · 1 year ago · +1

    I'm having a hard time understanding how they implement the "scoring function" that judges the quality of a model's generated response, and how they decide the threshold below which a model's response is discarded. Could anyone help me understand that?
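A minimal Python sketch of the model-cascade idea asked about here. Everything in it is hypothetical: the `Tier` class, the toy models, the relative cost units and the `toy_score` heuristic are stand-ins, not FrugalGPT's actual code. Tiers answer in order of increasing cost; a scoring function rates each (query, answer) pair, and the answer is accepted once its score reaches that tier's threshold, otherwise the query escalates. In practice the scorer would be a small trained model (it has to be far cheaper than GPT-4, or the cascade saves nothing), and the thresholds would typically be tuned on labeled examples; the last tier uses a threshold of 0.0 so it always answers.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Tier:
    name: str                        # hypothetical model label
    cost: float                      # hypothetical relative cost per call
    generate: Callable[[str], str]   # hypothetical wrapper around a model API
    threshold: float                 # accept the answer if score >= threshold


def cascade(query: str, tiers: List[Tier],
            score: Callable[[str, str], float]) -> Tuple[str, str, float]:
    """Query tiers cheapest-first; stop at the first answer the scorer trusts."""
    spent = 0.0
    answer, used = "", ""
    for tier in tiers:
        answer = tier.generate(query)
        spent += tier.cost
        used = tier.name
        if score(query, answer) >= tier.threshold:
            break  # confident enough: skip the more expensive tiers
    return answer, used, spent


# Toy stand-ins for real models and a real (learned) scorer.
def small_model(q: str) -> str:
    return "42" if "6 * 7" in q else "not sure"


def large_model(q: str) -> str:
    return "Detailed answer to: " + q


def toy_score(q: str, a: str) -> float:
    return 0.0 if a == "not sure" else 0.9


tiers = [
    Tier("small-model", 1.0, small_model, threshold=0.8),
    Tier("large-model", 20.0, large_model, threshold=0.0),  # last tier always answers
]

print(cascade("What is 6 * 7?", tiers, toy_score))  # ('42', 'small-model', 1.0)
print(cascade("Explain RAII in C++", tiers, toy_score))
```

Note the trade-off: an escalated query pays for every tier it passed through (21.0 units here instead of 20.0), so the cascade only saves money if most queries stop at a cheap tier.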

  • @klammer75 · 1 year ago · +1

    Great work and I love this idea! Unique integrations and novel recombinations of existing models and tools are definitely the way forward! Good stuff! 🥳🦾🤩

  • @Naaoos1 · 1 year ago

    How do you calculate the score of an answer? And with reference to what?

  • @conceptsintamil · 1 year ago

    But how does the model cascade pass the prior context to the last, most expensive model without giving up the token savings? Wouldn't GPT-4 need the previous context to give an accurate response? In that case, wouldn't the token count increase?

    • @ThomasTomiczek · 1 year ago

      If that is the case, you can maybe not send it. But often you do not need a lot of context to answer the question.

  • @marcosmagana8930 · 9 months ago

    Fantastic content! Could you please make a tutorial with an actual code implementation of FrugalGPT? Thanks! I truly appreciate it.

  • @surajitchakraborty1903 · 1 year ago

    Hey, great video. With regard to prompt adaptation, could the Promptify library be used in some way?

  • @seikatsu_ki · 1 year ago · +2

    Thank you for your content. However, the typewriter typing sound was distracting.

    • @1littlecoder · 1 year ago · +2

      Thank you for the feedback. I just tried something different. I'll improve it.

  • @tharzzan · 1 year ago

    I'm not sure I follow the logic of this paper. Would you be kind enough to demonstrate how this whole concept can be applied to a real-world use case?

  • @boscoraj6148 · 5 months ago

    Wow thanks man!

  • @sto2779 · 5 months ago

    6:46 - The work I do (C/C++ software engineering) requires GPT-4 responses 100% of the time to get 98% bug-free code. The code questions I ask are really difficult to comprehend, and asking for a code review is even harder. Redirecting parts of my question to other GPT models means the answers won't be as good. I think the only way to reduce costs is for the user to test what each GPT platform is really good at and send easier questions to specific GPT models based on that ranking. The only issue is how such a complex question can be split into smaller questions, asked to different models, and the responses combined afterwards. I have tested many GPT models and they're not great, even for simple code questions. GPT-3.5 is also not that great; GPT-4, however, is really great, so I'm not sure how valid this paper is.

  • @sammathew535 · 1 year ago

    "The cache idea is common sense in Software Engineering" - often a challenge to implement though. I guess in this case the implementation would be along the lines of Vector DBs.
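Along the lines of the vector-DB suggestion above, here is a minimal semantic-cache sketch in Python. Assumptions: `embed` is a toy bag-of-words stand-in for a real embedding model, a plain list stands in for a vector database, and the 0.8 similarity threshold and `fake_llm` are hypothetical. A query whose cosine similarity to a stored query reaches the threshold is answered from the cache; otherwise the (expensive) model is called and the result is stored.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Optional, Tuple


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a lowercase bag of words."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())  # missing tokens count as 0
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold
        self.entries: List[Tuple[Counter, str]] = []  # a list stands in for a vector DB

    def lookup(self, query: str) -> Optional[str]:
        q = embed(query)
        best_score, best_answer = 0.0, None
        for vec, answer in self.entries:
            s = cosine(q, vec)
            if s > best_score:
                best_score, best_answer = s, answer
        return best_answer if best_score >= self.threshold else None

    def store(self, query: str, answer: str) -> None:
        self.entries.append((embed(query), answer))


def answer_with_cache(query: str, cache: SemanticCache,
                      call_llm: Callable[[str], str]) -> str:
    cached = cache.lookup(query)
    if cached is not None:
        return cached  # cache hit: no API call
    answer = call_llm(query)
    cache.store(query, answer)
    return answer


calls: List[str] = []


def fake_llm(q: str) -> str:  # hypothetical stand-in for a paid API call
    calls.append(q)
    return "Answer for: " + q


cache = SemanticCache(threshold=0.8)
first = answer_with_cache("How do I reset my password?", cache, fake_llm)
second = answer_with_cache("how do i reset my password", cache, fake_llm)
third = answer_with_cache("What are your opening hours?", cache, fake_llm)
print(first == second, len(calls))  # True 2
```

The threshold is the main design knob: set it too low and a genuinely different question can get a wrong cached answer; set it too high and near-duplicates miss the cache, so you pay for the API call anyway.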

  • @fire17102 · 1 year ago · +2

    Thanks for the video! Really interesting! Just please don't use the typing sound effects; they're really, really annoying.

    • @1littlecoder · 1 year ago · +3

      Thank you for the feedback. Another sub mentioned the same. Sorry about that. I'll avoid it

    • @1littlecoder · 1 year ago · +1

      Btw, is the sound itself annoying, or is it annoying because it's overused?

    • @fire17102 · 1 year ago · +3

      @@1littlecoder thanks for the quick reply man! Love your channel
      Well, it's especially hard since it isn't even synced to the text: it keeps playing until the text disappears, and it's as loud as your voice. I would prefer zero bells and whistles. If you do it again, keep it very short and very quiet. But again, no nonsense is better, imo.
      Good luck and all the best!

    • @1littlecoder · 1 year ago · +2

      @@fire17102 Thanks so much. I appreciate the detailed feedback.

  • @DistortedV12 · 1 year ago

    Hmm, what the heck is prompt selection?

    • @1littlecoder · 1 year ago

      Maybe I didn't do a good job explaining it. Please read the paper; it has examples.

  • @vekRaft · 6 months ago

    Too many keys typed for so few words

  • @emmanuelkolawole6720 · 1 year ago · +1

    What we need is a local cpp build for the LLM "HuggingFaceH4/starchat-alpha". It is much better than any other open-source chat model at coding, and it does very well at chatting too. It's better than Vicuna.