The REAL cost of LLM (And How to reduce 78%+ of Cost)

  • Published on 30 Sep 2024

Comments • 156

  • @jasonfinance
    @jasonfinance 8 months ago +57

    I never got the point of setting up those LLM monitors before, but the step-by-step guide at the end showing how you use one and how it led to real cost reduction is gold (70% is crazy!). Will try it out, thank you!

  • @agentDueDiligence
    @agentDueDiligence 8 months ago +23

    Hi Jason!
    Another way to measure costs in your script is to simply use the usage information that OpenAI's chat completions API returns.
    Every time you call the API, the response JSON includes the total tokens in the "usage" dictionary. That way, you can monitor and control your usage as well.
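
    The usage-based tracking described in this comment can be sketched roughly as follows. This is a minimal sketch, not the commenter's code: the per-1K prices, `cost_from_usage`, and `UsageTracker` are hypothetical names and values introduced for illustration. Only the `prompt_tokens`, `completion_tokens`, and `total_tokens` keys come from the `usage` block of a chat completions response.

```python
from typing import Dict

# Hypothetical USD prices per 1K tokens; real prices differ by model and change over time.
PRICES_PER_1K = {"prompt": 0.01, "completion": 0.03}


def cost_from_usage(usage: Dict[str, int]) -> float:
    """Estimate the cost of one call from the `usage` block of the response JSON."""
    prompt_cost = usage["prompt_tokens"] / 1000 * PRICES_PER_1K["prompt"]
    completion_cost = usage["completion_tokens"] / 1000 * PRICES_PER_1K["completion"]
    return prompt_cost + completion_cost


class UsageTracker:
    """Accumulates token counts and estimated spend across calls."""

    def __init__(self, budget_usd: float) -> None:
        self.budget_usd = budget_usd
        self.total_tokens = 0
        self.spent_usd = 0.0

    def record(self, usage: Dict[str, int]) -> bool:
        """Add one call's usage; return False once the budget is exceeded."""
        self.total_tokens += usage["total_tokens"]
        self.spent_usd += cost_from_usage(usage)
        return self.spent_usd <= self.budget_usd


# Shape of the `usage` dictionary in a chat completions response (values made up).
example_usage = {"prompt_tokens": 1000, "completion_tokens": 500, "total_tokens": 1500}
tracker = UsageTracker(budget_usd=0.04)
within_budget = tracker.record(example_usage)
```

    In a real script, `record` would be called with the `usage` field of each API response, and the caller could stop or downgrade to a cheaper model once it returns False.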

    • @christislight
      @christislight 7 months ago

      Exactly what I do!

  • @que-tangclan
    @que-tangclan 8 months ago +17

    This is the best AI content I have seen all week. Thank you for this.

  • @hidroman1993
    @hidroman1993 8 months ago +5

    "Comment if you want a video about this" your videos are so good I will click anyways ❤️

  • @Joe-bp5mo
    @Joe-bp5mo 8 months ago +14

    Didn't realise the cost gap between GPT-4 and open-source models like Mixtral is so big! 200x more expensive really changes how I think about building LLM products.
    Thanks for sharing! Will definitely try to optimise my LLM apps!

  • @kguyrampage95
    @kguyrampage95 8 months ago +12

    Bro that's crazyyy, I literally just wrote down notes on reducing costs in different approaches today. I was about to test them out and saw this video in my inbox. damn very on time.

  • @Max-zy2ie
    @Max-zy2ie 8 months ago +3

    When building multi agent orchestration systems, what is your preferred stack? Do you use langchain, autogen or just native APIs?

  • @450aday
    @450aday 7 months ago +1

    You really should not use AIs for multiplication; use a calculator. Find Tool AI is an important AI for saving money. Button AI is another good one.

  • @michaelwallace4757
    @michaelwallace4757 8 months ago +9

    A step by step build of an agent architecture would be invaluable! Thank you for the video.

  • @Tanvir1337
    @Tanvir1337 8 months ago +2

    Mixtral 8x7b*

  • @savire.ergheiz
    @savire.ergheiz 7 months ago +1

    Sorry to say this, but almost everything you mention here comes from bad planning and rushing things out without thinking about the aftereffects.
    It's not just in AI. It has always been like that if you try to follow hype.
    Unless you are backed by big companies or investors, planning way ahead for costs is always a must.

  • @Ke_Mis
    @Ke_Mis 8 months ago +9

    Your content is just superb as always Jason!

  • @serenditymuse
    @serenditymuse 8 months ago +3

    Excellent. Most of his videos are but this one was especially useful to me.

  • @oryxchannel
    @oryxchannel 7 months ago +1

    See the groundswell paper dated Jan 29th 2024: "Towards Optimizing the Costs of LLM Usage." These Indian authors are gonna kick some serious butt regarding costs. I see the FrugalGPT paper in your video too. Thank you for offering real-world case scenarios from your personal experience. Edit: This video is a trove on frugal LLM building. Awesome job!

    • @AIJasonZ
      @AIJasonZ 7 months ago

      Thank you!

  • @ZacMagee
    @ZacMagee 8 months ago +4

    Love your content man. You have helped me really expand my knowledge and push my boundaries

  • @r3kRaP
    @r3kRaP 7 months ago +2

    You should change your name to jAIson

    • @AIJasonZ
      @AIJasonZ 7 months ago

      Hahah love it

  • @misterloafer5021
    @misterloafer5021 8 months ago +4

    Yes, please do a video on multi agent methods

  • @kguyrampage95
    @kguyrampage95 8 months ago +3

    At 8:05 you made an obvious mistake with the maths; you probably meant the cheapest model, not Mistral, since it would be 50x cheaper, not 214x cheaper.

    • @AIJasonZ
      @AIJasonZ 8 months ago +2

      Ahh, I highlighted the wrong row, it should be Mistral 7B. Thanks for spotting this, mate!

    • @kguyrampage95
      @kguyrampage95 7 months ago

      @AIJasonZ Hey, this video was great by the way! I am learning to make videos to showcase some of my experiments, and I am hoping I can produce as much quality as you!

  • @leandroimail
    @leandroimail 8 months ago +2

    Thanks very much for this video. I have been having problems with the cost of my agents. I will try the tips and clues you gave. Thanks again.

  • @oscarcharliezulu
    @oscarcharliezulu 8 months ago +3

    Excellent video great to hear real world experience from a real Dev

  • @yazanrisheh5127
    @yazanrisheh5127 8 months ago +1

    Hey Jason. You said at around minute 9 that we should use a model like GPT-4 to get data and then use that to fine-tune, but how much data do we need so that our fine-tuned Mistral model performs as well as GPT-4?

  • @MaximIlyin
    @MaximIlyin 8 months ago +1

    Great video, thanks!
    Why not store the agent's conversation memory as embeddings and retrieve only the entries relevant (by cosine similarity) to the current user query as context?
    (Like RAG for conversation memory)
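
    The retrieval idea in this comment can be sketched as below. This is an illustrative sketch under stated assumptions: `embed` is a toy bag-of-words stand-in for a real embedding model, and `relevant_memory` is a hypothetical helper name. Only the ranking by cosine similarity reflects what the comment proposes.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def relevant_memory(history: List[str], query: str, k: int = 2) -> List[str]:
    """Return the k past messages most similar to the current query."""
    query_vec = embed(query)
    ranked = sorted(history, key=lambda msg: cosine(embed(msg), query_vec), reverse=True)
    return ranked[:k]


history = [
    "my order number is 1234",
    "i like hiking on weekends",
    "the order arrived damaged",
]
context = relevant_memory(history, "what is my order number", k=2)
```

    Only the messages in `context` would then be placed in the prompt, instead of the full conversation history, which is where the token saving would come from.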

  • @quickcinemarecap
    @quickcinemarecap 8 months ago +4

    00:05 Using autonomous sales agents led to unexpected high costs.
    02:11 AI startup costs fluctuate with usage
    06:12 Marketing teams are adopting AI for automation and hyper-personalized customer experiences.
    08:19 Using smaller models can reduce cost by multiple magnitudes.
    12:27 Customize router for cost reduction
    14:23 Using small models can significantly reduce the token and word count for large language models
    18:13 Reducing large language model costs
    19:55 Analyze token consumption for cost optimization
    23:18 Agent executor identifies cost breakdown and offers cost reduction strategies
    25:00 Using GPT-3.5 turbo and staff documents for detailed and cost-effective summarization.

    • @christopherd.winnan8701
      @christopherd.winnan8701 8 months ago

      Usman, what does it cost in terms of tokens to run your summary AI? Does it use an open-source model?

    • @quickcinemarecap
      @quickcinemarecap 8 months ago

      @christopherd.winnan8701 0.05 for every 30 minutes of summary

    • @quickcinemarecap
      @quickcinemarecap 8 months ago

      @christopherd.winnan8701 It's GPT-4, and it costs me 30 cents for every 1 hour of summary

  • @nicechannel9720
    @nicechannel9720 8 months ago +1

    A great dive into the cost of AI models, as it is hard to find related content. Can you do a video about roughly how much OpenAI is spending on computation, and also how this constraint will hinder the adoption of these models in the enterprise space? Great job man 👍

  • @taylorthompson4212
    @taylorthompson4212 8 months ago +4

    This video came at the perfect time. Thank you

    • @AnshTiwari-fx2yq
      @AnshTiwari-fx2yq 8 months ago +1

      ikr, grateful to Jason

  • @gsolaich
    @gsolaich 7 months ago +1

    We were planning to build AI-assistant-style apps but always pulled back because of the cost they incur. This is a fabulous video that has given us a new direction to go ahead. Thanks a lot... looking forward to seeing other videos

  • @matten_zero
    @matten_zero 8 months ago +17

    This is the biggest flex ever! 💪 I can only dream of being as cool an AI engineer as you. I thought building a digital agent with automatic voice that can do RAG was cool.
    There are levels to this game and Jason is in a whole different world. Thanks for posting these videos. They're educational, funny and inspirational for me.

  • @clamhammer2463
    @clamhammer2463 7 months ago +1

    I had this idea for LLM routing a while back and wondered why nobody had done it. I figured there was some sort of information I didn't have that was stopping it.

  • @YoannGrudzien
    @YoannGrudzien 7 months ago

    Prompt engineer and LLM developer here.
    GPT-4 32k is not the most powerful model; it is outclassed by gpt-4-1106-preview and now gpt-4-0125-preview, which is even better.
    Not only is GPT-4 32k worse, it is also 6 times more expensive! ($0.06/1k tokens for GPT-4 32k, and only $0.01/1k tokens for gpt-4-0125-preview)
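
    The 6x figure follows directly from the two per-1k-token prices quoted in this comment. A small worked example, assuming those quoted prices (they are the commenter's numbers and are not verified here) and a made-up workload of one million tokens:

```python
def cost_usd(tokens: int, price_per_1k: float) -> float:
    """Cost of processing `tokens` tokens at a given USD price per 1K tokens."""
    return tokens / 1000 * price_per_1k


# Prices as quoted in the comment above; the token count is an illustrative assumption.
PRICE_32K_PER_1K = 0.06
PRICE_PREVIEW_PER_1K = 0.01
tokens = 1_000_000

cost_32k = cost_usd(tokens, PRICE_32K_PER_1K)
cost_preview = cost_usd(tokens, PRICE_PREVIEW_PER_1K)
ratio = cost_32k / cost_preview

print(f"32k: ${cost_32k:.2f}, preview: ${cost_preview:.2f}, ratio: {ratio:.1f}x")
```

    The ratio is independent of the token count, so the same multiple applies at any volume.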

  • @vinception777
    @vinception777 8 months ago +1

    Thanks a lot. Like James Briggs and some others, your content is outstandingly great. This is really important information that I need at work 🙏☺

  • @chengchangyu
    @chengchangyu 22 days ago

    A step-by-step build of an agent architecture would be very helpful. I am looking forward to it.

  • @Beloved_Digital
    @Beloved_Digital 8 months ago +1

    I am a newbie when it comes to building AI-powered apps.
    Although I don't fully understand everything you say because I am still learning the basics, all I can say is thank you for sharing this valuable content with us

  • @nikilragav
    @nikilragav 6 months ago

    14:56 - seems like this might not work well for needle in haystack approaches, right? Because if you want to ask "what departments were present at this session?" the bigger model does not have an answer to that in its context. You'd need some kind of vector similarity check first to assess whether the answer might even exist in the context given to the bigger model? And if not, give the whole thing? Or at least do some RAG-style look up and fetch? I'm not so sure how well RAG can do needle in haystack searching though. Seems highly dependent on your embedding model, and openAI doesn't have an option to use GPT4 embedding space, right?

  • @slddive9025
    @slddive9025 5 months ago

    Hubspot AI Course?! You found that useful?! What a waste of time. It's a lead magnet for lead generation. Affiliate link etc.

  • @betun130
    @betun130 4 months ago

    Superb content Jason, I will highly recommend your videos to everyone getting their hands dirty with LLMs. I am gonna try some of these myself. It's a shame I didn't build it before because something like the AI router occurred to me but I do not have the patience to implement these.

  • @bhaumiks.6543
    @bhaumiks.6543 6 months ago

    I am interested in learning about the architecture. By the way, amazing videos...

  • @jakobbourne6381
    @jakobbourne6381 7 months ago

    Stay ahead in the competitive market by leveraging the unique capabilities of *Phlanx's Caption Generator* , which not only saves you valuable time but also contributes directly to revenue growth through increased customer engagement.

  • @LaelAl-Halawani-c4l
    @LaelAl-Halawani-c4l 6 months ago

    It's not true that this is a 'new type of cost'. Traditional software companies have always needed to watch out for API costs. Anyone who has used gcloud or AWS has racked up some unexpectedly high API costs one way or another. You can also set spending limits in your API settings on the OpenAI platform.

  • @mattbegley1345
    @mattbegley1345 6 months ago

    Excellent!👍 Applying that Assistant Hierarchy to your Sales Agent would be a good video.

  • @momentumsoftio
    @momentumsoftio 7 months ago

    You can also use natural language processing lemmatization to convert words into their lemma, or root word, to reduce the content "weight" or token count. You don't need the extra word garbage like suffixes. LLMs do a good job of extracting meaning from lemmatized content. It's like you are cutting through the syntactic sugar of the English language and getting to the root meaning without wasting the LLM's time

  • @noodjetpacker9502
    @noodjetpacker9502 7 months ago

    I don’t know if this is a stupid question but why doesn’t ChatGPT already implement these features for themselves? Or do they already do these?

  • @dadlord689
    @dadlord689 7 months ago

    Sort of a scam business. Fake girlfriend? Hak is this? What a waste of everything

  • @Blueprint4Murder
    @Blueprint4Murder 7 months ago

    There are countless free ai bots for all sorts of things and you can even make your own. If you are paying for ai functionality you are being ripped off. Even the best ai models enter morality loops if you are trying to use them for questionable things and maybe that should be a problem to look at.

  • @kernsanders3973
    @kernsanders3973 6 months ago

    I think something else would also work in the agents scenario: in real life there is a moderator for big disagreements between employees, which would be their team lead. So if a disagreement runs to multiple replies, the TL needs to step in, lay down the rules for work and code of conduct, and make a final decision on the disagreement.

  • @aifortheworld7152
    @aifortheworld7152 8 months ago

    did you get the ai girlfriend to work? Because you can now create ai sales agent for your website to talk to. hope to hear from you

  • @rchaumais
    @rchaumais 7 months ago

    Many thanks for your useful video.
    Have you evaluated Nemo from Nvidia ?

  • @simonmassey8850
    @simonmassey8850 7 months ago

    Companies put in “fair usage” clauses to cap or throttle users. Ask your smart “sales agent” about that idea.

  • @matten_zero
    @matten_zero 8 months ago +1

    I've done that before @18:46. It works pretty well esp when you combine with SPR (popularized by David Shapiro).

  • @holdingW0
    @holdingW0 7 months ago +1

    Excellent video. Subbed and hope you keep the content coming!

  • @JimMendenhall
    @JimMendenhall 8 months ago +1

    Thanks for sharing your insights from your work. It's very helpful!

  • @nufh
    @nufh 7 months ago

    I managed to build the clone for AI GF for free now with local LLM.

  • @nexusinfosec
    @nexusinfosec 7 months ago +1

    Yes please for a video deepdiving into agent architecture for autogen

  • @shervintheprodigy6402
    @shervintheprodigy6402 months ago

    This is a great video! Exactly what I was looking for!

  • @addisobi772
    @addisobi772 months ago

    Great, Jason. You have helped me understand a lot

  • @mohamedaminehamza
    @mohamedaminehamza 7 months ago

    It's a real-life Silicon Valley (the series) scenario where two AIs start talking to each other lol.

  • @Cygx
    @Cygx 8 months ago

    I wouldn’t be surprised if you made $1 million a year soon

  • @goutamkelam6117
    @goutamkelam6117 7 months ago

    🎯 Key Takeaways for quick navigation:
    19:51 💡 *Analyze token consumption for cost optimization.*
    20:19 💻 *Install LangSmith and set it up.*
    21:01 🛠️ *Set up environment variables for the connection.*
    21:43 📊 *Implement tracking methods for insights.*
    22:12 📚 *Utilize LangChain for research projects.*
    23:06 📝 *Log project activities for monitoring.*
    24:03 💰 *Analyze token costs for optimization.*
    24:31 📉 *Reduce GPT-4 usage for cost savings.*
    25:12 📄 *Implement content summary for efficiency.*
    26:09 ✂️ *Optimize script tool for better results.*
    Made with HARPA AI

  • @sewingsugar9892
    @sewingsugar9892 8 months ago +1

    This channel is so underrated

  • @SophieCheung
    @SophieCheung 7 months ago

    thanks for your video! :)

  • @TimBnb
    @TimBnb 8 months ago +1

    This channel is the best school in existence to date.
    Thank you, Jason

  • @geneanthony3421
    @geneanthony3421 7 months ago

    This narrated by Tommy Wiseau?

  • @WaxN-ey6vj
    @WaxN-ey6vj 8 months ago

    Since GPT development is rapid, I think building a fine-tuned model is risky because it is time-consuming.
    The cost won't be a big deal, as OpenAI constantly develops new models and reduces the cost of previous ones.

  • @jonmichaelgalindo
    @jonmichaelgalindo 7 months ago

    Low-cost LLMs will win. Opensource, low parameter count, fast inference architecture, compute distributed to regional servers.

    • @MsDuketown
      @MsDuketown 6 months ago

      Security as hardware appliance, ie. Pluton chip.

  • @alibahrami6810
    @alibahrami6810 8 months ago

    Great video! Could you please make a video about putting an LLM into production, covering parallelism, memory and GPU usage, load balancing, and effective software architecture? How do you scale up a local LLM to be accessible worldwide like GPT, with memory and resource optimization in mind? Thanks

  • @roke4025
    @roke4025 8 months ago

    🎉 Brilliant mate. I’m a fiend for compressing costs to maximum, but I found out that during cost compression some models (eg. Mistral tiny) are not able to make proper custom tool calls and are unable to extract out the JSON response result from the tool call. As soon as a switch is made to an OpenAI model fine tuned to recognise json schemas, tool calls work perfectly (in Flowise). Is that why you persist in using OpenAI models in your calls? As opposed to using a Mistral or Llama inference? So you can achieve the right tool calling?

  • @tks5182
    @tks5182 7 months ago

    Would appreciate a course or even a comment on what knowledge you need and what concepts you should know to be an AI & ML Engineer

  • @breathandrelax4367
    @breathandrelax4367 7 months ago

    Hi Jason,
    thank you for the video, impressive work!
    While building the app, what do you think of using an if/else chain that reroutes to a particular LLM?
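
    A rule-based router of the kind this comment asks about could look roughly like the sketch below. Everything in it is an assumption made for illustration: the model names, the keyword hints, and the length cutoff are hypothetical, and a real router would need rules tuned to the application's own traffic.

```python
import re

# Hypothetical model names and routing hints, chosen only for illustration.
REASONING_HINTS = {"analyze", "plan", "prove", "debug"}
LONG_PROMPT_WORDS = 200


def route(prompt: str) -> str:
    """Pick a model tier for a prompt using a plain if/else chain."""
    words = re.findall(r"[a-z]+", prompt.lower())
    if len(words) > LONG_PROMPT_WORDS:
        # Very long inputs go to a model with a larger context window.
        return "long-context-model"
    elif REASONING_HINTS & set(words):
        # Prompts that look like multi-step reasoning go to the stronger model.
        return "large-model"
    else:
        # Everything else is handled by the cheapest model.
        return "small-model"


choice = route("Plan the migration and analyze the risks")
```

    The appeal of this design is that it costs nothing per request and is easy to audit; the trade-off is that keyword rules misroute prompts they were not written for, which is the gap a learned router tries to close.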

  • @tirthb
    @tirthb 6 months ago

    Wow, super practical tips.

  • @funny_tiger11
    @funny_tiger11 8 months ago

    Is Portkey AI an example of an open-source LLM router? (I have not used it, but it seems to offer the capability you described as a limitation of Neutrino AI.)

  • @mosca204
    @mosca204 8 months ago

    So you inadvertently built a massive email warm-up. At least you will not be flagged as spam for a long time ahah.
    PS: It would be great to see a sales agent video soon ;)

  • @headrobotics
    @headrobotics 8 months ago

    For fine tuning a small model from a large one, what about OpenAI terms of service? Has it changed to allow?

  • @xonack
    @xonack 8 months ago +1

    ecoassistant video please!

  • @JohnByrneLSM
    @JohnByrneLSM 7 months ago

    Excellent video! I just ran into issues with memory for conversations and I really like the strategies you've offered in this. Thank you.

  • @ianalmeida4759
    @ianalmeida4759 8 months ago

    Reminds me of that scene in Silicon Valley where AI Dinesh speaks to AI Gilfoyle

  • @archerkee9761
    @archerkee9761 6 months ago

    nice video, thanks!

  • @momentumsoftio
    @momentumsoftio 7 months ago

    😱 an infinite ai loop holy shit 😂

  • @ismbeatz
    @ismbeatz 8 months ago

    Kek 5k man i feel sorry 😅

  • @prestonmccauley43
    @prestonmccauley43 8 months ago

    If you use the big ones like azure bedrock etc, they are so expensive on deploy with the compute

  • @Bakobiibizo
    @Bakobiibizo 8 months ago

    hahah i knew that was going to happen eventually. eventually it will be all ai agents just talking back and forth

  • @prestonmccauley43
    @prestonmccauley43 8 months ago

    What other services have you found for deployment that are cost friendly? You have to install vms containers and more

  • @subratnayak2682
    @subratnayak2682 8 months ago

    For the cascade method, how do you measure the score for each new question in production?
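
    One way to read the cascade idea is that a scoring function is run on every (question, answer) pair at request time, and the next, more expensive model is called only when the score is too low. The sketch below illustrates that control flow only. The two "models" and `toy_scorer` are hypothetical stand-ins, and the threshold is an arbitrary assumption; it does not claim to reproduce the method shown in the video.

```python
from typing import Callable, List, Tuple

Model = Callable[[str], str]          # question -> answer
Scorer = Callable[[str, str], float]  # (question, answer) -> score in [0, 1]


def cascade(question: str, models: List[Tuple[str, Model]], scorer: Scorer,
            threshold: float = 0.8) -> Tuple[str, str]:
    """Try models from cheapest to most expensive; stop at the first answer
    whose score clears the threshold, else return the last model's answer."""
    name, answer = "", ""
    for name, model in models:
        answer = model(question)
        if scorer(question, answer) >= threshold:
            break
    return name, answer


# Toy stand-ins (hypothetical): real code would call actual LLM APIs here.
def small_model(question: str) -> str:
    return "4" if question == "What is 2 + 2?" else "I don't know"


def large_model(question: str) -> str:
    return "A detailed answer"


def toy_scorer(question: str, answer: str) -> float:
    # Placeholder heuristic; in practice this could be a small trained classifier.
    return 0.0 if "don't know" in answer else 1.0


MODELS = [("small", small_model), ("large", large_model)]
easy = cascade("What is 2 + 2?", MODELS, toy_scorer)
hard = cascade("Summarise this contract", MODELS, toy_scorer)
```

    The scorer itself has a cost, so it only pays off when it is much cheaper than the larger model it helps avoid calling.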

  • @evermorecurious91
    @evermorecurious91 7 months ago

    BRO, this is gold!!!

  • @sanesanyo
    @sanesanyo 8 months ago

    Can someone please explain to me how GPT-4 32k is more powerful than GPT-4 Turbo 128k? I thought GPT-4 Turbo 128k was the best OpenAI model.

    • @ryzikx
      @ryzikx 8 months ago

      its not idk why he says that

    • @AIJasonZ
      @AIJasonZ 8 months ago

      In my experience, GPT-4 Turbo is faster and cheaper, but its performance is less stable and it is a bit “dumber” than GPT-4 32k.
      E.g. when I build agents, I found GPT-4 Turbo often ignores some instructions and forgets to do some steps, while with 32k the performance is much more stable

  • @rishi8413
    @rishi8413 8 months ago

    really love your videos, are there any packages or libraries to use these 7 methods you discussed

  • @matten_zero
    @matten_zero 8 months ago

    I'm taking all of this for my startup. This is the way and creates a moat for you assuming you hold on to the weights afterwards

  • @user-qr4jf4tv2x
    @user-qr4jf4tv2x 7 months ago

    small llm are the future

  • @gabrieleguo
    @gabrieleguo 7 months ago

    Thanks Jason, your content is always on point and very insightful. Keep it up man!

  • @SergiySev
    @SergiySev 7 months ago

    such a good video!

  • @ryzikx
    @ryzikx 8 months ago

    ive always wanted to do this but im too dumb and lazy lmao, good to see someone like you is doing it

  • @mjkbird
    @mjkbird 8 months ago

    Isn't it against OpenAI's ToS to use the output as training data?

  • @xugefu
    @xugefu 8 months ago +1

    Thanks!

  • @vinitvsankhe
    @vinitvsankhe 7 months ago

    But what if I need an AI that needs to be trained with one data snapshot?

  • @blackswann9555
    @blackswann9555 7 months ago

    Very interesting

  • @the-ghost-in-the-machine1108
    @the-ghost-in-the-machine1108 8 months ago

    this was an intense, highly informative lecture. Thanks Jason, appreciate your work!

  • @aiforsocialbenefit
    @aiforsocialbenefit 7 months ago

    Excellent tutorial. Thank you!

  • @MrTalhakamran2006
    @MrTalhakamran2006 7 months ago

    Thank you Jason for your hard work to put this together.

  • @Ryan-yj4sd
    @Ryan-yj4sd 8 months ago

    Fine tuning for token reduction is a key technique I’ve used

  • @Jeff-wl1cz
    @Jeff-wl1cz 8 months ago

    bro 2 bots met and they couldnt shut up haahahah

  • @zhubarb
    @zhubarb 8 months ago

    This is a very good video. Appreciate it.

  • @froozynoobfan
    @froozynoobfan 8 months ago

    Dinesh and Gilfoyle will laugh at this #siliconvalley