RouteLLM achieves 90% GPT4o Quality AND 80% CHEAPER

Matthew Berman

53,777 views

Published on Jul 21, 2024
RouteLLM, a new project and paper from lmsys.org, allows for the intelligent routing of prompts to the "right" model. It achieves 90% of GPT-4o's quality while reducing the cost by 80%.
Go from shiny demos to reliable AI products that delight your customers with Langtrace. Visit the website to learn more and join the community of innovators.
Try Langtrace FREE today: bit.ly/4bzB3nJ
Join My Newsletter for Regular AI Updates 👇🏼
www.matthewberman.com
Need AI Consulting? 📈
forwardfuture.ai/
My Links 🔗
👉🏻 Subscribe: / @matthew_berman
👉🏻 Twitter: / matthewberman
👉🏻 Discord: / discord
👉🏻 Patreon: / matthewberman
👉🏻 Instagram: / matthewberman_ai
👉🏻 Threads: www.threads.net/@matthewberma...
👉🏻 LinkedIn: / forward-future-ai
Media/Sponsorship Inquiries ✅
bit.ly/44TC45V
x.com/lmsysorg/status/1807812...
Science & Technology

Comments • 226

@matthew_berman 13 days ago ⁺⁵⁴
My "AI Stack" is RouteLLM, MoA, and CrewAI. What about you?
@craiggriessel1872 13 days ago ⁺¹
AISheldon 🤓
@shalinluitel1332 13 days ago ⁺⁴
It would be best to have free and open-source alternatives to all of these. Maybe later down the line... The video is really cool, though! Thanks, Matthew
@santiagomartinez3417 13 days ago ⁺⁸
Is MoA Mixture of Agents?
@AIGooroo 13 days ago ⁺²⁰
Matthew, please do the full tutorial on how to set this up. Thank you.
@smokewulf 13 days ago ⁺¹⁵
RouteLLM, MoA, and Agency Swarm. You should do a video on Agency Swarm. I think it is the best agentic framework.
@davtech 13 days ago ⁺¹²⁰
Would love to see a tutorial on how to set this up.
@AlexBrumMachadoPLUS 13 days ago ⁺²
Me too ❤
@bamit1979 13 days ago
I think some other AI enthusiast covered it a few days back. It was quite easy. Check YouTube.
@ChristianNode 13 days ago
Get the agents to watch it and do it.
@sugaith 13 days ago
On how to set this up IN THE CLOUD as well, or preferably
@averybrooks2099 13 days ago ⁺³
Me too, but on a local machine instead of a third-party service.
@cool1297 13 days ago ⁺⁷²
Please do a tutorial for local installation for this. Thanks
@camelCased 13 days ago ⁺⁴
What exactly? As I understand it, RouteLLM is not an LLM itself but just a router.
You can install local LLMs very easily using Backyard AI.
@m8hackr60 13 days ago ⁺²
Sign me up for the full tutorial!
@DihelsonMendonca 13 days ago
@camelCased Or LM Studio
@bigglyguy8429 13 days ago ⁺¹
@camelCased But how do you use the router with Backyard?
@camelCased 13 days ago
@bigglyguy8429 Why would you want to use the router at all if you're running LLMs locally?
@clapppo 13 days ago ⁺²⁸
It'd be cool if you did a video on setting it up and running it locally.
@anubisai 13 days ago
Ollama or LM Studio?
@velocityerp 13 days ago ⁺¹⁰
Matthew, for those of us who develop line-of-business apps for SMEs, local LLM deployment is a must. I would certainly like to see you demo RouteLLM with orchestration. Thanks!
@josephremick8286 13 days ago ⁺²⁴
I am a cybersecurity analyst who knows very little about coding, so between your videos and just asking ChatGPT or Claude directly, I am ham-fisting my way through getting AI to run locally. Please keep making tutorial videos. I am excited to see how to implement RouteLLM!
@s2turbine 13 days ago ⁺⁴
I agree; I'm pretty much in the same boat as you. The problem is that my knowledge is outdated by the time I finally figure things out, because there is so much advancement in so little time. I think we need a "checkpoint" how-to on how to do things now, as opposed to 3 months ago.
@DihelsonMendonca 13 days ago
If you don't know much about anything, like me, but want to run LLMs locally, you just need to install LM Studio. No need to understand anything. The software even has an option to download, install, and run models. That's what I use. Now that I've learned a bit more, I will try to install Open WebUI, Ollama, and Docker; these are way more complicated. 🎉❤
@bernieapodaca2912 13 days ago ⁺²
Yes! Please show us a comprehensive breakdown of this great tool!
I'm also interested in your sponsor's product, Langtrace. Can you possibly show us how to use it?
@caseyvallett8953 13 days ago ⁺⁶
Absolutely do a detailed tutorial on how to get this up and running!
@MichaelLloydMobile 13 days ago ⁺⁵
Yes, please provide a tutorial on setting up the described language model.
@aiforculture 13 days ago ⁺²
Great breakdown, much appreciated. I definitely foresee local LLMs becoming dominant for organisations as soon as next year. My advice during consults is not to invest a massive amount in high-end, data-secure cloud systems, but to hang on a little, work with dummy data on current models to build up foundational knowledge, and then, once local options exist, start diving into more sensitive analytics.
@AshishKumar-hg2cl 13 days ago ⁺¹
Hey Matt, yes, it would be great if you could show a demo of how to set up this model on Azure OpenAI or Azure Databricks and then use it in an application.
@AngeloXification 13 days ago ⁺³
I feel like everyone is realising things at the same time. I started 2 projects: the first an LLM coordination system, and the second chain-of-thought processing on specific models.
@dezigns333 13 days ago ⁺¹⁶
It's time people admit that benchmarking against GPT-4 is stupid. When GPT-4 came out it was amazing. Now it's no better than any other LLM. Ever since OpenAI introduced the cheaper Turbo models, the quality has gone downhill. They sacrificed intelligence for speed to the point where they have plateaued in quality, and it's not getting better no matter how many new models they release.
@orthodox_gentleman 13 days ago
Thanks for being real, bro. I absolutely agree with you. I barely even use ChatGPT anymore because it sucks.
@irql2 13 days ago
"Now its no better than any other LLM" -- do you really believe this? Seems like you do. That's certainly a take.
@kyleabent 13 days ago
I agree, man. I don't care about speed as much as I care about accuracy. I'll happily wait for a better response rather than rapidly go through 2-3 quick responses that need more time in the oven.
@mrbrent62 13 days ago ⁺¹
I also saw that there will be 20TB M.2 drives in a couple of years. Running this LLM locally will be really cool.
@joe_limon 13 days ago ⁺⁷
There seems to be a hold-up on the highest-end models as the leading companies continually try to improve safety while watching their competition. Nobody seems to want to jump in and release a new/better model at the risk of having the potential "dangerous" label applied to them. So a lot of the progress remains hidden in the lab, waiting for the competition to finally engage.
@steveclark9934 13 days ago ⁺¹
"Improve safety" really means neuter.
@davidk.8686 13 days ago
So far with LLMs, "data is code"... it is inherently unsafe unless something fundamentally changes.
@D0J0Master 13 days ago ⁺¹
How would this affect mixture of agents? Could we have multiple RouteLLMs combined together, since they use so much less compute?
@jamesvictor2182 13 days ago
Just popping up to say thanks, Matthew. You have become almost my only required source for AI news because your take is right up my street every time. Great work, keep it coming.
@CookTheBruce 12 days ago
Yes! The tutorial. Great video. Sharing with my crew... Just beginning an AI consultant agency, and cost is an existential threat!!!
@wardehaj 13 days ago ⁺¹
Thanks for this video. Very informative.
Please make a full tutorial on setting up RouteLLM and what the recommended local PC specs should be. Thank you in advance!
@MEvansMusic 2 days ago
Can this be used to route between agents as opposed to model instances? For example, routing to a chain-of-thought agent vs. a simple Q&A agent?
@madelles 13 days ago ⁺¹
It would be interesting to see how this works on your AI benchmark. Please do a setup and test.
@danielhenderson7050 13 days ago ⁺¹
I think you misrepresented the graph. The "ideal router" point on the graph is likely just that: the ideal. I don't think it's claiming actual results.
@MarcvitZubieta 13 days ago ⁺¹
Yes! Please, we need a full tutorial!
@NNokia-jz6jb 13 days ago ⁺⁵
So, how do you run it? And on what hardware?
@threepe0 12 days ago
Lmgtfy
@limebulls 13 days ago
Yes, please do a full setup!
@rilum97 13 days ago
You are so consistent, bro, keep it up 🙌
@socialexperiment8267 13 days ago ⁺¹
Thanks! As always, great! 🎯👍
@antonio-urbanculture 13 days ago
Yes, I really like your idea of a complete install-and-run tutorial. Go for it. 🙏 Thanks 👍
@kamilnowak4329 13 days ago
The only channel where I actually watch the ads. Very interesting stuff.
@MoadKISSAI 13 days ago
Always yes for a full tutorial
@KingMertel 13 days ago
Hey Matt, what are these routers exactly? (They are not LLMs, I understand.) And how do they determine where to route to?
@nate2139 13 days ago
This sounds interesting, but does it offer the same capabilities the OpenAI API offers with customizable assistants, RAG, and function calling? I have yet to find anything that compares. Would love to see something open source that can do this.
@johngrauel1661 13 days ago
Yes, please do a full tutorial on setup and use. Thanks.
@parimalthakkar1796 11 days ago
Would love a local setup tutorial! Thanks 😊
@dantfamily9831 13 days ago
I'd be interested in what hardware is needed to run something like this locally. I was waiting until late fall or early next year to buy, but I might need to get an intern system to train up. I am big on local control, except when I need to reach out.
@Ed-Shibboleth 13 days ago
That's good stuff. I will take a look at the codebase. Thanks for sharing.
@jlwolfhagen 12 days ago
Would love to see a tutorial on setting up RouteLLM! 🙂
@rafaeldelrey9239 12 days ago
The article used GPT-4, not GPT-4o, which already costs 50% of what GPT-4 does. Or am I missing something?
@sophiophile 13 days ago
After developing exclusively on GPT models, then joining an org with a ridiculous amount of free GCP credits and being pushed to use the Gemini family instead, I can honestly say that while differences on benchmarks may seem small, they end up being really extreme in practice. I spent days smashing my head against a wall trying to get Gemini to provide quality responses, and after switching to 4o, I was literally ready to deploy.
There still don't seem to be great benchmarks that represent the performance of generative models well.
@davieslacker 13 days ago
I would love to catch a tutorial of you setting it up!
@thecatsupdog 13 days ago
Does your local model search the internet and summarize a few web pages? That's what ChatGPT does for me, and that's all I need.
@harshshah0203 13 days ago ⁺¹
Yes, do make a whole tutorial on it.
@ralfw77 13 days ago
Hi Matthew,
I love your channel. I'm curious whether you would be willing to explore Pi AI? It doesn't compare to the others in the same way; maybe it's hard to test, but it's very interesting. It's trained to be empathetic, and you can actually have a voice conversation that feels satisfying.
@solifugus 13 days ago
Yes, please... A full tutorial on setting this up to run locally. Also, I'd like to know how to set up multimodal so I can show it my images and casually talk to it (locally).
@MagusArtStudios 13 days ago
The first thing I did a year and a half ago was route between different LLMs via a zero-shot classifier. Looks like RouteLLM has done the same thing, lol. I figured it was common sense.
@mafo003 13 days ago
I've seen you do tech dev before and would love to see you do this one as well, please.
@phieyl7105 8 days ago
The problem with this method is that there are some trade-offs. While it may be cheaper at answering a question directly, you sacrifice its social intelligence. Even if you get the right answer, the way the answer is phrased can be the difference between a toddler and a graduate student. Personally, I would want to talk with the graduate student.
@MattReady 13 days ago
I'd love a guide to easily set this up for myself.
@Idea-LabAi 13 days ago
Please do a tutorial. The performance also needs to be measured to validate the performance-cost graph.
@galdakaMusic 13 days ago
We need something local for non-difficult purposes, for example, local Home Assistant control.
@PatrickWriter 13 days ago
Yes, please make a tutorial on RouteLLM.
@AseemChishti 10 days ago
Yes, give a walkthrough video for RouteLLM.
@aleksandreliott5440 7 hours ago
I would love to see a tutorial on how to get this running locally.
@xhy20x 13 days ago ⁺¹
Please do a demonstration.
@macjonesnz 12 days ago
I think they are saying the brown dot is where an ideal LLM would be placed; I'm not sure that RouteLLM is better than Claude 3 Opus. So I'm not sure where on that chart their router actually is, probably down with Llama 3 8B, because its only job is to route.
@woszkar 13 days ago
Is this an LLM that we can use in LM Studio?
13 days ago
It's just a proxy that sends queries to one of two models, weak vs. strong. It's not a new LLM.
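The reply above describes the core idea: a router scores each query and forwards it to either a weak or a strong model. A minimal sketch in Python, assuming a hypothetical keyword-and-length heuristic as a stand-in for RouteLLM's trained routers (the names `strong_win_score`, `HARD_HINTS`, and `Router`, and the model labels, are illustrative and not RouteLLM's API):

```python
from dataclasses import dataclass

# Hypothetical hints that a prompt is "hard"; a real router would be a trained model.
HARD_HINTS = ("prove", "debug", "refactor", "derive", "optimize", "step by step")


def strong_win_score(prompt: str) -> float:
    """Return a rough score in [0, 1] for how much the prompt needs the strong model."""
    text = prompt.lower()
    hits = sum(hint in text for hint in HARD_HINTS)
    length_factor = min(len(text.split()) / 200, 1.0)  # longer prompts lean "hard"
    return min(1.0, 0.5 * hits + 0.5 * length_factor)


@dataclass
class Router:
    strong_model: str
    weak_model: str
    threshold: float = 0.5  # lower threshold = more traffic to the strong model

    def route(self, prompt: str) -> str:
        # Forward to the strong model only when the score clears the threshold.
        if strong_win_score(prompt) >= self.threshold:
            return self.strong_model
        return self.weak_model


if __name__ == "__main__":
    router = Router(strong_model="gpt-4o", weak_model="llama-3-8b")
    print(router.route("What is the capital of France?"))
    print(router.route("Please debug this stack trace and refactor the function"))
```

With these made-up rules, the trivia question goes to the weak model and the debugging request goes to the strong one. The threshold is the cost/quality dial: lowering it sends more queries to the expensive model.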
@imramugh 11 days ago
I'd love to see a demo if possible.
@hipotures 13 days ago
Reading and watching anything about AI is like a live broadcast of the Manhattan Project in 1942. Is the current year 1944?
@audiovisualsoulfood1426 9 days ago
Would also love to see the tutorial :)
@leonwinkel6084 13 days ago
For coding, this would be insane: mixed local and API endpoints.
@knecting 13 days ago
Hey Matt, please do a tutorial on setting this up.
@3enny3oy 13 days ago
You should consider including Semantic Kernel and GraphRAG in that ideal stack.
@executivelifehacks6747 13 days ago
I suspect these features, plus dedicated non-GPU hardware, will eventually reduce energy costs per "thought" to less than the human brain's. Currently, Perplexity using Sonnet 3.5 thinks GPT-4 uses 25x more.
@parthwagh3607 12 days ago
Yes, we need a detailed video.
@monnef 13 days ago
Promising, but a bit of a mess with naming. They use "GPT-4" to mean at least GPT-4 Turbo and GPT-4 Omni in various places. I am not even sure whether in some places they really mean the older GPT-4 model.
@martingauthier5245 12 days ago
It would be really cool to have a tutorial on how to implement this with Ollama.
@ashtwenty12 13 days ago
Could you do a tutorial on RAG (retrieval-augmented generation)? I think it'll be a pretty massive thing in agentic architecture. Also, I think RAG might soon be about more than just text and PDFs 😂 in the not-too-distant future.
@geekswithfeet9137 11 days ago
Every single time I've seen a claim like this, the output in real usage never compares.
@orthodox_gentleman 13 days ago
This wasn't just released; it had been around for a while. Now that GPT-4o and Claude 3.5 Sonnet exist, things are much cheaper. I can understand using a local LLM with these two, but overall the cost savings are not as big a deal as before.
13 days ago
The API for Claude and GPT is still expensive.
@ritviksinghal9190 13 days ago
An implementation would be interesting.
@MPXVM 13 days ago
If it runs on a local machine, why does it need OPENAI_API_KEY?
13 days ago ⁺¹
Because it still needs to query weak models (like Mistral) and strong models (like GPT).
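The reply above is the key point: the router itself can run locally, but any model behind it that is a hosted API still needs that provider's key. A minimal sketch, assuming a hypothetical setup where the weak model is served locally and the strong model is hosted; the `Endpoint` class, the `resolve_api_key` helper, and the example model names and URLs are illustrative assumptions, not RouteLLM's configuration:

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Endpoint:
    model: str
    base_url: str
    api_key_env: Optional[str]  # None: no key needed (e.g., a local server)


def resolve_api_key(endpoint: Endpoint, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the API key an endpoint needs, or None for keyless local models."""
    if endpoint.api_key_env is None:
        return None
    key = env.get(endpoint.api_key_env)
    if not key:
        raise RuntimeError(
            f"{endpoint.api_key_env} is not set, but {endpoint.model} is a hosted model"
        )
    return key


# Hypothetical pairing: local weak model, hosted strong model.
WEAK = Endpoint("llama3", "http://localhost:11434/v1", None)
STRONG = Endpoint("gpt-4o", "https://api.openai.com/v1", "OPENAI_API_KEY")
```

In this sketch only queries routed to the hosted strong model need the key; if both models were local, no key would be required at all.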
@angelwallflower 12 days ago
Yes, I vote for a setup tutorial, please. Thank you.
@andresfelipehiguera785 13 days ago
A tutorial would be great!
@BradleyKieser 13 days ago
Yes please, do the tutorial.
@calvingrondahl1011 13 days ago
Thank you Matt🖖🤖👍
@sapito169 13 days ago
Wonderful.
Now you can offer a low-cost service and a premium service at different prices.
@RaedTulefat 13 days ago
Yes, please. A tutorial!
@HawkX189 13 days ago ⁺¹
Let me launch this... For now, online models are only saving themselves because of context.
@Alice_Fumo 13 days ago
I really don't find this to be a big deal. I expect people select the model themselves on a per-task basis, based on what they believe is most appropriate for the task. For me, the decision process is really simple:
1. Is it code, or does it require complex problem-solving? -> Claude 3.5 Sonnet
2. Do I want to have a deep conversation with a creative partner? -> Claude 3 Opus
3. Is it anything the other models would refuse? -> GPT-4o
4. Is it too private for any of the above? -> Local LLM
I don't need a router for this, and I wouldn't trust it to reliably choose the same way I would either.
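For comparison, the manual decision process in the comment above can be written as an explicit rule table. A sketch, assuming the four questions are answered by the user as boolean flags; the `Task` fields and `pick_model` are hypothetical names, and the fallback model is an assumption because the comment names none. Privacy is checked first, since rule 4 says "too private for any of the above":

```python
from dataclasses import dataclass


@dataclass
class Task:
    is_code_or_complex: bool = False
    wants_creative_conversation: bool = False
    likely_refused_elsewhere: bool = False
    is_private: bool = False


def pick_model(task: Task, fallback: str = "GPT-4o") -> str:
    """Apply the commenter's rules; the fallback is a hypothetical default."""
    if task.is_private:  # rule 4 overrides the others
        return "Local LLM"
    if task.is_code_or_complex:  # rule 1
        return "Claude 3.5 Sonnet"
    if task.wants_creative_conversation:  # rule 2
        return "Claude 3 Opus"
    if task.likely_refused_elsewhere:  # rule 3
        return "GPT-4o"
    return fallback
```

The difference from RouteLLM is who sets the flags: here a person answers the questions, while a learned router has to infer the answer from the prompt text alone.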
@keithhunt8 12 days ago
Yes, please.🙏
@mickmickymick6927 13 days ago
Even GPT-4o or Sonnet 3.5 can't answer 95% of my queries, so I don't know what queries you have that local models usually handle fine.
@nashad6142 13 days ago
Yessss! Go open source
@davidk.8686 13 days ago
When "data = code", how can you have security while having an actually useful, powerful AI?
@mikezooper 13 days ago
It doesn't change anything. LLMs are good at certain tasks (most of which aren't as useful as we need, and most don't help us earn money). AI has plateaued. They haven't replaced software engineers.
@heltengundersen 3 days ago
Claude 2.5 Sonnet missing from the chart.
@tytwh 13 days ago ⁺¹
Do you and Wes Roth collaborate? They uploaded an identically titled video 2 hours ago.
@matthew_berman 13 days ago
No. He just copied me exactly... again
@opita 13 days ago
Can you please look into the Alloy voice assistant?
@samuelopoku4868 12 days ago
If I could like and subscribe harder, I would. A tutorial would be fantastic, thanks 👍🏿
@Kutsushita_yukino 13 days ago
How??
@rawleystanhope3251 13 days ago
Full tutorial pls
@mdubbau 13 days ago
Please do a tutorial on setting it up.
@keithycheung 12 days ago
Please do a tutorial!
@分享免费AI应用 12 days ago
90% GPT4o Quality? More like 100% snake oil! Where do I sign up for this "RouteLLM" deal?
@trelligan42 13 days ago
@7:07, "causal" not "casual". #FeedTheAlgorithm
13 days ago
While this looks promising, it is just a router that forwards simple queries to weak models and hard queries to strong models. This assumes that queries can be divided between strong and weak models. If your work is truly intensive, I don't see much reduction here, as it still requires querying strong models most of the time.
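The comment's point can be made concrete with the blended cost of a two-model router, ignoring the router's own overhead. A sketch with hypothetical prices (strong = 10.0, weak = 0.5, in arbitrary units per million tokens); these are illustrative numbers, not real price quotes or figures from the paper:

```python
def blended_cost(strong_fraction: float, strong_price: float, weak_price: float) -> float:
    """Average price per unit of traffic when strong_fraction goes to the strong model."""
    if not 0.0 <= strong_fraction <= 1.0:
        raise ValueError("strong_fraction must be in [0, 1]")
    return strong_fraction * strong_price + (1.0 - strong_fraction) * weak_price


def savings_vs_all_strong(strong_fraction: float, strong_price: float, weak_price: float) -> float:
    """Fractional saving compared with sending every query to the strong model."""
    return 1.0 - blended_cost(strong_fraction, strong_price, weak_price) / strong_price


STRONG_PRICE, WEAK_PRICE = 10.0, 0.5  # hypothetical prices

for share in (0.2, 0.5, 0.9):
    saving = savings_vs_all_strong(share, STRONG_PRICE, WEAK_PRICE)
    print(f"{share:.0%} to strong model -> {saving:.1%} cheaper")
```

With these made-up prices, routing 20% of traffic to the strong model saves about 76%, while routing 90% saves under 10%. The saving depends almost entirely on how much of the workload the weak model can handle.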
@chrismann1916 12 days ago
Now, who has this in production?
@WylieWasp 12 days ago
4:59 You lost me completely with Langtrace! What does it do, and why would I want it?
@user-em2hr4gj1f 13 days ago
Can you make a comparison video with LangGraph X GraphRAG?
@yazanrisheh5127 13 days ago
Do a tutorial on this, please.
@jackbauer322 13 days ago
Don't ask in the comments each time, JUST DO IT!!!
