Why I'm Staying Away from Crew AI: My Honest Opinion

  • Published Jun 15, 2024
  • Crew AI is not suitable for production use cases. I’ll be going through why I believe this is the case and what you should do instead when building your own apps.
    Need to develop some AI? Let's chat: www.brainqub3.com/book-online
    Register your interest in the AI Engineering Take-off course: www.data-centric-solutions.co...
    Hands-on project (build a basic RAG app): www.educative.io/projects/bui...
    Stay updated on AI, Data Science, and Large Language Models by following me on Medium: / johnadeojo
    Is AutoGen just HYPE? Why I would not use AUTOGEN in a REAL use case, Yet: • Is AutoGen just HYPE? ...
    GitHub repo: github.com/john-adeojo/crew_a...
    Multi-hop questions: arxiv.org/pdf/2108.00573
    Chapters
    Introduction: 00:00
    Intro to multi-hop questions: 01:31
    Schematic of the agent workflow: 06:42
    Crew AI Python Code: 10:29
    Testing the Crew AI Workflow: 24:48
    What’s next for Multi-Agent frameworks: 47:50
  • Science & Technology

Comments • 116

  • @HassanAllaham · several months ago · +40

    This is one of the best videos I have ever seen related to AI. Let me list some points:
    1- Do not ever expect to have acceptable costs as long as you are depending on ClosedAI.
    2- I am with you that such frameworks are not production-ready.
    3- In my opinion, such a framework can be useful if an easy way to modify the hidden prompts is available.
    4- Such a framework can be useful if there is a manager agent (the only one that needs a strong LLM) and the other agents depend on small LLMs. Breaking the task into small "easy" tasks should make this doable with small (open-source) LLMs.
    5- The availability of custom tools for each agent should help (the agent that does the main search should be different from the agent that reads what is inside each search result) - specialization leads to creativity. This way, we can give just one agent a direction forcing it to mention the URLs of the sources from which it built its answer.
    6- I think no agent flow can be truly autonomous as long as it does not have a self-reflection mechanism, i.e. a self-improvement mechanism.
    7- When trials or expectations show that the result from one agent might be long, I think it would be better to add an agent just to summarize that result and replace the original result with the summary in the workflow history.
    Anyway, thanks for the good content. 🌹🌹🌹

    • @Data-Centric · several months ago · +1

      Thanks for your comment, I'm glad you found it helpful. Broadly agree with the points you've raised here.

    • @MindVaccine · several months ago · +4

      I really appreciate your comment and agree with your points, especially #1 and #4. I disagree with one of @Data-Centric's conclusions: that the future is large AI models that can just perform these tasks directly. While he may be right about the ability, that is not the issue. It will be one of cost. As you said, if I break the problem into simple tasks/agents, I get to use simple - and cheap - open-source LLMs. There are times when a large model is required, perhaps for planning or reporting agents, but you really want to keep the ClosedAI model usage to a minimum.
      Furthermore, I expect that the opensource models will improve as well, so that as they continuously improve, each of my agents will improve over time as well. Given enough advancement, the opensource AI models may be able to replace any usage of closedAI, saving even more money in the long term.
      And then looking to the future, I can expect a time when the cost of training/fine-tuning models will come down. I will actually be able to use my agents, and the data I collect in production, as training content to train/fine-tune my own models that are based on free opensource models. Now I can create models that will outperform any closed-ai model for even less money. And I still have the option of using a closed-ai should I need it.
      And for my last point, I know how enthusiastic the industry is about ChatGPT4, but my experience with it is not so positive. Yes, it has very broad knowledge, but I find it is horrible at following instructions. I have to wonder if those hyping it have actually used it for anything more than just chatting. I would be interested in other opinions on this...

    • @HassanAllaham · several months ago · +1

      ​@@MindVaccine ​ I agree with you.
      Regarding "the future is large AI models": I do not think this is true, but it may become true in the "far" future.
      This is because the essential principle that today's LLMs are built on is calculating the probability of the next tokens (predicting the next tokens), which by itself cannot be a true imitation of human intelligence. We have to understand the real meaning of "probability". Mathematically, this depends on the numeric "resolution", i.e. how many digits after the decimal point your GPU can use, which in turn depends on the way computation is processed today: binary.
      I believe this may change in the future only when analog computers coupled with quantum-based hardware become publicly available. Only then can you expect to reach what is called AGI.
      When you read research you find the words "we discovered". This means discovered by chance and as a result of trials. You don't read "we expected and calculated, then checked that calculated expectation and found it true". Why? Simply because you cannot get the exact same answer to the same question (the result of probability).
      If research keeps running in the same direction, there will always be a need for huge computation power. There are hundreds of very simple questions you may ask GPT-4 and get very strange wrong answers. There are many simple prompts with which you can jailbreak LLMs (overcome the guard locks).
      Ask the same questions of a fine-tuned small open-source LLM and you can get good answers.
      To be sure that a task performed by an LLM will be 99.95% successful, you need to fine-tune the LLM for that task. While fine-tuning a large model like GPT-4 needs huge computation power, you can fine-tune a small open-source LLM on a good consumer-grade PC. That is the real meaning of "specialization leads to creativity" above, and that's why I believe a specialized small LLM is better than a giant one.
      In my opinion, the only good use of a large LLM is to generate the clean datasets needed to fine-tune small open-source LLMs. Other than this, using something like GPT-4 is not cost-effective at all.
      Although the benchmarking methods used to estimate LLM performance are not good enough to reflect real results in real apps, the big variations between results of different LLMs can still tell you which is better - and one can find a small open-source LLM that beats GPT-4 at a specific task.
      Concerning following instructions: that is why there is an "instruct" version of many LLMs, meaning that version was trained to obey the user's instructions. The problem becomes clear when you use an LLM with an increased context window: the larger the context window, the more "horrible" the LLM becomes at following instructions. In my opinion, one weakness of benchmarking is using only a single needle-in-a-haystack test for such LLMs. In GPT-4's case, the problem is not related to context window size, since it is backed by huge computation power, but to the way ClosedAI adds its guard locks.

    • @mickelodiansurname9578 · several months ago · +2

      Or alternatively, simply TELL the model in the prompt to be concise. Prompt it to conserve tokens - or add prompting to give it a budget it also needs to keep an eye on!
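
That budget idea can be sketched as request construction: the budget goes into the system prompt, with `max_tokens` as a hard backstop at the API level. The model name and budget figure below are illustrative, not from the video:

```python
def budgeted_request(task: str, budget: int = 150) -> dict:
    """Build a chat request that asks the model to watch its own
    token budget, enforced by a hard max_tokens cap."""
    return {
        "model": "gpt-4-turbo",   # illustrative model name
        "max_tokens": budget,     # hard cap, in case the model ignores the prompt
        "messages": [
            {"role": "system",
             "content": f"Be concise. You have a budget of about {budget} "
                        "tokens for your entire answer; keep an eye on it "
                        "and do not exceed it."},
            {"role": "user", "content": task},
        ],
    }

request = budgeted_request("Summarise the pros and cons of multi-agent frameworks.")
```

The dict would then be passed to the provider's chat-completion call; the prompt makes the model self-police, while `max_tokens` guarantees a worst-case cost per call.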

    • @MindVaccine · several months ago

      @@HassanAllaham Good to know I'm not alone. I really think these closedAIs are all just hype. I have been using TheBloke/Mistral-7B-Instruct-v0.2-Q8_0, and, given the right prompt, it gives me consistent results and follows my instructions. Try the same with any of the large closedAIs and you will get very inconsistent results that are all over the place. I just don't see any of the closedAIs being useful in production today. And those pushing these closedAI models - I don't think any of them have tried using them in a production environment. Like I said, it is all just hype! And I'm getting these results without training. Once I have good training data, I don't see how any closedAI could compete far into the future. And again, there is nothing that says I can't use a model like GPT-4 if I really have the need.
      By the way, just telling GPT-4 to be concise is hopeless. The more they train it so as not to be jailbroken, the worse the results are. In my opinion, they are not just training it to refuse certain requests; they are training it to never give a consistent answer. Telling a model not to respond for some agenda, right or wrong, will, in my opinion, always degrade the model's performance. My experience is that GPT-4 is getting WORSE WITH TIME, not better. And I have not seen any benchmark for consistency of a model.
      And then there is the elephant in the room: context size. The larger the context, the better for almost any application. But use a large context with GPT-4 and watch your tokens wash away. And the larger the context, the worse the performance. But I am starting to see some of the open-source models getting larger context windows without sacrificing performance. This is where I think open-source models will shine and outperform their closed-source brothers - just so many more people experimenting and trying out new methods.

  • @elcaribbeannomad2079 · several months ago · +4

    I started using CrewAI one month ago and I detected all the same problems John detailed in the video. Not having control over the flow of the software is hard to handle when you are used to designing and implementing complex solutions and algorithms. I think João (the founder of CrewAI) knows it, and that's why the "Draft Gmail New Emails" example introduced LangGraph, to get a little more control over the flow - but it doesn't solve the inefficient and unneeded token utilization. It is important to know that these problems are not exclusive to CrewAI; AutoGen suffers from the same disease. The idea of developing with the future of LLMs in mind is something I didn't have on my radar; it makes total sense to me. Great video, keep working John!

  • @TheFocusedCoder · several months ago · +14

    This has been my experience with the frameworks. Going custom is where I’m likely going to end up

    • @rudomeister · several months ago · +1

      With custom tools and agents, everything is possible. Why not get a team to create their own group-hierarchy tree of agents on demand? It's possible to do. Giving agents sub-processing with the prompt "Hack NASA", with 100% subprocessing commands, yes, is possible as well. But blaming the old laptop running inside the TV bench won't help when the CIA breaks down your door.. haha

    • @HassanAllaham · several months ago

      @@rudomeister Dangerous funny example of one of the most powerful techniques that can be used to get the maximum power of AI

    • @Maisonier · several months ago

      Custom how?

    • @TheFocusedCoder · several months ago

      @@Maisonier Frameworks are just a set of patterns and abstractions packaged up in a library. Not saying it's easy, but building a smaller-scope library instead of using third-party frameworks is pretty common in software engineering across the board/industries. Most frameworks come from what people are often building on their own; someone just packages it up. Harrison from Langchain himself has said this, for example.

    • @thatsalot3577 · several months ago

      @@Maisonier Most of these frameworks just add extra boilerplate prompts that are appended to your queries for a specific behaviour; they turn flaky text-to-text LLMs into proper, predictable input and output formats.
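
A minimal sketch of that wrapping pattern, assuming a JSON output contract; the `fake_llm` stub stands in for a real model call:

```python
import json

# Boilerplate appended by the framework, not written by the user:
BOILERPLATE = (
    'You are an agent. Respond ONLY with a JSON object of the form '
    '{"thought": "...", "answer": "..."} and nothing else.'
)

def structured_call(llm, query: str) -> dict:
    """Turn a flaky text->text LLM into a predictable text->dict interface."""
    raw = llm(f"{BOILERPLATE}\n\nTask: {query}")
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Flaky models sometimes wrap the JSON in prose; fail loudly here.
        raise ValueError(f"Model broke the output contract: {raw!r}")

# Stub standing in for a real API call:
def fake_llm(prompt: str) -> str:
    return '{"thought": "simple arithmetic", "answer": "42"}'

result = structured_call(fake_llm, "What is 6 x 7?")
print(result["answer"])  # → 42
```

Real frameworks layer retries, tool descriptions, and role prompts on top, but the core contract is the same.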

  • @Bana888 · several months ago

    Great walkthrough. Really like the level of detail, the analysis, and the recommendations at the end. Awesome video. Keep up the great work.

  • @benh8199 · several months ago · +2

    What are your thoughts on Agency Swarm by VRSEN? I haven’t used the framework yet but the author claims you can customize all prompts, including the framework prompts. In the author’s videos he also claims autogen and crewai are not good for production, whereas agency swarm is. Would love to hear your evaluation/opinion about this framework.

  • @ARCAED0X · several months ago · +3

    Hey Data Centric, great video breakdown you have here. When these GPT models first came out from OpenAI, I thought I should be doing everything with AI - all of the workflows I want to get done. But in fact I was wrong. Really, you need to know the ins and outs of your workflow, and you need a way to force the system to produce reproducible outputs. This comes from narrowing the problem space or providing the system with solutions to mimic. I'm thinking about making a video to demonstrate this, or a short blog post of sorts. I don't believe AGI will be the solve for all our problems; narrow solutions to our individual problems is the way.
    The best we could get with the next model from OpenAI is faster, cheaper inference + better understanding of prompts, so that we may simply declare what we want the AI to do, like a person, and pair this up heavily with automation of the parts of the system we know and understand well. Think of AI as a small bridge, not necessarily a car to take you somewhere.

  • @RafiDude · several months ago · +3

    Very good analysis! Could you please do a similar video on LangGraph?

  • @Whiskey9o5 · several months ago

    I just found you and subscribed. You came to the same conclusion I have. I dug into these frameworks and came to the same conclusion to build my own framework from the ground up for my use cases. The other part is that I use the Julia language.

  • @madhudson1 · several months ago

    Great video. Had success with some hobby projects using langgraph. Experimented with crewai, but felt exactly like you mentioned regarding loss of control

  • @trsd8640 · several months ago

    Thank you a lot for this video. Very important!

  • @augmentos · several months ago

    Thank you! More honest takes needed in agentic space!!!!

  • @oelberdomingos · several months ago · +1

    One of the problems is in the RAG model: it does not do good reasoning. There are more approaches today, such as graph search (much better for "wikipedia"-style content), that you can use, along with other workflows for better reasoning. But I don't know if you can use them with CrewAI.

  • @HarpaAI · 24 days ago · +1

    🎯 Key Takeaways for quick navigation:
    00:30 *🧩 Explanation of Multihop Questions*
    - Multihop questions are designed to be challenging by requiring preceding knowledge.
    - Questions are structured as linear or parallel decompositions.
    - Understanding the structure of multihop questions is essential for building effective agent workflows.
    06:39 *🗂️ Overview of Agent Workflow in Crew AI*
    - The agent workflow in Crew AI consists of a planning agent, search agent, integration agent, and reporting agent.
    - Each agent has a distinct role in the workflow, from breaking down questions to organizing information and delivering responses.
    - Feedback loops between agents help refine the investigation process and ensure accuracy in responses.
    10:25 *🤖 Setting up Tasks and Descriptions in Crew AI*
    - Tasks in Crew AI are assigned to specific agents and include detailed descriptions, expected outputs, tools required, and contextual information.
    - Different agents have different responsibilities, such as conducting searches, organizing information, or delivering final responses.
    - Providing clear and concise descriptions for tasks helps guide the behavior of each agent in the workflow.
    23:29 *🤖 Types of workflows in Crew AI*
    - Explaining sequential and hierarchical workflow structures in Crew AI.
    - Highlighting the role of the manager LLM in hierarchical workflow.
    - Differentiating between sequential and hierarchical operations.
    24:47 *🔄 Testing multi-agent workflow in Crew AI*
    - Setting up tasks in Crew AI for multi-agent workflow testing.
    - Tracking OpenAI API usage costs before running the workflow.
    - Comparing the speed of agent workflow in Crew AI to Autogen in a production scenario.
    41:21 *💰 Cost analysis of complexity in question answering*
    - Analyzing the cost of answering a two to three-hop question in Crew AI.
    - Expressing concerns about the high cost of search operations in Crew AI.
    - Discussing the potential challenges of using Crew AI in production due to cost implications.
    48:08 *🤖 Issues with Crew AI*
    - Crew AI lacks interpretability compared to autogen
    - Inconsistency in the output of multi-agent workflows
    - High cost for running workflows limits practical use cases
    50:03 *🛠️ Pros and Cons of Crew AI*
    - Easy setup for multi-agent workflows in Crew AI
    - Suitable for experimentation and prototyping, not for production
    - High cost and inconsistency are critical barriers to adopting Crew AI
    50:30 *🔮 Future of Multi-Agent Frameworks*
    - Multi-agent frameworks like Crew AI and autogen are limited to current model capabilities
    - As language models improve, the need for multi-agent frameworks may diminish
    - Custom workflows may be more beneficial for specific production applications
    Made with HARPA AI

  • @tonycarter8440 · 17 days ago

    New subscriber here, great content! I'm evaluating several other Agent frameworks, your insight was very valuable.

  • @nicocesar · several months ago

    Very honest review, and I agree with most points. The conclusion is spot on. As humans we organize work with standardized processes, and I feel these frameworks are matching that and wrapping agents around it. I wonder if we will find another way to organize work in the future with more powerful agents.

  • @NoCodeFilmmaker · several months ago · +6

    Phi-3 through Ollama on a Pi 5 using CrewAI - everything local and running acceptably.
    You also need to add prompt logic to reduce unnecessary searches.

    • @Sergio-rq2mm · several months ago · +1

      I'll have to check this out, but I had pretty terrible results running local models with CrewAI. I used Llama 3, Mixtral, Mistral, etc. Never tried Phi-3, though I did try it with LangGraph and it couldn't consistently use the tools that were available to it.
      I'm curious about your experience and what you have tested with it.

    • @daviddiligentful · 28 days ago · +4

      I have lots of problems with local LLMs and search tools. In fact, they are never able to use the search tools in the first place...

    • @shimotown · 17 days ago · +1

      Prompt logic where? There are decisions being made within the LLM chain already. Do you mean hardwiring logic with Python?

  • @carinebruyndoncx5331 · several months ago

    I feel the reasoning example is more of an LLM test than the kind of automation where you would use a multi-agent framework.

  • @randyh647 · several months ago

    In my experience it has trouble with the third round of adding features to existing code; then I switch to a regular AI and just give it small tickets. Also, playing around with different models may work too!

  • @free_thinker4958 · several months ago · +1

    Personally speaking, I found that LangGraph adds more controllability if you want to use current agentic frameworks. I also noticed that memory in agents plays a big role in self-improvement and learning from past experiences; I tried the memory feature in CrewAI and it is not that bad, and also in AutoGen (the teachability feature). I would like it if you could do a similar video with the Agency Swarm framework this time; it looks promising and has more controllability.

  • @JCMShadow1994 · several months ago

    Thanks for covering this topic. Most discussions aren’t honest about their usefulness in prod

  • @darkesco · 25 days ago

    I need to find a good AI agent framework. I successfully accessed ComfyUI through the API and want agents to generate images for me as well as evaluate them for quality. I started with CrewAI, but I'm curious to know which agentic software is better for this type of project.

  • @avg_ape · several months ago

    Outstanding review. Thank you. Yes, the frameworks are nice to learn from and test proof of concepts (POCs). However, I see the frameworks to be analogous to 'no-code' app platforms - easy to deploy but a challenge to scale and improve.

  • @tonyppe · 16 days ago

    I have a similar simple setup while I play and learn. Start small, get it working. There are a TON of things that aren't common knowledge or are undocumented in CrewAI, but it does work. And while each LLM gives different results - even the same LLM will give different results for the exact same prompt - I have had successes with open-source local LLMs.
    The power of it is when you start getting more complex configs. This is where you now need to be a software dev and a data science grad. I am neither of these, so I find it an extremely difficult hurdle to get over and I get stuck a lot.

  • @johnpaulgorman · 17 days ago

    Yes, I hit the same issues and so many more when used with local Ollama models: looping issues, agents unable to find co-workers, lots of errors thrown by the framework from language model response mismatches. The issues list is also growing rapidly, with little sign of resolution even for small issues or items already resolved. Lack of observability was a real pain. The lack of local LLMOps was also a pain; I linked to an external LLMOps site but found it useless given the local model outputs.

  • @DavidYang-kd8qr · several months ago · +2

    Very informative ❤

  • @JulianHarris · several months ago · +2

    What about using Claude Haiku? It’s a bunch cheaper I think

  • @strength9621 · several months ago · +2

    Just the type of channel I’ve been looking for

  • @krisvq · 24 days ago · +3

    I think this "agent swarms" idea is not the way to go, at all. It makes no sense. You don't need 15 agents to answer simple questions. I think the solution is to reduce the use of LLMs for everything that can be solved with designated function calls. The way to go is to build functions that address desired use cases and use LLMs for summarization of results. If you combine knowledge databases, search engines, function calls, and summarization, this can be done at a fraction of the cost. I can see a scenario where a single agent instance is well instructed to run a sequence of logical functions. Even a few agents would be OK. In this case you'd use a lot fewer tokens.
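
That pipeline - deterministic function calls first, a single LLM call to summarise at the end - can be sketched with stubs (all helper names here are hypothetical stand-ins):

```python
# Deterministic, cheap steps: no LLM involved.
def query_knowledge_base(question: str) -> list[str]:
    return [f"KB entry matching {question!r}"]     # stand-in for a real DB lookup

def web_search(question: str) -> list[str]:
    return [f"search hit for {question!r}"]        # stand-in for a search API

# The single LLM call, reserved for summarising the gathered facts.
def summarise(snippets: list[str]) -> str:
    return "Summary of: " + "; ".join(snippets)    # stand-in for one cheap model call

def answer(question: str) -> str:
    """One instructed sequence of function calls, then one summarisation."""
    facts = query_knowledge_base(question) + web_search(question)
    return summarise(facts)

print(answer("Who founded DeepMind?"))
```

Because only the final step touches a model, token usage scales with the answer length rather than with the number of agents chattering at each other.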

    • @Data-Centric · 23 days ago

      Thanks for the feedback.

  • @JulianHarris · several months ago · +3

    Seems to me the main issues highlighted are: 1. value over direct prompts, 2. prompt engineering issues (specifically trying to get it to provide references in this case), 3. cost, 4. latency.
    1. You're never going to get fresh results from an LLM: web search is essential for this.
    2. To me, the prompt engineering issues don't look much different from what you'd normally have. I wonder what prompt would result in references being included? DSPy tries to automate this, btw.
    3. Cost is on a downward trend. I'd love to know, for example, how Claude Haiku performs, or Llama-3-70B.
    4. Latency: for sure it is a batch / offline / task-switch scenario, totally agree. For now. Try it with Groq though - the LPU is 10x faster.

    • @Data-Centric · several months ago

      You have some solid points, thanks for raising them. I tried DSPy several months ago; I might revisit it. Just on your first point: there was a web search tool used by one of the agents.

    • @prodigroup · several months ago

      Try Guidance “guidance-ai” and DSPy for comparison.

  • @justrobiscool4473 · several months ago

    Have you tried the maestro framework?

  • @MarkoTManninen · several months ago

    I have been thinking of utilizing intermediate decision-making agents. At the moment, an agent needs to decide whether the knowledge is general, what tool to use, construct arguments, explain its reasoning, etc. It seems to be too much. Possibly cheaper models, maybe even free local models, can do a lot of the repetitive tasks and produce simple control-flow logic so that unnecessary steps can be skipped.
    I tested agents 10 months ago and concluded that they were for deep pockets. Already the development phase becomes expensive, not to speak of production. Hence I count on local models getting on par with GPT-4 this year. Then the best models can do the heavy lifting and the inference that requires state-of-the-art reasoning.

  • @user-du6zo7zp2k · 26 days ago

    I tend to agree with the future of agents. Bigger closed source models are going to get smarter and will probably out-perform multiple agents in an agentic workflow. But the rise of open source and local models with fine-tuned specialisation and hallucination reduction techniques could still perhaps make a competitive solution that runs much cheaper and more privately.

  • @niceplace123 · 9 days ago

    Hm.. I just fed your questions to Mistral's regular chat with large model without any agents, and it provided the answers. Am I doing something wrong?

  • @MartinBlaha · several months ago

    I stopped experimenting with CrewAI after my initial experiments produced over USD 10 of API costs in just one afternoon. Sure, it's probably me not knowing enough about CrewAI. But I think my biggest issue is the lack of transparency about how the agentic process is executed. Again, I'm just a beginner, but for now it's just a black box to me.
    Next, I'll be experimenting with AutoGen and local LLMs ;-)
    Thanks for sharing your thoughts.

  • @gileneusz · 14 days ago

    Thank you for the great feedback on this framework. They are getting better and better, but LLM quality is the bottleneck... we need true AGI, but that will maybe be next year 😆

  • @ilanlee3025 · 16 days ago

    Excellent video. Liked and subscribed

  • @jasonb_ · several months ago · +1

    My advice is to use the sequential process (and try to minimize prompts for goals/tasks) for Q&A-style applications. Hierarchical seems buggy and, yes, not ready for production. Crew is quite a fresh project with plenty of space to grow.

    • @Data-Centric · several months ago

      The only issue with the sequential approach is implementing feedback loops.

    • @jasonb_ · several months ago

      @@Data-Centric You can use delegation and max_iter in the agent setup with sequential.

    • @williamwong8424 · several months ago · +1

      @@Data-Centric Do you mean that because it's sequential, you can't give feedback in the middle of the process, or that when you give feedback at the end of the process it is too late? When you have to run the agents again, it has to start all over.

  • @lavamonkeymc · several months ago

    Did you test with LangGraph??

  • @alexemilcar6525 · 19 days ago · +1

    Great explanation video, but bad use case for multi agents framework

  • @ilanlee3025 · 16 days ago

    Funnily enough, as someone with ADHD who hates reading a lot, I can give you a tip to save a lot of money: you can prompt the chat agents for how long you want the answer to be. You can say "keep the answer to 1 paragraph".

  • @vastvitamins1966 · several months ago

    Great video. I think most of the problems you had may be due to a lack of prompt engineering. For example, specify the type of output you want, such as an output length of no more than what you desire - a paragraph, say; this should save on tokens. Also, when dealing with agents it's important to repeat important info a few times. But as I said earlier, this is a great video and a good topic to bring to people's attention. Thanks for sharing.

    • @Data-Centric · several months ago · +1

      Thanks, appreciate the insights here!

  • @andrewowens5653 · several months ago · +2

    Thanks, that was a great explanation and tutorial. I could literally write 10,000 words in response, but that would not be practical. I've studied all the trends in AI since 1977, but I've also spent 25 years studying cognitive neuroscience and related subjects, so I have a different take on the way AI should be implemented. My own personal research involves the creation of a brain-inspired cognitive architecture. I'm considering the possibility of putting a very small language model at the core. The system would be designed to learn from experience, instead of being force-fed the internet. Anyway, my experience with Crew AI has been anything but good so far. Have you considered using something like Ludwig for fine-tuning and LoRAX for serving on your local system? If you could get that to work, you could save 95% of your expensive ChatGPT calls. You could use ChatGPT to create a custom training dataset for backpropagation or PEFT of a smaller open-source model. I'm looking forward to your future content. Thanks again.

    • @ayoubfr8660 · 6 days ago · +1

      A conversation with you regarding AI and a few ideas would be invaluable, man!

  • @BradleyKieser · several months ago

    Always really good content from this guy, he's brilliant.

    • @Data-Centric · several months ago

      Thank you for the support.

  • @GregPeters1 · several months ago

    Well done

  • @vroep6529 · several months ago · +1

    In my experience Claude is much cheaper / better than GPT-4; Sonnet has consistently given me great results, and even Haiku too. I, however, am running it in a custom Python solution where it recursively breaks down and summarizes what it is doing along the way. By using these smaller, cheaper models I am able to run many parallel prompts, which I believe creates a better end output. It was, for instance, able to create a web scraping module, test it, and later use it by itself to draw information related to another question (this cost under 10 cents in total). It does still have problems, and it is sort of capped at small projects/tasks, as it debugs stuff line by line (it tries running the program and captures the output, so it can only fix ONE error at a time; as problem complexity increases, the cost/time scale increases exponentially). Interesting video, I appreciate your content.
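
That run-capture-fix cycle can be sketched with the standard library; the `ask_model_to_fix` stub below stands in for the actual LLM call, and its toy "fix" is purely for the demo:

```python
import subprocess
import sys
import tempfile

def first_error(source: str) -> str:
    """Run a script in a subprocess and return its stderr ('' on success)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name
    proc = subprocess.run([sys.executable, path], capture_output=True, text=True)
    return proc.stderr

def ask_model_to_fix(source: str, error: str) -> str:
    """Stub for the LLM call that patches ONE error at a time."""
    return source.replace("pirnt", "print")  # toy 'fix' for the demo

def debug_loop(source: str, max_rounds: int = 5) -> str:
    """Run, capture the first error, ask for a fix, repeat."""
    for _ in range(max_rounds):
        error = first_error(source)
        if not error:
            return source  # script runs cleanly
        source = ask_model_to_fix(source, error)
    return source

fixed = debug_loop('pirnt("hello")')
print(first_error(fixed) == "")  # → True
```

Since each round is one full execution plus one model call, cost grows with the number of residual errors, which matches the exponential blow-up described above for complex problems.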

    • @vroep6529 · several months ago

      On another important note, I get much better responses when I provide JSON objects and demand responses as JSON objects; only at the end does it summarize into actual natural language. The models seem to have a much easier time understanding exactly what to do if they reformat your task into a JSON object themselves. This way they can interconnect things that would be hard to hardcode; for instance, the model might need an action that is not available yet, or something might be a task instead of a question, etc.
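
The JSON-in, JSON-out pattern above can be sketched like this; `extract_json` is a hypothetical helper for pulling the object out of a reply that may wrap it in prose.

```python
import json

def extract_json(reply: str) -> dict:
    """Parse the first {...} span in a model reply.
    Naive: assumes no stray braces in the surrounding prose."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in reply")
    return json.loads(reply[start:end + 1])

reply = 'Here is the plan: {"action": "web_search", "query": "second president of Namibia"} Let me know!'
plan = extract_json(reply)
```

Demanding JSON also makes failures detectable: a reply that doesn't parse can simply be retried instead of silently accepted.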

  • @yoyartube
    @yoyartube months ago +2

    You can set the model to streaming and you'll start to see the result output sooner.

    • @ShaunPrince
      @ShaunPrince months ago +1

      You don't need the streaming output for AI agents. That gimmick is just for people that use chatbots. Disabling streaming output will give you more performance and use fewer resources. Also less code to deal with.

    • @yoyartube
      @yoyartube months ago

      @@ShaunPrince I absolutely said streaming is needed in this case but you are right it isn't.
      I didn't know streaming was a gimmick; I will tell my users streaming is a gimmick and they should look at a blank page while the completion loads.
      Which performance is worse when streaming is enabled? Which benchmarks are you referring to exactly? What is the mechanism that causes performance (whatever that means in this case) to be degraded? Please help I don't want to lose performance.
      It's also good that I can take out the 2 whole lines of code needed to enable streaming; this will be better.
      Thanks for this useful input.

  • @brandonwinston
    @brandonwinston months ago +4

    I'm going with LangGraph myself after looking into it, Autogen, and CrewAI. I loved Autogen though!

    • @Data-Centric
      @Data-Centric  months ago +2

      Haven't used LangGraph myself yet, but I'll check it out.

    • @andydataguy
      @andydataguy months ago +1

      LangGraph is awesome. It's tops for orchestration, @Data-Centric

    • @jarad4621
      @jarad4621 months ago

      Look into Agency Swarm; now that it can use other models, it's going to be epic.

  • @MrGluepower
    @MrGluepower 18 days ago

    The use case of multi-hop questions is such a niche one, and you are presenting it as a core feature of multi-agent systems. Sure, that use case might be challenging, but in the real corporate world the usual workflows are not of that format: each step is simple, but there are 10,000 steps.

    • @Data-Centric
      @Data-Centric  18 days ago

      We are building automated workflows for clients and we are not using Crew AI. For your simple steps, RPA or something like Zapier can work. The use case is niche, but it was a way to showcase what these agentic workflows can do without requiring access to corporate data.

  • @Mohamed-sq8od
    @Mohamed-sq8od 18 days ago

    If you base your whole opinion of CrewAI on the cost of OpenAI tokens, use a local model or Ollama :p

  • @6lack5ushi
    @6lack5ushi 18 days ago

    I love this so much!!!
    Is this a bad answer to the Napoleon question? “Napoleon occupied the city where the mother of the woman who brought Louis XVI style to the court died in **1804**.
    Output from operation 0002: Marie Antoinette, the Queen consort of King Louis XVI of France, is known for bringing a more extravagant and luxurious style to the French court during her husband's reign. She was born an Archduchess of Austria and was the youngest daughter of Empress Maria Theresa and Emperor Francis I.
    Marie Antoinette's mother, Empress Maria Theresa, died on November 29, 1780, at the Hofburg Palace in Vienna, Austria. She died of natural causes at the age of 63, having ruled the Habsburg Empire for 40 years. Her death greatly affected Marie Antoinette, who was very close to her mother despite living in France since her marriage to Louis XVI in 1770.
    Output from operation 0000: Marie Antoinette, the Queen consort of King Louis XVI of France, is known for bringing a more extravagant and luxurious style to the French court during her husband's reign. Born an Archduchess of Austria, she was the youngest daughter of Empress Maria Theresa and Emperor Francis I.
    Empress Maria Theresa, Marie Antoinette's mother, died on November 29, 1780, at the Hofburg Palace in Vienna, Austria. She died of natural causes at the age of 63, having ruled the Habsburg Empire for 40 years. Her death greatly affected Marie Antoinette, who was very close to her mother despite living in France since her marriage to Louis XVI in 1770.
    Napoleon occupied the city where Marie Antoinette's mother, Empress Maria Theresa, died in **1804**.”
    Off by a year??? But this is exactly why we built our own in-house solution and don't rely on Crew AI or any other multi-agent framework.

  • @Canna_Science_and_Technology
    @Canna_Science_and_Technology months ago +6

    The usual black box disaster. Always code your own.

  • @stuj1279
    @stuj1279 7 days ago

    I disagree that you need to know the first President of Namibia to answer the question of who succeeded him/her. You just need to know who the second President of Namibia was.

  • @gileneusz
    @gileneusz 14 days ago

    43:12 If it were scraped by Jina AI it would be smaller and cheaper.

  • @st.3m906
    @st.3m906 months ago +2

    I like Crew AI for getting a low-fidelity idea of how a system will work before I build it, and for getting an idea of where things will go wrong so I can fix them in production.
    Secondly, it's a pretty good tool for copywriting, imo. I don't like the use of tools that much; it makes it more stupid, in my experience.

  • @TestMyHomeChannel
    @TestMyHomeChannel months ago

    Great educational video about CrewAI. As for the high costs, what if we used a local LLM like Llama 3? I assume these types of agentic applications do not require sophisticated reasoning, and Llama 3 could be sufficient. Second, as for the missing features, what if knowledgeable programmers like you or others improved CrewAI to add what's needed for following the process and displaying the citations? Thanks for the video.

  • @m12652
    @m12652 14 days ago

    2:51 Why would you need to know who the first president was to find out who the second was? Couldn't you just ask "who was the second president...?"?
    And why would you need to know which part of the UK it is to know what currency it uses? They all use pound sterling.

  • @nikosnos790
    @nikosnos790 months ago

    That's all good and very detailed as far as building the system goes, but what is the answer to the question in the TITLE? You are not staying away, mate; you are building agents with it. I suggest being clearer with your titles next time: give us what the title says or change it. Hope this is helpful; keep it going.

  • @jarad4621
    @jarad4621 months ago +2

    Looks good. I would like you to test Agency Swarm, whose entire point is to be production-capable; it works, but I'm not sure it's there yet. I've also heard agents aren't quite there yet overall: they're inefficient, and their use cases are currently very specific, especially ones where cost doesn't matter as much, so doing simple tasks like research with the top paid models isn't viable. You would always use a multi-model system (an Opus CEO with a swarm of Haikus, for example: right person, right job); GPT-4 for everything is a bad choice, though it still demonstrates the point. However, combine agents with a free local but still good model like Phi-3 or Llama 3 8B and one individual could automate insane quantities of work at no cost. That's where the magic is, not GPT-4; or maybe only the manager is a top model overseeing one complex section while the rest are Haikus or cheap models. I've seen a video on how smaller models can get close to GPT-4-level performance when placed into agentic systems with patterns like reflection, feedback, and collaboration that ensure quality.

    • @Data-Centric
      @Data-Centric  months ago

      Thanks for your comment. I haven't tried Agency Swarm yet; I'll look into it. The last point you raised is interesting: a mixture of models with a set workflow of collaboration, reflection, and feedback.
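
A mixture of models can start as nothing more than a routing function: the strong (expensive) model plans and reviews, cheap models do the routine steps. The model names and task kinds below are illustrative only.

```python
STRONG, CHEAP = "gpt-4-turbo", "claude-3-haiku"  # illustrative choices

def pick_model(task_kind: str) -> str:
    """Route planning and review to the strong model, everything else to a cheap one."""
    return STRONG if task_kind in ("plan", "review") else CHEAP

workflow = ["plan", "web_search", "summarise", "review"]
models = [pick_model(kind) for kind in workflow]
```

Since most steps in an agentic workflow are routine (search, extraction, summarising), most of the token spend lands on the cheap model.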

  • @pollywops9242
    @pollywops9242 24 days ago

    This seemed cumbersome to me, but $0.30 for one (!) prompt, omg.
    I'm just using models that are less great because I'm just messing around and can't justify spending more and more on it.

  • @michaelthompson8251
    @michaelthompson8251 23 days ago

    Consider: use cases that would work given CrewAI.

  • @watchdog163
    @watchdog163 17 days ago

    It's you! You're going to create Skynet! 🤣

  • @st.3m906
    @st.3m906 months ago +2

    I think you'll like LangGraph if you find this is the main issue with Crew AI.

    • @Data-Centric
      @Data-Centric  months ago +1

      I'll definitely try lang graph.

  • @iamdanfleser
    @iamdanfleser months ago +1

    There is an option in VS Code to auto-save files.

  • @ingoeichhorst1255
    @ingoeichhorst1255 months ago

    Yeah. I do not like the magic that goes on behind the scenes. Autogen, CrewAI, and LangGraph do not even have good verbose logging that would give you a chance to understand it. And LangSmith is a locked-in nightmare.

  • @6lack5ushi
    @6lack5ushi 18 days ago

    The Billy Giles answer is interesting; there is a Billy Giles who died in New York, America: "He died at Mount Sinai Hospital in New York City on Sept. 25, 2021 after an eight-year struggle with progressive anti-MAG peripheral neuropathy."
    But I'm guessing this is the wrong Billy Giles.

    • @6lack5ushi
      @6lack5ushi 18 days ago

      At the same time Google will say Belfast?! So truth becomes the crux in these questions

  • @AI-Wire
    @AI-Wire 21 days ago

    There is a problem with your logic in the first question. One does not need to first know who the first president of Namibia was in order to know who succeeded him. One can simply learn who was the second president.

  • @madhudson1
    @madhudson1 months ago

    It shows promise, but I found it too unreliable, especially the 'tooling'.

  • @dg-ov4cf
    @dg-ov4cf months ago

    Some of those multi-hop question examples are ungrammatical, borderline nonsensical. Was that paper peer-reviewed?

  • @ThomasTomiczek
    @ThomasTomiczek months ago

    It is not prohibitively expensive; it is naive.
    First, even using OpenAI you do not need the GPT-4 Turbo model for everything. Web search? Use 3.5 to extract the information from the results; tool use and data preparation are typical cases for lower-tier models.
    Second, your use of the planner is naive: every step should have corrective reviews, even the planner's. Is the resulting plan something that can be optimized?
    But generally, yes, we are not there yet in functionality or cost. If CrewAI is handing a complete context over, that is a fundamental issue.

    • @Data-Centric
      @Data-Centric  months ago

      Surely adding additional agents to review after every step will increase the cost, latency, and probably the reliability of the overall workflow? I agree you probably could use some smaller models for parts of the workflow. However, I did try using 3.5-turbo initially and had subpar results, specifically with the web search agent.

    • @truehighs7845
      @truehighs7845 months ago

      Yes, but say you are testing while developing: it will cost you hundreds of dollars just in test calls. I mostly use local AIs.

    • @ThomasTomiczek
      @ThomasTomiczek months ago

      @@truehighs7845 Oh, I agree, except that local AI has SERIOUS problems: crazy low context window, no complex prompts in most cases. There is a moat, actually. Anyhow, the idea that AI, right now, would be cheaper than a human equivalent makes little sense. Over time, yes, but for now independence and "works 24/7" are the core elements. Within a year or two prices will be down another 90%; but consider how much even a minimum-wage worker costs.
      But really, we need decent local AI that can follow complex prompts and handle a 100k context without falling apart. And yes, it can take 2x48 GB of RAM; "local" does not have to mean low-end (and the low end soon rises thanks to DDR7 coming in much higher per-chip capacities). But Llama was limited; an 8,000-token context is not cutting it.

    • @truehighs7845
      @truehighs7845 months ago

      @@ThomasTomiczek Yes, but to run samples and test the mechanics with smaller sets or whatever, you can iterate as much as you want. I have 2x RTX A4000, and it's also quite fast to work with. Then, down the line, if customers want to use an API and pay for it, bless their little hearts, but I ain't paying to test my developments.
      Also, I am using Mixtral and llava3 (and before that Solar); they are not that bad. As a rule of thumb they are all equally dumb in the same way; I don't see major differences besides the context, indeed. But those guys have 50,000 GPUs, not 2...

    • @ThomasTomiczek
      @ThomasTomiczek months ago

      @@truehighs7845 Well, I found that most of my prompts are not working outside of OpenAI so far, and I cannot make them simple enough to work. I hope some of the fine-tuning work focuses on that, as well as long-context training.
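
The corrective-review idea from this thread can be kept affordable by bounding the loop; `generate` and `critique` below are stand-ins for real LLM calls.

```python
def refine(generate, critique, max_rounds=2):
    """Draft, then loop: critique the draft and regenerate with the feedback.
    Stops when the critic returns None (satisfied) or the round budget is spent."""
    draft = generate(None)
    for _ in range(max_rounds):
        feedback = critique(draft)
        if feedback is None:  # critic satisfied
            return draft
        draft = generate(feedback)
    return draft  # best effort within budget
```

Each round costs one critic call plus one regeneration, so a small `max_rounds` keeps the extra cost and latency predictable.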

  • @DarxKies
    @DarxKies 27 days ago

    Your hand movements give the impression of lashing out at the viewer, and it is distracting. Try putting the camera higher or further away.

  • @TSKTECHIN
    @TSKTECHIN months ago

    Totally agree!! @crewai is no good for production: very inconsistent results. My experience so far has not been great, and when using GPT-4 you can run up a huge bill.. 😞 Thanks for the honest review 🙏 of this tool, which has a long way to go.. 😞 😛
    When using the gpt-3.5 model, the results are inconsistent and it throws errors when running on different data sets or params, with just no way of debugging the error:
    ```
    File "<stdin>", line 1, in <module>
    File "C:\ProgramData\miniconda3\Lib\encodings\cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f576'
    ```
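
That traceback is Windows' default cp1252 console encoding failing on an emoji in the output, not a CrewAI-specific bug. A minimal reproduction (the fix is to set `PYTHONIOENCODING=utf-8`, or call `sys.stdout.reconfigure(encoding="utf-8")` on Python 3.7+, before running the crew):

```python
# cp1252 has no mapping for emoji such as '\U0001f576' (dark sunglasses),
# so encoding raises the same UnicodeEncodeError as in the traceback above.
char = "\U0001f576"
try:
    char.encode("cp1252")
    crashed = False
except UnicodeEncodeError:
    crashed = True

utf8_ok = char.encode("utf-8").decode("utf-8") == char  # UTF-8 round-trips fine
```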