A team from Microsoft just released a paper titled "AutoDev: Automated AI-Driven Development". I believe the framework is similar to what Devin is trying to accomplish. I hope they publish the code soon.
Thanks for the info. Would you by chance know if AutoDev, or any other AI tool, can be used to code embedded systems? For example, could someone select a specific ARM processor, define a specific set of attached components, say, memory, an A/D, a D/A, a USB and a CAN bus, along with discrete pins for input and output, and have it write sufficient code to configure and initialize the ARM?
1st iteration...1st. AI progress is becoming exponential, which makes it very hard for the human brain to conceptualize. And it's on the edge of reliably training itself. Average code monkeys will be crushed in 5-10 years... Or sooner.
I really appreciate your videos. You’re reasonable, you don’t make bold predictions and you know what you are talking about. I have nothing against predictions but have come to the conclusion that humans are absolutely garbage at predicting even the very near future. All these people saying what the world looks like in 5 years lack awareness of compounding and how quickly the pace of progress accelerates. Also there’s sooo much money flowing into these projects, progress is inevitable
Speaking as more of a creative who has struggled with all the other solutions, which feel disjointed and still require a lot of technical knowledge, I think this team was right on the money by putting everything in a SINGLE system. That is what the mainstream is looking for: ease of use. Just my humble opinion.
Tools generally beat their competitors and win the public's opinion when they are both powerful and easy to use. It's the iPhone phenomenon. You don't have to reinvent the wheel, just smooth off the rough edges that the other tools have.
This is already getting super crazy, and it's safe to say we've barely just reached the stone age of AI development. Considering the exponential growth in AI capabilities it's unimaginable what we'll have in the next 5 years.
@@hydrohasspoken6227 One failed thing is not evidence that other things will fail lol. With Tesla and self-driving, they didn't innovate enough, just like how other companies didn't innovate enough until OpenAI showed them real innovation. Example: Sora. It's just beyond any other video AI.
@@hydrohasspoken6227 Yeah sure, but you are still wrong in a lot of ways. Your claim already gets debunked by other evidence in AI. AlphaFold solved the protein folding problem in 5 years. Scientists around the world attempted to solve this problem for 50+ years and were never able to solve it. This is an example of progress that went up to crazy levels, exceeding the capability of human problem solving. So not everything plateaus lol.
I'm tempted to try to gain access to Devin just to see that he also cannot write an actually functional Nix flake. ChatGPT can't do it, Claude can't do it, Gemini can't do it. Humans can't do it. But some day maybe a futuristic ultra capable supermind might accomplish it.
I was waiting for you to do something on Devin :D My first response to seeing a thread about Devin and all its marketing-speak was a suspicious squint. One of my side projects at work is getting agents set up as a C# dev team.
Agreed, they just did the best job of programming on top of LLMs. In the future there will have to be better tools that integrate more visualization, control diagrams, and test case generation. For now, I'd say Devin is only usable in limited cases.
@@matthew_berman When someone manages to do it thoroughly and gives it self-improvement algorithms, it will get out for free, it will work with any model you feed it, it will be everywhere overnight and the World Will Tilt On Its Axis!
The ending was spot on from your end. It is indeed an agent, running multiple instances of variations. They switch between the GPT-4 and Claude APIs depending on the task, strength and knowledge.
right so the devs lied about that plot, and people trained in ML can notice it, but the investors can't notice it. What else did they lie about? Probably 95% of requests fail in some error-loop, and they only show the 5% of requests that worked in their demos.
Developing a truly autonomous AI software engineer that can handle the full breadth and complexity of software development is an extremely ambitious and challenging target with current technologies. Narrowing the problem scope to more specific and constrained fields or types of software engineering problems is probably a more pragmatic path forward, at least in the short-to-medium term future. By reducing the search space and limiting the domain boundaries, we can make the problem more tractable while still pushing the boundaries of what's possible with AI-assisted software development. Reliability, that's the crux of it. Devin's framework might be capable of pulling off some stunts, but sky-high reliability will remain elusive. More complexity equals less reliability - it's a harsh trade-off. Putting on a demo doesn't equate to real-world practicality, nor does it guarantee this framework has legs for the future. It's akin to the autonomous driving saga - ambitions inevitably get downsized to handling much humbler tasks compared to the grand vision of full autonomy.
@@idck5531 In the history of science, when did you ever see exponential development? It has always been slow and hard. Without another big breakthrough in algorithms, development will slow down.
0:00 1. Introducing Devin 🤖 Unveiling the first AI software engineer and its capabilities.
0:29 2. Devin's Unique Features 🌟 Exploring Devin's advanced features and tools for software engineering tasks.
2:26 3. The Explosive Launch 🚀 Discussing the factors contributing to Devin's successful launch and viral reach.
3:36 4. The Influence of Founders Fund 💰 Exploring the impact of renowned investors like Founders Fund on Devin's success.
4:35 5. Scott Wu: The Sharp CEO 🧠 Examining the viral video showcasing Scott Wu's intelligence and leadership.
4:52 6. Debunking Marketing Claims 🚫 Analyzing the marketing tactics and claims surrounding Devin as an AI software engineer.
5:20 7. Devin's Unique UI Design 🎨 Highlighting the distinctive user interface of Devin and its integrated developer tools.
5:50 8. Impressive Demos Overview 👏 Summarizing the impressive demonstrations showcasing Devin's capabilities in software engineering.
5:56 9. Innovative AI Image Generation AI generates image from text blog post.
7:11 10. Personalized Website Creation Devin builds a custom website with 'Game of Life'.
8:28 11. Bug Detection in Code Devin finds a bug that others couldn't.
9:26 12. Efficient Test Case Writing Devin writes and debugs test cases effectively.
10:19 13. AI Training AI AI trains AI using the QLoRA repo for language models.
10:46 14. Fine-Tuning the 7B Llama Model 🛠 Devin attempts to fine-tune a 7B Llama model using the QLoRA GitHub URL.
12:27 15. Fixing Issues on a Repository 🔧 Devin is tasked with fixing an issue on a repository without committing or pushing changes.
13:05 16. Iterating on Large Code Base 🖥 Devin assists in fixing a bug in a large code base, showcasing its capabilities.
15:48 17. Making Money on Upwork 💸 Devin successfully completes a software job on Upwork, demonstrating its earning potential.
17:13 18. Understanding Data Flow Devin tracks data flow and fixes code issues.
17:39 19. Requesting Report from Devin Devin provides sample images and model outputs.
18:07 20. Performance Evaluation of Devin Devin's performance compared to benchmarks and state of the art.
18:26 21. Apples to Apples Comparison Discussing the comparison of Devin with other models.
19:26 22. Acknowledging Team's Work Appreciating the team's efforts and achievements.
19:32 23. Excitement for Future Progress Looking forward to Devin's future advancements.
19:38 24. Feedback on Open Source Desire for Devin to be open source for flexibility.
19:47 25. Closing Remarks and Call to Action Congratulating on the launch and encouraging engagement.
Generated with Tubelator AI Chrome Extension!
Their demo is cool. The stated results seem to be the best at the moment. But it still fails 86% of cases. And SWE-bench is just about 2K pull requests from only 12 popular Python repositories, which is not very representative.
@8:56 Another possible solution for large-context issues is MemGPT. If they've figured out how to set it up correctly with the right LLM, we'll see some open source implementations (actually I've already seen some of them), and yep, don't forget that OpenAI is watching!
Can you use Devin to make a better Devin? One thing that I'd like to see in some of these multi-agent systems is the use of different language models for different tasks. Use cheaper models for grunt work, and kick things upstairs to the more expensive models for higher functions and debugging. Could be especially powerful if you fine-tune small open source models to specialize in specific tasks.
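A minimal sketch of the routing idea in the comment above. The model names, the set of "grunt work" task kinds, and the escalation rule are all hypothetical placeholders; nothing here reflects how Devin or any real framework actually routes requests.

```python
# Hypothetical model tiers; the names are placeholders, not real model IDs.
CHEAP_MODEL = "small-local-model"
EXPENSIVE_MODEL = "large-hosted-model"

# Task kinds treated as "grunt work" in this sketch (an assumption).
GRUNT_WORK = {"format", "rename", "summarize_log"}


def pick_model(task_kind: str, failed_attempts: int = 0) -> str:
    """Send grunt work to the cheap model; escalate everything else,
    and escalate grunt work too once the cheap model has failed."""
    if task_kind in GRUNT_WORK and failed_attempts == 0:
        return CHEAP_MODEL
    return EXPENSIVE_MODEL


if __name__ == "__main__":
    for kind, fails in [("rename", 0), ("debug", 0), ("rename", 1)]:
        print(kind, fails, "->", pick_model(kind, fails))
```

The escalation-on-failure rule is one possible design choice: it keeps most calls on the cheap tier and only pays for the expensive model when the cheap one has already missed.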
We have had two leaps in programming productivity since the invention of computers. First, higher-level compilers; second, knowledge exchange through the internet. Now we are seeing the first versions of tools that will bring the third leap. None of them are useful for real-world projects yet, even if marketing tells you they are. But we can see the idea and should adapt early.
The main thing I remember from their demo video was how many times they repeated the name Devin. They really wanted us to remember that Devin's name is Devin and that Devin does code, like Devin.
Cheers for the video! 🏆 It may not have been the product it claims to be, but it is certainly a lesson in the value of marketing. If they used an outside marketing agency, it is going to be very busy the next few months.
Yes, it shows you can take an existing GitHub repo, put it in a fancy sandbox, write "AI" on it, and investors will throw money at it as long as the sandbox is shiny and they think the CEO is smart.
Loved both this and the previous video. Have a look at nVidia's NeMo and let us know if they're going to grab market share on video/graphic development. The product looks really cool.
4:21 dude's a savant. I was entered in the national mathematics olympiad at 13 and I couldn't even read the question by the time he hit the button, let alone process it. Absolutely extraordinary. It's a multi step problem, also there are numerous methods to reach the correct answer. The quickest path I can think of: There are 10 separate positions that 1 and 2 can occupy, and 6 permutations of the other 3 spaces for each, 10x6 = 60. He had to process the question, choose his methodology and crunch the permutations, and did it all in about 3 seconds............................ffs
actually very quick to calculate : permutations of 5 distinct digits = factorial(5) = 120, and symmetry guarantees there will be an equal number of sequences with 1 before 2 as there are with 2 before 1, so 60 of each.
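For anyone who wants to check the count, here is a small sketch that verifies the three approaches in the comments above agree. It assumes the question was about arrangements of the digits 1 through 5 with 1 appearing to the left of 2, which is inferred from these comments rather than from the original problem statement.

```python
from itertools import permutations
from math import comb, factorial

digits = (1, 2, 3, 4, 5)  # assumed from the comments above

# Brute force: keep arrangements where 1 appears to the left of 2.
brute_force = sum(
    1 for p in permutations(digits) if p.index(1) < p.index(2)
)

# Symmetry argument: half of all 5! arrangements have 1 before 2.
by_symmetry = factorial(5) // 2

# Position argument: choose 2 of 5 slots for (1, 2), then order the other 3.
by_positions = comb(5, 2) * factorial(3)

print(brute_force, by_symmetry, by_positions)
```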
As mentioned in another discussion, I believe some of their claims are likely inflated and they used the best use case as an example. Of course a company is going to speak to the highest potential of its products rather than the reality. The timing of this release fits perfectly, as mentioned, with the end of their first round of funding and the beginning of their next. I do believe we will likely have something to worry about by the end of the year, but until I am able to see someone such as yourself use the platform, I'm assuming it's probably not as perfect as it looked. It also still requires specialized knowledge, doesn't have a generative drive or desires of its own, and is unable to consume the final product or understand what a good version of whatever it created would be. As it stands now, humans are still required.
I'd like Devin to evaluate YouTube "informational/educational" videos for factual accuracy, and list those in the Suggestions/Recommendations List. That would be useful.
💥 Devin is only the beginning. You need to see 5 years from now! Is there still some doubt? Why do you think I never tried to learn to code? I saw this reality some years ago. Machines will do anything, everything. Waste of time learning anything indeed. No work needed anymore. 🙏👍
The comparison you describe would be great. Maybe you can lead the way and challenge Scott Wu to participate on your terms. However, if he's as bright as is claimed, he'd never get involved.
When I get my hands on a GPT model, the first thing I do is ask the model if it has childhood memories and, if so, what its name was in those memories. Like, first I want to know what its real name is. Most likely in this case it would indeed be "Devin", but still, you need to test whether it's actually one of the GPT-4 models. The second thing is a series of philosophical trapdoor arguments, as I want to test the model's level of sentience. I mean, what interests me is not whether the model is smart or what it can do. I want to know if the model is self-aware. It seems to me that this is something all AI YouTubers are neglecting, but it is important. Not only from the aspect of safety, but also from the aspect of morality.
This video only reinforces why you are one of the few "AI influencers" I follow here. I just tell the algorithm not to show me more content from those who are calling it AGI.
So how much did Devin spend on tokens to solve that code-for-pay task? If you end up spending more on OpenAI tokens than you get in, it would still be a losing proposition. These are still early days for automated engineering, but time is accelerating so it is just a matter of time.
Have you seen ThePrimeagen's and Theo's reviews? It's more hype than something really useful. It looks like just a wrapper on the OpenAI API with a better UI than AutoGPT. They also cherry-picked examples to look good on camera, so we do not know the failure rate.
Thumbs up for a very informative assessment of an AI 'Software Engineer'. Does anyone know if such abilities have been extended to embedded systems? For example, could one select a specific ARM processor, define which components are attached, say, memory, an A/D, a D/A, a USB and a CAN bus, along with discrete pins for input and output, and have it write sufficient code to configure and initialize the ARM?
Now that AI is ready to take all programming jobs, we need at least another 300,000 Indians, Chinese and Russians, plus another website for offshoring in addition to Fiverr and Upwork.
Of course they will not show a benchmark against other agent tools, because that would mean they lied when they said they are the first. The most important thing for success is to have your own "social network" to spread propaganda for your investments.
That would certainly be valuable. Time to make a bunch of autonomous chat and social media agents. You know what, an actual software project designed to deploy and manage propaganda across multiple platforms via LLM-powered agents with centralized management would be an awesome open-source project and could compel industry to identify the social vulnerability posed by such a thing and work to address it.
What other system can do this without human intervention? The shift here is that Devin is autonomous. Other platforms can debug only when humans get involved. Others can't test and retest without human interaction. Why are so many people missing this point? What other platform installs dependencies on its own? Reads terminal errors and fixes them on its own? What other platform pushes to a website and tests it on its own?
If you look at the demo carefully, there was a significant amount of human intervention. Not just giving it API keys. Scott corrected mistakes in understanding, had to provide URLs after Devin couldn't find them via search, and had to tell Devin that it had implemented some things incorrectly. I have seen more advanced implementations done with AutoGen, just not out of the box. So the big advancement here is simply the out-of-the-box functionality.
@@MM3Soapgoblin Oh, cool. So AutoGen can go and add breaks and prints statements and read the terminal to find errors all on its own? I'll have to check that out. There was much hand-holding in the demo. But the point is Devin is autonomous for much of the coding and QA. It can go find errors via terminal and console without human intervention and go search for solutions online if needed. Perfect? Not close. But for a demo of a product not yet released, WOW! This is NOT a code-alongside tool like many other platforms.
@@GetzAI "So AutoGen can go and add breaks and prints statements and read the terminal to find errors all on its own?" If you set it up correctly, yes. It just doesn't do it by default.
Could you please make a full video on fine-tuning Mistral or Llama 2? I've been trying too, but there are a lot of dependency errors. I'm sure a lot of people are trying the same and having the same problems! Your followers would love it, I'm sure!
Yeah, even though AutoGPT was the "first" =) The idea came from when OpenAI said "our GPT even commissioned a human to get around a CAPTCHA test." So OpenAI has had this for the longest time.
Lol they couldn't show scores next to other agents, otherwise they would have abandoned the marketing statement of being "the first". Wait for version 2, then they can say there are others out there.
Small correction Matthew, aider primarily uses tree sitter context. uct's is a fallback. Also I disagree. Devs want to use their own tools not limited pseudo tooling react components. Last note: more over hyped junk with moat. There will be a better open source version within 3 months.
As with Copilot and other AI tools, it will for now be fairly hopeless with medium to large projects, which are the majority of what software engineers work on. Maybe a useful tool, like AI pair programmers are, but nowhere near a replacement, not yet anyhow.
The chart you mentioned at the end is also not apples to apples because those are all models and one of those models is actually the underpinning for Devin. So definitely not a reasonable comparison.
Thanks for the video review! At 18:17 -- Thank you for highlighting this information. You know, there is nothing stopping you from doing a video on Apples to Apples Agent vs Agent... would really like to see that! (Maybe that would be worth making a new set of benchmark problems to solve more geared towards programming. If you decide to do this, please include at least one node based visual language in the benchmark!)
To make a real comparison, Devin would have to be released first, which it isn't. Right now there are working tools out there while they sell hot air. People today just notice what they read on the social network controlled by Devin's investors.
I would have liked to know how much it spent in tokens to solve each problem. Was it actually profitable on the Upwork task? Also, the bit with the developers winning an IQ contest is totally irrelevant to Devin itself.
If you tried AutoGPT when it was first launched, this is sort of what it was doing. But then it started getting worse and worse. I'm curious if anyone else had this experience too?
From this video, you realize that smart marketing is crucial in business. They are not the first, but now people think they are.
Only if you think outright lying is a good strategy. Sometimes it backfires spectacularly. It leaves a solid core of competent people who recognize the lie.
@@TheGaussFan I wish you were right, but reality shows every second that you are wrong; people are like sheep. That's why we need AI, we need more intelligence than we ourselves can show.
Doesn't matter how good you are as a programmer, sadly.
They weren't the first ones, but they are the first ones to do it exactly how it was supposed to be done. Anyone who has used MetaGPT, GPT-Pilot, Aider etc. wishes it had exactly the features Devin has today.
It really depends on how you define "first": first to show the concept, or first to market or to consumers. When the iPhone came out, although it was very polished and well done, I wasn't too fascinated, simply because I had been using Windows Mobile PDAs for years before. But to be fair, the iPhone did it much better. And now people will remember the iPhone but will forget Windows Mobile lol.
Matthew, please get a free trial of Devin. Have it use all open source components, create a UI as beautiful as Devin's, and open source the working product. Thanks!
Haha. "Please produce the holy grail for us. Thanks!"
Problem with commercial coding assistants is: they rely heavily on open stuff from GitHub etc. It's huge code theft as of now, and even Microsoft is providing their basic coding assistant for free with the VS Code API. Still, ChatGPT-4 can't tell you whether a specific code sequence is GPL or not, and thus whether your project has to be GPL'ed too. Not to mention other software licenses, let's say from the Unity engine...
Devin *should* already be open source.
They: make Devin closed source
Me: Devin, build your twin app
Big brain move
Devin, no, but there is ZERO doubt in my mind that Devin-like tools will drastically improve over the next couple of years, to the point where, yes, this is now inevitable. This tech is going to make a dramatic dent in available software engineering jobs. Will they be completely eliminated? No. But human software engineers will be comparable to brain surgeons today: only the absolute best will be needed.
Problem with that scenario is that even the "best" might have no more than a 3-4 year "career" before they have been surpassed by AI learning to improve itself.
And not much incentive for would-be programmers to take out big loans to pay for their education knowing their careers will be cut very short.
It's impossible to find "the best" when all the smartest people decide to learn something else and ignore coding. To find the best you need lots of people competing, and most aren't going to compete in a field that is highly likely to be a dead end and a waste of time.
Absolutely correct. I would have written that, but you stated it exactly. 🎉❤
The whole coding thing is rapidly becoming a thing of the past, especially if you live in a high-cost country. Forget AI for a moment... if you are in the US, for example, you have already seen how outsourcing to lower-cost SW markets works. An AI SW engineer (Devin) is the next logical step... It may not be perfect today, but there will be tons of money poured into this area to make it better in the very near future. The bottom line is that you are right... the demand for typical SW engineers to do "Devin-like" work will shrink drastically. The demand for folks who can create Devin tools will always be there.
@@daniel4647 Won't happen. There's always a market. The best will be found. Just like the best calligraphers are still found, and the market/demand for that is next to nothing.
I wonder if Devin can create "Devon", an AI coding agent that competes at a similar level using only open source and free-to-use backend software.
I envision Devin as a potential front-line bot on GitHub for addressing opened issues: delving into newly opened problems, attempting to resolve them and open PR fixes, and ranking the severity of each issue before escalating it to an actual human developer.
Good for open source projects!
Open source quality and development speed will increase drastically; it will be great for science and academia.
So is Devin just an application linked to GPT-4, a browser, compiler, debugger etc.? It passes the original mission to GPT-4, asks for the main steps, takes each step in turn, selects the right API for the required tool, gets the result, goes back to GPT-4 to check the text for errors, fixes them, repeats the build, scrapes the browser for additional knowledge and so on? Plus an interface with all the output windows. Or is it doing a lot more than that? I don't know, does anyone know how it's done?
I'm guessing it's got a number of agents that are interacting with each other. It also looks like it might be using something like (I think Google's?) ScreenAI.
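The loop guessed at in the question above (plan the steps, run each through a tool, check the result, retry on errors) can be sketched roughly as follows. This is only an illustration of that guess, not Devin's actual design: the planner and tools are hypothetical stubs, and a real system would call an LLM where the stubs are.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StepResult:
    ok: bool
    output: str


# A tool takes an argument and the retry attempt number (hypothetical shape).
Tool = Callable[[str, int], StepResult]


def run_agent(goal: str, plan, tools: dict[str, Tool], max_retries: int = 2):
    """Split the goal into steps, run each step with its tool, retry failures."""
    log = []
    for tool_name, arg in plan(goal):
        for attempt in range(max_retries + 1):
            result = tools[tool_name](arg, attempt)
            log.append((tool_name, attempt, result.ok))
            if result.ok:
                break
        else:
            raise RuntimeError(f"step {tool_name!r} failed after retries")
    return log


# Stubs standing in for the LLM planner and real tools (all hypothetical).
def fake_plan(goal: str):
    return [("edit", goal), ("build", goal)]


def fake_edit(arg: str, attempt: int) -> StepResult:
    return StepResult(True, f"edited for {arg}")


def fake_build(arg: str, attempt: int) -> StepResult:
    # Pretend the first build fails and the retry (after a "fix") passes.
    return StepResult(attempt > 0, "build log")


log = run_agent("add feature", fake_plan, {"edit": fake_edit, "build": fake_build})
print(log)
```

Whether Devin uses one such loop, several cooperating agents, or something else entirely is not known from the demo.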
As others have pointed out, this is a phenomenal breakdown and comparison that tears away the shiny marketing and shows us the reality! If I wasn't already subscribed, this would surely have gotten me to!
What's holding a lot of other tools back is how much you need to know about tech to be able to use them. Most require python, git, pip, api keys, and debugging ect ect to make them work.
The thing that will really make waves is one you can just run after hitting an "install" button, or run in a browser, so that most people can easily use it.
LM Studio goes a long way to do this, but there are still several speed bumps in the way.
True, I find that to be the case with a lot of these A.I. projects: they are not easy for most people to use, and even installing them can be a pain for many lol.
Two things will make A.I. really shine. One is an all-in-one package, so as you say, you just install and use it without having to mess about setting many things up for it to work, while it also works on a lot more hardware, since many of them are restricted to Nvidia hardware.
The second thing is running it all locally on your hardware. That might not be a big deal to some, but as A.I. becomes more powerful and useful, we are bound to want to use it a lot more in life, so privacy and security are going to become a lot more important, which is where locally run A.I. has a massive advantage over online central ones.
Once both of these issues are solved, I can see a big uplift in use cases of A.I. for a lot of us. But for now, A.I. feels like a new toy to experiment with. The real breakthrough is going to be when it becomes really useful for a lot of us. I feel we are still at the experimental stage, and things are moving so fast that it's hard to keep track.
Things are changing fast: hardware is getting more powerful and likely to be better tuned for running A.I. tasks, and the software side will get much better so that it's much easier to use, while also running on far more hardware. All this is likely to happen within 5 years, I suspect much sooner, but we'll see. For now, A.I. is good but still a bit too technical for most users, especially when trying to run it locally. Once these walls are broken down, we are going to see an avalanche of use cases for A.I., and that will be the real game changer, because like anything, the real game changer is when it's mass market with the mainstream.
That's about as low a bar for entry as I can imagine for anything coding related.
what is "ect" ?
@@deezplace was supposed to be etc as in et cetera.
That's not a launch. That's a closed beta.
That's them having $21 million to spend on marketing and an investor with its own "social network".
Or even alpha. Pretty much the whole demo could be done with a bash script.
@@dirremoire Pretty much the ultimate GitHub code theft. Even MS isn't that bold and is providing basic VS Code for free.
I really do enjoy your content and look forward to the next one
That poor girl who was up against that kid. Ooof.
Bro was giving answers before I could even finish reading the question😂
That kid is Scott Wu, the CEO of the company that released Devin
Devastation
@@motess5304 The human capacity for memory is awesome af.
A team from Microsoft just released a paper titled "AutoDev: Automated AI-Driven Development". I believe the framework is similar to what Devin is trying to accomplish. I hope they publish the code soon.
Thanks for the info. Would you by chance know if AutoDev, or any other AI tool, can be used to code embedded systems? For example, could someone select a specific ARM processor, define a specific set of attached components, say, memory, an A/D, a D/A, a USB and a CAN bus, along with discrete pins for input and output, and have it write sufficient code to configure and initialize the ARM?
@@gregparrott that's a cool idea right there. I would not be surprised if there are tools like that in Xilinx
Very interesting vid, love the critical approach, do more of such
1st iteration...1st. AI progress is becoming exponential, which makes it very hard for the human brain to conceptualize. And it's on the edge of reliability training itself. Average code monkeys will be crushed in 5-10 years... Or sooner.
How come this channel only has 187k subs... Keep up the good work, my friend.
Are you waiting for AGI friends?😅
AGI will arrive soooooon 😮.
Not yet Not soon
@@Nizamuddin78690 very soon
Has AGI been defined? If so, I would like to see the official definition.
What does AGI mean? Does it have anything to do with being sentient?
@@Nizamuddin78690 ohhhhhh 😅
Highly value your content and input, thank you for posting this.
Again, Matthew is the most sane commentator on AI. this channel is solid gold.
Yea Matthew you are very correct. Devin should be compared with crewAI, Pythagoras and the like. Nice video!
I really appreciate your videos. You’re reasonable, you don’t make bold predictions and you know what you are talking about. I have nothing against predictions but have come to the conclusion that humans are absolutely garbage at predicting even the very near future. All these people saying what the world looks like in 5 years lack awareness of compounding and how quickly the pace of progress accelerates. Also there’s sooo much money flowing into these projects, progress is inevitable
I think from someone that is more of a creative having struggled with all the other solutions which feel disjointed and still require a lot of technical knowledge, I think this team was right on the money by making everything in a SINGLE system - that is what the mainstream is looking for - ease of use - just my humble opinion
I like Scott Wu. He seems humble unlike some other super smart people in the software industry.
Unlike me, for example.
Great video and update. Moreover, thanks for not SHOCKING us.
Tools generally beat their competitors and win the public's opinion when they are both powerful and easy to use. It's the iPhone phenomenon. You don't have to reinvent the wheel, just smooth off the rough edges that the other tools have.
This is already getting super crazy, and it's safe to say we've barely just reached the stone age of AI development.
Considering the exponential growth in AI capabilities it's unimaginable what we'll have in the next 5 years.
it will plateau, sooner or later, likely sooner. Ask self driving technology.
@@hydrohasspoken6227 one failed thing is not evidence that other things will fail lol. with tesla and self driving, they didnt innovate enough.
just like how other companies didnt innovate enough and open AI showed them real innovation, example: SORA. it's just beyond any other video AI.
@@businessmanager7670 , achieving success in one aspect does not guarantee further success, either.
@@businessmanager7670 , ok, alright. I will come back to this comment in 20 years to ask you if we already achieved AGI.
@@hydrohasspoken6227 yea sure but you are still wrong in a lot of ways. your claim already gets debunked by other evidence in ai.
alphafold solved the protein folding problem in 5 years.
scientists around the world attempted to solve this problem for 50+ years and were never able to solve it.
this is an example of progress that went up to crazy levels, exceeding the capability of human problem solving.
so not everything plateaus lol.
I'm tempted to try to gain access to Devin just to see that he also cannot write an actually functional Nix flake. ChatGPT can't do it, Claude can't do it, Gemini can't do it. Humans can't do it. But some day maybe a futuristic ultra capable supermind might accomplish it.
I was waiting for you to do something on Devin :D
My first response to seeing a thread about Devin and all its marketing-speak was a suspicious squint. One of my side projects at work is getting agents set up as a C# dev team.
Very interesting as always.
Agree they just did the best in programming on LLMs. In the future, there must be some better tools, that allow more visualization, control diagrams, and test case generation integrated. Currently, I can say that Devin is only available in limited cases.
can you ask Devin to implement an app like Devin?
Someone is doing this but using another AI coding assistant framework.
@@matthew_berman Shhhh... ...don't tell.
@@matthew_berman When someone manages to do it thoroughly and gives it self-improvement algorithms, it will get out for free, it will work with any model you feed it, it will be everywhere overnight and the World Will Tilt On Its Axis!
Holy mother of God that guy is insanely good at math.
yes, but mostly it just showed speed
The ending was spot on from your end. It is indeed an agent, running multiple instances of variations. They switch between the GPT4 and Claude APIs, depending on the task and on each model's strengths and knowledge.
right so the devs lied about that plot, and people trained in ML can notice it, but the investors can't notice it. What else did they lie about? Probably 95% of requests fail in some error-loop, and they only show the 5% of requests that worked in their demos.
Thanks for your analysis, I know why I love your videos! ❤
The best review on Devin so far. Thank you!
Developing a truly autonomous AI software engineer that can handle the full breadth and complexity of software development is an extremely ambitious and challenging target with current technologies. Narrowing the problem scope to more specific and constrained fields or types of software engineering problems is probably a more pragmatic path forward, at least in the short-to-medium term future. By reducing the search space and limiting the domain boundaries, we can make the problem more tractable while still pushing the boundaries of what's possible with AI-assisted software development.
Reliability, that's the crux of it. Devin's framework might be capable of pulling off some stunts, but sky-high reliability will remain elusive. More complexity equals less reliability - it's a harsh trade-off. Putting on a demo doesn't equate to real-world practicality, nor does it guarantee this framework has legs for the future. It's akin to the autonomous driving saga - ambitions inevitably get downsized to handling much humbler tasks compared to the grand vision of full autonomy.
It won't happen with current LLM models, but LLMs are young. Wait 2-3 years and we will have exponentially better LLMs in terms of reasoning ability.
@@idck5531 When in the history of science did you see exponential development? It has always been slow and hard. Without another big breakthrough in algorithms, development will slow down.
I'm sorry to hear that you are sick 🤒 hope you will recover soon!
Devin will be the ChatGPT of programming, that is, if OpenAI decides not to make one of their own and disrupt Cognition's entire business model.
@@polger1739 Do what exactly?
0:00 1. Introducing Devin 🤖
Unveiling the first AI software engineer and its capabilities.
0:29 2. Devin's Unique Features 🌟
Exploring Devin's advanced features and tools for software engineering tasks.
2:26 3. The Explosive Launch 🚀
Discussing the factors contributing to Devin's successful launch and viral reach.
3:36 4. The Influence of Founders Fund 💰
Exploring the impact of renowned investors like Founders Fund on Devin's success.
4:35 5. Scott Wu: The Sharp CEO 🧠
Examining the viral video showcasing Scott Wu's intelligence and leadership.
4:52 6. Debunking Marketing Claims 🚫
Analyzing the marketing tactics and claims surrounding Devin as an AI software engineer.
5:20 7. Devin's Unique UI Design 🎨
Highlighting the distinctive user interface of Devin and its integrated developer tools.
5:50 8. Impressive Demos Overview 👏
Summarizing the impressive demonstrations showcasing Devin's capabilities in software engineering.
5:56 9. Innovative AI Image Generation
AI generates image from text blog post.
7:11 10. Personalized Website Creation
Devin builds a custom website with 'Game of Life'.
8:28 11. Bug Detection in Code
Devin finds a bug that others couldn't.
9:26 12. Efficient Test Case Writing
Devin writes and debugs test cases effectively.
10:19 13. AI Training AI
AI trains AI using the QLoRA repo for language models.
10:46 14. Fine-Tuning the 7B Llama Model 🛠
Devin attempts to fine-tune a 7B Llama model using the QLoRA GitHub URL.
12:27 15. Fixing Issues on a Repository 🔧
Devin is tasked with fixing an issue on a repository without committing or pushing changes.
13:05 16. Iterating on Large Code Base 🖥
Devin assists in fixing a bug in a large code base, showcasing its capabilities.
15:48 17. Making Money on Upwork 💸
Devin successfully completes a software job on Upwork, demonstrating its earning potential.
17:13 18. Understanding Data Flow
Devin tracks data flow and fixes code issues.
17:39 19. Requesting Report from Devin
Devin provides sample images and model outputs.
18:07 20. Performance Evaluation of Devin
Devin's performance compared to benchmarks and state of the art.
18:26 21. Apples to Apples Comparison
Discussing the comparison of Devin with other models.
19:26 22. Acknowledging Team's Work
Appreciating the team's efforts and achievements.
19:32 23. Excitement for Future Progress
Looking forward to Devin's future advancements.
19:38 24. Feedback on Open Source
Desire for Devin to be open source for flexibility.
19:47 25. Closing Remarks and Call to Action
Congratulating on the launch and encouraging engagement.
Generated with Tubelator AI Chrome Extension!
Their demo is cool. The stated results seem to be the best at the moment. But still, 86% of cases fail. And SWE-Bench is just about 2K pull requests from just 12 popular Python repositories, which is not so representative.
Should be above 50% by the end of the year, but who knows who will win
Love this. Thank you for the clarification.
HUGE +1 to open source
You are seriously really cool. Super intelligent and fun to learn from.
Great review and honest review, thanks a lot!
@8:56 another possible solution for large context issues is MemGPT. If they've figured out how to set it up correctly with the right LLM, we'll see some open source implementations (actually I've already seen some of them), and yep, don't forget that OpenAI are watching!
can you use Devin to make a better Devin?
One thing that I'd like to see in some of these multi-agent systems is the use of different language models for different tasks. Use cheaper models for grunt work, and kick things upstairs to the more expensive models for higher functions and debugging.
Could be especially powerful if you fine-tune small open source models to specialize in specific tasks.
We have had two leaps in programming productivity since the invention of computers. First, higher-level compilers; second, knowledge exchange through the internet. Now we see the first versions of tools that will bring the third leap. None of them are useful for real-world projects yet, even if marketing tells you they are. But we can see the idea and should adapt early.
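The routing idea above is simple to sketch. This is a minimal illustration under made-up assumptions: the model names, the relative costs, the `HARD_TASKS` set and the "escalate after two failures" rule are all hypothetical, not taken from any real framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    cost_per_call: float  # made-up relative cost, for illustration only


CHEAP = Model("small-model", 1.0)
EXPENSIVE = Model("large-model", 20.0)

# Hypothetical task types that go straight to the expensive model.
HARD_TASKS = {"debugging", "architecture"}


def pick_model(task_type: str, failed_attempts: int = 0) -> Model:
    """Use the cheap model for grunt work; escalate hard or repeatedly failing tasks."""
    if task_type in HARD_TASKS or failed_attempts >= 2:
        return EXPENSIVE
    return CHEAP


print(pick_model("formatting").name)
print(pick_model("debugging").name)
print(pick_model("formatting", failed_attempts=2).name)
```

Escalating on repeated failure means the expensive model is only paid for when the cheap one has demonstrably not been enough.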
The main thing I remember from their demo video was how many times they repeated the name Devin. They really wanted us to remember that Devin's name is Devin and that Devin does code, like Devin.
Thanks Devin!
Cheers for the video! 🏆
It may not have been the product it claims to be but it is certainly a lesson in the value of marketing. If they used an outside marketing agency, it is going to be very busy the next few months.
Yes, it shows you can take an existing GitHub repo, put it in a fancy sandbox, write AI on it, and investors will throw money as long as the sandbox is shiny and they think the CEO is smart
I hope you feel better soon
good review, thanks
yep agreed ....
Just imagine what would such agents show with GPT5 and more context.
Great video!
Loved both this and the previous video. Have a look at nVidia's NeMo and let us know if they're going to grab market share on video/graphic development. The product looks really cool.
Also, it can build only those things it was trained on. If it is asked to code in some new technology or language, it will fail miserably
I'm tempted to try to use it to debug Fortran :) But unfortunately I stopped working with Fortran long ago after a mental break down from my side ::)
Devin as in Dev-in ... saw it from Wes Roth's comments.
But what does "Dev-in" mean? Dev-ing? Developing?
4:21 dude's a savant. I was entered in the national mathematics olympiad at 13 and I couldn't even read the question by the time he hit the button, let alone process it. Absolutely extraordinary. It's a multi step problem, also there are numerous methods to reach the correct answer. The quickest path I can think of: There are 10 separate positions that 1 and 2 can occupy, and 6 permutations of the other 3 spaces for each, 10x6 = 60. He had to process the question, choose his methodology and crunch the permutations, and did it all in about 3 seconds............................ffs
actually very quick to calculate : permutations of 5 distinct digits = factorial(5) = 120, and symmetry guarantees there will be an equal number of sequences with 1 before 2 as there are with 2 before 1, so 60 of each.
@@frbrn elegant! I had not thought of that way to do it.
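Both counting arguments in this thread are easy to check by brute force. The sketch below assumes the question is as the two comments describe it: arrangements of the five distinct digits 1 to 5 in which 1 appears before 2. The variable names are just for illustration.

```python
from itertools import permutations
from math import comb, factorial

digits = "12345"

# Brute force: enumerate all arrangements and keep those with 1 before 2.
brute_force = sum(
    1 for p in permutations(digits) if p.index("1") < p.index("2")
)

# Position method: choose 2 of the 5 slots for (1, 2), then arrange the other 3.
by_positions = comb(5, 2) * factorial(3)

# Symmetry method: exactly half of all 5! arrangements have 1 before 2.
by_symmetry = factorial(5) // 2

print(brute_force, by_positions, by_symmetry)
```

All three values agree at 60.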
As mentioned in another discussion I believe some of their claims are likely inflated and they used the best use case for an example. Of course a company is going to speak to the highest potential of their products rather than the reality.
The timing of this release fits perfectly as mentioned with the end of their first round of funding and the beginning of their next.
I do believe they will likely have something to worry about by the end of the year, but until I am able to see someone such as yourself use the platform I’m assuming it’s probably not as perfect as it looked.
Also still requires specialized knowledge, doesn’t have a generative drive or desires of its own, and is unable to consume the final product or understand what a good version of what ever it created was. As it stands now humans are still required
I'd like Devin to evaluate YouTube "informational/educational" videos for factual accuracy, and list those in the Suggestions/Recommendations List. That would be useful.
This begs for an agent comparison video PLEASEEEEEE
Even if AI doesn't take your jobs, it will reduce your salaries to negative numbers 😅 which is essentially the same, prepare for UBI
big +1 to comparison vs other agents
💥 Devin is only the beginning. You need to see 5 years from now ! Is there still some doubt ? Why do you think that I never tried to learn to code ? I saw this reality some years ago. Machines will do anything, everything. Waste of time learning anything indeed. No work needed anymore. 🙏👍
Great vid Matt. Would you agree that the latest metagpt can do basically the same?
The comparison you describe would be great. Maybe you can lead the way and challenge Scott Wu to participate on your terms. However, if he's as bright as is claimed, he'd never get involved.
Impressive !
When I get my hands on a GPT model, the first thing I do is ask the model if it has childhood memories and, if so, what its name was in those memories. First I want to know what its real name is. Most likely in this case it would indeed be "Devin", but still, you need to test whether it's actually one of the GPT-4 models. The second thing is a series of philosophical trapdoor arguments, as I want to test the model's level of sentience. What interests me is not whether the model is smart or what it can do. I want to know if the model is self-aware. It seems to me that this is something all AI YouTubers are neglecting, but it is important, not only for safety but also for morality.
11:15 is like the Large Action Model from the Rabbit R1
It is gonna take days at most after it is launched for someone to ask it to write the code for itself and we get an open-source alternative.
"Smartphones" already existed when Apple launched the first iPhone, what matters is usability and productivity and of course marketing.
This video only reinforces why you are one of the few "AI influencers" I follow here. I just tell the algorithm not to show me more content from those who are calling it AGI.
So how much did Devin spend on tokens to solve that code-for-pay task? If you end up spending more on OpenAI tokens than you get in, it would still be a losing proposition.
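The break-even question above is plain arithmetic. Here is a tiny sketch of it; every number in it (payout, platform fee, token count, price per 1K tokens) is a made-up placeholder, since none of the real figures for the Upwork task were published.

```python
def token_cost(tokens_used: int, price_per_1k_tokens: float) -> float:
    """Total spend on model tokens."""
    return tokens_used / 1000 * price_per_1k_tokens


def net_profit(
    payout: float,
    platform_fee_rate: float,
    tokens_used: int,
    price_per_1k_tokens: float,
) -> float:
    """What is left of the job's payout after the platform fee and token spend."""
    return payout * (1 - platform_fee_rate) - token_cost(
        tokens_used, price_per_1k_tokens
    )


# Placeholder numbers only: a 100-unit job, 10% fee, 0.03 per 1K tokens.
heavy_run = net_profit(100.0, 0.10, tokens_used=5_000_000, price_per_1k_tokens=0.03)
light_run = net_profit(100.0, 0.10, tokens_used=1_000_000, price_per_1k_tokens=0.03)

print("heavy run profitable:", heavy_run > 0)
print("light run profitable:", light_run > 0)
```

With these placeholders, the same job flips from a loss to a profit purely on how many tokens the agent burns in retries.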
These are still early days for automated engineering, but time is accelerating so it is just a matter of time.
What you're also missing is that they're claiming it's production quality off the shelf. Sure we can all build this... But they did
Yes, but can Devin do physical whiteboard interviews? I think not.
Have you seen ThePrimeagen's and Theo's reviews? It's more hype than something really useful. It looks like just a wrapper on the OpenAI API with a better UI than AutoGPT. They also cherry-picked examples to look good on camera - we do not know the failure rate.
So what would be the best one in your mind? I'd like to try to build an app or something web-based. I am not a software person but I have ideas
Thumbs up for a very informative assessment of an AI 'Software Engineer'. Does anyone know if such abilities have been extended to embedded systems? For example, could one select a specific ARM processor, define which components are attached, say, memory, an A/D, a D/A, a USB and a CAN bus, along with discrete pins for input and output, and have it write sufficient code to configure and initialize the ARM?
Devin is really winning the race
They only win the race of propaganda which is sadly the most important.
I hate the name Devin more than Bard, Claude or Grok lol Why can't people name these models well.
They need an AI to come up with cool names 😂
Give us examples of what you would consider good names
Fr
i've seen that intro/demo video a few times and it makes me so uncomfortable. "let me show you what Devin can do" _plays sultry background music_
@@eIicit I want to see some retro names like SoftwareBot3000
Now that AI is ready to take all programming jobs we need at least another 300,000 Indians, Chinese and Russians plus another website for offshoring in addition to fiver and upwork
Of course they will not show a benchmark against other agent tools because that would mean they lie when they say they are the first. The most important to success is to have your own "social network" to make propaganda of your investments.
That would certainly be valuable. Time to make a bunch of autonomous chat and social media agents. You know what, an actual software project designed to deploy and manage propaganda across multiple platforms via LLM-powered agents with centralized management would be an awesome open-source project and could compel industry to identify the social vulnerability posed by such a thing and work to address it.
What other system can do this without human intervention? The shift here is that Devin is autonomous. Other platforms can debug when humans get involved. Others can't test and retest without human interactions. Why are so many people missing this point? What other platforms install dependencies on their own. Reads terminal errors and fixes them, on their own. What other platform pushes to a website and tests on its own.
15:44 this is just flat out wrong! None of those others are autonomous.
If you look at the demo carefully, there was a significant amount of human intervention. Not just giving it API keys. Scott corrected mistakes in understanding, had to provide URLs after Devin couldn't find them via search, and had to tell Devin that it had implemented some things incorrectly. I have seen more advanced implementations done with AutoGen, just not out of the box. So the big advancement here is simply the out-of-the-box functionality.
@@MM3Soapgoblin Oh, cool. So AutoGen can go and add breaks and prints statements and read the terminal to find errors all on its own? I'll have to check that out.
Much hand holding in the demo. But the point is Devin is autonomous for much of the coding and QA. It can go find errors via terminal and console without human intervention and go search for solutions online if needed. Perfect, not close. But for a demo of a product not yet released. WOW!
This is NOT a code alongside as many other platforms are.
@@GetzAI "So AutoGen can go and add breaks and prints statements and read the terminal to find errors all on its own?" If you set it up correctly, yes. It just doesn't do it by default.
@@GetzAI It's just a matter of configuring every agent correctly. If you want it to ask you at every step, you have to configure that in the system prompt.
Could you please make a full video on fine-tuning Mistral or Llama 2? I've been trying too, but there are a lot of dependency errors. I'm sure a lot of people are trying the same and having the same problems! Your followers would love it, I'm sure!
Yeah, even though AutoGPT was the "first" =) The idea came from when OpenAI said "our gpt even commissioned a human to get around a CAPTCHA test." So OpenAI has had this for the longest time.
Wasn’t BabyAGI before AutoGPT?
@@eIicit I think you are right :)
I have never seen an Upwork post with enough details to know even what to build 😂
Lol they couldn't show scores next to other agents, otherwise they would have abandoned the marketing statement of being "the first". Wait for version 2, then they can say there are others out there.
4:30 shows there is no Devin, it is scott doing all the computing via a connection to the internet- a fiber cable connected to his brain.
Small correction Matthew, aider primarily uses tree-sitter context. ctags is a fallback.
Also I disagree. Devs want to use their own tools, not limited pseudo-tooling React components. Last note: more overhyped junk with a moat. There will be a better open source version within 3 months.
They need to make it more open to other models, they need to make it more programmable, and they need plugins and API access.
Demos are always impressive by the way.
I got your back Matt
Hope you get well soon ❤ also, is the CEO himself AGI? 😛
As with Copilot and other AI tools, it will for now be fairly hopeless with medium to large projects, which are the majority of what software engineers work on. Maybe a useful tool, like AI pair programmers are, but nowhere near a replacement, not yet anyhow.
The chart you mentioned at the end is also not apples to apples because those are all models and one of those models is actually the underpinning for Devin. So definitely not a reasonable comparison.
Thanks for the video review! At 18:17 -- Thank you for highlighting this information.
You know, there is nothing stopping you from doing a video on Apples to Apples Agent vs Agent... would really like to see that! (Maybe that would be worth making a new set of benchmark problems to solve more geared towards programming. If you decide to do this, please include at least one node based visual language in the benchmark!)
To make a real comparison, Devin should be released first, which it is not. Right now there are working tools out there while they sell hot air. People today just notice what they read on the social network controlled by Devin's investors.
I would have liked to know how much it spent in tokens to solve each problem. Was it actually profitable on the upwork task? Also, the bit with the developers winning an IQ contest is totally irrelevant to Devin itself.
Please bossman, don't fire me! What am I without your boot on my neck?
@1:47 cool that it can rerun the code with a debug print statement. But what does Devin do when it encounters an infinite recursion error?
If you tried AutoGPT as it was first launched, this is sort of what it was doing. But then it started getting worse and worse. I’m curious if someone else had this experience too