iob.fyi/codecrafters will let you sign up to try CodeCrafters challenges yourself, if you're interested in seeing whether you're smarter than an AI.
When all the dust settles and companies realize they need developers again, just remember their first reaction when they got their new toy was to eliminate developers.
Why isn't AI replacing management and CEOs? Why just developers and employees? Hint: Venture Capital is full of scams targeting people with big wallets and smooth brains.
Oh we're not going to forget this
@@KevinJDildonik Because interaction with people is harder than interaction with machines.
@@KevinJDildonik Because the responsibility on managers and CEOs is greater than on programmers. A manager is responsible for developers, and a CEO for everyone, often legally. It's not an ability issue, it's a risk-mitigation issue.
Pepperidge farm WILL remember.
let's gooo, another internet of bugs video!
Yes, finally someone actually tests it. Even though most of the benchmarks are probably in the training data of many LLMs, this was telling. Thanks for this video.
Great comparison! Thank you for your effort 🙏🏼👍🏼
Thanks for this evaluation. As a non-coder I tried copilot last year to see what it could do and quickly realized it was useless for me, so I never wasted my money.
I've been watching the space, but haven't seen any comparisons like this. Really helpful for people like me who don't have coding experience, but want to understand where the technology is at.
Thanks for saying so.
Your statement at the end about MBAs sitting in a corporate ivory tower, making layoff decisions based on hype, is the real concern about this whole AI debacle, and it is happening. This has the potential to set many companies back many, many years.
Was it Oracle who said they've been able to reduce their programming staff by like 75% or something? Wonder how that's going…
@@pchasco It won't last. Give it 6 months, especially if the economy starts to heat up in Q1 of next year.
Well, if it's all true, once the hype subsides they'll need us like never before to clean up their mess.
That might be a good time to strike gold with contracts that will allow us to save up enough to retire…
@@a_mediocre_meerkat It's like self-imposed Y2K prepping.
I swear every product I use, even just as a user and not a developer, is turning to shit since the AI push. Google products in particular, which makes me sad because I used to really love their products.
This video continues the pattern I see with AI coding: all the skeptical/critical examples are extremely detailed, while all the praise I see is extremely vague.
Yes, clear signs where the logical side resides.
Yours are the only videos I can watch on this topic.
I get literally nauseous with disgust when I see chatbotshill content.
Cheers mate, excellent stuff
16:28 Can Confirm, Carl did enjoy playing with himself during the making of this video
@2xbyx4 LOL. I almost went back and re-recorded that, but I figured YOLO. I'm not EXACTLY sure what words I was trying to say (about having had fun playing with CodeCrafters), but it sure didn't end up sounding like whatever I meant. 🤣
@@InternetOfBugs we love watching you playing with yourself, no need to thank us for that.
This was the most honest video about the state of LLMs for programming I've seen
Sorry, I do not agree. He doesn't understand the future of AI. It isn't just a probability machine.
Agree 💯, good video
I absolutely do not agree. This was a highly manipulative ad. So many people are under the influence of mind control! It is crazy!
@@isaacalves6846 Here is what AI said about your criticism: "I understand where the creator is coming from. The AI systems in the video did struggle with some relatively simple tasks. However, I've also seen AI systems do some amazing things. I think it's important to remember that AI is still in its early stages of development. There's a lot of room for improvement, and I'm confident that AI will eventually be able to do many things that are currently impossible."
Except he didn't use any of the top LLMs. Claude, GPT-4o, Gemini.
Great video! It shows that while AIs are useful, they can’t replace the creativity and problem-solving of human developers. A must-watch for junior devs
It's incredible this content is free on the internet. Take my money, Lord of the Bugs. (tried to post this comment in the long form video, but the feature is not enabled there...)
You’re doing the lord’s work. Keep it up!
Great work Carl, appreciate your more sane approach to test these LLM based code generator rather than make-believe results.
Best intro you've done so far, and that's a pretty high bar to beat lmao
Carl your "straight-up" objectivity, with comparing these AI Coding Generating Tools, is what keeps me watching (and learning) from your many years as a software engineer/developer. Yes, please do more AI demos! Really enjoy how you lucidly convey your thoughts. And you're right, watching you type would be boring as "F*@^. HA! 😜
Great stuff. Really hope for comparisons of more complicated tasks. I bet they're entirely different and the AIs reach a breakthrough.
😂
The challenge you gave to AI has been solved by lots of people in the exact same way & posted on github & most likely all the code is already there in the training data of the big models.
I tried a few codecrafters challenges & my cursor copilot was finishing the code for me following the exact requested spec before me & I just had to tweak the code.
So, it'd be interesting to see how it does in a brand new challenge which doesn't exist anywhere yet.
I love how well thought-out your videos are. Thank you for your videos.
Thanks for this enlightening video. It seems more like you were a patient instructor helping the AIs as if they were new dev students, rather than the AIs helping you (the person who paid for their help).
Thank you for this fantastic video. I especially like the methodology you chose to evaluate these tools. Would love to see more demos in this format.
Thank you!
This is a great comparison / general test of AI. First time being recommended your content and I'm excited to see more
These are the best videos on AI, and I always show them to my programmer colleagues whenever they make uninformed statements about how AI will replace our jobs. This just confirms all my suspicions about the limitations, and it only scratches the surface. I work on much larger codebases with more difficult challenges, and it can't even do these easy tasks you show here properly. I'm not worried about my job security at all.
I've done the CodeCrafters http challenge and the description is clear enough for an AI to write the code, like the requirements are so constrained for an AI to do its thing. Imagine having to start introducing edge cases or start thinking about maintaining the crap that you didn't write.
@@bnchi The reason it’s capable of even doing that is that the code for building an HTTP server is readily available on the Internet and has been for many years. It was ingested as part of the ML learning process. It’s not creating anything new. It’s not creating anything you couldn’t already find on stack overflow or GitHub or a coding book or anywhere similar.
Yes, all those things are why AI won't replace developers, but it's still way more efficient to work with the AI than to do it all on your own. Regarding introducing edge cases: this is more a function of how well you structure the code. AI allows you to write and refactor code faster, so overall in the same time you can produce code that is *more resilient* to edge cases. The same goes for maintenance. AI allows you to refactor code into its own functions much more quickly, for example, which reduces spaghetti and increases maintainability. Your criticisms really only apply if 1) you don't give detailed instructions to the AI about the code you want it to write, or 2) you don't iterate on the AI-generated code with more instructions and manual interventions.
@@seeibe I often use my editor's refactoring tools to move a highlighted region or block into a function, inline code, change a function name across the entire codebase, etc. The editor is way better at these tasks than an LLM because it works with the actual AST of the code, so it knows far more about the code than a guessing tool like an LLM does. Letting the AI make architecture decisions or suggest names in my code has always been a failure, and I have to describe things over multiple iterations to get something I would have written much faster myself.
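(To make the AST point above concrete, here is a minimal Python sketch of the structured view a refactoring tool works from; the sample source and names are made up purely for illustration.)

```python
# Minimal sketch, for illustration only: a refactoring tool operates on the
# parsed syntax tree, so it sees exact nodes (function definitions, call
# sites) rather than guessing from text the way an LLM does.
import ast

source = """
def fetch_user(uid):
    return lookup(uid)
"""

tree = ast.parse(source)
for node in ast.walk(tree):
    if isinstance(node, ast.FunctionDef):
        print("function definition:", node.name)   # exact, from the AST
    elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        print("call to:", node.func.id)             # exact call target
```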
@@seeibe What AI does is speed up things. Whether you're an expert who routinely writes well structured and efficient code or a beginner who regularly produces crap, what you produce won't change, you'll just produce it faster.
As a member of the general public I do need to keep a search engine open in another window to explain what you are doing. But it is clear you are actually putting the AI through rigorous tests applicable to your field of expertise. So much of the media is taking these companies' claims at face value. Thank you for the time and effort you are putting into this, and for spurring me to find out more about coding.
I'm very interested to see how Cursor performs with medium tasks. Judging by the way the "very easy" tasks look, using AI tools to write the code and then debugging them takes about the same time as writing the code yourself.
Imagine a person who has no clue how to code sitting down and trying this.
Not sure but I'd expect it to be a great tool for learning? Even now when I want to learn a new tool, I will let the AI generate the code for me, and then have it explain the parts I don't understand. Way faster and more hands-on than first painfully diving into the documentation and exhausting yourself before you can write your first line of code.
@@seeibe Maybe, but the quality of your learning process degrades. You believe it doesn't, because look how fast you're learning, but it actually does. AI is meant to help you automate boring, mundane tasks that you already know and can fact-check easily. It isn't meant to be used as a tool for learning.
@@seeibe Sure, I get what you're saying, but learning is about reading and then using that knowledge to make something. When you use AI you don't do much; it basically gives you chewed food.
Because of the Dunning-Kruger effect they will have no idea of the many ways in which their end result is broken.
@giorgikochuashvili3891 A lot of what you talk about there is cross-referencing knowledge in our brains. With the current state of human knowledge you'd have to hyperspecialize for that to be effective. I'd rather be a generalist with broad knowledge, and let the machine take care of dredging up the details from its vast pool of knowledge.
Yes, keep separating the hype from the value. This is an honest service.
If your company is trying to replace developers with AI, what that tells you is they have no clue how value is created in a technical organization. If they think they can replace developers because AI is able to "write code", they are not just factually mistaken about whether AI can write good code; they are more deeply mistaken in thinking that merely writing code is the primary value their technical team members add.
Days without attacks on Python: 0
Thank you for your video; it was interesting, and unexpected in the end :) Waiting for the next experiments with TypeScript and/or Rust (or any other compiled language).
The intro convinced me I want to see comparison videos a lot more
Great review, thanks. Especially useful right now with all the hype around Cursor, at least on twitter (apparently deserved at least relatively speaking).
Great stuff, I've been looking for exactly this type of breakdown
Your content is very good. I am proud to be your subscriber. Wishing you the best of luck ❤❤❤❤
Oh man, as someone who's likely going to have some demo or another that would fall under your purview of interest, I'm excited someone is doing this the right way!
Man, that's the analysis we should be getting from the media. Big thanks for your hard work.
Thanks for this comparison. I'd love to see videos with harder problems.
Keep up the good work. Looking forward to the next video!
12:16 syntax errors aren’t affected by try… except block. Python checks syntax at the start during compilation to bytecode, and try…except block catches runtime errors
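(For anyone curious, here is a minimal sketch of that distinction: a try/except can only catch a SyntaxError when the bad source is compiled at runtime, e.g. via compile() or exec(); a syntax error written directly in the file aborts before any line runs. The example source string is invented for illustration.)

```python
# A try/except cannot catch a syntax error in its own file: Python compiles
# the whole file to bytecode before executing anything, so the SyntaxError
# is raised at compile time, before the try block ever starts.
#
# It *can* catch one raised while compiling source at runtime:
bad_source = "def broken(:\n    pass\n"  # deliberately invalid syntax

try:
    compile(bad_source, "<generated>", "exec")  # compilation happens here, at runtime
except SyntaxError as exc:
    print("caught at runtime compilation:", exc.msg)
```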
Subscribed, pretty much what I expected. Can you also add Replit's agent, given all the hype it's been getting?
I also think this should be redone every 1-2 months as a series given the current hype cycles.
Haven't tried that one - I'll go take a look. Thanks.
Not sure about repeating it - we'll see. It's a lot of work. As long as people keep watching them, I'll keep making them, but I'm concerned people will get bored and it would be wasted time and effort.
Nice video. Keep up the good work!
What about AIDER with Deepseek Coder v2 running in a local machine?
Sharing the actual prompts given to the AI to resolve the problems would make a difference. If the prompts were merely copied and pasted from the CodeCrafters page, it is no wonder the AI failed; there is a specific way to "talk" to the AI, and you have to keep in mind to break the problem into tiny steps.
Secondly, all the coding assistants mentioned do some RAG over your existing interactions with your code, which means crappy results on first use (at least outside chat mode), so I'd expect more accurate results from pasting those problems directly into the web interface for GPT-4o or Claude 3.5.
I'm not interested in the "how can a skilled programmer best use an AI?" question. I'm interested in the "how well can an AI emulate (or replace) a skilled programmer?" question.
This statement reflects a weird fact: if you play by the AI's rules/prompts, the outcome will generally be better (similar to autonomous driving, where it takes some time to adjust to the driving algorithm). It means that for AI to perform well, humans now have to adapt to the AI instead. That perspective is either depressing (if AI becomes dominant) or unrealistic (if AI continues to require it). And then we are also asked to trust a system that NO ONE, not even its creators, understands well??
@@InternetOfBugs lol… watching your videos on this topic reminds me of videos from other creators using LLMs to write programs that would take an entry-level dev 15 or 20 minutes, and managing to pull it off in just a few hours of prompting.
I think this comes down to the information involved in describing what you want. For example, what if I told the prompt, "Make me an app that will be successful"?
It doesn't have enough information. There are so many paths it can take because of the ambiguity. Most of the AI output is just reuse of templates that are already known. To innovate, you would need to manually create new ideas at a fundamental level. So in the future, AIs are reduced to the idea-creating capabilities of the user.
Never thought about Python's syntactically relevant indentation tripping up LLMs, but it seems obvious now. Cheers!
Yes, more videos like this. Perhaps showing what to do yourself and what the AI should automate for you.
This was a great way to compare and test different integrations. I would love to see a more "realistic" test that implements some generally understood best practices for prompt engineering (since that is often how devs who try these tools actually use them).
Great use of screen of death!
What about with the new OpenAI o1-preview??
I was actually pretty mad when I heard the challenge since it sounded cherry-picked to be way too easy. I guess even in those cases AI sucks. I don't get why so many people I work with say it's so amazing.
Most people simply repeat what they are told.
AI can automate some tedious and boring tasks, and that's about it. Which is I guess nice, and that's why people like it, but it's only the easiest part.
Excellent question. Maybe they expect a drastic improvement with the next generation? Maybe they are impressed that AI (deep learning) can do anything at all, given that it is such a new technology?
@@CaridorcTergilti Most certainly part of it. Whenever someone actually listens to the criticisms of the current state of LLMs, their next response is "well, this is the worst it's ever going to be, it just gets more powerful from here," which just is not guaranteed.
Because most people aren't able to build an HTTP server.
Highly informative video, thanks for the objectivity and clarity
Awesome video, looking forward to more
Looking forward to seeing more. I find copilot/codeium handy in the moment to moment work I do, mostly when it fills in a line or block just the way I would have done, or saves me a Google when I can't quite remember the syntax for something. Definitely curious about cursor...
Great video. Please make more.
Love it when your intro interrupts stuff
Nice summary, definitely made me want to watch the full session. Around 15:48 you mention that you wouldn't use it for work, outside of testing: did you mean testing the AIs or using them to perform/write tests?
I bet he means testing advancements in AI
Testing the AIs - as in making more videos like this.
@@InternetOfBugs phew!
Great video, really helpful and informative
Interesting vid 👍 please do more comparisons 👍
Thanks Carl loved the video.
Very interesting, thank you :) I'm very interested of seeing this in C#
I am so curious how far (your) left your bookshelves go :-)
Content quality that everyone needs. Hype is created by people who make money out of pumping FOMO.
Boy you really do be going the extra mile lol. Thank you
I'm also very curious about how they perform with new vs. old languages. Anecdotally I've seen them handle C much better than Rust, etc. It makes sense, since they'd have more to train on, but I'm curious how much better any one model would do on the same problem using different languages.
great video, Carl! I shared it on my LinkedIn
There are loads of simple editor tools to run before commits that can automatically fix simple whitespace problems, e.g. Black. Basic syntax errors too. But point taken about Python.
The one who makes honest, unbiased, in-depth tests of billion$ AI projects may gain a lot of attention.
Monetization and value of those projects may be at stake 😉
I hope Sam Altman won’t stalk the author of the video and assault him
Hello Internet of Bugs, I have a question: what language do you recommend studying, taking into account that today there is a lot of demand for JS, Java, and Python programmers and little supply, or else offers full of applicants? I am a second-year software engineering student and I am a little worried (not so much) about the number of applications there are for each position.
Love seeing Welch's TCL and Tk book over his shoulder.
Great stuff - I wonder what this tells us about the merits of the *other* Copilots.
There's an interesting hidden message here… all those LLMs learned from the feed that was available to their creators, so they carry that "accent" in whatever they spew out. I bet the PyCharm one was the most convoluted and advanced of the four (albeit fumbling) because JetBrains fed it higher-quality learning material harvested from the code actual humans entered into their IDE. It doesn't change the fact that they all fumble and stumble over their own feet like ghouls in Fallout, though 🙄😉
I think where current coding AIs are arguably the most useful is their auto-complete feature. At least for an experienced dev, I think it can save you time. I'd really appreciate you running the same 4 AIs through their paces in this area, because the ranking could be totally different there (as it is often a different model than in the chat feature).
Good luck with separating the Hype from the Value!
Great video, very informative, thanks! Did not think they would fail this hard, lol. Maybe even AI has problems with weakly typed languages?
Hey, nice exercise! I did something similar recently to compare the "new kids on the block" and came to a similar conclusion. A couple of questions:
1. Did you use the default model for Codeium? You can also use gpt4 and Claude 3.5 sonnet with it (in a limited way).
2. When using copilot, do you mostly use the inline chat functionality? While it's true that its quality seems to be a bit lacking lately, correcting issues has typically worked well for me, but I tend to use the chat window and keep the right files added as context while using it. This might make it perform better, but perhaps you already do that.
Anyway nice video, as usual!
Isn’t there a free VScode extension that does the same as cursor?
I don't know, is there?
Yes there is, Cody. I use it. Last year I was using BITO or something like that.
Pretty decent honestly.
I think that inline chat with copilot has no concept of a history of file versions, does it? I see you talking to it mentioning a "previous version" of the code, which I'm pretty sure will be nonsense to the model as it won't have access to a "previous version". Wonder how much those things affect the results.
This man is going to get pushed down a lift shaft, by Satya Nadella😂 But, seriously, great content. Keep it up.
Nyah - I expect I'll get run over by a rogue Tesla CyberTruck in FSD mode.
Great job mister
Give an agentic LLM product a try. Suggest you try Claude Dev. Perhaps it is a preview of Devin
First comment I read that actually had some meat to it. This video was ultimately created as an ad
I came here for this. CLAUDE DEV is great!
Two critiques: 1) you should absolutely be using the API console with temperature at zero, in all cases. 2) using the exact same prompts for each isn't really a great test, because different LLMs have significantly different techniques a skilled user would use to consistently get optimal results. Still, a fantastic video.
I personally did similar tests on python, JS, and Elixir and Sonnet 3.5 blew everything out of the water, and I also massively appreciated the much larger context window.
The scenario I was trying to emulate was the "unskilled (or at least unknowledgeable about programming) user." I'm more interested in the "can an AI replace a programmer?" question than the "how can a skilled programmer best be more productive using AI?" question. So I used the default AI configs and copy/paste prompts.
Whether or not that's the most important (or useful) question is a whole other topic for discussion.
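(For readers who do want to pin the temperature as the earlier comment suggests, a minimal sketch with the OpenAI Python SDK might look like the following; the model name and prompt are illustrative placeholders, and other providers expose an equivalent parameter.)

```python
# Minimal sketch of the "temperature at zero" suggestion, assuming the
# OpenAI Python SDK; model name and prompt are placeholders, not the
# setup used in the video.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",   # placeholder model name
    temperature=0,    # minimize sampling randomness between runs
    messages=[
        {"role": "user", "content": "Write a Python function that reverses a string."},
    ],
)
print(response.choices[0].message.content)
```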
I'm a relative beginner with about 2 years coding experience, and already even at my level of experience I frequently have to solve problems entirely on my own, and they end up looking nothing like the initial solution the AI (claude in my case usually) uses. Can't imagine what it's like for actual experienced veterans.
I ended my Copilot subscription a few months ago. I've just been using my gpt-4o subscription and that's been ok. I'm not really into looking for a code-first AI again (yet). It's just not good enough and doesn't really help me beyond what I can do with gpt-4o at the moment.
Now in fairness I use Cody by sourcegraph which didn't make the list.. and it has increased my output, I'd say x3 at least.
However I've also been coding since the 80's, and when I started there wasn't even an internet. So I don't notice these errors you are pointing out and my guess is as a solid coder yourself you knew where it was screwing up but the rules of the game prevented you from letting the model know other than console outputs and errors etc.
So for boilerplate (and let's face it, after 40 years at a console everything looks like boilerplate), for boilerplate code it's a fantastic time saver. It also allows me to think ahead about my coding strategy as the codebase in an app advances.
The subtitles read "the company that made Devin was a new-ish startup and I certainly *can* recommend them". I'm sure you meant "certainly *can't* recommend them". Assuming the subs were transcribed with something like Whisper and not checked, this is, for this channel, pretty ironic.
more comparisons!
This guy is a bucket of ice water on CEOs who have drank the Koolaid!
koolAId
You might wanna review ChatGPT o1-mini and o1-preview. This might be the version I'd be willing to pay for.
Hoping to start working on that (o1) video this week.
Very interested in specifically the “AI no-code app builders” hype
I'd been using Copilot since forever; I had no idea it was so bad. Is it still on the custom fork of GPT-3 for code, or have they updated it to one of the newer models?
DO MORE OF THESE
Also I think Rust or Clojure would be fun. Rust has fucking confusing error messages if you're doing anything with threads. Wondering for Clojure if they can get the parentheses balanced right.
Awesome video, Internet of Bugs!
I gave ChatGPT a problem I was not completely understanding, discrete mathematics from Epp's book. It had some conditions, and the task was to build subsets from them. It couldn't do it correctly even when I gave it more and more extra data and explained things. If it can't do such basic things, it can hardly understand and manage a complex system.
I wonder how well Google's stacks up. For compliance reasons, it's the only one my work allows, but I only use it to stub out unit tests.
I looked at that, but Gemini seemed about twice as expensive as any of the others (once the trial period expires - at least according to cloud.google.com/products/gemini/pricing ) and that's comparing Gemini's full year's commitment price to Cursor/CoPilot's month-to-month price, so I decided not to deal with it. I might add it if i keep doing more of these, though.
@@InternetOfBugs yeah, I think most of my team is just using the much cheaper generic chat interface when they need to use AI tools. I don't know if we'll use it after the trial period.
Please add Claude Dev to your list of options.
More people need to see this video...
It's so funny: I canceled my Copilot subscription after one year and moved to Cursor, which uses Sonnet, and then this video came along.
Could you compare Cursor with Replit? Thank you for your great videos 🎉
I’m enjoying this! Great video! I hope he uses cursor and Cody!