As many other comments have pointed out already, you used very standard functions every computer science student knows about (factorial, mergesort) for testing the AI assistants. The problem is, these functions are most likely occurring many times in all of the models' training data - and obviously computers can memorize and parrot smart-sounding answers / code. If you do a round 2, maybe you could consider tasks that are not really that hard at all, but a little off the beaten path, you can't just google the solution which also means it probably is not in the training data
I'm only about 8 months into my pseudo coding journey (I'm in school for InfoSec, and programming takes a bit of a back seat since it's not necessary to build a program, only to understand how said code functions so we can break it), but Copilot has been an absolute godsend. Not to code for me, but with the lack of any kind of instructor to assist me, it's sort of like a mini teacher. I don't let it do the work for me; rather, it helps me understand what I did wrong.
I wish I had ChatGPT in school! Sometimes grasping concepts or just needing that extra little bit of context goes a long way in biology. Now I use it for coding but it would've been just as useful back then.
instantly clicked the like button after hearing you jump into things immediately without an intro. I don't even know if the video is good yet but you deserve a like for that. *edit: yup the video's great lol. Subscribed
So the summary is, use ChatGPT to research your specific task case, Co-Pilot to build the framework, and then Codium to handle possible cases/refactoring of the code. That’s actually insane 😅
This was a nice comparison, although the test cases were too simplistic, TBH. E.g., you could have asked it to complete code that parses a snippet from a text file given within a preceding comment. Stuff which isn't literally found in textbooks...
CodiumAI made a mistake at the end of the video: it throws an error if the number is less than or equal to 0, but after that it checks if the number equals 0 to return 1. The first condition must be changed to strictly less than 0. Great video overall, thanks for your effort.
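The boundary bug described above can be sketched like this (function names and bodies are illustrative reconstructions, not the exact code from the video):

```javascript
// Buggy shape: the guard rejects 0, so the "return 1 for 0" check is dead code
// and factorial(0) incorrectly throws instead of returning 1.
function factorialBuggy(num) {
  if (num <= 0) throw new Error("Input must be a positive integer");
  if (num === 0) return 1; // unreachable: 0 already threw above
  let result = 1;
  for (let i = 2; i <= num; i++) result *= i;
  return result;
}

// Fixed shape: only strictly negative input is an error, so 0! === 1 works.
function factorial(num) {
  if (num < 0) throw new Error("Input must be a nonnegative integer");
  let result = 1;
  for (let i = 2; i <= num; i++) result *= i;
  return result;
}
```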
My company is probably going to go with tabnine (starting to test it now), mainly because their models can be deployed on-prem. Also because they're modular, one model per language, and they say they'll update them whenever new versions of languages / big frameworks come out. It's also supposed to have some degree of repo awareness (they told us they stuff the context with the content of files that seem related to the current one), and full repo awareness is on their roadmap (if it does work, it should be a big plus compared to generalist LLMs). That said, its reasoning capabilities are definitely not great right now.
We're also using tabnine and chatgpt. I like tabnine's ability to code ahead of me. It doesn't, usually, get in my way and is a helpful assistant. I like its speed or should I say rhythm.
Cody works really well in projects, since its context comes from your entire project and not just the single file you're in. It's also doing this locally, if I remember correctly, by creating its models in PostgreSQL. I'm not sure if the others do this, but "How can I get XYZ from ABC model" has worked really well, and "Create tests for this" will often bring in correct model relationships and factories in Laravel.
At the end, the error should say that the input needs to be a "nonnegative integer". You were correct in saying that 0 isn't positive; it's neither positive nor negative.
I absolutely love Copilot since it came out, and in my opinion it's the only AI that's worth paying for. With that said, CodiumAI looks absolutely amazing and useful. Testing is tedious and optimizing is tedious.
At 16:33 I would argue that your prompt "JSDoc comment" is why it took so long to get a response and why it wasn't what you were expecting. The prompt itself is too ambiguous. Would be interesting if you tried it again but provided more context in the prompt, like "JSDoc the factorial function". Edit: it even says in the response that there's too little context in the prompt 😅
The day I can open the Firefox repository or LibreOffice repository in my editor and have an AI assistant explain it and help me learn about the repo, saving me months of reading and searching, that's the day I'll call AI assistants a real thing. Right now, they just copy and paste code. I'm hoping for the big thing in the near future.🙏
For what it's worth: Zero is neither negative nor positive for the case of integers, but can be negative or positive if it is floating point. From Bing Copilot: Caution: Representations allowing negative zero can lead to errors if developers overlook the differences between +0 and -0 in certain operations.
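A quick JavaScript illustration of the +0/-0 distinction the comment warns about (JS numbers are IEEE 754 doubles, so negative zero exists):

```javascript
// Strict equality treats +0 and -0 as equal...
console.log(0 === -0);         // true
// ...but Object.is distinguishes them...
console.log(Object.is(0, -0)); // false
// ...and the sign leaks through division.
console.log(1 / -0);           // -Infinity
console.log(1 / 0);            // Infinity
```

This is exactly the kind of operation-dependent difference that can bite if code compares against zero without considering the sign bit.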
In a way this comparison reminds me of benchmarks: you feed it some example tasks. But what I never see in videos like this is the MEAT. Like, for example: write me some code that aggregates data from X, Y, Z web API sources and combines them in an application for accounting, and by accounting I mean the client has some god damn custom requirements. Also add the deployment plan and the human-readable guide docs, have meetings with the managers, etc. The actual part of working. Yes, I understand developers only want to sit at their PC and code, but at least 30% of your time is not spent coding.
Tabnine deserved better. I find it works well for code completion. I use a simple vscode-openai with a custom AI key to have a chat, and I actually like the disconnect between code and chat.
Copilot drops a category when you actually use it on exhaustive in-house projects. Sure, it knows well how to write a factorialize function, but with custom code you cannot do much. It works well only if it can find something similar on the internet.
In my experience (and I code primarily in R), ChatGPT 3.5 gives code that doesn't really work, straight up makes up packages/functions (phantom memories) to solve the issue, and in general is only able to help me 1 out of 10 times.
I have the same experience with .NET/C#. It can copy-paste a solution to a trivial problem which has already been solved on StackOverflow 100 times, but once you ask it to write code which cannot be copy-pasted from official docs/coding forums, it just makes stuff up (non-existent interfaces/methods/classes) in an infinite loop of "I'm sorry. You're right. Here's the correct solution to this problem". In a lot of ways, that's actually much more time-consuming than simply looking for an answer on SO and, if it's not there, debugging/inspecting runtime state/digging through decompiled sources for a poorly documented library by yourself. This might actually change for the better once ChatGPT can run/debug/modify programs by itself. But for now it's not particularly useful for anything more complex than a throwaway course project you work on for a few days and then forget about.
Test cases... fine... they work. What happens when you throw a full-size project at it and ask it to do a more complex task? Like getting data from the UI, creating a model object, and sending it to the save/upload API? Or finding a logic error? My main problem with my own spaghetti code is that it's been harder to follow the logic and all the async events. I think for that we need an AI that could follow an execution thread. Or an AI to optimize memory usage. Every piece of software out there seems to be bloated to the max. I made it my mission never to open a 'node_modules' folder again.
You should also try Gemini (the new bard with better models) and google's specific gemini-based code helper (their rival to copilot) called Code Assist
Maybe I didn't see it, but you forgot to give Sourcegraph Cody context. In chat, if you want to search in multiple files or add context, use the @ sign. It works really great.
This video was great! Just starting to look at these programs myself. I was looking especially at Cody, since Perplexity AI kept saying it can't access a repository. Will keep an eye on them; it doesn't look like they are there yet, but that is the direction they want to go in.
LLMs learn by scouring the web and then parroting what humans have already said. No one is going to hire a programmer to write a factorial function. Try a novel example that hasn't already been plastered across the web zillions of times.
I hear you on the ChatGPT thing. You can tell it to cite its sources in its response and it will do it. Sometimes it'll reply saying it did not get it from a source and it can't find one. If it's something super important, then you may not want to use it if there is no source.
The BIGGEST coding challenge that you didn't do and compare is... how well these AIs know how to implement APIs and libraries. ChatGPT can generate OpenGL code for a simple hello world on command; does any other assistant AI do that?
I'm still flabbergasted by the fact that people aren't using OpenAI Assistants to read their whole code base (remember, 128k context size) and then assist them with further development. Like, this is exactly what I'm doing, and it works great. I have a project in react, I give the assistant access to it, ask it to analyze every meaningful file and boom, done. I can then ask it stuff like "refactor feature X", or "Why doesn't Y work?"
Whoa, wait. How are you doing this? My experience with the regular chatbot and coding a whole project has been a hot mess and not working at all, but I haven't messed around with custom GPTs yet. Is that what you mean by OpenAI Assistants, or do you mean something else, like the playground with paid API responses? Do you give the whole codebase as a single file or as multiple files?
@@aluminiumsandworm Yep, although I wouldn't be too worried about OpenAI directly leaking your code. In the worst case, they may use it for training, which is bad, yeah, but not the same as fully leaking it
I've been playing with the JetBrains AI (PyCharm) and found it to be quite useful for boilerplate, or those times you're too lazy/busy to manually generate a small utility. Liked the way you can ask successive questions/prompts and get the code enhanced.
@@frankroquemore4946 I wouldn't recommend an absolute beginner use AI tools, since they can give you wrong and misleading answers. At the least, code some projects for at least 6 months to become comfortable with the language you're using. By the way, by plugins I think he meant ChatGPT plugins, not VS Code plugins.
@@unknownguywholovespizza Maybe they could learn from trying to troubleshoot the bugs. Lol. Seriously though, I've done testing where it removed a function to fix a bug. Then another bug happens and it re-adds the removed function. And it repeats until I tell it where to focus. So yeah, I applaud your advice.
@@josholin31 You see, that's the problem. They're seriously very overhyped. While they're useful sometimes, most of the time they're just idiots. Companies lie to us constantly just so they can make money and laugh at people who think "AI" is truly AI. Maybe it's just me, but I always notice these chatbots getting dumber every day.
You didn't even touch Phind, DeepSeek, or Tabby, which is on par with Copilot and even surpasses it for me in some cases during bash scripting, which surprised me too. And as for Cody, it doesn't have any idea if you ask it randomly; you have to register the repo with the Cody app, either from local disk or from GitHub, to give it access to your code files, and then it works like magic. Opening a folder and a file in VS Code out of the blue and asking the extension will fail every time, because it doesn't perform the scanning. The file-discovery-only scanning it does for any open folder will fall back to node modules, like yours did.
I can build something like what Copilot inserts; in fact, I did something similar, and for my own stuff I have hundreds of ideas on how to make it better. Problem: if you want it smarter, it will cost more, because all the mechanisms to make it smarter cost tokens and time. But models rapidly get cheaper, better, and faster, so a year from now you'll have a mind-blowingly smart assistant.
Copilot not throwing an error is actually superior. Exceptions are expensive, so properly handling error scenarios or returning an error response is preferable (OPINION).
JS's Number.MAX_SAFE_INTEGER (before it might start losing precision) is about 9e15 (9 quadrillion). 170! is... a bit bigger, ~7e306. MAX_SAFE_INTEGER is somewhere between 18! and 19!.
@@ConnerArdman Well, the 306 part isn't quite so random. The 7e306 is close to 2e308, which is approximately the maximal 64-bit float value. If it was 171!, then that would be larger than the maximal value for 64-bit floats.
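Those limits are easy to demonstrate. A minimal sketch (function names are illustrative): 170! still fits in a 64-bit double (~7.26e306, under the ~1.8e308 maximum), 171! overflows to Infinity, and BigInt sidesteps the limit entirely with exact integer arithmetic:

```javascript
// Factorial using plain Number (IEEE 754 double): overflows past 170!.
function factorialFloat(n) {
  let r = 1;
  for (let i = 2; i <= n; i++) r *= i;
  return r;
}

// Factorial using BigInt: arbitrary precision, no overflow.
function factorialBig(n) {
  let r = 1n;
  for (let i = 2n; i <= BigInt(n); i++) r *= i;
  return r;
}

console.log(Number.isFinite(factorialFloat(170))); // true
console.log(factorialFloat(171));                  // Infinity
console.log(factorialBig(20));                     // exact: 2432902008176640000n
```

Note that precision loss in the Number version starts much earlier than overflow, once results pass Number.MAX_SAFE_INTEGER; only the BigInt version stays exact.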
I was sent here by Matt Wolfe. Thank you for this great overview! Your precise comments per assistant help me come to a decision which assistant to try out in the future :-)
I'm just wondering why Tabnine is below GitHub Copilot. More compact code (like Copilot gives) is not necessarily better; on the contrary, it's worse in most cases, because it makes maintenance harder for developers later. The merge sort example shows that: less code > more compact > fewer explanations > harder maintenance. I'm not saying Tabnine is better than Copilot, just that you forgot to talk about that subject, and for me at least, an AI that writes more explanations in code (and with fewer prompts) is the one that I would use.
The basic functions like pulling in a mergesort, or a factorial function doesn't really save the programmer any time. It's the difference between 10 seconds and 30 seconds to run over to a stackoverflow post or personal repository to grab one. The extra constraints are what will break it and cause real time to develop - it's that time reduction that makes these tools worth it.
I use Phind. I code in Python and asked it to transform code to JS (which I'm learning) or to help me correct my code. I really like the explanations it gives me, and "talking" with my code directly in VS Code is a plus. (It has a free plan, which is the one I'm using.)
Phind is good, but for me it too often confuses languages. Just a few days ago, I asked it something about calculations with dates and random data in ABAP and it quoted sources about Python... and if I can believe what it says on the website, it was using GPT-4.
It would be better if you tried Codeium instead of Codium. Despite the similar name, they are completely different assistants in terms of quality. Codeium is no worse than the paid Copilot, but it's free.
The fastest way to see if your AI code assistant is worth your time is asking it to write an example driver for a specific framework and a specific platform. They will invariably fail all 3.
great video, very informative, and I especially love how you get straight to the point, and kept it engaging throughout the whole video :) however, there's this boxy sound from your microphone that makes me dizzy, and I used chatgpt to help me explain how to fix it lol: 1. Cut 250Hz Range: Start by applying a parametric EQ and reduce the 250Hz frequency range by about 4dB. This should help remove the boxiness. 2. Boost 3.5kHz Range: Next, boost the 3.5kHz frequency range by about 3dB. This should add more clarity. 3. Adjust Resonance: Play with the resonance (Q, bandwidth) settings to achieve a smoother curve between the 250Hz and 3.5kHz points on your EQ visualizer. Aim for a gradual transition, creating an almost straight line through the 1kHz area. 4. Fine-Tune the 1kHz Area: Slowly cut a notch in the 1kHz area to refine the sound and make it sound more natural. This can help reduce any unwanted "wispy-ness" (very technical term). Optional: Boost your volume by about 1-2dB if needed.
Thanks! And yeah, the issue isn’t the mic or the EQ. The audio was peaking when I recorded, and I did my best to save it in post because I _really_ didn’t want to have to rerecord this one 😅
22:36 - What’s interesting is that I think most compilers that perform tail recursion optimization do the same thing (i.e. simply convert them into a loop).
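The equivalence mentioned at 22:36 can be sketched as follows. A tail-recursive factorial makes the recursive call its last operation, so a compiler with tail-call optimization can reuse the stack frame, which is effectively the loop below. (Caveat: among JS engines, essentially only JavaScriptCore/Safari implements proper tail calls, so in most engines this transformation is conceptual.)

```javascript
// Tail-recursive form: the recursive call is in tail position, carrying
// the running product in an accumulator.
function factTail(n, acc = 1) {
  if (n <= 1) return acc;
  return factTail(n - 1, acc * n);
}

// The loop a TCO-capable compiler would conceptually produce: the
// accumulator and counter become mutable locals.
function factLoop(n) {
  let acc = 1;
  while (n > 1) {
    acc *= n;
    n -= 1;
  }
  return acc;
}
```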
Claude Opus smokes GPT-4o right now, for anyone still looking at this. The web app for Opus has a 200K context window vs. 32K for GPT-4o, and it just gives better code too. Claude was able to help me correctly enable high-resolution timers on a Giga R1, and ChatGPT couldn't even start without putting itself into a loop. Try any real project code and you'll see what I mean.
The merge function probably isn't written automatically as it could be expected by you that it should be written above the mergeSort function. So it just waits for you to place the cursor before it suggests
The guy wants the agent to write both code and good test cases. It would be enough if a basic test harness was generated, and the test cases could then be manually updated. I would be extremely happy with that result alone.
Claude 3.5 is, I think, a tad better than OpenAI. Personally, I wonder: are there any local LLMs good for this (ones trained on basic English but mostly on code, perhaps even for specific languages)?
Factorial of a negative number is not NaN; it's undefined, or should throw. And that "factorialize" smashes the stack before throwing an unhelpful RangeError on most non-number input (not " +004e+00" though). num > 170 => Infinity. num > 21 => loss of precision. ChadG did better.
Hey Conner - Do you still think that Cody stands on Acceptable level? They kept on working since the release of this video of yours and recently having the ability to use Claude Sonnet I think it goes top of the list, What are your thoughts? :)
A key difference between Bard and ChatGPT is that Bard is a better teacher. ChatGPT's info is a touch better, but it struggles to help you learn using hints and guiding, and usually gives you the full answer no matter how many times you tell it not to. Bard is really good at giving hints and directions, and will listen if you tell it not to provide snippets.
Test cases should be generated so that the code enters as many conditional branches as possible, to actually cover the code, not just with some random numbers thrown at the method.
5:30 I've been using GPT to practice coding, and I realize an LLM just gives you the most common string of words that would follow your prompt, but damn, man, sometimes it just needs to tell you "I don't know" or "that's beyond my abilities", something besides just confidently giving me wrong information.
Correction on Sourcegraph Cody:
I wasn't fully using this one correctly. You can download an additional app and link it to your GitHub repository, and it does a _much_ better job of understanding the repo. I'm not sure why the UX doesn't prompt you to do this from the start, but it does fix some of the problems in the video such as over-indexing on node modules. However with all that said, I honestly still wasn't able to get it to perform many tasks that were particularly useful, and the larger the repo I gave it, the less reliable the responses seemed to become. It could definitely be worth a shot in your projects to see if it is useful to you though! 😃
Also, for generating code comments, Cody has a built-in command to generate docs which should do a much better job :)
you didn't try Codeium though. it's different from CodiumAI
What is the "additional app"?
@@ivanolmo4109 There's a little icon in the bottom of the Cody chat that links to a download of a desktop app that asks you to link the repo, and it does a better job of indexing the code if you do.
@@ConnerArdman It's already different now. I did see a prompt about this after I installed Cody but was confused as to why it was telling me about a desktop app. I moved on, and now I don't see any reference or links to the desktop app. Their own website doesn't have any mention or link to it either. I also asked Cody to assist me with downloading it and it gave me a dead link.
FFS
UPDATE: ooohhhh... the desktop app isn't available for windows...... SILLY ME
I like that the video starts by jumping right into reviewing the extensions instead of a long intro that ends with a sponsored segment. You get points for that from me
I really despise that kind of behavior from content creators.
Nothing wrong with that. Content creators need to make a living too. In the end, you're getting that content for free.
@@Homiloko2 That argument goes out the window when you can't block the ads.
@@FaastTex Well, that's a fair argument. My ads are still blocking fine so I even forget they exist
I genuinely thought something was wrong at first because I'm so used to the 1 minute intro for EVERY SINGLE VIDEO regardless of subject matter
The examples he uses (factorial, mergesort) are trivial with simple, well known solutions implemented in almost all languages. No one writes their own mergesort or factorial function.
I'd like to see a comparison of these various tools using something more realistic. For example how well would they implement envelope encryption using various cloud providers? I hope they would produce code that is at least a good starting point for envelope encryption. If I get some time maybe I'll do my own trial using these various tools.
What would be some other realistic examples to test these tools with?
bump
These are all terrible at real-world coding. Your best bet is probably GPT-4 since it is much better than the others at having a back and forth conversation. You can kind of reason it into giving you correct code or potential errors sometimes. Still not good though.
imho coding assistants may be more useful working on trivial yet time consuming coding tasks like generating html from a visual description or refactoring existing code. if your prompts end up longer than the assistants code output you're probably not using your assistant well
Personally, I use copilot for work and I never rely on it for new implementations. I only use it as a refactoring/convenience tool. Works great for that. Can speed up my workflow by several orders of magnitude. I might use chat-GPT as a starting point for something I have not done before -- as a smarter google search -- but I wouldn't trust it either. AI is there to augment, not replace, IMO.
@@renynzea - That sounds good, but speeding up your work by two orders of magnitude means you would complete an entire day of coding in 5 minutes.
Don't get carried away.
I literally haven’t seen your channel before but I gotta comment about how much I appreciate that your video just starts. There isn’t 3-5 minutes of just bs when the topic is literally explained in the title. THANK YOU!
like a bad anime lol
We need a comparison with more non-standard functions/customized requests.
Factorial is too likely to be an example in the training data of the language models.
To determine these tools' capabilities and usefulness for actual work, we'd need tests that get code solving a problem the tool most likely hasn't seen before, or at least not with the same constraints or the same custom properties (like: sort this array of strings alphabetically, and add an argument to select whether to treat uppercase and lowercase the same or to have all lowercase letters come after all uppercase letters, like ASCII-code-based sorting).
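One possible implementation of that example spec (illustrative, not code from the video): sort strings alphabetically, with a flag choosing between case-insensitive order and raw code-unit order, where all uppercase letters sort before lowercase ones:

```javascript
// Sort an array of strings; ignoreCase=true folds case before comparing,
// ignoreCase=false compares raw UTF-16 code units ("Z" (90) < "a" (97)).
function sortStrings(arr, { ignoreCase = true } = {}) {
  return [...arr].sort((a, b) => {
    const [x, y] = ignoreCase ? [a.toLowerCase(), b.toLowerCase()] : [a, b];
    return x < y ? -1 : x > y ? 1 : 0;
  });
}

sortStrings(["apple", "Banana"]);                        // ["apple", "Banana"]
sortStrings(["apple", "Banana"], { ignoreCase: false }); // ["Banana", "apple"]
```

Note the two modes only differ when the input mixes cases, which is exactly the kind of custom constraint that separates memorized textbook code from actually following a spec.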
I agree. One place I've noticed Co-Pilot really shines is in Makefiles. If your targets are well named, and use wildcards, it's able to deduce the idea reasonably well. Overall, I've noticed these tools suffer the same issues as humans. It will only provide bulk prompts in atomic segments, but struggles to isolate sub-segments of these chunks to interleave a modification to existing code. Almost like it was trained to copy and paste from stack overflow...
Exactly, and that problem was within the abilities of even a 7B-parameter LLaMA.
Why are you trying to use these tools for tasks that aren't common in the training data?
The whole point is to get a good answer that is not in the training data: the zero-shot capabilities. @@derschutz4737
@@derschutz4737 Because you are always confronted with non-standard tasks in programming. Most of the time there is already an existing system that you need to work on, which dictates constraints that you need to account for, and that's where the main time sink in implementing a big project is.
Excellent video. What I love about your presentation is the speed. You speak at a great speed that gets to the point, and gives us exactly the amount that we need without any fluff. So many times when I watch other people's videos they go slowly and I have to speed up the video to at least 1.5x to get through it. This is the first video of yours I have watched and I am really glad I did. Thanks.
Thanks, glad you liked it!
From Wikipedia: 170 is the largest integer for which its factorial can be stored in IEEE 754 double-precision floating-point format. This is probably why it is also the largest factorial that Google's built-in calculator will calculate, returning the answer as 170! = 7.25741562 × 10^306.
170! = 7.25741562 * 10^306
The list is missing Codeium, which I use a lot and is pretty spot-on more often than not in my opinion. Wanted to see how it would rank against Copilot and similarly named CodiumAI
Second this. Absolutely love Codeium. It's free and good compared to Copilot that was behind queue wall, then pay wall, not sure about its current state but idc
Same
Codium is great
Isn't Codium in the video? 20:40
@@BrianSamThomas Codium is. Codeium isn't
Also, Cursor IDE and their coding assistant could be mentioned. It works pretty well for me :)
Cursor IDE should be added, I agree.
Totally agree, using ChatGPT and Bard out of the box isn't a fair comparison to the others; they should absolutely have included Cursor
5:55 It suggested "factorialize" because it's a verb-ized form of the word "factor," since you're writing a function. If you were writing some property, the noun form would be preferred.
But factorialize is a really good word
Nice Video man.
But you missed these AI Coding tools 🛠️.
1. Cursor IDE - uses whole-codebase knowledge.
2. Codeium - free Copilot-like assistant.
3. Phind AI - the Phind AI extension in VS Code.
Good Additions!
Phind is, in my experience, more precise, while Codeium often tends toward trial and error and keeps looping its suggestions.
Where Phind totally shines over all the plugins is finding configuration and settings for e.g. tmux etc. It can save you hours of searching, and you don't have to read meaningless comments on stack* sites etc.
what abt blackbox
Or IBM’s watsonx
This comparison would have been more useful (perhaps) if the selection of tasks posed to each agent were non-trivial, novel, non-encapsulated tasks. I.e., something where you couldn't just google a "how do I code " type question to find a definitive answer.
there's also Codeium too, really great; they were even featured on a Changelog podcast
These AI tools need better names lol, Codium and Codeium cannot both exist 😂
@@ConnerArdman classic naming dilemmas; the only way I was initially able to tell them apart is that one only did tests and the other one was trying to be Copilot (it's actually good)
Codeium really doesn't get talked about enough... Its recent updates make it beyond fire🔥🔥🔥
i uninstalled tabnine once codeium came out. Love it until it’s not free anymore 😂
IntelliJ and JetBrains among many others were built on top of Codeium ;)
As many other comments have pointed out already, you used very standard functions every computer science student knows about (factorial, mergesort) for testing the AI assistants. The problem is, these functions are most likely occurring many times in all of the models' training data - and obviously computers can memorize and parrot smart-sounding answers / code. If you do a round 2, maybe you could consider tasks that are not really that hard at all, but a little off the beaten path, you can't just google the solution which also means it probably is not in the training data
I'm only about 8 months into my pseudo coding journey (I'm in school for InfoSec, and programming takes a bit of a back seat since it's not necessary to build a program, only to understand how said code functions so we can break it), but Copilot has been an absolute godsend. Not to code for me, but with the lack of any kind of instructor to assist me, it's sorta like a mini teacher. I don't let it do things for me; rather, it helps me understand what I did wrong.
I wish I had ChatGPT in school! Sometimes grasping concepts or just needing that extra little bit of context goes a long way in biology. Now I use it for coding but it would've been just as useful back then.
instantly clicked the like button after hearing you jump into things immediately without an intro. I don't even know if the video is good yet but you deserve a like for that.
*edit: yup the video's great lol. Subscribed
Now you need to test JetBrains AI coding assistant and Google Duet AI coding assistant (both released since this video).
JetBrains AI is using ChatGPT behind the scenes, and they hide that they use ChatGPT, so there is no real JetBrains AI
@@Sp1tfire100The model they use is less interesting than how well it integrates with the IDE. The model can often be changed.
So the summary is, use ChatGPT to research your specific task case, Co-Pilot to build the framework, and then Codium to handle possible cases/refactoring of the code. That’s actually insane 😅
This was a nice comparison, although the test cases were too simplistic, TBH. E.g., you could have asked it to complete code to parse a snippet from a text file given within a preceding comment. Stuff which isn't literally found in textbooks...
CodiumAI made a mistake at the end of the video:
- it throws an error if the number is less than or equal to 0
- after that it checks if the number equals 0 to return 1
The first condition must be changed to strictly less than 0.
Great video overall, thanks for your effort
Yeah nice catch, I missed that while filming 😅 Although the test suite handles this properly, so it would at least catch its own error in this case.
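For reference, the corrected guard would look something like this (a sketch, not CodiumAI's exact output):

```javascript
function factorial(num) {
  // Reject only strictly negative input, so 0 still reaches the base case.
  if (num < 0) {
    throw new Error("Input must be a non-negative integer");
  }
  if (num === 0) return 1;
  let result = 1;
  for (let i = 2; i <= num; i++) result *= i;
  return result;
}
```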
My company is probably going to go with tabnine (starting to test it now), mainly because their models can be deployed on-prem. Also because they're modular, one model per language, and they say they'll update them whenever new versions of languages / big frameworks come out.
It's also supposed to have some degree of repo awareness (they told us they stuff the context with the content of files that seem related to the current one), and full repo awareness is on their roadmap (if it does work, it should be a big plus compared to generalist LLMs).
That said, its reasoning capabilities are definitely not great right now.
How much does it cost? I'm playing with Github Copilot. It is pretty good at fixing bugs and explaining what the code does, etc.
We're also using tabnine and chatgpt. I like tabnine's ability to code ahead of me. It doesn't, usually, get in my way and is a helpful assistant. I like its speed or should I say rhythm.
@@dak2009 nice
so, is your company using Tabnine now?
You absolutely need to try Cursor.
Cody works really well in projects since its context comes from your entire project and not just the single file you're in. It's also doing this locally, if I remember correctly, by creating its models in PostgreSQL.
I'm not sure if the others do this, but "How can I get XYZ from ABC model" has worked really well, and "Create tests for this" will often bring in correct model relationships and factories in Laravel.
lol, Just read your pinned comment 😬
The fact that you just start the video with no intro is amazing and makes me feel like you value my time. Thank you, and keep it up!
At the end, the error should say that the input needs to be a "nonnegative integer". You were correct in saying that 0 isn't positive; it's neither positive nor negative.
I absolutely love Copilot since it came out, and in my opinion the only AI that's worth paying for.
With that said, CodiumAI looks absolutely amazing and useful. Testing is tedious and optimizing is tedious.
If you are a student, you can get it for free.
Commenting on how the video immediately starts is amazing
At 16:33 I would argue that your prompt "JSDoc comment" is why it took so long to get a response and why it wasn't what you were expecting. The prompt itself is too ambiguous. Would be interesting if you tried it again but provided more context in the prompt, like "JSDoc the factorial function"
Edit: it even says in the response that there's too little context in the prompt 😅
The terminology is "non-negative" if you want greater than or equal to zero. (Zero is neither negative nor positive. Zero is even, not odd.)
The day I can open the Firefox repository or the LibreOffice repository in my editor and have an AI assistant explain the repo and help me learn it, saving me months of reading and searching: that's the day I'll call AI assistants a real thing. Right now, they just copy and paste code. I'm hoping for the big thing in the near future.🙏
For what it's worth: zero is neither negative nor positive in the case of integers, but can be negative or positive if it is floating point. From Bing Copilot: Caution: Representations allowing negative zero can lead to errors if developers overlook the differences between +0 and −0 in certain operations.
Bro went straight to the point, I was literally not ready 💀
Super interesting comparison. Thanks Conner for this timely topic 🙏
in a way this comparison reminds me of benchmarks:
you feed it some example tasks, but what I never see in videos like this is the MEAT.
For example: write me some code that aggregates data from x, y, z web API sources and combines them into an accounting application, and by accounting I mean the client has some god damn custom requirements; also produce the deployment plan and the human-readable guide docs, have meetings with the managers, etc. The actual part of working.
Yes, I understand developers only want to sit at their PC and code, but at least 30% of your time is spent on work that isn't coding.
I love how this is so straightforward, thank you
tabnine deserved better. i find it works well for code completion. i use a simple vscode-openai with custom ai key to have a chat, and i actually like the disconnect between code and chat.
so u chat with a bot?
Copilot falls down a category when you actually use it on exhaustive in-house projects. Sure, it knows well how to do a factorialize function, but with custom code you cannot do much. It works well if it can find something similar on the internet.
in my experience (and I code primarily in R) chatGPT 3.5 gives code that doesn't really work, straight up makes up packages/functions (phantom memories) to solve the issue, and in general is only able to help me 1 out of 10 times
I have the same experience with .NET/C#. It can copy-paste a solution to a trivial problem which has already been solved on StackOverflow 100 times, but once you ask it to write code which cannot be copy-pasted from official docs/coding forums, it just makes stuff up (non-existent interfaces/methods/classes) in an infinite loop of "I'm sorry. You're right. Here's the correct solution to this problem". In a lot of ways, that's actually much more time-consuming than simply looking for an answer on SO and, if it's not there, debugging/inspecting runtime state/digging through decompiled sources for a poorly documented library by yourself.
This might actually change for the better, once ChatGPT can run/debug/modify programs by itself. But for now it's not particularly useful for anything more complex than a throwaway course project you work on for a few days and then forget about.
Preach brother
Test cases... fine... they work. What happens when you throw a full-size project at it and ask it to do a more complex task? Like getting data from the UI, creating a model object, and sending it to the save/upload API? Or finding a logic error?
My main problem with my own spaghetti code is that it's been harder to follow the logic and all the async events. I think for that we need an AI that could follow an execution thread. Or an AI to optimize memory usage. Every piece of software out there seems to be bloated to the max. I made it my mission never to open a 'node_modules' folder again.
Bard has the advantage of being able to access the internet. You can give it an API's documentation and it'll make you a wrapper
Hello. Really good coverage on the AI coding tools. Can you try the AI assistant of JetBrains please?
Best coding AI review I have seen. Really wish you had included the one I use, Codeium (different than CodiumAI).
You should also try Gemini (the new bard with better models) and google's specific gemini-based code helper (their rival to copilot) called Code Assist
maybe I didn't see it, but you forgot to give Sourcegraph Cody context. In chat, if you wanna search across multiple files or add context, use the @ sign.. it works really great
cody is the best AI coding slave tbh
This video was great! Just starting to look at these programs myself.
Was looking especially at Cody since perplexity ai kept saying it can't access a repository. Will keep an eye on them as it doesn't look like they are there yet but is the direction they want to go in.
LLMs learn by scouring the web and then parroting what humans have already said. No one is going to hire a programmer to write a factorial function. Try a novel example that hasn't already been plastered across the web zillions of times.
I hear you on the ChatGPT thing. You can tell it to cite its sources in its response and it will do it. Sometimes it'll reply saying it did not get it from a source and it can't find one. If it's something super important, then you may not want to use it if there is no source.
Specify .gov and .edu and it will find files it shouldn't. I discovered that one while writing my history paper on WW2.
the BIGGEST coding challenge that you didn't do and compare is... how well these AIs know how to implement APIs and libraries. ChatGPT can generate OpenGL code for a simple hello world on command; does any other assistant AI do that?
I'm still flabbergasted by the fact that people aren't using OpenAI Assistants to read their whole code base (remember, 128k context size) and then assist them with further development. Like, this is exactly what I'm doing, and it works great. I have a project in react, I give the assistant access to it, ask it to analyze every meaningful file and boom, done. I can then ask it stuff like "refactor feature X", or "Why doesn't Y work?"
Why cant you use Cursor IDE with whole codebase
if your codebase is proprietary that may not be an option
Whoa wait. How are you doing this? My experience with the regular chatbot and coding a whole project has been a hot mess and not working at all but I haven't messed around with custom gpts yet.
Is that what you mean by openai assistants or do you mean something else like the playground with paid api responses?
do you give the whole codebase as a single file or as multiple files?
@@HaseebHeaven Good point, I didn't know about it and am currently testing it
@@aluminiumsandworm Yep, although I wouldn't be too worried about OpenAI directly leaking your code. In the worst case, they may use it for training, which is bad, yeah, but not the same as fully leaking it
I am using Assistant AI that is in preview for JetBrains editors, and it looks to be similar to ChatGPT but it is built-in to the editor.
I've been playing with the JetBrains AI (PyCharm) and found it to be quite useful for boilerplate, or those times you're too lazy/busy to manually generate a small utility. Liked the way you can ask successive questions/prompts and get the code enhanced.
Is it just me or is the JetBrains Assistant kinda sassy when it corrects you?
My tests playing with Sourcegraph Cody, there is a step you need to do that creates embeddings for a given repo before asking things about said repo.
Yeah checked the pinned comment 👍
We're waiting for the updated version, Conner.
GPT4 with plugins is by far the best - good video nonetheless
Which plugins? I’m a super beginner in code and I’m really curious what would be most helpful.
@@frankroquemore4946 I wouldn't recommend an absolute beginner use AI tools, since they can give you wrong and misleading answers. At the very least, spend 6 months coding some projects to become comfortable with the language you're using.
By the way, by plugins, I think he meant chatgpt plugins not vs code plugins
@@unknownguywholovespizza maybe they could learn from trying to troubleshoot the bugs. Lol
Seriously though, I've done testing where it removed a function to fix a bug. Then another bug happens and it re-adds the removed function. And it repeats until I tell it where to focus. So yeah, I applaud your advice.
@@josholin31 you see that's the problem. They're seriously very overhyped. While they're useful sometimes, but most of the time they're just idiots. Companies lie to us constantly just so they can make money and laugh at people who think "AI" is truly AI.
Maybe it's just me but I always notice these chatbots are getting dumber everyday.
You didn't even touch Phind, DeepSeek, or Tabby, which is on par with Copilot, and even surpasses it for me in some cases of bash scripting, which surprised me too. As for Cody, it doesn't have any idea if you ask it cold; you have to register the repo, either from local disk or from GitHub via the Cody app, to give it access to your code files, and then it works like magic. Opening a folder and a file in VS Code out of the blue and asking the extension will fail every time, because it doesn't perform the indexing scan. The basic file-discovery scan it does for any open folder will fall back to node_modules like yours did.
Pretty surprised they didn't know about Phind tbh
For more complicated functions you need more detailed prompts. At the end of the day, it just saves you some typing time given enough context.
You have a new AI to test - JetBrains just announced their AI coding assistant for all of their IDEs.
What a heavenly delight to start right out with the relevant stuff. I didn't realize how much this annoyed me before. I hope this will become a thing.
+1 for directly diving into comparison, -1 for using stupid mergesort / factorial instead of practical example
Actually, O(n^2) factorial is kinda correct, since the intermediate results get bigger and bigger and require more and more time to multiply.
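The effect is clearest with exact (BigInt) arithmetic, where the intermediate product really does grow to hundreds of digits, so each multiplication costs more than the last; a sketch:

```javascript
// Exact factorial with BigInt. Each multiplication operates on an
// ever-larger number, so the n loop iterations are not constant-cost.
function factorialBig(n) {
  let result = 1n;
  for (let i = 2n; i <= BigInt(n); i++) result *= i;
  return result;
}

console.log(factorialBig(170).toString().length); // 307 digits
```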
I can build something like what Copilot inserts; in fact I did something similar, and for my own stuff I have hundreds of ideas on how to make it better. Problem: if you want it smarter, it will cost more, because all the mechanisms that make it smarter cost tokens and time. But models rapidly get cheaper, better and faster, so a year from now you'll have a mind-blowingly smart assistant.
Appreciate the effort, everything works perfectly. Keep up the good work!
In the first factorial example, another thing you and AI missed is that the 'n
Copilot not throwing an error is actually superior. Exceptions are expensive, so properly handling error scenarios with an error response is preferable (OPINION).
The "level 7 kyu" comment haha
Literally training on my codewars solutions
notice how AWS Code Whisperer produces the exact same code as Tabnine
JS' Number.MAX_SAFE_INTEGER (before it might start losing precision) is about 9e15 (9 quadrillion). 170! is... a bit bigger, ~7e306. The max safe value falls somewhere between 18! and 19!.
Nice, yeah thinking about this now 170! would be an absurdly large number. That's a classic AI making stuff up moment haha
@@ConnerArdman Well, the 306 part isn't quite so random. The 7e306 is close to 2e308, which is approximately the maximal 64-bit float value. If it was 171!, then that would be larger than the maximal value for 64-bit floats.
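Both thresholds can be checked directly:

```javascript
// Factorial on plain doubles, then check against Number.MAX_SAFE_INTEGER.
function fact(n) {
  let r = 1;
  for (let i = 2; i <= n; i++) r *= i;
  return r;
}

console.log(Number.isSafeInteger(fact(18))); // true:  18! ≈ 6.4e15
console.log(Number.isSafeInteger(fact(19))); // false: 19! ≈ 1.2e17
```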
I was sent here by Matt Wolfe. Thank you for this great overview! Your precise comments per assistant help me come to a decision which assistant to try out in the future :-)
Awesome, glad you found it helpful!
I'm just wondering why Tabnine is below GitHub Copilot. More compact code (like Copilot's) is not necessarily better; on the contrary, it's worse in most cases, because it makes maintenance harder for developers later. The merge sort example shows that: less code > more compact > fewer explanations > harder maintenance. I'm not saying Tabnine is better than Copilot, just that you forgot to talk about that subject, and for me at least, an AI that writes more explanations in the code (and with fewer prompts) is the one I would use.
The basic functions like pulling in a mergesort, or a factorial function doesn't really save the programmer any time. It's the difference between 10 seconds and 30 seconds to run over to a stackoverflow post or personal repository to grab one. The extra constraints are what will break it and cause real time to develop - it's that time reduction that makes these tools worth it.
Not to mention, then you have to still proofread the ai's output just in case, so you really don't save time there more than likely.
I use Phind. I code in Python and asked it to translate code to JS (which I'm learning) or to help me correct my code. I really like the explanations it gives me, and "talking" with my code directly in VS Code is a plus. It has a free plan, which is the one I'm using.
Phind is good, but for me it too often confuses languages. Just a few days ago I asked it something about calculations with dates and random data in ABAP, and it quoted sources about Python... and if I can believe what it says on the website, it was using GPT-4.
CodiumAI fix suggestion:
num
24:27 - copilot and CodiumAi are the ones you want
great video. Would love to see your evaluation of Replit Ghostwriter
It would be better if you tried Codeium instead of Codium. Despite the similar name, they are completely different assistants in terms of quality. Codeium is no worse than the paid Copilot, but it's free.
The fastest way to see if your AI code assistant is worth your time is asking it to write an example driver for a specific framework and a specific platform. They will invariably fail all 3.
At the pace this field is progressing you could update this video every month 😊
great video, very informative, and I especially love how you get straight to the point, and kept it engaging throughout the whole video :)
however, there's this boxy sound from your microphone that makes me dizzy, and I used chatgpt to help me explain how to fix it lol:
1. Cut 250Hz Range: Start by applying a parametric EQ and reduce the 250Hz frequency range by about 4dB. This should help remove the boxiness.
2. Boost 3.5kHz Range: Next, boost the 3.5kHz frequency range by about 3dB. This should add more clarity.
3. Adjust Resonance: Play with the resonance (Q, bandwidth) settings to achieve a smoother curve between the 250Hz and 3.5kHz points on your EQ visualizer. Aim for a gradual transition, creating an almost straight line through the 1kHz area.
4. Fine-Tune the 1kHz Area: Slowly cut a notch in the 1kHz area to refine the sound and make it sound more natural. This can help reduce any unwanted "wispy-ness" (very technical term).
Optional: Boost your volume by about 1-2dB if needed.
Thanks! And yeah, the issue isn’t the mic or the EQ. The audio was peaking when I recorded, and I did my best to save it in post because I _really_ didn’t want to have to rerecord this one 😅
22:36 - What’s interesting is that I think most compilers that perform tail recursion optimization do the same thing (i.e. simply convert them into a loop).
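For illustration, a tail-recursive factorial keeps the running product in an accumulator argument, which is exactly what lets a compiler rewrite it mechanically as the loop below (a sketch; note that most JS engines don't actually perform this optimization):

```javascript
// Tail-recursive form: the recursive call is the last operation.
function factTail(n, acc = 1) {
  if (n <= 1) return acc;
  return factTail(n - 1, n * acc);
}

// The loop a tail-call-optimizing compiler would effectively produce.
function factLoop(n) {
  let acc = 1;
  while (n > 1) {
    acc *= n;
    n -= 1;
  }
  return acc;
}

console.log(factTail(10) === factLoop(10)); // true
```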
This really helped me get started on a new project and made everything just so much easier as a learning developer
right
Great video, right to the point. Only thing I'd have liked to see is a cost comparison. But still earned a sub. 👍
Claude Opus smokes ChatGPT-4o right now, for anyone still looking at this.
The web app for Opus has a 200K context window vs 32K for ChatGPT-4o, and it just gives better code too.
Claude was able to help me enable high resolution timers on a Giga R1 correctly and Chat GPT couldn't even start without putting itself into a loop.
Try any real project code and you'll see what I mean.
The merge function probably isn't written automatically because the model may expect it to be defined above the mergeSort function. So it just waits for you to place the cursor there before it suggests.
The guy wants the agent to both write code and write good test cases. It would be enough if a basic test harness were generated, and the test cases could then be manually updated. I would be extremely happy with that result alone.
The true 'Time is of the essence'... I like this one
for cody in the first question it suggest the
Claude also does a good job of writing code
Missing Cursor, which I think is a game changer, but I'll take a look here at these ones
You missed Codeium (not to be confused with Codium), which is the most widely used code completion engine currently in use.
Claude 3.5 is, I think, a tad better than OpenAI's. Personally I wonder whether there are any local LLMs good for this (trained on basic English only, though mostly on code, perhaps even for specific languages).
The factorial of a negative number is not NaN; it's undefined, or should throw. And that "factorialize" smashes the stack before throwing an unhelpful RangeError on most non-number input (not " +004e+00" though). num > 170 => Infinity. num > 21 => loss of precision. ChadG did better.
Hey Conner - Do you still think that Cody stands on Acceptable level? They kept on working since the release of this video of yours and recently having the ability to use Claude Sonnet I think it goes top of the list, What are your thoughts? :)
Incredibly helpful video! Didn't even know about Codium AI, just started using it and it's performing really well in test generation 💪
Dude, factorial is a traditional thing. Are these good for totally new stuff like Svelte or Qwik?
I love how DeepAI has a button for Replit, so you can go straight into say for example python and see it run...
I've also had chatGPT do the entire thing, code, generate files for input and output, results.txt file etc. Beautiful so far.
A key difference between Bard and ChatGPT is that Bard is a better teacher. ChatGPT's info is a touch better, but it struggles to help you learn using hints and guidance, and usually gives you the full answer no matter how many times you tell it not to. Bard is really good at giving hints and directions and will listen if you tell it not to provide snippets.
Test cases should be generated so that the code enters as many conditional branches as possible to actually cover the code, not just some random numbers thrown at the method.
Every intro to every YouTube video needs to be like this 😅😅😅 I wasn't even ready 😂
Amazing video!
As this is a fast-paced technology, curious to see how that comparison would be in 1/3/6 months time.
5:30 I've been using GPT to practice coding, and I realize an LLM just gives you the most common string of words that would follow your prompt, but damn man, sometimes it just needs to tell you
"I don't know" or "that's beyond my abilities", something besides just confidently giving me wrong information.