*NEW STUDY* Does Co-Development With AI Assistants Improve Code?

  • Published on Nov 14, 2024

Comments • 172

  • @ContinuousDelivery
    @ContinuousDelivery 1 day ago +5

    SIGN UP FOR AI STUDY: Does Co-Development With AI Assistants Improve Code ➡ form.typeform.com/to/PnVpuZGr?

  • @charliemopps4926
    @charliemopps4926 1 day ago +33

    The problem is, management doesn't want "good code", they want solutions fast... they don't care if it's good or not. AI is exactly what they've been looking for.

    • @traveller23e
      @traveller23e 1 day ago +2

      Fast code is what led to the PR I'm reviewing being stuck open for months, as the dev responsible went from hastily patching one critical bug to the next after a poorly managed piece of development led into a premature go-live.

    • @CallousCoder
      @CallousCoder 1 day ago +3

      This is so true. I run into this at my banking customer all the time.
      Just this morning: 20 lines of OR statements in a SQL script, all checking that a table isn't null. So I offered to rewrite it more cleverly.
      The manager was like: “that’ll take time, we just copy the line and change it.” And I wonder where the pride of these developers has gone, and how management can support even less than mediocrity.

    • @CallousCoder
      @CallousCoder 1 day ago

      @traveller23e Just commit it and run it in shadow; that’s the only way you’ll see if it all works fine. When you have end-to-end tests you’ll be happy, as you can already have a preliminary outcome, but even then, massive bug fixes you want to run in parallel for a healthy amount of time.

    • @alanmacmillan6957
      @alanmacmillan6957 1 day ago

      And this is why we end up with problems like the Post Office scandal.

    • @CallousCoder
      @CallousCoder 1 day ago +1

      @ That was more a legal fuck-up.
      Sure, the software was bad, but nobody in their right mind would suspect franchisees who had operated to full satisfaction for 20 years of fraud without first suspecting the brand-new software! And Two Tier Kier was the main prosecutor... hmmm, who’s really incapable and to blame here?

  • @adambickford8720
    @adambickford8720 1 day ago +36

    I've found it slows me down and traditional static analysis tools are still better. They are faster and don't hallucinate.

    • @giorgos-4515
      @giorgos-4515 1 day ago

      I really wonder whether LLMs could do anything actually useful if they were given the static analysis tools' outputs.

    • @puntoycoma47
      @puntoycoma47 1 day ago

      New guy here, what is static analysis?

    • @traveller23e
      @traveller23e 1 day ago +4

      @puntoycoma47 Basically, a clever algorithm in the IDE (either integrated or provided by a separate engine via some plugin system) that looks at the code and detects some set of errors and warnings. The limitation is that the algorithm is only as clever as the amount of effort put into creating it, and can be limited by language design; for example, a lot of them have poor or nonexistent null-dereference checks. But they do tend to be good at finding obvious issues, as well as (usually) figuring out whether or not the program will even compile.
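
      For illustration, here is the sort of obvious issue a type-aware static analyzer catches. This is a minimal sketch, assuming Python with mypy as the analyzer; the function names are invented for the example.

        from typing import Optional

        def find_user_id(name: str, users: dict[str, int]) -> Optional[int]:
            # Returns the id, or None when the name is unknown.
            return users.get(name)

        def greet(name: str, users: dict[str, int]) -> str:
            user_id = find_user_id(name, users)
            # A checker like mypy flags the next line, because user_id
            # may still be None here, and None + 1 is a type error.
            return "user #" + str(user_id + 1)

      Running the analyzer is a plain build step, which is why it is fast and deterministic: it reports the same findings on every run over the same code.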

    • @ErazerPT
      @ErazerPT 1 day ago +1

      @giorgos-4515 Not much a non-LLM model couldn't do better. Give it the "bad code"; give it the "good code" as corrected by a human after static analysis. Train over LOTS of it. Now the model will correct your code as it learned to do. Bad part? It needs training data, and good training data, or it's GIGO. It might hallucinate here and there; nothing you can do about that, it shares that problem with us.

    • @purdysanchez
      @purdysanchez 1 day ago

      I haven't messed with it much, but if you know what you are trying to do and clearly define the problem, it does OK at giving code suggestions in a narrow scope. The problem comes down to the developer. Some are inclined to just use it if it runs, without analyzing and revising the code.

  • @esra_erimez
    @esra_erimez 1 day ago +63

    The short answer is: no. The longer answer is: no, it's not. I have used different AI tools to assist with development and it didn't work for me, although it was helpful in learning new topics and augmenting search engines.
    Edit: Here is a case in point. There were some deprecated library functions I needed to replace. The AI tools made up library functions that simply didn't exist. I tried changing the prompt, I tried giving feedback, but in all cases I never got a solution, and they tended to go back to recommending the deprecated library functions after a while.

    • @homomorphic
      @homomorphic 1 day ago

      Yeah, but I bet those APIs *should* exist.

    • @oysteinsoreide4323
      @oysteinsoreide4323 1 day ago +2

      You can't use the code directly. But I at least use it as a tool to get ideas.

    • @sashbot9707
      @sashbot9707 1 day ago +5

      I am much more productive with AI. And in my company I can show that I am around 2 times faster than my peers with the same code quality.

    • @Paul-uy1ru
      @Paul-uy1ru 1 day ago

      Have you tried GPT o1-preview?

    • @purdysanchez
      @purdysanchez 1 day ago +5

      The biggest danger is that people use it to write code in technologies they are unfamiliar with instead of reading the documentation.

  • @seanwoods647
    @seanwoods647 1 day ago +7

    If you need an experimental control, I've written an HTTP server from first principles in both Java and Tcl. I'm the author of the httpd module in Tcllib.

  • @danielt63
    @danielt63 1 day ago +14

    Personally, I think it does a better job of doing an initial review of code rather than writing the code in the first place.

  • @seanwoods647
    @seanwoods647 1 day ago +13

    OK, so most of the "people" I see posting about how AI is an "improvement" seem to be throwing out a lot of emotional arguments, but show no sign that they've actually written a single line of production code in their lives. Speaking as a guy who has been writing code since the age of 10 (and I'm 50 now), every attempt I've made at AI code generation with an LLM has been a shitshow. Yes, it will cook up some plausible-looking snippet. But it will just invent a fluffy little cloud of a function that does all of the heavy lifting off-screen. And when you try to track down what this mystery function is, it doesn't exist in any API.
    When I ask an LLM to generate a specific solution, it will regurgitate a tangentially related example from Stack Overflow. And I know, because the problem I was trying to solve was in the corner cases: 10 years ago I had written a library function built around that very example, which I had cribbed from Stack Overflow.
    It's a mimic. And the worst part of a mimic is that what it produces sounds perfectly plausible if you have no idea what you are doing.
    And before you start calling me an AI hater: I pay the mortgage writing expert systems. I know what machine-generated responses should look like. I also know that the proof is in the regression testing. And I also know that letting a machine off the leash is a guarantee you will be bitten on the ass. I'll be happy to show you my scars.

    • @plaidchuck
      @plaidchuck 1 day ago +3

      Any idea when companies may realize this and start hiring entry level people again?

    • @darylphuah
      @darylphuah 1 day ago +5

      Anyone saying AI is improving their work has outed themselves as a mediocre dev.

    • @pawelhyzopski6456
      @pawelhyzopski6456 21 hours ago +2

      @darylphuah I'm not top tier. Self-taught only. And whenever I get an answer from AI it's always generic. I'd rather write the generic stuff myself and learn something from it.

    • @ikusoru
      @ikusoru 21 hours ago +2

      @darylphuah This! 100% true.

    • @mandisaw
      @mandisaw 16 hours ago

      @plaidchuck As soon as the stock returns stop chasing the hype. Outside of Big Tech and startups, though, plenty of companies are still hiring entry-level. Depends where you are and what your education & experience look like.

  • @AndrewBlucher
    @AndrewBlucher 1 day ago +11

    Great project!
    In the 1980s I did a minor thesis on programmer productivity. Many, many tools and systems were marketed with productivity claims, but none of the vendors had test results to back those claims. At that time there were several research projects on code completion and code suggestion, and I have had the pleasure of watching as these came to market with the real results we see today.
    The big issue that I see with this so-called AI code generation is, apart from the reproducibility issue Dave mentions, hallucinations. After all, LLMs are chatbots. They are not reasoning about how best to solve the problem.
    All the best with the project!

    • @traveller23e
      @traveller23e 1 day ago +6

      Also, even if they did always produce valid code, it would be essentially equivalent to constantly taking the first result on Stack Overflow. It might _work_, but you're likely to be misusing libraries and leaking memory all over the place.

    • @davidmartensson273
      @davidmartensson273 15 hours ago +1

      @traveller23e I have very little experience so far with using AI-assisted code, but what little I have, I have treated the same way I do any code I find on the internet: as a suggestion and a source of ideas.
      The code might be bad or plain wrong, but it might at least contain relevant methods, classes or libraries that I was not aware of, or a pattern for solving the problem that I did not know.
      With this knowledge I can search for more specific info on the topics and either validate the code, fix the code into something usable, or hopefully at least find some piece of help.
      Even a solution that proves to be bad might be helpful, as it might spur new ideas of my own.
      Will AI be better than random searches on Stack Overflow or Google? I do not know. I expect that over time the results will improve, but I also expect that, at least for the foreseeable future, I will need to tweak most code, unless it's trivial things where the main benefit is not having to write the boilerplate parts.
      As for the inline help in Visual Studio, my main experience is bad: it very often suggests very wrong things, to the point that it actually reduces productivity, since I reflexively spend time trying to read and understand something completely out of place, or have to erase something it autocompleted. But that's mainly due to the complete stupidity of making space the autocomplete key, which seems not to be changeable :/

    • @traveller23e
      @traveller23e 13 hours ago

      @davidmartensson273 VS's "TAB to insert xyz" feature is horrendous; you never know what TAB is going to do the next time you hit it.

  • @MikeOchtman
    @MikeOchtman 1 day ago +5

    It generates bad code very quickly. You have to be able to code yourself, and use the AI to provide some of the boilerplate and tedious stuff. It does know some good algorithms, but it is not good at creating solutions to new problems.

  • @mktatyt
    @mktatyt 20 hours ago +9

    I'm irritated by the fact that even very sophisticated software engineers overlook the reality that large language models are in fact deterministic. The variations arise from explicitly implemented randomness factors such as top-k, top-p, and temperature. If we want reproducible outputs, simply turn off these factors, and you'll receive the exact same text or code for the same prompt.
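
    To make this concrete: in most open-weight toolkits you can switch sampling off entirely, which removes exactly the top-k/top-p/temperature randomness described above. A minimal sketch with Hugging Face transformers follows (the model name is just an example); on the same hardware and library versions, greedy decoding yields identical tokens on every run.

      from transformers import AutoModelForCausalLM, AutoTokenizer

      name = "Qwen/Qwen2.5-0.5B"  # example model; any causal LM works the same
      tok = AutoTokenizer.from_pretrained(name)
      model = AutoModelForCausalLM.from_pretrained(name)

      inputs = tok("Write a bubble sort in Python:", return_tensors="pt")

      # do_sample=False disables top-k/top-p/temperature sampling, so the
      # highest-probability token is taken at every step.
      out1 = model.generate(**inputs, do_sample=False, max_new_tokens=64)
      out2 = model.generate(**inputs, do_sample=False, max_new_tokens=64)

      assert (out1 == out2).all()  # identical output, run after run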

    • @xybersurfer
      @xybersurfer 15 hours ago +1

      Exactly what I was thinking. I'm guessing the randomness makes sure it doesn't get stuck on the same response, to hide its limitations.

    • @mktatyt
      @mktatyt 12 hours ago

      @@xybersurfer I think it's simply a UX decision.

  • @marcbotnope1728
    @marcbotnope1728 1 day ago +11

    What I have noticed is that the AI tools "kill" IntelliSense-based code completion, replacing it with randomly guessed completions that very often are not part of the interface of the object you are working on.

    • @matsim0
      @matsim0 21 hours ago +2

      Yes, right? Sometimes it's useful and saves you a trip to Stack Overflow or the docs, but often it gets in the way by suggesting code completions that make no sense, while IntelliSense would have given you the right answer immediately.

    • @ianosgnatiuc
      @ianosgnatiuc 12 hours ago

      Because it lacks the context. It does a lot better when additional context, such as available types and methods, is included.

  • @KulaGGin
    @KulaGGin 1 day ago +7

    Signed up. Interesting. In all the years I've been watching YT since 2006, I haven't come across an interactive video like this (an actual challenge).

    • @ContinuousDelivery
      @ContinuousDelivery 1 day ago

      That’s awesome. I hope you find it worthwhile and productive.

  • @matthewjamesbetts
    @matthewjamesbetts 1 day ago +3

    Michael Feathers and Thoughtworks are doing and writing a lot of interesting things on this topic, including, for example, TDD as a way to get code from AI that is tested and fits your design, and AI at many stages of software development, not just for coding.

  • @chrisnuk
    @chrisnuk 1 day ago +5

    It's radically reduced the barriers to entry. There will be so much code written over the next few years. At the moment, the people who need help with their VBA, Python, or SQL queries have disappeared. In a year or two, there will be a mess to clear up. I think our jobs will evolve, but they always have.

    • @almazingsk8er
      @almazingsk8er 1 day ago

      I have worked with engineers where I ask them the same question in multiple instances, "What node version are you on?", and they ask Copilot how to find that out each time. I know people building apps right now who don't have "node -v" memorized because they can just ask Copilot whenever they need it. It's having a weird effect on how people learn to code. It's weird asking someone a question during pair programming, then watching them type it into a chatbot and begin to answer the question only after they skim the response it gave them.

  • @eaglethebot1354
    @eaglethebot1354 1 day ago +2

    I use it for all the infrastructure stuff that I don’t feel like digging through docs to debug, such as CI/CD pipelines, infrastructure as code, networking, etc.
    Google says AI can’t reason, but anyone who’s tried to have it write code can tell you that. When you give it a task that only has a few correct answers, though, it does quite well.

    • @purdysanchez
      @purdysanchez 1 day ago

      I think it's a good starting point. Say you don't use a technology often. Have AI offer some initial idea, and then it's easy to check the docs to make sure the AI is actually doing what you asked for.

  • @alrightsquinky7798
    @alrightsquinky7798 1 day ago +3

    I write Clojure code, so my code bases are too small to even merit using AI. I have my whole code bases memorized. It’s nice to have a complex web server implementation with only a few hundred lines of code.

  • @RiaanRoos
    @RiaanRoos 2 hours ago

    Over the last 24 months, I have been experimenting with different ways of pair programming with AI, and I think that although it is still limited in its real-world application, it is improving.
    The latest release of OpenAI's Canvas is a step in the right direction to help with the problem of "new code" being produced each time you prompt AI for help.
    Plugins for 'co-pilots' are also getting better at alleviating this problem.
    I have had the best success when starting with well-written unit tests and then letting gen AI write just enough code to satisfy the tests!
    This mirrors my normal workflow the closest, and constrains the AI well enough that I do not experience the hallucinations commonly seen when just 'letting AI do the work'.
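
    A minimal sketch of that tests-first loop (the slugify example and file names are invented for illustration): the human writes the tests, then the assistant is asked for just enough code to make them pass.

      # test_slugify.py -- written by the human first
      from slugify_util import slugify

      def test_lowercases_and_hyphenates():
          assert slugify("Hello World") == "hello-world"

      def test_strips_punctuation():
          assert slugify("It's here!") == "its-here"

      # slugify_util.py -- the kind of minimal code the assistant is
      # constrained to produce by the tests above
      import re

      def slugify(text: str) -> str:
          text = text.lower()
          text = re.sub(r"[^a-z0-9\s-]", "", text)   # drop punctuation
          return re.sub(r"\s+", "-", text.strip())   # spaces -> hyphens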

  • @AmiGanguli
    @AmiGanguli 1 day ago +13

    AI is super useful. No, it doesn't deliver useful code of any real scale or complexity. But it can save a ton of time looking up API calls, and that's what really takes the most time in programming nowadays. I can whip up a little algorithm to manipulate a tree structure or something like that in no time; it usually even works the first time. But finding the right library function for my current requirement, and figuring out what it needs in order to work properly, sucks up my time. If ChatGPT knows the library, it can whip something up that probably doesn't do what I want, but likely uses the right API calls in mostly the right way. And figuring that out is 80% of my day.
    Well, that and meetings. If I could send ChatGPT to meetings, that would be a real time-saver.

    • @purdysanchez
      @purdysanchez 1 day ago +4

      I guess if you're working with a license that allows you to feed it the entire documentation as context. If not, it regularly makes up fake function calls that don't exist in an API.

    • @harmless2u2
      @harmless2u2 1 day ago

      @AmiGanguli 100% agree.

    • @AmiGanguli
      @AmiGanguli 19 hours ago

      @purdysanchez Hmm. I haven't had that happen. I have had cases where it uses old versions of the API that are no longer supported, or even mixes different versions. That's a pain, but it still gives a good starting point most of the time. You at least know where to look in the docs for what you need.

    • @mandisaw
      @mandisaw 16 hours ago

      @purdysanchez Even if the docs are public. Google AI Search wasted 10 min of my precious side-project time chasing an API hallucination. Looking up the real API manually and writing the code I needed took about 10-15 min. Reading docs isn't hard; reasoning about what you need from them relative to your problem domain is what takes the time, and LLMs can't meaningfully help with that.

  • @AnnCatsanndra
    @AnnCatsanndra 1 day ago +4

    In my subjective experience, it depends, but it's generally more effective than the old "copy-paste from Stack Overflow" approach.
    More useful than the in-IDE code assistants, though, are the LLM chatbots, where I can ask specific _questions_ and then figure out what code I need from there, with a little bit of verification for good measure. That, and if I _really_ need to pump out some easy but tedious code, asking an LLM to intelligently fill that in is faster than writing my own code generator for it.
    But if we're specifically talking about the autocomplete Copilot style, yeah, I stopped using most of them because they break up my train of thought more than they help. Usually when I'm *actually about to type in the editor* I've already decided what I plan to implement, and having huge, overbloated, contextless code snippets pop up while I'm doing that is more annoying than helpful.

  • @mrpocock
    @mrpocock 1 day ago +7

    I have had good experiences with AI code assistance. I probably count as a senior developer; I've been coding in one language or another since the 80s. You get code for free. It takes work and knowledge to get quality. Sometimes it is enough to let it make working code, and for that it is already there.

  • @HarleyPebley
    @HarleyPebley 21 hours ago +1

    Reminds me of the old joke: ask 10 programmers for a solution to a problem and get 20 different answers.
    For the new world: ask 1 AI for a program as a solution to a problem and get 65535 answers.
    :-D

  • @mattbristo6933
    @mattbristo6933 23 hours ago +2

    AI is great when you need snippets of HTML or JavaScript, or want it to check some code and offer improvements. When I have asked it to write code, it doesn't know the problem domain and cannot reason about it, and therefore the code is not always optimal.

  • @Danielm103
    @Danielm103 1 day ago +1

    I see a lot of people trying to use AI to generate AutoLISP code for AutoCAD. The forums are littered with people asking for help to fix AI-generated code. I tried to make an AutoLISP generator where the user could type something like "create a cube 100x100x100"; I would pass the info to the AI and try to execute the results. Kind of neat, but it didn’t work too well.

  • @semsomify
    @semsomify 1 day ago

    I personally use it to generate code snippets for certain specific tasks. For example, if you're going to use a new API, you can ask it to generate a code snippet that's close to your specific uses of the API, and also ask it to explain the API. It also helps for some data processing tasks, and debugging. So it saves a lot of the time you'd otherwise have spent debugging and reading documentation. When there's code I am very familiar with, using it would actually slow me down. But for generating full projects, I think we're not there yet.

  • @xananacs
    @xananacs 23 hours ago +2

    AI is fairly good as an expensive and energy-intensive snippet provider.
    If the exercise is something that could be defined as a starter template, like a Node HTTP server, I would guess the study can't yield any interesting result. Of course AI can do that very fast, but so could a snippet collection.
    The exercise has to be about making something new.

    • @mandisaw
      @mandisaw 16 hours ago

      A good real-world example would include incomplete or domain-aware specs. So much of the online training data is basically beginner web tutorials, or documentation samples, so it's probably useful for people who only handle those sorts of basic tasks. But production software usually involves a lot of decisions, even when the final code isn't all that novel.

    • @xananacs
      @xananacs 10 hours ago +1

      @mandisaw That's a good insight, yeah.

  • @BaldurNorddahl
    @BaldurNorddahl 20 hours ago

    There is a tool called Aider that most developers don't know. It is a different way to use AI for code generation. I have found that for some tasks, I can use Aider to produce software much faster than would otherwise be possible. The trick is to learn how to use it, including when not to use it.
    While Copilot is something that tries to help in the editor, Aider is like a team member that you ask to do a task. Give it an example of a feature and tell it to implement a new one. I might have a feature to create an object in a database and now need features to update and delete; a few moments later I will have just that, without coding a single thing myself.

  • @nickbarton3191
    @nickbarton3191 1 day ago +2

    Security worries me. I've signed an NDA; it didn't.

    • @mandisaw
      @mandisaw 16 hours ago +1

      Liability and risk-management as well. When a developer (or team) f*cks up, they can be trained, reprimanded, fired, or even sued. Can't reprimand the AI.

  • @donharrold1375
    @donharrold1375 1 day ago

    I’m not a professional programmer, but I do use coding to solve complex problems. I usually have a very clear idea of what I want to achieve before I start writing code, whether that’s a complex statistical analysis, machine learning, or chemical engineering analysis. Thinking about the problem and developing an idea of how to solve it is much more important and time-consuming than developing the syntax. In that respect, AI has been a game changer for me. Once I’ve set out the problem, the code to solve it can be generated in a matter of minutes; that used to take quite a long time. So, AI: yes, it definitely helps people like me immensely.

    • @ikusoru
      @ikusoru 21 hours ago +1

      Let’s say that you use AI to crunch some numbers and analyze a dataset. How do you verify it is doing exactly what you want if you don’t fully understand the code syntax, and therefore the program generated by the AI? (There must be at least 1% faith you have in the generated code. You definitely can't be 100% certain.)

    • @donharrold1375
      @donharrold1375 15 hours ago

      @ikusoru I do work through the syntax and check that it makes sense. I also sense-check the results; fundamentally, I understand what the output should look like. My point is that I don't have to dogmatically type every line of code. Entering code takes a lot of time, particularly if you don't type quickly.

  • @purdysanchez
    @purdysanchez 1 day ago +1

    I would say it's in the uncanny valley. If you glance at it quickly, it looks like code.

  • @matthewhussey1980
    @matthewhussey1980 1 hour ago

    Hi, I've signed up but can't find a link to contact people with questions, so I'm asking here. If using AI for the study, should we be using it as if it is leading, or as an assistant?
    For example, I have recently been coding using AI as if I'm an idiot, to see how much it can do. This isn't how I would do it if trying to be productive, which is more a few questions, but mostly using it as a speed-up for IntelliSense.

  • @wonseoklee80
    @wonseoklee80 1 day ago

    It’s improving in ways we’ve never seen before. While it’s not a direct replacement for human coders, it serves as a significant performance enhancer. Until AGI emerges (whether in the near or distant future), AI will play a crucial role as a productivity booster for all developers.

  • @ScottLahteine
    @ScottLahteine 3 hours ago

    I maintain a large and active open source project with a lot of C++ with meta-programming, and so far AI coding tools have been only marginally useful in the main codebase, mainly by providing better and more context-sensitive auto-completion while typing code. Where they have been more useful is in writing the support scripts, primarily in Python. These are not usually very complex, but the LLM still needs everything spelled out in advance in plain language to get the best result. It might be possible to get LLMs to produce better code by iterating more on smaller units using agents, allowing them to work from a high-level overview of the task down to the fine details in a tree-like iteration pattern, breadth-first I presume. This approach would also help deal with the relatively small context windows of current models. Add in a code review agent and even better things could happen….

  • @kkiimm009
    @kkiimm009 1 day ago

    I find AI extremely useful when I work with languages and frameworks I can read and understand, but don't understand well enough to write effortlessly. It is also very good at converting stuff from, say, a SQL table to an entity class, and so on. I also recently fed it a description of an import text file that has fields at given column positions and lengths, and it made the parser without me having to do that boring work myself. It is also very good when I don't remember how I did something: instead of googling for an example, it usually gives me the correct answer on the first try. And so much more.
    And they are pretty good at not changing everything, only changing the place that needs fixing. If you already have some code and ask it to fix a problem, it doesn't give you whole new code; it just fixes the spots that need fixing. Usually.

  • @russellormes776
    @russellormes776 23 hours ago

    You can use it to generate the code that gets the test to pass and then refactor to your heart's content.

  • @alanmacmillan6957
    @alanmacmillan6957 1 day ago

    I've been using AI to generate code, but it's heavily dependent on the definition you provide in the first instance, and on the examples from the internet it uses to construct the "solution". It's useful, but... you have to have the personal knowledge to understand whether the solution it presents really does do what you actually want. Similarly, I've seen it generate code that on initial inspection seems to do what it says on the tin, but in subsequent testing either isn't suitable or needs significant rework. What you also find is that it brings a new type of technical debt: you get a lot done more quickly (initially), but when it needs to be modified, enhanced, or reworked to meet a new requirement, or a bug is found and fixed, it can be hard unpicking something that you didn't write yourself, and the learning curve you avoided the first time round you have to overcome sooner or later. It's also diabolical when you need a solution to a more arcane and abstract problem (e.g. legacy machine code or proprietary hardware).

  • @diamantberg
    @diamantberg 1 day ago +1

    "Java is to JavaScript as ham is to hamster." - Jeremy Keith
    Nevertheless, I'm going to take the survey.

    • @ikusoru
      @ikusoru 21 hours ago +1

      I feel the same about AI and the hype around it 😅: "AI is to Artificial Intelligence as ham is to hamster."

  • @PatrickMetzdorf
    @PatrickMetzdorf 1 hour ago

    This understanding of AI-assisted code may be a bit outdated, though.
    Using a web app like ChatGPT, yes, you get from-scratch code every time, indeterminate and vulnerable to quality issues every single time.
    But the tooling ecosystem (e.g. Cursor) allows you to use AI assistance in a more controlled and granular fashion, with in-line editing and context enhancements (documentation for frameworks, code conventions, etc).
    So the more productive way is not to ask ChatGPT to "write an app that counts sheep", but to set a framework for what your code should be like, e.g. via a set of types, a code-convention markdown file, etc., and then give it more precise instructions, e.g. "implement this interface with the params object I have given you" or "update the Feed component to add error handling for the case when responses are slow", and so on.
    We need to use AI for what it's good at (doing the low-level grunt work and explaining things), and give it clear instructions for what it is not good at.

  • @ErazerPT
    @ErazerPT 1 day ago +1

    Is it any good... well, I'd say it depends. Give it a very small, specific thing to do in a domain it knows a lot about, and yes, it will do OK, better than a human with zero domain knowledge at least. Give it something complex in a domain where it only has cursory knowledge and... well...
    All in all, we're still at the "hard AI" problem that killed it back in the day and won't let it go further anytime soon, it being the intractable "how do we reduce it to some form of formal logic so it can self-check for correctness?". And what is correctness, after all? It builds? A does what A is supposed to do (even if it messes up B)? Both? More?

  • @thepaintedsock
    @thepaintedsock 2 hours ago

    I've found it useful for building examples to help me move forward with technologies I am not familiar with. This is especially useful for devops, full stack development, and training, where it fills skills gaps.
    The downside is that the code provided is verbose and often has mistakes and made-up solutions. This can set a person back in time, so it can be a bit touch and go. It's either use that, or go on Stack Overflow, wait, and get your question brutally downvoted and closed, especially if it's devops or AWS.
    ChatGPT is a joke, but Claude and Cody are pretty good.
    The code, however, is boilerplate stuff and does not compare to the quality, speed and finesse of a coder fluent in the language.
    I only use it for my skills gaps.

  • @juaneshberger9567
    @juaneshberger9567 1 day ago +10

    This is from working with medium-sized repos (I don't know how effective it will be with super large codebases). If you write good tests and documentation, and make sure that your functions are small (do only one thing) and have clear names, I have found really effective results with Claude Sonnet when writing single functions. Also, if you have good tests, you can feed their output into the LLM and get good feedback.
    In summary: good code practices, TDD, documentation, clean code, small functions/classes and proper levels of abstraction are even more important for AIs than they are for people.
    Giving a project context to Claude (with docs and tests) and only asking for one function at a time has given me relatively good results.

    • @giorgos-4515
      @giorgos-4515 1 day ago +1

      Can LLMs handle a codebase's worth of information?

    • @EiziEizz
      @EiziEizz 1 day ago

      TDD is for gullible idiots who think that creep Uncle Bob is not a scamming psycho.

    • @almazingsk8er
      @almazingsk8er 1 day ago +4

      It does do well with OOP specifically, I have found. I used Copilot pretty heavily to write C# code, and it has a much easier time following strict OOP standards than when prompting it via chat.
      That said, I did run into instances where it suggested outdated libraries, nonexistent functions, or introduced very subtle bugs. The moment I began to "trust" the code it wrote was the moment I found myself debugging something nasty that was the result of a small hallucination. What I ultimately concluded is that it requires quite a bit of diligence to produce quality code while working with an LLM, and whether or not the diligence is worth the time/effort spent is relatively situational.

    • @retagainez
      @retagainez 1 day ago

      @giorgos-4515 No. I think that's the purpose of RAG.

    • @ikusoru
      @ikusoru 21 hours ago +1

      @juaneshberger9567 I had a different experience using Claude Sonnet on small and medium-sized repos. As with Copilot and Cursor (well, and ChatGPT), the hallucinations are still quite a problem, and I find it takes me more time to review the code and fix issues than it would to write it manually.

  • @Rick104547
    @Rick104547 1 day ago

    I use it for specific tasks sometimes, like generating a good name for a unit test. It also works pretty well for 'searching' through documentation; of course you always have to check, but it's faster to do it this way.
    For the rest it's really hit and miss for me.

  •  1 day ago

    It would be interesting to read that study. I worked in Swedish IT a lot, and I know that Lund University is among the top there.
    Now I am in different territory, trying to evaluate whether Copilot is good or not.
    It looks like juniors love it while seniors hate it.
    So I think this tool generates the statistical median.

  • @TheBackyardChemist
    @TheBackyardChemist 1 day ago

    I think it is obviously not there yet, but that may change, given enough time and effort by the developers of AI systems. But if there is one paradigm that is best suited to code getting written by AI, it is TDD. Humans are required to write a bunch of good, new tests, and the task of the AI is to make all the new tests pass without breaking any of the old ones. Putting in execution-time and peak-memory-usage tests (fail if the program uses too much time/space), as well as a lot of random fuzzing tests, is probably also a good idea if you try to do this.
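
    As a sketch of that harness, assuming pytest plus the hypothesis library for the random fuzzing, with ai_sort standing in for the AI-written function under test:

      import time
      from hypothesis import given, strategies as st

      def ai_sort(xs):  # placeholder for the AI-generated code under test
          return sorted(xs)

      @given(st.lists(st.integers()))
      def test_fuzz_matches_reference(xs):
          # random inputs: output must always agree with a trusted oracle
          assert ai_sort(xs) == sorted(xs)

      def test_execution_time_bound():
          data = list(range(100_000, 0, -1))
          start = time.perf_counter()
          ai_sort(data)
          # fail if the generated program uses too much time
          assert time.perf_counter() - start < 1.0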

  • @jahelation8658
    @jahelation8658 3 hours ago

    Copilot invents function calls for 3rd-party APIs that do not exist in the API. This is frustrating as hell. I've seen this especially with crypto code.

  • @techsuvara
    @techsuvara 1 day ago

    AI tools are just search tools. I use them to look up things I could find in the broader pool of experience, like which API I should use for this, or what the call is to run a quick sort on a list of TypeScript objects. That kind of thing.
    I’ve used it once to write a program, but then replaced most of the code anyway.

  • @godzilla47111
    @godzilla47111 1 day ago +2

    I have had good experience co-coding with a rubber duck that will very patiently listen to my reasoning about why some code would work or not. And I have had even better experience discussing code with GPT.
    So, if anyone is disappointed in generative AI, maybe try a different approach with generative AI.

    • @mrpocock
      @mrpocock 1 day ago

      It is really good as that sounding board.

  • @julianbrown1331
    @julianbrown1331 1 day ago

    If you constrain AI code gen with effective tests (thinking TDD), then does it matter if the reproducibility of the code doesn’t exist? If it meets the same constraints as TDD (notably code coverage) and is passing all tests, then it is arguable whether reproducibility matters. In effect your tests are encoding the problem, and genAI is inferring the solution. Granted, that isn’t how it works now, but if it did, would that constitute a viable model of AI coding?

    • @barbidou
      @barbidou 1 day ago +2

      No, it would not. First, there is no such thing as 100% test coverage, except in very small artificial domains where you can test all possible input combinations. Second, when something goes wrong, it becomes much more difficult to find out which changes need to be reverted. Maintenance-wise, it is a recipe for a nightmare.

    • @julianbrown1331
      @julianbrown1331 1 day ago

      @barbidou I'm just playing devil's advocate here...
      The domain in question is the code being generated, so I would argue that, actually, 100% code coverage is both achievable and desirable, to stop your genAI adding in code that a) you don't need, and b) could be adding undesired behaviours. You still need to resolve the same boilerplate problems (depending on the language in question), but those are an understood problem.
      The code being generated doesn't have to be the entire solution, so this isn't radical. You wouldn't try to create a single monolithic unit test suite for your entire solution...
      The premise isn't new either: 5th-gen languages were supposed to work in a similar fashion, and they have been around for 40 years. The problem was that the AI available then wasn't able to cope, and it all went out of fashion within the space of a few years because it wasn't scalable to more complex problems.
      As for "fixing" problems: the generated code is disposable; instead, you revise the tests to refine behaviour. You don't even need to put your generated code under version control, because it is so disposable and, as you've pointed out, a complete nightmare to maintain by hand.

    • @grokitall
      @grokitall 22 hours ago +2

      @julianbrown1331 If, as you accept, it is a nightmare to maintain by hand and it is not suitable for version control, it is almost certainly bad code.
      Most reasons for code being hard to maintain come down to breaking good practice, and code generated from largely untested training data is almost certain to be a testability nightmare as well. Not to mention the security issues, copyright problems, etc.

    • @julianbrown1331
      @julianbrown1331 22 hours ago

      @ The lack of need for VC for generated code is down to a lack of repeatability: changes to the tests result in (potentially) radically different code. You would still capture it, but comparing deltas is pointless.
      The point is to treat the code as a black box. The code in the box only has to meet a few criteria: that it meets all the tests and the constraints on coverage (although that is itself a test). You can add static analysis as a test too.
      The real crux is to let go of the generated code and accept that it meets the requirements. You aren’t coding the product, only the tests.
      Personally, I’d rather worry about the code, but that isn’t the premise of the question I posed.

  • @philadams9254
    @philadams9254 1 day ago +4

    It's as good as your prompt. Most times I ask something of AI, 99% of my prompt is code style and guidelines. You can be incredibly specific and it will deliver. I'm not just talking about "use spaces instead of tabs" and the minor details; you can tell it to do FP/OOP or avoid certain patterns, etc.
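
    A sketch of such a reusable guideline preamble (the wording is invented for illustration); the per-request task is appended at the end:

      # Reusable style/guideline block pasted ahead of every request.
      GUIDELINES = """\
      You are generating Python code. Follow these rules strictly:
      - Functional style; no classes unless explicitly requested.
      - Pure functions: no global state, no prints, no I/O.
      - Type hints on every signature; 4-space indentation.
      - Avoid mutable default arguments and singleton patterns.
      """

      def build_prompt(task: str) -> str:
          # The guidelines dominate the prompt; the task itself is one line.
          return GUIDELINES + "\nTask: " + task

      print(build_prompt("Write a function that flattens a nested list."))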

    • @edgeeffect
      @edgeeffect 1 day ago +8

      But by the time you've found the correct language to specify all of that and then adequately checked the results are correct, do you think that you could have just written the code?

    • @philadams9254
      @philadams9254 23 hours ago

      @edgeeffect I don't understand the first part of your sentence. I have a standard template of instructions I just paste in, and I know whether it's right or wrong straight away, as I'm only asking for small quantities of code each time. Yes, I could write the code, but sometimes there are some boilerplate things or complex array structures that I struggle with.

    • @ikusoru
      @ikusoru 21 hours ago +3

      @edgeeffect Exactly! I find that 9 times out of 10, coding with LLM assistance takes the same or more time to write a function as doing it without.

    • @mandisaw
      @mandisaw 16 hours ago +1

      @philadams9254 I think @edge's point was that if you're using a template of specific instructions incl. code style, guidelines, domain specs, etc., then you're basically already 90% of the way to writing the actual code. Most of the time spent, in my experience, isn't writing code; it's working out the solution to a problem that suits your business needs, constraints, and strategy. By the time you know the solution, actually writing it is quick.

  • @Yulrag
    @Yulrag 1 day ago +1

    I found AI to be good at explaining obscure topics, like some Maven configuration aspects (although the XML itself was faulty, it gave me a push in the right direction). But to write code by prompt? It will take me longer to write a good prompt than to write the code itself.
    I would say this incarnation of AI assistant is good at preventing procrastination at difficult points (since instead of googling you just ask a question and get an exhaustive answer), but it won't improve the code itself. Then again, I have not worked with something well integrated with an IDE yet, so I may be wrong on this.

    • @tedand
      @tedand 10 hours ago

      @Yulrag Try Cursor IDE; it's a VSCode fork that is very well integrated with LLMs like Claude Sonnet. It makes a fantastic assistant for me.

  • @ariverosmg
    @ariversmg 23 hours ago

    I think the experiment is quite preliminary, because it will give very different answers depending on whether the devs using AI are just using autocomplete-related tools or real AI-powered development tools like Cursor IDE. And assuming they're using those good tools (and not simply GitHub Copilot), it will also be very different depending on whether or not they're prompting the system to generate high-quality and maintainable code all the time. I write code with AI constantly. At this point, I cannot imagine how somebody would argue it isn't faster and better, with better quality, and my only answer to this is that most devs have still to figure out that you should request the right things from the AI; you are the guide to ensure quality. I build using TDD: I ask it to create the test from the acceptance criteria I have to fulfill, then the code to satisfy that test, then refactor if needed. This works too well to ignore; it is fast and efficient. But I guess that if you just hand over your task and use whatever it returns, then yes, that's not useful at all, and you end up with the idea that it doesn't help.

  • @corsaro0071
    @corsaro0071 23 hours ago

    The answer is: yes, if you already know what good code looks like.

  • @raybod1775
    @raybod1775 9 hours ago

    Internal corporate AI systems program and design much better than what’s given to the public. Retrieval-augmented generation likely uses internal documentation and internal code to keep the generated code correct and maintainable.

  • @scycer
    @scycer 1 day ago

    All these comments are quite interesting; more against it than I'd expect. I've found AI tools to be much closer to humans than people realise when coding. If you told a dev to make something in code, with only the context of the current file and the words you write (which themselves have their own interpretation), then it's likely the quality would be low. Asking AI to do it twice is obviously going to produce different results; so would I, if there was enough time between sessions to completely forget my past implementation.
    I think that fundamentally, once we start to get more and more "context" into the systems we use in software development, abstracted at the right level and provided as context to LLMs with multi-stage iterations, we will see the big shift in productivity. Aider has massive potential already, starting down this path with /ask and /architect to plan and research before implementation. If we start adding in BDD, better requirements, tailored small language models, and other feedback loops, it's likely to keep improving dramatically.

  • @nschul4
    @nschul4 1 day ago

    I don't see how there's so much room for debate on this. When we first got LLMs they weren't very helpful to coders; in only two years we've seen them become very helpful as an assistant; in two more years...

    • @ErazerPT
      @ErazerPT 1 day ago

      You're falling into a damning "scaling fallacy", and every serious ML practitioner who isn't trying to sell some ML-based "AI tool" will tell you that. If you're not sure what I'm talking about, the issue is called diminishing returns. If in doubt, go check image classifiers and detectors. Sure, they're improving, but... the times of "big returns" like LeNet > AlexNet > SSD > YOLO are long gone. Now you fight over a 0.05% improvement with an exponential increase in resources spent. OpenAI is already hitting the wall hard, and so are Llama and every other LLM, because the approach is fundamentally flawed, if fun.

  • @youroldmangaming8150
    @youroldmangaming8150 1 day ago +2

    I've set up an automation with well-defined inputs and outputs, as sketched below. I put together an AI bot that creates code for the desired outcomes with set inputs. I just black-box it. It iterates away by itself until it gets what I have asked for. If it diverges, it stops going down that part of the path and returns to where it was closer to the desired outcome. This way I keep control of the architecture. If the black boxes don't perform, then I look at optimizing those individually. That said, this is just to generate a proof of concept. Once there is something close to what I need, I start to look at the code generated and apply best practice. I'm not sure if this is workable in a professional environment just yet, as I stopped doing this for money many years ago. But the one thing I can say about change is that if you don't embrace it, you will be left behind.
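
    A sketch of that black-box loop, with generate_candidate as a hypothetical wrapper around whatever model the bot calls; the test suite is the fixed definition of the desired outcome:

      import subprocess

      def generate_candidate(spec: str, feedback: str) -> str:
          raise NotImplementedError  # hypothetical: call your model of choice

      def iterate(spec: str, max_rounds: int = 10) -> bool:
          feedback = ""
          for _ in range(max_rounds):
              code = generate_candidate(spec, feedback)
              with open("candidate.py", "w") as f:
                  f.write(code)
              result = subprocess.run(["pytest", "-q"],
                                      capture_output=True, text=True)
              if result.returncode == 0:
                  return True           # desired outcome reached
              feedback = result.stdout  # feed failures back to the model
          return False                  # diverged; abandon this path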

    • @ikusoru
      @ikusoru 21 hours ago +1

      "if you don't embrace it, you will be left behind"
      That is quite a statement from someone who doesn't write software professionally for a living (aka lacks experience on the topic). It is more of an opinion. 😅

    • @youroldmangaming8150
      @youroldmangaming8150 20 hours ago

      @ikusoru Yes, it is quite a statement, from someone who no longer writes software professionally after 30 years of doing it for a job.

  • @robw3610
    @robw3610 1 day ago

    So far I have had good experiences, at least with JetBrains AI Assistant in Rider, but I am not using it for large swaths of code. I am mostly using it for documentation and writing boilerplate. I tend to find that I get better "help" from ChatGPT 4o, with the main advantage being able to learn new APIs and frameworks a lot faster than looking for online guides and tutorials that are almost never geared to what I am trying to do.
    While I think the in-editor assistants are good for boilerplate and refactoring, I don't trust the output enough for production code, at least not without heavy scrutiny.

  • @qj0n
    @qj0n 1 day ago

    In technologies I know well enough to write good code, it's quicker for me to just write it than to generate it via Copilot and fix it afterwards. However, when I had to do a small task in a popular language I code in very rarely (JS), Copilot produced something probably not-so-good, but not as bad as mine would have been (and much quicker than me).
    I believe that, at best, copilots will make it easier to build cross-functional teams and exchange work within a team, and that's a good thing about them. But I don't believe managers will understand this.

  • @ThomasTomiczek
    @ThomasTomiczek 1 day ago

    They are better than you think, except not the ones you tested. Current tools have serious limitations that are being worked on, but there is some stuff in development that COULD work better (should, actually), though it is not going to be as cheap as people think. We are talking about agentic AI frameworks with a 100-million-token context window and near-perfect recall: enough to start loading a lot of documentation and codebase in to analyse it. Until an AI can put together its own context by researching relevant code, you are very limited in what it can see at a time.
    Hallucinations are less of an issue if the AI does not just write code, but writes code, compiles, and tests it. Errors happen, also with humans, but there is no reason a properly INTEGRATED system cannot fix them. In fact, when I have AI do coding, there is often a lot of back and forth with me acting as the hands of the AI: put code in, report errors (also from unit tests). There is NO reason the AI could not do that itself. Heck, an AI should, if under e.g. git, branch off, work with the source control, then submit the merge request once it has everything done. But the way they are currently integrated seriously sucks.
    Btw, reproducibility is "solved"; that is where the chat part comes in. They take a random sample of the next token so that answers are not repetitive. For code you want this off, always taking the "best" token, but in chat interfaces you cannot change the temperature, and even with the API you cannot do that for parts of the answer. Again, a problem of the integration.
    I look forward to the moment AI can do refactoring autonomously. This is not about code quality, but it basically requires the complete stack under AI control, including running and fixing issues in unit tests. THAT will make them useful.

    • @grokitall
      @grokitall 22 hours ago

      The problem with the idea of using statistical AI for refactoring is that the entire method is about producing plausible hallucinations that conform to very superficial correlations.
      To automate refactoring, you need to understand why the current code is wrong in this context. This is fundamentally outside the scope of how these systems are designed to work, and no minor tweaking can remove the lack of understanding from the underlying technology.
      The only way around this is to use symbolic AI, like expert systems or the Cyc project, but that is not where the current money is going.
      Given the current known problems with LLM-generated code, lots of projects are banning it completely.
      These issues include:
      exact copies of the training data right down to the comments, leaving you open to copyright infringement;
      producing code with massive security bugs, due to the training data not being written to be security-aware;
      producing hard-to-test code, due to the training data not being written with testing in mind;
      the code being suggested being identical to code under a different license, leaving you open to infringement claims;
      when the code is identified as generated it is not copyrightable, but if you don't flag it up, it moves the liability for infringement to the programmer;
      the only way to fix generating bad code is to completely retrain from scratch, which does not guarantee fixing the problem and risks introducing more errors.
      These are just some of the issues of statistical methods; there are many more.

    • @ThomasTomiczek
      @ThomasTomiczek 21 hours ago

      @grokitall Btw, you miss the reason for the refactoring. This requires a good overview of a larger codebase, as well as use of source control AND IDE/developer tools (to run unit tests). We are not there; in particular, a large codebase is a PAIN for now. It gets worse if the app in question is visual and, e.g., requires analysing screenshots, but even taking that out, a large application is a real problem for now, just from the context.

  • @ianosgnatiuc
    @ianosgnatiuc 12 hours ago

    A good programmer will make the code maintainable with or without assistants. A bad programmer will make it unmaintainable no matter what.

  • @trignals
    @trignals 1 day ago

    Interesting project.
    I would like to see phase 2 onwards mix in AI. If it is possible to use AI to deliver a solution comparable to maintaining code, do you even care about maintenance? It extends the viable timescale, even if all the code is regenerated.
    The value of maintenance is that it is the fastest way to deliver change. If you can establish confidence in new solutions fast enough, why maintain?
    I'd guess it's a harder experiment to run, but also a lower hurdle for AI code to pass. Looking forward to hearing the results on this one.

    • @barbidou
      @barbidou 1 day ago +1

      The cost of verifying that a regenerated solution really does what it's supposed to do can be prohibitive. Small localized changes during maintenance are less likely to affect the solution as a whole. Code regeneration, on the other hand, calls for full-blown regression testing.

    • @grokitall
      @grokitall 22 hours ago +1

      The value of maintenance is not in speed of change, but in the fact that, when done well, it produces ever-improving code which is easier to change. This requires minor updates that make specific changes to effect particular types of improvement, which requires understanding why the code is less than optimal and which change is the better one to make.
      This is fundamentally at odds with how statistical AI in general works, and when you regenerate sections of code in big blocks, you have no reason to believe that what it guessed this time is any better than what it guessed last time, or that it is not throwing away better code to replace it with something worse.
      It also fundamentally screws up the whole idea of version control, as it is impossible to create good commit messages, and you are repeatedly just bulk-replacing large chunks of code rather than evolving it.

    • @trignals
      @trignals 17 hours ago

      @barbidou Agreed, the cost could be prohibitive. Alternatively, it might not be. If a study prohibits the attempt, it can't make any comment.

    • @trignals
      @trignals 16 hours ago

      @grokitall I've tried to see how "ever-improving code which is easier to change" could have a distinct meaning, and I've failed. To me it's exactly the same thing in different words.
      Readability and good design make it easier to make changes; they embed tacit knowledge of the problem domain, like clustering it into areas where stability and confidence are high, distinct from areas where either or both are low.
      As a technique, automated testing allows us to quickly assess whether we have preserved the behaviors we care about after a change. This is coupled to the production code; not in the sense of the code smell "tightly coupled tests", just in that we exercise parts of the production code by calling it by name.
      However, none of this leaves any trace for the user. They interact with the code as a large black box. The user does not know whether any code from one release version to the next bears any similarity to its predecessor, or whether the underlying structures have been preserved.
      Version control, automated testing, etc. are all techniques geared towards a particular paradigm of code generation. They are techniques for making it safe to work at a particular level of abstraction, a particular intermediary step between the user and the machine code.
      They are of value because of the assumed workflow. When a study looks at a new technique because it could potentially fundamentally alter the cost-benefit analysis of the whole workflow, it is noteworthy that it tries to deviate as little as possible from traditional workflows.
      So I fully agree with your larger point that it is at odds with how a generative AI works.
      I expect the research team is aware of that and made the choice because they are asking and answering the simplest question first. That will leave them better placed to ask follow-up questions. But I didn't take part in those conversations, so I'm having fun discussing it here.
      Go well, buddy.

  • @immunoglobul
    @immunoglobul 1 day ago

    Thanks

  • @kamertonaudiophileplayer847
    @kamertonaudiophileplayer847 1 day ago

    AI should be trained on the best code samples, and then it will generate code at a level somewhere between an average and a skilled developer.

  • @bitshopstever
    @bitshopstever 5 hours ago

    Many of the coding assistants provide a PR-like or git-patch view of proposed changes, which negates a few of your points, or at least tries to.
    These things are arrogant junior engineers who claim they know everything, but they can produce value with the right oversight.

  • @josemartins-game
    @josemartins-game 1 day ago +2

    Occasionally. But most of the time it's crap.

  • @ingoeichhorst1255
    @ingoeichhorst1255 1 day ago +1

    [3:45] If you set the temperature to zero and make sure to use the same seed (if you can configure it), the LLM indeed produces the same output. To double-check, I've just created the exact same unit tests for a bubble-sort algo with Qwen2.5 and temp 0, and no diffs were found.
    Non-determinism is a lie, and a result of a lack of understanding of LLMs (and missing config options, sometimes).

  • @bestopinion9257
    @bestopinion9257 1 day ago

    It is, after you explain several times that you want something else.

  • @RyanElmore_EE
    @RyanElmore_EE 1 day ago +1

    On the point that asking the code generator twice with the same words may give different results: if you give the problem to 2 humans, will they give the exact same code to solve it? There's more than one way, no? Why is AI held to different accuracy/repeatability KPIs for 'success'? (I don't know why either.)

  • @romangeneral23
    @romangeneral23 13 hours ago

    No....

  • @ramielkady938
    @ramielkady938 1 day ago

    Generative AI is a godsend to SW engineers. Deal with it.

    • @ikusoru
      @ikusoru 21 hours ago +1

      @ramielkady938 Not sure if it is only me, but the way you phrased it makes it hard to understand what you meant.