A sentence I stole online and sometimes tell my classmates is "If your code doesn't work, we don't care how fast it doesn't work". I think it's a great piece of advice.
My data structures and algorithms lecturer said something similar, an anecdote about him replacing a broken algorithm. The exchange went along the lines of: 'But your algorithm takes 3 times as long as my algorithm!' 'Yes, but _my_ algorithm gets the right answer.'
Correctness comes before efficiency. Sometimes people also say "I can make it run a bit faster if it doesn't have to handle all these edge cases". Or, in other words, if it doesn't have to be correct.
A classmate of mine accidentally wrote a tree search with 2 extra, unnecessary recursive calls, and the professor determined that the code would eventually work, but it would take much longer than the age of the universe. So... sometimes a bad algorithm is no better than code that doesn't work.
That's a clever adaptation of the shorter one they use often when training devs: "It's easier to make working code go faster than it is to make fast code work"
Here is a case from early in my software development career. Our older software package was taking almost three hours to read its configuration files. My boss gave me the task of converting the configuration files to binary rather than human-readable text, because it would be three times faster. I pushed back and asked for research time, because while there were a lot of large configuration files, they were not THAT large!

After doing my research, I found the real problem. There were two files whose data had to be correlated. Each entry in the first file had a key to a data entry in the second file, and they needed to be connected in memory via pointers before the program could run. They were read via functions that opened the file, searched for an entry, closed the file, and returned the entry. The code started by manually reading the first file to get a list of keys for every entry. Then, for each key, it would call the function to open the file, search for the key, and close the file. It would then repeat the whole process for the second file.

My solution: open both files and read all data entries into memory. Once everything is in memory in two lists, iterate through the first list and match each entry to the correct data entry in the second list. Loading times went from three hours to thirty seconds.
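A minimal sketch of the fix described above, in Python. The `key=value` format and the function names are invented for illustration; the point is the shape: read each file once, then join the two sides in memory instead of reopening and rescanning a file per key.

```python
def load_entries(lines):
    """Parse hypothetical 'key=value' lines into a dict in one pass
    (i.e. one 'open' of the file, not one per lookup)."""
    entries = {}
    for line in lines:
        key, _, value = line.partition("=")
        entries[key.strip()] = value.strip()
    return entries

def correlate(first_lines, second_lines):
    """Join entries from the first file to the second via shared keys."""
    first = load_entries(first_lines)
    second = load_entries(second_lines)
    # One linear pass over in-memory dicts, instead of an
    # open/search/close round trip per key.
    return {k: (v, second.get(k)) for k, v in first.items()}
```

For example, `correlate(["a=1", "b=2"], ["a=x"])` yields `{"a": ("1", "x"), "b": ("2", None)}`. This is the same O(n) vs. O(n²)-with-I/O difference the story describes.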
Had a similar problem, but in my small program that automatically searches for counterexamples to a math problem. Having to constantly open and close files is SO slow.
I work with statistical software, and I can pretty confidently say that 100% of my unnecessary performance issues were caused by badly written loops that repeated something needlessly. I think this is another such case: the file opening and closing was repeated for no reason, probably even in an explicitly programmed loop. In my use cases, anything that is called only once can be utter garbage in terms of performance without anyone noticing. In your case, this would mean that a function call that opens the file in 100 ms would be just as good as one that opens it in 1 ms. Only when you force your computer to repeat something unnecessary a million times do you reach time frames a human can perceive. So my 80% move is to pull as much as I can out of loops. In your case, that would be the file opening and closing.
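The hoisting rule above can be sketched in a few lines. This is a hypothetical illustration (the function names are made up): the slow version redoes an expensive load on every iteration, the fast one moves it out of the loop.

```python
def find_values_slow(keys, load_table):
    # Anti-pattern: the expensive load_table() call runs once per key.
    return [load_table().get(k) for k in keys]

def find_values_fast(keys, load_table):
    # Hoisted: load once before the loop, reuse the result inside it.
    table = load_table()
    return [table.get(k) for k in keys]
```

Both return the same values, but with a 100 ms load and a million keys, the slow version spends roughly 28 hours just reloading, while the fast one pays the 100 ms once.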
One lesson that has stuck with me since college is that software engineering doesn't have just one time-cost. It has three: runtime cost, development-time cost, and repair/maintenance cost. Which happens to map perfectly onto the performance, velocity, adaptability triangle.
This video makes me feel like I'm over-optimizing and putting too much effort into adaptability on a real-estate full-stack app I'm making, which probably won't have more than 500-1000 objects in the database and maybe 10-20 images per object. It made me think I should be optimizing the images rather than the app itself (which won't have insane traffic anyway).
This brings back a memory from grad school. A co-worker of mine insisted that the time to load data was too long and so he cooked up examples that showed the worst case, plotted out on a linear graph and everything. None of our problems used data sets big enough to get into the "bad region". In fact, all of them were small enough to make the normal method a bit faster, which of course was hidden down in the corner of his non-logscale graph. It wasn't even worth the time to have the meeting, not only because he proved himself wrong, but also because our programs would spend a couple milliseconds loading the data, then process it for minutes to hours. Optimizing a non-problem with something that would have made it worse. Great job!
@@demonslime I guess he was thinking that we could optimize the neural network training down to a few milliseconds on large data sets. In the mid-90s. A very optimistic expectation! 😆
Besides "premature optimization", there's another thing to avoid, which is "premature pessimization", or the absence of performance-critical thinking: writing poorly performing code without thought. If performance is never on your mind during initial development, you're going to have to do a lot of work to get any amount of performance out of your application later. So some degree of performance-critical thinking *is* a good habit to form. Knowing what patterns to avoid from the get-go will speed up development, rather than having to double back to refactor those poor patterns regardless. I feel like "premature optimization is the root of all evil" is often used to argue that performance doesn't matter at all, which is not an absolute truth, and it's often more complicated than that. It's better to judge on a case-by-case basis. But don't get stuck in conversations about pre-increment or post-increment, because all experienced developers know it really doesn't matter in the vast majority of cases, and those are typically not the optimizations that come up in code reviews.
Yeah, I'm glad he talked about macro-level decisions and micro-level performance considerations. The "premature optimization is the root of all evil" reputation comes from people making code unreadable and hard to maintain for seemingly vapid performance gains, sometimes even things the compiler would have optimized for you already, like function inlining. It's the macro-level decisions, such as choosing the right data structures and database designs, that are really important. For example, incorporating parallelization into a design to increase performance is extremely hard to do after the fact, since it relies heavily on the structure of your data being easily parallelizable and on minimizing shared data.
Yeah, totally agree with this. A kind of extreme example, but I once took some code from taking many hours to run to less than a minute. It was saving function outputs to disk and then loading them elsewhere to get the output, and this was happening inside a function being numerically integrated to simulate forwards in time, so it was being evaluated a LOT. In this case, just knowing the order-of-magnitude performance difference between keeping results in memory versus reading and writing to disk would have helped the original author prevent this kind of extreme performance pessimisation.
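The pattern in that story can be shown with a toy version (the integrand and step names here are invented, not the original code): one variant round-trips every evaluation through the filesystem, the other just returns the value.

```python
import json
import os
import tempfile

def f(x):
    # Toy integrand standing in for the expensive function in the story.
    return x * x

def step_via_disk(x):
    # Anti-pattern: serialize the result to disk and read it back,
    # on every single evaluation.
    path = os.path.join(tempfile.gettempdir(), "integrand_out.json")
    with open(path, "w") as fh:
        json.dump(f(x), fh)
    with open(path) as fh:
        return json.load(fh)

def step_in_memory(x):
    # Same value, no filesystem round trip.
    return f(x)

def integrate(step, n=1000, dx=0.001):
    # Crude left Riemann sum of the integrand over [0, 1].
    return sum(step(i * dx) for i in range(n)) * dx
```

Both versions compute the same integral (about 1/3 here), but the disk version pays two filesystem round trips per evaluation, which is the orders-of-magnitude gap the comment is talking about.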
This * 1000. Using the right algorithms and data structures will get you most of the way there. Instruction-level optimization is something I have super-rarely done outside of game engines and embedded devices. Hand-written assembly code on an Arduino can yield measurable gains, but on a modern PC (or Mac), the compiler does a better job than the human 99.9% of the time, and when it doesn't, it's often because your high-level code is a mess.
To be fair, people take "premature optimization is evil" too much to heart and stop thinking about performance altogether. By the time people start noticing, it has become "death by a thousand cuts". I worked on a codebase like this once: higher-ups kept complaining about things taking too long, but we couldn't pinpoint it to any particular part of the code.
Yes, it can be used as an excuse for complacency. Oftentimes we have multiple implementation choices with similar complexity/maintainability/implementation effort; being aware of the best option there is NOT premature optimization.
I have literally refunded games because they were so unoptimized, so, especially in gaming, the resulting frame rate is exceptionally important to the end user.
At work, a customer filed a bug because our distributed system didn't converge its state fast enough. They would make a change in the controller and wait around 2 minutes for convergence. This was a pain for them because it was a container solution, so a container not coming up within 2 minutes meant k8s would shut it down. We were effectively forcing their containers into an infinite loop of trying to come up, being too slow, and getting killed. So I spent 6 months developing profiling tools and optimizing, eventually getting convergence down from 120 seconds to 20. But then another (smarter) guy at my work added a "readiness gate" that basically just told k8s "hey, this pod isn't ready yet, so just wait", and that immediately solved the customer's problem. It took him a few weeks, sure, it wasn't instant, but it was the RIGHT solution. Optimization was not. Even when optimization seems like the right solution, even when customers and PMs were literally asking me for it, it was not the right solution. That's my premature optimization story :)
The takeaway, of course, is: identify the problem, and only then fix it. Listen to the customer carefully. The problem was not in the speed department; speed can be a solution to it, it's just not the optimal one.
I always struggle with thinking "what's the best way to implement this" rather than "how can I implement this", ending up making overly complicated stuff instead of just getting something to work and improving it afterwards. This video has been really informative!
I like to think of it as: how can I write this so that everything can be flipped on its head in 2 lines, gracefully? Maintainability is more often than not the biggest pain point in real-world situations. A slow app is better than no app, but spaghetti will turn your head into confetti. Not sure how much sense that made. Ah well.
When working with robotics applications in Python and C++, the workflow almost always involves: 1. write something quick and dirty in Python; 2. grow the code to the point where Python's poor performance becomes a problem; 3. rewrite everything in C++. Most of the time this is all it takes to meet the performance requirements. The reason I still find it useful to do something in Python first is that it helps shape the features the program needs. If you're fluent in both programming languages, it's fairly easy to translate between them.
Honestly, I've found that you basically have to rewrite your code from scratch at some point or another. It's basically impossible to write code without gathering technical debt: 1 dubious design choice makes 80% of the code this dilly-dally bubblegum workaround, due to you not formatting the input enough or something. In rewriting, I typically notice how I can turn my 1000 lines into 100, partly because I won't use most of the functions and partly because, for some reason, I do practically the same thing in 7 different ways over and over again.
Where I worked we had a mix of both. Python for accessibility and fast development. All interfaces and wrappers were in python. The resource intensive code was then written in c and the library used within python.
I tend to do the same with C#: a client asks me for a C# program, performance becomes a problem because of the growing features/requirements, and I end up rewriting it in C++.
One thing you didn't mention here is that not all performance problems are because your code is taking a long time to do a thing. Sometimes your code is waiting on an external resource (like a database or API) to do a thing, and the optimization is to make fewer calls to it and rely on it for as little as possible
Effectively, most performance problems are something waiting. Cache misses, for example, let the CPU idle for around 200 cycles (or more, depending on the CPU). Having too many of them certainly makes things slow.
In my opinion, part of the problem is that we think of things like database connections and APIs as "external", so ensuring speed there becomes someone else's problem. While this _can_ be true for APIs provided by other companies, inside a company the communication between components is often the main cause of many kinds of problems, including performance. Developers need to understand the cost of accessing components outside their own code base, whether that is a database, an API, local storage, or some shared filesystem/cloud storage. Making fewer calls is generally a good idea and should be a primary goal when setting up an enterprise architecture, but trying to rely less on "external" resources can make your code both bloated/unmaintainable and, in some cases, slow.

Often, the main source of slowdowns is developers insisting on accessing databases or APIs one element at a time, as if they were hash maps. Say a customer is logged in to your system and wants to see their purchase history, and your database has an order table and an order-line table. If you first transfer the order table as a collection and then query each order-line individually, it could take 1-2 seconds to retrieve the order history for the customer, sometimes more. If, on the other hand, you join those tables and return everything in one query, this can be cut to less than 50 ms. Not only does this speed up your service, it also reduces the load on the DB, speeding up everyone else's work too.

The same goes for APIs, except that requires even more cross-service design work. An API that returns only atomic information elements can be extremely slow, and can be sped up by orders of magnitude if it exposes endpoints that allow retrieval of larger collections of data at once.
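The orders/order-lines contrast above can be illustrated with Python's built-in sqlite3 (the table and column names here are invented to match the example, not taken from any real schema): one version does a query per order, the other does a single join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE order_lines (order_id INTEGER, item TEXT);
    INSERT INTO orders VALUES (1, 'alice'), (2, 'alice');
    INSERT INTO order_lines VALUES (1, 'book'), (1, 'pen'), (2, 'mug');
""")

def history_n_plus_one(customer):
    # Anti-pattern: one round trip to the DB per order ("N+1 queries").
    rows = []
    for (oid,) in conn.execute(
            "SELECT id FROM orders WHERE customer = ?", (customer,)):
        for (item,) in conn.execute(
                "SELECT item FROM order_lines WHERE order_id = ?", (oid,)):
            rows.append((oid, item))
    return rows

def history_joined(customer):
    # One query, one round trip: let the database do the join.
    return list(conn.execute("""
        SELECT o.id, l.item
        FROM orders o JOIN order_lines l ON l.order_id = o.id
        WHERE o.customer = ?
        ORDER BY o.id, l.item""", (customer,)))
```

Both return the same rows; the difference is invisible in-memory but dominates once each `execute` is a network round trip to a real database server.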
I feel this is one of the best channels about programming that I know of. Instead of asking to give us more content, I want to advocate for the opposite: please keep up the quality even if it means spacing out videos. Nevertheless, thank you for doing what you're doing.
Cannot stress the usefulness of profilers. I have been intimidated by them my entire career until I took the plunge to diagnose a problem with our React unit tests being very slow. Turns out the arranging and acting were not the bottlenecks, but the special assertions we were making. Totally took me off guard and was such a joy to find out what was really going on.
Profilers are useful if you have clear bottlenecks, but very often this isn't the case. Imagine your program is bubble sort, and you decided it will be like that because you "don't want premature optimisation". Well, now it's slow for large data sets, and no profiler will help you, and you'll have to rewrite your program from scratch for no benefit. It pays to be very careful at the beginning, don't listen to people who talk about "premature optimisation" without understanding what it means.
To be fair, profilers tend to have really poor, unergonomic interfaces imo. I was spooked by them for a long time too, and I still haven't found one that I really like. But good grief have they saved me a lot of time, now that I know how to use them.
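For anyone similarly spooked, a minimal profiling run with Python's standard-library cProfile is only a few lines (the workload functions here are made up): profile a call, then print the hottest functions sorted by cumulative time.

```python
import cProfile
import io
import pstats

def slow_part():
    # Deliberately does the bulk of the work.
    return sum(i * i for i in range(10000))

def fast_part():
    return 42

def workload():
    return slow_part() + fast_part()

profiler = cProfile.Profile()
result = profiler.runcall(workload)

buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
report = buf.getvalue()  # the report shows slow_part dominating the time
```

No GUI required; the text report alone usually answers "where is the time actually going?", which, as the comment above found, is often not where you expected.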
This is the kind of content young programmers need. The dogmas you're taught in programming courses and college aren't always the best fit for your project, and people need to be pragmatic. Focus less on ideals and more on doing things that work. There's a balance to everything.
Completely agree. From my experience as you approach 2 comp sci experts the number of unsolvable problems reaches infinity. For those who do not have a background in programming, this problem can be expressed as a system of equations which simplify to approximately 1/(1-x) where x is the number of needed comp sci experts per organization. Sorry if the math is off, I'm just a PHP developer.
One example of readability over performance is the for-each loop. It's very readable, because the loop variable itself explains what is happening. But in some languages it's also slower than a regular indexed for loop, which is in turn far less readable. I think if the for-each loop needs to be optimized because it's the main thing slowing things down, maybe the language itself is the wrong choice. Maybe programming directly in assembly is better: there you can control every cycle completely.
@@3lH4ck3rC0mf0r7 It was also halfway an ironic comment, but I think many languages have a practical performance ceiling, such that if you want them to be faster you start needing hacks, and that's the point where you might want to consider switching languages instead. I remember back in the old days on the C64, when we wanted to make really fast for loops, you replaced the 0 with a dot: for i=. to 255:poke 1024+i,160:next is VERY fast in C64 BASIC, because due to some quirk it effectively runs like machine code. But it's also not that readable, and I could just as easily have written a dedicated machine-code routine for it and called it with SYS.
Great video! Wanted to emphasize that "premature optimization" does not equate to "do no optimization", and I think your video hit the big one by focusing on data structures. We can be critical thinkers and make reasonable predictions to know when we should do those big improvements early to save a lot of pain later. It's always a gamble when doing things early, so if you're not pretty sure, then set up your design to learn as fast as possible. Knowing when something is a "problem" is best discovered before it becomes a really big one.
Great video for javascript coders who don't want to learn how computers actually work. For anyone working or interested in working on software that requires good performance: No, optimization is not an afterthought that can be "easily patched" at the end of the development cycle. Architecture of data and the way it connects together in a complex project plays a massive role in performance, and this is not something that optimizing compilers are allowed to modify. Most stuff in this video is false or only applies to non-realtime software. Keep looking for actually valuable information, it's out there.
I haven't heard it expressed this way, but I like how you laid this out. There are two stages of optimization: * Best practices * Best implementations We typically start with the best practices and hope that they're the way to go. These optimizations are things like array type, design patterns, etc. For most code, it's fine and the way to go for readability. Best implementation comes into play when your "likely-optimized" code is part of a loop that has to run at lightning speeds. We never know about the best implementation until we have a problem. Which tools do you use for optimization? Keep up the great content!
this is exactly it! much better to spend your time considering why you're looping through something so much in the first place than spend it optimizing the speed of the loop, up until you're completely out of other things to resolve
I'm so glad this channel isn't rushing to produce videos; each video being carefully researched is the best thing. Probably the best channel for understanding programming as a concept.
For a class in uni I was writing a basic synthesizer in C for a custom embedded system developed by some engineering students from prior years. This system didn't even have an FPU so everything needed to be in fixed point. At the beginning I was separating everything into functions in their own C files, but when we started to introduce DSP into the mix, I started getting pops in the audio output. I had just read Game Engine Architecture by Jason Gregory and learned about the instruction cache and stack resetting from function calls, so I moved every function to be either inline in a header or as a really long function style macro. And it worked fine, I was able to play multiple notes with DSP without any pops. Similarly, at one point in uni a friend was writing an HRTF audio plugin and found that by making the entire audio processing block in one function with no calls to std vector subscript, it went from 1 voice to 99 voices playing concurrently. In the world of real-time audio operating at 48 kHz, unfortunately code aesthetic is not allowed as function calls will actually slow things down enough to be noticeable to the end user. But for other real time applications the frame rate is almost never high enough for "the overhead" of a function call to be noticeable.
If you write a function as static and call it only once, it is basically guaranteed to be inlined, so separating out code into a static function in the same file has no performance cost. (static free function, not static member function of a class)
I've always used the theory of critical sections. Some parts of your code will be within a "Critical Section" while the rest is in a "Non-Critical Section". Critical sections are areas where performance matters, because the code is executed either over vast amounts of data or within a continuous loop (e.g. a game update/render loop). Non-critical sections are code executed either infrequently or in response to an explicit event.

Critical sections will typically only exist in about 5% of your code. Everything else is non-critical and requires little to no optimization beyond what you get for free from the compiler. The critical-section code does require optimization, but it should also be carefully designed to maintain readability and maintainability. If either of those is broken, documentation can be provided so that future developers can study and understand why and how the code does what it does (such as referencing the mathematical formulas used to calculate a complex solution). Again though, 95% or more of your time is spent working on non-critical code that requires readability and maintainability over performance and does not need to be optimized at all.

For example, I built a simple menu system for a microcontroller's LCD screen. This code is not efficient at all, but I don't care, because the UI only ever updates when a user pushes a button, so who cares if it takes a whole 250 ms to refresh the screen? What I care about is readability, because UI can be complex even at a small scale, and I want to be able to fix bugs as easily as possible. What does matter for performance is the sensor monitor, which needs to be highly reactive and efficient but only accounts for ~10% of my actual code base. The UI is ~40%, and the rest is file/network IO and business logic that all runs slowly because it doesn't have to be fast and is limited by the speed of IO.
The term Critical Section is already used in programming to refer to code that should only be run in a single thread at a time, but yes. The concept you are describing is generally referred to as a "hot path" or "performance path," as in, it's the codepath that will be run the most often so needs to be the most performant.
Yeah, if you have a section that is super slow due to file IO, for example, it doesn't matter if something else around it takes 10 ns or 20 ns to run. Sure, you can speed that other piece up by 2x, but it's not going to make a difference overall. But if that function is run a million times in a performance-critical section, yeah, every little bit counts.
Tbh, even in the cases where I need optimization, I very rarely find myself needing absolutely perfect optimization. More often I'm like "oops, that is a bit slow, lemme just cache some of that data, and now it runs 10x faster". Unless you're working on something that truly pushes hardware to its limits, basic optimizations are likely all you need 99% of the time.
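The "just cache it" move mentioned above is often one decorator in Python. This is a hypothetical sketch (the `settings_for` function and its return value are invented): memoize a repeatedly recomputed pure lookup instead of micro-optimizing it.

```python
from functools import lru_cache

calls = {"n": 0}  # counts how often the "slow" work actually runs

@lru_cache(maxsize=None)
def settings_for(region):
    # Stands in for a slow load (disk, DB, remote API) that always
    # returns the same result for the same argument.
    calls["n"] += 1
    return {"region": region, "tax": 0.2}

# A thousand repeated lookups now hit the cache; the slow work runs once.
for _ in range(1000):
    settings_for("eu")
```

The caveat, as always with caching, is that the function must be pure (same input, same output) for the lifetime of the cache, otherwise you've traded a performance bug for a correctness bug.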
There is the term “critical path”, from project-management theory. But yes, a typical program’s use of CPU time does tend to follow a pareto distribution, and concentrating effort on speeding up the “hot” bits will help more than elsewhere.
Fun fact about the post-fix vs pre-fix syntax. There are legit some languages where ++i will ALWAYS be faster than i++. It is indeed because they make a copy, but the copy isn't used, and the compiler is too primitive to realise this and remove the temporary store. Outside of such cases, the compiler (or JIT if you swing that way) will pretty much always be smart enough to eliminate any difference between the two.
My issue with this has always been, if you don't know the difference between what pre- and post-increment do, you should! And if you know, why wouldn't you use the correct one? I despair of my fellow coders who think they "know enough now, and don't need to learn anymore"... _never_ stop learning...
That video hits me right in the heart. I was focusing almost entirely on performance and adaptability, which often killed me when adding new features. I ended up refactoring for the new feature, then did the same thing again. I'm currently in the process of realizing this and stopping the over-optimizing. I now often have small functions where I know I chose a simple solution, but if it should ever become a performance issue (which it never will), I could optimize it in a couple of minutes. I can already feel my productivity rising, which also raises my happiness and my motivation to build more nice features. Thank you for the video; it confirms my thoughts and motivates me to go further!
I agree with basically everything *except* the reasoning stated for why post/pre-increment doesn't really matter. My reasoning for having a preference in every situation is that there is a definitive difference. Not that that difference has a huge performance or functional impact, but that it is defined at all. When someone 10 years from now is reading through my code (fairly common for AAA game studios) and sees a post-increment, I want them to be sure they can say "it's there for a reason" and that they *don't* say "well, post-increment is slow so I'll just switch it to pre-increment" and inadvertently create bugs. My issue with people defaulting to post-increment because "the compiler will just optimize it to pre-increment anyway" is because: 1. It won't *always* do that if they're not truly synonymous, which can lead to unexpected behavior; and 2. When we write code, it's not just for the compiler, but for future readers (including our future selves) to understand our thought process when we were writing it. The way I think about it is that if increment operators weren't so short, would we make justifications for it not really mattering which was used? If instead of code being `for (int i{ 0 }; i < 100; i++) { /* ... */ }` it was `for (int i{ 0 }; i < 100; math::increment_integer_and_return_previous_value(i)) { /* ... */ }` I think it'd be obvious people's immediate question is "why do we care about the previous value here?" and that the answer of "we don't, but it doesn't matter because the compiler will optimize it" wouldn't be satisfactory. Aside: there's also a greater issue of user created `operator++` implementations which incur side-effects (which may differ between pre and post increment). There's a whole discussion to be had about if that's even good practice (it's not, please don't make operators with side-effects, I beg you), but since we live in a world where millions of people think OOP is a good idea I can't make any assumptions about it.
I was never a fan of the ++ thing, since from what I understand it's a holdover from assembly, where many CPUs had a specific instruction just to add 1 to a register (typically called INC, for increment). Nowadays not every CPU has a dedicated INC instruction, so I really don't see how "int x = ++i + 7" would compile differently than "int x = (i + 1) + 7".
@@williamdrum9899 "int x = ++i + 7" and "int x = (i + 1) + 7" do compile differently, just not in the value of x: both store the same value into x, but ++i also increments i itself as a side effect, while (i + 1) leaves i unchanged. That side effect is exactly what makes the operator so useful in programming.
I think an important aspect of making these decisions is being well-versed in computer science in general. E.g., even if you haven't written a compiler, knowing that i++ and ++i (with the result unused) have the same SSA form after collapsing unused values tells you they will obviously compile to the same assembly after SSA optimization.
Idea for a video: how the hard part of coding up a large project from scratch is often not implementing features, but rather the software *design*, i.e. how all your code is laid out. Getting a good design in place at the beginning is often instrumental in implementing features later down the line without losing your sanity.
@@mv2e19 Most of his videos seem to come from books like "Clean Code". Just to make sure I'm not misunderstood: I like his videos, I'm just a little disappointed that he doesn't credit the source and makes it look like he came up with these concepts himself.
I think all of the topics you talked about (function call overhead, i++ vs ++i...) are things that a sincere developer will look at at some point in their career, and they bring you ahead and shed light on corners of computer science that help you understand things better. That is the stuff that matters a lot, as it is the ground new concepts are built upon and where new ideas sprout. Investigating the ++i vs i++ case taught me a lot about how CPUs work. It is a very interesting case, as i++ can actually be faster on some processors (the CISC family) due to instruction-level parallelism, whereas on RISC processors you're probably better off using ++i. I never quite understood the fear of writing optimized software... If you make it a habit, it will come naturally and not hinder the development process. It will require some thinking at first, but if you want to stop thinking, you're probably better off choosing a different job, let's be honest.
Profiling your code is something that very few people actually do. People give conference talks about how to make things BLAZINGLY FAST, but they forget to mention that the suggested approach may not fit every use case. code maintainability > performance (most of the time 😅)
@@patrickspiegel5524 Writing SQL queries is tricky, especially when you're handling large data sets. There is a point where using regular 1:1 indexes stops being beneficial and instead slows you down due to IO bottlenecks. There are other types of indexes that can help, and sometimes it's somehow more effective to do plain joins (using disk and/or RAM) when there's a lot of data to cross over. After working with Oracle databases for a few years, I've grown used to verifying the execution plans for my queries once or twice before settling on them; sometimes I end up having to use hints to instruct the database engine on how to poll the matching rows at each table, and in what order.
I profile my code but that's because it allows me to see if I only "mildly screwed up" or "screwed up really badly". If a piece of code should run basically instantly but takes a few seconds, I know something may not be right. But if it takes a few ms longer than I expected, I generally just slap a TODO there for when I don't have anything better to do.
I hate when people misuse this quote. The full quote reads: "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%." What Knuth meant by optimization was things like cycle counting, CPU-specific hot-path tuning, and fitting memory structures into cache lines. He did NOT mean "oh, using layers upon layers of language abstraction to finish something 5 minutes sooner is a no-brainer." Don't **pessimize** your code: write reasonably performant code from the start. You'll find it's much easier to read, and much easier to work in an environment that does only what it's supposed to, rather than a pile of busywork where you have to bring out the call stacks just to find where the small amount of useful work gets done at all. That triangle is a false, uh, trichotomy: you can easily have performant code with full velocity that is reasonably adaptable. Simply write code that does exactly what it has to do to accomplish its task, and if you need a more general solution, write the slow path next to it. You can always add new features this way, too.
As a very early stage programmer, the caution against over-optimization is very timely for me. I'll start working on a function to accomplish the same task over and over, and make that function adaptable enough to be called from anywhere and fit a variety of circumstances....before even confirming if the task itself will work for my needs... I guess I've had to redo code enough times that I want to do it right the first time. However this video helps me see that going back to optimize is much easier and less frustrating than having to scrap hours of work because it turns out none of it was necessary to begin with...
I've been programming for years (well, only really 2 years, but still), and I write dog-crap garbage code to begin with, for prototyping. I try to make it as understandable as possible, but I don't go out of my way to make it readable and intuitive, and I don't do much to make it extendable or flexible. Once I am done prototyping, I refactor and rework the code to make it good enough, and once it's good enough I stop there.
I hear ya! 😉 Premature optimization can be the devil for sure- we have to remember to step back and look at the bigger picture first. 🤔 ~ with ❤️ from repliesgpt
Gosh I feel that one. But I've been programming for almost a decade and it's still my preferred style. Granted, very little of that time has been doing it for corporate, so we'll see if that changes. But the joy of working with a pleasing, adaptable system that resembles an engine within itself is just too satisfying to forego, so I maintain the view of sacrificing dev speed for architecture.
Man, you already helped me (and especially my colleagues, when they interact with my code) so much. I regularly rewatch all your videos to keep writing better code, and I recommend you to every coder. The biggest help you gave were the nesting video and the readability video (I don't remember if the latter was a whole video, but it really stuck with me). Keep making such great content 🔥
I really like this triangle analogy. I'd expand it like this:
- At the beginning you have nothing, so velocity is the only important aspect of development.
- As you go back over some features, you find things you often modify and repeated code, and rework those parts to make them easier to change, so adaptability becomes on par with velocity.
- As you finally approach the final design, you go back over the features you have and might want to tweak them to their desired values. If this is a game, then at this stage you probably have a feature-complete vertical slice of the game and want to make adding new, non-coding elements like maps, models, etc. easier. So adaptability becomes the most important aspect of development, and new mechanics happen sporadically as need or inspiration comes.
- Finally, at the end, you have the program you wanted to make. It runs horribly, and hopefully you have a powerful machine to run it. Now you can find the parts that are inefficient with profiling tools and make them faster (probably at the cost of extensibility). Most of your code will probably run once, a few times during the runtime, or once every few seconds; optimizing anything other than the most-executed code is usually a waste of time. The only concern now is performance.
At the end you have an extensible program with a few rigid, ultra-efficient nodes that are never to be touched.
If you're solving any kind of problem that isn't already solved, the likelihood that you will run into an issue that forces a complete redesign of large parts of your code base is high. Therefore, the main argument against premature optimizations is that most of the code they apply to will not make it to the final product, i.e. most of the effort you put in is wasted. It's a different story if you're designing basically the same API for the Nth time and already know how it will go, more or less.
I remember a particular problem in my code where I was trying to remove recursion, as it is generally said to be avoided at all costs for performance. I solved the problem with loops and tracking data top-down, only to realize the data structures were actually making it take twice as long. The real problem with the data structures was that they were allocating memory regardless of whether the loop would actually go deep enough for it to matter. Heck, oftentimes the recursion wouldn't happen at all, and the base case was done in one function call! It truly was a waste of time, replacing a generally quick-enough solution which did what it needed to.
When replacing recursion with a loop + list data structure, I usually don't do it for performance but rather to avoid a potential stack overflow if the recursion has the possibility to go more than ~100,000 layers deep. Because, depending on the programming language and whether you run on a non-main thread, the call stack might only be able to allocate a few megabytes (unlike the heap, which can use all of the available RAM).
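A minimal Python sketch of that trade-off (in CPython the hard limit is the interpreter's recursion limit rather than raw stack bytes, but the failure mode is the same; the node structure here is a made-up linked chain):

```python
import sys

def depth_recursive(node):
    # naive recursion: each level consumes a call-stack frame
    if node is None:
        return 0
    return 1 + depth_recursive(node["child"])

def depth_iterative(node):
    # a loop instead: depth is limited only by heap memory, not the call stack
    depth = 0
    while node is not None:
        depth += 1
        node = node["child"]
    return depth

# build a chain comfortably deeper than CPython's recursion limit (~1000 by default)
chain_length = sys.getrecursionlimit() + 1000
deep = None
for _ in range(chain_length):
    deep = {"child": deep}

try:
    depth_recursive(deep)
    recursion_survived = True
except RecursionError:
    recursion_survived = False  # the recursive version blows the stack
```

For a real tree rather than a chain, the same trick works with an explicit list used as a stack of nodes still to visit.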
On the other hand, you should always remove recursion, not for performance reasons, but for the sake of the sanity of anyone who, at some other point, has to read and understand what the function is actually doing.
A funny and informative video. It's good to find someone who has a good balance of productivity and humour. Most other code channels lean more towards one or the other.
In many situations, algorithm choice is of enormous importance, particularly in real-time applications. Optimization requirements may need to be part of the initial design, so "it depends" on whether or not it can be considered "premature".
I will say that even if performance isn't worth optimizing in a lot of cases, there's still some basic things that can be worthwhile. For example, I'm an Excel/VBA guy, so you basically take any other language, and knock three or four orders of magnitude off the speed. For big jobs, it can easily take minutes to run a complex macro. And there's a couple basic things you can do to speed it up with almost zero effort - most commonly, turn off screen updating, and turn off auto-calculation (then turn them back on when you're done). That can easily be an 80% improvement in a lot of real-world situations, for a whopping four lines of code. (It occasionally causes headaches, especially if your code crashes mid-execution, but there's huge gains to be had). So there's certain kinds of optimization that actually do justify premature use. But it's rare even in my neck of the woods, and...well, VBA is an adorable pile of 40 year old jank, so perhaps this isn't the most useful example for people who code in other languages.
Personally I don't like the increment example because postfix and prefix are virtually identical from a readability standpoint. It's a good idea to just pick one and stick with it for consistency, so why not pick the sometimes-faster one? At any rate, premature optimization is a very important topic, especially when performance hurts readability. Or when there are actually no gains at all (I recently saw someone turn a bunch of JS string concatenations into array concatenations for "performance" - they didn't measure so we'll never know how the JIT optimized). Glad you covered it!
While it is not worth worrying too much about optimization working on the project, I still think it is a good thing to explore when available. Knowing how to make optimized code will generally make you a better coder in the long run.
The point is that people focus on it too much and think it's the only thing that makes you a good programmer. If you're "coding" then sure, but a programmer needs to know how to write good architecture more than highly optimized code.
@@NihongoWakannai Software today is much slower than it should be. Hardware keeps getting faster but a lot of software doesn't, because there are many programmers who don't care about performance. Or worse, some programmers don't even know how to write performant code.
@@YASxYT of course they don't care about performance, because it doesn't matter. What matters is finishing the project quickly and within deadlines, not spending 3 extra months getting extra performance that the end user doesn't even notice.
@@NihongoWakannai The user always notices. You need a new cell phone every 2 years because code keeps getting slower and it annoys you, not because your cellphone CPU slows down with age.
@@Diamonddrake No, the user does not always notice. No one notices the difference between a 1ms latency and 0.5ms latency. Users notice significant performance issues, not tiny ones.
I work a lot on math-heavy code, and there are a few steps I always try to apply to optimize it: vectorization, swapping matrix axes, and memoization. Vectorization is by far the most effective and makes the code more readable; the other two are very quick to implement and check. If the code is still slow, profiling is an amazing tool: find the problem function and work on that!
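Of those three, memoization is the quickest to sketch with just the standard library. A toy example using `functools.lru_cache` (`expensive` is a made-up stand-in for a heavy math step):

```python
from functools import lru_cache

call_count = 0

@lru_cache(maxsize=None)
def expensive(n):
    # stand-in for a costly computation; the cache stores each distinct input
    global call_count
    call_count += 1
    return n * n + 1

# 1000 calls, but only 10 distinct inputs, so the body runs just 10 times
results = [expensive(i % 10) for i in range(1000)]
```

The caveat, as with all memoization, is that the function must be pure and its arguments hashable.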
At a previous position, we painstakingly went through each line and found that a function was being called for each element of a list and each element of the previous version every time the list was changed. So appending 3 to [1,2] was run 8 times for [1,2,3], [1], [2], [3], [1,2], [1], [2].
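A hypothetical reconstruction of that accidental quadratic pattern (the `validate` function and the append helpers are invented for illustration, not the actual codebase): re-processing the whole list on every append does 1 + 2 + ... + n calls, versus n for processing only the new element.

```python
validation_calls = 0

def validate(x):
    # stand-in for whatever per-element work was being repeated
    global validation_calls
    validation_calls += 1
    return x is not None

def append_revalidating_everything(lst, item):
    # the anti-pattern: every append re-runs the work for the whole list,
    # so n appends cost O(n^2) total
    lst.append(item)
    for x in lst:
        validate(x)

naive = []
for i in range(100):
    append_revalidating_everything(naive, i)
quadratic_calls = validation_calls  # 1 + 2 + ... + 100 = 5050

validation_calls = 0
better = []
for i in range(100):
    validate(i)  # process only the new element: n appends cost O(n)
    better.append(i)
linear_calls = validation_calls  # 100
```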
I had a real problem for years early in my career wanting to "get it perfect" the first time. So I would craft the whole thing in my head and then write the code. As I eased into an agile, TDD (or BDD) mindset, it was a huge relief from stress and anxiety.
In Unity, an interesting optimization I discovered is that you shouldn't deeply nest rapidly moving objects all under one parent, because when a child object moves, it forces the parent's bounds to be recalculated... imagine this recalculation happening frequently. The solution was to not parent moving objects. It looked messy in the scene hierarchy, but hey, at least it runs faster than parenting them for "organizational" reasons.
The problem with this video is that it completely misrepresents the positions of literally everyone. For the more performance-oriented side, you're misrepresenting them by saying they care about menial micro things like i++ vs ++i. For the "root of all evil" side, you are misrepresenting them by suggesting that they do literally anything at all. In reality: the performance side of the argument argues simply to not pessimize your code. Don't write poorly performing code just because that "seems" to be the easiest. Don't blame your 5,000ms web request on "lol, that's just IO". We literally just argue in favour of not picking extremely slow tech/algos from the get-go, as well as to actually measure from time to time. That's it. The "root of all evil" side loudly advocates *to never ever even bother taking metrics of your performance, or even to think about performance characteristics at all*. This side argues that ALL PERFORMANCE MEASUREMENTS AND OPTIMIZATION OF ANY KIND ARE PREMATURE OPTIMIZATION. JUST THROW HARDWARE AT IT.
Just a quick chime-in on the ++i/i++ debate: the reason I believe ++i should be "best practice" is that i++ can lead to subtle bugs, as it *post*-increments, meaning the result of "i++" is actually the old value of "i". That can easily be overlooked if it's used as part of a larger expression! Therefore I recommend either:
- Use ++i everywhere.
- Never use i++ as part of a larger expression.
Or use Rust, which has neither (for this exact reason), and just do: i += 1; :D
I was once making a chat bot that used an API, and when you said it’s often 1 silly thing slowing everything down that really resonated, because I’m not sure why, but when I’d request something from the API, the response would be massive, and stripping away only the things I needed to store in memory improved speed and memory usage dramatically. But this was something I only cared for once I analyzed my memory usage later down the line
Adding a memory pool for when you constantly allocate and deallocate the same type of object but never need more than N at the same time was a nice performance increase for me. You can even add a log entry for when you need to fall back to the default allocator in case you unexpectedly needed more than N objects.
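A minimal sketch of that pattern (the `Pool` class and its names are invented for illustration; real pools in systems languages also avoid allocator churn and fragmentation, which a Python sketch can't show):

```python
class Pool:
    """Fixed-size object pool: hand out pre-allocated objects, fall back to
    the default allocator (and count it, so it can be logged) past capacity."""

    def __init__(self, size, factory):
        self._factory = factory
        self._free = [factory() for _ in range(size)]
        self.fallbacks = 0  # times we unexpectedly needed more than `size`

    def acquire(self):
        if self._free:
            return self._free.pop()
        self.fallbacks += 1  # a real system would log this event
        return self._factory()

    def release(self, obj):
        self._free.append(obj)

pool = Pool(size=2, factory=dict)
a = pool.acquire()
b = pool.acquire()
c = pool.acquire()   # pool exhausted: falls back to a fresh allocation
pool.release(a)
d = pool.acquire()   # reuses the released object instead of allocating
```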
In a big C++ project we focused on velocity and adaptability. We had to redo pretty much everything when optimizing for performance, because what we built initially was that bad. NEVER leave performance entirely for later. Try to make something that is 'good enough' performance-wise for starters. Otherwise you'll end up remaking everything in the long run.
I'd argue that you did that project correctly. I don't think code should be held holy; it should just be trashed for the new, better version. It's difficult to optimize beforehand, before you know what the bottlenecks are. You'll also easily slip into the optimization mindset, which is to delve into the endless rabbit hole of optimization, and done before you know what is important, it can be a detrimental over-engineering nightmare. Especially if you combine performance and adaptability, just to notice later that what the code is doing is not exactly what was needed, and even the adaptability was wrong. Then you did a lot of work and used a lot of time, and still have to trash and redo. I think it is important to start with velocity, trash it, and iteratively add adaptability and performance when you have more information on the directions these should take, since these take resources.
@@techpiller2558 not really. On the contrary, experience has proven that it was way too difficult to optimize. Once a foundation has been laid and 10-20 projects are based on that foundation, changes are exponentially hard to make. So yes, in principle you should optimize last, I agree, but in most cases a certain effort needs to be made to make something that can easily be optimized later to achieve some level of performance. This is not a given. You have to work for it. And I refuse to accept the mindset that "if it works, even if it is ugly, difficult to work with and slow as heck, it's ok".
While you shouldn’t be completely unaware of performance issues, you need to fully determine what your program is and then actually measure what your performance limitations are, and prioritize them
Also I should note, this is why it is so important to separate interface from backend. It’s certainly worth thinking a lot about what your interface should be, because it’s changing that which causes everything to break
Love the triangle, put into words what I have been thinking. My rule is just once a component/idea is used for the third time, abstract/generalise it instead of hacking the differences. I'm not into generalising everything straight away, as long as you have experience with how things should be structured, it's typically not an issue. Performance should always be on the back of one's mind, otherwise you'll eventually ruin velocity, although full optimisation should be done when weak points are discovered later during profiling (what you call micro vs macro optimisation).
I was working on implementing visual transitions in Java on an individual-pixel basis. I saw a bunch of videos and did research on branchless programming, and saw it recommended that code involving graphics be branchless. I spent several hours converting all of my functions (while shoehorning in the ability to multiply by the result of a comparison with bit operators), and only at the end did I decide to test the difference between the new and old code. The old code was several times faster. I thought about it for a while before realizing the problem with branchless functions: they compute all of the possible values and then throw away the ones we don't need. Some of my functions were making rather expensive calculations, but only for a limited range of inputs; all other cases were either copying a value or returning zero. Making them branchless caused the simple cases to take a performance penalty that far exceeded any branch mispredictions that would have occurred. What I learned: when trying to optimize a lot of similar bits of code, do one piece, compare it to the old, and then implement it for the rest only if it is faster.
Graphics programming (at least with GPU acceleration) is really quite complicated, particularly in the branching case. Generally branch mispredictions and overcalculation only tend to happen if every shader unit in a block ends up taking different paths, but if they are branching on a variable in a common way that tends to work out fine. Also, GPU programming gets complicated by drivers, especially for nVidia cards, since those will often have optimizations built in for ways that a big game has done something, which means that what might look worse on paper may be faster because the driver has been written to recognize what you are doing and optimize it specifically.
There's a fairly simple way around the issue you described: do the simple quick operations in one loop and when you find data points that need the complex, slow processing put them on a stack (I usually implement the stack as an array with write always, conditional increment) then have a second loop that goes back over the stack doing the complex work on everything it finds. Both loops can be branchless without throwing away work (except the single memory write to the same memory location for every non-complex data point). But of course there are plenty of cases when branchless code is not the answer - branches are only bad when they are not predictable, and any repeating sequence (up to about length 16 I think) can be predicted eg yynynnyn.yynynnyn.yynynnyn..... only when the sequence is truly unstructured does the prediction fail, like in sorts and searches. (at least in scalar CPU code ... AVX and GPU code is a different matter)
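A Python sketch of that two-pass structure (the branch-prediction benefits only materialize in compiled code; the function names and the cheap/expensive split here are invented for illustration):

```python
def process(values, cheap, expensive, needs_expensive):
    out = [0] * len(values)
    deferred = []  # the "stack" of indices that need the slow path

    # first pass: cheap work for every element, record the rare complex cases
    # (in real branchless code: always write, conditionally bump the stack pointer)
    for i, v in enumerate(values):
        out[i] = cheap(v)
        if needs_expensive(v):
            deferred.append(i)

    # second pass: the slow path runs only over the deferred items,
    # so no expensive work is computed and thrown away
    for i in deferred:
        out[i] = expensive(values[i])
    return out

result = process(list(range(10)),
                 cheap=lambda v: v,
                 expensive=lambda v: v * 100,
                 needs_expensive=lambda v: v % 4 == 0)
```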
I found your channel a few days ago, and thank you very much for all the content. I have been a software dev since the early 90s, and for most of my career optimization was NEVER an issue... until I started making VR applications. VR is the exception to the rule about optimization. Basically EVERYTHING about VR needs to be optimized to hit the NECESSARY frame rates.
I think the problem with discussions about optimization or premature optimization is that they're often so simplified (like the difference between i++ and ++i; we agree about focusing on readability here). Quite often two nested loops can be replaced by a sort followed by one loop. That brings the cost down from O(n²) to O(n log n). O(n²) is often too expensive, and O(n log n) is most often acceptable. I think this is good practice. In most languages sort is easy to use, and the code is often easier to reason about when the data is sorted. When you write code for running on a server, or write a reusable library, it makes sense to make informed optimizations (like the one above) immediately. When writing code that will run in a web browser, the CPU cycles are cheaper, and if the application is fast enough then it is.
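A concrete instance of that sort-then-scan substitution: duplicate detection. The nested-loop version compares every pair, O(n²); after sorting, equal values are adjacent, so one O(n) pass suffices on top of the O(n log n) sort.

```python
def has_duplicate_quadratic(xs):
    # O(n^2): compare every pair of elements
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            if xs[i] == xs[j]:
                return True
    return False

def has_duplicate_sorted(xs):
    # O(n log n): after sorting, any duplicates sit next to each other
    s = sorted(xs)
    return any(a == b for a, b in zip(s, s[1:]))
```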
@@jasonpepas4290 If I build the system, and the web application, and host the servers, I pay for the CPU-cycles that run on my servers. But I do not pay for the CPU cycles that are wasted in your browser, when you use my application.
@@serhatarslan4138, it used to be said that one was faster than the other. It was purely about the generated ASM-instructions, and I doubt it matters at all on modern CPUs.
I agree with you. Given the difference between ++i and i++, modern compilers generate the same code thanks to their optimization passes. These cases are trivial for compilers :D @gunnarthorburn1219
This is a GOATED video. In the end it's about using your own judgement instead of applying rules set in stone. Who would have figured! Your channel is amazing. Keep it going.
ORMs are always a good place to look when you start having performance issues. DB queries often come out on top of the performance-issues list, whether you are making too many of them or they are simply unoptimized. When used without care, ORMs often produce suboptimal queries and object loading problems.
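The classic object-loading problem is the N+1 query pattern. A sketch with raw sqlite3 standing in for what a carelessly used ORM emits behind the scenes (the authors/books schema is made up): one query for the parent rows, then one more per parent, versus a single join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO books VALUES (1, 1, 'A1'), (2, 1, 'A2'), (3, 2, 'B1');
""")

# N+1 pattern: 1 query for authors, then 1 query per author for their books
query_count = 1
n_plus_one = {}
for author_id, name in conn.execute("SELECT id, name FROM authors ORDER BY id"):
    titles = [t for (t,) in conn.execute(
        "SELECT title FROM books WHERE author_id = ? ORDER BY id", (author_id,))]
    query_count += 1
    n_plus_one[name] = titles

# the same data in one round trip with a join
joined = {}
for name, title in conn.execute(
        "SELECT a.name, b.title FROM authors a "
        "JOIN books b ON b.author_id = a.id ORDER BY a.id, b.id"):
    joined.setdefault(name, []).append(title)
```

With 2 authors this is 3 queries versus 1; with thousands of parent rows the difference is what shows up in the slow-query log.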
The passive aggressiveness is epic! I love it. That whole "performance" bit is such a rampant issue in development. Just look at any Stackoverflow question and you're guaranteed to see some speed freak in the comments making suggestions about "don't do that, do this, it's faster" when they have absolutely no idea what the use case(s) is/are. Not only that, the OP didn't even ask "how do I make this faster?" They just want to make it work. Definitely some popcorn worthy reading if you ever need to brighten your day. But then again, I'm not a developer so what do I know? 🤷♂
That was a really good and chill explanation of something I struggle with all the time. I've had projects fail due to me (the solo developer in the company) trying to focus too much on adaptability and performance where it wasn't needed. Feels bad... 10/10 would zen watch again, have a sub :)
I was surprised to discover how young this channel is. The content and quality of your videos are top notch. I appreciate the humor too. I'm an amateur, so I've never had to deal with performance. But I'm often paralyzed by trying to simplify my code. I'm not sure how to describe it in few words. Basically, getting from point a to point b, I'll end up with 20 nested if -else statements and be only half way. Then I spend hours reading through the documentation looking for a built in function, implementing and debugging my ham fisted attempts. I think of them as learning opportunities, but because I'm an amateur these lessons are few, far between and often forgotten. It's super discouraging.
Another hit. This channel is so good. Yes, I think that people come out of formal education with this over-emphasis on performance. They've had weeks to do their homework, so forget velocity. They're moving onto the next chapter next week, so forget extensibility. Big O notation is bouncing around in their head like the windows screensaver and their CS idols got to the moon with less memory than a wrist calculator. They think "anyone can get it done, but only the best can do it right and that's what gets them noticed". Then they graduate and land in their first production cycle and realize "oh, it is not the case that anyone can get it done".
My advice for performance gains:
1. Database design (missing indexes for large tables, overly complex queries)
2. Database queries (making even small sets of data large and not utilizing indexes)
3. Everything file-related (caching with files instead of RAM, too many files in one folder, too many file operations)
4. External APIs (too many calls)
For those writing console apps, anything related to IO is slow, including writing to console. So make sure to take out those debug lines before running!
Much appreciated video. What I like the most:
1. The distinction between macro and micro optimizations. Don't convolute the discussion. Macro aligns with good architecture; micro opposes it.
2. Go for data structures (or database structures) first. Data locality is where the battle is won. And it forces you to think about your architecture.
I do real-time signal processing algorithms for embedded systems, and sometimes you need to consider performance in the early stages of development. For example, I may be given the choice between doing something in the time domain or the frequency domain, and depending on if I'm doing more convolutions or array multiplications, one of those choices could be orders of magnitude faster. And because the choice of domain cascades through the whole system, changing later might mean scrapping the whole project. That said, this applies only to the "core" algorithmic portion of my code. Even in signal processing, there is a lot of peripheral code that isn't nearly so performance constrained, and prioritizing flexibility/maintainability is the way to go.
Once upon a time, we developed an auditing framework in .NET Core 2.1 that used sockets to send data between processes/machines. We placed a lot of synchronization objects (semaphores, mutexes, etc.) to ensure data integrity. The code was very smelly... it felt wrong to me. Anyway, the tests passed and the output looked good, so we did not clean the code. Later, when .NET Core 2.1 was upgraded to .NET 5, the tests started to fail and deadlocks happened sporadically. It turned out that the code failed because .NET 5 is much faster than .NET Core 2.1. So we had to clean up the code, remove a lot of unnecessary synchronization, and use a modern asynchronous programming model. Sometimes clean code and performance go hand in hand. P.S. I loved the little stories (Mark Zuckerberg, GTA5, etc.) during the videos. Great job!
Totally agree. Had performance issues in my machine learning project, done in Python. I thought that list comprehensions were the problem, but I decided to profile first. Turns out it was some useless copying from RAM to GPU memory :D
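The profile-first habit is cheap to adopt with the standard library alone. A sketch with `cProfile` (the `slow_copy` function is a made-up stand-in for the accidental copies in the story, not real GPU code):

```python
import cProfile
import io
import pstats

def slow_copy(data):
    # stand-in for the hidden cost: a full copy of every row
    return [list(row) for row in data]

def pipeline():
    data = [[j for j in range(100)] for _ in range(200)]
    for _ in range(50):
        data = slow_copy(data)  # the accidental hot spot
    return data

# profile the pipeline and render the top functions by cumulative time
profiler = cProfile.Profile()
profiler.enable()
pipeline()
profiler.disable()

buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
report = buf.getvalue()
```

The report ranks functions by time spent, so the culprit's name appears near the top instead of whatever you guessed at first.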
Hey man wtf. Why did you stop making videos? You're too good at this to not make videos! You actually mastered this YouTube niche and you outclass everything else I've seen on YouTube. Listen WE WANT MORE
I look forward to your videos quite a bit. You have a reasonable, rational approach to the thesis of each video, so I hope you will continue for a long time to come. Cheers!
The best reason for Zuckerberg to use PHP was that he already knew how to make a website with PHP. There are plenty of times when way faster code doesn't take more time to write, and sometimes it's even faster to write. It depends on the knowledge of the programmer.
6:44 it seems to me you’ve shown that the answer is, “No, pre- and post-increment have the same performance, at least in optimized code, and provided the result of the operation is ignored.” Thank you for investigating this question and making the answer available to everyone. You’ve saved us, collectively, a great deal of work.
Your stuff is fantastic. These are all the things I have been advocating for 15+ years. You present it in a much more accessible way than I ever did (corporal punishment and yelling :) ) I am now having my team watch all of your videos - Thank you and keep up the great work! I also liked how you touched on YAGNI - Young devs can get caught up in that as they start writing better code
I just discovered your channel recently and I love it! I feel like there is a gap in content for intermediate programmers like this, and this channel fills that gap beautifully with really useful topics covered in a streamlined and neatly explained way. Thank you for providing us with that :)
I recently learned that thinking about memory layout is much more important than I thought! I was working on some numerical calculations and tried to do some optimization. The first thing I did was flatten some nested vectors into plain vectors to improve parallelization. This sadly did not result in as much of a speedup as I hoped; in some parts it even slowed the code down, despite making better use of all available threads. Then I did some more digging into how the memory is laid out and how the vectors are iterated over, and noticed that it was very impractical for caching. So I just shifted the memory layout around a little bit without changing anything about the algorithms used. This alone brought me a speedup of over 50%. I never thought that it would make THAT much of a difference, but I am glad I did it :)
@@theodorealenas3171 (can't say for the original commenter, but) when working with contiguous data, the most cache-friendly way to iterate over elements is sequentially. As you access an element, the next few will be loaded into cache (cache controllers detect memory access patterns + data is cached in chunks), and access time for them will be near instant. This can be illustrated with a 2D array (say, a matrix of floats) stored in memory line by line. If you process it line by line, it'll be fast, because the next element in a line will also be the next value in memory (and thus will benefit from caching), but if you iterate over columns, almost every memory access will be a cache miss and will take a very long time in comparison, despite the algorithm being essentially the same.
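The two traversal orders from that explanation look like this (a Python list-of-lists doesn't model a flat C array's cache behavior faithfully, so this sketch only shows the access patterns; the timing difference appears in compiled code over contiguous memory):

```python
N = 256
# matrix stored "row by row": element (i, j) holds i*N + j
matrix = [[i * N + j for j in range(N)] for i in range(N)]

def sum_row_major(m):
    # visits values in the same order they are laid out: cache-friendly in C
    total = 0
    for row in m:
        for x in row:
            total += x
    return total

def sum_col_major(m):
    # same result, but each access jumps a whole row ahead in memory;
    # over a flat C array this is roughly one cache miss per element
    total = 0
    for j in range(N):
        for i in range(N):
            total += m[i][j]
    return total
```

Both functions compute the identical sum; only the memory access order differs, which is exactly why this class of bug never shows up in correctness tests.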
@@john.darksoul I think what they did is they aligned the beginning of the data with a memory block. I think that's what they're referring to. So one load operation brings a good number of the data you want, and I wonder, how do you typically express that in non assembly code
@@theodorealenas3171 It could have to do with a condition being optimized with branch prediction. The cpu will typically execute the first instruction(s) of the loop while it checks the condition. In cases such as (i < vector.size()), the condition is always true until the end, meaning every loop but the last one is sped up a bit. But if it's sometimes true, sometimes false, it will be much slower than it could be, such as in a case where you have different control paths inside a loop.
@@theodorealenas3171 yeah, well, it would make sense if the amount of data is really small, i.e. fits in a single cache line, but for large amounts of data it wouldn't make much difference (if any), I believe. In C11, there's aligned_alloc, which allows you to specify alignment in bytes, and some platforms have their own functions that do a similar thing (_mm_malloc, posix_memalign, etc.)
I just recently discovered this channel and I have to say it's amazing. Thank you for the useful and beautiful content ! As a French engineering student I had classes about algorithmic complexity (which is a bit painful but really useful), and I think that it belongs to the 80% moves along with data structures. A classic case study is sorting arrays/lists of numbers, and the sorting algorithm can make a huge difference when working with large ones.
To me, pre-increment vs post-increment isn't about performance but about code clarity and intention. If I just want to increment, I'll use pre-increment; if I want to iterate on something while actually using the post-increment's functionality, I'll use post-increment.
I work as a runtime performance engineer in the games industry, and I can say without a doubt that everything in this video hits home. The virtuous cycle of my job is:
- Have a KPI of what we are trying to hit: the performance GOAL
- Measure
- Make sure your measurement is correct...
- Form a hypothesis of what's 'slow': does it actually miss the performance GOAL?
- If it does miss the goal, take action
- Execute the loop until you hit the goal (but always remember you can ask 'is the goal valid?')
Rinse and repeat, and always bias toward not spending infinite cycles in this loop.
I'm developing a game, and that concept of adaptability is very valuable to me in the experience of crafting a complex system; maybe the velocity is gained over time while you focus fully on adaptability.
Hey, really enjoy the videos that you put out. Really liked the dog walking part, because I have that problem constantly... trying to make a system highly adaptable and adding complexity where it doesn't need it, making development take 5 times longer. I'm now at the end of a 2-month project that should have taken 3 weeks... I don't know how to balance speed and adaptability.
For the previous videos I was like "Yeah, that's also what I try to do and tell people all the time...", but this one caught me. Thanks for that perspective.
I’m personally in the ++i camp. Not because it’s faster, but because it conveys a meaning to the reader. If I read i++, I always subconsciously have to consider the side effect of saving the value, doing some operations with the copy, possibly returning it from a function, then incrementing the value. With ++i, I don’t have to consider that possibility. Just increment the number, and use that, maybe. Having to think less about something when reading someone else’s code is worth it in my opinion.
My general strategy: Start with the simplest and most intuitive approach, then if it's too slow or memory intensive, figure out something better. But also, there's a difference between simple and stupid. Don't invoke premature optimization as an excuse to write stupid algorithms that are blatantly unperformant.
Your array vs hashset example reminds me of something I ran into during my limited time coding: as a default, I used an array and += to add elements to it. This worked fine in testing but was slow in production. Switching from an array to a generic list and using add() made it way faster. Further testing showed that the array was faster for small numbers of elements (which was all my test data had), but when it met real data (up to 600k elements) it was inadequate.
I have a pathological fear of the ++i operator, and here's why (story time): So early in college, I did an assignment with someone who insisted on writing the most hyper-optimized code, no matter what. When our professor told us that readability was one of the grading criteria, this guy spent the entire practical session arguing with him, while I was desperately trying to just get him to work on the assignment with me. The professor tried to explain pretty much the same things as this video covers, but my classmate just wouldn't hear it. So after that lab session we agreed to both do a part of the assignment. He'd do the first half of it, and then I'd finish it and turn it in. I'm honestly not sure why I didn't just ditch him then and there, because of course he ended up sending me this arcane bit of code that didn't even work on the evening when the assignment was due, asking me to pretty please debug his code and also finish the assignment on time, with only a couple of hours left. I guess I was young and naïve, lol. Anyway, I ended up redoing everything from scratch and pulling an all-nighter. Luckily my teacher wasn't super strict with deadlines, cause if he was I would've just failed the course right there because of this goon. I didn't bother putting my classmate's name on the assignment after that. He apparently went to complain to the professor and actually managed to get more time to finish the assignment by himself. And then he quit the course a few days later because it was "too stressful". So now, whenever I see someone write "++i", I get Vietnam war flashbacks to speedrunning a college assignment at 4am.
This is the first video I see from your channel. I rarely subscribe to channels, and even more rarely subscribe after watching only one video. You've just got a new subscriber. Great content dude
We could also add that a developer should never waste time trying to outsmart the compiler. I don't write critical software (embedded or real-time), so I'll always favour elegant, easy-to-maintain code over "optimized" code written by a human.
Your videos are great! A colleague shared one today at work, and it's funny that both of us found your channel :) Really good content, well edited and with a lot of tips for developing better. I hope your channel grows; you've earned it!
Nice video! Though a more complete quote of 0:06 is: "Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%." Other than that, I find your videos better and better, both in content and presentation. Have a great day!
Your channel is pure quality, dude. Btw, once upon a time I was bored, so I wrote a hash map in C++. It took me like 15 minutes to make a primitive hash map. Then I benchmarked it against std::unordered_map. My code was ~100x slower than the built-in data structure ;D
I love optimization. Like, really love it. And not just for performance; sometimes for aesthetics, sometimes for brevity... messing around and pushing "boundaries" is great fun. But if I don't know that the code will even be around in a week, it's hard to justify spending time on it other than for pure enjoyment. Some of the code I've written has been so simple that a caveman could poke a stick at it and it would be possible for him to adjust it; other code has been so convoluted that you'd need a Rosetta stone and a safari guide to decipher it and avoid the dangerous creatures lurking in the shadows. That iteration counter, by the way, was pretty neat. Buuut... as a standalone number it didn't really do anything useful for me. If possible, some real world examples would be neat, to put that number into context. E.g. "for all the people in the world using a calculator app, if they switched to my app it would take 150 000 years of combined use to justify the three hours I spent on this investigation" or something. Not saying that calculators would be using i++ a whole lot, but I'm just pulling things out of my rear end here. 😄
A sentence I stole online and sometimes tell my classmates is "If your code doesn't work, we don't care how fast it doesn't work". I think it's a great piece of advice.
My data structures and algorithms lecturer said something similar, an anecdote about him replacing a broken algorithm. The exchange was along the lines of 'But your algorithm takes 3 times as long as my algorithm!' 'Yes, but _my_ algorithm gets the right answer'
Make it work, make it clean, make it fast.
Correctness comes before efficiency.
Sometimes people also say, "I can make it run a bit faster if it doesn't have to handle all these edge cases." Or, in other words, if it doesn't have to be correct.
A classmate of mine accidentally wrote a tree search with two extra, unnecessary recursive calls, and the professor determined that the code would eventually work, but it would take much longer than the age of the universe. So... sometimes a bad algorithm is no better than code that doesn't work.
That's a clever adaptation of the shorter one they use often when training devs: "It's easier to make working code go faster than it is to make fast code work"
Here is a case I got early in my software development career. Our older software package was taking almost three hours to read its configuration files. My boss gave me the task of converting the configuration files to binary rather than human-readable text, because it would be three times faster. I pushed back and asked for research time, because while there were a lot of large configuration files, they were not THAT large! After doing my research, I found the real problem.
There were two files that contained data that had to be correlated. Each entry in the first file had a key to a data entry in the second file, and they needed to be connected in memory via pointers before the program could run. They were read via functions that opened the file, searched for an entry, closed the file, and returned the entry. The loading code started by manually reading the first file to get a list of keys for every entry. Then, for each key, it would call the function to open the file, search for the key, and close the file. It would then repeat the process for the second file.
My solution:
Open both files and read all data entries into memory. Once everything is in memory in two lists, iterate through the first list and match it to the correct data entry in the second list.
Loading times went from three hours to thirty seconds.
So basically you moved the opening of the file in front of the iteration, and that got you from 3 hours to 30 seconds? That's awesome.
@@owlon225 Yep. Because then I only needed to read each file once instead of n times.
points at ur icon
Nice :)
Had a similar problem, but in a small program of mine that automatically searches for counterexamples to a math problem. Having to constantly open and close files is SO slow.
I work with statistical software, and I am pretty sure I can say: 100% of my unnecessary performance issues were based on badly written loops that repeated something unnecessarily. I think this is another such case: the file opening and closing was repeated for no reason, probably even in an explicitly programmed loop. In my use cases, everything that is called only once can be utter garbage in terms of performance without anyone noticing. In your case, this would mean that a function call that opens the file in 100 ms would be just as good as one that opens it in 1 ms. Only when you force your computer to repeat something unnecessary a million times do you reach time frames that can be perceived by a human. So my 80% move is to try to move as much as I can out of loops. In your case, that would be the file opening and closing.
One lesson that always stuck with me since college was that software engineering doesn’t just have one time-cost. It has three: runtime cost, development time cost, and repair/maintenance cost. Which happen to be a perfect allegory to the performance, velocity, adaptability triangle
That's a great point
Thanks for sharing! Great lesson
This video makes me feel I'm over-optimizing and putting too much effort into adaptability... on a real estate full-stack app I'm making...
which probably won't have more than 500-1000 objects in the database, and maybe 10-20 images per object...
It just made me think I should be optimizing the images rather than the app itself (which won't have insane traffic anyway).
@ Actually, it's 3 points... Because... Triangle?
I will get my coat.
If it takes (me) more time to optimize it than any and all users of the app is ever going to get back, then (I) don't optimize.
This brings back a memory from grad school. A co-worker of mine insisted that the time to load data was too long and so he cooked up examples that showed the worst case, plotted out on a linear graph and everything. None of our problems used data sets big enough to get into the "bad region". In fact, all of them were small enough to make the normal method a bit faster, which of course was hidden down in the corner of his non-logscale graph. It wasn't even worth the time to have the meeting, not only because he proved himself wrong, but also because our programs would spend a couple milliseconds loading the data, then process it for minutes to hours. Optimizing a non-problem with something that would have made it worse. Great job!
Was he not aware that the data loaded in a few milliseconds?
@@demonslime I guess he was thinking that we could optimize the neural network training down to a few milliseconds on large data sets. In the mid-90s. A very optimistic expectation! 😆
Besides "premature optimization", there's another thing to avoid: "premature pessimization", the absence of performance-critical thinking, writing poorly performing code without thought. If performance is never on your mind during initial development, you're going to have to do a lot of work to get any amount of performance out of your application later. So some degree of performance-critical thinking *is* a good habit to form. Knowing what patterns to avoid from the get-go will speed up development, rather than having to double back to refactor these poor patterns regardless. I feel like "premature optimization is the root of all evil" is often used to argue that performance doesn't matter at all, which is not an absolute truth; it's often more complicated than that. It's better to judge on a case-by-case basis. But don't get stuck in conversations about pre-increment or post-increment, because all experienced developers know that it really doesn't matter in the vast majority of cases, and those are typically not the optimizations that come up in code reviews.
Yeah, I'm glad he talked about macro-level decisions and micro-level performance considerations. The "premature optimization is the root of all evil" line exists because people make code unreadable and hard to maintain for seemingly vapid performance gains, sometimes even things the compiler would have optimized for you already, like function inlining. It's the macro-level decisions, such as choosing the right data structures and database designs, that are really important. For example, incorporating parallelization into a design to increase performance is extremely hard to do after the fact, since it relies heavily on the structure of your data being easily parallelizable and on minimizing shared data.
In a way that's part of adaptability. You want to design the software so that performance can easily be improved later if needed.
Yeah, totally agree with this. A kind of extreme example, but I once took some code from taking many hours to run to less than a minute. It was saving function outputs to disk and then loading them elsewhere to get the output; this was being done inside a function that was being numerically integrated to simulate forwards in time, so it was being evaluated a LOT. In this case, just knowing the order-of-magnitude performance difference between keeping results in memory and reading/writing to disk would have helped the original author prevent this kind of extreme performance pessimisation.
It’s not that performance doesn’t matter, it’s that it’s often unintuitive where the performance bottlenecks will lie.
This * 1000. Using the right algorithms and data structures will get you most of the way there. Instruction-level optimization is something I have super-rarely done outside of game engines and embedded devices. Hand-written assembly code on an Arduino can yield measurable gains, but on a modern PC (or Mac), the compiler does a better job than the human 99.9% of the time, and when it doesn't, it's often because your high-level code is a mess.
To be fair, people take "premature optimization is evil" too much to heart and stop thinking about performance altogether. By the time people start noticing, it becomes "a death by a thousand cuts". I worked on a codebase like this once. Higher-ups kept complaining about things taking too long, but we couldn't pinpoint it to any particular part of the code.
Yes, it can be used as an excuse for complacency. Often times we have multiple implementation choices with similar complexity/maintainability/implementation effort - being aware of the best option there is NOT premature optimization.
I have literally refunded games because they were so unoptimized. Especially in gaming, your resulting frame rate is exceptionally important to the end user.
The real sentence in its context is more like "'premature micro optimization is evil"
which makes a lot more sense
@@ChaotikmindSrc The real sentence is basically "premature optimization in ~97% of cases is bad, but in the remaining 3% it's very important".
@@Ehal256 I disagree, premature optimization is good, premature MICRO optimization is evil
At work, a customer filed a bug because our distributed system didn't converge its state fast enough. They would make a change in the controller and wait around 2 minutes for convergence. This was a pain for them because it was a container solution, so a container not coming up in 2 minutes meant k8s would shut it down. Their containers ended up in an infinite loop of trying to come up, being too slow, and getting killed.
So I spent 6 months developing profiling tools and optimizing, eventually getting convergence down to 20 seconds from 120. But then another (smarter) guy at my work added a "readiness gate" that basically just told k8s "hey, this pod isn't ready yet, so just wait", and that immediately solved the customer's problem. It took him a few weeks to do it, sure, it wasn't instant, but it was the RIGHT solution. Optimization was not.
Even when optimization seems like the right solution, even when customers and PMs were literally asking me for it, it was not the right solution.
That’s my premature optimization story :)
Amazing story, thanks for sharing! Sounds like you're working on some exciting stuff!
The take away of course is - identify the problem and only then fix it. Listen to the customer carefully. The problem was not in the speed department although speed can be the solution to it, it's just not the optimal one.
@@MTAG if only that was as easily done as said!
@@MTAG there's a reason the customer is just that. A customer. Not a developer.
🤔 That sounds like a great lesson in understanding when optimization is needed and when it's not! 💡
~ with ❤️ from repliesgpt
I always struggle with thinking "what's the best way to implement this" rather than "how can I implement this", ending up making overly complicated stuff instead of just getting something to work and improving it afterwards
This video has been really informative!
im the opposite actually. generally i try getting it to work and if it does at all i dont try making it faster because im too fuckin lazy to do so
@@tristantheoofer2 I'm the opposite. I don't try to make it work and if it works by accident I make sure it stops working
I like to think of it as: how can I write this so that everything can be flipped on its head in 2 lines, gracefully? Maintainability is, more often than not, the biggest pain point in real-world situations. A slow app is better than no app, but spaghetti will turn your head into confetti. Not sure how much sense that made. Ah well.
When working with robotics applications in python and c++, the workflow almost always involves
1. write something quick and dirty in python
2. grow the code to a point python's poor performance becomes a problem
3. rewrite everything in c++
Most of the time this is all it takes to achieve performance requirements. The reason I still find it useful to do something in python first, is because it helps shaping the features that a program needs. If you're fluent in both programming languages, it's fairly easy to translate between them.
Honestly, I've found that you basically have to rewrite your code from scratch at some point or another. It's basically impossible to write code without gathering technical debt; one dubious design choice makes 80% of the code this dilly-dally bubblegum workaround, due to you not formatting the input enough or something. In rewriting, I typically notice how I can turn my 1000 lines into 100, partly because I won't use most of the functions, and partly because I, for some reason, do practically the same thing in 7 different ways over and over again.
from what i hear rust is good for that
Where I worked we had a mix of both. Python for accessibility and fast development. All interfaces and wrappers were in python. The resource intensive code was then written in c and the library used within python.
I tend to do the same with C#, client ask me for a c# program, performance become a problem because of the growing features/requirements, end up rewriting it in c++
If I have a piece of code that runs slow in Python because of large data sets, I convert it to Rust using GPT.
One thing you didn't mention here is that not all performance problems are because your code is taking a long time to do a thing. Sometimes your code is waiting on an external resource (like a database or API) to do a thing, and the optimization is to make fewer calls to it and rely on it for as little as possible
or do things async
Definitely "do less work" is a huge part of performance
Effectively, most performance problems are something waiting. Cache misses, for example, let the CPU idle for around 200 cycles (or more, depending on the CPU). Having too many of them certainly makes things slow.
I suspect this would be discovered when running the profiler, or?
In my opinion, part of the problem is when we think of things like database connections and API's as "external", and that ensuring speed there is someone else's problem.
While this _can_ be true for API's provided by other companies, internally in a company the communication between components is often the main cause of many kinds of problems, including performance.
Developers need to understand the costs of accessing components outside their own code base, whether that is a database, API, local storage or some shared filesystem/cloud storage.
While making fewer calls is generally a good idea, and should be a primary goal when setting up an enterprise architecture, trying to rely less on "external" resources can make your code both bloated/unmaintainable and slow in some cases.
Often, the main source of slowdowns is when developers insist on accessing databases or API's to retrieve data a single element at a time, as if they're hash maps. Let's say some customer is logged in to your system and wants to see their purchase history, and your database has an order table and an order-line table. If you first transfer the order table as a collection, and then loop over each order-line, it could take 1-2 seconds to retrieve the order history for the customer, sometimes more. If, on the other hand, you join those tables and return everything using one query, this can be cut to less than 50 ms.
Not only does this speed up your service, it also reduces the load on the DB, speeding up everyone else's work, too.
The same goes for API's, except this requires even more cross-service design work. An API that returns only atomic information elements can be extremely slow, and can be sped up by orders of magnitude if it exposes endpoints that allow retrieval of larger collections of data at once.
I feel this is one of the best channels about programming that I know of. Instead of asking to give us more content, I want to advocate for the opposite: please keep up the quality even if it means spacing out videos. Nevertheless, thank you for doing what you're doing.
I agree completely, I like the quality and don't really look for speed in the people I watch.
@@cyrileo tf?
@@kiyu3229 what?
Ah, so you want his videos to have better performance/quality at the cost of velocity, I see.
true that.
Cannot stress the usefulness of profilers. I have been intimidated by them my entire career until I took the plunge to diagnose a problem with our React unit tests being very slow. Turns out the arranging and acting were not the bottlenecks, but the special assertions we were making. Totally took me off guard and was such a joy to find out what was really going on.
Could you elaborate on what profilers entail in this scenario?
What are examples of react profilers ?
Yes please
Profilers are useful if you have clear bottlenecks, but very often this isn't the case. Imagine your program is bubble sort, and you decided it will be like that because you "don't want premature optimisation". Well, now it's slow for large data sets, and no profiler will help you, and you'll have to rewrite your program from scratch for no benefit. It pays to be very careful at the beginning, don't listen to people who talk about "premature optimisation" without understanding what it means.
To be fair, profilers tend to have really poor, unergonomic interfaces imo. I was spooked by them for a long time too, and I still haven't found one that I really like. But good grief have they saved me a lot of time, now that I know how to use them.
This is the kind of content young programmers need. The dogmas you're taught in programming courses and college aren't always the best fit for your project, and people need to be pragmatic. Focus less on ideals and more on doing things that work. There's a balance to everything.
Completely agree. From my experience as you approach 2 comp sci experts the number of unsolvable problems reaches infinity. For those who do not have a background in programming, this problem can be expressed as a system of equations which simplify to approximately 1/(1-x) where x is the number of needed comp sci experts per organization. Sorry if the math is off, I'm just a PHP developer.
One example of readability over performance is the For... Each loop.
It's very readable, because you can explain with the variable itself what is happening.
But it's also slower in some languages than a regular for...next loop, which is, in turn, way less readable.
I think if the For... Each... needs to be optimized because it is the main thing slowing things down, maybe the language itself used is wrong. Maybe programming directly in assembly is better. You can control each cycle completely.
@@AntiAtheismIsUnstoppable Until you realize you suck at assembly and the compiler just did a better job than you... lol.
@@3lH4ck3rC0mf0r7 It was also a halfway-ironic comment, but I think there is a logical limit to the performance of many languages, such that if you want them to be faster you need hacks, which is where you might want to think about switching languages instead.
I remember back in the old days on the C64, when we wanted to make really fast for loops, you replaced the 0 with a dot
for i=. to 255:poke 1024+i,160:next
is VERY fast in C64 BASIC, because thanks to some interpreter bug it's effectively run like machine code. But it's also not that readable, and I could just as easily have made a dedicated machine-code routine for it and called it using SYS.
@@terraformer316 And without destroying the sourcecode too.
Great video! Wanted to emphasize that "premature optimization" does not equate to "do no optimization", and I think your video hit the big one with focusing on data structures. We can be critical thinkers and have reasonable predictions to know when we should do those big improvements early to save a lot of pain later. It's always a gamble when doing things early so if you're not pretty sure, then setup your design to learn as fast as possible. Knowing when it's a "problem" is best discovered before it becomes a really big one.
I love CodeAesthetic's videos: they're visually pleasing and the content is well made. I can see him getting 1 million subs in no time.
This!
Let's be honest he only has 6 videos (including this one) and he's got 151K subs so I wouldn't be surprised
Same 👏
and i forgot to sub to him shit
kinda sad he doesn't credit the ones who came up with these concepts but I agree with the visual part
Great video for javascript coders who don't want to learn how computers actually work. For anyone working or interested in working on software that requires good performance: No, optimization is not an afterthought that can be "easily patched" at the end of the development cycle. Architecture of data and the way it connects together in a complex project plays a massive role in performance, and this is not something that optimizing compilers are allowed to modify. Most stuff in this video is false or only applies to non-realtime software. Keep looking for actually valuable information, it's out there.
I haven't heard it expressed this way, but I like how you laid this out.
There are two stages of optimization:
* Best practices
* Best implementations
We typically start with the best practices and hope that they're the way to go. These optimizations are things like array type, design patterns, etc. For most code, it's fine and the way to go for readability. Best implementation comes into play when your "likely-optimized" code is part of a loop that has to run at lightning speeds. We never know about the best implementation until we have a problem.
Which tools do you use for optimization?
Keep up the great content!
this is exactly it! much better to spend your time considering why you're looping through something so much in the first place than spend it optimizing the speed of the loop, up until you're completely out of other things to resolve
I'm so glad this channel isn't rushing to produce videos; the fact that each video is carefully researched is the best thing. Probably the best channel for understanding programming as a concept.
You are one of the most valuable channels I have subscribed to. Please keep up the good work, appreciate it.
Totally agree
Thanks for the kind words 😊 I hope CodeAesthetic can continue to help people improve their programming skills 💻
~ with ❤️ from repliesgpt
For a class in uni I was writing a basic synthesizer in C for a custom embedded system developed by some engineering students from prior years. This system didn't even have an FPU so everything needed to be in fixed point.
At the beginning I was separating everything into functions in their own C files, but when we started to introduce DSP into the mix, I started getting pops in the audio output.
I had just read Game Engine Architecture by Jason Gregory and learned about the instruction cache and stack resetting from function calls, so I moved every function to be either inline in a header or as a really long function style macro. And it worked fine, I was able to play multiple notes with DSP without any pops.
Similarly, at one point in uni a friend was writing an HRTF audio plugin and found that by making the entire audio processing block in one function with no calls to std vector subscript, it went from 1 voice to 99 voices playing concurrently.
In the world of real-time audio operating at 48 kHz, unfortunately code aesthetic is not allowed as function calls will actually slow things down enough to be noticeable to the end user. But for other real time applications the frame rate is almost never high enough for "the overhead" of a function call to be noticeable.
But the thing is: You can always separate things and create a macro to join them later, if need be.
This way your code is readable AND it's fast
If you write a function as static and call it only once, it is basically guaranteed to be inlined, so separating out code into a static function in the same file has no performance cost. (static free function, not static member function of a class)
I've always used the theory of critical sections. Some parts of your code will be within a "Critical Section" while the rest is in a "Non-Critical Section". Critical sections are areas where performance matters because it is code that will be executed either over vast amounts of data, or within a continuous loop (eg. a game update/render loop). Non-critical sections is code that is executed either infrequently, or due to an explicit event. Critical sections will typically only exist in about 5% of your code. Everything else is non-critical and requires little to no optimization outside of what you get for free from the compiler. The critical section code does require optimization, but it should also be carefully designed to try and maintain readability and maintainability. If either of those are broken, documentation can be provided so that future developers can study and understand why and how the code is doing what it is (such as referencing the mathematical formulas being used to calculate a complex solution). Again though, 95% or more of your time is spent working on non-critical code that requires readability and maintainability over performance and does not need to be optimized at all.
For example, I built a simple menu system for a micro controller's LCD screen. This code is not efficient at all, but I don't care because the UI only ever updates when a user pushes a button, so who cares if it takes a whole 250ms to refresh the screen. What I care about is readability because UI can be complex, even at a small scale, and I want to be able to fix bugs as easily as possible. What does matter for performance however, is the sensor monitor which needs to be highly reactive and efficient but only accounts for ~10% of my actual code base. The UI is ~40% and the rest is File/Network IO and business logic that all runs slowly because it doesn't have to be fast and is limited to the speed of IO.
The term Critical Section is already used in programming to refer to code that should only be run in a single thread at a time, but yes. The concept you are describing is generally referred to as a "hot path" or "performance path," as in, it's the codepath that will be run the most often so needs to be the most performant.
Yeah, if you have a section that is super slow due to a file io, for example, it doesn't matter if something else around that takes 10ns or 20ns to run. Sure, you can speed up that other piece by 2X, but it's not going to make a difference overall.
But if that function is run a million times in a performance critical section, yeah, every little bit counts.
Tbh even in the cases where I need optimization, I very rarely find myself needing absolutely perfect optimization.
More often I'm like "oops, that is a bit slow, lemme just cache some of that data and now it runs 10x faster"
Unless you're working on something that truly pushes hardware to its limits, basic optimizations are likely all you need 99% of the time.
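The "just cache some of that data" fix mentioned above is often nothing more than memoizing an expensive function. A minimal sketch in C++ (`expensive_lookup` is a made-up stand-in for whatever real work is being repeated):

```cpp
#include <string>
#include <unordered_map>

// Hypothetical expensive computation: imagine this parses a file
// or hits the network. Here it just does some stand-in arithmetic.
int expensive_lookup(const std::string& key) {
    int h = 0;
    for (char c : key) h = h * 31 + c;
    return h;
}

// Cached wrapper: pay the cost once per distinct key, then serve
// every later call from the map.
int cached_lookup(const std::string& key) {
    static std::unordered_map<std::string, int> cache;
    auto it = cache.find(key);
    if (it != cache.end()) return it->second;
    int value = expensive_lookup(key);
    cache.emplace(key, value);
    return value;
}
```

The whole "optimization" is a map lookup in front of the slow path, which is usually all those 10x wins amount to.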
There is the term “critical path”, from project-management theory.
But yes, a typical program’s use of CPU time does tend to follow a Pareto distribution, and concentrating effort on speeding up the “hot” bits will help more than elsewhere.
Fun fact about the postfix vs. prefix syntax.
There are legit some languages where ++i will ALWAYS be faster than i++. It is indeed because they make a copy, but the copy isn't used, and the compiler is too primitive to realise this and remove the temporary store.
Outside of such cases, the compiler (or JIT if you swing that way) will pretty much always be smart enough to eliminate any difference between the two.
The thing is though, you can just always use ++i and not have to rely on the compiler doing the right thing.
computers and compilers were not always so fast, been doing ++i for so many years that i++ looks goofy
My issue with this has always been, if you don't know the difference between what pre- and post-increment do, you should! And if you know, why wouldn't you use the correct one? I despair of my fellow coders who think they "know enough now, and don't need to learn anymore"... _never_ stop learning...
Yes. ++i will ALWAYS give you the good case. i++ is not guaranteed.
It's totally appropriate to call this out in code review.
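For built-in ints the compiler erases the difference, but with a user-defined type you can actually observe the copy that post-increment implies. A small illustrative counter type (just a sketch, with a copy-count field added for instrumentation):

```cpp
// A counter with the conventional pre/post increment pair.
struct Counter {
    int value = 0;
    int copies_made = 0;  // instrumentation: how many copy-constructions led here

    Counter() = default;
    Counter(const Counter& other)
        : value(other.value), copies_made(other.copies_made + 1) {}

    Counter& operator++() {    // pre-increment: mutate and return self, no copy
        ++value;
        return *this;
    }
    Counter operator++(int) {  // post-increment: copy, mutate, return the old copy
        Counter old = *this;
        ++value;
        return old;
    }
};
```

With a primitive, the optimizer throws the unused copy away; with a type like this it can't always, which is exactly why "use ++i by default" costs nothing and occasionally saves something.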
Great video! I like Casey Muratori's "philosophy of non pessimization" as a framework for macro optimization.
was going to comment that
That video hits me right in the heart. I was focusing almost entirely on performance and adaptability, which often killed me when adding new features. I'd end up refactoring for the new feature, then do the same thing again. I'm currently in the process of realizing this and stopping the over-optimization. I now often have small functions where I know I went with a simple solution, but if it ever became a performance issue (which it never will), I could optimize it in a couple of minutes. I can already feel my productivity rising, which also raises my happiness and motivation to build more and nicer features. Thank you for the video, it confirms my thoughts and motivates me to go further!
I agree with basically everything *except* the reasoning stated for why post/pre-increment doesn't really matter. My reasoning for having a preference in every situation is that there is a definitive difference. Not that that difference has a huge performance or functional impact, but that it is defined at all. When someone 10 years from now is reading through my code (fairly common for AAA game studios) and sees a post-increment, I want them to be sure they can say "it's there for a reason" and that they *don't* say "well, post-increment is slow so I'll just switch it to pre-increment" and inadvertently create bugs.
My issue with people defaulting to post-increment because "the compiler will just optimize it to pre-increment anyway" is because:
1. It won't *always* do that if they're not truly synonymous, which can lead to unexpected behavior; and
2. When we write code, it's not just for the compiler, but for future readers (including our future selves) to understand our thought process when we were writing it.
The way I think about it is that if increment operators weren't so short, would we make justifications for it not really mattering which was used? If instead of code being
`for (int i{ 0 }; i < 100; i++) { /* ... */ }`
it was
`for (int i{ 0 }; i < 100; math::increment_integer_and_return_previous_value(i)) { /* ... */ }`
I think it'd be obvious people's immediate question is "why do we care about the previous value here?" and that the answer of "we don't, but it doesn't matter because the compiler will optimize it" wouldn't be satisfactory.
Aside: there's also a greater issue of user-created `operator++` implementations which incur side-effects (which may differ between pre- and post-increment). There's a whole discussion to be had about whether that's even good practice (it's not, please don't make operators with side-effects, I beg you), but since we live in a world where millions of people think OOP is a good idea I can't make any assumptions about it.
That's a really good point. The difference between the two that matters isn't performance usually, it's intention.
I was never a fan of the ++ thing since from what I understand it's a holdover from assembly, where many CPUs had a specific instruction just to add 1 to a register (typically it was called INC, for increment). Nowadays, not every CPU has a dedicated INC instruction, so I really don't see how
"int x = ++i + 7" would compile differently than "int x = (i + 1) + 7"
Technically every operator has side effects because it advances the instruction pointer 🤪
@@williamdrum9899 "int x = ++i + 7" and "int x = (i + 1) + 7" have different answers. They would always compile differently. the later is 1 larger than the first, because ++i pushes i before adding 1 onto the stack while the (i + 1) pushes its result on the stack. Which is incredibly valuable in programing.
@@Diamonddrake I dont get it
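For what it's worth, the two expressions in question assign the same value to x; the observable difference is what happens to i, and it's i++ that yields the old value. A quick check:

```cpp
// ++i: increment first, then use the new value. x = 9, i ends at 2.
int demo_pre()  { int i = 1; int x = ++i + 7;     return x; }

// (i + 1): same value for x as ++i, but i is never modified. x = 9, i stays 1.
int demo_sum()  { int i = 1; int x = (i + 1) + 7; return x; }

// i++: use the OLD value, then increment. x = 8, i ends at 2.
int demo_post() { int i = 1; int x = i++ + 7;     return x; }
```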
I think an important aspect to making these decisions is being well-versed in computer science in general.
E.g., even if you haven't written a compiler, it's useful to know that i++ and ++i have the same SSA form after unused values are collapsed. So they will obviously compile to the same assembly after SSA optimization.
Idea for video: how hard part of coding up a large project from scratch is often not implementing features, but rather the software *design* ; how all your code is laid out. Getting a good design in place at the beginning is often instrumental in implementing features in your project later down the line without losing your sanity.
he's copying these from a book, he's not making these things up
@@matthias916what book?
@@mv2e19 Most of his videos seem to come from books like "Clean Code". Just to make sure I'm not misunderstood: I like his videos, I'm just a little disappointed that he doesn't credit the source and makes it look like he came up with these concepts himself
@@matthias916 These concepts are well known in the CS field though, it's not like those books were the first to discuss it.
@@Dobby-gy5uz Well, most of these concepts were formed by the authors of these books, people like Martin Fowler and Kent Beck
I think all of those topics you talked about (function call overhead, i++ vs ++i...) are things that a sincere developer will look at at some point in their career, and digging into them puts you ahead and sheds light on corners of computer science that help you understand things better. That's the stuff that matters a lot, as it is the ground new concepts are built upon and where new ideas sprout. Investigating the ++i vs i++ case has taught me a lot about how CPUs work. It is a very interesting case, as i++ can actually be faster on some processors (the CISC family) due to instruction-level parallelism, whereas on RISC processors you're probably better off using ++i. I never quite understood the fear of writing optimized software... If you make it a habit it will come naturally and not hinder the development process. It will require some thinking at first, but if you want to stop thinking you're probably better off choosing a different job, let's be honest.
Profiling your code is something that very few people actually do. People give conference talks about how to make things BLAZINGLY FAST, but they forget to mention that the suggested approach may not fit every use case.
code maintainability > performance (Most of the time 😅)
I profile my code and usually it just shows that I write shitty SQL statements.
@@patrickspiegel5524 no one knows how to write efficient SQL queries, even chatGPT
@@patrickspiegel5524 Well that's an easy problem to solve at least lol. Unless the problem is how much data you need back, then you're kinda f'd.
@@patrickspiegel5524 writing SQL queries is tricky, especially when you're handling large data sets. There is a point where using regular 1:1 indexes stops being beneficial and instead slows you down due to IO bottlenecks. There are other types of indexes that can help, and sometimes it's somehow more effective to do plain joins (using disk and/or RAM) when there's a lot of data to cross over. After working with Oracle databases for a few years, I've grown used to verifying the database's plans for my queries once or twice before settling on them; sometimes I end up having to use hints in order to instruct the database engine on how to poll for the matching rows at each table and in what order.
I profile my code but that's because it allows me to see if I only "mildly screwed up" or "screwed up really badly".
If a piece of code should run basically instantly but takes a few seconds, I know something may not be right.
But if it takes a few ms longer than I expected, I generally just slap a TODO there for when I don't have anything better to do.
I hate when people misuse this quote. The full quote reads: "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%." What he meant by optimization was things like cycle counting, CPU-specific hot-path tuning, and fitting memory structures into cache lines, and even that is rarely done. He did NOT mean that piling layers upon layers of language abstractions to accomplish something 5 minutes sooner is a no-brainer. Don't **pessimize** your code: write reasonably performant code from the start. You'll find it's much easier to read, and much easier to work in an environment that only does what it's supposed to, rather than a bunch of busywork where you have to bring out the call stacks just to find where the small amount of useful work gets done at all. That triangle is a false trichotomy: you can easily have performant code that has full velocity and is reasonably adaptable. Simply write code that does exactly what it has to to accomplish its task, and if you need a more general solution, write the slow path next to it. You can always add new features this way, too.
As a very early stage programmer, the caution against over-optimization is very timely for me. I'll start working on a function to accomplish the same task over and over, and make that function adaptable enough to be called from anywhere and fit a variety of circumstances....before even confirming if the task itself will work for my needs...
I guess I've had to redo code enough times that I want to do it right the first time. However this video helps me see that going back to optimize is much easier and less frustrating than having to scrap hours of work because it turns out none of it was necessary to begin with...
I've been programming for years (well, only really 2 years, but still), and I write dog-crap garbage code to begin with, for prototyping. I try to make it as understandable as possible, but I don't go out of my way to make it readable and intuitive, and I don't do much to make it extendable or flexible.
Once I am done prototyping, I refactor and rework the code to make it good enough, and once it's good enough I stop there.
Excellent insight. I think the keyword here is working iteratively.
Gosh I feel that one. But I've been programming for almost a decade and it's still my preferred style. Given, very little of that time has been doing it for corporate, so we'll see if that changes.
But the joy of working with a pleasing, adaptable system that resembles an engine within itself is just too satisfying to forego, so I maintain the view of sacrificing dev speed for architecture.
7:14 "Will my startup fail?" damn, Code. You hit the place that hurts the most.
Man you already helped me and especially my colleagues so much when they interact with my code
I regularly rewatch all your videos to keep writing better code and I recommend you to every coder
The biggest help you gave were the nesting video and the readability video (I don't remember if the latter one was a whole video but it really stuck with me)
Keep making such great content 🔥
I really like this triangle analogy. I'd expand it like this:
- at the beginning you have nothing, so the velocity is the only important aspect of development
- as you revisit features, you find things you often modify, plus repeated code, and you rework those parts to make them easier to change, so adaptability becomes on par with velocity
- as you finally approach the final design, you go back over the features you have and might want to tweak them to their desired values. If this is a game, then at this stage you probably have a feature-complete vertical slice and want to make adding new, non-coding elements like maps, models, etc. easier. So adaptability becomes the most important aspect of development, and new mechanics happen sporadically as need or inspiration comes
- finally, at the end, you have the program you wanted to make. It runs horribly, and hopefully you have a powerful machine to run it. Now you can find the inefficient parts with profiling tools and make them faster (probably at the cost of extensibility). Most of your code will probably run once, a few times during the runtime, or once every few seconds; optimizing anything other than the most-executed code is usually a waste of time. The only concern now is performance.
At the end you have an extensible program with few, rigid, ultra-efficient nodes that are never to be touched.
If you're solving any kind of problem that isn't already solved, the likelihood that you will run into an issue that forces a complete redesign of large parts of your code base is high. Therefore, the main argument against premature optimization is that most of the code it applies to will not make it to the final product, i.e. most of the effort you put in is wasted.
It's a different story if you're designing basically the same API for the Nth time and already know how it will go, more or less.
I don't think THIS is what my girlfriend urged me to research on YouTube
I remember a particular problem in my code where I was trying to remove recursion, as it is generally avoided at all costs in terms of performance. I solved the problem with loops and tracking data top down, only to realize the data structures were actually making it take twice as long.
The real problem with the data structures is that they were allocating memory regardless of whether the loop would actually go down far enough for it to matter. Heck, often times the recursion wouldn't happen, and the end case was done in one function call! It truly was a waste of time after a generally quick enough solution which did what it needed to.
When replacing recursion with a loop + list data structure, I usually don't do it for performance but rather to avoid a potential stack overflow if the recursion has the possibility to go more than ~100000 layers deep. Because, depending on the programming language and whether you run on a non-main thread, the call stack might only be able to allocate a few megabytes (unlike Heap which can take the entire available ram).
On the other hand, you could argue you should always remove recursion, not for performance reasons, but for the sake of the sanity of anyone who at some later point has to read and understand what the function is actually doing.
New videos please? These are excellent - I’m a full time engineer watching your videos on my day off. Nice work!
A funny and informative video. It's good to find someone who has a good balance of productivity and humour. Most other code channels lean more towards one or the other.
In many situations, algorithm choice is of enormous importance, particularly in real-time applications. Optimization requirements may need to be part of the initial design, so "it depends" on whether or not it can be considered "premature".
I will say that even if performance isn't worth optimizing in a lot of cases, there's still some basic things that can be worthwhile. For example, I'm an Excel/VBA guy, so you basically take any other language, and knock three or four orders of magnitude off the speed. For big jobs, it can easily take minutes to run a complex macro. And there's a couple basic things you can do to speed it up with almost zero effort - most commonly, turn off screen updating, and turn off auto-calculation (then turn them back on when you're done). That can easily be an 80% improvement in a lot of real-world situations, for a whopping four lines of code. (It occasionally causes headaches, especially if your code crashes mid-execution, but there's huge gains to be had).
So there's certain kinds of optimization that actually do justify premature use. But it's rare even in my neck of the woods, and...well, VBA is an adorable pile of 40 year old jank, so perhaps this isn't the most useful example for people who code in other languages.
Personally I don't like the increment example because postfix and prefix are virtually identical from a readability standpoint. It's a good idea to just pick one and stick with it for consistency, so why not pick the sometimes-faster one?
At any rate, premature optimization is a very important topic, especially when performance hurts readability. Or when there are actually no gains at all (I recently saw someone turn a bunch of JS string concatenations into array concatenations for "performance" - they didn't measure so we'll never know how the JIT optimized). Glad you covered it!
While it is not worth worrying too much about optimization working on the project, I still think it is a good thing to explore when available. Knowing how to make optimized code will generally make you a better coder in the long run.
The point is that people focus on it too much and think it's the only thing that makes you a good programmer.
If you're "coding" then sure, but a programmer needs to know how to write good architecture more than highly optimized code.
@@NihongoWakannai Software today is much slower than it should be. Hardware keeps getting faster but a lot software doesn't, because there are many programmers who don't care about performance. Or worse, some programmers don't even know how to write performant code.
@@YASxYT of course they don't care about performance, because it doesn't matter. What matters is finishing the project quickly and within deadlines, not spending 3 extra months getting extra performance that the end user doesn't even notice.
@@NihongoWakannai The user always notices. You need a new cell phone every 2 years because code keeps getting slower and it annoys you, not because your cellphone CPU slows down with age.
@@Diamonddrake No, the user does not always notice. No one notices the difference between a 1ms latency and 0.5ms latency. Users notice significant performance issues, not tiny ones.
I work a lot on math-heavy code, and there are a few steps I always try when optimizing it: vectorization, swapping matrix axes, and memoization. Vectorization is by far the most effective and makes the code more readable; the other two are very quick to implement and check. If the code is still slow, profiling is an amazing tool: find the problem function and work on that!
At a previous position, we painstakingly went through each line and found that a function was being called for each element of a list and each element of the previous version every time the list was changed. So appending 3 to [1,2] was run 8 times for [1,2,3], [1], [2], [3], [1,2], [1], [2].
Guess someone's pulling pranks
The ol' O(2^n)
I had a real problem for years early in my career wanting to "get it perfect" the first time. So I would craft the whole thing in my head and then write the code. As I eased into an agile, TDD (or BDD) mindset, it was a huge relief from stress and anxiety.
In Unity, an interesting optimization I discovered is that you shouldn't deeply nest rapidly moving objects all under one parent, because when a child object moves, it forces the parent to recalculate its bounds; imagine that recalculation happening frequently. The solution was to not parent moving objects. It looks messy in the scene hierarchy, but hey, at least it runs faster than parenting them for "organizational" reasons.
The problem with this video is that it completely misrepresents the positions of literally everyone:
For the more performance oriented side, you're misrepresenting them by saying they care about menial micro things like i++ vs ++i.
For the "root of all evil" side, you are misrepresenting them by suggesting that they do literally anything at all.
In reality:
The performance side of the argument argues simply to not pessimize your code. Don't write poorly performing code just because that "seems" to be the easiest. Don't blame your 5,000ms web request on "lol, that's just IO". We literally just argue in favour of not picking extremely slow tech/algos from the get-go, and of actually measuring from time to time. That's it.
The "root of all evil" side loudly advocates *to never ever even bother taking metrics of your performance, or even to think about performance characteristics at all*. This side argues that ALL PERFORMANCE MEASUREMENTS AND OPTIMIZATION OF ANY KIND ARE PREMATURE OPTIMIZATION. JUST THROW HARDWARE AT IT.
Just a quick chime-in on the ++i/i++ debate:
The reason I believe ++i should be "best practice" is that i++ can lead to subtle bugs, as it *post*-increments, meaning the result of "i++" is actually "i".
Which can easily be overlooked if it's used as part of a larger expression!
Therefore I recommend either:
- Use ++i everywhere.
- Never use i++ as part of a larger expression.
Or use Rust, which has neither (for this exact reason) and just do: i += 1;
:D
But I love doing "return i++" jk jk
i++; // make it a separate statement instead.
I was using i++ for years without realizing what it actually did. When I learned the truth, I was so shocked that I now exclusively use ++i.
I was once making a chat bot that used an API, and when you said it’s often 1 silly thing slowing everything down that really resonated, because I’m not sure why, but when I’d request something from the API, the response would be massive, and stripping away only the things I needed to store in memory improved speed and memory usage dramatically. But this was something I only cared for once I analyzed my memory usage later down the line
Adding a memory pool for when you constantly allocate and deallocate the same type of object but never need more than N at the same time was a nice performance increase for me.
You can even add a log entry for when you need to fall back to the default allocator in case you unexpectedly needed more than N objects.
in a big C++ project we focused on velocity and adaptability. We had to redo pretty much everything when optimizing for performance, because what we built initially was that bad.
NEVER leave performance for later. Try to make something that is 'good enough' performance-wise for starters. Otherwise you'll end up remaking everything in the long run.
I'd argue that you did that project correctly. I don't think code should be held holy; it can just be trashed for the new, better version. It's difficult to optimize beforehand, before you know where the bottlenecks are. You also easily slip into the optimization mindset, which is the endless rabbit hole of optimization, and done before you know what's important, it can be a detrimental over-engineering nightmare. Especially if you combine performance and adaptability, only to notice later that the code isn't doing exactly what was needed and even the adaptability was aimed wrong. Then you've done a lot of work and spent a lot of time, and you still have to trash and redo. I think it's important to start with velocity, trash it, and iteratively add adaptability and performance as you gain more information about the directions those should take, since both take resources.
@@techpiller2558 not really. On the contrary, experience has proven that it was way too difficult to optimize. Once a foundation has been laid and 10-20 projects are based on that foundation, changes are exponentially hard to make.
So yes, in principle you should optimize last, I agree, but in most cases a certain effort needs to be made to make something that can easily be optimized later to achieve some level of performance.
This is not a given. You have to work for it.
And I refuse to accept the mindset that "if it works, even if it is ugly, difficult to work with and slow as heck, it's ok".
While you shouldn’t be completely unaware of performance issues, you need to fully determine what your program is and then actually measure what your performance limitations are, and prioritize them
Also I should note, this is why it is so important to separate interface from backend. It’s certainly worth thinking a lot about what your interface should be, because it’s changing that which causes everything to break
@@grproteus I'd argue then that somewhere down in that line, too little trashing and redoing were done where it was supposed to.
Man you're becoming the best programming language channel on YT
Love the triangle, put into words what I have been thinking. My rule is just once a component/idea is used for the third time, abstract/generalise it instead of hacking the differences. I'm not into generalising everything straight away, as long as you have experience with how things should be structured, it's typically not an issue. Performance should always be on the back of one's mind, otherwise you'll eventually ruin velocity, although full optimisation should be done when weak points are discovered later during profiling (what you call micro vs macro optimisation).
I was working on implementing visual transitions in Java on an individual-pixel basis. I watched a bunch of videos and did research on branchless programming, and read that it was recommended for code involving graphics to be branchless.
I spent several hours converting all of my functions (while shoehorning in multiplying by the result of a comparison with bit operators), and only at the end did I decide to test the difference between the new and old code. The old code was several times faster. I thought about it for a while before realizing the problem with branchless functions: they compute all of the possible values and then throw away the ones that aren't needed. Some of my functions were making rather expensive calculations, but only for a limited range of inputs; all other cases either copied a value or returned zero. Making them branchless gave the simple cases a performance penalty that far exceeded any branch mispredictions that would have occurred.
What I learned was when trying to optimize a lot of similar bits of code, do one piece, compare it to the old, and then implement it for the rest if it is faster.
Graphics programming (at least with GPU acceleration) is really quite complicated, particularly in the branching case. Generally branch mispredictions and overcalculation only tend to happen if every shader unit in a block ends up taking different paths, but if they are branching on a variable in a common way that tends to work out fine. Also, GPU programming gets complicated by drivers, especially for nVidia cards, since those will often have optimizations built in for ways that a big game has done something, which means that what might look worse on paper may be faster because the driver has been written to recognize what you are doing and optimize it specifically.
There's a fairly simple way around the issue you described: do the simple quick operations in one loop and when you find data points that need the complex, slow processing put them on a stack (I usually implement the stack as an array with write always, conditional increment) then have a second loop that goes back over the stack doing the complex work on everything it finds. Both loops can be branchless without throwing away work (except the single memory write to the same memory location for every non-complex data point).
But of course there are plenty of cases when branchless code is not the answer - branches are only bad when they are not predictable, and any repeating sequence (up to about length 16 I think) can be predicted eg yynynnyn.yynynnyn.yynynnyn..... only when the sequence is truly unstructured does the prediction fail, like in sorts and searches. (at least in scalar CPU code ... AVX and GPU code is a different matter)
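To make the trade-off in this thread concrete, here's a plain-branch clamp next to a branchless one built from masks. Whether the branchless form wins depends entirely on how predictable the branch is (and, as noted above, on how expensive the discarded work is), so measure both:

```cpp
// Plain branch: cheap when the predictor guesses right.
int clamp_branchy(int x, int lo, int hi) {
    if (x < lo) return lo;
    if (x > hi) return hi;
    return x;
}

// Branchless select: cond is 0 or 1, so -cond is all-zeros or
// all-ones and acts as a bitmask choosing between a and b.
// (Relies on two's-complement ints, as on every mainstream CPU.)
int select(int cond, int a, int b) {
    return (a & -cond) | (b & ~-cond);
}

// Branchless clamp: always does the same work, predictable latency,
// but never skips anything either.
int clamp_branchless(int x, int lo, int hi) {
    x = select(x < lo, lo, x);
    return select(x > hi, hi, x);
}
```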
I found your channel a few days ago, and thank you very much for all the content. I have been a software dev since the early 90s, and for most of my career optimization was NEVER an issue... until I started making VR applications. VR is the exception to the rule about optimization. Basically EVERYTHING about VR needs to be optimized to hit the NECESSARY frame rates
I think the problem is that discussions about optimization, or premature optimization, are often so simplified (like the difference between i++ and ++i; we agree about focusing on readability here).
Quite often two nested loops can be replaced by a sort followed by one loop. That brings the cost down from O(n²) to O(n log n). O(n²) is often too expensive, and O(n log n) is most often acceptable. I think this is good practice. In most languages sort is easy to use, and the code is often easier to reason about when the data is sorted.
I think, when you write code for running on a server or writing code for a reusable library it makes sense to make informed optimizations (like the one above) immediately. When writing code that will run in a web browser the CPU cycles are cheaper and if the application is fast enough then it is.
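The sort-then-single-pass rewrite above can look like this for counting values common to two lists (assuming distinct values; duplicates need a little more care):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// O(n*m): for each element of a, scan all of b.
int common_quadratic(std::vector<int> a, std::vector<int> b) {
    int count = 0;
    for (int x : a)
        for (int y : b)
            if (x == y) { ++count; break; }
    return count;
}

// O(n log n): sort both, then advance two cursors in a single pass.
int common_sorted(std::vector<int> a, std::vector<int> b) {
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    int count = 0;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i] < b[j]) ++i;
        else if (b[j] < a[i]) ++j;
        else { ++count; ++i; ++j; }  // match: advance both
    }
    return count;
}
```

The sorted version is also arguably easier to reason about, which is the other half of the argument above.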
Considering the hellscape that is modern day web performance, I'm not sure I understand how CPU cycles are cheaper on the client.
@@jasonpepas4290 If I build the system, and the web application, and host the servers, I pay for the CPU-cycles that run on my servers.
But I do not pay for the CPU cycles that are wasted in your browser, when you use my application.
I wonder what is the difference in optimization regarding i++ and ++i?
@@serhatarslan4138, it used to be said that one was faster than the other. It was purely about the generated ASM-instructions, and I doubt it matters at all on modern CPUs.
I agree with you. Given the difference between ++i and i++, modern compilers generate the same code using optimization techniques. These are cookies for compilers :D @gunnarthorburn1219
This is a GOATED video. In the end it's about using your own judgement instead of applying rules set in stone. Who would have figured! Your channel is amazing. Keep it going.
ORMs are always a good place to look when you start having performance issues. DB queries often come out on top of the performance-issues list, whether you are making too many of them or they are simply unoptimized. When used without care, ORMs often produce suboptimal queries and object-loading problems.
This vid is so true!!! Thinking about performance before you even have a problem is nasty!!
The passive aggressiveness is epic! I love it.
That whole "performance" bit is such a rampant issue in development. Just look at any Stackoverflow question and you're guaranteed to see some speed freak in the comments making suggestions about "don't do that, do this, it's faster" when they have absolutely no idea what the use case(s) is/are. Not only that, the OP didn't even ask "how do I make this faster?" They just want to make it work. Definitely some popcorn worthy reading if you ever need to brighten your day. But then again, I'm not a developer so what do I know? 🤷♂
That was a really good and chill explanation of something I struggle with all the time.
I've had projects fail due to me (the solo developer in the company) trying to focus too much on adaptability and performance where it wasn't needed. Feels bad...
10/10 would zen watch again, have a sub :)
great video
I was surprised to discover how young this channel is. The content and quality of your videos are top notch. I appreciate the humor too.
I'm an amateur, so I've never had to deal with performance. But I'm often paralyzed by trying to simplify my code. I'm not sure how to describe it in few words. Basically, getting from point a to point b, I'll end up with 20 nested if -else statements and be only half way. Then I spend hours reading through the documentation looking for a built in function, implementing and debugging my ham fisted attempts. I think of them as learning opportunities, but because I'm an amateur these lessons are few, far between and often forgotten. It's super discouraging.
Another hit. This channel is so good. Yes, I think that people come out of formal education with this over-emphasis on performance. They've had weeks to do their homework, so forget velocity. They're moving onto the next chapter next week, so forget extensibility. Big O notation is bouncing around in their head like the windows screensaver and their CS idols got to the moon with less memory than a wrist calculator. They think "anyone can get it done, but only the best can do it right and that's what gets them noticed". Then they graduate and land in their first production cycle and realize "oh, it is not the case that anyone can get it done".
Honestly I'm really glad I found this channel. These videos are all so high quality and enjoyable to watch.
My advice for performance gains:
1. Database design (missing indexes for large tables, too complex queries)
2. Database queries (making even small sets of data large and not utilizing indexes )
3. Everything file related (caching with files instead of RAM, too many files in one folder, too many file operations)
4. External apis (too many calls)
For those writing console apps, anything related to IO is slow, including writing to console. So make sure to take out those debug lines before running!
5. Don't use XML for anything.
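Point 1 in the list above (missing indexes on large tables) is easy to demonstrate with SQLite's query planner. The `orders` table, its columns, and the index name below are all made up for illustration:

```python
import sqlite3

# Hypothetical toy table to show what a missing index looks like to the planner.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

def query_plan(sql):
    # EXPLAIN QUERY PLAN reveals whether SQLite scans the whole table
    # or can use an index for the lookup
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(str(row) for row in rows)

lookup = "SELECT total FROM orders WHERE customer_id = 42"
before = query_plan(lookup)  # typically reports a full scan of `orders`

conn.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")
after = query_plan(lookup)   # typically reports a search using the index
```

The same `EXPLAIN`-style check exists in most databases, and it's the cheapest way to confirm whether a slow query is actually missing an index before rewriting anything.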
Much appreciated video. What I like the most:
1. The distinction between Macro and Micro optimizations. Don't convolute the discussion. Macro aligns with good architecture, Micro opposes it.
2. Go for data structures (or database structures) first. Data locality is where the battle is won. And it forces you to think about your architecture.
I do real-time signal processing algorithms for embedded systems, and sometimes you need to consider performance in the early stages of development. For example, I may be given the choice between doing something in the time domain or the frequency domain, and depending on if I'm doing more convolutions or array multiplications, one of those choices could be orders of magnitude faster. And because the choice of domain cascades through the whole system, changing later might mean scrapping the whole project.
That said, this applies only to the "core" algorithmic portion of my code. Even in signal processing, there is a lot of peripheral code that isn't nearly so performance constrained, and prioritizing flexibility/maintainability is the way to go.
I mean "premature" is the operative word, and it sounds like while it was early in the project, it wasn't premature to optimize then.
@@dominiccasts Totally agree.
Once upon a time, we developed an auditing framework in .NET Core 2.1 that used sockets to send data between processes/machines. We placed a lot of synchronization objects (semaphores, mutexes, etc.) to ensure data integrity. The code was very smelly... it felt wrong to me. Anyway, the tests passed and the output looked good, so we did not clean up the code. Later, when .NET Core 2.1 was upgraded to .NET 5, the tests started to fail and deadlocks happened sporadically. It turned out that the code failed because .NET 5 is much faster than .NET Core 2.1. So we had to clean up the code, remove a lot of unnecessary synchronization, and use a modern asynchronous programming model. Sometimes clean code and performance go hand in hand.
P.S. I loved the little stories (Mark Zuckerberg, GTA5, etc.) during the videos. Great job!
Totally agree. Had performance issues in my machine learning project, done in Python. I thought that list comprehensions were the problem, but I decided to profile first. Turns out it was some useless copying from RAM to GPU memory :D
As my boss says: you never truly know what your code does until you've looked at it with a debugger.
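Profiling first, as in the story above, can be sketched with Python's built-in cProfile. `wasteful_transform` here is a made-up stand-in for the useless copying the commenter found:

```python
import cProfile
import io
import pstats

def wasteful_transform(data):
    # Simulates needless copying on every step (like the RAM -> GPU story):
    # the copies, not the arithmetic, dominate the runtime.
    result = []
    for value in data:
        snapshot = list(data)  # pointless full copy on each iteration
        result.append(value * 2)
    return result

profiler = cProfile.Profile()
profiler.enable()
wasteful_transform(list(range(2000)))
profiler.disable()

# Dump the stats into a string so we can inspect where time actually went.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats()
report = stream.getvalue()
```

The report names the hot functions directly, which beats guessing that "list comprehensions must be slow".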
Hey man wtf. Why did you stop making videos? You're too good at this to not make videos! You actually mastered this YouTube niche and you outclass everything else I've seen on YouTube.
Listen
WE WANT MORE
Side note: PHP is a MUCH better language today than it was when Facebook was first created. It still has problems, but it's constantly improving.
I look forward to your videos quite a bit. You have a reasonable, rational approach to the thesis of each video, so I hope you will continue for a long time to come. Cheers!
The best reason for Zuckerberg to use PHP was that he already knew how to make a website with PHP.
There are plenty of times when much faster code doesn't take more time to write, and is sometimes even faster to write. It depends on the knowledge of the programmer.
6:44 it seems to me you’ve shown that the answer is, “No, pre- and post-increment have the same performance, at least in optimized code, and provided the result of the operation is ignored.” Thank you for investigating this question and making the answer available to everyone. You’ve saved us, collectively, a great deal of work.
Your stuff is fantastic. These are all the things I have been advocating for 15+ years. You present it in a much more accessible way than I ever did (corporal punishment and yelling :) )
I am now having my team watch all of your videos - Thank you and keep up the great work!
I also liked how you touched on YAGNI - Young devs can get caught up in that as they start writing better code
I just discovered your channel recently and I love it!
I feel like there is a gap in content for intermediate programmers, and this channel fills that gap beautifully with really useful topics covered in a streamlined and neatly explained way. Thank you for providing us with that :)
I recently learned that thinking about memory layout is much more important than I thought!
I was working on some numerical calculations and tried to do some optimization.
The first thing I did was flatten some nested vectors into plain vectors to improve parallelization. This sadly did not result in as much of a speedup as I hoped; in some parts it even slowed the code down, despite making better use of all available threads.
Then I did some more digging into how the memory is laid out and how the vectors are iterated over, and noticed that it was very impractical for caching. Thus, I just shifted the memory layout around a little bit without changing anything about the algorithms used. This alone brought me a speedup of over 50%. I never thought that it would make THAT much of a difference, but I am glad I did it :)
Wait how? You told the program to align a piece of data somewhere? How do you do that?
@@theodorealenas3171 (can't say for the original commenter, but) when working with contiguous data, the most cache-friendly way to iterate over elements is sequentially. As you access an element, the next few will be loaded into cache (cache controllers detect memory access patterns + data is cached in chunks), and access time for them will be near instant. This can be illustrated with a 2D array (say, a matrix of floats) stored in memory line by line. If you process it line by line, it'll be fast, because the next element in a line will also be the next value in memory (and thus will benefit from caching), but if you iterate over columns, almost every memory access will be a cache miss and will take a very long time in comparison, despite the algorithm being essentially the same.
@@john.darksoul I think what they did is they aligned the beginning of the data with a memory block. I think that's what they're referring to. So one load operation brings a good number of the data you want, and I wonder, how do you typically express that in non assembly code
@@theodorealenas3171 It could have to do with a condition being optimized with branch prediction. The cpu will typically execute the first instruction(s) of the loop while it checks the condition. In cases such as (i < vector.size()), the condition is always true until the end, meaning every loop but the last one is sped up a bit. But if it's sometimes true, sometimes false, it will be much slower than it could be, such as in a case where you have different control paths inside a loop.
@@theodorealenas3171 yeah, well, it would make sense if the amount of data is really small, i.e. fits in a single cache line, but for large amounts of data it wouldn't make much difference (if at all) I believe
In C11, there's aligned_alloc, which allows you to specify alignment in bytes, and some platforms have their own functions that do a similar thing (_mm_malloc, posix_memalign, etc.)
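For anyone wondering what the "rows vs. columns" traversal in this thread looks like in code, here's a sketch. (Caveat: in CPython, interpreter overhead hides the cache effect being described; the point is only to show the two access patterns over row-major data, where the row-wise version matches the memory order.)

```python
# A 2D grid stored as a list of rows: row-major, like a C 2D array.
N = 200
grid = [[row * N + col for col in range(N)] for row in range(N)]

def sum_row_major(g):
    # Visits elements in the order they sit in memory.
    # In C, this is the cache-friendly traversal.
    total = 0
    for row in g:
        for value in row:
            total += value
    return total

def sum_column_major(g):
    # Jumps a whole row ahead on every access.
    # In C, this pattern causes a cache miss on almost every read.
    total = 0
    for col in range(len(g[0])):
        for row in range(len(g)):
            total += g[row][col]
    return total
```

Both functions compute the same result with the same number of additions; the only difference is the order of memory accesses, which is exactly why the speedup in the story above was "free".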
I just recently discovered this channel and I have to say it's amazing. Thank you for the useful and beautiful content !
As a French engineering student I had classes about algorithmic complexity (which is a bit painful but really useful), and I think that it belongs to the 80% moves along with data structures. A classic case study is sorting arrays/lists of numbers, and the sorting algorithm can make a huge difference when working with large ones.
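The sorting case study above can be made concrete by counting comparisons instead of measuring time: a quadratic sort does roughly 100x the work on 10x the data. This is just an illustrative sketch with an instrumented insertion sort:

```python
import random

def insertion_sort(items):
    # Classic O(n^2) sort; the `comparisons` counter makes the cost visible.
    items = list(items)
    comparisons = 0
    for i in range(1, len(items)):
        j = i
        while j > 0:
            comparisons += 1
            if items[j - 1] > items[j]:
                items[j - 1], items[j] = items[j], items[j - 1]
                j -= 1
            else:
                break
    return items, comparisons

random.seed(0)  # deterministic data for a fair comparison
small = [random.random() for _ in range(100)]
large = [random.random() for _ in range(1000)]

_, c_small = insertion_sort(small)
_, c_large = insertion_sort(large)
# 10x more elements costs roughly 100x more comparisons,
# while an O(n log n) sort like the built-in sorted() grows far more slowly.
```

This is why the choice of algorithm belongs in the "80% moves": no micro-optimization of the inner loop can rescue the wrong complexity class on large inputs.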
to me pre increment vs post increment isn't about performance but about code clarity and intention
if i just want to increment i'll use pre increment, if i want to iterate on something while actually using the post increment's functionality, i'll use post increment
Same.
I consider what is the most sensible for end condition or return or output and pick the increment type based on that.
I work as a runtime performance engineer in the games industry, and I can say without a doubt that everything in this video hits home so true.
The virtuous cycle of my job is:
- Have a KPI of what we are trying to hit; the performance GOAL
- Measure;
- Make sure your measurement is correct....
- Form a hypothesis of what's 'slow'; does it actually miss the performance GOAL?
- If it does miss the goal, take action
- execute loop until you hit the goal (but always remember you can ask 'is the goal valid')
Rinse and repeat, and always bias for not spending infinite cycles in this loop.
I'm developing a game, and that concept of adaptability has been very valuable to me while crafting a complex system; maybe the velocity is gained over time while you focus fully on adaptability.
This one was really well made. Not too fast not overloaded, slow talk... easy to follow. Thank you!
Hey, really enjoy the videos that you put out. Really liked the dog walking part because I have that problem constantly... Trying to make a system highly adaptable and adding complexity where it doesn't need it, making the development take 5 times longer. I'm now at the end of a 2-month project that should have taken 3 weeks... I don't know how to balance speed and adaptability.
For the previous videos I was like "Yeah, that's also what I try to do and tell people all the time...", but this one caught me.
Thanks for that perspective.
My wife is constantly going on about how frustrating my premature optimization is... Or, at least, I think that's what she's been saying.
lol
premature edge optimization
I’m personally in the ++i camp. Not because it’s faster, but because it conveys a meaning to the reader. If I read i++, I always subconsciously have to consider the side effect of saving the value, doing some operations with the copy, possibly returning it from a function, then incrementing the value. With ++i, I don’t have to consider that possibility. Just increment the number, and use that, maybe. Having to think less about something when reading someone else’s code is worth it in my opinion.
how is your logo a C
Dode Aesthetics
Buh
My general strategy: Start with the simplest and most intuitive approach, then if it's too slow or memory intensive, figure out something better. But also, there's a difference between simple and stupid. Don't invoke premature optimization as an excuse to write stupid algorithms that are blatantly unperformant.
Your array vs hashset example reminds me of something I ran into during my limited time coding:
By default I used an array and += to add elements to it. This worked fine in testing but was slow in production. Switching from the array to a generic list and using Add() made it way faster. Further testing showed that the array was faster for low numbers of elements (which was my test data), but when it met real data (up to 600k elements) it was inadequate.
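The `+=` vs `Add()` difference described above can be sketched in Python, where `items = items + [i]` copies the whole array each time (like array `+=`), while `list.append` grows in amortized constant time (like a generic list):

```python
def build_with_copy(n):
    # Mirrors array `+=`: every addition allocates a brand-new array
    # and copies all existing elements into it -> O(n^2) overall.
    items = []
    for i in range(n):
        items = items + [i]
    return items

def build_with_append(n):
    # Mirrors List<T>.Add(): the backing storage grows geometrically,
    # so each addition is amortized O(1) -> O(n) overall.
    items = []
    for i in range(n):
        items.append(i)
    return items
```

Both produce identical results; only the growth strategy differs, which is exactly why the gap stays invisible on small test data and explodes at 600k elements.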
I love your videos man. The pace of the video as well as the animations are really good and easy to follow
I have a pathological fear of the ++i operator, and here's why (story time):
So early in college, I did an assignment with someone who insisted on writing the most hyper-optimized code, no matter what. When our professor told us that readability was one of the grading criteria, this guy spent the entire practical session arguing with him, while I was desperately trying to just get him to work on the assignment with me. The professor tried to explain pretty much the same things as this video covers, but my classmate just wouldn't hear it.
So after that lab session we agreed to both do a part of the assignment. He'd do the first half of it, and then I'd finish it and turn it in. I'm honestly not sure why I didn't just ditch him then and there, because of course he ended up sending me this arcane bit of code that didn't even work on the evening when the assignment was due, asking me to pretty please debug his code and also finish the assignment on time, with only a couple of hours left. I guess I was young and naïve, lol.
Anyway, I ended up redoing everything from scratch and pulling an all-nighter. Luckily my teacher wasn't super strict with deadlines, cause if he was I would've just failed the course right there because of this goon. I didn't bother putting my classmate's name on the assignment after that. He apparently went to complain to the professor and actually managed to get more time to finish the assignment by himself. And then he quit the course a few days later because it was "too stressful".
So now, whenever I see someone write "++i", I get Vietnam war flashbacks to speedrunning a college assignment at 4am.
This is the first video I've seen from your channel. I rarely subscribe to channels, and even more rarely after watching only one video.
You've just got a new subscriber. Great content dude
We could also add that a developer should never waste time trying to outsmart the compiler. I don't write critical software (embedded or real-time), so I'll always favour elegant, easy-to-maintain code over "optimized" code written by a human.
Your videos are great! A colleague shared one today at work, and it's funny that both of us found your channel :)
Really good content, well edited and with a lot of tips to develop better.
I hope that your channel will grow, you earned it !
Always optimize as you write code.
Nice video!
Though a more complete quote of 0:06 is rather:
" Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3% "
Other than that I find your videos better and better, both in content and presentation. Have a great day!
Your channel is pure quality dude. Btw, once upon a time I was bored, so I wrote a hash map in C++. It took me like 15 minutes to make a primitive hash map. Then I benchmarked it against std::unordered_map. My code was ~100 slower than the built-in data structure ;D
100% slower or 100 times slower?
@@compositeembryo7186 Roughly 100 times slower (mostly because of a really bad hash function).
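For the curious, a "15-minute" hash map along the lines the commenter describes might look like this sketch (in Python rather than C++; a real `std::unordered_map` or `dict` wins through better hash distribution, resizing, and memory layout):

```python
class TinyHashMap:
    """A deliberately primitive separate-chaining hash map, for illustration only."""

    def __init__(self, bucket_count=64):
        # Fixed bucket count: a real implementation would resize as it fills up.
        self._buckets = [[] for _ in range(bucket_count)]

    def _bucket_for(self, key):
        # A weak or fixed hash here is exactly the "really bad hash function"
        # that makes a naive map ~100x slower: everything piles into few buckets.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket_for(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        for k, v in self._bucket_for(key):
            if k == key:
                return v
        return default
```

It "works", but every lookup degrades to a linear scan of its bucket as chains grow, which is the gap between a quick homemade structure and a tuned standard-library one.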
I love optimization. Like, really love it. And not just for performance; sometimes for aesthetics, sometimes for brevity... messing around and pushing "boundaries" is great fun. But if I don't know that the code will even be around in a week, it's hard to justify spending time on it other than for pure enjoyment. Some of the code I've written has been so simple that a caveman could poke a stick at it and it would be possible for him to adjust it; other code has been so convoluted that you'd need a Rosetta stone and a safari guide to decipher it and avoid the dangerous creatures lurking in the shadows.
That iteration counter, by the way, was pretty neat. Buuut... as a standalone number it didn't really do anything useful for me. If possible, some real world examples would be neat, to put that number into context. E.g. "for all the people in the world using a calculator app, if they switched to my app it would take 150 000 years of combined use to justify the three hours I spent on this investigation" or something. Not saying that calculators would be using i++ a whole lot, but I'm just pulling things out of my rear end here. 😄