Optimising Code - Computerphile

  • Published 21 Dec 2024

Comments • 383

  • @bartekkowalski
    @bartekkowalski 1 year ago +206

    To save others 10 minutes of their life:
    At 3:58 the text encoded in binary is "OK so this isn't strictly speaking the same text as the stuff on the left. Oh well :) :)"
    At 5:17 the text encoded in binary is "Right, are you really converting all this binary back to asci? Well done you! - Sean"

    • @Petertronic
      @Petertronic 1 year ago +4

      Well done you!

    • @jongyon7192p
      @jongyon7192p 1 year ago

      Well done you!

    • @mellowyellow7523
      @mellowyellow7523 1 year ago

      I can't help but notice the binary in the background

    • @bartekkowalski
      @bartekkowalski 1 year ago

      @@mellowyellow7523 I also decoded the brighter part of it, but my computer got unplugged by accident before I could save the decoded content.

    •  11 months ago

      I hadn't even noticed the second text; I got to "RIGH" on the right and somehow thought I had made a mistake, because it seemed unpronounceable, so I typed it into an online binary translator.

  • @bentpen2805
    @bentpen2805 1 year ago +474

    It’s always cool to compile two versions of the same algorithm to assembly, and see they’re identical

    • @KilgoreTroutAsf
      @KilgoreTroutAsf 1 year ago +14

      Hardly the case. Unless we are talking about shuffling the code around, inlining constants, or factoring things out of a loop.

    • @jordixboy
      @jordixboy 1 year ago +69

      Not "hardly the case": the compiler applies all of those optimizations by default. Unless you make significant changes, it makes sense

    • @christopherg2347
      @christopherg2347 1 year ago +56

      @@KilgoreTroutAsf Yes, a modern compiler can and will do *ALL* of the above.
      That is its job.

    • @TheArrowedKnee
      @TheArrowedKnee 1 year ago +31

      @@christopherg2347 Modern compilers truly are mind-numbingly impressive

    • @christopherg2347
      @christopherg2347 1 year ago +14

      @@TheArrowedKnee They are written by people a few levels smarter than me, with more understanding of the hardware than I could ever learn.
      They are no different from a library as far as tools go.

  • @Adeith
    @Adeith 1 year ago +190

    As someone who has to teach juniors to optimize games, these are the types of optimization that are the least important, and it is very rare that you actually bother with them.
    The part he scoffed at also kinda glossed over a very important point: the reason you don't just write it "right from the beginning" is that you don't only trade off between speed, memory and battery, you also trade off against maintainability, which these kinds of optimization ruin, and which is the thing you should be optimizing for while writing it the first time around.
    Also, by far the most important optimizations are choosing the correct algorithm, data structure and architecture. Those optimizations are almost never premature and can give speed-ups of many orders of magnitude, compared to the micro-optimizations shown, which might give 2x at most.

    • @Rodrigo-me6nq
      @Rodrigo-me6nq 1 year ago +16

      Exactly, he brushes off premature optimization and then goes straight to prematurely optimizing memcpy, one of the most optimized routines in the C runtime. Real-world implementations of memcpy do everything he mentioned and much more.

    • @JackMott
      @JackMott 1 year ago +25

      Yeah it makes me twitch when people do the premature optimization quote. Getting the basic memory layout of your data right from the start is important, as it will be hard to adjust that later.

    • @dmail00
      @dmail00 1 year ago +5

      Was looking for a comment like this. To write fast code you need to consider your data from the start, not afterwards.

    • @pierreollivier1
      @pierreollivier1 1 year ago +8

      Yeah, he didn't even mention SIMD, or how to break dependency chains to give the CPU more room for speculative execution, or how to optimise cache line usage with a data-oriented design. That is where the bottleneck usually occurs: if you look at what the CPU is doing most of the time in poorly optimised software, it is waiting to get data from the cache.

    • @JackMott
      @JackMott 1 year ago +11

      But this is a beginner video. You'd kinda need a few hours to delve into that stuff.

  • @mausmalone
    @mausmalone 1 year ago +188

    One fun thing in optimization is the work that Kaze Emanuar is doing on the N64. To make a long story short: he's optimizing for speed, BUT one of the biggest bottlenecks on the N64 is RAM access. The CPU doesn't have direct access to the RAM, so when it requests a read from RAM it first goes to a local cache, and if that page of memory isn't cached, it has to ask the (essentially) GPU to copy the entire page of RAM into the CPU's local cache. This goes for both data and instructions, which go to separate caches. What he's found is that one of the best ways to get good performance out of the N64 is to get your code to fit into (and be aligned to) a single page of memory. If you can do that, the CPU will hum away happily at 93MHz (which was a lot for the time!). But if you keep calling functions that are located in different pages, you'll frequently have to stall the CPU to wait for those pages to be moved into cache.

    • @joshbracken5450
      @joshbracken5450 1 year ago +9

      Yeah, I love his videos. Discovered him last week and binged them all. I don't know why, but it's so satisfying to see how far you can go with efficiency and how much performance you can claw back from code work alone. Absolutely amazing.

    • @lucbloom
      @lucbloom 1 year ago +1

      Kaze is awesome! Love when he deep dives into optimization strategies.

    • @The_Pariah
      @The_Pariah 1 year ago

      Pretty sure you and I watched the exact same YouTube video on how N64 graphics work.

    • @Malik_Attiq
      @Malik_Attiq 1 year ago +1

      Yeah, the same approach is now used by data-oriented design.

    • @Zadster
      @Zadster 1 year ago +4

      Something very similar was done in the days of the BBC Micro and the iconic game, Elite (published 1984). The 6502 has something called page zero, the first 256 bytes of RAM, which can be accessed very quickly with short instructions. There are also similar benefits to your code if it can fit inside 1 memory page itself. Elite used some incredibly cunning memory access algorithms involving these 2 optimisations which have only relatively recently been properly documented. It is, of course, far from the only 6502 software to use these speed-ups, but it is notable for getting a 3D space flight sim and universe sim into 22kB.

  • @Zullfix
    @Zullfix 1 year ago +72

    1:36 The quote "premature optimization is the root of all evil" is severely misused nowadays as an excuse to write very slow or just plain bad code on the first (and often only) pass.
    The quote was originally meant to discourage developers from inlining assembly inside their C applications before they even had an MVP. But nowadays it's an excuse for developers to write poor, inefficient code that becomes the architecture of the application, so that meaningful changes require a large refactoring.
    If you can write good, fast code from the start without going overboard into inlined assembly, what excuse do you have not to?

    • @nan0s500
      @nan0s500 1 year ago +11

      Premature pessimization is the root of all evil

    • @typechecking
      @typechecking 8 months ago +5

      Quotes are the root of all evil.

    • @Rowlesisgay
      @Rowlesisgay 8 months ago +3

      Optimizing the proof of concept you've finished and documented well isn't premature in the slightest, and lacking any optimization makes people associate you with Adobe. I think Adobe either missed this part or wanted to be associated with itself.

    • @akashgarg9776
      @akashgarg9776 7 months ago

      Eh, it's still true though. I remember being lectured by my advisor at my job when I had spent hours on an O(n^2) algorithm that I wanted to make O(n), and my advisor said that quote and was like: look, it needs to work first, then we worry about that difference

    • @mfjones9508
      @mfjones9508 4 months ago

      Premature ejaculation is the root of all evil

  • @brunoramey50
    @brunoramey50 1 year ago +77

    - Optimise for CPU speed
    - Optimise for memory usage
    - Optimise for power consumption
    - Optimise for maintainability
    - Optimise for developer time
    Make your educated choice!

    • @christopherg2347
      @christopherg2347 1 year ago +14

      Personally: Wait until you are in testing so you actually see what needs optimisation.

    • @ME0WMERE
      @ME0WMERE 1 year ago +16

      luckily these aren't all mutually exclusive

    • @infinitecrayons
      @infinitecrayons 1 year ago +6

      Glad someone high up in the comments mentioned optimising for dev time, too. Sometimes you're not just trying to make something run a bit more efficiently, you're also trying to finish the next piece of software sooner to bring someone that benefit sooner.

    • @andrewharrison8436
      @andrewharrison8436 1 year ago +2

      Maintainability first, that actually cuts down on bugs so it rolls into developer time then you can take your pick (or spade or crowbar as preferred).

    • @edmondhung6097
      @edmondhung6097 11 months ago

      CPU speed can be bought with money.
      Memory size can be bought with money.
      Power consumption is not a concern on a non-battery device.
      Now make your choice between:
      - Optimise for maintainability
      - Optimise for developer time

  • @LordKibblesTheHeroGothamNeeds
    @LordKibblesTheHeroGothamNeeds 1 year ago +23

    A note on the idea of not optimizing until it's working. For context, I write embedded C for microcontrollers in time-critical systems with safety requirements (aerospace and such). With that in mind, the software architecture needs to be designed with some degree of optimization in mind; but that aside, I agree, we do not optimize until a functional block is complete. The biggest benefit of doing this is readability and maintainability; it is just as important that a piece of code can be understood as it is that it works. I have worked with a lot of legacy code that was written without oversight and focused on hand-coded optimization from the start. The result is code that is hard to follow and is prone to mistakes when it is picked up by another developer.

    • @AileTheAlien
      @AileTheAlien 9 months ago +1

      The most fun part, is when you get into arguments about what "readable" means. 😅

  • @Schnorzel1337
    @Schnorzel1337 1 year ago +21

    One huge trick John Carmack taught:
    If a piece of code is written poorly and you know it, but at the moment the input size is so small that it doesn't matter, give it a nice living comment, for example with an assert.
    Example:
    You have an array of unsorted numbers and you want to find whether 3 numbers add up to zero: the 3-sum problem.
    If your first approach takes O(n^3) instead of the optimal O(n^2), that's fine, go ahead.
    Then put down: assert arr.length
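A minimal sketch of that living-comment idea in C. The threshold and names here are illustrative, not from the comment; the point is that the assert documents the known limitation and fails loudly the day the input outgrows the naive algorithm.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative threshold: the point at which the cubic loop would hurt. */
#define THREE_SUM_MAX_N 200

/* Naive O(n^3) 3-sum: fine while inputs stay tiny. The assert is the
   "living comment": it trips when the input grows, reminding you to
   switch to the O(n^2) algorithm. */
int has_three_sum_zero(const int *arr, size_t n)
{
    assert(n <= THREE_SUM_MAX_N && "input grew: replace with the O(n^2) algorithm");
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            for (size_t k = j + 1; k < n; k++)
                if (arr[i] + arr[j] + arr[k] == 0)
                    return 1;
    return 0;
}
```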

  • @Slarti
    @Slarti 1 year ago +122

    Always optimise for debugging.
    Someone is going to have to fix your code at some point - always write the code so it is easier to step through and fix it.

    • @jakeezetci
      @jakeezetci 1 year ago +18

      this really depends on your field of work
      nobody is going to look through my code that computes solar magnetic energy; people just import the function and get their number
      if they want to know what happens inside, they can check the formulae in my article

    • @rudiklein
      @rudiklein 1 year ago +16

      @@jakeezetci Assuming you've implemented your formulas correctly in your code. I might want to inspect the code itself.

    • @rudiklein
      @rudiklein 1 year ago +8

      True! You want to optimize for readability and maintainability too.

    • @SimGunther
      @SimGunther 1 year ago

      If only compilers had a super-optimiser pipeline to magically take readable and debuggable code and turn it into the most optimized code possible in release build mode...

    • @Wyvernnnn
      @Wyvernnnn 1 year ago +4

      Code is run more often than it is read, and read more often than it is written.
      Optimize for user experience, then for readability, then for yourself

  • @solhsa
    @solhsa 1 year ago +132

    Modern compilers are pretty insane. Writing benchmarks is difficult because the compiler may realize what you're doing and optimize your payload away.

    • @atomgutan8064
      @atomgutan8064 1 year ago +16

      That is actually really funny lol. They just optimize too well, beyond our human understanding.

    • @Me__Myself__and__I
      @Me__Myself__and__I 1 year ago +22

      @@atomgutan8064 It's still very much understandable by humans, if anyone had that much time to spend. The problem is that nearly every CPU is different these days. They have different sets of instructions, or extra instructions. They have different timings for the individual instructions. Some have advanced caching and jump prediction. So the problem is that it would take a human a vast amount of time to fully understand all the ins and outs of a single CPU, but your software may deploy to many different types of CPUs! The companies who write optimizing compilers hire bunches of people to specialize in such things and embed that knowledge into the compiler.
      Also remember that since CPUs are so much faster these days (GHz) and there is lots of available memory, a compiler can use vastly more resources to find the optimal code than was feasible years ago.

    • @mytech6779
      @mytech6779 1 year ago

      @@atomgutan8064 They are actually very obvious and easy optimizations, which is why it can be a trick to stop them.
      Maybe you want to test some algorithm with a fixed input for consistent speed results, but the compiler is like "Hey, these inputs are all constant values, so the result is never going to change; I'll just substitute the final answer for the entire algorithm and print it to screen."
      The solution isn't that complex though: just fetch the inputs at runtime from a separate file or pipe them in from stdin. That way the compiler and optimizer do not know the values and must assume they are truly variable.

    • @lucbloom
      @lucbloom 1 year ago +4

      The sheer subtle effect of
      v = *ptr;
      vs
      v += *ptr;
      in benchmark loops is a good reminder of the need to pay attention.

    • @mikep3226
      @mikep3226 1 year ago +17

      I am reminded of a story I was told about a Fortran compiler being written in the 1970s for a brand new computer (the DG Eclipse). They spent lots of time working on getting all the optimizations they could in. They had run lots of incremental tests as it was developed, but the first big test was writing a large Fortran program which had all the types of coding it was supposed to optimize and then running that through the full compiler. The problem was the compiler produced a resulting binary that didn't compute anything, until one of the other engineers (the one I heard the story from, _not_ working on the compiler) remarked, "your program as written produces no output, so the optimizer noticed that _none_ of the code was relevant and optimized it all away!"
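The trap in that story is still real: a payload whose inputs are compile-time constants and whose result goes unused can be folded or deleted outright. A hedged C sketch of the usual countermeasures, runtime input plus a volatile sink (function and variable names are illustrative):

```c
#include <stdlib.h>

/* A payload the optimizer would happily constant-fold and delete if
   its input were a compile-time constant and its result went unused. */
long payload(long n)
{
    long acc = 0;
    for (long i = 0; i < n; i++)
        acc += i * i;
    return acc;
}

/* Volatile sink: stores to it are observable side effects, so the
   computation feeding it cannot be optimized away. */
volatile long sink;

/* Take the problem size as a string (e.g. from argv or stdin) so the
   compiler cannot know it at compile time and fold the whole loop. */
void run_benchmark(const char *n_str, int reps)
{
    long n = atol(n_str);
    for (int r = 0; r < reps; r++)
        sink = payload(n);
}
```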

  • @AnttiBrax
    @AnttiBrax 1 year ago +51

    Remember kids, you are only allowed to quote Knuth if you are going to profile and optimise your code later. If you don't, it's just an excuse for writing sloppy code.

    • @Me__Myself__and__I
      @Me__Myself__and__I 1 year ago +5

      YES! Almost no one profiles and optimizes later anymore. And because of teachings like this very few write decent code that isn't horribly slow with terrible algorithms/data structures. This Knuth quote really needs to die, it was for a long past time when people actually cared about machine instructions and loop unrolling.

    • @clickrick
      @clickrick 1 year ago

      "Make it right before you make it fast." (P.J.Plauger)
      I was taught this in one of the first classes in my CompSci course back in the 70s.

    • @Me__Myself__and__I
      @Me__Myself__and__I 1 year ago +1

      @@clickrick Because it made sense in the 70s. And even then, there was an expected baseline of reasonable quality (aka "making it right"). Newer developers don't have that same quality baseline; they write inefficient, unscalable and unmaintainable garbage and use quotes like this as justification. If professional coders back in the 80s had written code this poor, they'd have been unemployed.

  • @Me__Myself__and__I
    @Me__Myself__and__I 1 year ago +1

    Quite a number of really good comments here calling out how this video covers all the wrong things and is off base. Glad to see so many people understand that and are willing to speak up.

  • @adam_fakes
    @adam_fakes 1 year ago +1

    My first Software Engineering lecturer (30 years ago) taught me this motto: "Make it work, make it better"

  • @bigutubefan2738
    @bigutubefan2738 1 year ago +2

    Steve Bagley is one of the most underrated people on the Internet.

    • @Me__Myself__and__I
      @Me__Myself__and__I 1 year ago +2

      Not based on this video. This is completely wrong, focuses on extremely outdated things and would be useless in a real world development situation.

  • @rmsgrey
    @rmsgrey 1 year ago +9

    Just watching the video, I can see two obvious (but mutually incompatible) optimisations (in terms of number of instructions per loop) before trying loop unrolling:
    - rather than counting up until R2 matches R3, copy R3 into R2 before the loop (if you want to preserve the original byte count for some reason) and use a single decrement-and-compare-to-0 instruction rather than the separate increment and comparison instructions.
    - rather than having a separate loop counter at all, calculate what the final value for one of the pointers should be and compare that pointer with that value.
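For illustration, the two incompatible variants described above might look like this in C (function names are hypothetical; modern compilers typically perform both transformations themselves):

```c
#include <stddef.h>

/* Variant 1: count n down to zero, so the loop overhead is a single
   decrement-and-branch-on-nonzero rather than increment plus compare. */
void copy_countdown(unsigned char *dst, const unsigned char *src, size_t n)
{
    while (n--)
        *dst++ = *src++;
}

/* Variant 2: no counter at all. Precompute the end pointer once and
   compare the moving source pointer against it. */
void copy_endptr(unsigned char *dst, const unsigned char *src, size_t n)
{
    const unsigned char *end = src + n;
    while (src != end)
        *dst++ = *src++;
}
```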

  • @LunarcomplexMain
    @LunarcomplexMain 1 year ago +53

    It's also helpful to just write the program first without thinking about optimization at all, because once you have it finished (or whatever part you're working on), any changes you make while optimizing can be tested immediately against the code you've already written.

    • @jakeezetci
      @jakeezetci 1 year ago +5

      yeah that’s exactly what’s said in the video

    • @sttate
      @sttate 1 year ago +1

      You say that like he didn't say it six times in the video.

  • @kenchilton
    @kenchilton 1 year ago +16

    I am usually concerned with optimizing for reliability, maintainability, and testability. Performance means little when code is fragile, because running code is generally faster than crashed code. Walking that line between writing compact (performance and resource optimized) code and clear code is an art form in itself.

    • @lyndog
      @lyndog 1 year ago +1

      I agree with you.
      Broken code has either a time complexity of infinity (it doesn't run or never finishes) or a memory complexity that is effectively infinity. Unoptimised code is always less in both dimensions.

  • @uuu12343
    @uuu12343 1 year ago +2

    I think everyone in the comment section has to remember: this video is meant as a general introduction to the necessary rules and purpose of code optimization, NOT code optimization for C specifically (using GCC, gdb and assembly).
    These notes are meant to apply to code optimization in, say, Rust, Go, Python, etc., so while memcpy might make optimization via manual control easier, Python and Rust do not have memcpy.
    It uses manual approaches to fully understand the flow, not the tools.

  • @johncochran8497
    @johncochran8497 1 year ago

    Nice. Although he missed a few points.
    Copying 4 bytes at a time is good, but it also introduces potential alignment issues. Some processors can only access larger chunks of data at natural alignments. And for many that can handle unaligned accesses, the unaligned access is slower.
    Another thing missed: the basic copy loop was like
    while(i < n) {
        *p++ = *q++;
        i++;
    }
    That increment of i is mostly wasted effort; it is only there to keep track of how many bytes have been copied. Comparing the pointers directly is pretty easy instead. So how about:
    limit = p + n;
    while(p < limit) {
        *p++ = *q++;
    }
    Now we've eliminated the increment of i and are simply using the unavoidable increment of the pointers themselves to control the loop. So with his ARM example, we now have 3 opcodes per byte instead of 5. Of course, the optimizations illustrated in the video can be applied as well.

  • @CarlosFernandez14
    @CarlosFernandez14 1 year ago +2

    It's funny how we go back to paper in Computerphile videos lol; always cool to learn these concepts Dr. Steve's way.

  • @fishsayhelo9872
    @fishsayhelo9872 1 year ago +10

    Speaking of unrolling, one of my favorite C "techniques" for doing so has got to be Duff's device, which makes clever use of some C language features to implement loop unrolling at runtime, similar to what's shown in the video

    • @styleisaweapon
      @styleisaweapon 1 year ago +2

      It's not loop unrolling that Duff's device gives you, it's a jump table

    • @SimGunther
      @SimGunther 1 year ago +1

      @@styleisaweapon Technically it's a simple jump table that takes care of N mod 8 iterations on the data before the N div 8 iterations on the rest of the data.
      Nowadays it's reversed when you let the compiler optimize the loop using SIMD instructions.

    • @balijosu
      @balijosu 1 year ago

      🤮

    • @styleisaweapon
      @styleisaweapon 1 year ago

      While I agree that it's "barely readable", saying "just use memcpy" tells us you only think Duff's device is for memory copying, which is such gross ignorance that maybe, just maybe... hush @kierengracie6883

    • @xybersurfer
      @xybersurfer 1 year ago

      this is my first time hearing about "Duff's device". thanks for pointing out this interesting trick
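For readers meeting it for the first time, here is a memory-to-memory sketch in the shape of Duff's device (Tom Duff's 1983 original wrote to a fixed output register; this variant just copies bytes, and assumes n > 0):

```c
#include <stddef.h>

/* The switch jumps into the middle of the unrolled loop to handle the
   n % 8 leftover bytes first; every later pass of the do-while copies
   a full 8 bytes. Requires n > 0. */
void duff_copy(unsigned char *to, const unsigned char *from, size_t n)
{
    size_t passes = (n + 7) / 8;   /* total do-while iterations */
    switch (n % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--passes > 0);
    }
}
```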

  • @adamburry
    @adamburry 1 year ago +6

    There was a missed opportunity here. Based on the example, I was expecting you to circle back to the point about making your code correct before optimising it. This code fails for certain cases of overlap between src and dst; it has the potential to overwrite your data before you've had a chance to move it.
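The overlap hazard can be sketched in C. This is the standard memmove-style fix (copy back-to-front when dst sits inside the source range), not the video's code:

```c
#include <stddef.h>

/* Overlap-safe copy in the spirit of memmove: if dst lies inside the
   source range, a forward copy would clobber source bytes before
   reading them, so copy back-to-front instead. */
void safe_copy(unsigned char *dst, const unsigned char *src, size_t n)
{
    if (dst > src && dst < src + n) {
        while (n--)
            dst[n] = src[n];            /* back to front */
    } else {
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i];            /* front to back */
    }
}
```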

  • @Tawnos_
    @Tawnos_ 1 year ago +1

    @13:52: The computer scientist and the mathematician square off. The mathematician answers "you've got a problem with odd numbers", recognizing that anything indivisible by 4 is indivisible by 2, twice. The computer scientist replies "you've got a problem with odd numbers or numbers that are not a multiple of 4", missing that their added comment was included in the original criterion. It's a great reminder to me (a computer engineer) to take time to process what the other person said before replying.

  • @AssasinZorro
    @AssasinZorro 1 year ago +1

    I feel like the game "Human Resource Machine" gives a great introduction to programs and optimization.
    Your video is complementary to the game itself.

  • @mikep3226
    @mikep3226 1 year ago +1

    I had a job once that I described as being the final optimization pass of the compiler. In the mid-70s a friend of mine (Rodger Doxsey, bio on wikipedia) was one of the PIs for an X-Ray astronomy satellite (SAS-3). Once every 90 minutes they got a large downlink of data from the observations in that orbit and had to analyze it quickly to decide if they wanted to change the orders for the next orbit. The problem was the analysis was all done by Fortran code written by astrophysicists, and the data was all packed in 4, 5 and 6 bit fields in larger data words. So, as the first real Computer person to look at the code, the first optimization was to just know how bit arithmetic worked and improve the Fortran in the inner loop a bunch (and, IIRC that halved the time it took for the program to run). But then I looked at the generated assembly code and realized that by better use of the machine code bit shift/mask features, it could be improved much more (IIRC, a factor of 5 this time). They were very grateful for the extra time that gave them to think about what the data actually meant.

  • @matankabalo
    @matankabalo 1 year ago +6

    Very interesting video, can't wait for the next one!

  • @johnbennett1465
    @johnbennett1465 1 year ago +9

    I am disappointed that he didn't even mention the memory alignment problem. At best, unaligned access is a significant performance hit. At worst, it fails. I don't know if any current computers have the limitation, but I have worked on computers that required all accesses to be aligned to the data size.

    • @amigalemming
      @amigalemming 11 months ago

      He'd have done better to tell people to just call memcpy.
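As a hedged sketch of what handling alignment means in practice (real memcpy implementations are far more elaborate, e.g. they also handle mismatched src/dst alignment with shifts; names here are illustrative):

```c
#include <stddef.h>
#include <stdint.h>

/* Alignment-aware copy sketch: peel off leading bytes until dst is
   4-byte aligned, move whole 32-bit words, then finish the tail.
   The word phase only runs when src ends up aligned too. */
void aligned_copy(unsigned char *dst, const unsigned char *src, size_t n)
{
    while (n > 0 && ((uintptr_t)dst & 3u) != 0) {   /* head bytes */
        *dst++ = *src++;
        n--;
    }
    if (((uintptr_t)src & 3u) == 0) {               /* word body */
        while (n >= 4) {
            *(uint32_t *)dst = *(const uint32_t *)src;
            dst += 4; src += 4; n -= 4;
        }
    }
    while (n--)                                     /* tail bytes */
        *dst++ = *src++;
}
```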

  • @avramcs
    @avramcs 1 year ago

    I think another point, regarding the question asked at 2:05, is: who knows the "way" to build something? It's hard to build something the right way at the start, because you are trying to turn your mental abstractions of your program ideas into actual code

  • @trapexit
    @trapexit 1 year ago +4

    The ARM has the ability to move multiple words between memory and registers in one instruction, which could also be used to improve cycles per byte of a copy routine.

  • @Richardincancale
    @Richardincancale 1 year ago +10

    Two other optimisation targets you might consider:
    1. Optimise for stability: particularly in programs operating in real-time environments like transaction processing or process control, avoiding technologies that can lead to memory leaks etc.
    2. Optimise for maintainability: for long-lived code that may be maintained by a separate team, the ability to do things simply and correctly, without any tricks or nooks and crannies, will be more valuable in the long term.

    • @amorphant
      @amorphant 1 year ago

      Those aren't optimizations.

  • @TheWyrdSmythe
    @TheWyrdSmythe 1 year ago +2

    1. Make it work. 2. Make it work right. 3. Make it work fast.

  • @MechMK1
    @MechMK1 1 year ago +5

    The best thing to optimize for is readability and maintainability. I've seen people write "optimal" code before, which was ~20% faster than baseline, but at the cost of maintainability. These days, for most applications, maintainability is key.
    Also, regarding "the right algorithm": caching is so often overlooked. Trading off memory usage for execution time can be extremely valuable, especially if the computation is expensive or requires IO.

    • @elpapichulo4046
      @elpapichulo4046 1 year ago +1

      Hard disagree

    • @MechMK1
      @MechMK1 1 year ago

      @@elpapichulo4046 Would you mind stating why?
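The memory-for-time trade mentioned in this thread can be sketched as a small memoization table in C (`expensive` is a hypothetical stand-in, factorial here, for any costly pure computation):

```c
#include <stdint.h>

/* Trade a small table of memory for repeated computation time. */
#define CACHE_SIZE 64

static uint64_t cache_value[CACHE_SIZE];
static int cache_filled[CACHE_SIZE];

/* Hypothetical costly pure function: factorial as a stand-in. */
uint64_t expensive(unsigned n)
{
    uint64_t acc = 1;
    for (unsigned i = 2; i <= n; i++)
        acc *= i;
    return acc;
}

uint64_t expensive_cached(unsigned n)
{
    if (n < CACHE_SIZE) {
        if (!cache_filled[n]) {         /* miss: compute once */
            cache_value[n] = expensive(n);
            cache_filled[n] = 1;
        }
        return cache_value[n];          /* hit: table lookup only */
    }
    return expensive(n);                /* outside the cached range */
}
```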

  • @GRHmedia
    @GRHmedia 1 year ago +1

    Power is usually solved by also solving for performance. The fewer cycles something takes, the less power it uses; the less time it runs, the less power it draws over time. Beyond that, it is more efficient to configure the hardware to run at a lower power level than to try to make the code use less power. If you need to use one core vs 64 cores in some low-power situation, you could use code for that. However, if you need 1 core out of 64 all the time, it is best just to adjust the hardware to that need.
    Performant code tends to solve other issues too. Performant code is usually smaller, with fewer lines. Fewer lines mean fewer chances for bugs. It also makes the code easier to maintain and use in the future, because there is less for someone new to understand. Fewer lines also usually translate to fewer machine instructions, which tends to make fitting in memory easier.
    Granted, your compiler flags can also change this, but if it is smaller to start with, it will usually end up smaller whatever the compiler flags are set to.

  • @UncleKennysPlace
    @UncleKennysPlace 1 year ago +1

    When we wrote code for the USAF in the mid-90s, each of us was assigned, beyond our workstation, the least powerful computer in use at the base.
    Everything needed to run properly on that lowly PC; "It runs on my PC" would get you chastised severely.
    I learned to write tiny code; now I can write verbose code, and the compiler does the hard work.

  • @TheDeanosaurus
    @TheDeanosaurus 1 year ago +14

    At an enterprise level there's another optimization layer which is the human element; maintenance, reusability, and extensibility. There are times in our projects we often forego true computational optimization for readability or ease of use of a certain API. I think from a business standpoint that usually comes first (fortunately or unfortunately) because it affects the amount of resources required to solve a problem, especially given the higher order languages that already either have some optimizations built into the compiler/translation layer or, on the other end, are so abstracted away that computational optimization isn't possible. Not to say this isn't still thought about, we consider O(n) daily even if we don't directly solve for it, it's just either second nature or not worth the additional engineering time.

    • @mytech6779
      @mytech6779 1 year ago +1

      Optimizing those human factors is often not in direct conflict with performance optimization. There is no need to make a poorly commented, confusingly intertwined mess when writing a better list of instructions; that tends to be a sign of someone who really doesn't know what is going on and is just throwing things at the wall to see what sticks. At the other end, generally overly verbose codebase bloat usually hurts all types of optimization, human and machine.

    • @christopherg2347
      @christopherg2347 1 year ago +2

      Absolutely
      "When I was working on Visual Studio Tools For Office we did comparatively little making the customization framework code _run_ faster because our tests showed that it typically ran fast enough to satisfy customers. But we did an enormous amount of work making the framework code _load_ faster, because our research showed that Office power users were highly irritated by noticable-by-humans delays when loading customized documents for the first time. "
      "Which is faster?", Eric Lippert, "Part the fifth: What is this “faster” you speak of?"

    • @TheDeanosaurus
      @TheDeanosaurus 1 year ago

      @@christopherg2347 Which in essence IS optimization for loading. We have made similar decisions in iOS development to actively avoid dynamic linking because it bloats application launch times. There's newer functionality that allows for dynamic linking during debug builds and then merges those libraries into a single (or multiple depending on how configured) binary which makes release builds take longer but launch times much faster.
      But again that's a different kind of optimization, sometimes we'll write code and abstract away several layers of an operation to make it more testable. Does this make it run slower in production? Possibly? At a scale above a few nanoseconds? Probably not, so we don't even think to optimize for runtime performance in that instance, we optimize "for stability" instead by ensuring test coverage is there. That ultimately saves support time, saves our maintenance time, and would hopefully save consumers' time by working the first time. Even 30 seconds having to relaunch an app from a crash is more time (and bits) burned than a slightly less optimal but unstable piece of code.

  • @erikhgt6020
    @erikhgt6020 ปีที่แล้ว

    Thanks for the great critical questions by the cameraman

  • @casperghst42
    @casperghst42 ปีที่แล้ว +10

    1) write readable code
    2) make sure it works
    3) if possible optimize it

    • @JGnLAU8OAWF6
      @JGnLAU8OAWF6 ปีที่แล้ว +4

      Only if needed and only after profiling it.

    • @christopherg2347
      @christopherg2347 ปีที่แล้ว +1

      I would say "3) Optimize only actual issues revealed during testing"

  • @Xilefian
    @Xilefian ปีที่แล้ว +1

    Optimising `memcpy`! Traditionally you don't want to ever do that, but I actually wrote a fast 32-bit ARM `memcpy` for the GBA that uses 12 registers for a whopping 48 bytes copied per iteration when in ideal conditions
    Each iteration is just 4 instructions:
    ```arm
    .Lloop_48:
    subs r2, r2, #48
    ldmiage r1!, {r3-r14}
    stmiage r0!, {r3-r14}
    bgt .Lloop_48
    ```
    I had to set the CPU mode to FIQ to free up the stack and link registers for an extra 8 bytes to copy per iteration

  • @Robstafarian
    @Robstafarian ปีที่แล้ว

    The sight gags are always appreciated.

  • @sidpatel77
    @sidpatel77 ปีที่แล้ว +2

    bro is using a literal notepad as a text editor, absolute chad

  • @Wyld1one
    @Wyld1one ปีที่แล้ว +1

    Other types: complexity, understandability, portability (cross-platform), testing.
    Sometimes the debug libraries _contain_ errors, which causes problems when optimizing and validating.
    Bugs, errors and slowdowns can also be encapsulated (built in) in the compiled versions of libraries as well.

    • @Wyld1one
      @Wyld1one ปีที่แล้ว

      Do you need to optimize at all? CPUs are our general way to compute things, but what if you don't need to compute at all; what if you could just look up the results? Generate a table once, look it up after that. If there's a shortcut computation, there's probably also a shortcut lookup. If you need more precise results, that's the time to make more precise lookup tables.
      How important is optimizing in the first place? If you're doing trillions of operations and you need them in a fraction of a second, well, yes. If you're doing one operation every year, who cares?

    • @johnbennett1465
      @johnbennett1465 ปีที่แล้ว

      @@Wyld1one for some problems tables are a great optimization. For others, the table would require a memory device larger than the observable universe. I have worked with both cases.

  • @АртёмФилимонов-х4н
    @АртёмФилимонов-х4н ปีที่แล้ว +1

    Do not worry, the editor guy, I read the covers👏🏻

  • @francis_the_cat9549
    @francis_the_cat9549 ปีที่แล้ว +6

    The mindset of "Oh it's just 0.1 secs and it only gets run once" is why we have to wait dozens of seconds (yes, that's a lot; computers are FAST) for programs to start

  • @christopherg2347
    @christopherg2347 ปีที่แล้ว +2

    I can highly recommend the Article "Which is faster?" by Eric Lippert.
    After a giant like Knuth, he is one of the biggest authorities on the matters of programming.

    • @xybersurfer
      @xybersurfer ปีที่แล้ว

      i'm not that impressed by Eric Lippert honestly. it's probably his bad takes on the StackOverflow website, that gave me a bad impression

  • @SykikXO
    @SykikXO ปีที่แล้ว

    17:25 For anyone wondering, it's `str r5, [r0]` instead of `[r1]`.

  • @ericon.7015
    @ericon.7015 ปีที่แล้ว

    I've arrived at the same conclusion by experience, while coding multiple migration scripts that had to be ready to use as soon as possible. Soon enough I realized that first I had to make the code work and do the job, and only later do the optimisation. At least then you know where you could optimise. Otherwise you will not be optimal and will lose precious time.

  • @feandil666
    @feandil666 ปีที่แล้ว +4

    This saying, not optimising too early, was said at a time when programs were mostly small algorithms implemented in C. It doesn't apply to a bigger system, especially with asynchronous and latent operations, where you really have to think about optimisation early enough to not create an unoptimisable mess. For instance, if you use OO you'll never get the performance you can get with a data-driven approach.

    • @Me__Myself__and__I
      @Me__Myself__and__I ปีที่แล้ว +1

      I love seeing people call this out. I've been saying this for a decade and for a long time no one else seemed to be saying such things.
      Regarding OO, everything has its place. OO is really excellent at dealing with complex intertwined data that needs to adhere to certain rules. Trying to do such things without OO can be a nightmare. But if you're dealing with vast quantities of simple data, OO can add a lot of overhead just from allocating, deallocating and fragmenting your memory. Right tool for the job.

  • @cidercreekranch
    @cidercreekranch ปีที่แล้ว +3

    Your definition of optimal reminds me of the definition for recursion in the Devil's DP Dictionary. Recursion: noun, see Recursion.

  • @TroZ_Games
    @TroZ_Games ปีที่แล้ว

    One other optimization that you could do before loop unrolling is getting rid of the counter (i in the pseudocode). Have one line before the loop calculate the address of the last byte to copy (source address plus the count). Then in the loop you don't have to do i++, just compare the source pointer to that calculated end address, and exit the loop when they're equal. For the four-bytes-at-a-time version, that brings it down to about one instruction per byte copied.
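    The counter-free loop described above can be sketched in C (hypothetical helper name `copy_bytes`; the end pointer is computed once, so the loop body has no separate index to update):

    ```c
    #include <stddef.h>

    /* Sketch of the counter-free copy loop described above. */
    static void copy_bytes(char *dst, const char *src, size_t n)
    {
        const char *end = src + n;   /* computed once, before the loop */
        while (src != end) {
            *dst++ = *src++;         /* no index variable to increment */
        }
    }
    ```

    The pointer comparison replaces the i++/compare pair, which is the saving the comment describes.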

  • @LarsHHoog
    @LarsHHoog ปีที่แล้ว +1

    Solving the right problem is my professional mantra, because a good solution to the wrong problem is a waste of time and effort.
    That said, Z80 assembly tweaking is a hobby when coding for the ZX Spectrum.

  • @Yupppi
    @Yupppi ปีที่แล้ว

    My favourite topic. I love beautiful and aesthetic stuff, and efficient. Sometimes beauty and magic don't go well together, sometimes they do (people who work on standard libraries say that, generally, the simpler the solution, the better it is). Then you see matrix multiplication optimization.
    Sean's question about isn't it better to design it well at first is great in my opinion to draw out the difference of good design vs optimization. Because at times optimization is definitely not "good design" in terms of let's say readability. And the implementation details shouldn't really be part of the design talk I don't think. For the general idea etc yes, but the optimizations come up from seeing what the actual code is like. Like Sean Parent's classic "that's a rotate" speech in GoingNative2013.
    I think the good way to explain compiler and optimizing is that the compiler can only optimize it as much as you allow it to: if you keep a layer of mystery to everything in the code, the compiler can't deduce what is behind the curtains and has to consider the worst case possible. If you write it smarter, the compiler can just look at the code and see "oh you declared that const, that constexpr, you made loops or avoided loops where I can just write them off and skip calculations, just saving the answer" etc. I.e. if you give compiler enough information about your code (don't go declaring types where you should allow the compiler to deduce them), it can do magic tricks that you couldn't even imagine trying to do clever bit manipulation tricks. The compiler WILL outsmart you if you give it an opportunity. But for example deciding which type of pointer you give, which kind of virtual functions you make, how you pass your pointers or pass by value/reference, that will improve things manually. And using vectors and STL algorithms. Remove loops and branches. Remove unnecessary copies and allocations. And don't lock the compiler forced to do something unnecessary by being very explicit. Don't make an int pointer to be smart because the pointer takes more resources. Of course these apply more to C++ than C, they're obviously different beasts.
    Matt Godbolt has a fantastic demonstrations with his Compiler Explorer about how you shouldn't try to outsmart the compiler, but work with it. Jason Turner's Commodore 64 game in C++17 is also an impressive demonstration of 0 overhead abstractions and compiler magic. Bjarne Stroustrup also had that article about linked lists vs vectors and how despite a lot of all kinds of testing, and despite being a very unintuitive result, vectors came on top almost always. And switch magic...

  • @andrewharrison8436
    @andrewharrison8436 ปีที่แล้ว

    The main optimisation is, in my opinion, readability. Comments are free at run time once the compiler has removed them. Comments are priceless at 4am when things have gone pear-shaped.
    Actually, comments may avoid that 4am phone call altogether by forcing you to actually think about what you are doing when writing the program.

  • @S_t_r_e_s_s
    @S_t_r_e_s_s ปีที่แล้ว +6

    Premature optimizing is definitely what I struggled with the most at first in software development.

    • @KilgoreTroutAsf
      @KilgoreTroutAsf ปีที่แล้ว +1

      You were learning. Nothing wrong with that.

    • @christopherg2347
      @christopherg2347 ปีที่แล้ว +1

      "Premature optimisation. They say it happens to 4 out of 5 programmers 😉"

    • @Takyodor2
      @Takyodor2 ปีที่แล้ว +1

      @@christopherg2347 Yeah it happens to roughly 5 out of 4 programmers. Hold on, it seems I swapped a couple of variables somewhere, dang it this code is completely unreadable!

    • @christopherg2347
      @christopherg2347 ปีที่แล้ว +2

      @@Takyodor2 "The two hardest problems in programming are: cache invalidation, naming things, and off-by-one errors."

    • @Takyodor2
      @Takyodor2 ปีที่แล้ว +1

      @@christopherg2347 🤣 I love that one!

  • @KilgoreTroutAsf
    @KilgoreTroutAsf ปีที่แล้ว +14

    1:35 One of the most selectively misquoted (when not deliberately misinterpreted) statements in the history of computer science, used to justify every bad engineering decision and poorly written code, ever.
    The FULL quote reads:
    "Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, [...] We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%."
    He never said "forget optimization altogether, computers are fast and compilers are smart".
    What he ACTUALLY said is "most of the time, you should care about optimization only AFTER you know which sections of the code are critical".

    • @dembro27
      @dembro27 ปีที่แล้ว +3

      Did you immediately stop watching in outrage after that supposed misquote?

    • @DemonixTB
      @DemonixTB ปีที่แล้ว +2

      @@dembro27 He does misquote it by omission, and Kilgore is right. That single quote may sadly be the reason software is 1,000-2,000x slower now than it could be, if people simply didn't pretend the compiler is magic and put any amount of performance QA into their completion estimates. By all accounts Knuth himself is one of the masters of hand-rolled assembly, so the quote certainly isn't being applied in the ways he would use it himself.

    • @christopherg2347
      @christopherg2347 ปีที่แล้ว

      @@DemonixTB Funny that you said "could be" instead of "needs to be".

  • @FalcoGer
    @FalcoGer ปีที่แล้ว

    @8:25 that program has a flaw when the source and destination overlap partially: it will overwrite bytes it is going to copy later.

  • @RipVanFish09
    @RipVanFish09 ปีที่แล้ว

    Seeing that old printer paper brought back some memories.

  • @realhet
    @realhet ปีที่แล้ว +2

    And what remains when you take out optimizing from programming?
    The never ending process of decoding business requirements and translating them to code.
    I consider myself lucky if optimizing is part of the business requirements, because I found it an enjoyable puzzle, but sadly that's a rare opportunity.

    • @Me__Myself__and__I
      @Me__Myself__and__I ปีที่แล้ว

      Good. Always make it part of the business requirements and factor it into your estimates. The more time you spend thinking about how to write good, optimal code, the better you'll get at coding and the less time such things will take you later. It's a skill, and if you exercise that skill you'll be a better coder than most others.

  • @Veptis
    @Veptis 16 วันที่ผ่านมา

    I never learned how to benchmark or profile. And that would be extremely helpful

  • @komolunanole8697
    @komolunanole8697 ปีที่แล้ว +6

    Always write readable/maintainable code first. If performance is an issue, profile and optimize.
    Also sad there was no mention of CPU magic like branch prediction/speculative execution, instruction pipelining, cache effects, etc.
    Optimizing for "lower number of instructions" may sound reasonable but is hardly the right metric on modern hardware.

  • @fghsgh
    @fghsgh ปีที่แล้ว +1

    As a programmer who usually codes in assembly... gosh, compilers are really stupid sometimes. A lot of the time you have to write the code in C using weird builtins like `__builtin_ctz` or SIMD intrinsics, and then it'll be as good as in assembly, but those are the kinds of optimisations compilers do miss. And in the case of older or embedded architectures, you can really save a _lot_, because these architectures are hard to optimise for.
    But it makes sense that compilers aren't perfect btw. We're at the point where adding more optimisations to compilers will noticeably slow down compilation times. It's a tradeoff.

  • @CoolJosh3k
    @CoolJosh3k ปีที่แล้ว +1

    I’d have thought that for divide by 4, where 4 is a constant, the compiler would optimise that to a shift operation for us?
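    It generally does for unsigned operands; for signed operands the bare shift rounds the wrong way, so a small fix-up is needed. A minimal C sketch of the difference (hypothetical helper names):

    ```c
    /* For unsigned operands, dividing by a constant power of two is exactly
       a right shift, so compilers substitute it freely. */
    static unsigned div4u(unsigned x) { return x >> 2; }

    /* For signed operands, C's / truncates toward zero, while an arithmetic
       shift rounds toward negative infinity; compilers therefore emit a
       shift plus a fix-up for negative inputs rather than a bare shift. */
    static int div4s(int x) { return x / 4; }
    ```

    So the answer is "yes, but not a *bare* shift when the operand is signed".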

  • @LudwigvanBeethoven2
    @LudwigvanBeethoven2 ปีที่แล้ว

    Profiling before optimization saves a lot of time because then you know which parts are slow and need to be faster

  • @sabriath
    @sabriath ปีที่แล้ว

    you should always check the pointer locations before copying memory. I know that's not part of the video's explanation, but it's necessary to know which direction to copy the data. For example, if you have an array of 1000 bytes and you are copying from positions 2-1000 into 0-998, left to right will work. But say you are expanding the array and copying the data from 2-998 to 4-1000: now you have a problem, because positions 6-1000 will end up with the corrupt data of positions 2-3 repeated over and over. So if the write position is higher in memory than the read position, you have to add the count to both positions and subtract with each loop, working backwards.
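    That direction check is essentially what `memmove` does. A minimal sketch (hypothetical helper name `move_bytes`; note the pointer comparison is only well-defined when both point into the same buffer, which is exactly the overlapping case being handled):

    ```c
    #include <stddef.h>

    /* Copy forwards when the destination is below the source,
       backwards when the destination overlaps it from above. */
    static void move_bytes(char *dst, const char *src, size_t n)
    {
        if (dst <= src) {
            for (size_t i = 0; i < n; i++)    /* forward copy is safe */
                dst[i] = src[i];
        } else {
            for (size_t i = n; i > 0; i--)    /* backwards, so unread
                                                 bytes aren't clobbered */
                dst[i - 1] = src[i - 1];
        }
    }
    ```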

  • @Amonimus
    @Amonimus ปีที่แล้ว +3

    My best tip is to make it readable. You don't need the code to be efficient, but if you can tell where anything is you can fix it when necessary.

  • @Kniffel101
    @Kniffel101 ปีที่แล้ว +8

    For most modern CPUs the bottleneck is less likely to be missing ASM optimization; poor cache utilization is the biggest bottleneck in most cases.
    Until we have like 512MB+ L3 caches on consumer CPUs, which on the other hand isn't too far into the future, at least for desktops! =P

    • @KilgoreTroutAsf
      @KilgoreTroutAsf ปีที่แล้ว +7

      When we finally get 512MB caches, bad programmers will already have figured out a way to use an extra 2GB of memory for a simple quicksort

    • @Kniffel101
      @Kniffel101 ปีที่แล้ว

      That might unfortunately be the case, yeah... @@KilgoreTroutAsf

    • @christopherg2347
      @christopherg2347 ปีที่แล้ว +1

      @@KilgoreTroutAsf Which is avoided by using an existing implementation, instead of prematurely optimizing a custom one.

  • @mikoajzabinski3569
    @mikoajzabinski3569 ปีที่แล้ว

    In the second assembly program, shouldn't add instruction have #4 at the end?

  • @reecelawson2403
    @reecelawson2403 ปีที่แล้ว

    Would you be able to make a video explaining what virtual cores are please?

  • @RealCadde
    @RealCadde ปีที่แล้ว +3

    About the bit of using the compiler to optimize your code.
    Most times it's not a matter of badly compiled code but a bad approach to the problem.
    Say a function needs the first million prime numbers... You wouldn't calculate the first million prime numbers on each function call.
    You would store a table of the first million known primes somewhere so they can be looked up as needed, i.e. not CalculatePrime(N) but rather FetchPrime(N)
    No compiler i've heard of can figure this optimization out for you. It can only make the CalculatePrime() function run faster. It can't re-design your program to fetch a prime from a table.
    The best example of optimization i can recall is that of the game Factorio. Their code was optimized but it wasn't enough. They had to consider where the resources they accessed lived in the memory, was it in slow RAM or fast CACHE memory? And they re-designed their code such that as much as possible was living as close to the CPU core as possible as they needed it.
    Apparently, no amount of compiler optimization could do this for them even though modern processors are supposedly good at organizing what can live in cache vs RAM. They changed it so when something existed in cache, they did everything they needed to that memory in one batch and only then would discard parts of that close to CPU data that they knew they wouldn't be needing for a while. Especially as it came to path finding and fluid network calculations.
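    The table idea above can be sketched like this (hypothetical names, and scaled down from a million primes to keep the sketch small):

    ```c
    #include <stddef.h>

    #define NPRIMES 1000            /* scaled down from a million */
    static int primes[NPRIMES];
    static int nfound = 0;

    /* Run once (e.g. at startup): fill the table by trial division. */
    static void BuildPrimeTable(void)
    {
        for (int candidate = 2; nfound < NPRIMES; candidate++) {
            int is_prime = 1;
            for (int i = 0; i < nfound && primes[i] * primes[i] <= candidate; i++) {
                if (candidate % primes[i] == 0) { is_prime = 0; break; }
            }
            if (is_prime)
                primes[nfound++] = candidate;
        }
    }

    /* After that, fetching a prime is just an array read. */
    static int FetchPrime(size_t n)   /* zero-based: FetchPrime(0) == 2 */
    {
        return primes[n];
    }
    ```

    No compiler can make this transformation for you, because it changes the program's design, not its instructions.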

    • @christopherg2347
      @christopherg2347 ปีที่แล้ว

      97% means it does not apply in 3% of the times.
      Plenty of other games had no such issues.

  • @tpobrienjr
    @tpobrienjr ปีที่แล้ว

    The quote from Knuth also applies to databases: over-normalization.

  • @ChrisWalshZX
    @ChrisWalshZX ปีที่แล้ว

    Being a lifelong Z80 coder, these techniques are quite familiar. I don't know if using the stack pointer is feasible for fast data transfer on a modern CPU?

  • @TheStevenWhiting
    @TheStevenWhiting ปีที่แล้ว

    2:06 Much like what Cliff Harris said on Positech Games when he was optimising his game.

  • @milasudril
    @milasudril ปีที่แล้ว +3

    It may not optimize this code unless you say restrict, due to potential aliasing.
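    The aliasing point above in a minimal sketch: without `restrict` the compiler must assume the two pointers may overlap, which blocks wide loads/stores; `restrict` (C99) is the programmer's promise that they don't.

    ```c
    #include <stddef.h>

    /* Compiler must assume dst and src may alias: byte-by-byte order
       is observable, so wide copies can't be assumed safe. */
    static void copy_norestrict(char *dst, const char *src, size_t n)
    {
        for (size_t i = 0; i < n; i++) dst[i] = src[i];
    }

    /* restrict promises no overlap, so the compiler is free to
       vectorise or copy several bytes per iteration. */
    static void copy_restrict(char *restrict dst,
                              const char *restrict src, size_t n)
    {
        for (size_t i = 0; i < n; i++) dst[i] = src[i];
    }
    ```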

  • @przemekbundy
    @przemekbundy หลายเดือนก่อน

    How do you compile a program on a piece of paper?

  • @jaffarbh
    @jaffarbh ปีที่แล้ว +1

    One optimisation strategy I use for my backend-heavy Django website is to make it fit into a Raspberry Pi 3! This way, I am forced to optimise it for speed as well as memory (the Pi has under 1GB of usable RAM).

    • @xybersurfer
      @xybersurfer ปีที่แล้ว

      that's a pretty cool strategy. i've been interested in doing the same with my applications. i think there is a lot of deep stuff to be learned that way, depending on how far you go

  • @ChungHieuBui-l2d
    @ChungHieuBui-l2d 2 หลายเดือนก่อน

    Hi, I’m curious about the possibility of using AI to train a Multi-Dimensional Code Analysis and Optimization System. What do you think about this approach?
    It’s like building a 3D model of code - analyzing all angles at once instead of viewing it from just one side.
    The 5 key dimensions are:
    1⃣ Semantic Analysis
    2⃣ Structural Analysis (AST)
    3⃣ Control Flow Analysis
    4⃣ Data Flow Analysis
    5⃣ Pattern Recognition (Vector Space)
    Do you think LLMs can support this approach? Would love to hear your thoughts! 🙌

  • @Elesario
    @Elesario ปีที่แล้ว +1

    Sometimes you're optimising for development time rather than code performance

  • @wmrieker
    @wmrieker 10 หลายเดือนก่อน

    these are more optimizations that a compiler can do behind the scenes. most application programmers are concerned with picking the optimal algorithm. no compiler (afaik) is going to optimize which type of sort is best for the data you have.

  • @FrancisFjordCupola
    @FrancisFjordCupola ปีที่แล้ว

    ARM assembly is so nice.

  • @mytech6779
    @mytech6779 ปีที่แล้ว +3

    Look at me, I'm making a comment "correcting" the video because I think it will make me look highly informed and superior, but really just proves my poor comprehension of the content.
    Hurrah!

  • @pierreabbat6157
    @pierreabbat6157 ปีที่แล้ว

    What happens if you move four bytes at a time 1000 bytes from address 59049 to 65536?

  • @amigalemming
    @amigalemming 11 หลายเดือนก่อน

    An experienced assembly programmer would never count upwards, but always downwards. It saves you the comparison instruction, because a decrement always includes a test against a zero result. It's not even counterintuitive - just don't call the variable n, but numBytesStillToCopy, or so. However, modern optimizers can detect and eliminate superfluous upward counters.
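    The countdown idiom above, sketched in C (the variable name follows the comment's suggestion; at the ISA level the decrement itself typically sets the zero flag, so no separate compare instruction is needed):

    ```c
    #include <stddef.h>

    /* Count the remaining bytes down to zero instead of an index up to n. */
    static void copy_countdown(char *dst, const char *src,
                               size_t numBytesStillToCopy)
    {
        while (numBytesStillToCopy != 0) {
            *dst++ = *src++;
            numBytesStillToCopy--;   /* the decrement doubles as the loop test */
        }
    }
    ```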

  • @amorphant
    @amorphant ปีที่แล้ว

    Don't let the comments confuse you. You don't "optimize for maintainability." A lot of people are conflating big-O complexity with code maintainability. Yes, you should always write your code to be maintainable from the start, but optimization you don't need to knock yourself out on from the get-go. Optimization specifically means reducing the big-O complexity of time, memory usage, or power usage, say from exponential to logarithmic. Big-O does not apply to the concept of human readability/maintainability.

  • @oresteszoupanos
    @oresteszoupanos ปีที่แล้ว +1

    Sean, of course we can read the book titles 🙂

  • @snoopyjc
    @snoopyjc ปีที่แล้ว +3

    Does the data still have to be aligned to loads/store 4 bytes at a time? Back in the day, that was also a limitation of doing it 4 at a time

    • @anianii
      @anianii ปีที่แล้ว +2

      No, but it's faster if it is. There are operations for aligned and unaligned data. The aligned one will almost always be more efficient and take less time

  • @AnExPor
    @AnExPor 11 หลายเดือนก่อน

    You can also optimize for readability and testability.

  • @josefjelinek
    @josefjelinek ปีที่แล้ว

    Nowadays, rather than getting down to the CPU instruction level, much bigger speed benefits generally come from optimizing the memory layout of the data structures used. And that can be done at a higher level than CPU instructions. What good is saving 4 instructions taking a few clock cycles when you have a cache miss and wait hundreds of cycles on a simple read from main memory? Not optimizing for CPU cache usage/hits (several levels) and prefetch is the biggest blind spot of videos like this one IMHO.
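    The classic illustration of the layout point above is array-of-structs vs struct-of-arrays (hypothetical names): if a pass only touches one field, the SoA layout streams just those values through the cache, while the AoS layout drags the unused fields along in every cache line.

    ```c
    #include <stddef.h>

    struct ParticleAoS  { float x, y, z; };        /* array of structs */
    struct ParticlesSoA { float *x, *y, *z; };     /* struct of arrays */

    static float sum_x_aos(const struct ParticleAoS *p, size_t n)
    {
        float s = 0;
        for (size_t i = 0; i < n; i++) s += p[i].x;  /* loads y and z too */
        return s;
    }

    static float sum_x_soa(const struct ParticlesSoA *p, size_t n)
    {
        float s = 0;
        for (size_t i = 0; i < n; i++) s += p->x[i]; /* contiguous x values */
        return s;
    }
    ```

    Both loops compute the same thing; only the memory traffic differs, which is exactly the kind of optimization no peephole pass can do for you.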

  • @sthex4640
    @sthex4640 ปีที่แล้ว +10

    People that know premature optimization is bad tend to say "just write your program so it works and then worry about optimization". In some cases this is even more harmful... Try writing a game engine, a packet processor or some ML primitives your usual way (with some classes and some interfaces and what not) - and then you'll realize you can't cross a hundred bridges in your pursuit for speed because you've started the wrong way.

    • @christopherg2347
      @christopherg2347 ปีที่แล้ว

      Writing _your own_ game engine is already premature optimisation. Same with a packet processor or any other example.

    • @elpapichulo4046
      @elpapichulo4046 ปีที่แล้ว +4

      ​@@christopherg2347 lol

    • @Rodrigo-me6nq
      @Rodrigo-me6nq ปีที่แล้ว

      Classes are structs with an extra param, interfaces are pointers to a callback table. Both are commonly used patterns in C; if using them degrades performance for you, you're using them wrong.

    • @christopherg2347
      @christopherg2347 ปีที่แล้ว

      @@elpapichulo4046 Why are you writing your own game engine then?
      What actual issue did you discover?

    • @sthex4640
      @sthex4640 ปีที่แล้ว

      @@Rodrigo-me6nq In the examples I've given, you're often not supposed to be using them *at all* if you care about performance. That was the point.

  • @wholenutsanddonuts5741
    @wholenutsanddonuts5741 ปีที่แล้ว +2

    Can you do an episode on distillation as it applies to neural nets? I’d really appreciate that. TY!

  • @mirvessen
    @mirvessen ปีที่แล้ว

    I like to optimize code for readability

  • @deepakr8261
    @deepakr8261 ปีที่แล้ว +1

    And on ARM (and some other archs like RISC-V as well) you want to write your loop to count down to zero instead of counting up to the loop count, as they have a decrement-and-skip-if-zero style of instruction, which is faster than loading a value from a register, comparing two registers and deciding whether to branch. RISC-V even has a register, x0, dedicated to storing the value 0. But again, does all this fancy optimization make a real difference if you do it in places that are not hot spots in your code? Probably not.

  • @RealCadde
    @RealCadde ปีที่แล้ว

    2:20 He is right, it's better to make the program modifiable. If you just make it work but the code is a bloody mess, you are going to have to re-write a lot of code when you finally get around to optimizing it. Meaning you not only develop the same program TWICE, but you also have to replicate how the horribly written program works in the new optimized version, minus the slowdowns or resource expenses.
    Premature optimization might be the root of all evil, but badly written code is like trying to find a way to divide by zero.
    EDIT: I personally tend to do "budget programming" and by that (a term i just made up myself) i mean that every feature of my program has a budget. It can take 10 ms to execute, but not more than that or i will most definitely have a frame drop on my target hardware at 60 FPS since it will coincide with everything else that eats another 6.666 ms.
    Or this feature must not use more than 100 mb of RAM, since the rest of the program only uses 100 mb of RAM in total. I don't want runaway memory usage this early on because re-factoring later to use less just means i've wasted my time developing the first example and have to redo everything anyways.
    I've never concerned myself with power usage because i am not developing for phones or laptops or server farms etc.
    It's all speed and memory usage and in some cases, network latency/bandwidth.
    I plan ahead of time what my goals for performance are and if a newly added feature misses those goals, i optimize them right away. As developing on top of those unoptimized features will just mean i have to change/refactor more code later on anyways to meet the goals.
    It's as if i include execution time and resource usage into my unit tests.

  • @himselfe
    @himselfe 9 หลายเดือนก่อน +4

    The quote "premature optimisation is the root of all evil" is the root of all evil. The quote is outdated and lacks important context from the original passage, and is used by people to excuse writing bad code. It is far better to practice doing it correctly the first time than it is to adopt the attitude of "good enough, we'll fix it in post". Looking at the software industry, there's hardly a pandemic of optimisation going on. Efficient code is an exception not the rule, and that holds the world back in many ways.

  • @TheVoidSinger
    @TheVoidSinger ปีที่แล้ว +1

    There's also optimization for the language: not all languages have these well-developed compilers, and that's something to know up front. Custom/scripting languages especially tend to be very literal/rigid in what their compilers emit, so knowing and using at least a common basic set of optimization techniques while writing can go a long way in producing better functions and programs. In those kinds of cases maintainability and dev time are usually sacrificed, and either run time or, less often, memory reduction is the goal, with things like power consumption often ignored entirely.

  • @federicomoya4918
    @federicomoya4918 ปีที่แล้ว

    Great content, thank you!

  • @MikelNaUsaCom
    @MikelNaUsaCom ปีที่แล้ว +1

    another idea here is optimization for the user experience... users are fine with a screen presenting a progress bar while loading... updating the progress bar will take more resources and cause the program to take longer to execute, but keeping the users updated and involved in the process can make all the difference in getting them to have a good time while using the program, with better reviews and a better user experience, even though it actually takes longer to execute... the user enjoys it more... so there are trade-offs, in human experience, which programmers don't usually optimize. As an example, I put a bunch of code that loaded while the screen was loading... and then when the users pushed the execute button, most of the work was already done, and the result came up faster... but most of the work was done before they pressed the button. BTW, disk access is orders of magnitude slower than memory access, so it can be counterintuitive to pre-emptively load lots of data into memory from disk, but once it's in memory... performing a bunch of recursion on memory only will be several orders of magnitude faster than accessing disk during the recursion. Just my 2 cents. =D

  • @simonclark8290
    @simonclark8290 ปีที่แล้ว

    Optimising code for speed IS optimising for power. Software that executes a task faster allows the processor to sleep for longer and that reduces power.

  • @HPD1171
    @HPD1171 11 หลายเดือนก่อน

    I like how what you call "pseudo code" is actually just compilable C code.

  • @not_a_human_being
    @not_a_human_being ปีที่แล้ว

    As a pythonista, I simply MUST stress, that "readability matters". Optimising for readability is a thing (not necessarily the same as PEP 8).

  • @casperes0912
    @casperes0912 ปีที่แล้ว

    Sean was wise. Premature optimisation is evil. But so is unoptimisable architecture. Fundamental structure matters a lot

  • @statphantom
    @statphantom ปีที่แล้ว +2

    it's probably also worth mentioning that bit-shifting right doesn't always act as a divide by 2, especially with negative numbers. Most current versions of C/C++ (and I think Java) follow a truncate-toward-zero mantra for division, but older ones may not, and some other languages may not either

    • @mytech6779
      @mytech6779 ปีที่แล้ว

      The behavior of a bit shift has little to do with the high-level language; it is determined by hardware. Normally it is just truncation, but there may be differences in how the new most significant bit is filled for negatives (keeping note of signed vs unsigned).
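      The difference described above in a minimal C sketch: `/` truncates toward zero, while an arithmetic right shift rounds toward negative infinity, so the two disagree for odd negative values. (Right-shifting a negative value is implementation-defined in C, though mainstream compilers do an arithmetic shift.)

      ```c
      static int halve_div(int x)   { return x / 2; }   /* rounds toward zero */
      static int halve_shift(int x) { return x >> 1; }  /* assumes arithmetic shift:
                                                           rounds toward -infinity */
      ```

      For non-negative values (or unsigned types) the two always agree, which is why compilers only apply the bare-shift substitution when they can prove the operand can't be negative.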