Two Decades of Hardware Optimizations Down The Drain

  • Published May 12, 2024
  • Credits:
    Christian Mutti's original blog post: chrs.dev/blog/clean-code-rust/
    Rust code used in the video: gist.github.com/chrsmutti/698...
    Casey Muratori's Video: • "Clean" Code, Horrible...
    Clean code philosophy Gist: gist.github.com/wojteklu/73c6...
    x86_64 Instruction Set Reference: www.felixcloutier.com/x86/
    Rust mascot Ferris the crab: www.rustacean.net/assets/rust...
    Icons: primer.github.io/octicons
    Photo of Robert C. Martin (CC BY SA 4.0): en.wikipedia.org/wiki/File:Ro...
    I Interviewed Uncle Bob - ThePrimeagen: • I Interviewed Uncle Bob
    Backing soundtrack "Keys Of Moon - Somewhere in the Clouds" is under a Creative Commons (CC BY 3.0) license.
    • 🕯️ Free Relaxing Medit...
    Code used to render this video: github.com/lavafroth/videos/t...
  • Science & Technology

Comments • 444

  • @lavafroth  1 month ago +261

    Errata:
    0:30 - I misspelled "understandable"
    2:08 - missed "accum1" variable declaration (thanks @morels)
    6:25 - Technically the addresses are offsets from the image base (thanks @qwendolyn5421)

    • @xClairy  1 month ago +19

      Thought that was intentional

    • @vilian9185  1 month ago +2

      @@xClairy lmao same

    • @TheOriginalDuckley  1 month ago +8

      Genuinely thought it was just a funny-ass mistake; as a programmer myself, I know it's super easy to make them, and it's so obvious once you see it!

    • @FruchtcocktailUndCo  1 month ago +7

      understandable.

    • @vilian9185  1 month ago +4

      @@FruchtcocktailUndCo have a nice day

  • @undergrounder  1 month ago +888

    See? It’s not me, it’s the compiler.

    • @monad_tcp  1 month ago +48

      In this case it's the compiler. C# is able to produce much better optimized code that actually uses SIMD in those cases.
      That's the problem with using vtables in statically compiled languages: they basically can't optimize that.
      If you're going to rely heavily on OO patterns, don't use C++ or Rust; use C# or Java, which are much better optimized for that.

    • @TapetBart  1 month ago +18

      @@monad_tcp compile-time polymorphism, on the other hand, is based.

    • @monad_tcp  1 month ago +12

      @@TapetBart ironically, Objective-C was good at that.
      C++ can do it sometimes (that is, if you don't use virtual calls).
      The real problem is that inlining code across polymorphism is a hard problem.
      When you use templates in C++ it can do it by creating lots of copies; your binary gets huge, but it's fast.

    • @mihneabuzatu784  1 month ago +3

      @@monad_tcp how could C# apply SIMD in the dynamic dispatch example without knowing what the area function does at compile time? Or are you talking about applying it at runtime?

    • @animarain  1 month ago

      Best comment ever!! 🤣

  • @12q8  1 month ago +631

    This is a very niche example, but it shows the point.
    I was taught everything Clean Code at uni as well. I remember asking one of my professors the same question: "isn't clean code less performant?"
    And his answer was yes, but it is also easy to read, understand, and modify, and computers are so fast nowadays that, in most cases, the performance impact is negligible.
    In real-life situations, what I've seen in industry is the intuition experienced and knowledgeable developers exhibit in knowing when clean code should be applied over performance, and when performance is needed over clean code.
    Something my previous manager taught me.

    • @thisguy.-.  1 month ago +37

      Not trying to dispute your experience, as I'm not even employed yet, but I think the example does heavily apply to extensible libraries, something which is more significant to the industry than you give it credit for here. They must keep a minimal performance overhead at every step, since they're the backbone of every app in existence. In this example the polymorphic "clean code" is - as stated - great for functionality with external crates, but for internals, enums are practical not only for performance's sake but even for usability.
      For end-user applications, sure, computers are fast and you probably should just not care. But if we apply that logic to all areas of the industry, then that's how we get the mess that is Electron, npm, and embedded apps that lag for several seconds every time you press a button on a remote. Like you said, it's on the developer to know when they need to write good code or not, but writing performant code is much less niche than you made it out to be here.

    • @liquidsnake6879  1 month ago +33

      Depends what you're doing. If performance is critical enough to you, then knowledge and application of direct assembly language is a requirement even to this day; no compiler can be trusted to be performant in all scenarios. For the overwhelming majority of programs, even a 5x slowdown in a particular calculation produces no significant user-noticeable difference in the whole final product.
      But obscure code that is hard to modify and maintain produces daily headaches for your team and for your users as your team struggles to keep the product stable. That's why clean code matters and is superior to concerns of performance most of the time. Of course that's not ALL of the time, and that's why we still get paid and haven't been taken over by AI, and probably won't be.

    • @12q8  1 month ago +9

      @@thisguy.-.
      I meant the example of creating a billion shapes, and the performance benchmark based on adding the areas.
      Though it delivers the point.
      I don't think it applies completely to extensible libraries, since there are generally many with their own pros and cons, and new ones get made a lot. One thing you'll see once you start working is how diverse the requirements can get, and many times you'll have to read through the source code to figure out why something is not doing what it should (and then discover the devs added some automagical stuff that changes the params some 4 functions deep), and tracing that is made way easier with clean code rather than code obfuscated for performance purposes.
      I've also had to write some projects that used the decorator pattern, and the source code being readable and documented helped me understand how everything works; knowing a library inside out helps avoid so many headaches and so much looking up/asking AI what is happening.
      Another thing you'll find is how many workarounds exist for the various problems you've listed, which usually boils down to figuring out what a command does and asking yourself "do I need it to do all of this?"
      One such case is monorepos and pulling them. Pulling doesn't just download the source code, but the entire git tree, with all its commits and branches, and you really don't need that. With some reading of the docs, and some handy Medium articles, you can bring that down from 30 minutes to mere seconds.
      You probably also had your professors forcing you to read the man pages and documentation, thinking it was useless when you could just look it up or ask ChatGPT, but what they were trying to teach you is to be comfortable reading through the docs, because you'll use many libraries, and to learn how to use them there won't be CS classes or office hours; you'll have to read the docs.
      I didn't learn this at uni, because I really thought it was a waste of time, but mostly through personal experience hopping over to daily-driving Linux and using libraries at work for scripts that use Docker and kube.

    • @taragnor  1 month ago +16

      Clean code is generally the rule, with the exception being performance-critical sections. And sometimes the benefit of the clean code is worth it. Here we see an extremely simplified example: a shape with one method associated with it, taking the area, with 4 more permutations of shapes. Blow that up to 12 different shapes with 20 different potential methods, and your code becomes a nightmare to read if you rely exclusively on enums and match statements. The thing is that clean code standards are mostly for large projects, so it will usually seem nonsensical to apply them to simple toy programs. It seems simple when it's just taking the area of a few shape types, but in real projects you'd probably have rotate, draw, scale, move_vertex, detect_intersection, and all manner of other functions that go with those shapes. And what at first seems "not that bad" can turn into a bloated-looking mess in a hurry.

    • @andreasvaldma8428  1 month ago +7

      Yeah. For example, when you're writing web services, most of the time is spent not on computation but waiting for I/O. In that case it's unreasonable to prefer performance. If you're writing a game engine, it's a different story.

  • @amongussuss341  1 month ago +79

    Because of the Heavies and the two decades in the title I thought this was about TF2

  • @sdjhgfkshfswdfhskljh3360  1 month ago +324

    Readability is just another variable to optimize for, like RAM and CPU consumption.
    Sometimes you optimize for humans, sometimes for computers - it depends on what is more important in a specific place in the code.

    • @cerulity32k  26 days ago

      Exactly. Low-level and heavy math code is super optimized for computation; just look at Q_rsqrt.
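      (For the curious, a minimal Rust port of that Quake III trick, valid for positive finite inputs only; the magic constant is the famous 0x5f3759df:)

        // Sketch of the fast inverse square root: reinterpret the float's
        // bits, take a magic first guess, then refine with one
        // Newton-Raphson step. Assumes x > 0.
        fn fast_inv_sqrt(x: f32) -> f32 {
            let i = 0x5f3759df - (x.to_bits() >> 1); // bit-hack initial guess
            let y = f32::from_bits(i);
            y * (1.5 - 0.5 * x * y * y) // refine: y ≈ 1/sqrt(x)
        }

        fn main() {
            // ~0.4998 vs 0.5 for x = 4.0
            println!("{} vs {}", fast_inv_sqrt(4.0), 1.0 / 4.0f32.sqrt());
        }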

    • @tykjpelk  15 days ago +7

      Right. Code that does the wrong thing really fast because you didn't catch a bug isn't very successful.

    • @Krmpfpks  1 day ago

      In addition: compilers get smarter, architectures change. If you do not need to solve an actual bottleneck right now, it's usually better to write readable code, as hand-optimized code won't even be faster in a year or two if you don't adapt it to whatever new CPU generation is out then.
      But I'm guilty too; I have squeezed out the last bit of performance writing unreadable branchless code and other optimization techniques…

  • @scotmcpherson  1 month ago +280

    This is why I practice what I call Clean Enough Code... not just because you lose access to the hardware, but also because there are some software optimizations that just aren't "clean" by Uncle Bob's definition.

    • @Carltoffel  1 month ago +46

      It really depends on how critical the code is. Usually, most of the runtime is spent in a tiny fraction of the code base.
      And keep in mind: Fast is better than slow, but slow is better than unmaintainable.

    • @scotmcpherson  29 days ago

      @@Carltoffel I am assuming you read "clean enough"?

  • @andrewtran9870  1 month ago +235

    Insanely high quality content for how small the channel is. Complex concepts are explained simply, and the visuals are clean and work astonishingly well to help demonstrate the topics.

  • @exotic-gem  1 month ago +278

    I’m surprised the loop “unrolling” into 4 SIMD accumulators isn’t done by the compiler directly; it seems like something it should be able to figure out.
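    For reference, the pattern in question looks roughly like this (a minimal sketch on a plain slice of floats, not the video's exact code):

      // Four independent accumulators break the serial dependency chain,
      // letting the compiler keep four SIMD lanes busy per iteration.
      fn sum4(values: &[f32]) -> f32 {
          let mut acc = [0.0f32; 4];
          let mut chunks = values.chunks_exact(4);
          for chunk in &mut chunks {
              for i in 0..4 {
                  acc[i] += chunk[i];
              }
          }
          // leftover elements that didn't fill a chunk of four
          let tail: f32 = chunks.remainder().iter().sum();
          acc[0] + acc[1] + acc[2] + acc[3] + tail
      }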

    • @Hardcore_Remixer  1 month ago +33

      Hey, I'm surprised it figured out that it can do SIMD by itself. Until now I've been doing it myself using Intel's intrinsics.

    • @Jason9637  1 month ago +188

      It's actually not allowed to, since reordering floating point operations can change the result

    • @agsystems8220  1 month ago +61

      Floating point arithmetic is not strictly associative on computers. Imagine a list alternating between x and -x, for some precise x, with an odd number of elements. If you add them in a line you stay near zero and keep all the precision. If you split it off into 4 accumulators, each of them will not stay near zero, and you will lose precision.
      This isn't loop unrolling, because loop unrolling maintains the ordering of the loop. You are talking about telling the CPU that it is an associative fold of a map, and that it can do the adds however it sees fit. Corner cases prevent that.

    • @catgirlQueer  1 month ago

      @@Jason9637 just have some fun safe math optimizations! (-funsafe-math-optimizations) it'll be fine

    • @LucasSantos-ji1zp  1 month ago +12

      @@Jason9637 I wonder if the compiler would do this optimization if we enabled relaxed floating point operations with compiler flags.

  • @R.B.  1 month ago +92

    Clean code isn't about writing the fastest code; it is about writing maintainable code first and foremost. You pointed out the disadvantage of using enums, that a derived class wouldn't be able to inherit the area method from its parent, and that is precisely why it is less desirable. There is a time and place where optimization is necessary in a way that will break clean code, but for most programming the algorithms are not often the performance bottleneck anymore. If you've written something where that is the bottleneck, you should ensure you have your validation test cases and then work on the optimization.

    • @georgeweller1  28 days ago +9

      Yeah, but hotshot kids aren't interested in being good developers; they want to be 10x con artists who hop from interview to interview and write shitty blog posts about how everyone else is an idiot and they alone have figured out the truth.

    • @BaremetalBaron  28 days ago +18

      The real problem isn't performance, the problem is that "clean code" LEADS to unmaintainable messes and the advice is simply BAD. I don't avoid clean code because it's "slow", I avoid clean code because that much indirection and abstraction leaves you constantly digging through over-engineered glue code trying to figure out where anything is, often splitting a simple set of steps into multiple methods in multiple classes across multiple files, so that you have to maintain this deep call graph in your working memory, scrolling around, and tabbing through files, for something that could all fit on screen inline in order.
      It also becomes a source of insidious bugs, because, if something is extracted into a function that doesn't need to be, you now have to worry about when it can be called, what can call it, and under what pre-conditions that produces valid results, whereas inline code simply flows from one statement to the next and the state and flow of execution are obvious (barring say, goto shenanigans).

    • @R.B.  27 days ago +6

      @@BaremetalBaron there are over-engineering problems for sure, but there's also a balance which can be struck while still allowing for modular design. If you're building a class which isn't sealed, then concrete implementations, as described, will make maintenance more difficult. It sort of depends on what you're building. For anything significant in size, abstraction allows you to deconstruct the problem into manageable-sized chunks.

    • @delphicdescant  27 days ago +3

      Maintainability doesn't need to be the #1 priority.
      If you're writing some dull enterprise software "solution" for some mind-numbing corporate web junk, then sure, prioritize maintainability if you want.
      But that's not everybody. So it would be nice to stop hearing the so-called "clean code" doctrine applied so universally.

    • @BaremetalBaron  27 days ago +8

      @@R.B. I'm not arguing against abstraction, I'm arguing against the specific recommendations of Uncle Bob's "Clean Code" on how to factor your codebase. The recommendations lead to over-factoring, which actually makes it *harder* to understand the system as a whole as it gets larger, because it increases the surface area of the code.

  • @KiraSlith  25 days ago +8

    IMO the most interesting thing to come out of this video is actually in the comments, namely the diversity in what people consider "clean code" while assuming they're all talking about the same thing. Some see "clean code" the same way I do: laying out code and functions in blocks of self-explanatory code with plain-text variable names. Others see "clean code" as a matter of using the simplest code possible to achieve the desired output, or using the fewest lines, or modularizing the code into meta-packages, or just using or not using specific functions, or some arbitrary combination.

  • @yuack2398  1 month ago +39

    I write and run programs for HPC. In my field, polymorphism is still used in some highly optimized programs, which are designed to run on thousands of computing nodes in a very efficient manner. They don't use those clean code patterns deep inside the code, as that is not compatible with optimization, but the patterns are still a good solution for managing other parts of the programs, where performance doesn't matter.

  • @lucasmontec  1 month ago +257

    Only the insane or juniors really believe any code/architecture idea is to be followed blindly and everywhere, to its core. Clean code was created in a large-application context, with up to hundreds of developers working on the same system at the same time. The idea is to make people able to actually write code together, not to generically "write better code".
    The title is misleading and the content doesn't justify your point. You wrote really simple code, code that requires almost no architecture, for a very resource-intensive task, with only one developer working on it, in a modern language that is intended for performance. This is like coding an HTTP server in assembly to justify that Apache is bad, or writing an FPGA image processor to say that OpenCV is slow.
    You can indeed write VERY EXPENSIVE, hard to maintain, ugly, and super fast code; most people can. But that code is not at all what most people work with every day, which is very far away from the bottlenecks of any system. Most code can and should be slower if it can be fixed and adapted faster. That's exactly what clean code and clean architecture were about: making software soft, easy, adaptable. The computer will always understand you, even if your orders are stupid. People won't. People are also much more expensive.

    • @lavafroth  1 month ago +50

      Part of the problem is coming up with a proper middle ground. Blind use of clean code can prevent the compiler from catching places where it can optimize your code (including enabling SIMD). Of course, if traits were completely useless, languages wouldn't ship them, but keeping track of these layers of abstraction is also mentally taxing.

    • @adamhenriksson6007  1 month ago +11

      Also, "clean code" add boilerplate-complexity and structure to keep the code from not getting spagetty. One thing I noticed is by doing the easiest possible thing (most primitive, CPU friendly that is), not only is everything easy to understand since it is less structure and less code, but it also is easier to create new features, features can be created with less code, and changes are easier and faster to perform since the program is easier, smaller and with less boilerplate.
      Simplicity has a multiplicative snowballing effect that simplifies and improves the efficiency and output of all future work, which also further snowballs.
      Imagine this but the exact opposite... This has been my exact experience with all pattern-heavy OOP development during my 5+ uni years and all OOP production codebases I've seen so far that relies on OO patterns.
      Also, I even dont care about DRY anymore if it means i can avoid using the worst programming concept ever imagined, "abstract" 🤮

    • @0xO2  1 month ago +2

      From my POV, all such clean big apps get rewritten or abandoned anyway.

    • @lucasmontec  1 month ago +8

      @@0xO2 your POV is not science. Most professional enterprise applications are clean or "cleaner". Most of the backends in the world are like that too.

    • @lucasmontec  1 month ago +6

      @@lavafroth I'm not sure how it works in Rust, but at least in Java (where clean code was created) and C#, it's not at all taxing. A layer there is a directory, a folder. There are 3 layers. Not taxing at all. Not only that, you can ignore the layering and focus on other aspects. You can extract methods to have smaller units, for example, and counter the stack frames by asking the compiler to inline. You can extract variables from inside if conditions, so developers can read your code without having to read comments or interpret complex instructions. You can minimize class sizes, making classes do only one major thing... Again, you don't need to do all of this at once, nor force any of this where it doesn't fit. It's just 80/20 or 90/10! You don't need performance in most of your application, and the readability will pay much more than performance.
      If you need super high speeds somewhere, you can then write just that performance-critical part closer to the metal, even calling SIMD or other parallel instructions directly. I work in game dev, and where we need to talk to the GPU it is NEVER clean. It's usually ugly, yet organized, but always hard to read. It needs to be really fast.
      My whole comment is about context. Architecture is always contextual. It's like precision in machining: it's costly, slower, and not necessary everywhere. Still, precision is fundamental and allows machines to work longer, faster, and with less stress. No one grinds every face of every part to a few thou, though. For web apps the layering is well defined, and actually, for most applications that have outputs, it's usually: model, business logic, presentation. If those layers don't make sense for your app, maybe your app doesn't need them. Maybe your code is not big enough; most systems and apps are not big enough for such considerations. Just keep in mind the problems it's trying to solve.

  • @noobdernoobder6707  1 month ago +11

    "unederstandable" is a beauty in itself.

  • @max_ishere  28 days ago +14

    I can't believe you had TF2 and Rust in the thumbnail

  • @Rudxain  27 days ago +3

    I agree with the "moral of the story", but we should remember that premature optimization is bad, especially without profiling/benchmarks! @WhatsACreel talked about branchless programming, and why some "clever" optimizations that remove conditional execution might confuse the compiler, causing it to emit suboptimal code.
    I recommend Michael Abrash's "Graphics Programming Black Book: Special Edition", where he uses greatest-common-divisor as an example: if we find the GCD using the verbatim definition, it runs in O(2^n) (where n is the bit-length), but if we use the Euclidean algorithm it runs in O(n^2) (same as multiplication). Both algorithms are simple and readable, but Euclid still wins.
    I'm aware those examples have nothing to do with polymorphism, but they do impact performance
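    To make the contrast concrete, a hedged sketch of both (assuming positive inputs; not Abrash's code):

      // Naive GCD straight from the definition: try every candidate
      // divisor downward. Work grows with the *value*, i.e.
      // exponentially in the bit-length.
      fn gcd_naive(a: u64, b: u64) -> u64 {
          let mut d = a.min(b);
          while a % d != 0 || b % d != 0 {
              d -= 1; // at worst walks all the way down to 1
          }
          d
      }

      // Euclid's algorithm: repeated remainders, polynomial in bit-length.
      fn gcd_euclid(mut a: u64, mut b: u64) -> u64 {
          while b != 0 {
              let r = a % b;
              a = b;
              b = r;
          }
          a
      }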

  • @Blacksun777  1 month ago +11

    If not using clean code made you "automatically" program with knowledge of compiler/HW optimizations, yes. But it doesn't. I find performance issues arise not from inefficient for loops and the like, but much more from complex interplay in the system architecture. The only place I've needed optimizations like the ones shown was when doing benchmark optimizations on toy examples.
    Actually, it would be interesting to see how to take advantage of these HW optimizations. More of them, and also in different types of languages (interpreted, intermediate-code-based ones).

  • @zerotwo7319  1 month ago +123

    If your project values speed, you create that pattern, write documentation, and that becomes your modular clean code. It is all about your needs, not some random guy's rules.

  • @mikkelens  1 month ago +52

    Polymorphism does not inherently mean dynamic dispatch, and it disappoints me that you don't mention the static dispatch you can get with impl Trait in Rust, which in many cases is the perfect solution to a lot of these problems, potentially even being more efficient than grouping types using an enum. The only reason you avoided this is that you can't use impl Trait as a stand-in for multiple arbitrary implementations at the same time (they need to be in a vector with the same internal element size), but this is only something you did for your benchmark, without arguing why you needed it. Why do the shapes in your implementation need to be in the same vector? It seems incredibly arbitrary to me, and it kind of ruins the narrative you're going for in the video, at least to me.
    This video is very well produced, and I think you can make great things (although you need a better microphone), but I do wish you were more critical of your "assumptions", if you will, even if the video concept was based on other literature. I really wish you dove into the type-level SIMD that is experimental in Rust right now, which is a way better solution to your problem of choice.
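    For readers unfamiliar with the distinction, a minimal sketch of static dispatch via generics (trait and shape names assumed, mirroring the video's example):

      trait Area {
          fn area(&self) -> f32;
      }

      struct Square(f32);
      impl Area for Square {
          fn area(&self) -> f32 {
              self.0 * self.0
          }
      }

      // Monomorphized per concrete T: no vtable, so calls can be
      // inlined and auto-vectorized.
      fn total_area<T: Area>(shapes: &[T]) -> f32 {
          shapes.iter().map(Area::area).sum()
      }

    The catch, as noted above, is that a single Vec can only hold one concrete T this way; mixing shapes brings back dyn Trait or an enum.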

    • @lavafroth  1 month ago +21

      True, nightly Rust SIMD is a better solution. Perhaps I'll make a more in-depth video in the future. Thank you for the criticism.

    • @yuitachibana8829  1 month ago +1

      One reason I can think of for using a single Vec is to handle entity processing in a simulation/game where ordering matters between different types of entities.

    • @WelteamOfficial  1 month ago

      @@lavafroth what do you think about his suggestion to use static dispatch?

    • @curatedmemes9406  29 days ago

      @@yuitachibana8829 YUI????

    • @asdfghyter  28 days ago +2

      @@yuitachibana8829 This is exactly why Entity Component Systems are more performant in game engines: they don't place all objects in a big bag of dynamically dispatched trait objects

  • @doom-child  1 month ago +13

    This is really well done. You made some pretty detailed stuff make intuitive sense.
    It sounds like English might not be your first language, so I thought I'd let you know about something that really tripped me up in your pronunciation. The word "register" is pronounced with the stress on the first syllable, as in "REG-ister", not the second. I thought for the first couple of times that you were saying "resistor" (where the stress is on the second syllable, as in "re-SIS-tor"). English is so weird.

  • @Ruzgfpegk  27 days ago +3

    As far as I remember, the "Clean Code" book is more a list of common issues arising in software development, paired with the "rules" Robert C. Martin came up with to avoid them.
    And not all the rules are meant to be applied to all projects.
    What's more important is to regularly check that we aren't "footgunning" ourselves or our colleagues, either in the present or the future, by recognizing in advance that we're heading in a tumultuous direction, and to act in time by applying one of the rules if it's applicable.
    I think I remember one of the takes being something like "organizing code can have an efficiency cost, but if you organize better the vast majority of your code that is not speed-critical, then everybody wins".
    The example here would be a speed-critical part, so it wouldn't need to follow every "Clean Code" paradigm closely.
    What negates two decades of hardware optimization is more often a lack of skills (not having the right indexes on a database, not caching what could be cached, not profiling the code, not using the right tool for the job, …) and knowledge (never having experienced lower-level programming and having no idea how the code runs in the end) than following a set of ideal rules too closely.

  • @X39  1 month ago +23

    I think you have a general misunderstanding here. Clean code does not sacrifice performance, as it operates under a different rule of software development:
    Rule 1: A program must be correct.
    Rule 2: Software must be maintainable, except where that collides with rule 1.
    Rule 3: Programs must be efficient, except where that violates rule 1 or 2.
    If performance is a requirement (e.g. a render target of 120 fps must be reached, but boxing would introduce too much latency), abandoning clean code principles is a must, as rule 1 would otherwise be violated.
    For all other cases, though, performance falls under rule 3. Clean code is hence very relevant, because most applications do not have a meaningful target speed beyond "nice to have". If specific sections require speed, such that the correctness of the program is otherwise hindered, violating some or all principles of clean code is a must.

    • @MayoiHachikuji88  27 days ago

      I don't believe in "maintainable" software. For any given problem there's an efficient solution; if requirements change, 90% of it will have to be thrown away.
      "Maintainable" software leads to legacy nightmares. It's quite literally easier to create something new. This garbage industry is plagued by a sunk-cost fallacy that could be avoided if the idiots investing money into it realized that, in the long run, it is cheaper to tailor N programs to N problems than to make one program with M parameters that can solve N problems.
      Ironic that you mention games, because games are proof that this works: video game companies literally throw away 90% of old game code and write new code for new games most of the time. The renderer is the only part that really cannot be thrown away, because it's an already-solved problem and there's nothing to rewrite.

    • @delphicdescant  27 days ago

      Ok, but what if someone says your rules #2 and #3 should be swapped? Whether or not your rules as written came from some very highly-respected source, they are still up for debate.

    • @X39  27 days ago +1

      @@delphicdescant They are the basic rules of our very industry. There ain't no debating.
      Software must be correct, maintainable and fast.
      You can't have fast, maintainable, correct, as your software then ain't running.
      Correct, fast, maintainable simply makes no sense.

    • @delphicdescant  27 days ago +4

      @@X39 > correct, fast, maintainable simply makes no sense.
      Why not? You're just stating "this is the way it is, and nobody can argue." So basically 100% dogma and 0% rationale.
      The "industry" shouldn't be in the practice of following magical spells and reciting doctrines.

    • @TheMelnTeam  27 days ago +2

      Maintainable is pretty important to sustaining both correct and fast if the software will see updates.
      If you're sure it won't and you're right, it's perfectly maintainable.
      Picking fast over maintainable will put correctness and fast on a timer. Sooner or later, probably sooner, you will need new software.
      But perhaps this is an acceptable sacrifice in some contexts.

  • @agsystems8220  1 month ago +29

    Arguably a better option would be a cleverer structure than a vector, one that partitions its elements by type. When you ask it to map this structure to area, it can look at the type's vtable once per block (rather than per shape), run the same code on each shape in the block, and then zip along accumulating the output once they are all done. In the dumb loop version we resolve the code for each individual shape at run time, despite them all falling into a small number of buckets that could easily be batched, with the code for each bucket inlined and optimized. The compiler has to do this because the functions it may be calling might not even exist yet.
    The next level would be to use a macro to implement the traits in question for vectors (as a newtype, for scoping reasons) of each specific type as a mapping. The code just needs to apply a map to the vector, and the compiler will unroll the loop and parallelize. Then our smart structure would contain a set of blocks that all implement the trait (via the vtable), but each individual block will run vectorized code over itself in a concrete way. Unfortunately the compiler doesn't seem to create vectorized versions of functions by default, so we need to hack it a bit. It also means we need to prespecify the concrete types that might exist, so we are not quite as extensible as we would like, but adding a new type would just be one line applying the macro to a new shape type.
    This toy example is easy to beat, because the area functions are essentially the same, the code is small enough that maintenance is easy, and the functions are tiny, so the overhead becomes the major factor. It doesn't need to worry about scaling problems that might come later. I would somewhat reject your assumption that the code presented is optimal, though, because you have fallen into a trap that abstraction steers you away from: when presented with a stream of objects of different types, you generally want to split it into streams based on type and handle each concretely. What you have done is collapse the types (which encode information for optimization) into one mega-type that you attempt to handle concretely. Here it is possible, and pretty efficient, but it doesn't generalize and is hard to maintain.
    The problem isn't that the types are abstract; it is that you are mixing abstract types with a concrete loop and expecting it to perform well. What you should be doing is abstracting away the loop into something that lets you leverage the type information in a smarter way.
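    A hedged sketch of that partitioning idea (trait and type names assumed): one trait object per homogeneous block, so dynamic dispatch happens once per block while the inner loop stays concrete:

      trait Area {
          fn area(&self) -> f32;
      }

      // One virtual call per *block* instead of per shape.
      trait AreaBlock {
          fn total_area(&self) -> f32;
      }

      // Any vector of a single concrete shape type is a block; this
      // inner loop is monomorphized and eligible for vectorization.
      impl<T: Area> AreaBlock for Vec<T> {
          fn total_area(&self) -> f32 {
              self.iter().map(Area::area).sum()
          }
      }

      struct Shapes {
          blocks: Vec<Box<dyn AreaBlock>>,
      }

      impl Shapes {
          fn total_area(&self) -> f32 {
              self.blocks.iter().map(|b| b.total_area()).sum()
          }
      }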

    • @MayoiHachikuji88  27 days ago

      It would be more clever to stop being lazy and stop pushing compile-time decisions to runtime.
      For example, in a raytracer, let's say you support spheres and planes. Well, okay, you also support collections of planes for things like cubes. Your planes can also differ: triangle vs trapezoid and so on...
      Tell me, why do we need this vtable bullshit for what should be a few separate handwritten loops?

  • @user-tv6sw3vt9q  16 days ago +2

    Clean code is optimization for the human operator and, by extension, human-operated constructs.

  • @chrsmtt  1 month ago +18

    Absolutely incredible! You made a complicated topic very easy to grasp, keep it up!
    Thanks for covering it in such detail.

  • @L1n34r  27 days ago +2

    I think it's important to remember the "why" behind approaches like clean code. Why do it? Because it reduces the time spent looking for bugs, having new team members understand the codebase, or changing / adding onto existing functionality. That's the why. Why not do it? Well, if you want to squeeze every drop of performance out of a specific critical section of code. So if your product is complex accounting software that needs to work with the least amount of bugs possible while requirements change once a year and older sections of code constantly need to be revised to support new accounting legislation, then maybe clean code is a great idea. If you're working on a video codec library, maybe speed is more important. I would say the vast majority of software written does not need to run as fast as it's possible for it to run, but it does need to be bug-free, easily maintainable, and written in a timely manner.

  • @thomashamilton564  1 month ago +8

    Note that the XMM registers are 128-bit, while most modern CPUs also have access to the YMM registers, which are 256-bit (AVX2); Rust doesn't use the YMM registers by default for portability, as you mention. It would be interesting to see what speedup you get (if any) from using target-cpu=native when compiling. Also note that, generally speaking, you don't need to unroll loops to get SIMD instructions; it seems kind of random and bad that the compiler didn't do it for you in this case.
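    For anyone who wants to try: opting in to the host CPU's full instruction set is a one-flag change (standard Cargo/rustc usage):

      # Let rustc assume the build machine's CPU features (e.g. AVX2):
      RUSTFLAGS="-C target-cpu=native" cargo bench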

    • @1e1001  1 month ago

      And if you have AVX-512, you also get the ZMM registers, which are 512 bits

  • @TehGettinq  1 month ago +2

    Btw, for library design in Rust: you can still use enum dispatch (the technique you showed using an enum instead of a dyn Trait) and allow users of the library to implement it for their own types. You simply need a variant that contains a Box<dyn Trait>, where the trait is the one the lib's user implements for their own type. That way the library user can extend it (it's still slightly less flexible).
    A problem with enum dispatch is that when you have too many types implementing the trait/functionality, it becomes harder to manage.
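    A hedged sketch of that escape-hatch variant (trait and shape names assumed from the video's example):

      trait Area {
          fn area(&self) -> f32;
      }

      struct Circle {
          radius: f32,
      }
      impl Area for Circle {
          fn area(&self) -> f32 {
              std::f32::consts::PI * self.radius * self.radius
          }
      }

      enum Shape {
          // Known variants are matched statically and stay fast ...
          Circle(Circle),
          // ... while library users plug their own types in here.
          Custom(Box<dyn Area>),
      }

      impl Area for Shape {
          fn area(&self) -> f32 {
              match self {
                  Shape::Circle(c) => c.area(),
                  Shape::Custom(custom) => custom.area(), // the only dynamic dispatch
              }
          }
      }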

  • @ImaskarDono  29 days ago +1

    This example works because the operation is very similar between shapes. If the method implementations were very different, with many instructions, there would be much less difference.

  • @DeathSugar  28 days ago +1

    There are two major notes here: a specialized algorithm over well-known data types will always outperform an algorithm over generalized types, and you can always write assembly which (sometimes) will be more efficient than the compiler's. But would it be maintainable in the long run? No. That's why clean code.
    You can write pretty clean code using iterator APIs instead (the map, filter, reduce approach); it provides a pretty efficient baseline with auto-vectorization (which means it will use SIMD, if possible). But there are a lot of kinds of tasks which aren't reducible and will not be optimized, so you should at least make it maintainable and fast enough, because you'll almost always have some room left to squeeze out more juice.
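    For instance, a minimal sketch of that iterator style (illustrative function, not from the video):

      // map/filter/reduce on plain data; `sum` is a fold the compiler
      // can often auto-vectorize once the closures are inlined.
      fn sum_of_large_squares(values: &[f32]) -> f32 {
          values
              .iter()
              .filter(|v| **v > 1.0) // filter ...
              .map(|v| v * v)        // ... map ...
              .sum()                 // ... reduce
      }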

  • @evlogiy  1 month ago +24

    If I encountered these split accumulators in code, I would be very angry, because in some cases they can make the code slower instead of faster, and readability is significantly compromised. Trying to steer the compiler implicitly never ends well, in my experience. If you need to force the compiler to apply SIMD instructions, do it explicitly by using std::intrinsics::simd.
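    For reference, a sketch of explicit SIMD via nightly's portable std::simd wrapper (module paths as of recent nightlies; this is an assumption, check the current docs):

      #![feature(portable_simd)] // crate-level attribute, nightly only
      use std::simd::f32x4;
      use std::simd::num::SimdFloat; // provides reduce_sum

      fn sum_simd(values: &[f32]) -> f32 {
          let mut acc = f32x4::splat(0.0);
          let mut chunks = values.chunks_exact(4);
          for chunk in &mut chunks {
              acc += f32x4::from_slice(chunk); // add four lanes at once
          }
          // horizontal sum, plus whatever didn't fill a chunk of four
          acc.reduce_sum() + chunks.remainder().iter().sum::<f32>()
      }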

    • @lavafroth  1 month ago

      std::intrinsics::simd is arguably better but not everyone uses nightly.

    • @JerehmiaBoaz  1 month ago +4

      @@lavafroth No it isn't; avoid hand-coding optimizations (which also negatively impact code readability) if the compiler can achieve the same. You've only benchmarked your optimization on one platform, so it could be slower on a different one, but that's also true for a hand-coded SIMD optimization, so the argument is moot unless you take it to mean that you shouldn't hand-optimize at all and produce clean code instead.

    • @felixjohnson3874  1 month ago

      I did a quick search, because it seemed odd that the Rust compiler wasn't doing this automatically, and it seems like crates like "slipstream" might be a better middle ground here

    • @JerehmiaBoaz  29 days ago +1

      @@felixjohnson3874 Creating 4 partial sums and adding them together can result in precision loss. If the floats range from very small to very large and they're sorted, the 4 partial sums will all have very different magnitudes (the first will contain very small numbers while the last will contain very large ones), and when you add those together you'll lose precision. IOW, the compiler is correct here.

    • @felixjohnson3874  29 days ago +2

      @@JerehmiaBoaz Yes, but we're talking about a precision loss that is magnitudes away from being relevant in 99.9% of cases. This optimization should, at most, be behind a compiler flag, not require alterations to the codebase to nudge the compiler into doing what we want. The cases where this precision loss matters are both literal and figurative rounding errors, and in every other case it's an optimization about as "free" as optimizations can get, yet it yields massive returns.
      As a general rule, if you need to restructure your code in a functionally meaningless way to nudge the compiler into doing what it already can and should be doing, that's a pretty significant problem. The vast majority of code will not be written with performance in mind, so as many free or near-free optimizations should be applied, or made standard practice, as possible. If every user application ran 10% faster, life would be a hell of a lot better overall, but those applications weren't written intentionally to be fast, because they didn't need to be. Getting those "non-performance-critical" applications to benefit as seamlessly as possible from every available optimization is important, because those are the easiest pickings and they still yield meaningful QoL improvements for the end user. I mean, that's basically the philosophy behind Rust at its core: "Dedicated people can make fast code; dedicated people can make safe code; how do we let non-dedicated people make fast and safe code?"

  • @qu765  1 month ago +37

    Uncle Bob is a strong supporter of the idea that reducing costs for developers is better than reducing costs for servers,
    which is less true with Google now spending more on compute than on developers,
    and also does not apply at all to anything front-end

    • @b.6603  1 month ago +15

      Hahahaha it ABSOLUTELY applies to frontend.
      If you think it doesn't, it means you have never worked on a big frontend.
      Of course resources are much more constrained in the frontend. But the way to get performance there is avoiding stuff like unnecessary repainting and blocking the event loop, not avoiding an extra function call.

    • @SKULDROPR  1 month ago +3

      I don't do front end so I wouldn't know. From what I can tell, front end sometimes has to run on very low power devices. Like a cheap TV, Chromecast or even a fridge for example. Performance would probably matter in these cases, wouldn't it?

    • @qu765  21 days ago

      @@SKULDROPR yea, that's what I'm saying

  • @tonik2558  27 days ago +1

    Adding in the fast_fp crate allows you to keep the clean code while also being faster than the hand-optimized version:
    running 6 tests
    test ma::tests::corner_area ... bench: 7,085 ns/iter (+/- 10)
    test ma::tests::corner_area_ff ... bench: 3,587 ns/iter (+/- 13)
    test ma::tests::corner_area_sum4 ... bench: 3,636 ns/iter (+/- 287)
    test ma::tests::total_area ... bench: 7,085 ns/iter (+/- 6)
    test ma::tests::total_area_ff ... bench: 2,903 ns/iter (+/- 31)
    test ma::tests::total_area_sum4 ... bench: 3,641 ns/iter (+/- 53)

  • @Markov39  29 days ago +1

    I think "clean code" is also more testable, which is very important for any product.

  • @Hector-bj3ls  1 month ago +22

    Just wanted to mention that Christian Mutti's blog was not the original; I'm sure many people have talked about this before, but the blog specifically cites a video by Casey Muratori: th-cam.com/video/tD5NrevFtbU/w-d-xo.html
    I think you should add it to the list of sources in the video description.

  • @felix30471  1 month ago +3

    Eh, I'd say that "debunking" is a bit too strong a word here.
    Don't get me wrong, I absolutely believe that the performance costs of dynamic vs static dispatch are worth talking and knowing about. They should be kept in mind, but they aren't the only factor that decides what is right for a particular situation.

  • @ujiltromm7358  18 hours ago

    There is one point that I haven't seen discussed much in the comments: the environmental impact of "clean code".
    Higher performance usually means a higher carbon footprint, because we associate it with higher energy consumption from newer, more power-hungry hardware.
    Here, however, higher performance is achieved through leaner, more efficient code; when that is not prioritized over maintainability, the result is higher energy consumption.
    If there is one argument in favor of "unclean" code, it would be its greenness.

  • @rafaels9790  28 days ago +1

    Kind of a bait-and-switch, as the title implies a dunk on Clean Code. Other than that, you do fantastic visualizations and easy-to-understand explanations throughout the video. Very good work!

  • @treelibrarian7618  16 hours ago

    As someone who writes a lot of AVX code in asm, it has become clear to me that the foundations of programming, the data structures and code-flow constructs* that were good bases for abstraction when they were designed back in the 1970s and 1980s, given the processing hardware of the time, are not a good fit for modern processing hardware. I can, however, imagine (though only vaguely at this point) what more suitable abstractions for modern hardware might look like**. The basis of C, and thereby C++ and nearly everything that has followed, was what the asm programmers found themselves doing over and over and over again back when C was created. The things I end up doing over and over are not the same things they were doing, because the hardware has changed. Creating highly performant and readable high-level code should be possible, but new base language constructs, and the ways of thinking that go with them, are needed to fit the highly pipelined***, SIMD-capable hardware we now have.
    What we have are trucks capable of carrying many tons of cargo each in an efficient way down a long straight highway, but we are asking them to navigate tiny streets with many tight turns to deliver just a single 1 kg package most of the time.
    * e.g. polymorphism through function-pointer tables attached to multi-data-type structures represented by a single pointer to an assorted collection of data items in memory
    ** a variety of bulk data types with associated higher-order functions that take lambdas the compiler can inline and properly optimize for the data types involved
    *** one of the biggest issues for a modern CPU is knowing far enough in advance what code it will be running. When it gets its predictions wrong, it incurs a heavy penalty equivalent to >100 instructions executed, and it's only getting worse as the execution cores get wider and the decode-pipeline stages before them get smaller. Code which doesn't have to decide which way to go based on something it will only know when it gets there allows execution to flow at the full speed the core is capable of.

  • @GLeD101  27 days ago

    The Box isn’t just slower because of the deref; it’s largely because of cache misses caused by the loss of locality

  • @xeviusUsagi  1 month ago +11

    Insane quality and explanation!
    If you keep this up, I can bet this channel will grow quite a lot 💪

  • @l4zycod3r  1 month ago +16

    “Premature optimization is the root of all evil”: the cost of dynamic dispatch is basically negligible when compared to heavy computational code. So this is an example of a very specific and unusual case where the actual function is too simple. In general it is better to write working code, profile it, find the bottlenecks, and optimize, if it is really a concern

    • @asdfghyter  28 days ago

      There are definitely cases where you can easily accidentally get dynamic dispatch in the tight performance-critical loop of a system in ways that are difficult to fix afterwards. One common example is when using OOP for a game engine. In this case, there is a risk that every single rendered object performs dynamic dispatch several times in every frame, which can have a significant performance impact when you have tons of tiny objects.
      If you have done this, you can't really fix this issue by making small tweaks, since it's built into the core of the design of your entire system. Your only options here are to either just cope with the issue and try to optimize other places or to make an entirely different system.
      This is exactly the issue that Entity Component Systems were made to solve. In addition to reducing dynamic dispatch, they also improve cache locality over OOP based game engines, by placing similar data together and they reduce branch prediction misses by handling one kind of object at a time instead of working with a big pile of mixed objects.
      In the end, you could still use polymorphism with an ECS, but you would move the polymorphism higher up, so it's not used inside a tight loop and so you don't get arrays of polymorphic objects

  • @kyoai  5 days ago

    I think it's important to keep in mind what the focus is. The point of "clean code" is readability, maintainability, and extensibility, not performance. On the other hand, this super-optimized code is very fast, but at a large scale very hard to maintain, read, and especially extend, as you'd have to edit dozens or hundreds of switch-case statements all over your codebase once you want to introduce a new type. The best solution, in my opinion, is a clean-code API/clean code for publicly accessible types, so they are easy to use and extend, while having highly optimized code in the internal details of your library, aka the code locations where the important work is done.

  • @Andrew90046zero  27 days ago +1

    To me, “clean code” is a broader idea about putting more care into how code is written, so that you’re not staring at some function for 2 hours trying to figure out wtf it does because the author tried to be smart and do manual optimizations that were only relevant 20+ years ago.
    Compilers now do a lot of the boring optimizations that are important, so we can focus on writing code the way we think about it in our brains.

  • @george-broughton  15 days ago

    I've known about radare2 for YEARS, and this video was what finally got me over the hurdle of actually getting to grips with it and understanding how to use it lmao

  • @foxiewhisper  1 month ago +2

    Every so often, the YT algo brings up gems like this. Love these early videos from new, hungry creators. +1 sub.

  • @Dr-Zed  1 month ago +1

    Incredible video. I loved the visual SIMD explanation!

  • @chasebrower7816  29 days ago +1

    IMO this is a matter of applying the right tool for the job. The vast majority of code you write in application-level software has zero performance implications, so long as it is written with the least bit of competence. The reason is that the few operations that have non-negligible performance cost are generally off-loaded to libraries that can handle that sort of thing. Clean Code principles are best applied to application development, not core framework/library/embedded/low-level development.
    If you were to look at any of the apps I've worked on recently, and you picked out random functions from my code, you could probably lengthen the execution time of that function by 100x and the user wouldn't even realize. In this case, I strongly prefer every bit of readability.

  • @JH-pe3ro  1 month ago +3

    There's a contrast to be made between "Clean Code" methods and "Thinking Forth". Clean Code is about managing the complexity you are presented with by, in essence, shuffling around the papers so that it looks nice. It exists within the reality of having a huge codebase with a lot of code that is usually executed just once during the program's lifespan to configure something, and therefore it never has a performance problem. Thinking Forth - and most of the ideas of Forth - is about defining down the problem until you don't have a complex problem, so you need to throw less hardware at it and you understand that hardware better.
    So when we are presented with something like x86 SIMD floating-point instructions, it's already "too complex to be good Forth". You would be advised to design a fixed-point solution instead, since you can make equivalent fixed-point numeric code that is more accurate for a given range, and use less silicon.

  • @asifzamanpls  27 days ago

    Very interesting. I noticed a similar performance boost when benchmarking loops for some database code a while ago but I had no idea it was due to SIMD extensions. I guess I should dig into the generated assembly more often.

  • @TheAlexgoodlife  1 month ago +2

    Really clean animations! What do you use to make them?

    • @lavafroth  1 month ago +2

      I use Manim (by 3blue1brown).

  • @ddystopia8091  1 month ago +2

    With DOD you would have each shape in its own array and work on one array at a time, with no conditionals whatsoever
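    A minimal sketch of that layout (shape set assumed from the video's example):

      // One array per shape kind; each sum is a branch-free tight loop.
      struct Shapes {
          circle_radii: Vec<f32>,
          square_sides: Vec<f32>,
      }

      impl Shapes {
          fn total_area(&self) -> f32 {
              let circles: f32 = self
                  .circle_radii
                  .iter()
                  .map(|r| std::f32::consts::PI * r * r)
                  .sum();
              let squares: f32 = self.square_sides.iter().map(|s| s * s).sum();
              circles + squares
          }
      }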

  • @rules1874  28 days ago

    why the fuck am I getting recommended Rust, I've never made a program in my life. Algo has blessed you bruh.

  • @PikeBot  27 days ago +1

    I can’t believe someone made a sequel video to Muratori’s garbage fire, a software talk so wrong - and so confident in its complete wrongness - that listening to it made me the angriest I’ve ever been in living memory.

  • @zorrozalai  1 month ago

    If you just want to know the total area of the circles, you can sum up the r^2 values and multiply the sum by PI. That should speed the code up further.
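    That hoist looks like this (a trivial sketch):

      // Factor pi out of the loop: one multiply at the end instead of
      // one per circle.
      fn total_circle_area(radii: &[f32]) -> f32 {
          let sum_r2: f32 = radii.iter().map(|r| r * r).sum();
          std::f32::consts::PI * sum_r2
      }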

  • @gustawbobowski1333  28 days ago

    Beautiful motion design, great vid.

  • @dawre3124  1 month ago +1

    I have never programmed in Rust. Coming from C, it would be cool to see the difference between default compiler flags, -O3, -Ofast, and -O3/-Ofast + -march=native for the different versions. I don't think C compilers would change float ops without the -Ofast flag, actually

  • @Gell-lo  27 days ago +1

    I'm here because of the thumbnail tbh. But it looks cool.

  • @shauas4224  28 days ago

    We are not even gonna talk about the Burst compiler. It blows my mind every time I use it

  • @MarkTomczak  1 month ago

    This is a very good breakdown of a specific optimization. I would say that, in general, the meta-reasoning here is "if you are going to optimize, you have to care about what the compiler actually does." The key question is "what code do I have to write to allow the system to take advantage of the SIMD features on my CPU?"
    And more generally, that might not even be the goal. Depending on the workload, it is possible that you want to fix this problem by bundling all these transformations into GPU workloads. But that's another optimization where knowing how to take heterogeneous data and homogenize it is useful.

  • @monsieurouxx  1 month ago +8

    Meh. I'm caricaturing, but this talk is more or less "don't use clean code in the GPU pipeline or in assembly". I feel like you're missing the point of clean code principles.

  • @codercommand  1 month ago

    Great video, but why did you not explain the third version? What's the major difference between an enum wrapping structs and structs that contain a constant/value/enum? I'm curious to know why one is faster than the other.

    • @lavafroth  1 month ago

      You're basically using a lookup table, as described. So in the ideal case, the floating point numbers are laid out packed next to each other in the struct. This makes it easier to load them into registers (look ma! no more vtables).
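      In sketch form (field names and coefficients are assumptions, patterned after the table-driven shapes idea rather than the video's exact code):

        // area = COEFF[kind] * width * height, one packed struct per shape.
        const COEFF: [f32; 3] = [
            1.0,                        // rectangle
            0.5,                        // triangle
            std::f32::consts::PI / 4.0, // ellipse (width/height as diameters)
        ];

        struct Shape {
            kind: usize, // index into COEFF
            width: f32,
            height: f32,
        }

        fn total_area(shapes: &[Shape]) -> f32 {
            shapes
                .iter()
                .map(|s| COEFF[s.kind] * s.width * s.height)
                .sum()
        }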

  • @i-am-linja  1 month ago +5

    I didn't know that guy's name but I instantly recognised his face. He's the guy who advocated for _one_ programming language for _every application._ I will never take one word he ever says seriously.

  • @ddre54
    @ddre54 29 days ago

    Great content and interesting insights. It would be interesting to see the same benchmark analysis in C, C++, or Java.

  • @gotoastal
    @gotoastal 27 days ago

    You can remove the `flake-utils` dependency from this flake by inlining the loop. This saves you an entire dependency and, if you include the lockfile, always nets fewer lines of code.

  • @SKULDROPR
    @SKULDROPR 1 month ago

    Pretty trippy how the compiler automatically handles changing things to SIMD so effectively. Always pays to have a look at the asm when performance is concerned. I am also mindful of what gets put on the heap, as illustrated in this video. I often think, "How can I do this while keeping things close to the CPU?", when performance is the primary concern of course.

  • @VanStabHolme
    @VanStabHolme 26 days ago

    I did some tests as well and I found out that:
    - A plain loop is always the slowest, since the compiler doesn't have much context to work with
    - A SIMD-hinted loop is fastest *if* you're pre-allocating everything correctly
    - Iterators are both idiomatic *and* fast, since they are specifically optimized by the compiler and have a size_hint method to automagically pre-allocate for you
    All of this assumes static enums; dynamic dispatch should be avoided and is poorly optimized by the compiler. Iterators are the slowest (even slower than a plain loop) when it comes to dynamic dispatch (see the sketch below).
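    A sketch of that last contrast; the dynamic side uses boxed closures purely for illustration.

    ```rust
    // Static: the compiler sees every operation and can vectorize the sum.
    fn sum_static(areas: &[f32]) -> f32 {
        areas.iter().sum()
    }

    // Dynamic: each element hides behind an opaque indirect call,
    // which blocks inlining and therefore SIMD.
    fn sum_dynamic(shapes: &[Box<dyn Fn() -> f32>]) -> f32 {
        shapes.iter().map(|area| area()).sum()
    }
    ```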

  • @ferdynandkiepski5026
    @ferdynandkiepski5026 1 month ago

    At this point most if not all CPUs have AVX2. The proper way is to use something like cfg-if to check register width. But for code run on a known target CPU, whether it's your own machine or a server, setting an explicit target-cpu flag in Rust would make use of the available instructions. The only case where you should worry about portability is CI/CD release builds; for the rest, it's safe to assume target-cpu=native, as cross-compiling is rare. And if someone does it, they know what they're doing.
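    For reference, the compile-time route is building with `RUSTFLAGS="-C target-cpu=native" cargo build --release`; the runtime route is std's feature-detection macro, sketched here.

    ```rust
    // Runtime detection via std; only meaningful on x86_64 targets.
    #[cfg(target_arch = "x86_64")]
    fn report_simd_support() {
        println!("sse2: {}", is_x86_feature_detected!("sse2"));
        println!("avx:  {}", is_x86_feature_detected!("avx"));
        println!("avx2: {}", is_x86_feature_detected!("avx2"));
    }
    ```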

  • @dexus340
    @dexus340 29 days ago

    I believe SSE and SSE2 are included in the x86-64 definition, so those extensions *should* be present on all 64-bit x86 CPUs.

  • @alexstone691
    @alexstone691 28 days ago

    Writing fast code from the start should fit under "premature optimization"

  • @colejohnson2230
    @colejohnson2230 1 month ago

    I wonder if one of the built-in reduction methods like fold would enable the same performance boosts without the added code smell.
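    An untested sketch of what that could look like, folding chunks of four into four independent accumulators to mirror the hand-unrolled loop; whether it vectorizes identically is exactly the open question.

    ```rust
    // Fold chunks of four into four accumulators, then combine them
    // and add the leftover elements that didn't fill a full chunk.
    fn total_area(areas: &[f32]) -> f32 {
        let mut chunks = areas.chunks_exact(4);
        let acc = (&mut chunks).fold([0.0f32; 4], |mut acc, chunk| {
            for i in 0..4 {
                acc[i] += chunk[i];
            }
            acc
        });
        acc.iter().sum::<f32>() + chunks.remainder().iter().sum::<f32>()
    }
    ```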

  • @madks13
    @madks13 28 days ago

    A bit late since the video just popped up in my feed, but I do have an argument: a 10M+ LoC project I am working on currently.
    All I want to point out is that Clean Code is a tool, and like all tools, you need to use it when appropriate.

  • @asdfghyter
    @asdfghyter 28 days ago

    I was wondering if the specific number 4 made a difference, since there are exactly four shapes and they get added separately to their respective accumulators, but looking at the generated assembly I'm guessing that it wasn't relevant.

  • @martandrmc
    @martandrmc 1 month ago

    The title made me think you were gonna talk about Spectre and how speculative execution will ultimately have to be dropped, but I was surprised that it was about SIMD instead!

  • @AbhayKumar-gl5hh
    @AbhayKumar-gl5hh 1 month ago

    Can I know the font you used to show the code?

  • @TTOO136
    @TTOO136 28 days ago

    This is really really interesting, wanted to comment to boost engagement :)

  • @TechnologyRules
    @TechnologyRules 28 days ago

    Thank you so much for this video.

  • @glitchy_weasel
    @glitchy_weasel 28 days ago

    What a fantastic video! So my takeaway is that clean code is not the best approach for number crunching - like the core of a simulation solver, a renderer, or similar. Rather, it is better to use clean code in places where functionality can be enhanced by it, rather than hindered. Thoughts? Also, funny thumbnail btw.

  • @shroomer3867
    @shroomer3867 27 days ago +1

    Meanwhile I'm here, happy that my code even runs to begin with...

  • @Bobo-ox7fj
    @Bobo-ox7fj 28 days ago

    Love the "bugger it, computers are quick nowadays, nobody will care if my tiny utility eats six gigs of ram, has an insane memory leak on top of that and needs a quad core processor to run without lagging" approach to programming... err, I mean the clean code approach.

  • @addmoreice
    @addmoreice 1 month ago +1

    Reg. Is. Ster.
    not Regist. Er.
    Other than that annoyance this was really well done.

  • @SteinGauslaaStrindhaug
    @SteinGauslaaStrindhaug 20 days ago

    The fact that you needed to use 4 accumulators to "trick" the compiler into using optimisations seems more like a problem with the compiler than anything. If you had written it in a functional way, in a language that either is purely functional, has some syntax to mark a particular function as pure, or has a compiler good enough to detect that a function is in fact pure, and then written the loop using a standard sum or reduce function, the compiler should know that applying a pure function over a collection is always parallelizable.
    When using a more manual loop construct like a for loop, or an iterator like in this Rust code, the compiler has to make a lot of assumptions about your intent to know whether it is parallelizable; but when you map a pure function over a collection, it's a clear sign of intent that you don't care about the order of operations, only that the function is applied to all of the elements.
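    The declarative form already exists in Rust; the catch is that f32 addition isn't associative, so the compiler won't reassociate a strictly ordered sum into SIMD lanes on its own - that permission is exactly what the manual multi-accumulator version grants it.

    ```rust
    // Map a pure function over the collection, then reduce. Clear intent,
    // but the float sum still specifies a strict left-to-right order.
    fn total_area(radii: &[f32]) -> f32 {
        radii.iter().map(|r| std::f32::consts::PI * r * r).sum()
    }
    ```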

  • @user-dc9zo7ek5j
    @user-dc9zo7ek5j 26 days ago

    I think I have seen the same video before; this one has animation while the previous one did not. As other commenters said, clean code does not have to be slow. Clean code does not have to be bloated, have one function per file, or be way too abstract with 5 layers of indirection. Most developers blindly apply what someone told them at the bootcamp, but bootcamps are way too short to show most nuances of development. Here is the blindly followed result: view model, controller, view model to controller DTO, IService, ServiceImpl, controller DTO to service DTO, IRepo, RepoImpl, service DTO to repo DTO. This is not only unreadable, but also hard to follow and extend, and multiple times slower than it needs to be, because devs add mappers and ORMs which make those indirections pointless, and the code is laid out in such a "maybe" way that the compiler cannot do much about it. Trimming? AOT? Inlining?
    Contrary to popular belief, abstraction and layers in some scenarios improve performance, because they allow developers to work with the bigger picture. For example, the most optimal function for reading one file, modifying it, and writing to another file will perform worse than one that caches the read part and has buffer abstractions which wait before writing bytes. You might say, "Well, I can optimize that," but when the logic is way too intertwined, it will be hard to see the bigger picture.

  • @ralfmimoun2826
    @ralfmimoun2826 1 month ago +1

    In "classic" programming languages, the first optimization would be to switch the loops: make "for shape..." the outer loop and "b.iter" the inner loop. I'd be surprised if that would not help here, too. And as long as it gives you the same result, it is a valid optimization.

  • @JodyBruchon
    @JodyBruchon 27 days ago +1

    LOL Rust. I'll be here in C world, thanks.

  • @Davy-oq9pn
    @Davy-oq9pn 1 month ago

    What's the code font? It looks great.

  • @zell4412
    @zell4412 1 month ago

    What font do you use here? 👀

  • @CalgarGTX
    @CalgarGTX 29 days ago +1

    I would argue that with the way most dev projects (on the business side, anyway) are being run these days, with devs coming onto and getting off the project left and right, it's more important to have a codebase that is clean/maintainable/understandable by the common-denominator dev than a codebase optimized to hell and back that only the first guy who wrote it can understand, leaving you stuck with dead code once he's gone.
    That's a sad thing to say, but I've seen it happen way too many times. The quality of the average dev these days is IMO quite low. And I'd rather have a project where whatever dev resource is available can actually work on fixing bugs or extending it to support a new use case/feature, than have them throw their hands up because they don't understand what the hell they are looking at and might break more things than they fix when touching anything.
    On a more philosophical level, I would have thought it was the compiler's job to turn whatever human-written code it's given into a performance-optimized runtime?

  • @t0rg3
    @t0rg3 1 month ago +3

    Is it possible that you completely missed the point of “make it run, make it clean, make it fast”? Yes, some of these abstractions are costly, but for the most part it doesn’t matter. For the one tight inner loop you will still want to use any optimization strategy in the book.

  • @-syn9
    @-syn9 1 month ago

    There's still more performance to be had here: you currently have to unpack the struct within the loop (array of structs). If you had laid out the vector from the beginning as a struct containing 3 Vecs, you could save some cycles on unpacking (struct of arrays).
    (Haven't tested this; interesting video, wish it went a bit deeper.)
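    Something like this untested sketch (field names are illustrative): each field is contiguous in memory, so loads arrive already "unpacked" for SIMD lanes.

    ```rust
    // Struct of arrays: three parallel Vecs instead of a Vec of
    // three-field structs.
    struct Shapes {
        coeffs: Vec<f32>,
        widths: Vec<f32>,
        heights: Vec<f32>,
    }

    fn total_area(s: &Shapes) -> f32 {
        s.coeffs
            .iter()
            .zip(&s.widths)
            .zip(&s.heights)
            .map(|((c, w), h)| c * w * h)
            .sum()
    }
    ```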

  • @henry-js
    @henry-js 1 month ago

    What font is that? I like it.

  • @anyalei
    @anyalei 1 month ago

    Super insightful video! I'd argue it's a tradeoff between maintaining/developing software and its performance. If performance optimisation takes utmost priority, clean code, maintainability, readability, etc. go out the window. They're just not a concern. Likewise, if you're developing software in a large team of not-entirely-illuminated programmers, sticking to abstraction to keep code from devolving into a tightly coupled mess, and you can just throw more compute at the problem, performance optimisation isn't super relevant. Truly outstanding software does both, but we all know that's just not what most companies are aiming at. Software is as shitty as it can get away with, and the _astounding_ hardware performance increases just get eaten up. It's tragic, really. A spreadsheet is just about as snappy these days as it was in 1998, because we added layers of VM that it has to run in.

  • @orbatos
    @orbatos 28 days ago +2

    I couldn't help but notice that you ignored Uncle Bob's clarification of clean code during the interview you seem to have watched. tl;dr: it's not a strict definition that would prevent you from using this type of optimization.

  • @MatichekYoutube
    @MatichekYoutube 1 month ago

    Wow, what typeface is that font? So cool.

  • @JG-nm9zk
    @JG-nm9zk 29 days ago +1

    resistor? register?

  • @shadamethyst1258
    @shadamethyst1258 1 month ago

    I think that once we get a `#[test]`-like macro that lets us gather a bunch of symbols, we'll be able to create a more restricted version of `dyn` on top of `enum_dispatch` instead :)
    Not quite as fast as the last version, but still maintaining the neat semantics of having clearly separated structs and impl blocks.
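    For reference, this is roughly the pattern such enum dispatch expands to by hand: separate structs and impl blocks, one enum, and a match that dispatches statically.

    ```rust
    use std::f32::consts::PI;

    trait Area {
        fn area(&self) -> f32;
    }

    struct Circle { r: f32 }
    struct Square { side: f32 }

    impl Area for Circle {
        fn area(&self) -> f32 { PI * self.r * self.r }
    }

    impl Area for Square {
        fn area(&self) -> f32 { self.side * self.side }
    }

    // The enum is the only place that knows every variant; the match
    // compiles to a branch or jump table, not a vtable lookup.
    enum Shape {
        Circle(Circle),
        Square(Square),
    }

    impl Area for Shape {
        fn area(&self) -> f32 {
            match self {
                Shape::Circle(c) => c.area(),
                Shape::Square(s) => s.area(),
            }
        }
    }
    ```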

  • @Boz1211111
    @Boz1211111 18 days ago

    Can someone explain? I have no software knowledge, but I really wish to know why hardware optimization went down the drain and how it is going to affect everyone.

  • @liamh1621
    @liamh1621 28 days ago

    The only thing Uncle Bob optimises for is book sales and speaking events

  • @afsinbaranbayezit6663
    @afsinbaranbayezit6663 1 month ago +33

    Nice video bro, but I disagree with the conclusion. What I've realized as I get more and more experience is that a smart but inexperienced engineer usually optimizes code, whereas an experienced engineer optimizes deadlines, scalability, reusability, and maybe most importantly their sanity lol.
    The moment your project becomes a little more complicated than a hobby project, not writing "clean code" makes it extremely difficult to develop, maintain, and especially modify. Working on a codebase that doesn't focus on clean code feels like re-inventing the wheel for every little bugfix or new feature.
    At the end of the day we need to recognize that code is only a tool for creating a product. So instead of focusing on how we can use our tools in the most optimal way, we need to focus on what the product we are developing requires. If it requires efficiency, we do that. If it doesn't, we can still focus on it for fun if we have the time. But in my experience, prematurely prioritizing efficiency on a serious project always ends badly.
    It is extremely difficult for a developer to break the habit of premature optimization though. It just feels really bad initially lol.
    Regardless, the video was really well made, especially for a channel of your size. Good luck with the channel 👍

    • @lavafroth
      @lavafroth  1 month ago +7

      Indeed, I personally think that one should have a very, very compelling reason to build an abstraction. I see a lot of folks reach for abstractions before even getting started. They end up with BuilderObjectFactoryFactory's.

    • @chrishenk4064
      @chrishenk4064 1 month ago +2

      Summed up my thoughts! I was surprised to be recommended such a small channel, but I see why; keep it up and you'll go places with it.
      By coincidence, I had a really compelling example of this at work this week. We're rewriting some of our software in Rust, and a lot of stink gets made over small items. I think a big part of it is that Rust is more up front about static vs dynamic dispatch than other languages, and the younger people are getting excited to go optimize. Most of the time, there are 1%-type differences.
      Today, though, I just finished a 7088% throughput improvement to our web service! How? No joke, almost all of it just came from switching from IIS Hostable Web Core to hyper.
      What I've really come to enjoy about Rust is that it doesn't make me choose between abstractions and performance. If you're pedantic enough, everything can be static. In most cases, that's overkill. But for certain things such as library code, it makes massive differences to do the boilerplate and tedium. Tower is an example: the types are really obnoxious, but they're a big part of why our hyper-based implementation has so little overhead.
      What I would encourage people to consider is "is the overhead, or the actual work, larger?". Avoiding Box with chained math, iterators, etc. is a big deal percentage-wise. Loading forms or database queries? Not so much. If you are mostly overhead, and you structure things so inlining can't happen, you get these penalties like in the video. That is worth keeping in mind, and it's why you'll see people really cringe at boxed iterators. For those of us using Rust, we get the best of both: I still get my nice iterator abstraction, and the compiler gets its chance to optimize.
      Also, I agree people are too quick to whip out "design patterns", and your BuilderObjectFactoryFactory's arise. But I think crates like tower demonstrate the power of zero-cost abstractions in functional programming. If you use generics, with traits, and dependency inversion, it's clean and fast.

    • @Alice-zj2gm
      @Alice-zj2gm 1 month ago +6

      @lavafroth Definitely agree on your point about the overuse of abstraction sometimes, but I think that's kind of a separate issue where a desire to have clean/modular code ends up becoming the problem it was trying to solve. There's no silver-bullet approach to development either way IMO. Every time you build something, you're making a series of tradeoffs (time/money/maintainability/performance/etc.).
      Two big reasons that would make me consider an abstraction at the start though:
      1. If you're writing code that does things like making network calls or interacting with a database, you might want to use mocks in your unit/integration tests. It's a lot easier to pass in mock DB clients and things of that sort if you're dealing with an abstraction layer like interfaces.
      2. It can make future changes/features/fixes a ton faster, as you can often change the implementation of something without needing to go and update the code in all the areas that use it.
      I've inherited a couple of codebases from colleagues who tried to optimize the performance of everything all the time with similar tricks where the performance gain didn't end up mattering very much, and those codebases are a nightmare to maintain. Our current team rewrote large sections of them to prioritize maintainability, stability, and scalability first. There was a slight drop in performance, but nobody noticed compared to the rest of the overhead we can't control. Regressions/bugs are way down, and adding new features is so much faster. So far, positive feedback from our new developers, and onboarding time has also improved dramatically. I think some people right now are just getting really stuck on the idea of more performance always being better, and that's really all we want to avoid.
      I've also worked with codebases where there was an insistence on everything being ultra abstracted/modular, and you end up with those "BuilderObjectFactoryManager" cases with terrible performance or build systems because of the insane amount of code that is effectively duplicated or not necessary. My guess would be that at some point there was a trend to overcorrect in the pursuit of "clean code", and now we're seeing the pendulum swing back towards a mindset of performance optimization first (lots of articles on this topic lately, especially with Rust/Zig). I think we will likely also overcorrect and see an increase in codebases that are similarly hard to maintain and work on because of it in the next 3-5 years.
      It's not a one-solution-for-everything kind of problem. Some people totally go overboard with too much abstraction, and some with performance chasing. I also personally feel that a strong reliance on polymorphism (specifically overuse of inheritance and subclasses) tends to quickly become the enemy of both clean/maintainable code and performance. I almost always prefer structuring packages/code to not rely on inheritance and instead use additional member structs or a more functional approach whenever possible, but that's probably a more subjective opinion.
      The purpose of the code and where it runs also carry massive weight when making decisions on what to prioritize in tradeoffs.
      I'm just sharing my braindump on the pros/cons of both that I've personally experienced. Your video was excellent, and I think it shines in prompting people to put a little more time into thinking about the performance of what we write and not overusing certain "clean code" patterns.

    • @oscarfriberg7661
      @oscarfriberg7661 1 month ago +1

      @Alice-zj2gm Your brain dump aligns with my own experiences. Premature optimization is as bad as premature abstraction.
      Start with a simple solution. Avoid the temptation to over-engineer it. Most bad code comes from over-engineering.

    • @davidmartensson273
      @davidmartensson273 1 month ago +1

      @oscarfriberg7661 So true. I almost always prioritize readability over optimization when writing code.
      Once I find areas that really do need to be optimized I will do that, but I would never start out by trying to optimize something, since it makes refactoring harder. Many times, once I get to the optimization part, I find better solutions to cut computational cost by better understanding exactly what is causing the problem. Maybe optimizing the loop is the wrong thing; maybe I could calculate the sum as a separate value while building the list and skip the whole extra loop altogether.
      That saves even more performance than making the loop faster.
      The idea behind clean code is not to always sacrifice performance, but to make sure the code can be read and maintained over time, which for the vast majority of projects is a very, very important feature.
      It does not matter how optimized code is if you need to scrap it the first time you need to change it, when changes are expected to come regularly.
      Sure, there are cases where you really need to go ballistic with optimizations, but if you do go that route, make sure you understand all the implications of that optimization.
      I saw another similar "optimization" video doing much the same thing. Problem was, it only ever works for lists whose length is divisible by 4, or whatever number of separate steps you add.
      With any other number it either skips some values at the end or crashes.
      And if you add checks, suddenly the unrolled loop is slower. So then you need to expand it to add up to the last group of four and then have extra code add in the remaining items, and the code becomes quite a lot more complex and much more prone to mistakes by the next person trying to maintain it.

  • @pvtcit9711
    @pvtcit9711 28 days ago +1

    Get onto a project with many libraries, apps, and APIs, built over years by many developers who have long since left, and try to add new functionality: you'll be wishing the code was clean code. That's the point.

  • @IS2511_watcher
    @IS2511_watcher 1 month ago

    Why use dyn Trait where impl Trait would be sufficient? dyn Trait is basically throwing away all the compile-time coolness of Rust.
    "But Vec doesn't work" - so use the trait system again: do impl Iterator. (The signature is more complicated actually, but that's fine.)
    I'm really confused about why a Rust programmer wouldn't try the compile-time approach first. IMO dynamics are basically a last-resort instrument for when you can't be bothered to properly do generics and traits, or you *require* a uniform structure with dynamics, but even then you're better off using an enum... Traits are still useful even in the enum case: just make the enum implement the trait and make the function still accept impl. Make it as generic as it can be without sacrificing performance.
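    A sketch of the statically dispatched alternatives the comment describes (signatures are illustrative):

    ```rust
    trait Area {
        fn area(&self) -> f32;
    }

    // Monomorphized per concrete type: no vtable, fully inlinable.
    fn total_area<T: Area>(shapes: &[T]) -> f32 {
        shapes.iter().map(Area::area).sum()
    }

    // Accepts any iterator of any concrete Area type, still resolved
    // entirely at compile time.
    fn total_area_iter(shapes: impl Iterator<Item = impl Area>) -> f32 {
        shapes.map(|s| s.area()).sum()
    }
    ```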