The Folded Polynomial - N64 Optimization

  • Published on 1 May 2024
  • It seems so obvious in hindsight, but somehow only one person thought of this...
    Patreon: / kazestuff
    Discord: / discord
    Streams: / @kazeclips
    🐦 / kazeemanuar
    MERCH: kazemerch.myspreadshop.com/all
    0:00 Intro
    0:56 Chapter 1: Refresher
    2:49 Chapter 2: Numerics
    3:20 Chapter 3: The Square Root...
    4:42 Chapter 4: The Folded Polynomial
    (I quickly scrubbed the video several times and there was no chapter 5, I think he just saw the length between chapters 4 and 6 and assumed it was there)
    10:29 Chapter 6: Other Ideas
    12:23 Conclusion
  • Gaming

Comments • 1K

  • @KazeN64
    @KazeN64  7 months ago +560

    Getting a lot of comments about making the code branchless - so let me explain why that's a bad idea:
    A branch takes a single cycle on N64 and we have no branch prediction.
    Doing bit manipulations on floats requires us to move the float from a float register to a general purpose register first, so that will always be a penalty of 2 cycles.
    This means that just doing the conditions is WAY faster than doing bit manipulation on floats.
    Let's compare a branchless version to a branchful version:
    if (shifter & 0x8000) {
    cosx = -cosx;
    }
    Compiles to:
    andi t0, a0, $8000
    beq t0, r0, DontInvert
    neg.s f0, f0
    DontInvert:
    (3 cycles)
    cosx = cosx ^ ((shifter&0x8000)
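A minimal sketch in C of the two approaches being compared, assuming `shifter` is a 16-bit angle whose top bit selects the sign. The function names and the exact branchless expression are illustrative assumptions (the line above is cut off in the source), and the cycle counts quoted in the comment apply to the N64's CPU, not to this portable version.

```c
#include <stdint.h>
#include <string.h>

/* Branching version: flip the sign only when bit 15 of the angle is set. */
float negate_if_branch(float cosx, uint16_t shifter) {
    if (shifter & 0x8000) {
        cosx = -cosx;
    }
    return cosx;
}

/* Branchless version (hypothetical): copy the float's bits into an integer,
 * XOR the IEEE-754 sign bit (bit 31) with bit 15 of the angle, copy back.
 * On the N64 the moves between float and integer registers are the extra cost. */
float negate_if_branchless(float cosx, uint16_t shifter) {
    uint32_t bits;
    memcpy(&bits, &cosx, sizeof bits);
    bits ^= (uint32_t)(shifter & 0x8000) << 16;
    memcpy(&cosx, &bits, sizeof cosx);
    return cosx;
}
```

Both functions return the same value for every input; the difference the comment describes is only in how many cycles each form costs on that hardware.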

    • @fastestdino2
      @fastestdino2 7 months ago +70

      Dude I only vaguely understand half that math but I can tell you know your stuff. You literally put more effort and thought into fixing a 20 year old game than most triple A devs put into making theirs. Keep up the good work.

    • @XhsTro
      @XhsTro 7 months ago +21

      Kaze, my -2 braincells are gonna EXPLODE 💀💀

    • @multiplysixbynine
      @multiplysixbynine 7 months ago +14

      Instead of going branchless, try using only one branch with a switch statement jump table to distinguish the 8 cases up front. That should remove all of the bit tests and swaps and conditional branches at the cost of inlining the polynomial calculation 8 times. Code size would increase but not by much.

    • @andremaldonado7410
      @andremaldonado7410 7 months ago +1

      What song did you use for the background music in chapter 3? So familiar but I just can't remember the name

    • @shukterhousejive
      @shukterhousejive 7 months ago +1

      It's cool to see that branch delay slot get some work, another reminder why you can't always optimize the N64 like a modern processor

  • @ChaunceyGardener
    @ChaunceyGardener 7 months ago +2284

    All math books should have the Mario font.

    • @DaVince21
      @DaVince21 7 months ago +96

      Super Maths 64

    • @v_r0
      @v_r0 7 months ago +5

      tht's wht i'm saying

    • @cerealnuee8189
      @cerealnuee8189 7 months ago +33

      Conversely, imagine a version of Mario 64 that uses LaTeX

    • @susobamna
      @susobamna 7 months ago +6

      Would make it more bearable

    • @SanaeKochiya
      @SanaeKochiya 7 months ago +6

      with subway surfers in the corner

  • @GameDevYal
    @GameDevYal 7 months ago +1685

    "We run the computations first and THEN figure out which one we computed"
    You know you're pushing against the limits of what's possible when your code starts implementing quantum mechanics

    • @jaysefgames1155
      @jaysefgames1155 7 months ago +139

      Finally... Quantum computing...

    • @ThompYT
      @ThompYT 7 months ago +39

      my brains hurts

    • @notNajimi
      @notNajimi 7 months ago +58

      If only they marketed the system as the Nintendo Quantum

    • @Gestersmek
      @Gestersmek 7 months ago +75

      They don't call it the Reality Coprocessor for nothing.

    • @MrGreatDane2
      @MrGreatDane2 7 months ago +11

      How did you escape your designated Gamemaker corner?

  • @undefined06855
    @undefined06855 7 months ago +693

    kaze on his way to save literally 0.000096 microseconds on a console thats 27 years old

    • @kannolotl
      @kannolotl 7 months ago +80

      Just you wait until you hear about Super Mario Bros. speedrunners

    • @jess648
      @jess648 7 months ago +94

      the benefit wasn’t even fps this time, the sine function is what makes the 3D math of the N64 and 3 dimensional games in general tick basically so physics, rendering and animation all benefit from improvements in accuracy

    • @exylic
      @exylic 7 months ago +27

      It's .096µs. The “.000096” is in seconds

    • @DanielFerreira-ez8qd
      @DanielFerreira-ez8qd 7 months ago +8

      ​@@jash21222n64 but it runs on an intel i12 12th gen

    • @MrGamelover23
      @MrGamelover23 7 months ago +3

      @@jess648 So does that mean that he can do better animation or physics or something like that?

  • @FairyKid64
    @FairyKid64 7 months ago +779

    I really appreciate how open minded you are and how you give credit where credit is due and don't try to make people with "worse" ideas look bad. Keep up the good work!

    • @KazeN64
      @KazeN64  7 months ago +391

      I try my best to invite any type of discussion! I guess a confident demeanor in people is often associated with being unwilling to change one's mind, which is an unfortunate vibe to give off. I wish more people would just come in and try to tell me where I'm wrong just so we can discuss and learn.

    • @ALZlper
      @ALZlper 7 months ago +45

      @@KazeN64 You have every right to be confident based on your results. If someone else proposes a measurably better solution, of course it's time to upgrade, otherwise the right to be confident is gone :) Love that mindset, exactly mine too.

    • @carltheshivan
      @carltheshivan 7 months ago +21

      It's a good idea to not completely dismiss the "worse" ideas because sometimes these things are a two steps forward, one step back situation, and maybe that worse idea will become useful later with a little modification or refinement or in a different context.

    • @3333218
      @3333218 7 months ago +12

      @@KazeN64 The best way to create something great is to start out by trying something stupid and correcting why it went wrong.
      Which is why it's important to have people willing to suggest, try and discuss anything that might seem worth giving a chance. It seems you understand that. ^ ^

    • @howard_blast
      @howard_blast 7 months ago +2

      Lol, extremely passive aggressive comment. Get over yourself with that toxic "all solutions are beautiful" mentality. Take some responsibility when you're in the wrong. No one is owed anything just because they tried (less hard than others at that).

  • @Armameteus
    @Armameteus 7 months ago +106

    In short:
    - not actually faster
    - _way_ more accurate
    I'd say that's a decent trade-off.

    • @FloydMaxwell
      @FloydMaxwell 7 months ago +7

      Summarizing this video is a crime to this video

    • @NerdTheBox
      @NerdTheBox months ago +7

      @@FloydMaxwell but it saves so many cycles

  • @SpringDavid
    @SpringDavid 7 months ago +321

    Kaze when he accidentally creates a movement of optimizing old games to the point they cannot lag:

    • @benjaminoechsli1941
      @benjaminoechsli1941 7 months ago +21

      Speaking it into existence!

    • @IceYetiWins
      @IceYetiWins 7 months ago +12

      Infinite frames per second

    • @awemowe2830
      @awemowe2830 7 months ago +14

      He might be the first person to finally remove frames from games entirely.
      We now measure performance in "speed of light", as frames don't exist, and lag is only something the human brain can suffer from now....

    • @novarender_
      @novarender_ 7 months ago +10

      The game is just a function of t

    • @dudono1744
      @dudono1744 3 months ago +1

      @@novarender_ That's called a TAS

  • @realvaporcry
    @realvaporcry 7 months ago +807

    Imagine being this math genius, code genius, retro gamer, nintendo enthusiast and also being buff. WTF with this dude, he is a demigod.

    • @MrBlakBunny
      @MrBlakBunny 7 months ago +262

      rumour is, that every time he finds a new optimization, he does one push-up

    • @chickendoodle3241
      @chickendoodle3241 7 months ago +70

      @@MrBlakBunny
      Dear god…

    • @kurikuraconkuritas
      @kurikuraconkuritas 7 months ago +10

      He is buff?

    • @kamoune_epice
      @kamoune_epice 7 months ago +31

      @@kurikuraconkuritas Yep.

    • @i64fanatic
      @i64fanatic 7 months ago +35

      @@kurikuraconkuritas he's posted a lot of photographic evidence

  • @GamerOverThere
    @GamerOverThere 7 months ago +499

    Kaze is slowly recreating the Ship of Theseus problem in SM64 😂

    • @pacomatic9833
      @pacomatic9833 7 months ago +8

      Now that you say it...

    • @AROAH
      @AROAH 7 months ago +37

      At some point he could just swap out Mario and it’s not even the same game anymore

    • @ZeroUm_
      @ZeroUm_ 7 months ago +115

      If we change all the parts, but it ends up sailing 15 nanoseconds faster, is it the same ship?

    • @vespertinnee
      @vespertinnee 7 months ago +1

      i get what you're saying. but the mechanics and genre of gameplay ain't changing at all.

    • @MitchelGatzke
      @MitchelGatzke 7 months ago +18

      @@AROAH he already did that, he replaced that mario with a brand new optimized mario

  • @fders938
    @fders938 7 months ago +182

    This stuff reminds me of when I was learning x86 programming and attempted to use my new-found powers to beat libc's sin/cos. After hand-writing an asm implementation of a 7th order Taylor polynomial it was...2x slower and less accurate than libc's version. These videos might help in the future when I get into DS programming.
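For reference, a 7th-order Taylor polynomial for sine can be sketched in C as below. This is a plain illustration of the series, not the commenter's assembly and not what libc does; the function name is made up. It is accurate only near zero, which is one reason a naive version loses to a library routine that reduces the argument first.

```c
/* 7th-order Taylor polynomial for sin(x) around 0, evaluated in Horner form:
 * x - x^3/6 + x^5/120 - x^7/5040. Accuracy degrades as |x| grows. */
float taylor_sin7(float x) {
    float x2 = x * x;
    return x * (1.0f - x2 / 6.0f * (1.0f - x2 / 20.0f * (1.0f - x2 / 42.0f)));
}
```

The nested form uses one multiply for `x2` and then a short chain of multiply, divide and subtract steps, instead of computing each power of `x` separately.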

    • @kintustis
      @kintustis 7 months ago +6

      Isn't there an ASM instruction for that anyways?

    • @henke37
      @henke37 7 months ago +16

      @@kintustis x87 does indeed have sin and cos as instructions. I'm sure they were great back in 1995.

    • @oscarsmith3942
      @oscarsmith3942 5 months ago +6

      @@henke37 They are actually surprisingly bad. For unknown reasons, Intel only used a 66 bit approximation of Pi, so near multiples of Pi they are only correct to 1 significant figure instead of the 16 that they are supposed to reach.

    • @jhgvvetyjj6589
      @jhgvvetyjj6589 3 months ago +1

      @@oscarsmith3942 In real use cases that won't matter since the error inherent in rounding the near-pi value will be much larger than the error of fsin and fcos instructions

  • @CynicPlacebo
    @CynicPlacebo 7 months ago +215

    Oh, I remember! I'm so glad to hear a programmer that still cares about performance. Too often I've had coworkers do something in a disgustingly inefficient way because they just don't even think about it. I'll rewrite a query, or a function, or just flatten some nested loops, and suddenly it's 3 to 100 orders of magnitude faster (usually when someone is 1,000x slower they start reaching out for help, but that's about it)

    • @krystostheoverlord1261
      @krystostheoverlord1261 7 months ago +35

      I feel that! I have had coworkers complain to me that I do not need to optimize the code, but I just go ahead anyways since it usually does not take much longer. They end up liking the faster, optimized code much better (usually running real time instead of 1 frame a second LOL)

    • @CynicPlacebo
      @CynicPlacebo 7 months ago +32

      @@krystostheoverlord1261 there are dangers on both sides, but it largely boils down to personality types.
      If you are a perfectionist that wants everything to run perfectly, then there is some use in pushing yourself to go faster and be less perfect (especially early on or during a proof of concept).
      ...but I think most people fall into the other category of people that want to write it once with whatever pops into their head first and then never revisit it as long as it technically provides accurate information.
      Those people need to be pushed into taking a little more time to not just do the first thing but at least weigh a couple options. More importantly, they need to go back, actually test the speed, and do a round 2 specifically intended for optimizing, simplifying, commenting, and making the code more elegant (yes, I'm lumping all those sins together, but I realize some people can just have 1 or 2 of those problems)

    • @adamsoft7831
      @adamsoft7831 7 months ago +10

      I think you mean 3-100x slower? 100 orders of magnitude would be 1 followed by 100 zeros.

    • @CynicPlacebo
      @CynicPlacebo 7 months ago +14

      @@adamsoft7831 I do not mean 100 times slower. I mean 1000x slower to an incalculably slow but probably exaggerated 100 orders of magnitude (since we never knew how long it would take since it was essentially stuck, I gave it a fake artificial top number for emphasis).
      They usually start asking for help around 1,000x slower, which was why I listed that, but there are literally processes that they tried to run, estimated it would take a few days, and then 3 months later the process still hadn't even hit 1% success.
      I'm talking about *really* big data (many many petabytes).
      Whereas if you cleverly divide and conquer, suddenly we can do the whole thing in less than 24 hours (I know, still slow, but we are talking about many Petabytes across about 300k servers)
      My point was that many things were literally 100 times slower (2 orders of magnitude) and no one would care or ask for help. They would just deal with the fact that this tool only got run once or twice a year. There was a data sync tool that was running monthly, because it took a week or 2 to run. After I fixed it, it ran hourly (every now and then it would go over the hour mark, so it'd skip 1 cron sync. It was just a simple flock, but it hardly ever got triggered. Usually only after a batch update that touched a ton of datapoints all at once).

    • @CynicPlacebo
      @CynicPlacebo 7 months ago +16

      I'm not claiming I'm a genius either. The biggest problem is that a dev would literally try to run a script off their personal machine that would then loop through every server and try to do something.
      Just by writing the script so it could run on the server itself and rsyncing it everywhere, that gives me a 300,000x boost because each server can do its own thing simultaneously (yes, I mean THAT dumb of mistakes)

  • @Nicoya
    @Nicoya 7 months ago +41

    Optimizing trig functions is great, but the fastest trig function is the one you never call. Have you taken the time to step back and see how many places where you can avoid entering degree/polar space, and instead simply stay in linear (vector/matrix/quaternion) space?

    • @KazeN64
      @KazeN64  7 months ago +53

      yeah, im planning to translate the whole game to quaternion animations for example. animation sin/cos calls are the bulk of this right now. unfortunately the entire engine and every behavior runs on euler angles so i dont want to refactor the actor rotation into quats if i can prevent it.
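As a simplified 2D illustration of "staying in linear space" (the thread is about 3D quaternions; this is a smaller stand-in with hypothetical type and function names): if a rotation is stored as its cosine and sine rather than as an angle, two rotations can be combined and applied with multiplies and adds only, with no trig call at all.

```c
/* A 2D rotation stored as (cos, sin) of its angle instead of the angle itself. */
typedef struct { float c, s; } Rot2;
typedef struct { float x, y; } Vec2;

/* Compose two rotations with the angle-addition identities; no sin/cos call. */
Rot2 rot2_compose(Rot2 a, Rot2 b) {
    Rot2 r = { a.c * b.c - a.s * b.s, a.s * b.c + a.c * b.s };
    return r;
}

/* Rotate a vector by a stored rotation. */
Vec2 rot2_apply(Rot2 r, Vec2 v) {
    Vec2 out = { r.c * v.x - r.s * v.y, r.s * v.x + r.c * v.y };
    return out;
}
```

The trig functions are then only needed when an angle first enters the system, for example when converting existing Euler-angle data.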

    • @kr1v
      @kr1v 3 months ago +3

      @@KazeN64 (joke) you've already rewritten the entire source once, why not twice?

  • @jfa4771
    @jfa4771 7 months ago +673

    imagine if Nintendo discovered your rom hack in 1996, how shocked the devs would be

    • @icyz1ne456
      @icyz1ne456 7 months ago +146

      romhack takedown origin story

    • @Flying_Titor
      @Flying_Titor 7 months ago +145

      Forget dmca, they'd send a hitman his way

    • @mariotheundying
      @mariotheundying 7 months ago +69

      Prob would pay him money for the code or try to hire him, and also have him work on new consoles other than games

    • @bootortle
      @bootortle 7 months ago +22

      Yeah, it would prove backward time travel to be possible!

    • @keaton718
      @keaton718 7 months ago +126

      Maybe the original original Mario 64 ran at like 2 frames per second and ruined Nintendo's reputation and they went bankrupt. Then Kaze came along decades later and improved Mario 64 into what it is today and Nintendo's time travellers got ahold of it, took it back to 1996, published it and saved Nintendo. Now Kaze is basing his improvements off that version of Mario 64, the version he unknowingly wrote himself in a parallel reality, and any day now Nintendo's time travelling spies will bring it back to 1996 and Mario 64 will blow everyone's minds and Nintendo will bankrupt Sony because no one wants a Sony crapstation after they see Kaze's v2 Mario 64.

  • @Ragesauce
    @Ragesauce 7 months ago +122

    You have no idea how excited I am to play the original SM64 when you remake it with all the improvements. I have held off playing it for years all for this moment. I cannot wait!

    • @danielpope6498
      @danielpope6498 7 months ago +7

      I thought he said he wasn't releasing these fixes applied to the original game, just using it to make his sequel

    • @aftdawn
      @aftdawn 7 months ago +34

      @@danielpope6498 nah, he said at one point that sometime in the future he is gonna backport the upgrades and patches into the vanilla game with no custom levels, but that's probs still gonna be like 6 months after "Return to Yoshi's Island" is out, and there's no ETA on the hack

    • @Tabu11211
      @Tabu11211 7 months ago +2

      ​@@danielpope6498 not if we pressure him enough.

    • @bretayerstorm
      @bretayerstorm 7 months ago +22

      ​@@Tabu11211 Not trying to offend or anything but, most of us have witnessed what could happen if we "pressure" someone (or a company) to release or publish an app or game just cause we are impatient (looking at you Cyberpunk 2077, NoManSky... )
      We all hate a buggy mess. With that said, I rather be the kind of viewer / customer to actually encourage developers and studios to take their time to make the app, game or whatever they are trying to develop so we the consumers get what we paid for.
      Pressure will only fuel the crunch culture in Programming jobs (or any other field where this exists...) So no. I rather wait few more months.. HELL a YEAR, as long as final product is stable and efficient enough for us to enjoy.
      Just my humble two cents.
      Take care

    • @goob8945
      @goob8945 7 months ago +1

      @@bretayerstorm real shiz bruh

  • @LucidTyrant
    @LucidTyrant 7 months ago +74

    In an age where a modern basketball video game has over 100 GB of data you give me hope knowing people out there are developing formulas for specific hardware in order to save possibly a few milliseconds just for the sake of optimization. Your work is both amazing and humbling.

    • @KopperNeoman
      @KopperNeoman 5 months ago

      I wonder how much of that is optimisation for loading times in an era where read speeds often outstrip decompression speeds.

  • @SailorCheryl
    @SailorCheryl 7 months ago +25

    Schoolkid: "Nobody needs this math stuff in real life!"
    Kaze: Entertainment is math, enjoy!

  • @rebmcr
    @rebmcr 7 months ago +50

    Do you plan to release a version of your optimised engine at some point, which can run the original Super Mario 64 ROM? It would make for a very interesting comparison.

    • @KazeN64
      @KazeN64  7 months ago +67

      of course, this mod will be open source after release.

    • @Apostolinen
      @Apostolinen 7 months ago +2

      Out of curiosity... when will this mod release? It's absolutely phenomenal.

    • @HawtDawg420
      @HawtDawg420 7 months ago +2

      @@Apostolinen it'll release when it's done ;)

  • @supersmily5811
    @supersmily5811 7 months ago +27

    I want this rom hack so badly. The levels look so huge, and clean! You have ziplines! And workplace accidents!

  • @lexacutable
    @lexacutable 7 months ago +22

    I'm enjoying imagining an alternate universe in which commercial n64 games were this efficient

    • @mizurazu
      @mizurazu 6 months ago

      This. I'm trying to imagine Turok 2 could have actually worked now.

  • @McWickyyyy
    @McWickyyyy 7 months ago +73

    How do you figure all this stuff out lmao. I am a full stack web dev but when I see stuff like this I’m just like I’m a fraud 😂

    • @NoNameAtAll2
      @NoNameAtAll2 7 months ago +46

      learn C, join the system side
      you'll be angry at electron just like us!

    • @KazeN64
      @KazeN64  7 months ago +73

      i dont even know what electron is... :D

    • @McWickyyyy
      @McWickyyyy 7 months ago +7

      Lmaoo. I did learn some C in college and I actually loved it and was one of the few to do well 😂 and a little bit of assembly. It is def no easy task lmao. I’m tryna make a fullstack browser game actually. Got the prototype phase done of getting everything I need in place. But I just know imma run into optimization issues later. I always love watching videos like this to see how the pros do it 😭

    • @OhluhKayTall
      @OhluhKayTall 7 months ago +12

      Web devs sticking together 😤. In the same boat and feel just as fraudulent watching these optimization videos. One of these days I'll learn C or something
      ...

    • @fungo6631
      @fungo6631 7 months ago +10

      @@NoNameAtAll2 OP should learn HolyC instead, like a divine intellect individual would do.

  • @MayaPasricha
    @MayaPasricha 7 months ago +61

    The folded polynomial is genius!

  • @Ustaleone
    @Ustaleone 7 months ago +149

    Even though I didn't understand the technical bits, I respect you immensely for your dedication to such old and limited hardware. Hopefully someone will recognize this hard work and give you the credit you deserve, whatever that may be.

    • @excitedbox5705
      @excitedbox5705 7 months ago +7

      It is no different than any other sport. Think about it, you set a limit and then try to better your skill by seeing how hard you can push it. In target shooting you try to get as close as possible to the center of the target, in F1 racing you try to decrease your lap time, here you try to max your FPS. Using an N64 game is just a fun way to set the rules for the "competition" that he has a nostalgic connection to and forces him to think outside the box.

    • @inthefade
      @inthefade 7 months ago

      I will be playing the hell out of this ROMhack. I hope that is the kind of appreciation he is looking for (and a really good job, if he doesn't already have his dream job).

  • @unique_two
    @unique_two 7 months ago +14

    I think you could use a double angle identity of cos here: cos(2x) = 2cos(x)^2 - 1. Compute cos in the interval [0, pi/4] via quadratic polynomial, then use the identity to expand to the interval [0, pi/2]. From there you get all values of cos via the usual symmetries. This might get rid of the square root, but I don't understand the details of the implementation.
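The double-angle idea can be sketched in C as follows. This assumes some polynomial approximation of cosine on [0, pi/4] is available; a 4th-order Taylor series stands in for it here, and both function names are hypothetical. It covers only the step from [0, pi/4] to [0, pi/2], not the later symmetry handling.

```c
/* Placeholder approximation of cos(x) on [0, pi/4] (4th-order Taylor series);
 * any fitted low-degree polynomial could be substituted here. */
static float cos_small(float x) {
    float x2 = x * x;
    return 1.0f - x2 * (0.5f - x2 * (1.0f / 24.0f));
}

/* cos(x) for x in [0, pi/2] via cos(2t) = 2*cos(t)^2 - 1 with t = x/2. */
float cos_quarter(float x) {
    float c = cos_small(0.5f * x);
    return 2.0f * c * c - 1.0f;
}
```

One thing to keep in mind with this route: the derivative of 2c^2 - 1 with respect to c is 4c, so any error in the inner approximation is amplified by roughly a factor of 3 to 4 in the result.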

    • @schlega2
      @schlega2 7 months ago +3

      That would be more efficient if you only need the cos. You'd still need the sqrt to get the sin though.

  • @jmssun
    @jmssun 7 months ago +57

    I hope you can maintain a parallel release of original M64 patch that includes all your performance mods, this way the community will constantly be referring to your channel if they want the most current on their M64 and the most performant build. The side effect is that it encourages more people to your channel and discover your incredible works, as well as knowing your new content

    • @ruie.34
      @ruie.34 7 months ago +2

      Yeah but then he’ll get struck by dmca

    • @kvdrr
      @kvdrr 7 months ago +1

      @@ruie.34 nah he wouldn't, but it's nice Kaze has a community so eager to defend him i guess 😅

    • @enochliu8316
      @enochliu8316 7 months ago

      @@kvdrr Many of his mods have been struck down. 😢

  • @bmenrigh
    @bmenrigh 7 months ago +8

    This folding technique is common in lots of numerical approximation. For example search for the golang error function (erf) implementation where only a short interval of the function is approximated and then things outside of that interval are shifted via erf() identities into the well-approximated region. I still think you can get better performance by avoiding sqrt() by using a different polynomial approximation for the second 8th of the curve. Once you have two 8ths you have a quarter of the curve and from there you can get the rest.

    • @misanthropolis-zone-act3
      @misanthropolis-zone-act3 7 months ago +1

      How does one find the best low degree polynomial for the second eighth with minimal error?
      Carefully selecting points within the second eighth of the curve to find an interpolating polynomial that had the minimal approximation error, all while keeping the degree of the polynomial down to say, a quadratic or cubic, wouldn't be as simple as the first eighth...
      The first eighth looks almost like a clean parabola. My reasoning is this is probably because the domain is close to zero (-PI/4 to PI/4) and the power series expansion of cosine within that range converges faster to the correct value than it does in the range (PI/4 to PI/2) or (-PI/4 to -PI/2) (Keeping the fact we use floats in mind) ...because numbers closer to zero get smaller for higher powers, meaning the higher power terms are less important, and the series expansion for cosine at 0 converges immediately at the first term! (Since cos(0) = x^0/0! is 1 😊) but that power series converges to the correct value more slowly the further from zero you go. In general cosine's power series converges faster than sine because it has even symmetry and all terms are positive before alternating, but also converges faster arbitrarily close to zero, as far as floating point numbers are concerned.
      The second eighth has a domain further from zero and needs more monomial terms to approximate the values to keep error low, because any polynomial of degree 2 or less wouldn't approximate it well without significant error.
      I suppose even with Newton's method or some other way to find an interpolating polynomial for the second eighth, it wouldn't be as useful to interpolate a low order for domains greater than PI/4 or less than -PI/4, which would either have notable error compared to other approaches, or would be computationally slower. That's probably why KazeN64 is defaulting to identities and using an expensive sqrtf for the second eighth.
      The cleverness of KazeN64's approach is that the domain is kept close to 0 which keeps error small within that range and minimizes error from discarding the higher degree terms of the series expansion. He can cover the entire image of f(x) =cos(x) using the low error calculation of the domain of f(x) from -PI/4 to PI/4.
      With this domain then identities and symmetries are used to cover the rest of the graph. The entire approach has minimal error while keeping the polynomial degree low since the domain is kept within -PI/4 to PI/4. It's clever because within that domain, a second order polynomial is adequate to match the power series of cosine with minimal error, since within that domain, power series terms higher than degree 2 have smaller overall contributions than anywhere else, so they can be ignored.
      But then again I dunno. Maybe experiments could be done to show that the error is acceptable to justify using a second piecewise polynomial for the second eighth. Maybe sqrtf is slower than additional mults adds and loads for a second polynomial for the 2nd 8th
      Long ago, we read a DSP article where someone tried to approximate a sine wave using connected low order piecewise polynomials (splines). He had significant error though which he admitted in his work. I think KazeN64 mentions something like this in this video (Using piecewise linear interpolation of sines). People have tried this method for decades, some computers do it, but the error becomes the issue.
      Thanks for the information, wish I were a better programmer and algorithms guy to have more experience in this field to appreciate this more though, your comment is really insightful. Correct me if anything here sounds fishy or wrong to you!

    • @bmenrigh
      @bmenrigh 7 months ago

      @@misanthropolis-zone-act3 When you divide a curve up into N pieces and then find a best-fit polynomial for each Nth you need the endpoints to match up with each other perfectly or you get noticeable discontinuities. As such, the end points aren't optional, they must be fixed in the interpolation. Since I found a best-fit quadratic (defined uniquely by 3 points) that only leaves one free point to choose. I chose that point such that the average error (the difference in area between my polynomial and sine) was zero. I just used binary search to find the 3rd point that had this property.

  • @autodidact7127
    @autodidact7127 7 months ago +18

    Having followed this for years I am never ever EVER disappointed when you upload. One of the only ongoing projects that just ROCK!

  • @adiel_loiola
    @adiel_loiola 7 months ago +6

    I have LITERALLY no IDEA what you are talking about, but I love those videos lmao.

  • @IanZamojc
    @IanZamojc 7 months ago +11

    I'd love to see you collaborate with James Lambert who's building Portal 64 to see what kind of performance gains he could potentially see with your optimizations.

    • @4.0.4
      @4.0.4 7 months ago +1

      Can't imagine they don't already watch each other's content

  • @MOLDYCHEETO13
    @MOLDYCHEETO13 7 months ago +10

    60% of this stuff goes over my head but I can't stop watching it super interesting and entertaining

  • @biobak
    @biobak 7 months ago +9

    i understood approximately 10% of the words in this video but congratulations to you and silas on the toenail

  • @anonymouscommentator
    @anonymouscommentator 7 months ago

    i absolutely love your videos. not only was mario n64 my childhood game, your videos have the perfect amount of nerdiness and math in them to be interesting while your jokes are hilarious. Keep it up!

  • @matthewtalbot6505
    @matthewtalbot6505 7 months ago +6

    Alright, you’re definitely going to be dipping deeper into advanced and/or theoretical mathematics going forwards with this project. Folded 4th order polynomials to approximate the sine and cosine graphs. You, and the community members who assisted, are mad geniuses.

  • @humanbass
    @humanbass 7 months ago +65

    I would love to see your version being played by speedrunners so they can comment on the differences in the feeling of the game.

    • @xdanic3
      @xdanic3 7 months ago +10

      I'm sure they will, his mod also looks amazing

    • @notalostnumber8660
      @notalostnumber8660 7 months ago +16

      Kaze played the original game somewhat recently, and noted about how the physics and mechanics changes made him more rusty in the OG, since his romhack has many quality of life improvements and mechanics

  • @PsychorGames
    @PsychorGames 7 months ago +59

    You think I would just ignore Mario doing the soyjak face in the thumbnail? You think I would just let that go? You're a fool.

  • @diskoBonez
    @diskoBonez 7 months ago +2

    It brings me immense joy seeing you pull more and more optimizations for this console as if out of thin air. Thank you for documenting and sharing your discoveries with us as well, unbelievably fascinating!

  • @BucketCapacity
    @BucketCapacity 7 months ago +5

    My first thought was a piece-wise polynomial, possibly done with spline interpolation, but the folded polynomial is a cool idea.

  • @jimmyv3170
    @jimmyv3170 7 หลายเดือนก่อน +63

    Kaze has dreams of finding out ways to optimize Super Mario 64 further. We need this man to optimize Starfield and its 20 year old engine lol

    • @GlitchLamb
      @GlitchLamb 7 หลายเดือนก่อน +5

      Bro, imagine someone as talented as Kaze trying to improve Bethesda's graphics engine, maybe he'll encounter as many bugs as possible on the first day xD

    • @jh302
      @jh302 7 หลายเดือนก่อน +14

      no its not 20 year old engine people say this when they dont know what they are talking about. and anyone who has reverse engineered creation will call you an idiot creation doesnt have anything left of gamebryo except for two functions and that is its scenegraph and its node system. literaly not one single other thing from gamebryo exists in creation. gamebryo is not a game engine it is an engine framework and creation was a engine made for bethesdas games
      creation 2 is a near complete rewrite adding completely new systems to the engine that are impossible in fallout 4s version of creation hence why its called creation 2
      the engine isnt actually that unoptimized its just extremely cpu bound as all versions of creation are because you cant run most of its background functions on the cpu
      when people have performance issues its because the single core performance of their cpu isnt as good as it could be and theyre trying to push the game to max everything.
      then they yell about things they do not understand.
      and as much as gamers think they know what the fuck they are talking about they really really do not know what the fuck they are talking about. do gamers somehow know more about how games work then an entire company who has very selective hiring practices and id softworks developers who i might add are fucking wizards wouldnt know how their own engine works?

    • @jimmyv3170
      @jimmyv3170 7 หลายเดือนก่อน +3

      @@GlitchLamb hell I think maybe within the first 5 minutes of looking at their code lol

    • @jimmyv3170
      @jimmyv3170 7 หลายเดือนก่อน +4

      @@jh302 nice book didn't read. Let me know when they can actually have proper vehicles in their games without their garbage game engine breaking after being held together by unoptimized and bad code for years.

    • @evdestroy5304
      @evdestroy5304 7 หลายเดือนก่อน +13

      ​@@jimmyv3170It wasn't even that long

  • @JulianGoddard
    @JulianGoddard 7 หลายเดือนก่อน +6

    I'm consistently blown away at your programming skills. Keep up the great work!

  • @TaranAlvein
    @TaranAlvein 7 หลายเดือนก่อน

    That was super cool. I liked watching how everything was broken down to extrapolate positions based on just a few calculations. It was amazing, and very interesting to watch!

  • @Dracogame
    @Dracogame 7 หลายเดือนก่อน

    I love this kind of passion projects. Keep it up!

  • @mongus2
    @mongus2 7 หลายเดือนก่อน +3

    I hope the community has access to all of these clever optimizations one day

  • @psycowithespn
    @psycowithespn 7 หลายเดือนก่อน +19

    The level of optimization here is just too satisfying to hear about. If only such maximal utilization of resources wasn't such a time intensive practice.

    • @Kenionatus
      @Kenionatus 7 หลายเดือนก่อน +2

      Yeah, you've hit the nail on the head here. Programmer time is expensive and limited. You can't employ double the programmers to double the output due to coordination taking up more and more time as team size increases, so a hyper optimised game either needs more development time or exponentially more money.

    • @KopperNeoman
      @KopperNeoman 5 หลายเดือนก่อน +2

      ​@@KenionatusThat's also why console graphics got better as generations went on: a more experienced programmer could use his time more efficiently.

  • @TheInfiniteAmo
    @TheInfiniteAmo 7 หลายเดือนก่อน

    Kaze, your channel and insane romhacking ability was a big inspiration for me picking up Decomp romhacking myself and making my first Pokemon romhack. Just like your videos I barely understand what's going on and I'm enjoying every second of it. Thanks for being awesome.

  • @samahearn770
    @samahearn770 7 หลายเดือนก่อน +2

    I think my fave way to avoid doing sqrt is the famous Quake 3 fast inverse square root function, which uses the mantissa of the float itself through a dubious cast and some bitwise black magic so you can calculate normals faster.
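
For readers who have not seen the trick mentioned above, here is a minimal sketch in C. It is not taken from the video or from Kaze's code: the function name `fast_rsqrt` is made up for illustration, and `memcpy` stands in for the original pointer cast so the type pun is well defined. The constant works on the whole bit pattern of the float (exponent and mantissa together), and one Newton-Raphson step refines the first guess.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: approximates 1/sqrt(x) for positive, normal x. */
static float fast_rsqrt(float x)
{
    uint32_t bits;
    float y;

    memcpy(&bits, &x, sizeof bits);      /* reinterpret the float as an integer */
    bits = 0x5f3759dfu - (bits >> 1);    /* magic constant minus half the bit pattern */
    memcpy(&y, &bits, sizeof y);         /* reinterpret back: rough first guess */

    y = y * (1.5f - 0.5f * x * y * y);   /* one Newton-Raphson refinement step */
    return y;
}
```

With a single refinement step the result is only good to roughly three significant digits, so whether it beats a hardware square root depends entirely on the target.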

  • @Taebot64
    @Taebot64 7 หลายเดือนก่อน +36

    bro is a genius, and he’s using it on mario hacks 💀💀

    • @escape209
      @escape209 7 หลายเดือนก่อน +23

      Bro ✅
      Skull emoji at the end ✅

    • @HiimIny
      @HiimIny 7 หลายเดือนก่อน +1

      ​@@escape209lmao

    • @Taebot64
      @Taebot64 7 หลายเดือนก่อน

      @@escape209 bro 💀💀💀 i don’t think i did that bro 💀💀💀 but the hacking skills are crazy 🔥🔥🔥🔥

  • @LtMooch
    @LtMooch 7 หลายเดือนก่อน +3

    Kaze giving a whole new definition to speed running here

  • @RiverReeves23
    @RiverReeves23 7 หลายเดือนก่อน

    Lookin great Kaze. Doing amazing work for the community.

  • @EzBz982
    @EzBz982 7 หลายเดือนก่อน

    I love your videos, man. Keep up the great content!

  • @ThompYT
    @ThompYT 7 หลายเดือนก่อน +6

    crazy that we have something this insane for an ancient game and then modern game optimization existing in the same timeline

    • @SmashyPlays
      @SmashyPlays 7 หลายเดือนก่อน +1

      I wouldn't say modern since pretty much every game released today has awful optimization and rely on dlss and require 32gb ram and an rtx 4090

    • @ThompYT
      @ThompYT 7 หลายเดือนก่อน +2

      @@SmashyPlays that's... what I meant... I'm comparing them...

    • @SmashyPlays
      @SmashyPlays 7 หลายเดือนก่อน

      @@ThompYT oh my bad I can't read lmfao, sorry about that

    • @ThompYT
      @ThompYT 7 หลายเดือนก่อน +2

      @@SmashyPlays what are u sorry about, you're fine 👊

    • @SmashyPlays
      @SmashyPlays 7 หลายเดือนก่อน +1

      @@ThompYT thanks you're a g 👌

  • @angeldude101
    @angeldude101 7 หลายเดือนก่อน +5

    "// imaginary part in the cosine to give the reader mental damage"
    It's a critical hit!
    Not quite the quaternion video I was hoping for (you mentioned adding quaternions in the comments of your prior video), but I will always accept more math optimization content on this channel. And yes, I was not joking when I said I was hoping for _quaternions._ Quaternions are actually pretty simple when not obfuscated or divorced from their connection to the composition of reflections.

  • @timseguine2
    @timseguine2 7 หลายเดือนก่อน +1

    After your last video on this topic, I was convinced that the accuracy could still be improved considerably. So I am glad you found a way to get it without sacrificing performance, even if none of my suggestions were what got you there.

  • @jess648
    @jess648 7 หลายเดือนก่อน +1

    you explain the basic trigonometry concepts that drives sine functions very well

  • @Isaac________
    @Isaac________ 7 หลายเดือนก่อน +14

    Love your conclusion about assumptions.
    Half-jokingly, I'm wondering when you will start optimizing the microcode itself as we're far into GPU-limited territory now. :)

    • @KazeN64
      @KazeN64  7 หลายเดือนก่อน +32

      Sauraen is currently working on a new microcode called "f3dex3". I'm thinking of getting into microcode programming and expanding on his work when he's more or less done. He's already found some optimizations that are implemented in this mod!

    • @timmygilbert4102
      @timmygilbert4102 7 หลายเดือนก่อน +1

      ​@@KazeN64oh yes micro code 😊 I hope you document your journey into it, there isn't much accessible resources, it might help democratizing it 🎉

    • @Isaac________
      @Isaac________ 7 หลายเดือนก่อน

      Just went over to their TH-cam and there's some cool stuff there. Where could I find more information about Sauraen's efforts in this direction?

    • @KazeN64
      @KazeN64  7 หลายเดือนก่อน +15

      he doesn't post too much publicly i think. he usually talks in the fast64 discord.

    • @xdanic3
      @xdanic3 7 หลายเดือนก่อน

      @@KazeN64 Waiting for the day you tell us, new f3dex3 microcode just dropped!

  • @myggmastaren3365
    @myggmastaren3365 7 หลายเดือนก่อน +3

    when anyone asks if math is useful, I'll just redirect them to your videos

  • @multicoloredwiz
    @multicoloredwiz 7 หลายเดือนก่อน

    wild how much you guys can come up with. god bless the information superhighway baby

  • @DavidRomigJr
    @DavidRomigJr 7 หลายเดือนก่อน +1

    This was an interesting watch. I love these types of optimizations, pushing the limits.
    My favorite optimization has to be the fast inverted square root since it's so simple and so fast, obvious in hindsight but not very if you don’t already know it.
    All the cache talk reminded me the issues we had with PC to PS2 ports, having DMAs constantly stalling on instruction and data cache fills. In the end there wasn’t a lot we could easily do. Fun times.

  • @karlosk5773
    @karlosk5773 7 หลายเดือนก่อน +4

    Random question: Who composed and programmed the music in the Return to Yoshis Island demo and Peachs Fury? The music is amazing in these games! Thank you for your incredible work!

  • @AwesomeGames56
    @AwesomeGames56 7 หลายเดือนก่อน +5

    This is wild, not only is it more accurate but it’s also fast enough that the console doesn’t even know there’s a difference. Pushing the 64 like this makes me wonder what kind of games we could have if AAA devs still put games out on older systems.

  • @etansivad
    @etansivad 7 หลายเดือนก่อน

    This was a great video. It was like the best parts of Michael Abrash's Programming black book, but about Mario 64. Thank you for putting this together.

  • @lFunGuyl
    @lFunGuyl 6 หลายเดือนก่อน +1

    "For you that was 1995. But for me, that is *today*" 😂

  • @Ehal256
    @Ehal256 7 หลายเดือนก่อน +2

    Cordic is specifically designed for machines with no or slow multiplication, so it doesn't actually require them (that python code isn't really a good example), but when you do have fast multiplies, it's not really worth it.
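
To make the "no multiplications" point concrete, here is a rough fixed-point sketch of rotation-mode CORDIC in C. Everything in it is an assumption for illustration: the Q16 format, the name `cordic_sincos_q16`, the 16 iterations, and the rounded table values are not from the video or from any particular library. The loop uses only shifts, adds, and a table lookup.

```c
#include <stdint.h>

/* atan(2^-i) in Q16 radians, i = 0..15 (rounded by hand for this sketch). */
static const int32_t cordic_atan_q16[16] = {
    51472, 30386, 16055, 8150, 4091, 2047, 1024, 512,
    256, 128, 64, 32, 16, 8, 4, 2
};

/* 1 / CORDIC gain (about 0.60725) in Q16, so the result needs no final scaling. */
#define CORDIC_K_Q16 39797

/* Hypothetical helper: sine and cosine of a Q16 angle in radians.
 * Only valid for roughly |angle| <= 1.74 rad; larger angles need
 * quadrant folding first. Assumes >> on negative values is an
 * arithmetic shift, which holds on common compilers. */
static void cordic_sincos_q16(int32_t angle, int32_t *sin_out, int32_t *cos_out)
{
    int32_t x = CORDIC_K_Q16;
    int32_t y = 0;
    int32_t z = angle;

    for (int i = 0; i < 16; i++) {
        int32_t dx = y >> i;
        int32_t dy = x >> i;
        if (z >= 0) {            /* rotate counter-clockwise by atan(2^-i) */
            x -= dx;
            y += dy;
            z -= cordic_atan_q16[i];
        } else {                 /* rotate clockwise by atan(2^-i) */
            x += dx;
            y -= dy;
            z += cordic_atan_q16[i];
        }
    }
    *cos_out = x;
    *sin_out = y;
}
```

Each iteration yields about one more bit of accuracy, so on a CPU with a fast multiplier the 16 dependent iterations are usually the bottleneck, which matches the point that CORDIC stops being worth it once multiplies are cheap.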

  • @renakunisaki
    @renakunisaki 7 หลายเดือนก่อน +7

    Nintendo, hire this man once he manages to optimize this game so much it runs in reverse and works as a time machine.

  • @justabrowser4744
    @justabrowser4744 7 หลายเดือนก่อน +1

    I wish you could make videos like this every day

  • @zoiosilva
    @zoiosilva 7 หลายเดือนก่อน

    I can't wait to start seeing speedruns of one of your fixed sm64 versions, and listening to the speedrunner's comments on the run.

  • @0xGRIDRUNR
    @0xGRIDRUNR 7 หลายเดือนก่อน +3

    as someone who loves low level code, hearing that a square root is optimal is bizarre and awesome

  • @gameisrigged6942
    @gameisrigged6942 7 หลายเดือนก่อน +7

    Quaternions 😢

    • @KazeN64
      @KazeN64  7 หลายเดือนก่อน +12

      quaternions will be the final animation format used here, no worry! but yeah at the moment it's still euler angles

    • @angeldude101
      @angeldude101 7 หลายเดือนก่อน

      Fingers crossed for a video. I will however ask if your planned use of quaternions still uses the 360° fixed point angle format, since the individual components are no longer just raw angles, but floats are also twice as large.

    • @gameisrigged6942
      @gameisrigged6942 7 หลายเดือนก่อน

      ​@@KazeN64that would be insanely cool!

  • @wenchinatrenchcoat8459
    @wenchinatrenchcoat8459 7 หลายเดือนก่อน

    your outro is brilliant. I wish i could subscribe again :D

  • @timonus
    @timonus 7 หลายเดือนก่อน

    Holy crap, this is brilliant

  • @Gestersmek
    @Gestersmek 7 หลายเดือนก่อน +4

    I guess the only thing to do now is Fast Approximate Square Root.
    For real though, that folded polynomial is crazy. The math nerd in me was more than impressed at the ingenuity.

    • @KazeN64
      @KazeN64  7 หลายเดือนก่อน +13

      any square root approximation will be a lot slower than the hardware one i think. the famous inverse square root algorithm is in the ballpark for 3 - 20x slower (depending on use case)

    • @MDaveUK
      @MDaveUK 7 หลายเดือนก่อน

      ​@KazeN64 is that the quake 3 fast inverse square root trick?

    • @Gestersmek
      @Gestersmek 7 หลายเดือนก่อน

      @@KazeN64 Well, that's unfortunate, but hey, at least you got a few cycles saved with the current implementation.

    • @blarghblargh
      @blarghblargh 7 หลายเดือนก่อน

      @@MDaveUK if you've heard of it, it's the famous one :P

  • @PsychorGames
    @PsychorGames 7 หลายเดือนก่อน +7

    I'll "fold" your "polynomials" for 64 bucks.

  • @NinF37
    @NinF37 7 หลายเดือนก่อน +1

    That is an incredibly genius way to solve the problem. Genuinely blown away!

  • @murilohumbertocmcb
    @murilohumbertocmcb 5 หลายเดือนก่อน

    i wish all games had dedicated optimizers like you and your community!

  • @osc-omb47896
    @osc-omb47896 7 หลายเดือนก่อน +22

    The N64 definitely is the system of all time

  • @DorE3k
    @DorE3k 7 หลายเดือนก่อน +9

    These optimizations are getting ridiculous at this point, great stuff Kaze! The folded polynomial approach is brilliant and elegant, nice job to the guy who came up with it

  • @misanthropolis-zone-act3
    @misanthropolis-zone-act3 7 หลายเดือนก่อน +1

    Nice video, you are good at explaining these topics without getting too wordy, you explained why keeping the order of the approximating polynomial low is less expensive without getting too pedantic and explaining the Taylor expansion of trig functions
    Oof I should learn how to explain things like you tbh :( but I am too wordy... Explaining things this way is a sign of good communication skills

  • @codcouch1
    @codcouch1 7 หลายเดือนก่อน +1

    I'm new to programming and i was shocked that interpolating between 2 values is slower than calculating all that stuff. I would have just assumed that interpolation was faster and never even investigated. Nice job

  • @cartoonhead9222
    @cartoonhead9222 7 หลายเดือนก่อน +3

    Someone needs to show this to Todd Howard so he knows what optimisation is.

    • @BBWahoo
      @BBWahoo 7 หลายเดือนก่อน

      He's too busy putting his energy into convincing people the ridiculous CPU tax is fine

  • @lior_haddad
    @lior_haddad 7 หลายเดือนก่อน +3

    That's an awesome idea for approximating, sad that it's basically the same speed-wise, and n64 specific...

    • @KazeN64
      @KazeN64  7 หลายเดือนก่อน +15

      its a lot more accurate so i think it's still a huge bonus! i bet theres other architectures that benefit from this approach too

    • @lior_haddad
      @lior_haddad 7 หลายเดือนก่อน

      @@KazeN64 yeah, I guess more older hardware would probably benefit from this! I just don't think there's a lot of hardware where you both use sin/cos/tan a lot, yet those operations are not super-optimized in hardware.
      Accuracy is great though! How close is this to being the perfect 1ULP function?

    • @timmygilbert4102
      @timmygilbert4102 7 หลายเดือนก่อน

      I was on a discord where they had discussed porting Mario 64 to the GBA, the same discord where tomb raider GBA was presented, I wonder how much it's compatible with that console, they had Bob-omb Battlefield rendered with textures. The fill rate is even more of a bottleneck 😂

    • @micalobia1515
      @micalobia1515 7 หลายเดือนก่อน

      @@lior_haddad Only use case I could think of is GPU stuff, where that style of sin/cos would be integrated into the hardware, I've no idea if it would be better than what they use though

  • @MagusArtStudios
    @MagusArtStudios 7 หลายเดือนก่อน +1

    Your videos have improved my coding skills :)

  • @General12th
    @General12th 7 หลายเดือนก่อน

    Hi Kaze!
    Very cool!

  • @prototypez4343
    @prototypez4343 7 หลายเดือนก่อน +8

    nintendo 64

    • @madlikov747
      @madlikov747 7 หลายเดือนก่อน

      Nintendo ultra 64

  • @macksnotcool
    @macksnotcool 7 หลายเดือนก่อน +3

    Possible optimization: This is going to sound ridiculous but in many programming languages, multiplying by 0.5 can be faster than dividing by 2. I know this is the case in C# but I don't know about C or C++.

    • @Octobeann
      @Octobeann 7 หลายเดือนก่อน +1

      I don’t remember for sure but I think he might’ve said he’s already doing that in the previous video

    • @angeldude101
      @angeldude101 7 หลายเดือนก่อน +2

      On a bit level, they're just adding and subtracting 1 from the exponent, though if the instructions always take the same number of cycles, then dividing by 2 would slow down to accommodate non powers of 2 that get passed in, which are much slower than a single multiply, but can be more accurate. Multiplying or dividing by a power of two on the other hand is always perfectly accurate as long as your floats aren't subnormal.
      That said, this doesn't actually seem relevant as the given code doesn't include a single division, nor multiplication by a half.
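
The exactness claim above can be checked with a few lines of C. This is only a sketch with made-up helper names (`float_bits`, `recip_matches`): because 0.5 is exactly representable, `x / 2.0f` and `x * 0.5f` round the same real number and should come out bit-identical, whereas for a divisor like 3 the reciprocal is itself rounded, so the two forms may differ in the last bit for some inputs.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: raw IEEE 754 bit pattern of a float. */
static uint32_t float_bits(float f)
{
    uint32_t b;
    memcpy(&b, &f, sizeof b);
    return b;
}

/* Hypothetical helper: returns 1 if x / d and x * (1 / d) are bit-identical. */
static int recip_matches(float x, float d)
{
    float quotient = x / d;
    float product = x * (1.0f / d);
    return float_bits(quotient) == float_bits(product);
}
```

Calling `recip_matches(x, 2.0f)` should return 1 for any finite `x`, which is why a compiler can safely make that substitution; `recip_matches(x, 3.0f)` carries no such guarantee.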

    • @KazeN64
      @KazeN64  7 หลายเดือนก่อน +3

      GCC compiles a division by 2 into a multiplication by 0.5 - but i don't divide anywhere in this anyway so i don't know where you got that idea.

    • @macksnotcool
      @macksnotcool 7 หลายเดือนก่อน

      @@KazeN64 Yeah, you're right. Also, I meant it as a general optimization and not one for calculating sin functions.

  • @DrKratso
    @DrKratso 7 หลายเดือนก่อน

    I read the title as "They folded the polynomial" and I was waiting for the joke to drop 😢
    Great vid ❤

  • @tumpuga2543
    @tumpuga2543 7 หลายเดือนก่อน

    this video combines all of my favorite things: cool math, computer systems and programming, retro games, and SUPER silly cat gifs

  • @tylerschmitz1816
    @tylerschmitz1816 7 หลายเดือนก่อน

    Not only are u a romhacking god but u spread knowledge I love it

  • @smallmoneysalvia
    @smallmoneysalvia 7 หลายเดือนก่อน

    This is fantastic for use in resource constrained flight controllers

  • @isaacbunsen5833
    @isaacbunsen5833 7 หลายเดือนก่อน

    video for each function LETS GOOOOOOOO

  • @user-xf5ty9yk7z
    @user-xf5ty9yk7z 3 หลายเดือนก่อน

    You took the most parabola-like part of the sine that's still fully periodizable and used it to cover the entire thing. That's great.

  • @BSEUNHIR
    @BSEUNHIR 7 หลายเดือนก่อน +1

    You are doing a LOT of heavy lifting in the N64 space. I don't know how many people are still making homebrew for this old a console, but I imagine you could get most projects to run around twice as fast, which is massive.
    Looking forward to the finished project and to Mario64 2.0 running at 60 fps on native hardware :)

  • @Diablokiller999
    @Diablokiller999 7 หลายเดือนก่อน +1

    You really should look again into CORDIC, there are implementations of this algorithm only using addition/subtraction, specifically for FPGAs. I used it a couple of years ago to calculate a sine for a 40MHz ADC input for phase shift detection (dual phase lock in) and only needed ~30 clock cycles for a 64 Bit input signal.

    • @KazeN64
      @KazeN64  7 หลายเดือนก่อน +1

      i'll need to see some code before i can give it a consideration

  • @JuscelinoVibecheck
    @JuscelinoVibecheck 7 หลายเดือนก่อน

    Sometimes when I watch your videos I get the vibe of the samurai that's still fighting in the WW2

  • @BoilingDietCoke
    @BoilingDietCoke 7 หลายเดือนก่อน

    Beautiful.

  • @lorebz
    @lorebz 7 หลายเดือนก่อน

    every time I think about how Kaze's videos make me wanna code,
    Kaze releases a new video about how coding even includes more math, like sine and curves and more and more math
    then I get overwhelmed, then I get hopeful, then I get overwhelmed again
    then I get hopeful and then I g

  • @drdan6443
    @drdan6443 7 หลายเดือนก่อน +2

    As someone starting Algebra II I'm starting to get this better

  • @half-qilin
    @half-qilin 7 หลายเดือนก่อน +1

    I think variants of this might be a bit faster on any hardware that has a dedicated floating-point square root function but not a dedicated trig function (or, at least, no fast one)

  • @gdclemo
    @gdclemo 7 หลายเดือนก่อน +1

    Another technique I once tried on a different platform uses the identity sin(x+y) = sin x cos y + cos x sin y, cos(x+y) = cos x cos y - sin x sin y. I split the angle into two parts containing the high and low bits of the angle and used lookup tables. But the polynomial expansion turned out to be faster in the end.

  • @edh615
    @edh615 7 หลายเดือนก่อน

    We need more people like you in modern game development.

    • @KopperNeoman
      @KopperNeoman 5 หลายเดือนก่อน

      Optimising for modern architectures is far harder. Even on consoles, you still need to factor in nonstandard storage devices that your game cannot have low-level access to.

  • @newbornkilik
    @newbornkilik 7 หลายเดือนก่อน

    Wow, I have no idea what Kaze just said to me the last 15 minutes, but I am happy for it!

  • @Hyperboid
    @Hyperboid 7 หลายเดือนก่อน +1

    Math way beyond what I'm doing in school AND new RTYI teaser? Excellent as usual from Kaze
    Waluigi's taco stand better heal you but also burn you because spicy

  • @Endercrow32
    @Endercrow32 7 หลายเดือนก่อน

    This is incredible and also I'm so excited for Return to Yoshi's Island