Why Mario 64's Render Speed BLOWS

  • Published on 23 Apr 2024
  • 0:00 Introduction
    0:53 Chapter1: Compiler Optimization
    1:50 Chapter2: Math Functions
    2:58 Chapter3: Shadows
    3:53 Chapter4: The Instruction Cache
    4:50 Chapter5: Animated Bones
    5:31 Chapter6: The RDRAM
    6:19 Conclusion
    7:01 Bonus: Insane Man Rambling On About C Code
    Yes, this would help with 60FPS Console SM64.
    Subscribe for more Retro Mods!
    Patreon: / kazestuff
    🎥 / kazesm64
    🐦 / kazeemanuar
  • Gaming

Comments • 843

  • @KazeN64
    @KazeN64  6 months ago +19

    Unlisting this video because of how outdated this is and how much higher quality the new stuff is.

    • @Mint25pop
      @Mint25pop 6 months ago

      newer stuff is very high quality!

    • @fido8542
      @fido8542 28 days ago

      this is unlisted? no idea how i got here

    • @explodinghammeronthe17thof36
      @explodinghammeronthe17thof36 18 days ago

      oke!

  • @JcFerggy
    @JcFerggy 2 years ago +857

    I've said something similar on a past video, but I would be interested in a patch for the vanilla SM64 that applies all these fixes you've done over the years. I'm sure most of the rom hacking community wouldn't have much use for it, but I think it would be a great technical showcase.

    • @DurradonXylles
      @DurradonXylles 2 years ago +153

      Agreed. I'd love to see a more optimized Super Mario 64 with all of these fixes, optimizations, and enhancements Kaze and others have been able to pull off, and see it compared to the original game.

    • @SmashyPlays
      @SmashyPlays 2 years ago +76

      Maybe that could make it run at 60fps on console one day but we can only dream

    • @dcvk6250
      @dcvk6250 2 years ago +69

      @@SmashyPlays It *would* make it run on 60FPS on console, at least in optimal conditions. It's just a matter of patching the base game with the optimizations and loading the ROM onto a flashcart

    • @Thalesperes
      @Thalesperes 2 years ago +37

      I was thinking the same thing, It would be awesome to play 60fps sm64 on real hardware

    • @TJBrumfield
      @TJBrumfield 2 years ago +46

      Including your optimized Mario and coin models please.
      I also hope this code can be shared with the people working on the reverse engineered PC port.
      Sure PC hardware is more powerful, but now people are throwing new features in that port, higher resolution textures, higher poly models, etc. And then they're trying to get this port working on various different consoles with varying levels of processing power as well.

  • @KazeN64
    @KazeN64  2 years ago +954

    Programmers complaining that loops are not faster than unrolled ones: Watch the bonus section of the video. This is an N64 hardware specific thing. (also yes, I do have the compilerflag to not unroll loops on)

    • @OverKart64
      @OverKart64 2 years ago +172

      The Mario Kart 64 community sends its thanks for documenting all this information and explaining the reasoning behind it.

    • @Monafide3305
      @Monafide3305 2 years ago +20

      Any chance I can get a source for that Komm Susser Tod Cover lol

    • @weatherton
      @weatherton 2 years ago +25

      @@OverKart64 60fps 4-player when?

    • @beerkegaard
      @beerkegaard 2 years ago +10

      Bro you are legit a genius

    • @monseftheprince3857
      @monseftheprince3857 2 years ago +6

      Can U Please Do Ocarina of time

  • @n64glennplant
    @n64glennplant 2 years ago +328

    I’m glad you put the “ramble” as you call it at the end - I’m sure I’m not the only one who loves this stuff 🧐

    • @Nobbie248
      @Nobbie248 2 years ago +2

      Glenn plant is here! I watch all your reviews

    • @TrueTydin
      @TrueTydin 2 years ago +2

      Glenn! The man! The legend!

  • @tbtb66
    @tbtb66 2 years ago +492

    The original team of like 20 did a pretty great job creating the game, console, controller, and even 2 more games at the same time
    But Kaze coming in and helping the game run better puts a smile on my face because it's like giving it new life, sorta like an old car

    • @ddnava96
      @ddnava96 2 years ago +75

      Yeah, and the difference is that the og team had a deadline. Kaze is doing all of this during his free time and without any pressure to finish it asap

    • @TheKorenji
      @TheKorenji 2 years ago +28

      @@ddnava96 yeah, but it's still surprising that a single guy could figure out so many things and fix them, Kaze doesn't have deadlines but he has neither a budget nor a team, so it's respectable nonetheless.

    • @tokeivo
      @tokeivo 2 years ago +34

      @@TheKorenji Are you a programmer? In my experience, as a web dev, what a budget and a team gives you, is primarily time. Having a deadline takes away time.
      And no finished project, was ever finished with the idea that it couldn't be optimized any more - if you're lucky, you reach "good enough".
      Another big difference is, that a team will always try to improve the product. Not "improve the rendering engine". They might have tried to implement one more power up or level given enough time. Or improved the controls. Or fixed the camera angles in the haunted house.
      So while super impressive, it's not at all surprising that someone could optimize SM64. Still, very impressive.

    • @TheKorenji
      @TheKorenji 2 years ago +1

      @@tokeivo you shouldn't say it's not impressive, even if you're right. because I can assure you that most people around(or at least a lot of them) have been surprised by Kaze one way or another, one of those being, his programming skills, since not everyone is around his level.. to be fair, yeah, it is quite predictable that optimizing these games should be possible, but it is the sole dedication of this man what does it for me.

    • @tokeivo
      @tokeivo 2 years ago +28

      @@TheKorenji i specifically mention twice that it's impressive. Dunno how you got the idea otherwise.

  • @KingPixelOfficial
    @KingPixelOfficial 2 years ago +389

    There is a certain charm about watching a gameplay recorded from real hardware and one recorded from emulator, I’m not sure how to explain it tho

    • @mlalbaitero
      @mlalbaitero 2 years ago +9

      The low res?

    • @KingPixelOfficial
      @KingPixelOfficial 2 years ago +6

      @@mlalbaitero Maybe. That’s probably another one of the reasons

    • @trunkit8749
      @trunkit8749 2 years ago +2

      The colors are dimmer, and it’s more pixelated.

    • @RizzlinHD
      @RizzlinHD 2 years ago +16

      it's just fuzzy in a way that only analog video can be

    • @KingPixelOfficial
      @KingPixelOfficial 2 years ago +2

      @@RizzlinHD Bingo! That’s most likely what I meant!

  • @ModernVintageGamer
    @ModernVintageGamer 2 years ago +821

    great stuff!

    • @MocroGamers
      @MocroGamers 2 years ago +4

      Wtf you commented on his video LULW 😆

    • @samuelthecamel
      @samuelthecamel 2 years ago +34

      When even MVG is impressed, you know you've done something amazing

    • @danmanx2
      @danmanx2 2 years ago +8

      @@samuelthecamel We need to talk about MVG and his love of Mario 64.

    • @raptorcitos
      @raptorcitos 2 years ago +4

      He is a good programmer who knows about hardware limitations and optimizations.

    • @bes03c
      @bes03c 2 years ago +7

      You know it is legit when MVG gives his stamp of approval.

  • @jimmyhirr5773
    @jimmyhirr5773 2 years ago +639

    Optimizing math functions using linear algebra and an intimate knowledge of how the R4300i CPU works: 3ms
    Removing two unnecessary raycasts: 2.5ms
    Kaze: 😐

    • @jeremyie
      @jeremyie 2 years ago +32

      its stupid sometimes

    • @vinesthemonkey
      @vinesthemonkey 2 years ago +85

      The golden rule of optimization: profile first!!

    • @mariocamspam72
      @mariocamspam72 2 years ago +9

      @@vinesthemonkey profiling sm64 without any debug symbols be like

    • @vinesthemonkey
      @vinesthemonkey 2 years ago +18

      @@mariocamspam72 SM64 has a full decomp tho?

    • @buzinaocara
      @buzinaocara 2 years ago

      If raycasts are that slow, maybe their collision geo could do with some acceleration structure...

  • @zheil9152
    @zheil9152 2 years ago +74

    I’d pay good money to take a “Kaze teaches SM64 C programming” class for an intro to modding the game. I’m someone that works on IoT embedded systems, but graphics and games are a whole different animal

    • @snesmocha
      @snesmocha 2 years ago +2

      just c coding in general, this man is already better than most modern developers in c++ with ue4

  • @SNESdrunk
    @SNESdrunk 2 years ago +129

    This is really interesting. I wonder why they felt like the game needed three? vertical raycasts. I suppose that might just go with the territory of making stuff up as you go along

    • @KazeN64
      @KazeN64  2 years ago +99

      yep this was clearly an oversight. it was the same raycast 3 times in different parts of the shadow processing. they should have passed the results down to the next function, but did not. i suppose the people implementing these 3 functions each received specifications that did not include knowing the surface data.

    • @clementpoon120
      @clementpoon120 2 years ago +9

      @@KazeN64 where the HELL is it in the code? im going crazy trying to optimise super mario 64 for 3ds right now and i cant bloody find it

    • @mariocamspam72
      @mariocamspam72 2 years ago +2

      @@clementpoon120 they dont share the same codebase or render pipeline

    • @harrisonfackrell
      @harrisonfackrell 2 years ago +13

      @@mariocamspam72 I think Clement is trying to optimize the *direct* Super Mario 64 port for the 3DS--the one that came out recently, after the source code leak--as opposed to _Super Mario 64 DS._

    • @AConquerorsVendetta
      @AConquerorsVendetta 9 months ago

      @@harrisonfackrell how are the ones for the ds and 3ds different?
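
The fix described in this thread (run the downward raycast once and hand the result to the later shadow stages) can be sketched in C as follows. This is only an illustration under assumptions: `FloorHit`, `ShadowInfo`, `find_floor_below` and the two `shadow_info_*` functions are hypothetical names, and the stub floor query is a flat plane standing in for the game's real collision lookup.

```c
/* Hypothetical stand-ins; the real SM64 types and functions differ. */
typedef struct {
    float height;   /* floor height under the query point */
    float normal_y; /* y component of the floor normal (1.0 = flat) */
} FloorHit;

typedef struct {
    float height;   /* where the shadow is drawn */
    float slope;    /* how tilted the floor is */
    float distance; /* how far the object is above the floor */
} ShadowInfo;

int g_raycast_count = 0; /* counts how often the expensive query runs */

/* Stub for the expensive downward raycast: a flat floor at y = 0. */
FloorHit find_floor_below(float x, float y, float z) {
    FloorHit hit = { 0.0f, 1.0f };
    (void)x; (void)y; (void)z;
    g_raycast_count++;
    return hit;
}

/* Redundant version: every stage repeats the same raycast. */
ShadowInfo shadow_info_redundant(float x, float y, float z) {
    ShadowInfo info;
    info.height   = find_floor_below(x, y, z).height;
    info.slope    = find_floor_below(x, y, z).normal_y;
    info.distance = y - find_floor_below(x, y, z).height;
    return info;
}

/* Shared version: one raycast, result passed down to every stage. */
ShadowInfo shadow_info_shared(float x, float y, float z) {
    FloorHit hit = find_floor_below(x, y, z);
    ShadowInfo info;
    info.height   = hit.height;
    info.slope    = hit.normal_y;
    info.distance = y - hit.height;
    return info;
}
```

Both functions return the same values for the same position; the shared one simply performs one query instead of three, which matches the claim that the shadow does not change.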

  • @thekingofmoo4346
    @thekingofmoo4346 2 years ago +73

    Could you make a rom hack of standard Mario 64 using all of these optimizations?

    • @deyvien
      @deyvien 2 years ago +7

      i wonder with all these optimizations if a 16:9 mod would run mostly at 30fps

    • @Oocca_Truth
      @Oocca_Truth 2 years ago

      I don't know how possible that would be without essentially rewriting and recompiling the game code

    • @deyvien
      @deyvien 2 years ago +1

      @@Oocca_Truth wouldn't these optimizations be public via recompiling rewritten code? Unless you're saying that 16:9 support requires a bunch of rewriting, which I don't think it does given Everdrive / GameShark codes existing.

  • @CAEC64
    @CAEC64 2 years ago +41

    the optimization is REAL!!

    • @Dozaemone
      @Dozaemone 2 years ago +4

      Ubisoft: opti-what?

    • @WeegeepieYT
      @WeegeepieYT 2 years ago +1

      True

  • @bengoodwin2141
    @bengoodwin2141 2 years ago +93

    "the compiler wasn't good enough so I rewrote this function in assembly" what a madlad

  • @tapedex
    @tapedex 2 years ago +62

    Now we just need a DeLorean so we can go back to 1995 and give a VHS of this upload to all N64 developers. 😅

    • @gblargg
      @gblargg 2 years ago +5

      They still wouldn't have enabled GCC optimizations unless you show a video that using the compiler of the time didn't make a build with bugs.

    • @WestHaddnin
      @WestHaddnin 2 years ago

      Lol

  • @mario493
    @mario493 2 years ago +44

    I think you should add a minecart on the outside of Bowser's Blazing Burrows. That would explain how Mario got to hop in the cart.

    • @KazeN64
      @KazeN64  2 years ago +47

      that is the plan

    • @mario493
      @mario493 2 years ago +16

      @@KazeN64 good. Really appreciate the effort you're putting into this major rom hack!

  • @j_mes
    @j_mes 2 years ago +107

    10:50 WHEEZE; bypassing compilation to save time is such a Kaze solution, well done! Just curious what sort of changes in performance would your changes bring to vanilla SM64? I'm just imagining a world where I can look dead on in Fire-Sea or Bowser's Sub in DDD without it being a slideshow ha
    You're a titan brother

    • @KazeN64
      @KazeN64  2 years ago +60

      those 2 levels could definitely be lagless with a few more tweaks!

    • @t0lkki
      @t0lkki 2 years ago +3

      @@KazeN64 not drawing the whole sub each frame could bring the framerate back up from a code perspective, but do you know how much of an impact it would have if the sub was made with a fraction of the triangles instead (without any changes to the code)?

    • @KazeN64
      @KazeN64  2 years ago +18

      @@t0lkki the problem is not that the sub is drawn every frame. its the collision math. you could simply load it as permanent collision and it'd be fine. ive done this in my sm64 multiplayer and the sub in that game has a higher framerate in multiplayer than it'd usually have in singleplayer...

    • @t0lkki
      @t0lkki 2 years ago +3

      @@KazeN64 oh that's interesting, doesn't that imply the sub was intended to move at some point? now that'd been a slideshow to watch!

    • @KazeN64
      @KazeN64  2 years ago +21

      @@t0lkki if it moved, it'd be the exact same lag. i think the reason they didnt make it permanent collision is that it disappears on a later act and simply didnt have a function to make it permanent collision only on certain acts. it takes 2 minutes to fix though. i think it was just programmer stupidity.

  • @le9038
    @le9038 2 years ago +10

    The moment Kaze started to explain the code he rewrote, he just goes gigachad. Especially how he said he just decided to rewrite stuff in assembly...

    • @AConquerorsVendetta
      @AConquerorsVendetta 9 months ago

      I'm going to have to learn how to code just to understand this, as I've seen many people say the same sentiment

  • @samuelthecamel
    @samuelthecamel 2 years ago +18

    Me almost to the end of the rambling section: Okay, this is impressive. It can't possibly get any more insane...
    Kaze: So I started coding in assembly

  • @Tisisrealnow
    @Tisisrealnow 2 years ago +13

    "Yea i like my gameplay optimized"

  • @Bizzozeron
    @Bizzozeron 2 years ago +12

    Those bonus bones are also present in Melee models, I theorize that they're probably referential bones because the bones have issues tracking their relationships to their original location, they're commonly found in shoulders and thighs, things protruding from the base structure

  • @mat_max
    @mat_max 2 years ago +1

    This footage looks like some fever dream direct-to-video Mario 64 sequel jahsnahakahkahsjahaksj

  • @MisterN1
    @MisterN1 2 years ago +7

    Based assembly dev.

    • @KazeN64
      @KazeN64  2 years ago +9

      virgin high level programming fan vs chad asm god

    • @MisterN1
      @MisterN1 2 years ago +6

      @@KazeN64 NOOOOOOOOOO! YOU CAN'T JUST HAVE AN INTRICATE LEVEL OF KNOWLEDGE ABOUT HOW THE CPU WORKS
      Chad Asm god: Duh huh computer go brrrrr faster.

  • @MrCrabs101
    @MrCrabs101 2 years ago +1

    love the eva music for the code

  • @gumgrapes
    @gumgrapes 2 years ago +6

    Absolutely amazing Kaze. I really hope to learn from you someday.

  • @veritassdg
    @veritassdg 2 years ago +13

    "crack head version" my programming in a nutshell

  • @TurquoiseIcy
    @TurquoiseIcy 2 years ago

    That Komm Susser Tod segment was so emotionally dissonant, I love it.

  • @KazeN64
    @KazeN64  2 years ago +31

    If you'd like more Kaze content, definitely check out my new 2nd/backup channel! I upload stuff that wouldn't fit the main channel here!!!
    th-cam.com/video/qMQZJjt90xI/w-d-xo.html

  • @MarmaladeMaki
    @MarmaladeMaki 1 year ago +1

    Dude rambling about C code might be one of my favorite things. Very interesting if you are learning C / C++.
    Need more!

  • @gudenau
    @gudenau 2 years ago +1

    I'm so glad you did the part at the end. Some of those changes I didn't know how they could be faster than what the compiler should output.

  • @gooby9306
    @gooby9306 2 years ago

    god bless that vaporwave cover of komm susser tod

  • @nahuelvazquez2241
    @nahuelvazquez2241 1 year ago +1

    seeing your optimizations, if you saw my "port" of the pipfall minigame from Fallout4 to the 68K processor, you'd kill me

  • @supernuke
    @supernuke 1 year ago

    that komm susser todd remix came outta nowhere lmao

  • @kemox
    @kemox 2 years ago +2

    You are a legend, i like your vec optimizations specially the one done in assembly

  • @Aunarky
    @Aunarky 2 years ago

    Really well thought out and well constructed video, I hope you make more of this! It was such a treat to watch :)

  • @SirSethery
    @SirSethery 2 years ago +16

    A 60fps Mario 64 on original hardware would be incredible. Too bad none of this improves the renderer. Super cool nonetheless.

    • @KazeN64
      @KazeN64  2 years ago +25

      this frees up some memory reads, meaning the renderer is also slightly sped up! 60fps sm64 is within reach.

    • @psgamer-il2pt
      @psgamer-il2pt 7 months ago +2

      Boy do I have news for you!

  • @YEWCHENGYINMoe
    @YEWCHENGYINMoe 1 year ago +2

    You praise them at first and then proceed to roast them.

  • @johnatangonzalez9099
    @johnatangonzalez9099 2 years ago

    dude keep it up, I love these technical videos of yours

  • @StupidGamer360
    @StupidGamer360 2 years ago +2

    you are one romhacker

  • @joseberger7737
    @joseberger7737 4 months ago +1

    i have modified the code myself and found the 3 raycasts,
    I UNDERSTAND YOUR ANGER
    the first one i found was if mario is over water cast the shadow on it, if he is in it, cast the shadow at the bottom, not only is this not how water works but it runs faster if you just cast it at the bottom realistically
    the next one was for objects with 4 sided shadows which was identical to the last one for round shadows, i expected to do some work to make it cast 4 sided but no, if the identical function was already in cache then it would still have to load another one

    • @KazeN64
      @KazeN64  4 months ago +1

      there's more raycasts...
      there's also a few raycasts to get the floor height around the shadow (instead of using the surface normals...) and there is FIVE raycasts during mario's step function. plus every object has a separate raycast for its physics and graphics even though both will have identical results.

    • @joseberger7737
      @joseberger7737 4 months ago +1

      @@KazeN64 O_O

  • @shadoninja
    @shadoninja 2 years ago +8

    You should have done a TAS side by side before/after to show the visual difference

  • @wallabra
    @wallabra 2 years ago +1

    This is great, very well done!
    What about clang?

    • @KazeN64
      @KazeN64  2 years ago +1

      who's clang

  • @easyaspi31415
    @easyaspi31415 2 years ago +14

    7:17 Have you considered the restrict keyword? Without the restrict keyword, by the laws of the C standard, it must assume that dest and src overlap.
    So, for example, let's say you did
    float x[4] = { 1, 2, 3, 4 };
    copy(&x[1], &x[0]);
    If GCC loaded first then stored, it would end up in 1 1 2 3, but the C standard says it should be 1 1 1 1.
    The restrict keyword says "these are never going to overlap" and therefore it doesn't need to worry about that.

    • @KazeN64
      @KazeN64  2 years ago +13

      yeah, that would have worked. im no expert at C so i had no idea until i saw a few comments like this.

    • @ssl3546
      @ssl3546 2 years ago +6

      @@KazeN64 "restrict" was introduced in C99, it did not exist when Mario 64 was being made. I assume there was a GNU extension prior to 1999 but I don't know when.

    • @easyaspi31415
      @easyaspi31415 2 years ago +1

      @@ssl3546 they didn't use GCC so I doubt it
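
The aliasing point in this thread can be made concrete with three variants of a three-float copy. This is a sketch, not the code from the video: `copy3`, `copy3_load_first` and `copy3_restrict` are hypothetical names. The first must behave as if each store could change the next load; the second writes the "load x3, then store x3" order out by hand; the third uses C99 `restrict` to promise the compiler that the pointers never overlap, which lets it choose that order itself.

```c
/* Plain copy: dest and src may overlap, so the compiler has to keep
 * the load/store order visible in the source. */
void copy3(float *dest, const float *src) {
    dest[0] = src[0];
    dest[1] = src[1];
    dest[2] = src[2];
}

/* Hand-written "load all, then store all" ordering. On overlapping
 * arguments this gives a different result than copy3. */
void copy3_load_first(float *dest, const float *src) {
    float a = src[0];
    float b = src[1];
    float c = src[2];
    dest[0] = a;
    dest[1] = b;
    dest[2] = c;
}

/* C99 restrict: the caller promises dest and src do not overlap, so
 * the compiler is free to load everything first. Calling this with
 * overlapping pointers is undefined behaviour. */
void copy3_restrict(float *restrict dest, const float *restrict src) {
    dest[0] = src[0];
    dest[1] = src[1];
    dest[2] = src[2];
}
```

With `float x[4] = { 1, 2, 3, 4 };`, calling `copy3(&x[1], &x[0])` leaves 1 1 1 1, while `copy3_load_first(&x[1], &x[0])` leaves 1 1 2 3, which is the difference described above.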

  • @SilphBoss
    @SilphBoss 2 years ago

    fly me to the moon & phonk in the same vid? fuckin BANGERS kaze

  • @Foopums
    @Foopums 2 years ago

    Yoshi shaking his ass at the end had me weak LMAO

  • @dreamfright4066
    @dreamfright4066 2 years ago +1

    the guy is a sm64 modder who knows code better than nasa dudes for real

  • @ddnava96
    @ddnava96 2 years ago +1

    Topping it off with assembly. What a legend!

  • @prism223
    @prism223 1 year ago

    This is picturesque example of the 80/20 rule.

  • @AdrienTD
    @AdrienTD 2 years ago +46

    9:40 Kind of weird that GCC is not able to optimize float moves to immediate int moves by itself. x86 compilers can do such optimizations, so maybe it's disabled on MIPS because it could cause unexpected behavior? Or maybe that would only work in C++? Or maybe I just don't get it 🤔

    • @alfiegordon9013
      @alfiegordon9013 2 years ago +4

      Or it’s just gcc being a pile of suck

    • @Ehal256
      @Ehal256 2 years ago +18

      My guess is gcc isn't as good on most other platforms as it is on x86. My megadrive project uses gcc but I have to inspect the compiler's output and use inline asm often in performance critical areas to get around gcc's poor 68k codegen.

    • @Mobius14
      @Mobius14 2 years ago +8

      @@Ehal256 MIPS support on GCC is actually not super duper hot like it is on x86. IDO on O2 is actually pretty good for being a 1994-1996 compiler and GCC here is only *marginally* better on the default settings, although you can get a lot more out of it by being flag specific like Kaze is doing.

    • @D0Samp
      @D0Samp 2 years ago +5

      Maybe it doesn't realize it actually can save instructions (and especially loads) since loading a single precision float is a two-step process but so is a 32-bit immediate, since classic RISC instructions only take 16 bits of data at once. But since all the lower bits of the representation of 1.0f are zero, it can be loaded in a single step (lui $2,16256). PowerPC has a similar issue, they only figured that out for ARM. Old x86 cheats by having a FLD1 instruction.
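
The `lui $2,16256` figure in the last reply can be checked from the bit pattern of the constant. The sketch below assumes `float` is an IEEE-754 single (true on the N64 and on common desktop targets); `float_bits`, `fits_in_lui` and `lui_immediate` are hypothetical helper names. A constant whose lower 16 bits are zero needs only the upper half-word, which is what a single `lui` provides.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float's bits as a 32-bit integer. memcpy avoids the
 * aliasing problems of a pointer cast. Assumes IEEE-754 singles. */
uint32_t float_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Nonzero when the lower 16 bits are zero, so the whole pattern can
 * be produced by loading only the upper half-word. */
int fits_in_lui(float f) {
    return (float_bits(f) & 0xFFFFu) == 0;
}

/* The 16-bit immediate that the upper-half load would use. */
uint32_t lui_immediate(float f) {
    return float_bits(f) >> 16;
}
```

For 1.0f the bit pattern is 0x3F800000, so `lui_immediate(1.0f)` is 0x3F80, which is 16256 in decimal. A value such as 0.1f has nonzero low bits and would still need a second instruction or a memory load.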

  • @tux1468
    @tux1468 5 months ago

    That bit about C programming just reminds me of how much improvement needs to be done to the compiler's optimization algorithm

  • @thecozies
    @thecozies 2 years ago

    thanks for the shout out dude!

  • @qwertyioup195
    @qwertyioup195 2 years ago +3

    As someone who only had taken an intro to C++ course, the only thought I had was “oh that’s a void function. That’s neat.”

  • @kingofthegrapes
    @kingofthegrapes 2 years ago +1

    I love these vids you do about optimizations

  • @BottomOfTheDumpsterFire
    @BottomOfTheDumpsterFire 2 years ago

    2:00 joke's on you, I'm a graphics programmer and I'm invested

  • @ChaunceyGardener
    @ChaunceyGardener 8 months ago

    SM64 needs a PSX port. Bubsy 3D proves it's feasible.

  • @rubixtheslime
    @rubixtheslime 2 years ago +50

    "my compiler will optimize it all, I don't need to understand the cpu"
    So much this. Been working on rewriting the Kociemba algorithm (for cube solving) in assembly for modern hardware, though of course I haven't finished it because ADHD had other plans. For the record, Kociemba definitely understood how CPUs work when he wrote his C implementation, BUT turns out once you get AVX involved and do it in assembly you can reach 1 billion turns per second on a laptop. Though seriously, throughout this process I've learned that even if you don't use any asm, understanding those details under the hood is absolutely critical. Both for performance _and_ readability.

    • @locklear308
      @locklear308 2 years ago +6

      It's sad, so much CPU power nowadays is basically wasted.
      I would LOVE to see something assembly based running on a modern CPU. Like you know those old fun little demo things you could run on a commodore 64? Where you could like run colors across the screen and you can see how much faster machine language was than basic?

    • @rubixtheslime
      @rubixtheslime 2 years ago +7

      @@locklear308 you might want to check out the channel "What's a Creel?" He does a fair bit of assembly and it's where I learned a lot of what I know.

    • @locklear308
      @locklear308 2 years ago +1

      @@rubixtheslime oh thanks man that sounds dope!

    • @johnsimon8457
      @johnsimon8457 2 years ago

      I'm trying to find a video of Michael Abrash (of quake/Oculus/Graphics Programming Black Book) and how he optimized the snot out of a naive implementation of Conway's game of life 1000x over.
      > BUT turns out once you get AVX involved
      oh, sure once you get hardware support the sky's the limit. It's a world of difference from an early 90s MIPS

  • @patrickgh3
    @patrickgh3 2 years ago +12

    Great video! I really appreciate how you emphasize the context that the original code was written in, and how it's different from the context you're making these improvements in. Like the disclaimer at the very start of the video, and how you brought in an actual developer of the game to ask about the compiler options!
    Personally I might have given the 3 raycasts thing more slack, or at least not pinned it on 1 theoretical person. As a complete hypothetical, maybe the extra raycasts were to fix bugs that occurred in 1 specific level, and the team was on a deadline, so they chose that fix. That said, I could be wrong, and you're the one who's seen the actual code. Anyways, I appreciate all the attention given to the original developer context throughout the video.

  • @NutyRiver
    @NutyRiver 2 years ago +3

    Unrelated but I adore what you did with the vertex shading. It makes things look so vibrant and lively. Very Spyro-esque, which is a great thing in my opinion!

  • @lordzooq8987
    @lordzooq8987 2 years ago

    Instantly subbed

  • @eyeiaye
    @eyeiaye 2 years ago

    Really incredible work!

  • @XychoLight
    @XychoLight 2 years ago

    Super interesting, thank you!

  • @KabaroOrabak
    @KabaroOrabak 2 years ago +2

    This guy is actually insane

  • @richanddarksbane1439
    @richanddarksbane1439 2 years ago

    Incredible stuff you've done here!

  • @RADkate
    @RADkate 2 years ago +26

    3:20 im pretty new to programming but i guess they used the 3 raycasts to calculate the angle of the ground instead of just getting the normal direction from the ground below?

    • @KazeN64
      @KazeN64  2 years ago +38

      no, they used all 3 to get the floorheight and slopedness. like i said in the video, you can do it in 1 without changing the shadow by a single pixel.

    • @KazeN64
      @KazeN64  2 years ago +31

      (as in, all of the 3 raycasts are straight down from the same position)

    • @thegreatautismo224
      @thegreatautismo224 2 years ago +13

      oh lol. I was thinking it might've been like one for the height, one ahead of it for the slope one way, and one to the side for the slope the other way, but I guess it is just redundant lol

    • @KazeN64
      @KazeN64  2 years ago +22

      @@thegreatautismo224 yep, unfortunately it is haha. it does do what you've described for mario's shadow specifically, i did keep that one intact.

    • @renakunisaki
      @renakunisaki 2 years ago +22

      I remember noticing that it calculates Mario's circular shadow using Pi to something like 20 digits of precision. With how low resolution this game is they probably could have got away with just using 3.0 :D

  • @ZygalStudios
    @ZygalStudios 2 years ago +8

    The left side of the picture at 0:38 about sums up my frustrations with programming culture nowadays 🤣
    Very great insight! Loved the video!
    Utilizing the hardware provided will always provide a better solution than expecting the compiler to do things for you.
    The compiler can only reason about such a small portion of your code even with optimizations turned on.
    ie.. USE THE CACHES AS INTENDED.
    For this case you almost chopped the time in half by using the cache better and shaved minuscule amounts of time from using the compiler optimizations.

  • @btarg1
    @btarg1 2 years ago +142

    Making this open-source could help optimise the code even further, but this is really pushing the hardware to its limits! Can't wait for more

    • @Henrix1998
      @Henrix1998 2 years ago +18

      I doubt their team wouldn't already have everyone deeply enough interested to this topic to be helpful

    • @gralha_
      @gralha_ 1 year ago +5

      It would be a sure path to a C&D from nintendo

  • @DoobooDomo
    @DoobooDomo 2 years ago +8

    For the first optimization (load/store x3 v. load x3 + store x3), I would guess that this is an aliasing issue, and could be solved more simply with the restrict keyword. Good stuff!

    • @gblargg
      @gblargg 2 years ago +1

      Came here to say this. That can help almost all these cases, since the optimizer must otherwise assume the worst case that every store could modify any other value you're loading.

    • @reeeeeeeeemmmmmmmmmm
      @reeeeeeeeemmmmmmmmmm 2 years ago

      Yep exactly this. For arguments that overlap in memory the optimization will cause different behaviour, so the compiler can't apply it without you pinky-promising that they don't.

    • @angeldude101
      @angeldude101 2 years ago

      I was thinking that the compiler should be able to optimize a simple memcpy like that, but I forgot that C allowed mutable aliasing.

  • @iamdarkyoshi
    @iamdarkyoshi 2 years ago

    Impressive work dude!!

  • @Yoshideking
    @Yoshideking 2 years ago +2

    I love the fact we are cousins. Congrats on your constant work modding the 64.

  • @Tekape
    @Tekape 2 years ago

    nice taste in music ;D

  • @wolfcl0ck
    @wolfcl0ck 2 years ago

    absolutely insane, I love this

  • @bioman1hazard607
    @bioman1hazard607 2 years ago

    You are absolutely amazing dude.

  • @AssailantLF
    @AssailantLF 2 years ago +3

    God damn I love in-depth technical videos relating to video game software. Thanks Kaze for being so inspirational and awesome.

  • @CRITICALHITRU
    @CRITICALHITRU 7 months ago

    10:54
    Compiler: am I a joke to you?
    Some deranged Yoshi nerd: yes.

  • @CatinaJacket
    @CatinaJacket 2 years ago

    "The compiler is not good enough so we do it ourselves" You're so fucking cool tbh

  • @OrdinaryAVX
    @OrdinaryAVX 2 years ago

    Wow, this is some amazing work!

  • @peeyur
    @peeyur 1 year ago +2

    The song used in the Bonus is:
    PARADIGMA (Remix) - by MC ORSEN

  • @grizzoo
    @grizzoo 2 years ago

    Thanks, I appreciate these videos just as much as your others.

  • @exotictoast9931
    @exotictoast9931 2 years ago

    Nintendo should hire this man

  • @CosmicFox2007
    @CosmicFox2007 2 years ago

    i like how you made the letters the same color as the numbers

  • @ProTayToeGamer
    @ProTayToeGamer 2 years ago +15

    Didn't understand a thing you said but I enjoyed watching this whole thing while eating lunch!

  • @baklojan5933
    @baklojan5933 2 years ago

    Insane

  • @xeridea
    @xeridea 2 years ago +4

    The pains of working on older hardware. The load/store issue is much less of an issue with out of order CPUs. They rearrange instructions on the fly to try to prevent issues such as memory stalls. This was also the early days of realtime 3D rendering, and not as many shortcuts were known.

    • @Thelango99
      @Thelango99 2 years ago

      Even the XBOX 360 CPU used in-order execution.

  • @Seifer_42
    @Seifer_42 2 years ago

    Kaze this is incredible

  • @WhoNoMe
    @WhoNoMe 2 years ago

    Interesting vid.

  • @PEACEWALKER1992
    @PEACEWALKER1992 2 years ago

    Amazing Work! :)

  • @rabidduck1089
    @rabidduck1089 1 year ago +1

    I'm waiting for someone to do something like this to Golden Eye or Starfox 64.

  • @madghostek3026
    @madghostek3026 2 years ago

    Very pog, thanks for the nerdy part

  • @drewynucci9037
    @drewynucci9037 1 year ago

    What I’m really waiting for is a ridiculously tight and compiled mario rom I can run on my stock n64 at 60fps (or whatever it would be with this insanely optimized code)

  • @WACKA_WACKA
    @WACKA_WACKA 2 years ago

    2:29 komm susser todd by astrophysics i see, great music taste.

  • @mattmar96
    @mattmar96 2 years ago

    I love this thank you for sharing

  • @MBloke
    @MBloke 2 years ago +1

    Man I can't wait until you get your hands on OoT's code. Gonna be so awesome.

  • @Tobii64
    @Tobii64 2 years ago

    Dude you are insane in programming holy moly

  • @AdrianDX
    @AdrianDX 2 years ago

    Great work!👏🏻👏🏻👏🏻👏🏻

  • @BIG_CLARKY
    @BIG_CLARKY 2 years ago

    that image at the beginning speaks volumes. be more like the 1996 guy.

  • @miserablepile
    @miserablepile 2 years ago

    This is awesome!

  • @fders938
    @fders938 2 years ago

    I can't stop watching the stuff about the math function optimizations, that stuff is really interesting to me.

  • @richardg8376
    @richardg8376 2 years ago +2

    Every now and again its good to be reminded, as I sit there looking smug because of some optimisation I've done in the game I'm making, that there is a whole other level of big-brain optimisations I don't even know exists.

  • @crimsama2451
    @crimsama2451 2 years ago +1

    Awesome to see! The footage shown looks like a 3ds game! Insane improvements imo.

  • @lior_haddad
    @lior_haddad 2 years ago +10

    Ooo! Coding video! My favourite kind!
    That level of optimization is insane! really cool!

  • @dytallixx1268
    @dytallixx1268 2 years ago

    You're a genius, Kaze