Why Mario 64's Render Speed BLOWS

  • Published on 25 Apr 2024
  • 0:00 Introduction
    0:53 Chapter1: Compiler Optimization
    1:50 Chapter2: Math Functions
    2:58 Chapter3: Shadows
    3:53 Chapter4: The Instruction Cache
    4:50 Chapter5: Animated Bones
    5:31 Chapter6: The RDRAM
    6:19 Conclusion
    7:01 Bonus: Insane Man Rambling On About C Code
    Yes, this would help with 60FPS Console SM64.
    Subscribe for more Retro Mods!
    Patreon: / kazestuff
    🎥 / kazesm64
    🐦 / kazeemanuar
  • Gaming

Comments • 844

  • @KazeN64
    @KazeN64  6 months ago +19

    Unlisting this video because of how outdated this is and how much higher quality the new stuff is.

    • @Mint25pop
      @Mint25pop 6 months ago

      newer stuff is very high quality!

    • @fido8542
      @fido8542 1 month ago

      this is unlisted? no idea how i got here

    • @explodinghammeronthe17thof36
      @explodinghammeronthe17thof36 21 days ago

      oke!

    • @Vextrove
      @Vextrove 10 hours ago

      Nooo

  • @JcFerggy
    @JcFerggy 2 years ago +857

    I've said something similar on a past video, but I would be interested in a patch for the vanilla SM64 that applies all these fixes you've done over the years. I'm sure most of the rom hacking community wouldn't have much use for it, but I think it would be a great technical showcase.

    • @DurradonXylles
      @DurradonXylles 2 years ago +153

      Agreed. I'd love to see a more optimized Super Mario 64 with all of these fixes, optimizations, and enhancements Kaze and others have been able to pull off, and see it compared to the original game.

    • @SmashyPlays
      @SmashyPlays 2 years ago +76

      Maybe that could make it run at 60fps on console one day but we can only dream

    • @dcvk6250
      @dcvk6250 2 years ago +69

      @@SmashyPlays It *would* make it run at 60FPS on console, at least in optimal conditions. It's just a matter of patching the base game with the optimizations and loading the ROM onto a flashcart

    • @Thalesperes
      @Thalesperes 2 years ago +37

      I was thinking the same thing, It would be awesome to play 60fps sm64 on real hardware

    • @TJBrumfield
      @TJBrumfield 2 years ago +46

      Including your optimized Mario and coin models please.
      I also hope this code can be shared with the people working on the reverse engineered PC port.
      Sure PC hardware is more powerful, but now people are throwing new features in that port, higher resolution textures, higher poly models, etc. And then they're trying to get this port working on various different consoles with varying levels of processing power as well.

  • @KingPixelOfficial
    @KingPixelOfficial 2 years ago +389

    There is a certain charm about watching a gameplay recorded from real hardware and one recorded from emulator, I’m not sure how to explain it tho

    • @mlalbaitero
      @mlalbaitero 2 years ago +9

      The low res?

    • @KingPixelOfficial
      @KingPixelOfficial 2 years ago +6

      @@mlalbaitero Maybe. That’s probably another one of the reasons

    • @trunkit8749
      @trunkit8749 2 years ago +2

      The colors are dimmer, and it’s more pixelated.

    • @RizzlinHD
      @RizzlinHD 2 years ago +16

      it's just fuzzy in a way that only analog video can be

    • @KingPixelOfficial
      @KingPixelOfficial 2 years ago +2

      @@RizzlinHD Bingo! That’s most likely what I meant!

  • @tbtb66
    @tbtb66 2 years ago +492

    The original team of like 20 did a pretty great job creating the game, console, controller, and even 2 more games at the same time
    But Kaze coming in and helping the game run better puts a smile on my face because it's like giving it new life, sorta like an old car

    • @ddnava96
      @ddnava96 2 years ago +75

      Yeah, and the difference is that the og team had a deadline. Kaze is doing all of this during his free time and without any pressure to finish it asap

    • @TheKorenji
      @TheKorenji 2 years ago +28

      @@ddnava96 yeah, but it's still surprising that a single guy could figure out so many things and fix them. Kaze doesn't have deadlines, but he doesn't have a budget or a team either, so it's respectable nonetheless.

    • @tokeivo
      @tokeivo 2 years ago +34

      @@TheKorenji Are you a programmer? In my experience, as a web dev, what a budget and a team gives you, is primarily time. Having a deadline takes away time.
      And no finished project, was ever finished with the idea that it couldn't be optimized any more - if you're lucky, you reach "good enough".
      Another big difference is, that a team will always try to improve the product. Not "improve the rendering engine". They might have tried to implement one more power up or level given enough time. Or improved the controls. Or fixed the camera angles in the haunted house.
      So while super impressive, it's not at all surprising that someone could optimize SM64. Still, very impressive.

    • @TheKorenji
      @TheKorenji 2 years ago +1

      @@tokeivo you shouldn't say it's not impressive, even if you're right. because I can assure you that most people around(or at least a lot of them) have been surprised by Kaze one way or another, one of those being, his programming skills, since not everyone is around his level.. to be fair, yeah, it is quite predictable that optimizing these games should be possible, but it is the sole dedication of this man what does it for me.

    • @tokeivo
      @tokeivo 2 years ago +28

      @@TheKorenji i specifically mention twice that it's impressive. Dunno how you got the idea otherwise.

  • @n64glennplant
    @n64glennplant 2 years ago +328

    I’m glad you put the “ramble” as you call it at the end - I’m sure I’m not the only one who loves this stuff 🧐

    • @Nobbie248
      @Nobbie248 2 years ago +2

      Glenn plant is here! I watch all your reviews

    • @TrueTydin
      @TrueTydin 2 years ago +2

      Glenn! The man! The legend!

  • @KazeN64
    @KazeN64  2 years ago +954

    Programmers complaining that loops are not faster than unrolled ones: Watch the bonus section of the video. This is an N64 hardware specific thing. (also yes, I do have the compilerflag to not unroll loops on)

    • @OverKart64
      @OverKart64 2 years ago +172

      The Mario Kart 64 community sends its thanks for documenting all this information and explaining the reasoning behind it.

    • @Monafide3305
      @Monafide3305 2 years ago +20

      Any chance I can get a source for that Komm Susser Tod Cover lol

    • @weatherton
      @weatherton 2 years ago +25

      @@OverKart64 60fps 4-player when?

    • @beerkegaard
      @beerkegaard 2 years ago +10

      Bro you are legit a genius

    • @monseftheprince3857
      @monseftheprince3857 2 years ago +6

      Can U Please Do Ocarina of time
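
A minimal sketch of the rolled versus unrolled trade-off raised in the pinned note at the top of this thread. The function names `sum_rolled` and `sum_unrolled4` are hypothetical and are not taken from the game. Both compute the same sum in the same order; the unrolled one spends roughly four times as many instructions on the loop body. The assumption here, in line with the video's chapter on the instruction cache, is that on the N64 the smaller body can come out ahead because it takes up less instruction cache. No timings are claimed.

```c
#include <stddef.h>

/* Rolled loop: a few instructions that are reused on every iteration,
 * so the loop body takes up very little instruction cache. */
float sum_rolled(const float *v, size_t n) {
    float total = 0.0f;
    for (size_t i = 0; i < n; i++) {
        total += v[i];
    }
    return total;
}

/* Manually unrolled by four (n is assumed to be a multiple of 4):
 * fewer branches, but about four times the code for the same work. */
float sum_unrolled4(const float *v, size_t n) {
    float total = 0.0f;
    for (size_t i = 0; i < n; i += 4) {
        total += v[i];
        total += v[i + 1];
        total += v[i + 2];
        total += v[i + 3];
    }
    return total;
}
```

Which variant is faster depends on the target: the only difference between them is code size versus branch count, so it has to be measured on the hardware in question.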

  • @zheil9152
    @zheil9152 2 years ago +74

    I’d pay good money to take a “Kaze teaches SM64 C programming” class for an intro to modding the game. I’m someone that works on IoT embedded systems, but graphics and games are a whole different animal

    • @snesmocha
      @snesmocha 2 years ago +2

      just c coding in general, this man is already better than most modern developers in c++ with ue4

  • @ModernVintageGamer
    @ModernVintageGamer 2 years ago +821

    great stuff!

    • @MocroGamers
      @MocroGamers 2 years ago +4

      Wtf you commented on his video LULW 😆

    • @samuelthecamel
      @samuelthecamel 2 years ago +34

      When even MVG is impressed, you know you've done something amazing

    • @danmanx2
      @danmanx2 2 years ago +8

      @@samuelthecamel We need to talk about MVG and his love of Mario 64.

    • @raptorcitos
      @raptorcitos 2 years ago +4

      He is a good programmer who knows about hardware limitations and optimizations.

    • @bes03c
      @bes03c 2 years ago +7

      You know it is legit when MVG gives his stamp of approval.

  • @jimmyhirr5773
    @jimmyhirr5773 2 years ago +639

    Optimizing math functions using linear algebra and an intimate knowledge of how the R4300i CPU works: 3ms
    Removing two unnecessary raycasts: 2.5ms
    Kaze: 😐

    • @jeremyie
      @jeremyie 2 years ago +32

      its stupid sometimes

    • @vinesthemonkey
      @vinesthemonkey 2 years ago +85

      The golden rule of optimization: profile first!!

    • @mariocamspam72
      @mariocamspam72 2 years ago +9

      @@vinesthemonkey profiling sm64 without any debug symbols be like

    • @vinesthemonkey
      @vinesthemonkey 2 years ago +18

      @@mariocamspam72 SM64 has a full decomp tho?

    • @buzinaocara
      @buzinaocara 2 years ago

      If raycasts are that slow, maybe their collision geo could do with some acceleration structure...

  • @SNESdrunk
    @SNESdrunk 2 years ago +129

    This is really interesting. I wonder why they felt like the game needed three? vertical raycasts. I suppose that might just go with the territory of making stuff up as you go along

    • @KazeN64
      @KazeN64  2 years ago +99

      yep this was clearly an oversight. it was the same raycast 3 times in different parts of the shadow processing. they should have passed the results down to the next function, but did not. i suppose the people implementing these 3 functions each received specifications that did not include knowing the surface data.

    • @clementpoon120
      @clementpoon120 2 years ago +9

      @@KazeN64 where the HELL is it in the code? im going crazy trying to optimise super mario 64 for 3ds right now and i cant bloody find it

    • @mariocamspam72
      @mariocamspam72 2 years ago +2

      @@clementpoon120 they dont share the same codebase or render pipeline

    • @harrisonfackrell
      @harrisonfackrell 2 years ago +13

      @@mariocamspam72 I think Clement is trying to optimize the *direct* Super Mario 64 port for the 3DS--the one that came out recently, after the source code leak--as opposed to _Super Mario 64 DS._

    • @AConquerorsVendetta
      @AConquerorsVendetta 9 months ago

      @@harrisonfackrell how are the ones for the ds and 3ds different?
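
Kaze's reply earlier in this thread says the three shadow functions should have passed the raycast result down instead of each casting again. A rough sketch of that idea follows. Every name in it (`FloorHit`, `raycast_down`, `process_shadow`, and the helpers) is hypothetical and does not come from the SM64 source, and the stand-in raycast simply treats the floor as the plane y = 0 while counting its own calls.

```c
#include <stdbool.h>

/* Hypothetical result of one downward raycast; not the game's real type. */
typedef struct {
    bool  hit;        /* was a floor found below the query point? */
    float height;     /* y of the floor under the query point */
    float normal[3];  /* unit normal of the floor surface */
} FloorHit;

/* Counts how often the (expensive) collision query runs. */
int g_raycast_calls = 0;

/* Stand-in for the real collision query: the floor is the plane y = 0. */
FloorHit raycast_down(float x, float y, float z) {
    (void)x;
    (void)z;
    g_raycast_calls++;
    FloorHit hit = { y >= 0.0f, 0.0f, { 0.0f, 1.0f, 0.0f } };
    return hit;
}

/* Each stage reads the shared result instead of casting again. */
float shadow_height(const FloorHit *hit)   { return hit->height; }
float shadow_slope_ny(const FloorHit *hit) { return hit->normal[1]; }
bool  shadow_visible(const FloorHit *hit)  { return hit->hit; }

/* One raycast per call, reused by all three stages. */
bool process_shadow(float x, float y, float z, float *out_height, float *out_ny) {
    FloorHit hit = raycast_down(x, y, z);
    *out_height = shadow_height(&hit);
    *out_ny     = shadow_slope_ny(&hit);
    return shadow_visible(&hit);
}
```

After one call to `process_shadow`, `g_raycast_calls` is 1 rather than 3, which is the whole point: the output is unchanged and only the redundant queries disappear.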

  • @thekingofmoo4346
    @thekingofmoo4346 2 years ago +73

    Could you make a rom hack of standard Mario 64 using all of these optimizations?

    • @deyvien
      @deyvien 2 years ago +7

      i wonder with all these optimizations if a 16:9 mod would run mostly at 30fps

    • @Oocca_Truth
      @Oocca_Truth 2 years ago

      I don't know how possible that would be without essentially rewriting and recompiling the game code

    • @deyvien
      @deyvien 2 years ago +1

      @@Oocca_Truth wouldn't these optimizations be public via recompiling rewritten code? Unless you're saying that 16:9 support requires a bunch of rewriting, which I don't think it does given Everdrive / GameShark codes existing.

  • @mario493
    @mario493 2 years ago +44

    I think you should add a minecart from the outside of Bowser's Blazing Burrows. That would explain how Mario got to hop in the cart.

    • @KazeN64
      @KazeN64  2 years ago +47

      that is the plan

    • @mario493
      @mario493 2 years ago +16

      @@KazeN64 good. Really appreciate the effort you're putting into this major rom hack!

  • @tapedex
    @tapedex 2 years ago +62

    Now we just need a DeLorean so we can go back to 1995 and give a VHS of this upload to all N64 developers. 😅

    • @gblargg
      @gblargg 2 years ago +5

      They still wouldn't have enabled GCC optimizations unless you show a video that using the compiler of the time didn't make a build with bugs.

    • @WestHaddnin
      @WestHaddnin 2 years ago

      Lol

  • @j_mes
    @j_mes 2 years ago +107

    10:50 WHEEZE; bypassing compilation to save time is such a Kaze solution, well done! Just curious what sort of changes in performance would your changes bring to vanilla SM64? I'm just imagining a world where I can look dead on in Fire-Sea or Bowser's Sub in DDD without it being a slideshow ha
    You're a titan brother

    • @KazeN64
      @KazeN64  2 years ago +60

      those 2 levels could definitely be lagless with a few more tweaks!

    • @t0lkki
      @t0lkki 2 years ago +3

      @@KazeN64 not drawing the whole sub each frame could bring the framerate back up from a code perspective, but do you know how much of an impact it would have if the sub was made with a fraction of the triangles instead (without any changes to the code)?

    • @KazeN64
      @KazeN64  2 years ago +18

      @@t0lkki the problem is not that the sub is drawn every frame. its the collision math. you could simply load it as permanent collision and it'd be fine. ive done this in my sm64 multiplayer and the sub in that game has a higher framerate in multiplayer than it'd usually have in singleplayer...

    • @t0lkki
      @t0lkki 2 years ago +3

      @@KazeN64 oh that's interesting, doesn't that imply the sub was intended to move at some point? now that'd been a slideshow to watch!

    • @KazeN64
      @KazeN64  2 years ago +21

      @@t0lkki if it moved, it'd be the exact same lag. i think the reason they didnt make it permanent collision is that it disappears on a later act and simply didnt have a function to make it permanent collision only on certain acts. it takes 2 minutes to fix though. i think it was just programmer stupidity.

  • @CAEC64
    @CAEC64 2 years ago +41

    the optimization is REAL!!

    • @Dozaemone
      @Dozaemone 2 years ago +4

      Ubisoft: opti-what?

    • @WeegeepieYT
      @WeegeepieYT 2 years ago +1

      True

  • @mat_max
    @mat_max 2 years ago +1

    This footage looks like some fever dream direct-to-video Mario 64 sequel jahsnahakahkahsjahaksj

  • @peeyur
    @peeyur 1 year ago +2

    The song used in the Bonus is:
    PARADIGMA (Remix) - by MC ORSEN

  • @TurquoiseIcy
    @TurquoiseIcy 2 years ago

    That Komm Susser Tod segment was so emotionally dissonant, I love it.

  • @MrCrabs101
    @MrCrabs101 2 years ago +1

    love the eva music for the code

  • @nahuelvazquez2241
    @nahuelvazquez2241 1 year ago +1

    seeing your optimizations, if you saw my "port" of the pipfall minigame from Fallout4 to the 68K processor, you'd kill me

  • @bengoodwin2141
    @bengoodwin2141 2 years ago +93

    "the compiler wasn't good enough so I rewrote this function in assembly" what a madlad

  • @SilphBoss
    @SilphBoss 2 years ago

    fly me to the moon & phonk in the same vid? fuckin BANGERS kaze

  • @gooby9306
    @gooby9306 2 years ago

    god bless that vaporwave cover of komm susser tod

  • @BottomOfTheDumpsterFire
    @BottomOfTheDumpsterFire 2 years ago

    2:00 joke's on you, I'm a graphics programmer and I'm invested

  • @Foopums
    @Foopums 2 years ago

    Yoshi shaking his ass at the end had me weak LMAO

  • @ChaunceyGardener
    @ChaunceyGardener 8 months ago

    SM64 needs a PSX port. Bubsy 3D proves it's feasible.

  • @KazeN64
    @KazeN64  2 years ago +31

    If you'd like more Kaze content, definitely check out my new 2nd/backup channel! I upload stuff that wouldn't fit the main channel here!!!
    th-cam.com/video/qMQZJjt90xI/w-d-xo.html

  • @supernuke
    @supernuke 1 year ago

    that komm susser todd remix came outta nowhere lmao

  • @StupidGamer360
    @StupidGamer360 2 years ago +2

    you are one romhacker

  • @le9038
    @le9038 2 years ago +10

    The moment Kaze started to explain the code he rewrote, he just goes gigachad. Especially how he said he just decided to rewrite stuff in assembly...

    • @AConquerorsVendetta
      @AConquerorsVendetta 9 months ago

      I'm going to have to learn how to code just to understand this, as I've seen many people say the same sentiment

  • @samuelthecamel
    @samuelthecamel 2 years ago +18

    Me almost to the end of the rambling section: Okay, this is impressive. It can't possibly get any more insane...
    Kaze: So I started coding in assembly

  • @gumgrapes
    @gumgrapes 2 years ago +6

    Absolutely amazing Kaze. I really hope to learn from you someday.

  • @AdrienTD
    @AdrienTD 2 years ago +46

    9:40 Kind of weird that GCC is not able to optimize float moves to immediate int moves by itself. x86 compilers can do such optimizations, so maybe it's disabled on MIPS because it could cause unexpected behavior? Or maybe that would only work in C++? Or maybe I just don't get it 🤔

    • @alfiegordon9013
      @alfiegordon9013 2 years ago +4

      Or it’s just gcc being a pile of suck

    • @Ehal256
      @Ehal256 2 years ago +18

      My guess is gcc isn't as good on most other platforms as it is on x86. My megadrive project uses gcc but I have to inspect the compiler's output and use inline asm often in performance critical areas to get around gcc's poor 68k codegen.

    • @Mobius14
      @Mobius14 2 years ago +8

      @@Ehal256 MIPS support on GCC is actually not super duper hot like it is on x86. IDO on O2 is actually pretty good for being a 1994-1996 compiler and GCC here is only *marginally* better on the default settings, although you can get a lot more out of it by being flag specific like Kaze is doing.

    • @D0Samp
      @D0Samp 2 years ago +5

      Maybe it doesn't realize it actually can save instructions (and especially loads) since loading a single precision float is a two-step process but so is a 32-bit immediate, since classic RISC instructions only take 16 bits of data at once. But since all the lower bits of the representation of 1.0f are zero, it can be loaded in a single step (lui $2,16256). PowerPC has a similar issue, they only figured that out for ARM. Old x86 cheats by having a FLD1 instruction.
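
The `lui $2,16256` remark in the reply above can be checked with a few lines of C. This is a small sketch, assuming `float` is a 32-bit IEEE 754 single (true on MIPS and on common desktop targets); `float_bits` and `lui_value` are hypothetical helper names. It shows that 1.0f has the bit pattern 0x3F800000, whose lower 16 bits are all zero, so a single load-upper-immediate of 16256 (0x3F80) produces it.

```c
#include <stdint.h>
#include <string.h>

/* Raw bit pattern of a single-precision float (assumes a 32-bit float). */
uint32_t float_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* well-defined way to reinterpret the bytes */
    return bits;
}

/* Value a MIPS `lui reg, imm` leaves in the low 32 bits of the register:
 * the immediate in the upper half, zeros in the lower half. */
uint32_t lui_value(uint16_t imm) {
    return (uint32_t)imm << 16;
}
```

A constant such as 0.1f does not have zero lower bits, so the same one-instruction trick would not apply to it.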

  • @Bizzozeron
    @Bizzozeron 2 years ago +12

    Those bonus bones are also present in Melee models, I theorize that they're probably referential bones because the bones have issues tracking their relationships to their original location, they're commonly found in shoulders and thighs, things protruding from the base structure

  • @MarmaladeMaki
    @MarmaladeMaki 1 year ago +1

    Dude rambling about C code might be one of my favorite things. Very interesting if you are learning C / C++.
    Need more!

  • @CRITICALHITRU
    @CRITICALHITRU 7 months ago

    10:54
    Compiler: am I a joke to you?
    Some deranged Yoshi nerd: yes.

  • @dreamfright4066
    @dreamfright4066 2 years ago +1

    the guy is a sm64 modder who knows code better than nasa dudes for real

  • @tux1468
    @tux1468 5 months ago

    That bit about C programming just reminds me of how much improvement needs to be done to the compiler's optimization algorithm

  • @MisterN1
    @MisterN1 2 years ago +7

    Based assembly dev.

    • @KazeN64
      @KazeN64  2 years ago +9

      virgin high level programming fan vs chad asm god

    • @MisterN1
      @MisterN1 2 years ago +6

      @@KazeN64 NOOOOOOOOOO! YOU CAN'T JUST HAVE AN INTRICATE LEVEL OF KNOWLEDGE ABOUT HOW THE CPU WORKS
      Chad Asm god: Duh huh computer go brrrrr faster.

  • @gudenau
    @gudenau 2 years ago +1

    I'm so glad you did the part at the end. Some of those changes I didn't know how they could be faster than what the compiler should output.

  • @WACKA_WACKA
    @WACKA_WACKA 2 years ago

    2:29 komm susser todd by astrophysics i see, great music taste.

  • @veritassdg
    @veritassdg 2 years ago +13

    "crack head version" my programming in a nutshell

  • @joseberger7737
    @joseberger7737 4 months ago +1

    i have modified the code myself and found the 3 raycasts,
    I UNDERSTAND YOUR ANGER
    the first one i found was if mario is over water cast the shadow on it, if he is in it, cast the shadow at the bottom, not only is this not how water works but it runs faster if you just cast it at the bottom realistically
    the next one was for objects with 4 sided shadows which was identical to the last one for round shadows, i expected to do some work to make it cast 4 sided but no, if the identical function was already in cache then it would still have to load another one

    • @KazeN64
      @KazeN64  4 months ago +1

      there's more raycasts...
      there's also a few raycasts to get the floor height around the shadow (instead of using the surface normals...) and there is FIVE raycasts during mario's step function. plus every object has a separate raycast for its physics and graphics even though both will have identical results.

    • @joseberger7737
      @joseberger7737 4 months ago +1

      @@KazeN64 O_O
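
Kaze's reply in this thread mentions using the surface normal to get floor heights around the shadow instead of extra raycasts. A sketch of that arithmetic follows, assuming the floor triangle is stored as a plane nx*x + ny*y + nz*z + d = 0; the `FloorPlane` struct and `plane_height_at` function are hypothetical names, not the game's own. Solving the plane equation for y gives y = -(nx*x + nz*z + d) / ny.

```c
/* Hypothetical plane of a floor triangle: nx*x + ny*y + nz*z + d = 0. */
typedef struct {
    float nx, ny, nz;  /* surface normal (need not be unit length here) */
    float d;           /* plane offset */
} FloorPlane;

/* Height of the plane at (x, z), obtained by solving the plane equation for y.
 * Assumes ny != 0, i.e. the surface is not a vertical wall. */
float plane_height_at(const FloorPlane *p, float x, float z) {
    return -(p->nx * x + p->nz * z + p->d) / p->ny;
}
```

This only matches a real raycast while the sample point stays over the same triangle; near an edge, the neighbouring surface may be at a different height.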

  • @prism223
    @prism223 1 year ago

    This is a picturesque example of the 80/20 rule.

  • @thecozies
    @thecozies 2 years ago

    thanks for the shout out dude!

  • @RADkate
    @RADkate 2 years ago +26

    3:20 im pretty new to programming but i guess they used the 3 raycasts to calculate the angle of the ground instead of just getting the normal direction from the ground below?

    • @KazeN64
      @KazeN64  2 years ago +38

      no, they used all 3 to get the floorheight and slopedness. like i said in the video, you can do it in 1 without changing the shadow by a single pixel.

    • @KazeN64
      @KazeN64  2 years ago +31

      (as in, all of the 3 raycasts are straight down from the same position)

    • @thegreatautismo224
      @thegreatautismo224 2 years ago +13

      oh lol. I was thinking it might've been like one for the height, one ahead of it for the slope one way, and one to the side for the slope the other way, but I guess it is just redundant lol

    • @KazeN64
      @KazeN64  2 years ago +22

      @@thegreatautismo224 yep, unfortunately it is haha. it does do what you've described for mario's shadow specifically, i did keep that one intact.

    • @renakunisaki
      @renakunisaki 2 years ago +22

      I remember noticing that it calculates Mario's circular shadow using Pi to something like 20 digits of precision. With how low resolution this game is they probably could have got away with just using 3.0 :D

  • @rabidduck1089
    @rabidduck1089 1 year ago +1

    I'm waiting for someone to do something like this to Golden Eye or Starfox 64.

  • @philipphanslovsky5101
    @philipphanslovsky5101 1 year ago

    I love programming and Mario 64. Perfect video for me.
    With every single program getting a rust implementation, when will we see Super Mario Rust?

  • @kemox
    @kemox 2 years ago +2

    You are a legend, i like your vec optimizations, especially the one done in assembly

  • @CatinaJacket
    @CatinaJacket 2 years ago

    "The compiler is not good enough so we do it ourselves" You're so fucking cool tbh

  • @ninjapanda1018
    @ninjapanda1018 2 years ago +2

    7:01 what is that song’s name?

  • @exotictoast9931
    @exotictoast9931 2 years ago

    Nintendo should hire this man

  • @wallabra
    @wallabra 2 years ago +1

    This is great, very well done!
    What about clang?

    • @KazeN64
      @KazeN64  2 years ago +1

      who's clang

  • @drewynucci9037
    @drewynucci9037 1 year ago

    What I’m really waiting for is a ridiculously tight and compiled mario rom I can run on my stock n64 at 60fps (or whatever it would be with this insanely optimized code)

  • @kingofthegrapes
    @kingofthegrapes 2 years ago +1

    I love these vids you do about optimizations

  • @patrickgh3
    @patrickgh3 2 years ago +12

    Great video! I really appreciate how you emphasize the context that the original code was written in, and how it's different from the context you're making these improvements in. Like the disclaimer at the very start of the video, and how you brought in an actual developer of the game to ask about the compiler options!
    Personally I might have given the 3 raycasts thing more slack, or at least not pinning it on 1 theoretical person. As a complete hypothetical, maybe the extra raycasts were to fix bugs that occured in 1 specific level, and the team was on a deadline, so they chose that fix. That said, I could be wrong, and you're the one who's seen the actual code. Anyways, I appreciate all the attention given to the original developer context throughout the video.

  • @easyaspi31415
    @easyaspi31415 2 years ago +14

    7:17 Have you considered the restrict keyword? Without the restrict keyword, by the laws of the C standard, the compiler must assume that dest and src may overlap.
    So, for example, let's say you did
    float x[4] = { 1, 2, 3, 4 };
    copy(&x[1], &x[0]);
    If GCC loaded first then stored, it would end up in 1 1 2 3, but the C standard says it should be 1 1 1 1.
    The restrict keyword says "these are never going to overlap" and therefore it doesn't need to worry about that.

    • @KazeN64
      @KazeN64  2 years ago +13

      yeah, that would have worked. im no expert at C so i had no idea until i saw a few comments like this.

    • @ssl3546
      @ssl3546 2 years ago +6

      @@KazeN64 "restrict" was introduced in C99, it did not exist when Mario 64 was being made. I assume there was a GNU extension prior to 1999 but I don't know when.

    • @easyaspi31415
      @easyaspi31415 2 years ago +1

      @@ssl3546 they didn't use GCC so I doubt it
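
The aliasing point made at the top of this thread can be sketched as follows. The function names `vec3_copy` and `vec3_copy_restrict` are hypothetical and are not the video's code. The first version has to behave correctly even when the two pointers overlap, so each store may change what the next load reads. The second uses C99 `restrict` to promise that they never overlap, which leaves the compiler free to do all three loads first. Whether a given compiler actually reorders them is not claimed here.

```c
/* No restrict: dest and src may alias, so the element-by-element
 * order of loads and stores has to be preserved. */
void vec3_copy(float *dest, const float *src) {
    dest[0] = src[0];
    dest[1] = src[1];
    dest[2] = src[2];
}

/* C99 restrict: the caller promises the two arrays do not overlap,
 * so the compiler may load all three values before storing any. */
void vec3_copy_restrict(float *restrict dest, const float *restrict src) {
    dest[0] = src[0];
    dest[1] = src[1];
    dest[2] = src[2];
}
```

Calling `vec3_copy(&x[1], &x[0])` on `{1, 2, 3, 4}` gives `1 1 1 1`, as described in the comment. Making the same overlapping call on the `restrict` version would break the promise and is undefined behavior, so it should only be used where the buffers are known to be distinct.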

  • @YEWCHENGYINMoe
    @YEWCHENGYINMoe 1 year ago +2

    You praise them at first and then proceed to roast them.

  • @AssailantLF
    @AssailantLF 2 years ago +3

    God damn I love in-depth technical videos relating to video game software. Thanks Kaze for being so inspirational and awesome.

  • @Aunarky
    @Aunarky 2 years ago

    Really well thought out and well constructed video, I hope you make more of this! It was such a treat to watch :)

  • @ddnava96
    @ddnava96 2 years ago +1

    Topping it off with assembly. What a legend!

  • @shadoninja
    @shadoninja 2 years ago +8

    You should have done a TAS side by side before/after to show the visual difference

  • @Tisisrealnow
    @Tisisrealnow 2 years ago +13

    "Yea i like my gameplay optimized"

  • @SirSethery
    @SirSethery 2 years ago +16

    A 60fps Mario 64 on original hardware would be incredible. Too bad none of this improves the renderer. Super cool nonetheless.

    • @KazeN64
      @KazeN64  2 years ago +25

      this frees up some memory reads, meaning the renderer is also slightly sped up! 60fps sm64 is within reach.

    • @psgamer-il2pt
      @psgamer-il2pt 7 months ago +2

      Boy, do I have news for you!

  • @NutyRiver
    @NutyRiver 2 years ago +3

    Unrelated but I adore what you did with the vertex shading. It makes things look so vibrant and lively. Very Spyro-esque, which is a great thing in my opinion!

  • @johnatangonzalez9099
    @johnatangonzalez9099 2 years ago

    dude keep it up, I love these technical videos of yours

  • @humanistwriting5477
    @humanistwriting5477 2 years ago

    Nice. Although I only really cared about the crazy math at the end after that teaser montage.

  • @richanddarksbane1439
    @richanddarksbane1439 2 years ago

    Incredible stuff you've done here!

  • @Tekape
    @Tekape 2 years ago

    nice taste in music ;D

  • @ping5092
    @ping5092 7 months ago

    Based evangelion music.

  • @qwertyioup195
    @qwertyioup195 2 years ago +3

    As someone who only had taken an intro to C++ course, the only thought I had was “oh that’s a void function. That’s neat.”

  • @eyeiaye
    @eyeiaye 2 years ago

    Really incredible work!

  • @BIueharvest
    @BIueharvest 2 years ago

    Is that Anne Reburns cover of Komm Susser Tod

  • @XychoLight
    @XychoLight 2 years ago

    Super interesting, thank you!

  • @WhoNoMe
    @WhoNoMe 2 years ago

    Interesting vid.

  • @RealRedRabbit
    @RealRedRabbit 1 year ago

    That math montage song sad as hell.

  • @shadowind30
    @shadowind30 2 years ago +1

    Damn, you never stop surprising us.

  • @seebaastian
    @seebaastian 2 years ago

    Those nintendo guys should hire you!

  • @lordzooq8987
    @lordzooq8987 2 years ago

    Instantly subbed

  • @_DigitalCam
    @_DigitalCam 1 year ago +1

    What the heck Mario game is this?! I don’t remember this level in Mario 64?!

  • @stefanpedersen2988
    @stefanpedersen2988 2 years ago

    Who performed the cover of komm susser todd at the bit with all the code?
    Awesome video dude. 😊

  • @DoobooDomo
    @DoobooDomo 2 years ago +8

    For the first optimization (load/store x3 v. load x3 + store x3), I would guess that this is an aliasing issue, and could be solved more simply with the restrict keyword. Good stuff!

    • @gblargg
      @gblargg 2 years ago +1

      Came here to say this. That can help almost all these cases, since the optimizer must otherwise assume the worst case that every store could modify any other value you're loading.

    • @reeeeeeeeemmmmmmmmmm
      @reeeeeeeeemmmmmmmmmm 2 years ago

      Yep exactly this. For arguments that overlap in memory the optimization will cause different behaviour, so the compiler can't apply it without you pinky-promising that they don't.

    • @angeldude101
      @angeldude101 2 years ago

      I was thinking that the compiler should be able to optimize a simple memcpy like that, but I forgot that C allowed mutable aliasing.

  • @OrdinaryAVX
    @OrdinaryAVX 2 years ago

    Wow, this is some amazing work!

  • @iamdarkyoshi
    @iamdarkyoshi 2 years ago

    Impressive work dude!!

  • @MBloke
    @MBloke 2 years ago +1

    Man I can't wait until you get your hands on OoT's code. Gonna be so awesome.

  • @Seifer_42
    @Seifer_42 2 years ago

    Kaze this is incredible

  • @wolfcl0ck
    @wolfcl0ck 2 years ago

    absolutely insane, I love this

  • @ZygalStudios
    @ZygalStudios 2 years ago +8

    The left side of the picture at 0:38 about sums up my frustrations with programming culture nowadays 🤣
    Very great insight! Loved the video!
    Utilizing the hardware provided will always provide a better solution than expecting the compiler to do things for you.
    The compiler can only reason about such a small portion of your code even with optimizations turned on.
    ie.. USE THE CACHES AS INTENDED.
    For this case you almost chopped the time in half by using the cache better and shaved minuscule amounts of time from using the compiler optimizations.

  • @Thirteen13551355
    @Thirteen13551355 2 years ago

    That's some crazy stuff.

  • @Xaelyn
    @Xaelyn 2 years ago

    Didn't expect Astrophysics' cover of komm, süsser tod to show up here of all places, but I'm all for it!

  • @KabaroOrabak
    @KabaroOrabak 2 years ago +2

    This guy is actually insane

  • @grizzoo
    @grizzoo 2 years ago

    Thanks, I appreciate these videos just as much as your others.

  • @moodysshuffle
    @moodysshuffle 2 years ago

    yo what is that song during the montage, it sounds super familiar and also pops the hell off

  • @crimsama2451
    @crimsama2451 2 years ago +1

    Awesome to see! The footage shown looks like a 3ds game! Insane improvements imo.

  • @madghostek3026
    @madghostek3026 2 years ago

    Very pog, thanks for the nerdy part

  • @dissonanceparadiddle
    @dissonanceparadiddle 2 years ago

    The section with bomb Mario looks unbelievable!! I can't believe how good the lighting and polygons look. It's very polished and clean. It's so surreal

  • @bioman1hazard607
    @bioman1hazard607 2 years ago

    You are absolutely amazing dude.

  • @btarg1
    @btarg1 2 years ago +142

    Making this open-source could help optimise the code even further, but this is really pushing the hardware to its limits! Can't wait for more

    • @Henrix1998
      @Henrix1998 2 years ago +18

      I doubt their team wouldn't already have everyone deeply enough interested in this topic to be helpful

    • @gralha_
      @gralha_ 1 year ago +5

      It would be a sure path to a C&D from nintendo

  • @ButterMuttSquash
    @ButterMuttSquash 2 years ago

    Thanks for the phonk blast

  • @ohnoitschris
    @ohnoitschris 2 years ago

    man I love this deep diving into vidya code shit, excellent work