How Optimizations made Mario 64 SLOWER

Kaze Emanuar

มุมมอง 700 558

29 000

เพิ่มลงใน
- เพลย์ลิสต์ของฉัน
- ดูภายหลัง
แชร์

แชร์

ฝัง

ขนาดวิดีโอ:

แสดงแผงควบคุมโปรแกรมเล่น

เล่นอัตโนมัติ

เล่นใหม่

เผยแพร่เมื่อ 5 ก.พ. 2025
Patreon: / kazestuff
Streams: / @kazeclips
🐦 / kazeemanuar
MERCH: kazemerch.mysp...
Git for this project: github.com/Kaz...

ความคิดเห็น • 1.6K

@VideoGameBoxReviews 4 หลายเดือนก่อน ⁺⁸⁷⁷⁹
My nightmare is to have someone 30 years in the future totally roast my code
@alt0248 4 หลายเดือนก่อน ⁺⁶⁰⁵
At least you can just roast em back with "At least I set the standard for the time"
@gyroc1_ 4 หลายเดือนก่อน ⁺²⁸⁹
And you would presumably have improved after 30 years. You might cringe at your 30 year old past self's code, or just have compassion that you did the best you could.
@crusaderanimation6967 4 หลายเดือนก่อน ⁺⁹⁶
I mean llet's be honest thers fat chance somone will, and ther's fat chance that person will be you.
@DrsJacksonn 4 หลายเดือนก่อน ⁺⁶³
"That was 30 years ago, I'm wiser now" 👍
@ElGreco15 4 หลายเดือนก่อน ⁺⁵⁴
@@DrsJacksonnproceeds to make the same mistake 30yrs later
@AjaxGb 4 หลายเดือนก่อน ⁺³⁹⁵⁵
Wild that Nintendo implemented culling for individual rocks but not for the giant cave at the end of a long tunnel, haha. I'd always assumed that culling/loading was the whole reason the tunnel was there.
@dycedargselderbrother5353 4 หลายเดือนก่อน ⁺²⁵¹
Maybe it was but one hand didn't speak to the other.
@LavaCreeperPeople 4 หลายเดือนก่อน ⁺¹⁴
Cool pfp
@ddnava96 4 หลายเดือนก่อน ⁺⁶⁸
For the cave they could've just split the level in two, just like they did with Dire Dire Docks and Wet Dry World
@ouroya 4 หลายเดือนก่อน ⁺⁵¹
the tunnel is there to teleport mario around as the whole level including the cave is too large for a sm64 level, so in the tunnel there is an invisible teleporter that brings you back & forth between the two sections
@Akyuu2608 4 หลายเดือนก่อน ⁺⁴
I guess we really just watch the same videos at this point @@LavaCreeperPeople
@rednbloo5796 4 หลายเดือนก่อน ⁺³¹²²
The code needed to optimize being larger than the issue its trying to optimize feels like some kind of punishment from greek myth
@LavaCreeperPeople 4 หลายเดือนก่อน ⁺³⁴
Yes
@Aflay1 4 หลายเดือนก่อน ⁺¹¹³
Nintendo mythology.
@herrabanani 4 หลายเดือนก่อน ⁺²⁰²
apparently this is a somewhat common trap in programming
@cypresshill9268 4 หลายเดือนก่อน ⁺¹⁹
@@herrabanani yeah it is, funny to see it described this way
@TheLucidDreamer12 4 หลายเดือนก่อน ⁺⁸⁹
To be fair, N64 is also built on the MIPS architecture while every Nintendo console until then was built on CISC processors. Pipelining and its associated effects on branch penalty and memory access wasn't well-understood at the time except for the few PhDs who designed these
@gameworkerty 4 หลายเดือนก่อน ⁺⁸⁶²
The most 80's looking japanese programmer imaginable is staring dead eyed at this video while chainsmoking
@taraskhan475 2 หลายเดือนก่อน ⁺³
Imao
@nicknorthcutt7680 2 หลายเดือนก่อน ⁺¹
Hell yeah 😂😂
@RafaWithAnL 2 หลายเดือนก่อน ⁺²
90 baby here nor am I Japanese, but you pretty much explained my exact positioning.
@sk8_bort 2 หลายเดือนก่อน ⁺⁸
All the people who were involved in the development of this game are probably driving tractors in rural Japan lol
@TBL-AMELIA 2 หลายเดือนก่อน ⁺⁷
@@sk8_bort the majority of Nintedno employees have been with the company for decades
@Noisy_Cricket 4 หลายเดือนก่อน ⁺⁴⁶⁴
I think the main reason Nintendo used memory instead of the CPU for maths was because Nintendo was used to programming the NES and SNES, which both had fairly weak built in CPUs. The reliance on memory for lookup tables was probably instinctual.
@ArneChristianRosenfeldt 4 หลายเดือนก่อน ⁺³⁰
And me, a 386SX user hated lookup tables for a long time.
Edit to clarify: it is kinda hard to find, but 386 needs two cycles for a lot of stuff, for example reading from memory. So reading the 32bit address of the table already takes 4 cycles. Then the look up takes another 4 thanks to the page miss. Yeah, 8 does not sound too bad. It is just that in 386 Intel really cleaned up MUL and DIV. Every output bit needs only a single cycle, while the CPU fills its instruction queue. Don’t branch directly after math! Mix math with lookups (interleave two threads).
@Formalec 2 หลายเดือนก่อน ⁺⁷
Normal programmer instict is to use precomputation but that only works if memory access is either relatively fast or cached.
@Adiee5Priv หลายเดือนก่อน
Well, they weren't really that weak for the time they were made, but obviously, they were WAY weaker than the n64's CPU
@jean-michelgilbert8136 หลายเดือนก่อน
A 386SX didn't have a FPU though so sincos LUTs would have made total sense.
@djmips หลายเดือนก่อน
@@Formalec Precomputation still has it's place for sure - just look at all the preoptimized level data in a modern game but to be sure, it's often way faster just to compute something because memory bandwidth is STILL the primary bottleneck.
@HamStar_ 4 หลายเดือนก่อน ⁺⁶⁹⁰
Moral of the story here goes beyond N64 development: Optimizations don't exist in a vacuum. You need to know where your bottleneck actually is before you can attempt to work around it.
@Pepesmall 4 หลายเดือนก่อน ⁺⁴¹
I always wondered if this ever actually happened, where optimization backfired by being done poorly lol
@phantom-ri2tg 4 หลายเดือนก่อน ⁺²¹
Some other comments said a major issue is that they didn't know the hardware it would end up being ran on. So it would be pretty hard to identify bottlenecks.
@Spartan322 4 หลายเดือนก่อน ⁺¹⁴
I remember watching a talk for cppcon about how virtual function calls (assembly guaranteed dynamic dispatch) on modern machines don't have a notable performance difference from regular function calls (in fact the performance of them is non-deterministic, you're basically flipping a coin on what will be faster, and it depends on so many things its impossible to predict, yes dynamic dispatch can be just as performant) and one of the primary things the speaker was saying was that benchmarking is meaningless, and its things like this that remind me of that.
@Spartan322 4 หลายเดือนก่อน
@@phantom-ri2tg Benchmarks are generally designed to ignore that, compiler behavior is of little concern because what the compiler produces is deterministic in its behavior, and that's the thing we focus on. The problem is when the machine code doesn't perform deterministically, also a compiler of the same version with the same build options on the same platform and CPU will deterministically compile the same input to the same output. In most cases all x86-64 (64-bit Amd/Intel) CPUs will receive the same produced output so unless your CPU uses a different instruction set the CPU doesn't really matter. You can observe this with godbolt directly. Also this aside the CPU is not designed to inherently run the machine code its given, but at minimum it must be guaranteed to produce functionally the same observable results, how it achieves that doesn't actually matter so long as it has no observable difference as product. (like the simplest optimization is rearranging instructions in the CPU to be slightly more performant)
@Spartan322 4 หลายเดือนก่อน
@@phantom-ri2tg Benchmarks are generally designed to ignore that, compiler behavior is of little concern because what the compiler produces is deterministic in its behavior, and that's the thing we focus on. The problem is when the machine code doesn't perform deterministically, also a compiler of the same version with the same build options on the same platform and CPU will deterministically compile the same input to the same output. In most cases all x86-64 (64-bit Amd/Intel) CPUs will receive the same produced output so unless your CPU uses a different instruction set the CPU doesn't really matter. You can observe this with godbolt directly. Also this aside the CPU is not designed to inherently run the machine code its given, but at minimum it must be guaranteed to produce functionally the same observable results, how it achieves that doesn't actually matter so long as it has no observable difference as product. (like the simplest optimization is rearranging instructions in the CPU to be slightly more performant)
@IndignASMR 4 หลายเดือนก่อน ⁺²³⁶¹
This video was flagged as made for kids, so I didn't get a notification not did it show up in my subscriptions tab. and the option to enable notifications is greyed out. Good ol TH-cam
@MrFrexxia 4 หลายเดือนก่อน ⁺¹¹¹²
Kids these days are just interested in N64 code optimizations
smh
@raafmaat 4 หลายเดือนก่อน ⁺²²³
well, when uploading you have to choose between NOT made for kids, and made for kids.... its confusing! i also figured that NOT for kids would mean like adult or rude content..... i did not realize that it is actually the normal option
@mrrfyW 4 หลายเดือนก่อน ⁺⁸⁷
Why in the world would this be marked for kids anyway? It deals with a lot of complicated computer topics that I don’t think kids would understand. Also, Discord screenshots are in the video, too.
@varietychan 4 หลายเดือนก่อน ⁺¹⁸⁹
@@raafmaat TH-cam likes to "help" creators by forcing the option on sometimes
@KazeN64 4 หลายเดือนก่อน ⁺⁷³³
i didn't mark it for kids. but it looks like it fixed itself now?
@AntonioBarba_TheKaneB 4 หลายเดือนก่อน ⁺¹¹³⁸
they probably wrote most of the code on a SGI workstation way before they made the n64 hardware, and that workstation probably had much faster memory, so it made sense to optimize the code that way. When they ported the code to the retail hardware it was pretty bad but there was no time left for it and they decided to ship it. Well that’s only a theory, but it makes sense to me
@tinoesroho 4 หลายเดือนก่อน ⁺³⁵
steve jobs, peering over a NeXt workstation display: am i a joke to you
@j_c_93 4 หลายเดือนก่อน ⁺¹¹⁶
@AntonioBarba_TheKaneB this happened to the Goldeneye team. They were under the assumption that the hardware would be more powerful and the carts would have more storage than it ended up having so they had to cut a lot of stuff and simplify the level geometry
@tracque_5677 4 หลายเดือนก่อน ⁺¹⁵⁷
It's pretty likely you're right though. Launch titles for new consoles are never particularly well-optimised exactly for this reason. The devs just don't know what they will be working with in advance...
@smallbluemachine 4 หลายเดือนก่อน ⁺¹¹⁵
The evidence up to now is that optimization was limited due to the novelty of the compiler, tools and hardware itself. The schedule was nuts and some of the Mario 64 programmers quit the games industry altogether after burning out.
@JamEngulfer 4 หลายเดือนก่อน ⁺⁴¹
I can’t imagine how disappointing it would feel to run the game on the new hardware only for it to have massive optimisation issues you know you don’t have enough time to fix.
@thecozies 4 หลายเดือนก่อน ⁺¹⁹⁴³
fun fact: exposure to performance lottery can result in a shift to long term agricultural work 👀
@rhysbaker2595 4 หลายเดือนก่อน ⁺¹⁶
Ha, I get that joke!
@jasertio 4 หลายเดือนก่อน ⁺⁴³
I don't get it
@Beines 4 หลายเดือนก่อน ⁺⁴
😂😂
@thinkublu 4 หลายเดือนก่อน ⁺²¹
Honestly I' m so curious about the joke here
@rhysbaker2595 4 หลายเดือนก่อน ⁺⁴⁹⁹
@@thinkublu if I'm interpreting it correctly, it's a reference to the fact that many programmers/people who work with computers technically, tend to eventually end up doing farm work/manual labour later on in life as a way to escape the technology that has caused so much stress for them.
Given that Performance Lottery would cause a LOT of stress/confusion, exposure to it would lead to the programmer being more likely to leave society to work on a farm
@MDPToaster 4 หลายเดือนก่อน ⁺⁴⁰⁴
Never in a million years would I have thought removing Mario’s LOD model would actually have performance benefits.
@musaran2 4 หลายเดือนก่อน ⁺⁵⁶
Well additional data is never free, but geometry not being the bottleneck did not occur to me either.
Maybe some wicked design could get low LOD from full LOD on-the-fly?
@chickendoodle32 4 หลายเดือนก่อน ⁺²²
@@musaran2 may i introduce you to: unrelated engine 5?
@chickendoodle32 4 หลายเดือนก่อน ⁺¹⁵
@@musaran2WAIT NO UNREAL UNREAL!
@viannizvnv7222 4 หลายเดือนก่อน ⁺⁷
@@chickendoodle32 lel
@sealsharp 4 หลายเดือนก่อน ⁺¹²
Regarding unreal 5 and LODs: nanite is far from the silver bullet it is sold as.
@LuizDahoraavida 4 หลายเดือนก่อน ⁺⁶²
moral of the story:
Premature optimize, don't benchmark anything at all and sleep well at night knowing you did the best you could
@cccg838 3 หลายเดือนก่อน ⁺⁷
Actually maybe don't program at all because in 30 years people could make a video roasting your code
@0scur0_ 2 หลายเดือนก่อน ⁺¹
Yes.
@johnclark926 4 หลายเดือนก่อน ⁺⁴⁷⁷
Evil Kaze in a parallel universe:
How I Optimized Mario 64 to Run at
@Goddot 4 หลายเดือนก่อน ⁺⁸⁶
Bethesda Kaze
@KARLOSPCgame 4 หลายเดือนก่อน ⁺⁴⁵
Running Odyssey on the N64 at the cost of being playable
@DavidJBurbridge 4 หลายเดือนก่อน ⁺⁶
To answer that, we need to talk about parallel universes.
@ag2023en 4 หลายเดือนก่อน ⁺⁷
@@KARLOSPCgame The gameplay can certainly be ported. The question is how much of the aesthetics can it preserve.
@ag2023en 4 หลายเดือนก่อน ⁺⁷
@@KARLOSPCgame Kaze has already partially re-created Odissey's stages and mechanics inside Mario 64.
It's not the same thing, but it does prove that Oddissey can be recreated on the N64, just with simpler graphics.
@promaster424 4 หลายเดือนก่อน ⁺¹⁹⁵
Removing the lod model is like the bell curve of optimization. It's good because Mario doesn't look worse from far away and because the N64 does less work by executing less code.
@QuasarEE 4 หลายเดือนก่อน ⁺⁵⁵
They did that in OoT also and it's very distracting when you notice it. I was playing a rando the other day, got bored, and was standing at the distance cut off for the low-detail Link model, moving back and forth a bit and saying, "ugly Link, normal Link, ugly Link, normal LInk".
@tonyhakston536 4 หลายเดือนก่อน ⁺⁸
... How is that at all a bell curve?
@promaster424 4 หลายเดือนก่อน ⁺⁴⁸
@@tonyhakston536 If you don't know what game optimization is you might want to remove the low poly Mario because it's ugly.
If you think you know what game optimization is you might want to keep it because it renders less polygons when you are far away, there for you can't see the difference very well.
If you are Kaze Emanuar you might want to remove the low poly Mario script and model because it saves more memory.
@IncognitoActivado 4 หลายเดือนก่อน ⁺²
Agreed.
@Nicholas_Steel 4 หลายเดือนก่อน ⁺³
@@QuasarEE I'm still waiting for a mod that has your character and held items always use the high quality model. You can only see the high quality models when the camera is smooshed against a wall to bring it closer to you :/
@BadComoc 4 หลายเดือนก่อน ⁺²³¹
it's interesting how relevant some of this is to games today, memory bandwidth is obviously a lot better now but it still is an issue so it's largely still relevant to optimise for data locality and maximise efficient usage of the cpu cache.
@vilian9185 4 หลายเดือนก่อน ⁺⁶
of couse not, no game today optimize of data locality, how you going to optimize for 300 times of cpu different
@kaylee42900 4 หลายเดือนก่อน ⁺⁵⁴
That reminds me of how wild it was when the AMD X3D chips came out with massive CPU cache. Some games made massive fps gains from having more cpu cache available.
@SaHaRaSquad 4 หลายเดือนก่อน ⁺⁵⁴
@@vilian9185 Um, pretty much every modern game engine is designed with efficient data access in mind, data locality is literally why approaches like ECS are being applied. Less memory use is on average beneficial on every current CPU, and the faster the CPU is, the more it is bottlenecked by memory.
@dycedargselderbrother5353 4 หลายเดือนก่อน ⁺¹¹
@@vilian9185 Ever wonder why consoles get a lot closer to higher spec PCs than it seems they should?
@vilian9185 4 หลายเดือนก่อน ⁺¹⁰
@@dycedargselderbrother5353they don't, they sell consoles at a loss, that's why they seems more powerful than a computer a the same price, the perfect example is Steamdeck, that have gains up to 25% just because it don't runs Microsoft shit OS, to be fair older consoles until ps3/xbox 360,yes, they had various advantages, they had custom hardware focused for games and devs had them as priority so they used these hardware features
@arciks11 4 หลายเดือนก่อน ⁺⁸⁵⁰
I have heard stories of western developers being given japanese manuals for hardware and being unable to make sense of them. I wonder if inverse happened here while nintendo was both creating N64 and games for it.
@vilian9185 4 หลายเดือนก่อน ⁺²⁴³
i don't think so, probably was lack of good benchmark tools, if they had what kaze has today where it measure exactly what's is slowing down the rendering(GPU,CPU,MEMORY) they would have saw that memory was always the bottleneck, and as Kaze said the console had to much discrepancy in how it was put together, too fast CPU at the expence of memory throughput
@fungo6631 4 หลายเดือนก่อน ⁺²⁹
I think that happened with the Sega Saturn, not the N64.
@ssl3546 4 หลายเดือนก่อน ⁺¹¹¹
the developers were very new to all of this and the fact they were able to transition from SNES programming to what remains the greatest videogame of all time is a testament to their intellect and dedication.
@JayMaverick 4 หลายเดือนก่อน ⁺¹⁴⁶
OP, you're forgetting that manuals didn't exist. This was not just brand new hardware, but a brand new coding paradigm. These people WROTE the manuals.
@Palendrome 4 หลายเดือนก่อน ⁺⁵⁰
@@ssl3546 yep this and very limited time. Another year or two would have made a big difference but they didnt have time they were trying to beat everyone
@farben_ 4 หลายเดือนก่อน ⁺¹³³
People think the PS1 was more powerful because it did transparencies, colored lights and additive blending like it was nothing, so when they play the system on an emulator with perspective correction and in high-res the games appear much better than an emulated N64 game.
@ABaumstumpf 4 หลายเดือนก่อน ⁺¹⁹
So for games that used those effects - it was significantly more powerful. for games that didn't - it wasnt.
@Jetsetlemming 4 หลายเดือนก่อน ⁺⁸⁸
It also probably helped at the time that the CD format allows PS1 games to have high quality sound and prerendered videos. I'd imagine back in the mid-90s most people had no concept of the difference between a pre-rendered video and "in-engine" as long as it was running on their TV as they played. Stuff like Parasite Eve 2 and Final Fantasy 8, using pre-rendered video behind the player as you actually moved through an environment? On a CRT to hide the compression?? It looks absolutely fucking unbelievably good, like nothing else that generation could remotely achieve. And I think it's actually worth arguing that these benefits still "count" even if for many people prerendered feels "unfair" compared to in-engine. It's the end user experience that matters, not how they pull it off, right? If Parasite Eve 2 can "trick" you into thinking you're walking through a Los Angeles street full of death and destruction using graphics ten years ahead of their time via clever tricks, I don't think that's a lesser accomplishment compared to rendering the street on the fly as best you can like Silent Hill 1.
@simpson6700 4 หลายเดือนก่อน ⁺¹⁸
I just can't stand the vertex wobble on the PS1
@jc_dogen 4 หลายเดือนก่อน ⁺⁵⁷
The PS1 wasn't more powerful, but it was a better balanced system. The N64 is so bandwidth starved that even first party games waste huge amounts of CPU and GPU time doing nothing. Ocarina, which was years later, still spent over half of each frame with the GPU completely stalled waiting for data, which probably leaves it's effective clock rate (doing real work) not far off of the PS1s GPU.
Like kaze said in the video, the power was in more advanced graphical features, not raw numbers.
@farben_ 4 หลายเดือนก่อน ⁺¹⁸
@@jc_dogen The framebuffer as well, it's amazing that if they just made a few better choices hw wise the N64 would have been a much better system, some of the cuts Nintendo did to save a few cents ended up making games for the system much more difficult than it should.
@ales141 4 หลายเดือนก่อน ⁺⁵²⁴
I love how you start the video with over compressed footage of SM64 in an incorrect aspect ratio with ugly filtering in an emulator despite the fact that you clearly know better. Really takes me back to 2010 😭
@dycedargselderbrother5353 4 หลายเดือนก่อน ⁺³²
I read this comment before starting the video and still recoiled.
@SeanCMonahan 4 หลายเดือนก่อน ⁺³⁸
Needs a BANDICAM watermark.
@standoidontwantalastname6500 4 หลายเดือนก่อน ⁺¹⁹
@@SeanCMonahan nah not old enough
Fraps watermark
@SlendytubbiesII 4 หลายเดือนก่อน
@@standoidontwantalastname6500no, Hypercam 2
@nuulcoolpro 4 หลายเดือนก่อน
@@standoidontwantalastname6500Unregistered Hypercam 2
@skylo706 หลายเดือนก่อน ⁺⁵
Merry Christmas Kaze, thx for all the in depth nerdy videos, you're my source to fill this itch
@johnfrian 4 หลายเดือนก่อน ⁺⁷⁸
It's always better to remove a problem than to add a solution, when possible.
For me it usually makes the codebase slimmer, easier to read and easier to develop further.
@personalitybot 3 หลายเดือนก่อน ⁺⁴
I also like to remove solutions. We really ought to wrap this all up, get back to playing solitaire in the computer closet
@everythingpony 23 วันที่ผ่านมา ⁺¹
So if the problem is that the door is causing issues just to remove the door. Don't have any doors to any place then just walk through
@johnfrian 23 วันที่ผ่านมา
@@everythingpony More like, if the door squeaks, you can just remove the squeak (dab of oil). You don't have to crack open a new hole in the wall to bypass the squeaky door problem.
@dabillya6845 12 วันที่ผ่านมา
you guys went to school and want to change an artists passion? bad code is still code that can be good, mario 64 is magical with all its flaws and all.
@supersat 4 หลายเดือนก่อน ⁺⁴⁵
I suspect a lot of this was due to early development on SGI workstations with different performance characteristics than the final hardware, and possibly immature compilers that didn't implement N64 optimizations well.
Remember that the "source code" is derived from the decompiled binary, so unrolling loops might have been done by the compiler, possibly assuming different instruction cache characteristics.
@machinefannatic99 2 หลายเดือนก่อน
its easy to roast old school devs in 2024 when you have the internet and all human knowledge available.
@VoicesInDark หลายเดือนก่อน ⁺²
It's not easy to roast old school devs. Those devs were goats.
Kaze spent literally years to understand sm64 and even there, it was only few fps.
Compare it to current AAA developers who could easily double fps if they at all tried.
@WhoWatchesVideos 4 หลายเดือนก่อน ⁺⁴⁷
To think that memory bandwidth prevented this console from benefiting from conventional practice and flying at incredible speed...
@IncognitoActivado 4 หลายเดือนก่อน ⁺¹
Bad coding is still bad.
@megawarpig3401 4 หลายเดือนก่อน ⁺¹
@@IncognitoActivadono shit
@greebo4446 4 หลายเดือนก่อน ⁺⁹⁷
seeing those practically unusable profiler bars really puts into perspective how single digit frame optimizations could have been overlooked lol
@seeibe 4 หลายเดือนก่อน ⁺⁸⁹
I love your dedication to get the absolute most out of hardware by actually rethinking your conceptualization of the software to match the hardware's capabilities. Most people are so inflexible in their approach to programming, which is why for the most part we still write software for an architecture from the 1980s.
@personalitybot 3 หลายเดือนก่อน ⁺²
To be fair they had just invented it
@Littlefighter1911 4 หลายเดือนก่อน ⁺²⁰
"Performance Lottery" is a real bitch.
I've tried optimizing some code at work,
added a custom SLAB allocator, to ensure, all objects are within roughly the same memory region.
And now the time hasn't improved at all, because suddenly ANOTHER function caused cache misses.
That one was caused by running the destructor on a large number of objects, despite the objects not being used, afterwards.
(It was only one int being set to 0).
Originally my boss wrote this other code, thinking that reserving 20 elements, will do less allocs.
So he created an array with 20 "empty" elements on stack, instead of using std::vector which will likely use malloc.
Which sounds so far so good. However the constructor and destructor now runs for 20 elements (plus the amount of actually used elements) instead of 0-3 most of the times.
But that the constructor and destructor of mere 20 integers would cause a problem on even modern Clang 14 + ARM64
is something that even I would not have expected.
The best benchmarked solution was to use a union to suppress the automated constructor destructor.
And even that, gave only like 150ms on 1.6seconds. Which really doesn't seem worth the uglified code, in my opinion.
There are a bunch of these micro-optimizations I could make, but they all make the code uglier.
And there are a lot more macro-optimizations that would require the code to be completely refactored and have tests written for all of them.
Seeing as we need to come to a pre-release version pretty soon, there is not much time for either of these.
The initial version of the product will be shipped with the insane startup time of 8 seconds, on our device.
And then I will try to figure out how to improve time, once the other bugs are fixed.
@Keanine 4 หลายเดือนก่อน ⁺³²
Huge respect for including a git repo!
@nicolasn12 4 หลายเดือนก่อน ⁺⁹
The optimisations are so good to the point where fps gets an extremely dramatic boost and it even overflows to just about somewhere under 30 fps.
@Tony78432 4 หลายเดือนก่อน ⁺⁷⁵
I wonder how much Kaze could improve some of the other games with his level of expertise. Imagine a highly optimized Turok on native hardware or any other games. The N64 is one of my favorite consoles.
@arciks11 4 หลายเดือนก่อน ⁺⁸
@@Tony78432 Perfect Dark and Goldeneye honestly need it more than first Turok.
@amayra2010 4 หลายเดือนก่อน ⁺²
i went to see him work in zelda games
@joebidenVEVO 4 หลายเดือนก่อน ⁺²
M64 is a launch game, so the later ones probably had better optimizations already
@IncognitoActivado 4 หลายเดือนก่อน ⁺¹
That sounds awesome.
@IncognitoActivado 4 หลายเดือนก่อน ⁺¹
@@joebidenVEVO No, not really.
@pjsparks4845 4 หลายเดือนก่อน ⁺⁴
I'm currently taking a microprocessors class in college, and your series optimizing SM64 has helped a ton with my understanding of how the microprocessor interacts with the memory and how machine code works. Thank you!
@Xaymar 4 หลายเดือนก่อน ⁺⁶
IIRC regarding the "PS1 is faster than N64" claim, it's difficult to even say if it could even be true. Even with the most in-depth knowledge you could have about the Playstation 1, you'd barely be able to match 2/3rds of the performance of the N64 CPU, and still had to sacrifice a lot to get there. Even with it's hexa-processor design (CPU, GTE-cop, MDEC-cop, MEM-cop, GPU, SPU), it was still functionally inferior. A few points:
- The triplet of rendering processors (CPU, GTE-cop, GPU) only worked with Fixed Point, in either 16bit or 32bit. Many games had to opt for 16bit, and even the games that used 32bit had to limit their levels to relatively tiny areas compared to the N64. Those transition screens or fades between rooms are not by choice, but by necessity to hide the artifacting (vertex snapping/wobbling).
- You had to perform shading, world to camera transform, z-ordering, and camera to view transform on the GTE as the GPU had very limited 3D support. No Z-Buffer, and hardly any actual support for 3D, meant that you got even more wobbly textures as a result. "But you got mip-mapping and dithering for free!" - as if anyone actually wanted that, it was needed to hide the artifacts of the PS1 hardware.
- Instead of having to worry about Rambus, you have to worry about DMA abuse instead. It is very easily possible to write code that causes 0 FPS on the PS1. DMA is hard.
- Cache Trashing is much harder, as you go CPU Cache -> RAM -> CD-ROM Cache -> CD-ROM. That's three misses that have to happen, but if they happen they're so much worse than N64 cache misses. You could easily spend more than a second stuck due to a scratched disc or a bad CD-ROM.
- The built-in hardware decoder for videos with direct DMA to the GPU meant that you could use videos directly, and still render on top if needed. AFAIK the N64 does not have video decoding hardware, and the space on the cartridges wasn't exactly good for it either.
It's been a while since I made homebrew for it, since it's just not a good console to try and develop for. Might not be entirely accurate anymore, as I wrote this from what I remember. There's a lot more, but these are like the primary ones I ran into when making homebrew. 700MB of CD space means nothing if you can't actually use it well...
@ArneChristianRosenfeldt 4 หลายเดือนก่อน ⁺¹
Only the mpeg decoder can output true color. 3d acceleration always used a 16 bpp frame buffer.
So you say that the PS1 had virtual memory and games used it? I know that N64 has virtual memory and you could write an OS which loads pages from ROM.
@TheOofaloofa 4 หลายเดือนก่อน ⁺³
Putting in that Minecraft Glide Minigame music from the consoles gave me crazy nostalgia for no reason whilst learning about how getting lucky will basically make the game go _vroom vroom._
@eklipsed9254 4 หลายเดือนก่อน ⁺¹⁴⁰
I think this is the best example of how premature optimization can be very bad, but optimization after the fact can help immensely as well.
@espertalhao041 4 หลายเดือนก่อน ⁺³⁶
I don't think it is fair to say it is "premature optimization": the tooling was just 3 bars that move about, on the screen.
@Predated2 4 หลายเดือนก่อน ⁺²²
you do a bit of both, optimization after the fact can be horrendous too. You optimize for each chunk, then optimize the whole. Then disable the first optimizations to see if there is any difference, then release.
@klontjespap 4 หลายเดือนก่อน
You can also just design things properly first time around and have them optimized
Novel idea, i kmow
The biggest sin these days is the complete lack of optimizations
@aaendi6661 4 หลายเดือนก่อน ⁺⁴
What's ironic is that I've been accused of making "premature optimizations" for making the same type of optimizations that Kaze is doing.
@espertalhao041 4 หลายเดือนก่อน ⁺⁴
@@aaendi6661 He's actually undoing optimizations, in this video.
@KARLOSPCgame 4 หลายเดือนก่อน ⁺¹¹
This channel and the dude making Mario 64 demake on the GBA are prime content
@Ragesauce 4 หลายเดือนก่อน ⁺⁴⁰
I'm looking forward to playing your completely optimized original SM64 on real hardware, I hope it comes out soon!!
@HashCracker 4 หลายเดือนก่อน ⁺²
Im pleasantly surprised at how freaking fun that sounds. Ima need to dust of the ol 64
@Ragesauce 3 หลายเดือนก่อน ⁺¹
@@HashCracker it's been a few years since he said he'd do it...i'm hoping it comes out soon!
@punishedkid 4 หลายเดือนก่อน ⁺³³
Couldn't you partially avoid "performance lottery" in your code by padding out the binary? If you make all of your code cache poorly but run decently, then certainly it can only run better after you undo the padding.
@KazeN64 4 หลายเดือนก่อน ⁺³²
i've avoidied perf lottery entirely in my game yeah, but that requires some more optimization first.
@KazeN64 4 หลายเดือนก่อน ⁺²¹
almost entirely*
@musaran2 4 หลายเดือนก่อน ⁺³
IMO it's less about generalized padding, more about avoiding moving things through recompiles.
@crushermach3263 4 หลายเดือนก่อน ⁺¹¹
Again I'm amazed what decades of hindsight and a known scope can do to a codebase.
@Fred_Derp 4 หลายเดือนก่อน ⁺³
Mario scared me, when i clicked on the video he was facing the graph but when the ad played mario was looking straight at me
@MattRoszak 4 หลายเดือนก่อน ⁺⁷⁷
Considering how bad most modern software is, watching this video about super optimized low-level code is really satisfying.
Most features on Windows, for example, run hundreds or even thousands of times slower than they need to. It's a shame that efficient code just isn't made any more.
@ante646 4 หลายเดือนก่อน ⁺³³
you could always sacrifice your sanity and become a firmware engineer , the low level never went away lol
@MattRoszak 4 หลายเดือนก่อน ⁺⁶
@@ante646 Fair point!
@bnbnism 4 หลายเดือนก่อน ⁺⁷
Yea its a shame if this level of optimisation was did to Windows 10/11 itd run on many generations older hardware, half as much memory and storage all while being quicker
@fuzzynator2964 4 หลายเดือนก่อน ⁺³
@@ante646 or run linux
@Slayer666th 4 หลายเดือนก่อน ⁺⁴
This is the real use i can see for AI.
AI is already a powerhouse for coding, fine tune it with code optimization and you can probably boost the performance of regular AAA games by 30-50% without much money spent on expensive optimization programmers.
i hope this will become the reality in 3-5 years
@M1XART 4 หลายเดือนก่อน ⁺⁷⁸
I feel that Sega understood some of these things very early on. Games like Daytona USA (original Arcade -version) actually withdraw instanty everything out from enviroment that car passed by at the same speed that car goes on. Basically drawing only things directly on front of you, making draw -distance look awesome.
@jpa3974 4 หลายเดือนก่อน ⁺⁶⁴
Daytona USA was released in 1994, by the way. And it was already a relatively mature product in the 3D-games world that was being exploited by various Japanese and other world softhouses. SEGA had a lot of experience with 3D.
But in the world of Nintendo-themed TH-camrs, Mario 64 released in 1996 suddenly became one of the first 3D games ever made. 🤣
@arciks11 4 หลายเดือนก่อน
@@jpa3974 Cutting teeth on Virtua Racer must've helped.
@Oh-Ben 4 หลายเดือนก่อน ⁺²
AM2 was just built different. The rest of Sega? Not so much.
@DenkyManner 4 หลายเดือนก่อน ⁺⁹
I wouldn't describe the draw distance of Model 2 games as awesome, theres a ton of completely unmasked pop-in and zero use of LODs for backgrounds. The pop-in was also done in fairly large, predetermined chunks, not gradually. I'm a big fan of AM2 and the model 2 hardware but the (visual) success of those games was a combination of incredible art design and obscenely advanced hardware, rather than genius efficient coding.
@M1XART 4 หลายเดือนก่อน ⁺¹
@@DenkyManner The hardware was good on Model & Model 2, but not pre-eminent.
Daytona USA had 32 Bit CPU 25Mhz with 32Bit Co -Processor, and only 8Mbit (1MB) RAM, while resolution was reasonable 496 x 384.
-I would say SEGA learned 3D quicker than others, or at least they moved into making polygonal graphics earlier on. Nintendo would not even knew 3D at the time without Argonaut.
However, Sega was not that strong on CD -based console 3D.
While Nintendo really nailed 3D gameplay/playcontrol as soon as they tried.
@mademedothis424 4 หลายเดือนก่อน ⁺⁶⁶
Wait, how is Mario 64 "one of the first true 3D games ever"? "True" and "one of" are carrying so much weight in that sentence they may generate a tiny black hole.
Even if you're going to discount every racing game since Hard Drivin' in 89, every pseudo 3D fps alll the way up to System Shock and Duke Nukem 3D, every 3D fighting game since Virtua Fighter, bundle Tomb Raider and Quake into that "one of", dismiss anything with locked camera since Alone in the Dark as "not true 3D"... Mechwarrior 1 and 2 were out. Hell, by 1996 there were as many full 3D space shooters based on Star Wars as mainline Mario platformers.
@KazeN64 4 หลายเดือนก่อน ⁺⁴⁵
Fair tbh
@Rowlesisgay 4 หลายเดือนก่อน ⁺²⁹
Yeah I always cringe when people don't make really easy clarifications about formative games that aren't actually the first. The one I'd say for Mario 64 i'd say is "The first good 3-d platformer" or "one of the first 3-d platformers, and a launch title". No salt to Kaze, ur cool you've done it how I like previous times, you just forgor or rewrote it funny this time.
@ScarfKat 4 หลายเดือนก่อน ⁺⁹
Yeah that bit bugged me too lol. I've made similar mistakes though, so I get it. Being accurate is hard
@Kralex 4 หลายเดือนก่อน ⁺¹³
I think he meant one of the first 3D platformers as people often say but its gotten purple monkey dishwashere'd into being one of the first 3D games EVER which is absurd
@jackman5840 4 หลายเดือนก่อน ⁺³
@@KazeN64 This shows you have some credibility at least. But also "one of" does mean not the actual first(and could for sure cover 3 games before it), and leaving out on home console isn't so bad. I feel the intended point stands that 3D was new and very rare when mario 64 came out, and especially when making the game. So while the clarification is good to make I don't feel it's worth being upset about. Also Doom like games are not 3D for sure. So in short Mario 64 is one of the first true 3D games, not really any correction needed.
@Austin-kt7ky 4 หลายเดือนก่อน ⁺²⁰
I really don't understand anything about software programming, but hearing people like you and Pannenkoek talk about it really helps me appreciate the work, passion, and struggles that go into developing a game. I remember being a kid and basically thinking that games just spring out of holes in the ground at Nintendo.
@some-online-dude 2 หลายเดือนก่อน ⁺¹
@@Austin-kt7ky Nowadays the devs' struggle is the bad salary, other than that they fart around in pre-made engines and churn out awful code that runs badly on even the most powerful hardware
@StrangelyIronic 4 หลายเดือนก่อน ⁺⁴
Reminds me of making demos for my Apple IIGS back in the late 90s/early 00s when I was a kid. Look into every trick the hardware allows pulling out as much as the beefed up little iigs with maxed out (at the time) expansion/accelerators will allow. These days I almost exclusively work with the PC88VA3 for demos after the dual architecture (Z80/8086) grew on me along with the rest of the specs/modes.
I've never thought about doing the same process with console hardware, kind of makes me want to try it out now.
@stevenclark2188 4 หลายเดือนก่อน ⁺¹⁵
Inlining, or tricks to prevent branch misses make me wonder if they developed this code on something like an Intel chip with much longer pipelines that respond to some issue much worse than a RISC with shorter pipelines. And LUTs for circular functions may just be a holdover from CPUs with no multiplier. You can approximate a sine really quickly with raw CPU power with a polynomial if you have a multiplier to do it.
@aldproductions2301 4 หลายเดือนก่อน ⁺⁹
Honestly, I wouldn't be terribly surprised.
I'm not a video game developer, but I have had to develop code for a product that was not yet developed. The challenge in that scenario is often that you have to work with *something* in order to get your code working at all, and to start building and testing it.
If the N64 wasn't available when they started development, it could absolutely have been a "just grab something, we'll adjust later" kind of situation.
@ArneChristianRosenfeldt 4 หลายเดือนก่อน
@@aldproductions2301RISC CPUs were available.
@Transgenic86 4 หลายเดือนก่อน ⁺⁷
This is an amazing video. My friend and I both work at Microsoft and he's doing performance optimization on a C++ codebase. I absolutely love the way you explain and analyze these problems! You have such dedicated passion into understanding and fixing this game. Whenever you have your Return to Yoshi's Island game released, I will be playing it on my 32" Sony Trinitron TV and absolutely enjoying the experience. Looking forward to it!
@aurafox1 4 หลายเดือนก่อน ⁺²
Kaze's optimizing Mario 64 so much that we'll soon reach the point where unchecking Limit FPS on your emulator takes you back in time.
@archirug 4 หลายเดือนก่อน ⁺⁵⁸
I gotta callout Kaze for using CSS code in the background of the video’s thumbnail rather than pure, beautiful N64 assembly code.
@Hyperboid 4 หลายเดือนก่อน ⁺⁶
I feel like I've seen this happen before
@victorhugofranciscon7899 4 หลายเดือนก่อน ⁺¹
I noticed now because of you
@simpson6700 4 หลายเดือนก่อน ⁺⁷
That's because the game has been decompiled and you don't need to know assembly anymore to modify SM64.
@KingChewyy 4 หลายเดือนก่อน ⁺²
Because he isnt writing in assembly, most devs of that era were writing in c/c++ for the consoles.
@archirug 4 หลายเดือนก่อน
@@KingChewyy True, but they certainly weren’t writing CSS code, that’s for sure. So the point still stands
@ebridgewater 4 หลายเดือนก่อน ⁺⁷¹
I have a small project idea. How complicated could a SM64 map be whilst still achieving a locked 60fps on real hardware?
@vilian9185 4 หลายเดือนก่อน ⁺⁵⁵
from kaze maps, very very complex
@floppyD 4 หลายเดือนก่อน ⁺²⁷
That'll be basically Return to Yoshi's Island
@Sauraen 4 หลายเดือนก่อน ⁺²⁴
"""""small project"""""
@FlamingZelda3 4 หลายเดือนก่อน ⁺⁷
@@Sauraen would only take maybe say 3 to 5,000 hours?
@arciks11 4 หลายเดือนก่อน ⁺⁵
@@floppyD RtYI is targeting 30 on console.
@TheFiredragon222 4 หลายเดือนก่อน ⁺⁵
Nice working on 13/15. I am pumped to play Return to Yoshi's Island when it releases. Keep up the good work Kaze and thanks for sharing your deep understanding of the N64 and Super Mario 64.
@vidjenko8349 4 หลายเดือนก่อน ⁺²⁴
My favorite game on the N64 is F-Zero X. That game is really underrated as it's not only an amazing action racer but also a technical showcase for the system. I'm glad that dedicated developers were able to make such a game back in the day and it's still very fun.
@personalitybot 3 หลายเดือนก่อน ⁺¹
Pod racer came close in terms of pure speed, but yeah that one was fun
@csbluechip 4 หลายเดือนก่อน ⁺⁶
I have learned more about programming from this series than any other place ... Ever
Thanks
@Knight_Kin หลายเดือนก่อน
It's interesting but you won't learn programming from this video. I'd recommend online courses if you actually want to learn. This is for entertainment.
@csbluechip หลายเดือนก่อน
@Knight_Kin lol. You've clearly never had to write optimised embedded code.
@Wolfram47 4 หลายเดือนก่อน
Cool vid, I learned a lot… gonna need to learn more terms to thoroughly understand things a bit more tho haha 😂
@deplinenoise 4 หลายเดือนก่อน ⁺¹⁸
Nice, but please stop comparing FPS numbers. Use milliseconds so optimization gains can be compared. “Improved by two frames per second” means different things based on where you started.
@TheLaserpewpew 4 หลายเดือนก่อน ⁺¹
Rather use max frame duration in the last X time. This indicates stutter
@brenscott5416 4 หลายเดือนก่อน ⁺⁵
It's so random people who stumble onto this can understand. The average idiot understands a different in fps, not milliseconds
@UltimaRedFireEskimo 4 หลายเดือนก่อน ⁺¹
This video reminds me again why Optimizations are so important. I’m a dev at a AAA company rn, and I quickly learned to design my assets with optimization in mind instead of trying to implement some crazy inefficient shit and try to fix it later lol. The reality is, time is always a factor so giving leeway for other optimizations toward the end of a project is so so crucial…instead of trying to tidy up things that should’ve been lean in the first place. Great video as always!
@smokeydops 4 หลายเดือนก่อน ⁺¹⁹
0:30 i've uhh ""researched"" it thoroughly, and this quote belies the truth only slightly, buut the saturn's CPUs do have a division unit included specifically to accelerate 3D math (as well as the SCU-DSP which was originally intended as the matrix math unit)
@ArneChristianRosenfeldt 4 หลายเดือนก่อน ⁺¹
Yeah, SEGA persuaded Hitachi that division is important also for other customers. Doom resurrection uses it on Sega32x . Jaguar also has division running in the background. Not sure about 3do . I think that Arm has an implicit output register, which blocks until the result is there .
@ponyprideworldwide 4 หลายเดือนก่อน ⁺³
awesome pfp
@Neogeddon 4 หลายเดือนก่อน ⁺²
The way the Saturn handles its 3D effects still wrinkles my brain. Really underappreciated console with some awesome games (Panzer Dragoon Zwei is my fave game of all time).
@RiverReeves23 4 หลายเดือนก่อน ⁺²
That was awesome Kaze. Dude, you never cease to amaze me that you're continuing to find more optimisations. You make coding for the n64 really fun to learn. Cheers mate.
@Sad_Cirno 4 หลายเดือนก่อน ⁺¹¹
lets goo! another Kaze optimization video
@MyBurnerAlt หลายเดือนก่อน ⁺¹
It’s almost 1 am and I’m watching someone optimize Mario
@BerserkingKantus 4 หลายเดือนก่อน ⁺⁷⁴
I would send all this guy’s videos back in time to the developers.
@onebigsnowball 4 หลายเดือนก่อน ⁺¹³
People forget that they were trying to release a game and not some tech demo
@KingKrouch 4 หลายเดือนก่อน ⁺²³
@@onebigsnowball I mean, the Ridge Racer developers did a turbo mode on the PS1 that made the game run at 60FPS instead of 30, but it came at the cost of some cars being removed and shading being reworked.
@ares395 4 หลายเดือนก่อน ⁺⁶
They probably would not be able to watch them lol
@IncognitoActivado 4 หลายเดือนก่อน
@@onebigsnowball Are you salty, nintendrone?
@IncognitoActivado 4 หลายเดือนก่อน
@@ares395 They kinda sucks anyway.
@commonsensei8719 4 หลายเดือนก่อน ⁺¹
the title and thumbnail are brilliant. just here to say that
@varietychan 4 หลายเดือนก่อน ⁺²²
16:52 this check just confuses me. Actors shouldn't even be out of bounds unless they're placed there
@capdyn735 2 หลายเดือนก่อน ⁺¹
It seems like one of those things people put in it to squash a bug without actually addressing the root cause of the problem. Either that or it's an artifact of a different idea for level sizes.
@tahutoa 4 หลายเดือนก่อน
my YT is lagging during the intro. i was like "yep. i see you have footage of the optimized M64 on screen"
@maurokoller3910 4 หลายเดือนก่อน ⁺¹⁰
Here I am watching with a what Kaze is achieving with N64 hardware wondering how the history of gaming would be if he got send back in time to work at Nintendo...
@xdanic3 4 หลายเดือนก่อน ⁺¹²
To be fair, the 3D programs were also in it's infancy back then as well, Max and Softimage were the top contenders back then, they didn't have current blender with the F3D exporter. Painting vertex colors wasn't probably that visual back then nor did they have such a nice texture library and authoring tools.
@maurokoller3910 4 หลายเดือนก่อน
@@xdanic3 Fair enough :D
@cdj17e 4 หลายเดือนก่อน ⁺⁹
Kaze is an extremely skilled programmer for sure, but the N64 and Mario 64 were pretty novel and the constraints of game development made it so that you have a minimum acceptable framerate and then with any extra development time, you'd focus on ensuring minimal bugs or adding more content rather than optimizing the existing content (doesn't necessarily mean all bugs will be fixed though!). While Kaze and other N64/Mario 64 devs managing to do this without the resources of a huge corporation is insanely impressive, it's not like it was realistic to expect the devs at the time to have similar breakthroughs (although there certainly are "cheap" optimizations the Mario 64 devs could have done at the time but I am unsure of whether they would have that drastic).
@maurokoller3910 4 หลายเดือนก่อน ⁺⁴
@@cdj17e yeah, thats why my mind thought it to be funny to imagine him with his knowledge standing on the shoulders of giant prior using that knowledge to help those giants. :)
@Aviertje 4 หลายเดือนก่อน ⁺⁷
@@cdj17e I think it cannot be overstated how much programmers like Kaze are standing on the shoulders of the giants who came before them. I imagine that if you threw Kaze back in time, he'd still be a very talented individual, but depending on the state in which you'd send him back, the results would vary immensely. Modern tooling will have inspired a lot of visualizations that let him realize just how unusable the bars used for performance matrixes by native devs were. Just having these impressions and knowledge of the places where the 'pain' is can avoid so much wasted time and rabbit holes. But at the same time, would he be as effective if he was limited to the tools of the time? Nowadays we have so many means for rapid prototyping that allow a quick 3D scene to be whipped up in Blender and inspected with high framerates, but back then the controls for comparable programs would have been clunky, screen updates slow, and overall process not very flexible in how easily it can be prototyped against the existing product. Also don't underestimate the importance of a quick build-test cycle which very likely involved cross-compiling and maybe even taking things out to plug them into a dev kit device. And finally, assuming Kaze got to work on the product back then, he'd no doubt have to deal with superiors who impose a certain vision or have opinions of their own on how development has to happen, as well as deadlines to meet while regularly spending nights at the office (It's Japan, after all.) It would be a huge difference in every aspect in regards to how he is able to approach these development projects now as a hobby of sorts. (I have no clue if and how he monetizes his activities, but it seems quite niche so I'm assuming it's primarily hobby oriented.)
@spacliche9308 4 หลายเดือนก่อน ⁺¹
I absolutely love the longer explanation videos. I think you could make awesome videos in the style of Retro Game Mechanics Explained
@bunnybreaker 4 หลายเดือนก่อน ⁺⁹
"But can it run Crysis?"
Kaze: "Hold my beer."
@tandoori8597 2 หลายเดือนก่อน
kaze putting in insane amounts of effort in analysing the codebase and optimising fps: "And now the game runs faster without barely any effort"
@lethauntic 4 หลายเดือนก่อน ⁺²⁰
1:14 Kaze saying "wide public", but being interpreted by TH-cam as "white public" in the subtitles made me laugh. It almost sounded that way haha
@totallynothyper964 4 หลายเดือนก่อน ⁺⁴
also at 8:15 he said "fog" and YT thought it was f*ck
@ViktorPripyat 2 หลายเดือนก่อน
Dawg yous gotta compile every optimisation you've made (full sourcecode rewrite, new mario model, deoptimising) into one mod to make the ULTIMATE SM64
@AdyaGD 4 หลายเดือนก่อน ⁺⁶
its crazy how unbalanced the hardware inside the N64 is, who tf designed it
@jackthatmonkey8994 4 หลายเดือนก่อน ⁺⁵
I think they had some issues with sourcing chips coupled with a lack of knowledge for 3D games. 64 bit was only a thing because a 32-bit CPU at the required spec was more expensive / less available
@AdyaGD 4 หลายเดือนก่อน ⁺⁴
@@jackthatmonkey8994 are you sure because I'm more keen to believe that they only got a 64bit CPU for the marketing
@ArneChristianRosenfeldt 4 หลายเดือนก่อน ⁺¹
NES was the last console without a memory bottleneck.
@ThePheenixKing. 4 หลายเดือนก่อน
The dynamic collision on the submarine makes me think they originally intended it to move, scrapped the idea but forgot to change the collision.
@supersayainasriel6745 4 หลายเดือนก่อน ⁺¹²
Fun fact: Crash Bandicoot, one of the games people probably use when they said PlayStation is more powerful is technically using resources on the console that aren't meant for running games. Yeah you know the joke of "Naughty dog breaks the limits of a console at the end of a generation"? Hun naughty dog started on PlayStation breaking limits and hyper optimizing their games. There is a very interesting documentary on the development of crash, down to artstyle, animation rigging, their study of how PlayStation 1 works, it's really interesting and I advise you to look it up and give it a watch if this fact peeked your curiosity.
@HowManySmall 4 หลายเดือนก่อน ⁺¹
What specifically are they doing
@supersayainasriel6745 4 หลายเดือนก่อน ⁺⁷
@@HowManySmall The PlayStation 1 had segments of ram specifically allocated for running the PlayStation itself, and naughty dog found that not all of that ram was being utilized, so found a way to tap into it to make crash bandicoot 1 run better. So basically they were using dev intended ram and then snipping a bit more ram from the console from a place not intended for devs to use. So that on top of art decisions made such as making crash only colored vertex planes, and using boxes for interactive set dressing allowed them to focus on more complex environments without the use of pre rendering like other early PlayStation titles.
@some-online-dude 2 หลายเดือนก่อน ⁺²
And now it's getting at least a demo port on the Saturn that even runs better
@capdyn735 2 หลายเดือนก่อน
Am I remembering correctly that they also used a non-deterministic packing or compression algorithm to even fit the game on the disk?
@BestGirlsBiggestFan หลายเดือนก่อน ⁺¹
@@some-online-dude Not surprising, the Saturn has more power than the PlayStation overall, it was just harder to code for. The Nintendo 64 is still the strongest, but the false rep on Saturn being a 2D only machine lies purely in SEGA of America, and to a lesser extent SEGA of Japan's slip ups with developer relations. and 2D games still being popular in Japan (not so much in the US where the 2D powerhouse only rumours started, where 3D was the new big thing.)
@manamaster6 4 หลายเดือนก่อน
I can't even begin to comprehend the complexity behind the technical stuff, it is impressive to see what you are doing.
@tediustimmy 4 หลายเดือนก่อน ⁺⁴¹
People thought the PS1 was more powerful because of FMVs. That was it: movies.
@solarflare9078 4 หลายเดือนก่อน ⁺¹⁴
It was actually because the hardware is generally much more efficient than the N64 and Saturn, and many 3D games tended to look less limited. Nowadays, it’s easy to point out the wobbly textures and weak 2D capabilities of the console.
@thesenamesaretaken 4 หลายเดือนก่อน ⁺¹⁷
@@solarflare9078it was easy to point out the wobbly textures and jittery pixels back in the day. I did it when I went to my friend's house, playing on the ps1 really jarring (that said a kid who didn't usually play on an n64 would probably have found the blurry textures and widespread use of fog to cover up low draw distance jarring too)
@klontjespap 4 หลายเดือนก่อน
@@solarflare9078lol anyone could point out the insane texture warping on the ps1 from day 1
Same with the n64 textures being a blurry mess
Having already played quake on PC, i was not impressed with either, 3d graphics on pc were lightyears ahead of them and made the early 3d consoles look retarded
We even knew the n64 controller was a fucken shite too and ruined the enjoyment
@ali32bit42 4 หลายเดือนก่อน ⁺⁴
it was actually the CD drive. the extra storage space compared to cartridges meant games were bigger .
@arciks11 4 หลายเดือนก่อน ⁺¹⁰
@@tediustimmy PS1 also didn't suffer from memory stalling as badly as N64 did. Which at the time must've helped with keeping the perceptible gap between them smaller.
@YoutubePizzer 4 หลายเดือนก่อน ⁺¹
I love these optimisation videos Kaze, please keep doing them, seeing how mario 64 approached these kinds of things is really useful for game devs even today
@PlasticCogLiquid 4 หลายเดือนก่อน ⁺³
I want hacks for these games that enable you to play with all the low-poly LOD objects the entire time :D
@thardump859 3 หลายเดือนก่อน ⁺²
Man.... The N64's processor sure is a strange and bizarre thing. Literally, no other processor works as backwards, counterintuitive, and unpredictable as the N64's. Even today the way it works is still so weird and convoluted that even the best emulation can't 100% replicate the bizarre convolution and borderline unpredictability that this processor has in it's inner workings. I mean yeah it was the very first 64 bit piece of technology to be released to the public, but with that comes a lot of problems.
@khallous 4 หลายเดือนก่อน ⁺⁴²
0:16 what level mod is this?
@remcovanhartevelt588 4 หลายเดือนก่อน
I have to know the secret level from world 13
@ObsydianX 4 หลายเดือนก่อน ⁺³
Since no one else is answering you, Kaze is making a mod for SM64. If you haven’t checked out any of his other videos, he goes into depth on the programming of Mario and how he is improving the code for performance games for his mod.
@khallous 4 หลายเดือนก่อน ⁺¹
@@ObsydianX thank you!
@TheBoxyBear 4 หลายเดือนก่อน
When the memory gap is so big even Mario can't longjump across it
@benjaminoechsli1941 4 หลายเดือนก่อน ⁺³
13:13 A racecar with a beercan for a gas tank.
@MozTS 4 หลายเดือนก่อน ⁺¹
Kaze you should look at the perfect dark decompile code and maybe do a video about how n64 programming evolved from mario to its end of life era
@SmogginMog 4 หลายเดือนก่อน ⁺¹¹
"How throwing gasoline into a fire made it COLDER"
@iamdarkyoshi 4 หลายเดือนก่อน ⁺¹
As long as the rambus goes vroom vroom, I'm all for it.
@private3946 4 หลายเดือนก่อน ⁺⁴
Protect this guy at all cost. He is the savior we needed!
Amazing video Kaze! Again and again!
@AndrewAhlfield 4 หลายเดือนก่อน
So exciting to hear you're close to done Kaze! We're all cheering you on :D
@Chrisezo 4 หลายเดือนก่อน ⁺²⁵
Thanks for making these videos, I'm no programmer but I do work in IT so I know a little bit about how code is supposed to work, and its very interesting to see how Mario 64 was coded. I'd like to believe that the poor optimizations in the game most likely happened due to time crunching and stuff, and as people back in the 90s were more limited in the tools that they had access to, it would have taken them a long time to troubleshoot and properly test for things, which is why games used to be much glitchier in the past, but as a result since the devs liked some of these glitches and stuff, we managed to experience a lot of them through cheat codes and stuff, which is what actually made that era of gaming a very interesting experience to grow alongside in.
@iratepirate3896 4 หลายเดือนก่อน ⁺²¹
Games didn't used to be glitcher in the past. They just don't have patches.
@kiyoskedante 4 หลายเดือนก่อน ⁺¹
Another important part of the story is PS2 dev kits bragged about real-time debuggers on live code. As far as my research goes, 1st gen N64 dev kits did not have live debug on active software.
Also, it's likely either the intended platform specs were lower or the sga dev kits had lower specs. These cached renders mentioned halfway through the video May have been fail-safes against the code crashing in the dev environment.
@jc_dogen 4 หลายเดือนก่อน ⁺¹
@@kiyoskedante yeah, no console dev kit was known to have really good tools overall until the xbox. the ps2 did have some fancy kit with the performance analyzer, but i think only the main CPU had a debugger for years. Just write all your vector unit code bug free lmao
@RiversChungus 4 หลายเดือนก่อน
I really appreciate all you do to truly get the most out of Mario 64. Hopefully one day a version compiling all your fixes can be made that runs basically without breaking a sweat. Keep up the great work!
@ragnarokstravius2074 4 หลายเดือนก่อน ⁺⁸
13:25, so who's gonna code Super Mario for the PS1, and who's gonna be the first speedrunner for it?
@jsr734 4 หลายเดือนก่อน
I wonder how hard will be to port it, considering both machines use RISC CPUs but had dramatically different GPUs.
@Trismegustis 4 หลายเดือนก่อน ⁺¹
We'll call it Cramari.
@Martyste 4 หลายเดือนก่อน ⁺⁵
I wonder how these 3 laggy spots look like on PAL. Given the framerate caps at 25, maybe the lags were less noticeable? ( unfortunately the game speed isn't compensated anyway, so it's still slow, just more evenly slow probably. )
@arciks11 4 หลายเดือนก่อน ⁺²
@@Martyste If I'm not mistaken, Pal version had o2 compiler optimization enabled that original release didn't have.
Said compiler setting slightly boosted performance on PAL and Shindou rerelease. Said boost was for PAL's equivalent of 30, meaning peak fps was 25.
@strayorion2031 4 หลายเดือนก่อน
I remember from where I liked to search fun facts about random things in the old days of youtube, that both the N64 and the Gamecube were equal or more powerful than their competitors but that their game storage systems (cartdriges and small disks) were limiting factors
@Aviertje 4 หลายเดือนก่อน ⁺³
I'm not really familiar with the N64, but have you accounted for the generally less-developed state of the compilers 30 years ago? I don't doubt that the performance lottery was a thing even then, but I suspect that more naïve algorithms that covered code generation may have made loops less effective, which would be another reason why unrolled loops were deemed more effective in their tests at the time. The ability to use all registers as effectively as possible makes a considerable impact.
While the 90s aren't the archaic 70s, C was 'merely' 20 years old at the time, but the first MIPS processor was from 1985, a mere 8 years prior to the start of the n64s development. There's a good chance they weren't using the newest compiler toolchain at the time yet (internet was still very niche!), so I think it would be interesting to see how well an 'unoptimized' version would do when compiled with the tools they had at the time, if that is even feasible.
@KazeN64 4 หลายเดือนก่อน ⁺⁷
Yes! I used the exact same compiler and flags they used back in 1996 here. We know this is the same because the unmodified code compiles byte for byte the same
@JohnSmith-of2gu 2 หลายเดือนก่อน
Minimizing your code size at expense of "optimizations" so you can cram your entire rendering code in the cache and run it blazing fast, BRILLIANT! I reminder of how considering memory hardware is important.
@huyphan7825 4 หลายเดือนก่อน ⁺⁹
Basically: Sometimes not doing anything is easier than building a machine that makes things easier
@BestGirlsBiggestFan หลายเดือนก่อน
Build a machine that does nothing.
@MonochromeWench 4 หลายเดือนก่อน ⁺³
Loop unrolling and inlining are not a great idea when your cpu is much faster than your memory. They probably thought the rambus memory was more performant than it turned out to be in reality. Older 16 bit and especially 8 bit systems had fairly balanced ram to cpu performance characteristics because some cpu instructions could be really expensive and memory latency was low compared to cpu speed. RISC cpus like the N64 used had good overall IPC much better than the common 16 bit cpus of the time. No doubt Nintendo's programmers were just not familiar enough with programming a RISC platform.
@ArneChristianRosenfeldt 4 หลายเดือนก่อน
Let me tell you about Atari Lynx and SegaCD.
@gamerz1172 4 หลายเดือนก่อน
I think the biggest lesson in optimizations I got was when I was making a video game for coding practice, and when I was working on a main menu background (Particles behind the screen flying from the bottom to the top) my attempt to prevent the game from loading too many particles by Deleting multiple particles as it created multiple particles caused ALOT of lag
My solution to this and to make the particle background work was to actually make it that it created and deleted one particle at a time and instead to make sure there isnt "One particle only at each Y axis" a particle's spawn point would get randomized to the point where particles created BEFORE one particle could arrive after the particle that was created after them
@inceptional 4 หลายเดือนก่อน ⁺¹⁸
Dude, if you think it's crazy that some people think the PS1 is more powerful than the N64, I wonder how you feel when so many people now claim the Saturn was more powerful and capable than the N64.
Also, we seriously need more people like you in the SNES development scene really optimizing that system and pushing it properly to its limits imo, because there's a lot of games on that system that have room for improvement like this to be honest. And, in the right hands, I genuinely think most of the SNES games that suffer any slowdown could be running at a pretty solid 60fps. Not only that, but it would cool to see some of those game pushing the system even further too, and really showing off what it's capable of.
I can only imagine what you might be able to bring to optimize games like Star Fox or Doom on SNES, never mind just the more typical 2D games there.
@KingKrouch 4 หลายเดือนก่อน ⁺⁶
Honestly, optimizing Doom for 60FPS on the Sega Saturn (like originally intended) would be a really neat thing to see someone attempt.
@inceptional 4 หลายเดือนก่อน ⁺³
@@KingKrouch I'm absolutely sure the Saturn could run Doom at 60fps if it doesn't already. I mean, haven't people got it running at 60fps on some of the older consoles already like the 32X or whatever? I swear I read that somewhere.
Now I'm curious, does Doom 64 run at 60fps?
@alopexau 4 หลายเดือนก่อน ⁺¹
A good example that proves what you said is a recent romhack for Ranma 1/2 Chougi Ranbu Hen, its one of the most poorly optimized fighting games on the SNES, it's an otherways great game but it runs like complete dogwater. Recently a user named upsilandre did a partial rewrite of the game, heavily optimizing the code and got it running at a faultless 60FPS.
It further frustrates me that the majority of notoriously sluggish SNES games that earned the console a reputation were not really the fault of the console, but rather developers being cheapasses and using SlowROM chips. Kandowontu's been hacking SNES games for a while now, converting them over to FastROM and this alone has yielded significant performance improvements, removing most, if not all the rampant slowdowns in a ton of games.
Manfred Trenz in one game with no expansion chips pretty much shamed every SNES dev with Rendering Ranger R2, so the whole console really deserves a redemption arc.
@ABaumstumpf 4 หลายเดือนก่อน ⁺²
And there are even crazy people claiming the N64 was more powerful.
@solarflare9078 4 หลายเดือนก่อน
Retro Core would def think the Saturn is better than the N64, but mainly because he’s not fond of the N64 at ALL.
@matthewmcrobie6703 29 วันที่ผ่านมา
Fascinating video. The devs from those days performed miracles to make some of the most celebrated games in history, but sometimes they didnt make it easy for themselves.
@heavygaming6596 4 หลายเดือนก่อน ⁺⁴
thank you super mario 64 guy
@felixklinge5571 4 หลายเดือนก่อน
Very cool video! Good work on showcasing the various "optimization-attenpts".
It's always important to check your optimization ideas against the actual hardware where the software is run. Especially when only targeting one system (as Nintendo did with Mario 64).
Loop unrolling for example makes sense on newer CPUs because of their out-of-order nature but on other hardware where cache locality is much more important it's hurting you (as shown in the video).
@Hikarikun92 4 หลายเดือนก่อน ⁺³
2:57 - what is the name of that song? It comes up to my mind every once in a while, but I can't remember where it was from...
@ocaiosim 4 หลายเดือนก่อน ⁺²
The Legend of Zelda: Realm Overworld
@Hikarikun92 4 หลายเดือนก่อน
@@ocaiosim Thanks!
@Dalek-br6nu 4 หลายเดือนก่อน ⁺¹
Specifically from Spirit Tracks!
@Hikarikun92 4 หลายเดือนก่อน
@@Dalek-br6nu thanks :)
@tenj00 หลายเดือนก่อน
So the main reason Nintendo does nit release their source code is probably fear of someone publicly roasting their skills in the future.
@mekafinchi 4 หลายเดือนก่อน ⁺⁹
19:20
I think this is the most important thing to know when questioning programming decisions. Previous nintendo
consoles didn't have caches at all; before then memory speed was on par with CPU speed and memory accesses would take the same amount of time no matter when they happened. A huge amount of the modern optimizations are centered around cache performance, but I'd be surprised if cache performance had gotten *any* significant attention at the time.
@ArneChristianRosenfeldt 4 หลายเดือนก่อน
Consoles with cache: 3do, Jaguar, Sega 32x, PS1, Saturn
@mekafinchi 4 หลายเดือนก่อน ⁺¹¹
@@ArneChristianRosenfeldt one may notice that none of those are from Nintendo
@ArneChristianRosenfeldt 4 หลายเดือนก่อน
@@mekafinchi yeah, Nintendo was late to the party and ignorant to the outside world. Probably did not allow experienced ARM coders from Archimedes to come in. Did not pay to get mentor with experience with Sun, Fuji, or SGI servers. Don’t go to trade shows. Don’t learn about profilers and instrumentation.
@mekafinchi 4 หลายเดือนก่อน ⁺¹
@@ArneChristianRosenfeldt ok
@ArneChristianRosenfeldt 4 หลายเดือนก่อน ⁺²
@@guyg.8529 the funny thing about Jaguar is that it does not have a real cache like the older 3do, but scratchpad memory like the TMEM in the N64. So Doom in 1995 load slices of “microcode” into this scratchpad memory. Granted, data access to memory was not as slow as on N64. No problem with pointers in a mesh. PS1 has code cache and data cache and data scratchpad memory. The latter was extensively used by developers, almost like zero page on NES or genesis first 64k. SNES with its page register and x86 with segments were easier to optimize.
@SZvenM 4 หลายเดือนก่อน ⁺¹
Very nice editing in this one, and super interesting topic

ต่อไป

เล่นอัตโนมัติ

FIXING the ENTIRE SM64 Source Code (INSANE N64 performance)