The Genius of the N64's CACHE Instruction

  • Published on Feb 3, 2025

Comments • 945

  • @KazeN64 months ago +735

    Slight correction to the video: It looks like NEC (Nippon Electric Company) were actually the ones that designed most of the CPU core here, so give them the credit instead of SGI.
    Correction correction: Apparently it was SGI after all? I have no idea. Someone else can argue this out. It doesn't really matter to me or the video.

    • @xyzabc123-o1l months ago +37

      companies like NEC need to start designing and manufacturing chips on old nodes again. we don't need faster chips, we need more cheap chips in 2025 :)

    • @syncmonism months ago

      @@xyzabc123-o1l Intel currently has some 14nm fabs sitting idle I believe. They need to find some customers for those fabs!

    • @incognitoman3656 months ago

      👌

    • @Dwedit months ago +16

      @@xyzabc123-o1l According to Sophie Wilson (co-designer of ARM processor), price per gate is lowest at the 28nm process node.

    • @xyzabc123-o1l months ago

      @@Dwedit ty for the info, do you know of any easily purchasable chips on that process node?

  • @chane2k1 months ago +1877

    Sounds like N64 emulators are just going to have to use your game as an accuracy benchmark.

    • @zeggyiv months ago +123

      Expect to need a new PC to play 30 year old games.

    • @iwiffitthitotonacc4673 months ago +23

      @@zeggyiv ?

    • @TSquitz months ago +147

      @@iwiffitthitotonacc4673 emulation is really hard to do, and can require more computing power than the original console

    • @zeggyiv months ago +140

      @@iwiffitthitotonacc4673 Emulation gets more computationally expensive the more accuracy is required. UltraHLE used to run on what would be considered a toaster today.

    • @EnigmaticGentleman months ago +44

      @@TSquitz I can emulate the most intensive PS2 games at 2x on my $200 tablet. Unless we're talking 7th gen+ or REALLY high-accuracy stuff, it's just not hard anymore (unless your machine is really underpowered).

  • @rastas_4221 months ago +1196

    N64 developers: "Well excuse me we didn't have 20 years to study the architecture and had to ship something by the end of the month!"

    • @thewhitefalcon8539 months ago +127

      This is extremely true

    • @Orzorn months ago +173

      Yeah, it really seems like the N64 was actually so much more powerful than any of the games ever took advantage of. It also seems a lot of this is down to lack of documentation spelling out the best way to use these capabilities, plus overworked devs who had to get a game out ASAP.

    • @PinkMawile months ago +89

      We have to send Kaze back in time to revolutionize the N64

    • @ENCHANTMEN_ months ago +60

      I bet we could do spectacular effects with modern hardware, it's just that the complexity to optimize it to this degree isn't humanly possible

    • @zdelrod829 months ago +3

      Dk64 moment

  • @TheBackyardChemist months ago +844

    18:00 this trick is called Cache-As-RAM (CAR) and as far as I know it is used by BIOS code in most (all?) PCs. In the earliest part of the boot process you simply do not have any RAM yet, since DDR RAM initialization is so complicated. So when modern x86 CPUs come out of reset, they need to start executing code to initialize their memory controller, so for this CAR is used.

    • @KazeN64 months ago +393

      oh that's really cool! I didn't know this was commonly used already. Interesting that they do it out of necessity instead of for performance.

    • @ThatOSDeveloper months ago +21

      Huh, that's really interesting, do other things like the 6502 or something use that?

    • @gabemorales7814 months ago +61

      @@ThatOSDeveloper 6502 and similar super early microprocessors have no cache. First processor I saw with an instruction cache was the 68020 on the Amiga 1200, but IIRC they work differently because the Amiga itself has a funky bootstrap sequence. The SH-4 has such a mode, however, it's called "OCRAM mode." The Dreamcast has an integrated MMU so, without checking, I'm fairly sure it'd boot the same way.

    • @TheBackyardChemist months ago +23

      @@ThatOSDeveloper Nope, anything where the RAM is straight SRAM or something comparable will have RAM after reset. This is only the case for processors that have to initialize their own memory controller with a complicated algorithm.

    • @queazocotal months ago +16

      @@ThatOSDeveloper Technically, @thebackyardchemist is wrong, and early PCs (along with the whole 8 bit space) don't use this, as they generally don't have a CPU cache. The 486 was the first processor that could rely on having a cache internal to the CPU. 6502/z80/4004/8008/8080/8086 did not have any cache.

  • @johnclark926 months ago +888

    Kaze seems to have blown past the RTX 5090 phase of development and discovered that the N64 has a pseudo-quantum computer inside

    • @daspeedsta9455 months ago +95

      Saw this comment before watching and thought this was a joke 😭

    • @Bones_ months ago +65

      This is extra funny considering the people who try to claim that quantum computers use parallel universes and that Mario 64 has parallel universes

    • @bobcake8904 months ago +2

      Especially now since Mario in the multiverse released… XD

    • @IXPStaticI months ago +6

      Scrolling past this halfway into the video I thought this was a joke but it turns out HE'S ACTUALLY STRAIGHT UP DOING THAT WTF

    • @xthomas7621 15 days ago

      You all must be exaggerating and keeping the joke going. Lemme see where you got the idea..

  • @viridisspielt months ago +548

    Create_Dirty_Exclusive sounds like the general idea behind Conker's Bad Fur Day

    • @skyguysZ months ago +14

      this made me laugh way too hard

    • @CottonModem months ago +30

      Damn, just made an almost identical comment before stumbling across this one. We must both be very handsome, intelligent, and charismatic.

    • @skyguysZ months ago +6

      @@CottonModem 20 intelligent people including us and the OP could have thought of this OP’s comment

    • @thesenamesaretaken months ago +9

      And there I was thinking it was the name of Kaze's onlyfans page

    • @galen__ months ago

      Dirty Cash(e) 😂

  • @Manabender months ago +304

    So basically, you're taking a couple of cachelines and telling them "you don't cache any more, you are now extra CPU registers."
    Brilliant.

    • @hyperon_ion9423 months ago +16

      Can the N64 do operations directly on the cache buckets? I would assume it would still have to load the data to a register connected to the ALU, so they’re more like extra _SUPER_-volatile ram addresses that you then have to “flush” (i.e. bring the original data back from RAM over the Ram-Bus so that you don’t overwrite it)
      I imagine that the trick would be getting as much use out of the cache buckets as you can before needing to reset them back to their original data, or perhaps even invalidate that section of RAM altogether and pretend that the cache _is_ the RAM until the data in it has to be accessed by something other than the CPU.
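The behaviour discussed in this thread can be modelled on a PC. The following is a toy C model, not N64 code: it assumes an 8 KiB direct-mapped, write-back data cache with 16-byte lines, and every name in it (`claim_line`, `fill_plain`, `fill_claimed`, the counters) is made up for illustration. It sketches why claiming a line with a Create_Dirty_Exclusive-style operation before overwriting it avoids the RAM fetch that a plain store miss would trigger.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 16u            /* assumed bytes per data-cache line */
#define NUM_LINES 512u           /* assumed 8 KiB cache / 16-byte lines */
#define RAM_SIZE  (64u * 1024u)  /* toy RAM size, chosen arbitrarily */

typedef struct {
    uint32_t tag;
    int valid;
    int dirty;
    uint8_t data[LINE_SIZE];
} CacheLine;

static CacheLine cache[NUM_LINES];
static uint8_t ram[RAM_SIZE];
static unsigned ram_line_reads;   /* lines fetched from RAM */
static unsigned ram_line_writes;  /* dirty lines written back to RAM */

/* Make the line for addr resident. When fetch is 0 the old RAM contents
   are not read, which models a Create_Dirty_Exclusive-style claim. */
static CacheLine *claim_line(uint32_t addr, int fetch)
{
    assert(addr < RAM_SIZE);
    uint32_t line_no = addr / LINE_SIZE;
    uint32_t index = line_no % NUM_LINES;
    uint32_t tag = line_no / NUM_LINES;
    CacheLine *line = &cache[index];

    if (line->valid && line->tag == tag)
        return line;                        /* hit: nothing to do */

    if (line->valid && line->dirty) {       /* evict: write the victim back */
        uint32_t victim = (line->tag * NUM_LINES + index) * LINE_SIZE;
        memcpy(&ram[victim], line->data, LINE_SIZE);
        ram_line_writes++;
    }
    if (fetch) {                            /* ordinary miss: read from RAM */
        memcpy(line->data, &ram[line_no * LINE_SIZE], LINE_SIZE);
        ram_line_reads++;
    }
    line->tag = tag;
    line->valid = 1;
    line->dirty = 0;
    return line;
}

static void store_byte(uint32_t addr, uint8_t value)
{
    CacheLine *line = claim_line(addr, 1);  /* a store miss fetches first */
    line->data[addr % LINE_SIZE] = value;
    line->dirty = 1;
}

static void create_dirty_exclusive(uint32_t addr)
{
    CacheLine *line = claim_line(addr, 0);  /* claim without reading RAM */
    line->dirty = 1;
}

static void reset_model(void)
{
    memset(cache, 0, sizeof cache);
    ram_line_reads = 0;
    ram_line_writes = 0;
}

/* Fill len bytes with plain stores; returns lines fetched from RAM. */
unsigned fill_plain(uint32_t base, uint32_t len)
{
    reset_model();
    for (uint32_t a = base; a < base + len; a++)
        store_byte(a, 0xAA);
    return ram_line_reads;
}

/* Same fill, but claim each line first (base should be line-aligned). */
unsigned fill_claimed(uint32_t base, uint32_t len)
{
    reset_model();
    for (uint32_t a = base; a < base + len; a++) {
        if (a % LINE_SIZE == 0)
            create_dirty_exclusive(a);
        store_byte(a, 0xAA);
    }
    return ram_line_reads;
}
```

Under these assumptions, `fill_plain(0, 256)` reports 16 line fetches from RAM, while `fill_claimed(0, 256)` reports none. As the reply above points out, the data still has to pass through registers to be operated on; only the bus traffic is saved.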

    • @shinyhappyrem8728 months ago +7

      The SNES CPU also had some memory-mapped bytes on the CPU die ($43x0..$43xB for x=0..7, so 12*8=96 bytes) but sadly they were used mostly just as a place to store DMA/HDMA parameters. Afaik only 1 game used that area as a fast cache for instructions: Another World (SNES port by Rebecca Heineman).

  • @Armi1P months ago +327

    2035: Kaze manages to run Crysis on N64 by using instructions that theoretically don't even exist

    • @LokiScarletWasHere months ago +20

      Don't give him ideas. You know he'll do it.

    • @VlaDexa_MAX months ago +9

      Don't know about N64, but I'm pretty sure that modern CPUs have undocumented instructions, sooo

    • @kellan5431 months ago +13

      There are a few NES games that use undocumented instructions. On the CHIP-8 (technically a fantasy console) a few undocumented instructions got used so much that they became official

    • @Akriashi 29 days ago +1

      @@VlaDexa_MAX all procs have undocumented instructions...though modern ones can have them "disabled" via the Instruction decoder being set to convert their opcodes to NOPs in the end-user versions.

  • @6Frxggy months ago +519

    bro explained the N64 like a country

    • @LavaCreeperPeople months ago +8

      Pro

    • @gabycute5128 months ago +7

      @@LavaCreeperPeople what?

    • @Hollow_Struggler months ago +5

      And boy did it work

    • @notme8232 months ago +16

      And he somehow made it MORE confusing

    • @superking208 months ago +11

      bro looks up at the sky and says "bro is blue"

  • @bottols months ago +123

    I am not using quantum physics in my Mario 64 mod YET. Famous last words.

  • @NanNaN-jw6hl months ago +141

    Essentially you're using dynamic ranges of cache as a sort of register-window; bravo!
    I've not seen this sort of cache-line optimization talk outside of Linux kernel specific talks before. Excellent!

    • @ArneChristianRosenfeldt months ago +15

      Yeah, this streaming out of sub-16 byte data packages looked very much like register windows. The Jaguar has (buggy) helper registers that let the GPU assemble 32:32 bits to write out in one go as 64 bits.

    • @gabemorales7814 months ago +20

      lots of talk about this kind of optimization going on right now in Dreamcast-land with the community port of GTA3. Currently none of it is implemented as everyone hashes out the detail with profiling to see exactly the best way to attack the problem, with the added complexity that both vertex transformation and vertex submission can *_potentially_* thrash cache depending on how it's done.

  • @Luna5829 months ago +177

    first time a bus has been mentioned in an n64 video without it being "Imagine a bus"

    • @thewhitefalcon8539 months ago +25

      If I had a nickel for every Mario related bus meme, I'd have two nickels...

    • @Pie-jacker875 months ago +8

      @@thewhitefalcon8539 I'd have 3. Desert bus 64.

    • @patientallison months ago +4

      Imagine a rambus

    • @turbinegraphics16 months ago +3

      SMB frame rule and rambus being related 😂

  • @ChineseCookie months ago +397

    I am not using this information, I am not making an N64 game. I'm just watching this because I can.

    • @thewhitefalcon8539 months ago +13

      This way of thinking is good on any platform.

    • @Goose____ months ago +10

      a great use of freewill

    • @Swenglish months ago +27

      Same. I don't even understand half of it, but hearing someone go in depth on their niche interest without being boring is magical when it's clear they have taken their nerdiness to expert level.

    • @HKlink months ago +3

      Listening to someone talk about a thing they're passionate about is always fun. Even if you don't understand half of it.

    • @davidthecommenter months ago +3

      am i ever gonna use this information? not likely
      do i like hearing this guy talk about transforming the N64 into a bloody supercomputer? absolutely

  • @johanngambolputty5351 months ago +170

    2:09 As someone who did maths for their undergrad, I can confirm, I have absolutely no memory (its kinda why generality and derivations from first principles appeal in the first place).

    • @mathphysicsnerd months ago +2

      That's because you're not a chad universalist who memorizes their proofs like Poincare :^)

    • @JorgetePanete months ago

      it's*

    • @thezipcreator months ago +8

      @@JorgetePanete yrou'e*

    • @wyattknutson months ago +2

      See now I'm awful at math but my memory is fantastic, wanna connect our brains with a rambus?

    • @multiapples6215 months ago

      If you've never forgotten the quadratic formula on an exam and re-derived it on the spot, then are you even a real mathematician :)

  • @dudono1744 months ago +240

    Rambus was finally going vroom vroom, but now it's retired :(

    • @MarioKartSuperCircuit months ago +46

      Bro downloaded more ram to the point he didn't need the base ram anymore

    • @SteveNathn months ago +23

      He has a good career and now he can enjoy some time off

    • @crestdazoltral7705 months ago +6

      Having (all) your ALUs munching away on useful work with some memory bandwidth to spare is the goal for a well optimized system.

    • @mylittleparody2277 28 days ago +4

      Don't worry, the RAM BUS is going full time with the RCP = )
      It just doesn't serve the CPU that often, that's all.

    • @athos5359 27 days ago

      @@mylittleparody2277 I think it's the RDP that needs most of the RAM bandwidth.

  • @lucaspec7284 months ago +113

    Kaze : "Alright, full disclosure : i am not using quantum physics in my mario 64 mod-"
    Also Kaze : "-YET"
    At this rate we'll have ray tracing in RtYI by the time it releases.

    • @ssg-eggunner months ago +3

      The satirical kaze video by sm64rise is gonna become real

    • @LokiScarletWasHere months ago +4

      You thought the Rt stood for Return To
      You were sorely mistaken

    • @lucaspec7284 months ago +2

      @@LokiScarletWasHere Raytraced Yoshi's Island 64, coming to a nintendo 64 near you in 2025.
      Actually, reminds me of that one guy who made a Ray-Tracing chip for the super nintendo.

  • @Teckman8 months ago +58

    Wait, why is this legitimately a good way to explain how a CPU works?

  • @3lH4ck3rC0mf0r7 months ago +553

    Nintendo: *releases N64 specs & development docs*
    SGI: look how they massacred my boy
    Edit: Tbf, this is basically software engineering in a nutshell. Hardware folks come up with some rocket science bullshit to squeeze extra perf out of the silicon, and the software people waste all of that work by having compilers ignore modern special-purpose instructions for the sake of backwards compatibility, and putting the entire program behind all the polymorphism, virtual functions, dependency injections, virtual machines & interpreters, and God knows how many other abstractions and obfuscations. Despite the different nature of software optimization then vs. now, it boils down to a similar amount of fundamentally misunderstanding how the hardware actually functions that led to most of the N64 library having lackluster performance.
    Modern apps are written like a labyrinth, and the CPU is given the unreasonable task of translating the map from a foreign language and solving the labyrinth as quickly as possible. This is often why modern software is ~1000x slower than it could be.

    • @LavaCreeperPeople months ago +9

      Rip

    • @uponeric36 months ago +163

      Biggest revelation of this channel (besides all the amazing tech) is that the worst, most performance limiting part of the N64 was the documentation.

    • @brandonlittle6444 months ago +5

      @@uponeric36 same with Bosch Motronic ECUs and their stolen/hidden FR manuals

    • @mylittleparody2277 28 days ago +4

      Well, yes and no.
      Having done both hardware and software (even if at a way simpler level than a CPU), I agree with you that some very powerful hardware capabilities go unused.
      On the other hand, take into account that the docs (for the N64, but also on a lot of projects I worked on) are not as simple or readable as you might expect, and the software side doesn't have a lot of time to learn the hardware and write the code.
      That's why, most of the time, backward compatibility is a thing: you can reuse old building blocks to save a bit of time.
      And I agree with you that modern frameworks are just an unstable pile of horrendous things (that you can't even modify easily).
      But just try telling people that the game will run faster, and on old hardware, if you rewrite the engine from scratch with Raylib instead of using Unity...

    •  28 days ago +2

      > Modern apps are written like a labyrinth, and the CPU is given the unreasonable task of translating the map from a foreign language and solving the labyrinth as quickly as possible.
      Compilers can do a lot of heavy lifting there.

  • @coltonroyle2341 months ago +94

    Being a pioneer for a 30 year old console, what a time to be alive.

    • @Doom2pro months ago +5

      21 minute papers...

    • @canaconn2388 months ago +1

      @Doom2pro except with actual information

    • @Doom2pro months ago

      @@canaconn2388 and not spoken like "Today, we, are going, to, discuss, a groundbreaking, piece, of technological, development.. so we, will get, to see, an amazing, hard to believe, sight... So hold on to your papers"

    • @arciks11 months ago

      "Man Revolutionizes N64!"
      "He's 25 years late and gonna get sued so IDK why he did."

    • @keaton718 24 days ago

      I have to wonder if all his research into N64 hardware will indirectly help FPGA N64 projects improve and act more like the real hardware, or at least a bit more like it. If you look at the firmware update history of Analogue products you can see they are frequently updating the cores to address inaccuracies in certain games, even very popular games. So FPGA may never truly be 100% accurate for 100% of games. So Kaze's intense research into how N64 hardware actually works and how it is actually used, and how it could be used, is probably important for achieving that goal, or at the very least for putting the spotlight on N64 hardware when people inevitably try to run all these things on their Analogue 3Ds and such.

  • @vilian9185 months ago +72

    mario 64 has parallel universes, nintendo64 has quantum cache everything is coming together for mario64 port for a quantum computer

  • @JohnBromin months ago +32

    Man I love the visuals in this one. It's been great learning something new every time. Few of the concepts here I don't think I would have understood without the little graphics.

  • @gabemorales7814 months ago +120

    Ah! A direct mapped cache! The Sega Dreamcast has a similar cache setup. I've created a good scheme to get the most out of a direct mapped cache by using absolute addressing in gcc with an ld script to create striped zones. Separate the direct mapped cache into 4 zones, each separated by the width of the cache spacing, to ensure writes to buffers don't overwrite the previous line. On the Dreamcast, you can also enable OCRAM mode, which halves the cache into a scratch pad for fast math. This is actually optimal, because the physical layout of the Dreamcast's memory is (for sake of brevity ignoring the 64-bit dual RAM setup) 2 RAM "chips" with 2 banks inside, each bank made up of 2048 rows of memory "cells," each cell being a cacheline in size. Each bank has a mechanism inside, called a sense amplifier, to read a row. To read a cell, a sense amplifier must be attached to the row, so if you read a row outside of the boundary, it incurs a performance penalty as the sense amplifier must detach, move to the appropriate row, and reattach. If you operate in OCRAM mode, the sizing of the remaining cache is *juuuust* right to fit 4 rows at once if you stripe your memory without sense amplifier penalty. It sounds like the DC and N64 actually have quite a bit in common memory wise.
    A really cool feature of the Dreamcast memory map is the entire memory is mirrored to an alternate address which skips cache when read, as well. So you can actually store things in memory and call them using an alternate address without thrashing your data cache. The Dreamcast also naturally has prefetch and invalidate instructions, which when combined with absolute addressing and OCRAM mode, gives you quite a bit of granularity in how you control your cache.
    EDIT - Question: Does the N64 offer any degree of instruction parallelization? The Dreamcast uses a 5-stage Harvard architecture for instruction fetch, which allows parallelization when basically using any instruction from alternate groups providing they aren't a move opcode. Anything like that exist on the N64? EDIT AGAIN: Welp, looked a little further and it turns out this is actually a part of the MIPS name, lol. "Microprocessor without Interlocked Pipelined Stages." Very, very, verrry cool. The architectures of the DC and N64 are very similar!
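To make the direct-mapped behaviour in the comment above concrete, here is a small C sketch of how an address picks its cache line and why buffers spaced one cache-size apart evict each other. The sizes are assumptions (8 KiB cache, 16-byte lines, the figures usually quoted for the N64 data cache); swap in other values for a different machine. The function names and the four-zone split are illustrative only, not taken from any SDK or linker script.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_SIZE 8192u                     /* assumed cache size in bytes */
#define LINE_SIZE  16u                       /* assumed line size in bytes */
#define NUM_LINES  (CACHE_SIZE / LINE_SIZE)  /* 512 lines */
#define NUM_ZONES  4u                        /* the comment's four stripes */

/* In a direct-mapped cache every address has exactly one possible line. */
uint32_t cache_index(uint32_t addr)
{
    return (addr / LINE_SIZE) % NUM_LINES;
}

/* Two different memory lines conflict when they map to the same index. */
int lines_conflict(uint32_t a, uint32_t b)
{
    return cache_index(a) == cache_index(b)
        && (a / LINE_SIZE) != (b / LINE_SIZE);
}

/* Which quarter of the cache an address lands in. Buffers placed in
   different zones cannot evict each other. */
uint32_t zone_of(uint32_t addr)
{
    assert(NUM_LINES % NUM_ZONES == 0);
    return cache_index(addr) / (NUM_LINES / NUM_ZONES);
}
```

With these numbers, addresses 0x1000 and 0x3000 are exactly one cache-size apart and fight over the same line, while 0x1000 and 0x1800 sit in different zones and can stay resident together. A linker script that pins each hot buffer to its own zone is one way to get that guarantee.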

    • @ArneChristianRosenfeldt months ago +8

      This video is so complicated that I am almost relieved that the Atari Jaguar only has scratchpad RAM for code and a Matrix and a ton of registers for the data.

    • @gabemorales7814 months ago +18

      @@ArneChristianRosenfeldt Oh man I've done Jaguar programming with my Skunkboard. I consider Dreamcast development way, way easier lol. The Dreamcast is so elegant, nice FPU with fat registers for 2 full matrices, a bunch of really cool SH4 fast math functions. Plus, the absolute coolest feature: Order-independent transparencies, owed to deferred rasterization. You bin all your polygons upfront before sending them to a tile accelerator to rasterize, which gives the tile accelerator, which generates pixel fragments, the opportunity to depth-test against every other polygon in the bucket. This gives the Dreamcast per-pixel transparency without needing to order polygons.
      I absolutely love 68000 programming, though. When I do Jag development, I make AtariAge weep because I play mainly with the 68000 lol.

    • @ArneChristianRosenfeldt months ago +4

      @ I just try to redeem Atari's hardware decisions. Running code out of external memory probably was an accident due to the unified data and code cache and external data access. LOL.
      I cannot code 68k, only 6502

    • @gabemorales7814 months ago +11

      @@ArneChristianRosenfeldt Coming from 6502, I think you'd find the 68000 a dream to work with. They feel very similar, except the 68000 is just more of everything, especially registers. That's the absolute best thing about the 68000 -- FAAAAAAT registers. The 68000 is 32-bit internal, that's seven 32-bit address registers, and eight 32-bit data registers. With bitmasking and bitshifting, that's essentially the same as sixteen 16-bit data registers, or thirty-two 8-bit registers! And unlike the 6502, data registers are general purpose, use however you want. You can also use the address registers in clever ways. Hands down my favorite CPU of all time, simple enough to know the ins and outs of, but feature packed enough to do some incredible stuff. Definitely give it a try!
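The "one 32-bit register as several smaller registers" idea in the comment above comes down to shifts and masks. Here is a minimal C sketch of it; the helper names are invented for illustration, and on a real 68000 you would write the equivalent shift, mask, and swap instructions by hand.

```c
#include <assert.h>
#include <stdint.h>

/* Pack two 16-bit values into one 32-bit "register". */
uint32_t pack16(uint16_t hi, uint16_t lo)
{
    return ((uint32_t)hi << 16) | lo;
}

/* Read the upper and lower 16-bit halves back out. */
uint16_t high16(uint32_t reg) { return (uint16_t)(reg >> 16); }
uint16_t low16(uint32_t reg)  { return (uint16_t)(reg & 0xFFFFu); }

/* Replace only the lower half, leaving the upper half untouched. */
uint32_t set_low16(uint32_t reg, uint16_t value)
{
    return (reg & 0xFFFF0000u) | value;
}

/* Treat the register as four 8-bit slots; slot 0 is the lowest byte. */
uint8_t byte_at(uint32_t reg, unsigned slot)
{
    assert(slot < 4);
    return (uint8_t)((reg >> (8u * slot)) & 0xFFu);
}
```

For example, `pack16(0x1234, 0xABCD)` yields `0x1234ABCD`, and `set_low16` of that value with `0x0001` yields `0x12340001`. The cost is the extra shift and mask work on every access, so this pays off mainly when it saves a trip to memory.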

    • @CantrellDouglas months ago +10

      Off topic, but kinda funny: That's me in your profile picture. Or, rather, I posed for the reference picture when I was a kid. Wasn't expecting to see myself in the comment section. 😂

  • @Reaperman4711 months ago +85

    0:50 BITD, I had a girlfriend with a create_dirty_exclusive mode. It wound up not being so exclusive, and then I got dumped.

    • @brianb2308 months ago

      F

    • @Mizu2023 months ago +8

      were you ram

    • @GumSkyloard months ago +19

      @@Mizu2023 no, but she was

    • @reas0 months ago +15

      invalidated

    • @Mizu2023 months ago +4

      @@GumSkyloard Oh right. Saw "dump" and mind went "ramdump"

  • @xdanic3 months ago +34

    10:08 Fun fact: Much of the equipment the Apollo missions used was analog, so not all data had to go through a CPU

    • @pafnutiytheartist months ago +19

      Also, execution speed was the last priority. It was (and still is for all space missions) all about reliability for obvious reasons.

    • @cube2fox months ago +3

      The Apollo guidance computer was probably quite memory limited in terms of size

    • @T3sl4 months ago +6

      And what wasn't, was digitized in the simplest of ways: pulse or frequency counting was used as an ADC (getting, I think, 10 to 18 bit operands usually?). They didn't have integrated peripherals for this, not even dedicated ICs, like we do today. (For fast conversion applications, there were digital conversion CRTs: an electron beam sweeps across a punch-coded plate, producing a serial bit sequence corresponding to beam deflection in the other axis. Not sure who was using these; Bell telephone maybe? Military?) Calculations didn't need to run too often -- a few times a second to update spatial navigation and maneuvering, basically solving differential equations by incremental difference; and managing what digital systems (i.e. on/off switches, relays, lights, display and keypad (DSKY), etc.) were set to automatic (including the autopilot controlling thrusters). It was slow (clock rate low 100s kHz?), but had reasonable bus width (18b?) and a couple of otherwise quite powerful numerical instructions (mul/div/etc.?). Things you might not expect given the low capability generally, but customized perfectly for the workload.
      Computer design back then was very different: instead of starting with a standard system, there was simply no such thing, as having a CPU at all was already such a massive hurdle; you have a strong incentive to strip out everything unnecessary, and customize the architecture (not just bus sizes, but parallel/serial, instruction timings, pipelining even, etc.) to suit your purpose. There were no standard instruction sets to pick from (for general applications; arguably IBM's System/360 was the first, perhaps only, standardized instruction set -- but only for mainframe data applications, and this might give you some idea of the scale required to obtain value from standardization, and what the scale of computing generally was like back then!). What we think of today as a CPU, reading instructions and processing data, was a more nebulous concept back then. So, between these things being built from gates, or individual transistors, the tremendous design and hand-assembly effort to put those together, let alone writing ROM (e.g. "rope") and assembling RAM (hand-threaded core!), and the rarefied applications that demanded such lavish expense -- they were very bespoke and specialized systems indeed!
      Pipelining is interesting to mention here... System/360 was the first to have it, ca. 1967, according to one article? More important going into the 70s, and again only for the biggest machines that would benefit from it. It seems like a new thing, but it's relatively new _in the consumer space_ to have needed pipelining, or caching or what have you. What used to be supercomputer tech in the 70s, filtered down to single chip consumer hardware in the 90s, and so on. This pattern hasn't changed much: what passed for a supercomputer in the 2000s (multi-CPU, SMP or asym.; vector instructions; etc.) has filtered down, in a sense, to your smartphone today. We've since settled on the best of both worlds: SMP CPU with moderate vectorization, augmented with large-vector parallel processing ([GP]GPU). We carry in our pockets, for the measly cost of a couple watts power dissipation, the power of myriad Cray Supercomputers.
      Interestingly, grid or flow computing has long been known, but not gained any traction aside from limited use cases where the flow of data is optimal for the calculation (differential field solvers?). Anyway, modern CPUs and GPUs are so extraordinarily powerful that such applications can still run on them with very reasonable execution time, even if not well suited to the flow and dependency of data (i.e. RAM/cache limited). I wonder if that's changing with the availability of tensor cores today (neural net stuff; ugh, "AI").
      (Standard disclaimer: any keywords and inaccuracies are largely from memory, and should be taken as incentive to go and research these things yourself. There are many excellent and accessible articles, going into any level of detail, on the above subjects; highly encouraged!)
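As a rough illustration of "solving differential equations by incremental difference" from the comment above, here is a fixed-step (explicit Euler) update in C. It is a generic textbook sketch, not a reconstruction of any Apollo software; the `State` type, the function names, and the constant-acceleration setup are all assumptions made for the example.

```c
#include <assert.h>

/* Position and velocity along one axis. */
typedef struct {
    double x;
    double v;
} State;

/* Advance the state by one small time step dt under acceleration a.
   Each call adds a small increment instead of solving in closed form. */
State euler_step(State s, double a, double dt)
{
    assert(dt > 0.0);
    State next;
    next.x = s.x + s.v * dt;   /* position changes by velocity * dt */
    next.v = s.v + a * dt;     /* velocity changes by acceleration * dt */
    return next;
}

/* Run a fixed number of steps, as a periodic update loop would. */
State integrate(State s, double a, double dt, int steps)
{
    for (int i = 0; i < steps; i++)
        s = euler_step(s, a, dt);
    return s;
}
```

Starting at rest with `a = 1.0` and `dt = 0.5`, four steps give a velocity of 2.0 and a position of 1.5, while the exact position is 2.0. Running the update more often with a smaller `dt` shrinks that error, which is the trade-off between update rate and accuracy.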

    • @bluedistortions months ago

      The Apollo computer used for calculating trajectories was an old gear driven cash register

    • @daveloomis 29 days ago

      @@T3sl4 Bro, I would read your Substack.

  • @thejakfan313 months ago +22

    "This will actually somewhat work on some emulators, too"
    *Shows a smoking laptop, which is presumably overheating*
    I love it

  • @ErikBod months ago +55

    Sick 3d animations go vroom vroom.

  • @TheJesterElectronic months ago +39

    Computers have a few functionalities programmers typically would not consciously use, but for the sake of optimization, they sometimes should.

    • @ArneChristianRosenfeldt months ago +11

      Kids, don’t do this at home. You are not Kaze. Premature optimization is the root of all evil!

    • @skylerross8054 months ago +10

      *premature* optimization is.
      However, sometimes, you've traced your performance bottleneck to a specific area, using somewhat realistic very stressful workloads. Now you need to optimize something everyone says is impossible to optimize further, because you have no hope of learning how fast is fast enough (it'll always be too slow for something), and performance is a feature. That's when you reach for the esoteric stuff.
      I did that a couple months ago for something at work, it gave like... well, it's hard to quantify. It was noticeable on the test case, at least a 10% throughput improvement of this function (which originally was 33% of runtime), how much time it saves depends on a bunch of parameters, we have an O(n) algorithm with a large constant factor that I can't do anything about, and this function has a O(M^4) section (yes, that's a slow complexity, I haven't figured out how to make it M^3 or smaller)

    • @SimonBuchanNz
      @SimonBuchanNz 8 วันที่ผ่านมา

      ​@@skylerross8054I hardly ever see the full quote, which really undercuts the "you shouldn't optimize this" crowd:
      "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%."
      Note that Knuth is only talking about "small efficiencies", and that even there the remaining 3% (a remarkably specific number) is still worth pursuing.
      I prefer the much simpler saying "until you measure you're wrong about where your code is slow", or the much more general "you're wrong". Keep that in mind and you'll be much happier a programmer!

    • @skylerross8054
      @skylerross8054 8 วันที่ผ่านมา

      @@SimonBuchanNz I'm not sure why I'm getting atted with this lol.
      I mean, I do agree with this, your conclusion is a nicer way of saying what I already agree with.
      I'm far from an "anti optimization" person, I'm more a "spend effort where it's the most effective for your goals" person. Indeed, performance is a goal/feature. There are an unfortunate number of people who don't think of it like that, and that's how we get apps that take multiple seconds to open, and have to use animations all over the place to mask the fact that things are taking longer than is comfortable.
      Alas, performance is a single goal, and we have others that are at least as important (the most performant x is one that does nothing), and working on this goal before checking that the work is useful at all, much less useful to the goal, is... wasteful. Measure twice, cut once. I may not engineer physical... anything really, but the motto still applies, a lot of advice from other engineering fields does. I prefer the term software engineer over any other term to describe the profession for that reason.

  • @drgabi18
    @drgabi18 หลายเดือนก่อน +20

    5:50 The framerate of the game here makes me think it's more like a CPU simulator, it's gonna be 100% accurate but simulations are still heavy

  • @vinnyandlin8510
    @vinnyandlin8510 หลายเดือนก่อน +34

    So when are you transferring your consciousness to a cluster of n64s?

    • @stealth7225
      @stealth7225 หลายเดือนก่อน +18

      No need for a cluster, one N64 is plenty powerful enough, he just gotta unlock the hidden consciousness port with the right optimizations.

  • @pafnutiytheartist
    @pafnutiytheartist หลายเดือนก่อน +78

    You are very much in the territory where speed is no longer a priority. When writing code commercially, you have to balance readability, expandability and execution speed. Even if devs back then knew your arcane arts, I doubt they would use such tricks. If your game loop runs 10 microseconds faster but everything breaks whenever you update the code, it's not a good change.
    I am genuinely infinitely impressed with your dedication to this madness though.

    • @PlaguevonKarma
      @PlaguevonKarma หลายเดือนก่อน +20

      Kaze doesn't care about readability, he cares about optimisation lol

    • @Gofer925
      @Gofer925 หลายเดือนก่อน +9

      i recommend you watch his
      'optimizing with "bad code" ' video
      it has even more Very Fun optimization stuff

    • @Bthrecon
      @Bthrecon หลายเดือนก่อน +9

      When I did dev work, many times when more speed was needed the understandable code got commented out, a paragraph was added about what the optimisations were, and if you were lucky, another about WHY and what not to do lol.

    • @blarghblargh
      @blarghblargh 29 วันที่ผ่านมา

      Games get shipped. They don't always break even on revenue. Maintenance is a champagne problem. Good enough performance on low end systems is not, because it increases your revenue.

  • @goeiecool9999
    @goeiecool9999 หลายเดือนก่อน +13

    20:26 Poor Henry Kümpel suffering from mojibake. Unless they inserted ü intentionally as a joke...

  • @BenWillock
    @BenWillock หลายเดือนก่อน +14

    Finally, after years of us stupid people asking, Kaze has dumbed it down to our level.
    Bus go vroom hehe

    • @IceYetiWins
      @IceYetiWins หลายเดือนก่อน +2

      Bus go retire

  • @jacekm833
    @jacekm833 หลายเดือนก่อน +17

    AArch64 (a.k.a. ARM 64-bit) has a "dc zva" instruction that AFAIK does the exact same thing as Create_Dirty_Exclusive but sets the entire cache line to zeroes instead of unpredictable values. It is used in reference implementations of memset released by Arm. So this is definitely a known issue and many modern CPUs can work around it.
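The saving described above can be modelled with a toy cache. This is a minimal Python sketch under stated assumptions, not the behaviour of any real ARM or MIPS core: `TinyCache`, `claim_line_zeroed` and `fill_buffer` are hypothetical names, and the 16-byte line size is borrowed from the N64's data cache purely for illustration. It counts how many line fills reach RAM when a buffer is overwritten, with and without claiming each fully covered line first.

```python
LINE_SIZE = 16  # bytes per cache line (illustrative; matches the N64 data cache)


class TinyCache:
    """Toy write-back, write-allocate cache that counts RAM line reads."""

    def __init__(self, ram):
        self.ram = ram            # bytearray standing in for main memory
        self.lines = {}           # line base address -> bytearray of cached data
        self.ram_line_reads = 0   # number of line fills that went over the bus

    def _base(self, addr):
        return addr - (addr % LINE_SIZE)

    def store_byte(self, addr, value):
        base = self._base(addr)
        if base not in self.lines:
            # Ordinary write miss: the old line is fetched from RAM first.
            self.lines[base] = bytearray(self.ram[base:base + LINE_SIZE])
            self.ram_line_reads += 1
        self.lines[base][addr - base] = value

    def claim_line_zeroed(self, addr):
        # Models the "zero a whole line" idea: own the line without reading RAM.
        # (A Create_Dirty_Exclusive-style claim would leave the contents
        # unpredictable instead of zeroed.)
        self.lines[self._base(addr)] = bytearray(LINE_SIZE)

    def write_back_all(self):
        for base, data in self.lines.items():
            self.ram[base:base + LINE_SIZE] = data
        self.lines.clear()


def fill_buffer(cache, start, length, value, claim_first):
    """Fill [start, start+length) with value; return the number of RAM line reads."""
    end = start + length
    for addr in range(start, end):
        # Only claim lines that will be overwritten completely; claiming a
        # partially written line would destroy the bytes we do not touch.
        if claim_first and addr % LINE_SIZE == 0 and addr + LINE_SIZE <= end:
            cache.claim_line_zeroed(addr)
        cache.store_byte(addr, value)
    cache.write_back_all()
    return cache.ram_line_reads
```

Filling a 64-byte aligned buffer costs four line reads in the plain case and none when each line is claimed first; a trailing partial line still falls back to an ordinary fill, which is the safety condition a real memset has to respect as well.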

  • @TariqMKDS
    @TariqMKDS หลายเดือนก่อน +286

    bro knows better n64 than nintendo themselves🙏🙏😭

    • @Genzaijh
      @Genzaijh หลายเดือนก่อน +26

      Of course, Nintendo moved onto other technology. Amazing how deep you can dive into a hobby.

    • @pleasedontwatchthese9593
      @pleasedontwatchthese9593 หลายเดือนก่อน +56

      It's funny reading the official N64 development docs. They explain what a polygon is to developers because 3D was so new. Could you imagine working at an AAA studio and needing to explain what a 3D model is? But it makes sense, there was a start to everything.

    • @crunchdatbacon
      @crunchdatbacon หลายเดือนก่อน +2

      Ps1 is next 😬🥶

    • @ericlizama8552
      @ericlizama8552 หลายเดือนก่อน +5

      Iirc devs had to get permission from Nintendo to use custom microcode, so finding optimizations like this was probably stalled by a bunch of red tape.

    • @TariqMKDS
      @TariqMKDS หลายเดือนก่อน +1

      @@Genzaijh so real bro

  • @rmod8
    @rmod8 หลายเดือนก่อน +10

    Well done, you've made Schrödinger's Memory

    • @mathphysicsnerd
      @mathphysicsnerd 29 วันที่ผ่านมา

      I don't remember that part

    • @Unithrex
      @Unithrex 16 วันที่ผ่านมา +1

      @@mathphysicsnerd It's the quantum bit

  • @EnigmaticGentleman
    @EnigmaticGentleman หลายเดือนก่อน +21

    This is actually just a great lesson on computer hardware, like if Kaze's schedule wasn't full I'd say he should definitely do some teaching on the side.

    • @mathphysicsnerd
      @mathphysicsnerd 29 วันที่ผ่านมา +2

      This *_is_* his teaching on the side. Surprise!

  • @steven0719
    @steven0719 หลายเดือนก่อน +7

    when i think you have run out of n64 hardware vids you keep on dropping em. i don’t regret my sub one bit.
    great video

  • @BenKDesigns
    @BenKDesigns หลายเดือนก่อน +35

    While I'm a programmer, I'm not really a low-level programmer, and these videos are still fascinating as hell to watch. Love your content, can't wait to play your game!

  • @thenimbo2
    @thenimbo2 หลายเดือนก่อน +11

    CACHE RULES EVERYTHING AROUND ME C.R.E.A.M. GET THE MEMORY

  • @BSEUNHIR
    @BSEUNHIR หลายเดือนก่อน +12

    We got 3D Rambus (retired) before GTA 6

  • @ErPiova
    @ErPiova หลายเดือนก่อน +15

    TL;DR: friendship ended with rambus, now cache is all kaze needs (this is exaggerated, but you get the point)

  • @metalj
    @metalj หลายเดือนก่อน +16

    Yep it's a memory throughput issue in the sense that at the moment of this video going up all the best gaming CPUs achieve their top spot on their respective benchmarks exclusively by having an unholy amount of 3D V-Cache. In that sense it's kind of funny that the N64 was almost prophetic in its first party developers 'not understanding the hardware'. Except nowadays it's not limited to videogames and can close down airports and cost several billions of $ in a single day.

    • @FluffyFoxUwU
      @FluffyFoxUwU หลายเดือนก่อน

      ooooo crowdstrike incident reference

  • @njmccarthy
    @njmccarthy หลายเดือนก่อน +5

    Aside from loving all your videos and being extremely impressed at the level of detail you go into developing on the N64, in this video, I really loved the Ridge Racer Type 4 track (Naked Glow) at 10:14! Well done!

  • @DeltaNovum
    @DeltaNovum หลายเดือนก่อน +6

    This is the best ELI5 and visual representation of how this all works. Great education. Bravo, chapeau and thank you!

  • @Tinkerer_Red
    @Tinkerer_Red หลายเดือนก่อน +9

    Would really like to see a playlist of all of your optimizations over time in release/watch order. Would love an easy way to see the progress over the years as you've optimized so much.

  • @MrAddemaster
    @MrAddemaster หลายเดือนก่อน +8

    Bro knows the N64 better than his own room

  • @hypersonic12
    @hypersonic12 หลายเดือนก่อน +14

    I await your Diddy Kong Racing video.

  • @johnnywernd2593
    @johnnywernd2593 หลายเดือนก่อน +30

    I've never seen anyone as enthusiastic about the N64 hardware as you and it's amazing to see what you've accomplished so far. However, I keep wondering, if you know so much about the hardware, why haven't you considered writing an N64 emulator yourself? I ask because I'm pretty sure yours could be one of the most accurate since you've accumulated so much knowledge about it over the years. Keep up the good work, btw!

    • @KazeN64
      @KazeN64  หลายเดือนก่อน +41

      There are people with more knowledge than me contributing to emulators. (Also, even if I was the one with the most knowledge, I would not enjoy spending my time writing an emulator, I'd rather make my games)
      I think the bottleneck for emulators is often not that perfect accuracy is hard to achieve but rather that it is difficult to be perfectly accurate and performant enough to run games.

    • @mbrofoc
      @mbrofoc หลายเดือนก่อน +2

      ​@@KazeN64thank you for your work. It inspires programmers to further optimize their games

    • @johnnywernd2593
      @johnnywernd2593 หลายเดือนก่อน +5

      ​@@KazeN64 I totally understand that, and you're right. My point was more about the fact that you are so enthusiastic about the hardware and an emulator from you would be like an added bonus. I understand what you mean about not enjoying programming an emulator, since I'm a programmer too, but I don't enjoy working on emulation.

  • @dbarrie
    @dbarrie หลายเดือนก่อน +7

    Cache manipulation is still very much necessary in the modern (console) development space. Most vendor APIs handle much of it automatically, but if you’re trying to squeeze out absolutely every drop of performance you still need to worry about it. Generally it's just between the CPU and GPU at this point, but back in the PS3 era dealing with the SPUs was a very fun time. Other low-level/embedded development also frequently hits you right in the cache, and it’s almost guaranteed that when things go wrong, the cache is to blame!

  • @hyakin7818
    @hyakin7818 หลายเดือนก่อน +10

    man i needed that 2 months ago for my memory management and scheduling class project

  • @InkLore-p3h
    @InkLore-p3h หลายเดือนก่อน +3

    When this man speaks the entire modern gaming industry weeps-he saves microseconds where others can’t save seconds.

  • @Fyshtako
    @Fyshtako หลายเดือนก่อน +3

    Aww the animation you did to explain things was adorable. Great job, high effort videos!

  • @spudd86
    @spudd86 หลายเดือนก่อน +6

    The equivalent to create dirty exclusive is to write to a write combining mapping.
    On x86 you can also use streaming writes from SSE2 to do the same thing. It waits for a full cache line of writes, then flushes. It also does the right thing if you don't fill the cache line. You can also do a prefetch with the right hints to say that you're going to be writing to it. Other architectures likely have similar streaming writes.
    There are a lot of related optimisations. Write combining is mostly for memory-mapped devices and things like CPU access to GPU memory. Here write combining or uncached would be set by using the Memory Type Range Registers, or in the page table.
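The write-combining behaviour described above can be sketched as a tiny model. This is a simplification under stated assumptions and not how any particular x86 core implements it: `WriteCombiningBuffer` is a hypothetical name, a single buffer is modelled, and the 64-byte line size is a typical value rather than something taken from the comment. Stores pile up in the buffer; a completely filled line goes out as one burst, and a partially filled line goes out with only the bytes that were written, so old memory never has to be read.

```python
LINE_SIZE = 64  # assumed line size for this sketch


class WriteCombiningBuffer:
    """Toy single-entry write-combining buffer."""

    def __init__(self):
        self.base = None      # base address of the line currently being combined
        self.pending = {}     # offset within the line -> byte value
        self.bus_writes = []  # (line base, number of bytes) sent to memory

    def store(self, addr, value):
        base = addr - (addr % LINE_SIZE)
        if self.base is not None and base != self.base:
            # Moving on to another line: push out whatever was collected.
            self.flush()
        self.base = base
        self.pending[addr - base] = value
        if len(self.pending) == LINE_SIZE:
            # Whole line collected: emit it as a single burst.
            self.flush()

    def flush(self):
        if self.pending:
            self.bus_writes.append((self.base, len(self.pending)))
        self.base = None
        self.pending = {}
```

Writing 64 bytes at addresses 0 to 63 followed by 10 bytes at 64 to 73 and a final `flush()` leaves `bus_writes` as `[(0, 64), (64, 10)]`: one full-line burst plus one partial write, with no reads at all.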

  • @brunosuperman
    @brunosuperman หลายเดือนก่อน +19

    It's amazing how good the graphics quality you've achieved on a Nintendo 64 is! It's so beautiful! Imagine this game running in 1996

  • @Deez-Master
    @Deez-Master 29 วันที่ผ่านมา +3

    Thanks for consistently great videos

  • @FireBroRM
    @FireBroRM 23 วันที่ผ่านมา

    You are a genius brother, thank you for all the effort!

  • @guyg.8529
    @guyg.8529 หลายเดือนก่อน +8

    On modern hardware, you just have the non-temporal instructions, which bypass the cache, but nothing in the instruction set like Dirty-exclusive. But at the microarchitectural level, the OOO circuits may be able to eliminate the useless loads if you write immediately over the loaded data. It depends on the load/store queue implementation, and most of the time the OOO memory system tends to do loads before writes, because the ALUs are hungry for data and writes can be postponed or fused in a write buffer. The interaction of this optimisation with prefetching must also be taken into account.
    Using the cache as RAM reminds me of what's used in modern GPUs, notably NVIDIA ones. The L1 cache can be configured to act as an addressable scratchpad memory (yes, the shared memory in CUDA is just the L1 cache reconfigured). It's not surprising, since direct-mapped and associative caches contain one or more RAM arrays.

  • @Hollow_Struggler
    @Hollow_Struggler หลายเดือนก่อน +3

    Props to you for explaining such a topic in such an understandable manner, it's a true display of intelligence

  • @ThePurpleCheeseMan
    @ThePurpleCheeseMan หลายเดือนก่อน +3

    I'm so hyped to try this game of yours. It's really impressive just how far you've managed to take N64's capabilities.

  • @Xaymar
    @Xaymar หลายเดือนก่อน +4

    Modern software video encoders still perform cache optimizations, and some video game engines also do this. It's gotten less frequently done due to the hardware just no longer really requiring it, but it still has performance gains even today. It's why the AMD X3D CPUs are so much faster than the ones without, they're no longer slamming into the RAM latency as often.

  • @lfestevao
    @lfestevao หลายเดือนก่อน +4

    Now the bus fits so many more framerules!
    (or something like that)

  • @cauhxmilloy7670
    @cauhxmilloy7670 หลายเดือนก่อน +1

    Many of these low level explicit cache management instructions are pretty useful for today's modern HPC applications. Specifically, these are great in lockless multithreaded contexts (alongside volatile reads/writes and memory barriers). Really cool video showcasing some sick usecases!

  • @brianb2308
    @brianb2308 หลายเดือนก่อน +18

    You can make 2 builds; one with hardware and all other optimizations, the other with only the optimizations that work on emulator. Not ideal having different systems work differently, but as emulators get better maybe your super optimized build would eventually work. Unfortunately I doubt emulators will get much better because they work with the whole N64 library already :/

    • @ArneChristianRosenfeldt
      @ArneChristianRosenfeldt หลายเดือนก่อน +3

      Or try out the capabilities on game load and patch in shims or NOPs if something fails

    • @eduardoanonimo3031
      @eduardoanonimo3031 หลายเดือนก่อน +9

      I already replied to this before:
      They are already done at the same time.
      The same code executes one way on real hardware, but if it detects that it is running in an emulator with accuracy limitations, it can switch to an emulator-friendly path.

    • @M1XART
      @M1XART หลายเดือนก่อน

      Yeah, Bear Waker had two builds as well, where other was console optimized.

  • @preludelight
    @preludelight 18 วันที่ผ่านมา

    The effort you went through to illustrate the cpu/ram/etc was top notch. Truly outdid yourself. 10/10, no notes.

  • @kyuthefox
    @kyuthefox หลายเดือนก่อน +4

    with the quantum physics cache where we can have the cache change and decide later if we want to commit to ram. we could do speculative execution or branch prediction in software. we can run code without knowing if we should be waiting for the gpu and reduce the idle time. maybe. i have no idea but this sounds like mad programming and i'm here for it.

  • @thetruegoldenknight
    @thetruegoldenknight หลายเดือนก่อน +1

    The N64 styled visuals for the analogy are just ADORABLE! :D

  • @LS95774
    @LS95774 หลายเดือนก่อน +7

    5:03 whoops, cache momentarily corrupted

  • @toxicNautilus
    @toxicNautilus หลายเดือนก่อน +2

    Me nodding along as if I know what Kaze is talking about when he describes technology more complicated than a rocket for a moon landing.

  • @RicoElectrico
    @RicoElectrico หลายเดือนก่อน +5

    R4300i cache is direct mapped, which you explained in a roundabout way. This means accessing instructions n*16KB apart (up to cache line length) or data n*8 KB apart will evict one already in cache cause they collide. I wonder if it's possible to instrument such events. This could enable some madman optimizations in tight loops.
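The collision rule above follows directly from how a direct-mapped cache picks a slot. Here is a minimal Python sketch, assuming the commonly documented VR4300 geometry (16 KB instruction cache with 32-byte lines, 8 KB data cache with 16-byte lines); the function names and the example addresses are made up for illustration.

```python
ICACHE_SIZE, ICACHE_LINE = 16 * 1024, 32  # instruction cache: size, line length
DCACHE_SIZE, DCACHE_LINE = 8 * 1024, 16   # data cache: size, line length


def cache_line_index(addr, cache_size, line_size):
    """Slot a direct-mapped cache uses for addr: low address bits pick the line."""
    return (addr % cache_size) // line_size


def collide(addr_a, addr_b, cache_size, line_size):
    """True if the two addresses are in different memory lines that share a slot."""
    same_slot = (cache_line_index(addr_a, cache_size, line_size)
                 == cache_line_index(addr_b, cache_size, line_size))
    different_line = addr_a // line_size != addr_b // line_size
    return same_slot and different_line


buffer_a = 0x80100000          # hypothetical address of some data
buffer_b = buffer_a + 8 * 1024  # exactly one data-cache size further on

# Touching buffer_b evicts buffer_a's line from the data cache...
conflict_8k = collide(buffer_a, buffer_b, DCACHE_SIZE, DCACHE_LINE)
# ...while the neighbouring line lands in a different slot.
conflict_next_line = collide(buffer_a, buffer_a + DCACHE_LINE, DCACHE_SIZE, DCACHE_LINE)
```

A build-time check along these lines could flag hot loops whose code or data alias each other, although actually observing evictions on hardware is a separate problem that this sketch does not address.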

  • @EskoLuontola
    @EskoLuontola หลายเดือนก่อน +2

    16:16-16:33 I remember reading that Azul Vega had an instruction for zeroing memory without reading the previous value from memory. It was added to make memory allocation faster, because Java initializes all fields with zero when allocating new objects. It improved performance greatly - there was always plenty of memory bandwidth available. They had asked for Intel to add a similar instruction, but at least back then x86 didn't have anything similar. I don't know how the situation is in recent years.

    • @TheLoneWolfling
      @TheLoneWolfling 28 วันที่ผ่านมา +1

      ARM has DC ZVA, which is essentially 'zero a cache line' (slight oversimplification). There are cases where DC ZVA before writing the cacheline does improve performance.
      That being said, many ARM processors also automatically pause linefills for full-cacheline writes if they detect said linefills are unnecessary.

  • @pkillboredom
    @pkillboredom หลายเดือนก่อน +3

    The "computer science lore" joke at the beginning was peak.

  • @jasonmaskell2894
    @jasonmaskell2894 28 วันที่ผ่านมา +1

    😂 I absolutely loved the RAM bus analogy story, and the acronyms. Very entertaining and educational

  • @ukyoize
    @ukyoize หลายเดือนก่อน +5

    I can't believe that x86, despite having strcpy as an instruction, doesn't have any cache instructions.

    • @guyg.8529
      @guyg.8529 หลายเดือนก่อน +4

      It does have some cache instructions, for prefetch and invalidation of a cache line (maybe some more), and also non-temporal loads/writes. Some of them are part of the SSE instruction set extension.

    • @gdclemo
      @gdclemo หลายเดือนก่อน

      @guyg.8529 given things like Rowhammer and Spectre that come from cache manipulation, maybe not giving even more cache control to userspace is a good thing.

  • @hene193
    @hene193 26 วันที่ผ่านมา

    Dude you explained really complex thing in really easy to understand way. Great job

  • @Thewinner312
    @Thewinner312 หลายเดือนก่อน +4

    Honestly, it gets a bit confusing when you try to make a city metaphor out of everything.

  • @Vinicius_Berger
    @Vinicius_Berger 20 วันที่ผ่านมา +1

    Next video: "the N64 was actually capable of finding the cure for cancer, but no game ever used that feature"

  • @Elesario
    @Elesario หลายเดือนก่อน +3

    That last one where you can decide whether the cache you wrote should be written back to RAM or not made me think of transactions in a database. I'm sure there's going to be code situations where you generate some data and then keep it or discard it based upon whether it passes some test, although seems niche.
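The database analogy above can be made concrete with a small model. This is a hedged Python sketch, not N64 code: `SpeculativeLine`, `commit` and `rollback` are hypothetical names standing in for "write the dirty line back" and "invalidate the line without writing it back", and the model assumes the line is never evicted in between (on real hardware an eviction would write it back early).

```python
class SpeculativeLine:
    """Toy model of one dirty cache line that can be kept or thrown away."""

    def __init__(self, ram, base, size=16):
        self.ram = ram      # bytearray standing in for main memory
        self.base = base    # address of the first byte of the line
        self.size = size    # line length in bytes
        self.data = None    # cached copy; None means the line is not in cache

    def write(self, offset, payload):
        if offset < 0 or offset + len(payload) > self.size:
            raise ValueError("write must stay inside the cache line")
        if self.data is None:
            # First touch: bring the line into the cache.
            self.data = bytearray(self.ram[self.base:self.base + self.size])
        # Only the cached copy changes; RAM is untouched for now.
        self.data[offset:offset + len(payload)] = payload

    def commit(self):
        # Equivalent of a write-back: RAM finally sees the new bytes.
        if self.data is not None:
            self.ram[self.base:self.base + self.size] = self.data
        self.data = None

    def rollback(self):
        # Equivalent of an invalidate: the new bytes vanish, RAM is unchanged.
        self.data = None
```

A typical use would be to generate a result into the line, run the validity test, then call `commit()` or `rollback()`; either way RAM is written at most once.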

  • @BolverBlitz
    @BolverBlitz หลายเดือนก่อน

    Donation to the RAM bus driver now that he is unemployed

  • @TylerMcVicker1
    @TylerMcVicker1 หลายเดือนก่อน +5

    hell yes let's go

    • @kirabey8946
      @kirabey8946 หลายเดือนก่อน

      Yooo Tyler hi!

  • @4thechivostreamsarchive586
    @4thechivostreamsarchive586 หลายเดือนก่อน

    Bro, I absolutely LOVE the way you animated this in the N64 engines in style!!!!

  • @MeriaDuck
    @MeriaDuck หลายเดือนก่อน +5

    "I'm micromanaging more than Jeff Bezos his employee's p breaks" 😂

  • @Samwow
    @Samwow หลายเดือนก่อน +1

    Kudos on the visual presentation, was very fun to watch!

  • @MrRaiPlays
    @MrRaiPlays หลายเดือนก่อน +3

    I saw your comment on the Mario in the Multiverse hack (which I love after I got it to stop crashing on my PC) and am wondering if you will make a video discussing on how you think it's unpolished. Your attention to detail is superb and I think your input (as well as your work here) benefits the Mario 64 mod community tremendously...

    • @KazeN64
      @KazeN64  หลายเดือนก่อน +8

      i would not want to drop a rovert roast video

    • @MrRaiPlays
      @MrRaiPlays หลายเดือนก่อน

      @@KazeN64 Is there beef between you and Rovert?

    • @KazeN64
      @KazeN64  หลายเดือนก่อน +4

      @@MrRaiPlays no

    • @MrRaiPlays
      @MrRaiPlays หลายเดือนก่อน

      @@KazeN64 oh good, thought I was missing something and struck a nerve... my apologies :)

  • @TheRealKeymaster
    @TheRealKeymaster 29 วันที่ผ่านมา +1

    Wow such an overall great explanation for a CPU and how the internal CPU cache works. This could teach kids in school a lot, it's great!

  • @hyakin7818
    @hyakin7818 หลายเดือนก่อน +11

    bro is porting gta 3 to n64 soon i swear

  • @Hublium
    @Hublium 29 วันที่ผ่านมา +2

    IMAGINE A BUS
    truly one of the memes of all time

    • @KazeN64
      @KazeN64  29 วันที่ผ่านมา +2

      dont imagine it
      see it

  • @tulip1634
    @tulip1634 หลายเดือนก่อน +8

    Melody Nosurname song

    • @vitasomething
      @vitasomething หลายเดือนก่อน +2

      omgor true :3

  • @salvatronprime9882
    @salvatronprime9882 หลายเดือนก่อน

    This is the most educational "practical programming" channel on youtube.

  • @timmygilbert4102
    @timmygilbert4102 หลายเดือนก่อน +4

    From cache to cash:
    so we have control to cache memory without cost with some constraints ? That is, complex operations can stay in cache for as long as we need before rambus meddling? Can we then use compression as a way to sink the extra cpu idling and virtually increase bandwidth and cache memory?

    • @ArneChristianRosenfeldt
      @ArneChristianRosenfeldt หลายเดือนก่อน

      I always did wonder if the barrel shifter in the Arm CPU on the 3do was meant for efficient data bit packing for LZW and Huffman . That CPU also has cache. MIPS ISA is different.

  • @Bemental77
    @Bemental77 29 วันที่ผ่านมา

    Incredible work. I've never commented on your channel, but your ability to translate here is phenomenal. You should teach.

  • @GaudyGabriev02
    @GaudyGabriev02 หลายเดือนก่อน +33

    Kaze, at this point maybe you could fix Donkey Kong 64 so it works without the Expansion Pak

    • @michawhite7613
      @michawhite7613 หลายเดือนก่อน +13

      Ironically, I'm pretty sure this mod requires the expansion pack

    • @ssg-eggunner
      @ssg-eggunner หลายเดือนก่อน

      ​@@michawhite7613 rtyi64 doesnt normally require expansion pack, but using it does help with making performance extra stable

    • @3dmarth
      @3dmarth หลายเดือนก่อน +5

      I wouldn't be at all surprised, if he wanted to spend the time.
      If it's true that the Pak's main function in DK64 is to store cached lighting data, then Kaze could probably just optimize to the point where the N64 can render the lighting in real-time and avoid caching anything.

    • @ericlizama8552
      @ericlizama8552 หลายเดือนก่อน

      Iirc, the lighting data only needed to be calculated once, then stored for reference later.

    • @micahneitz
      @micahneitz 27 วันที่ผ่านมา

      @@ericlizama8552 storing it is slower than calculating i'd imagine

  • @theultimatetrashman887
    @theultimatetrashman887 หลายเดือนก่อน +2

    There is a lot of Cache in old Source games. Once you load it in and store it, your game always becomes way faster than it was, and it's only a one-time thing! (at least in there)

  • @Illeea
    @Illeea หลายเดือนก่อน +6

    17:12 what do you mean "Yet".

    • @arran4285
      @arran4285 หลายเดือนก่อน

      Every copy of this mod will be personalized

  • @mathiastoala7777
    @mathiastoala7777 หลายเดือนก่อน +2

    Peak Emanuar once again giving me the exact size video I needed to enjoy my meal 🗣️🗣️🔥

  • @Vectorspace000
    @Vectorspace000 หลายเดือนก่อน +4

    Have any of the N64 hardware devs commented on your videos?

  • @bigchungus7870
    @bigchungus7870 หลายเดือนก่อน

    I just came here to say how much I love the mario renders you do for the thumbnails

  • @Neckhawker
    @Neckhawker หลายเดือนก่อน +4

    I had a hard time understanding, but it seems it's simply using the CPU cache as if it was a CPU register?

    • @ArneChristianRosenfeldt
      @ArneChristianRosenfeldt หลายเดือนก่อน +1

      Yeah, some of the weirder modes make you manually manage everything as if you were using registers, but you can still use indexed addressing modes for arrays.

  • @varietychan
    @varietychan หลายเดือนก่อน +2

    1:52, imagine a City... with a framerule bus, and a rambus.... vroom vroom

  • @rue04
    @rue04 หลายเดือนก่อน +4

    the url of this video ending on "SLow" tells me youtube doesnt *quite* understand your channel lul

  • @avanillagarden
    @avanillagarden หลายเดือนก่อน

    I feel like your scripts are getting better and more entertaining, I enjoy