Using ChatGPT to Optimize Mario 64

  • Published on Feb 10, 2023
  • To try everything Brilliant has to offer, free for a full 30 days, visit brilliant.org/KazeEmanuar/.
    The first 200 of you will get 20% off Brilliant’s annual premium subscription.
    Patreon: / kazestuff
    🎥 / kazesm64
    🐦 / kazeemanuar
    MERCH: kazemerch.myspreadshop.com/all
    This video was sponsored by Brilliant!
  • Gaming

Comments • 637

  • @KazeN64
    @KazeN64  1 year ago +454

    Because this keeps being brought up:
    Dividing a signed integer by 2 is not the same as leftshifting it by 1.
    The C spec asks that any division rounds towards zero, so when you have a negative integer and you rightshift it, you'd be rounding away from zero. That's why the compiler will generate extra instructions on a /2 compared to a >>1.

    • @sedme0
      @sedme0 1 year ago +8

      Are you avoiding ASM entirely?

    • @KazeN64
      @KazeN64  1 year ago +82

      no, there are a few inlined assembly functions in this. the only problem with writing a lot of custom assembly is that GCC has a bug with inline assembly that causes it to omit some necessary NOP instructions.

    • @CielMC
      @CielMC 1 year ago +7

      You mean rightshift by 1? Or is the msb not on the left here?

    • @zero_318
      @zero_318 1 year ago +1

      Unless the NOPs need to be generated/positioned dynamically, would it be reasonable to use byte directives as a substitute?

    • @TheGag96
      @TheGag96 1 year ago +3

      So in this case, you figured rounding away from zero for negative numbers was inconsequential?
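
The rounding difference described at the top of this thread can be sketched in C. This is only an illustration, not code from the video: the function names are made up here, and it assumes a compiler such as GCC where `>>` on a negative signed integer is an arithmetic shift (the C standard leaves that implementation-defined).

```c
#include <stdint.h>

/* Division truncates toward zero, as the C standard requires: -3 / 2 == -1. */
int32_t div_by_2(int32_t x) {
    return x / 2;
}

/* Arithmetic right shift rounds toward negative infinity: -3 >> 1 == -2.
 * Assumption: the compiler implements >> on negative values as an
 * arithmetic shift (GCC does; the standard leaves it implementation-defined). */
int32_t shift_right_1(int32_t x) {
    return x >> 1;
}

/* Roughly what a compiler has to do for x / 2: add the sign bit
 * (1 for negative inputs, 0 otherwise) before shifting, so that
 * negative odd values end up rounded toward zero. */
int32_t div_by_2_fixup(int32_t x) {
    int32_t sign_bit = (int32_t)((uint32_t)x >> 31);
    return (x + sign_bit) >> 1;
}
```

For non-negative inputs all three functions agree; they only differ for negative odd values, which is where the extra instructions come from.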

  • @usernamesareweird4880
    @usernamesareweird4880 1 year ago +368

    At this point gaining performance is like gaining muscle mass for Kaze

  • @davidmalkowski7850
    @davidmalkowski7850 1 year ago +1127

    This guy is the literal gigachad working on this game. What incredible gains!

    • @TorutheRedFox
      @TorutheRedFox 1 year ago +136

      mate's got both the brains and the gains

    • @thephilosophersstoned3796
      @thephilosophersstoned3796 1 year ago

      @@TorutheRedFox They're actually deeply inter-related, the better your Body runs the better you can Think, Mental Health is a silly dichotomy that makes you think that your brain is separate from your body, the Truth couldn't be any further if it tried. Your Neurology literally dictates your personality down to the physical elements of how you choose to express yourself.
      TL;DR His Gains improve his Brains and his Brains improve his Gainz

    • @KazeN64
      @KazeN64  1 year ago +646

      yeah i literally started working out because i knew it'd improve reaction times in video games. later i noticed that it made me think clearer and feel better too.

    • @dikkie2913
      @dikkie2913 1 year ago +39

      @@KazeN64 you're the man, keep it up

    • @TorutheRedFox
      @TorutheRedFox 1 year ago +7

      @@thephilosophersstoned3796 it was a joke on the dumb gymbro stereotype

  • @AgsmaJustAgsma
    @AgsmaJustAgsma 1 year ago +172

    4:59 Kaze getting angry at ChatGPT is low-key hilarious.

    • @SirCaco
      @SirCaco 1 year ago +34

      "YOUNG MAN YELLS AT CLOUD"

    • @imselfaware419
      @imselfaware419 1 year ago +4

      How is that him getting angry

    • @alfonshedstrom9859
      @alfonshedstrom9859 1 year ago +11

      ChatGPT sometimes does act like an overly confident junior programmer

  • @ttaute
    @ttaute 1 year ago +566

    OptimizeGPT: The perfect bot for saving 2 microseconds out of dividing matrices by 2

    • @NoNameAtAll2
      @NoNameAtAll2 1 year ago +60

      you laugh, but you can divide 8 8-bit variables by 2 by bit-shifting 64-bit and masking out leaked bits

    • @Henrix1998
      @Henrix1998 1 year ago +8

      @@NoNameAtAll2 until it contains signed numbers

    • @DanielVCOliveira
      @DanielVCOliveira 1 year ago +70

      @@NoNameAtAll2 i like your funny words magic man

    • @NoNameAtAll2
      @NoNameAtAll2 1 year ago +8

      @@Henrix1998 other than -1 not becoming 0, you only need to copy out sign bits beforehand (bitflip of same mask) and restore afterwards (bit-or)
      4 operations instead of 2 for unsigned

    • @NoNameAtAll2
      @NoNameAtAll2 1 year ago

      @@DanielVCOliveira let's say two 4-bit numbers
      [aaaa] and [bbbb]
      you combine them into 8-bit [aaaa_bbbb] (by reading memory location as 8-bit)
      then bitshift (x>>1): [0aaa_abbb]
      then mask out that leaked a (x &= 0111_0111): [0aaa_0bbb]
      [0aaa] is [aaaa] divided by 2
      same with [0bbb]
      works for any amount of numbers in a row, but only with dividing by powers of 2
      ---
      signed version inputs [Aaaa_Bbbb]
      stores [A000_B000]
      does the unsigned operation [0Aaa_0Bbb]
      then restores: [AAaa_BBbb] which is correct in two's complement that all computers use
      again, works for any amount of numbers
      but dividing by higher powers of 2 needs several restorations
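
The packed-halving trick walked through above can be sketched for eight 8-bit lanes in one 64-bit word. This is an illustration under assumptions, not code from the video: the function names are hypothetical, and the signed variant rounds toward negative infinity, so a lane holding -1 stays -1 instead of becoming 0, as noted earlier in the thread.

```c
#include <stdint.h>

/* Halve eight unsigned 8-bit lanes at once: shift the whole word right
 * by one, then clear the top bit of every lane, which is where the low
 * bit of the neighbouring lane leaked in. */
uint64_t halve_u8x8(uint64_t x) {
    return (x >> 1) & 0x7F7F7F7F7F7F7F7FULL;
}

/* Signed variant: remember each lane's sign bit, do the unsigned
 * operation, then OR the sign bits back in. In two's complement this
 * gives floor(lane / 2) for every lane. */
uint64_t halve_s8x8(uint64_t x) {
    uint64_t signs = x & 0x8080808080808080ULL;
    return ((x >> 1) & 0x7F7F7F7F7F7F7F7FULL) | signs;
}
```

For example, the lanes 0xFE (-2) and 0x06 become 0xFF (-1) and 0x03 after one call to `halve_s8x8`.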

  • @Ghi102
    @Ghi102 1 year ago +331

    It would be really interesting if you gave it the SM64 unoptimized code from the original game and asked it to optimize it and see if it returns reasonable ideas or see if it returns ideas you've used to optimize the code in your mod. I'm really impressed at how well it understands the code.

    • @MyScorpion42
      @MyScorpion42 1 year ago +84

      if you have the unoptimized code and the optimized code then it opens up the possibility of training a NN specifically for code optimization

    • @98LuckyLuk
      @98LuckyLuk 1 year ago +67

      I think the problem with that is that it has no real knowledge of the underlying hardware.

    • @queazocotal
      @queazocotal 1 year ago +23

      @@98LuckyLuk This is the point in a fiction I recall where someone decided uploading a book on CPU architecture to the bot would be helpful, and things got a whole lot more asymptotic.

    • @BradenBest
      @BradenBest 1 year ago +23

      @@MyScorpion42 Code optimization is mostly just pattern matching though. GCC does a better job optimizing code than a human does because it knows all the little tricks for speeding things up on an x86 machine. All those tiny optimizations add up, which frees the programmer to focus on bigger picture things like using efficient algorithms and data structures instead of wasting time on optimizations that may damage the portability of the code. If you train an AI to optimize, the first hurdle it has to get past is being better than a human at optimizing, but I see basically no way that an AI can become better than a compiler. It's like software rendering (AI, generic) vs dedicated graphics (compiler, purpose-built).

    • @Ehal256
      @Ehal256 1 year ago +6

      @@BradenBest gcc doesn't really do better than a trained human when it comes to individual portions of a program, the benefit is that it does a better job keeping track of a large program.
      It's not very hard to beat even the best compilers if you focus on a small part of a program at once.

  • @sheep6937
    @sheep6937 1 year ago +73

    Tldr I'm witnessing two supercomputers conversing with each other.

  • @grantsomething
    @grantsomething 1 year ago +165

    Imagine the power Kaze will have when he gets to the Gamecube

    • @hoo2042
      @hoo2042 1 year ago +1

      😂

    • @TheKevinGDX
      @TheKevinGDX 1 year ago +1

      XD

    • @RefriedBeing
      @RefriedBeing 1 year ago +31

      When Kaze despaghettifies SSBM

    • @broodingstone958
      @broodingstone958 1 year ago +40

      SM64 recreated on the GameCube by Kaze would look like a modern Switch title. Lol

    • @tl1882
      @tl1882 1 year ago

      @@broodingstone958 and sm64 on wii u would look like modern pc game with rtx

  • @PlGGS
    @PlGGS 1 year ago +71

    You should tell ChatGPT to abide by the memory specifications of the N64 and see if it changes its answers based on that

    • @tobiwonkanogy2975
      @tobiwonkanogy2975 1 year ago +6

      just based on this the limitations should be catalogued, recorded and fed into the GPT thread.

  • @MrAdam802
    @MrAdam802 1 year ago +247

    Looking forward to vanilla Mario 64 at 60fps on original N64 hardware!

    • @RottenMuLoT
      @RottenMuLoT 1 year ago +24

      THIS ^

    • @maxrichards5925
      @maxrichards5925 1 year ago +23

      He’ll make history lol

    • @ElsweyrDiego
      @ElsweyrDiego 1 year ago +13

      not only vanilla, but all mario 64 hacks

    • @The86Ripper
      @The86Ripper 1 year ago +9

      @@ElsweyrDiego Last Impact was imo the pinnacle of rom hacks. Sometimes I wonder... where do we go from here? Is anything ever going to exceed the quality and fun of this hack?

    • @ElsweyrDiego
      @ElsweyrDiego 1 year ago +8

      @@The86Ripper i think the hacking tools will expand to every possible game launched. if you want to mod Quest 64 into a whole new game with new mechanics, you will be able to. want to remake Superman 64 with the quality of a present-day triple A game? you can. we must wait for this to happen. one day.

  • @xXFoiXx
    @xXFoiXx 1 year ago +212

    I want someone to use AI to get us from Assembly to readable C code. Imagine the modding renaissance we would have.

    • @avasam06
      @avasam06 1 year ago +29

      So an AI assistant for Ghidra?

    • @xXFoiXx
      @xXFoiXx 1 year ago +28

      @@avasam06 Ghidra but more sophisticated I guess yeah

    • @ScarfKat
      @ScarfKat 1 year ago +8

      I think it can already do that. Not sure how complex of a function you can give it, but I tested it a bit ago and it actually worked pretty well. (Granted, it was with a very simple function lol)

    • @xXFoiXx
      @xXFoiXx 1 year ago +4

      @@ScarfKat It gives you a baseline most of the time. Which is fine but it is still a lot of work to get to the proper code.

    • @yami_the_witch
      @yami_the_witch 1 year ago +10

      decompiling will never be complete. there is just information loss when you compile: multiple different source codes can generate the same machine code, and you will never be able to recover comments. structs and enums also get horribly mangled because in machine code all of the overlying structure gets removed and it just turns into one long allocated block of data. ChatGPT pretty much only can do math and hard logic. It cannot handle abstraction one little bit

  • @avasam06
    @avasam06 1 year ago +104

    Something you should try to do, is refine your queries and give ChatGPT more information. You can tweak it to not recommend readability improvements and teach it what it cannot do on N64. Not saying it'll magically find optimizations after that, but the responses should be more relevant.
    Another tweak could be to remove code comments since it seems to trip it up. Or literally just tell it that lines starting with // are comments and should be ignored.

    • @rareosts5752
      @rareosts5752 1 year ago +9

      Exactly, Kaze criticizes it at some points where it's just lacking context which can be provided. So much can be done by tweaking your prompts and by preparing it with info, since it takes your past conversation into consideration.

    • @mylittleparody2277
      @mylittleparody2277 1 year ago +12

      That's what I wanted to post.
      ChatGPT is more effective when it has fewer assumptions to make.
      So specifying that the code will run on an N64 (or on a MIPS processor, with RAMBUS issues) would probably help with the answers.

  • @IShallRiseAgain
    @IShallRiseAgain 1 year ago +77

    ChatGPT is basically stack overflow including the inaccurate answers.

    • @LilacMonarch
      @LilacMonarch 1 year ago +25

      At least you can look through them yourself instead of having the wrong answer upvoted and the right answer downvoted and deleted

    • @atemoc
      @atemoc 1 year ago +4

      @@LilacMonarch This

    • @UncleUncleRj
      @UncleUncleRj 1 year ago +18

      ChatGPT usually doesn't talk down to you and delete the thread for "being a n00b".

    • @superresistant8041
      @superresistant8041 1 year ago +1

      @@UncleUncleRj this

    • @Mizu2023
      @Mizu2023 7 months ago

      @@UncleUncleRj LOL

  • @robertwyatt3912
    @robertwyatt3912 1 year ago +12

    It’s kind of incredible how it actually made you go like “oh! Good point.” Once.

  • @BudgetBin
    @BudgetBin 1 year ago +26

    This video was literally just for Kaze to flex on how big brained he is. I love it.

  • @Deliveredmean42
    @Deliveredmean42 1 year ago +114

    Yeah, these seem more like suggestions for recent unoptimized games than for already optimized game code. But of course it's still fundamentally flawed in many cases. It might get better at some point, but it does help with stuff you want a quick answer to when you're having trouble finding examples (but also be wary of it getting things wrong)

    • @KazeN64
      @KazeN64  1 year ago +105

      it is really good if your question is just knowledge based, but it's not that great if it's idea - based. i still think it does better than most human programmers in both scenarios though.

    • @Deliveredmean42
      @Deliveredmean42 1 year ago +12

      @@KazeN64 It's very helpful for sure. It did help with some code I've been trying to get around to learning. And as you stated, it is knowledge based. So if you want to know something that is beyond its scope (beyond 2021 for ChatGPT) then it won't help you that much, unfortunately. It's amazing when it does know something tho!

    • @alexc4924
      @alexc4924 1 year ago

      @@KazeN64 It's kind of neither. GPT (all versions of it) is designed as a bullshit generator. It generates bullshit. Sometimes the bullshit happens to be useful.

    • @avasam06
      @avasam06 1 year ago +19

      > it does better than most human programmers in both scenarios though
      My feeling exactly about AI tools like these: Better than most people. But not better than someone highly-specialised for a specific scenario (especially when it requires human-level understanding of context and thinking outside the box).

    • @Deliveredmean42
      @Deliveredmean42 1 year ago +4

      @@avasam06 Yeah, hard to be as good as the legendary Kaze!

  • @NovusDundus
    @NovusDundus 1 year ago +34

    Have you tried setting the ChatGPT "environment" to expect N64 and any specific language changes with its answers?
    I've found that it works best to set up the initial question with every possible piece of info it could need to better create usable answers.
    Otherwise it just looks at code and throws everything at it regardless of any limitations

  • @omnisel
    @omnisel 1 year ago +20

    I think the issue is that even if the code is as optimal as it can be, it's programmed to give you solutions anyway. It can't say "this is sufficient and optimal, so you're good" because then it's like, come on you're not going to give me something? So, at best, it will give you solutions that are alternatives, and solutions that are slower. At worst, it'll do what it did and just guess based off of other code.

  • @deadinsky
    @deadinsky 1 year ago +22

    1:55 I think it was referring to using trigonometric identities, such as sin²x + cos²x = 1, to reduce the amount of trigonometric computations. Since fast inverse square root is faster than traditional sin/cos. You’re using a lookup table anyways which makes the point moot.

    • @KazeN64
      @KazeN64  1 year ago +28

      you can't get back the sign from the sin/cos that way so you end up using a lot of instructions to reconstruct it from the angle range. i did try that before and it was an improvement over 2 separate sin and cos tables. but ever since i merged them and can access both the sin and the cos of an angle in 1 dcache access, it is faster to just get them from the LUT

    • @MichaelPohoreski
      @MichaelPohoreski 1 year ago +23

      That’s not what (1) is referring to. Instead of using TWO luts for sin() and cos() you use a *single large table* where the sin and cos table *overlap* due to the fact that cos(x) = sin(x+90)
      This was common in the demoscene (90’s) where fixed point was used.

    • @KazeN64
      @KazeN64  1 year ago +6

      if you had huge memory constraints, you should use a LUT that's just one quarter of a sinetable and do some sign flipping math on that to get cosine and sine.

    • @MichaelPohoreski
      @MichaelPohoreski 1 year ago +2

      @@KazeN64 Indeed! In the extreme case BASIC in the 80’s used a 6 term Taylor Polynomial evaluated via Horner’s rule to calculate sin().
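
The quarter-table idea mentioned in this thread (store one quarter of the sine wave and recover the rest with mirroring and sign flips) can be sketched as follows. Everything here is an assumption made for illustration: the 16-steps-per-turn angle unit, the 5-entry table and the function names are hypothetical and far coarser than anything a real game would use.

```c
#include <stdint.h>

/* Sine of 0, 22.5, 45, 67.5 and 90 degrees: one quarter of the wave,
 * including both end points (hypothetical, deliberately tiny table). */
static const float kQuarterSine[5] = {
    0.0f, 0.38268343f, 0.70710678f, 0.92387953f, 1.0f
};

/* angle is in 1/16ths of a full turn; values above 15 wrap around. */
float quarter_sin(uint32_t angle) {
    uint32_t quadrant = (angle >> 2) & 3; /* which quarter of the turn */
    uint32_t idx = angle & 3;             /* position inside that quarter */
    if (quadrant & 1) {
        idx = 4 - idx;                    /* 2nd and 4th quarters run backwards */
    }
    float value = kQuarterSine[idx];
    return (quadrant & 2) ? -value : value; /* second half-turn is negative */
}

/* cos(x) = sin(x + 90 degrees), i.e. a shift of a quarter turn. */
float quarter_cos(uint32_t angle) {
    return quarter_sin(angle + 4);
}
```

The extra folding and sign logic is the cost being discussed above: it saves memory, but every lookup now needs several more instructions than a direct table read.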

  • @Zant5976
    @Zant5976 1 year ago +17

    My man's ready to fold chatgpt's code like a lawn chair.

  • @DrPastah
    @DrPastah 1 year ago +25

    For storing them in an array or struct it's probably relating to memory interleaving where the memory address is adjacent to the previous memory access thus keeping both the cos & sin in the cache of the CPU.

    • @KazeN64
      @KazeN64  1 year ago +25

      ah, yeah they are interweaved in their look up tables to get the most of their memory accesses! one dcache access reads 16 bytes so we can get both sin and cos of an angle in a single access that way.

    • @MichaelPohoreski
      @MichaelPohoreski 1 year ago +5

      No, it is referring to this: instead of using _two_ LUTs for sin() and cos(), you use **one large one where the sin() and cos() data overlap**, making use of the identity: cos(x) = sin(x + 90°)
      This was a common technique in the 90's demoscene, when fixed-point math would use either a *power of two* for a rotation "bygree" (256 = 1 rotation, 512 = 1 rotation) or some 16-bit variation such as 4.12, etc.

    • @KazeN64
      @KazeN64  1 year ago +10

      having one LUT where they overlap is slower. you end up having 2 dcache misses per rotation instead of 1 like in my interweaved table unfortunately. if you had huge memory constraints, you should use a LUT that's just one quarter of a sinetable and do some sign flipping math on that to get cosine and sine.

    • @KazeN64
      @KazeN64  1 year ago +14

      zelda does that actually, but it's super slow and bad and they should use my version instead. it doesn't save enough memory to be nearly worth it.

    • @MichaelPohoreski
      @MichaelPohoreski 1 year ago +6

      @@KazeN64 That's the problem with ChatGPT .. it isn't aware of the memory access of the N64. :-/
      And yes, you can easily reduce the trig table to be 1/4 with a little bit of "angle folding" setup.
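
The interleaved layout described in this thread, where the sine and cosine of one angle sit next to each other so a single cache-line read can fetch both, might look roughly like this. It is a sketch under assumptions, not the table from the video: the struct, the names and the 4-entry size are hypothetical, and the angle is assumed to be a 16-bit value where 0x10000 is a full turn.

```c
#include <stdint.h>

/* One table entry holds both values for the same angle. At 8 bytes per
 * entry, a pair fits inside a single 16-byte cache line as long as the
 * table itself is suitably aligned. */
typedef struct {
    float sine;
    float cosine;
} SinCos;

/* Hypothetical 4-entry table: 0, 90, 180 and 270 degrees. */
static const SinCos kSinCosTable[4] = {
    { 0.0f,  1.0f},
    { 1.0f,  0.0f},
    { 0.0f, -1.0f},
    {-1.0f,  0.0f},
};

/* The top bits of the 16-bit angle select the entry; with 4 entries
 * that is the top 2 bits. */
SinCos sincos_lookup(uint16_t angle) {
    return kSinCosTable[angle >> 14];
}
```

With two separate tables, or one overlapping table read at `x` and `x + 90°`, the two values generally live in different cache lines, which is the two-misses-per-rotation cost mentioned above.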

  • @0AThijs
    @0AThijs 1 year ago +10

    Glad to see how well optimized your code is!

  • @fruitsnackia2012
    @fruitsnackia2012 1 year ago +5

    this game will be the perfect haircut. nothing out of place. everything in optimal positions. all codes at 100% optimization.

  • @SireBab
    @SireBab 1 year ago +2

    You can always tell gpt something like "the n64 does not have access to this function, please don't recommend it in the future, retry the previous answer"

  • @0fuxGiven
    @0fuxGiven 1 year ago +1

    2:07 The first point about cosine and sine values being stored in an array was the same suggestion as using a lookup table like you just mentioned.
    If memory access from the lookup table is quicker than computing the cosine or sine on the N64 hardware it could shave off some time.

  • @alec_almartson
    @alec_almartson 1 year ago +20

    Yeah, I use it sometimes... I think it's a genius idea to ask ChatGPT the C++ optimization questions you want; it will throw out some interesting alternatives every time.
    I use it whenever I get stuck when programming new mechanics or modules.

  • @rareosts5752
    @rareosts5752 1 year ago +1

    I'm glad to see you using this, you were someone I thought of when I started using ChatGPT. It's great to see you put it to use for this stuff and, not gonna lie, it's a little bit satisfying seeing you be impressed by something lol. Thanks for all the great content.

  • @Ragesauce
    @Ragesauce 1 year ago +8

    I can't wait to play your remaster of the original SM64, nothing added, only improved, the way a remaster should be. I haven't tried playing the original even though I have wanted to ever since your first video, I am patiently waiting for this to come out to finally relive my childhood game!

  • @noahheninger
    @noahheninger 1 year ago +6

    Probably one of the most impressive things about ChatGPT is that it almost always knows what the hell you're talking about.

  • @camofelix
    @camofelix 1 year ago +2

    The ternlog point might be referencing the vpternlog SIMD instruction in AVX512 that allows you to do different operations (and, or etc.) on different SIMD lanes within a single instruction, operating on 2 512 bit vectors in 4 cycles

  • @snared_
    @snared_ 1 year ago +6

    GPT may have been assuming a modern CPU architecture since you didn't provide any specifics. In that case, smaller code size doesn't necessarily translate to fewer cycles, as modern CPUs can be quite complex, completing many instructions per cycle, reordering, etc. Thanks for the vid!

  • @kevintyrrell7409
    @kevintyrrell7409 1 year ago +4

    At 9:17, according to stack overflow (from what I've been told when I asked this question on there in the past) the C++ compiler will automatically optimize any `/ 2` section into `>> 1`, since bitshifting by one is equivalent to dividing by two. Haven't checked output code to ensure if that's the case or not.

    • @KazeN64
      @KazeN64  1 year ago +12

      this is true for unsigned integers, but not for signed integers. i think i explain that in the video too. the C standard calls for any division to round towards zero and rightshifting a negative number results in a rounding away from zero, so it has to add reg>>31 before the rightshift, 2 useless instructions.

    • @hoo2042
      @hoo2042 1 year ago

      @@KazeN64 lol, you did mention it in the video, but even knowing exactly what you were referring to, it was quick. I can’t imagine someone who didn’t would follow 😂

    • @kevintyrrell7409
      @kevintyrrell7409 1 year ago

      @@hoo2042 ah must have missed him saying it, had the video up on my other monitor lol

  • @Fake-pq3fb
    @Fake-pq3fb 1 year ago +7

    I think that this video speaks more to the capabilities of kaze than it does ChatGPT. Doing great man!

  • @tachiweasel489
    @tachiweasel489 1 year ago +8

    Assuming you're using a recent GCC, GCC lowers if statements and ternary operators to the same thing in GIMPLE. There should be no performance difference between the two. If there is, that's a bug in GCC :)

    • @KazeN64
      @KazeN64  1 year ago +2

      i'm using 9.3 currently! yeah it did make the ternary slower in that one example so that is odd.

  • @srchronotrigger
    @srchronotrigger 1 year ago +5

    If possible, when you finish optimizing everything you want from the game, it would be interesting to make this source code available, as there are several ports that would benefit from this optimization. To give you an idea, I tried to compile some code that you showed in the video "FIXING the ENTIRE SM64 Source Code (INSANE N64 performance)" for testing purposes, and that alone has already considerably improved the game's performance on the old 3DS port. By the way, excellent work; the performance of this game is getting amazing.

  • @menaced.
    @menaced. 1 year ago

    This was a great look into the code and how you approach optimization of code

  • @TenorSine
    @TenorSine 8 months ago +1

    Bro I died when it suggested using the modulus operator around 7:00

  • @MegamanEXEv2
    @MegamanEXEv2 1 year ago +14

    Wait, did you tell ChatGPT that it was for the n64's MIPS CPU? It's possible that would change its responses.

    • @avasam06
      @avasam06 1 year ago

      As well as the compiler being used!

  • @dikkie2913
    @dikkie2913 1 year ago +38

    Not only is he a better coder than me, he is also ripped. :(

    • @hiddencorner
      @hiddencorner 1 year ago +9

      keep grinding

    • @cian729
      @cian729 1 year ago +1

      not ripped. Jacked*

    • @KazeN64
      @KazeN64  1 year ago +7

      im at peak bulk right now, ripped has to wait until after my cut lol

  • @nullset2
    @nullset2 1 year ago +3

    Get it? He's playing the CORE music because that's where you meet mettaton, a superpowerful AI (well, it's actually a ghost, but you know what I mean)

    • @XhsTro
      @XhsTro 1 year ago

      He's not even playing "CORE" he's playing "Another Medium".

  • @e-mananimates2274
    @e-mananimates2274 1 year ago

    Considering only one idea seemed to work, it shows how brilliant you actually are!

  • @caiocc12
    @caiocc12 9 months ago

    At 10:13 it's suggesting you check the left-most bit of a signed integer to determine if it's positive or negative and negate the number based on that, instead of calling the absi function. What it doesn't know is that the absi function (7:19) does exactly that, just in a more concise way instead of using the & operator.

  • @angeldude101
    @angeldude101 1 year ago +2

    The operation in the absi function is really freaking clever. The signed right-shift basically copies the input's sign across an entire register, so if the input is negative, it inverts the value and then subtracts -1, which is the same as negating it, while if it's positive, it xors with and subtracts 0, which does nothing.
    [EDIT: Ignore all following suggestions. I was using a later version of gcc. On 9.3, all of the suggestions in this comment result in the equivalent output, including the alternate version given by ChatGPT when I asked it to decompile some x86 assembly of the same function.]
    For the absi code, while it probably doesn't make a difference when inlining, having it return a u16 instead of an s16 seems to save 1 instruction, since it can just mask off the upper 16 bits rather than shifting twice to sign-extend the result. Amusingly, calling the 32 bit version of the algorithm saves an instruction over making a 16 bit version due to not needing to mask off the upper bits beforehand.

    • @gblargg
      @gblargg 1 year ago

      The book Hacker's Delight is full of this kind of algorithm. It's a fun read.
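
The shift-and-xor trick described at the top of this thread can be written out in C as below. This is a sketch of the general technique, not the actual routine from the video (which, per another reply, is an assembly routine); it assumes `>>` on a negative signed integer is an arithmetic shift, as it is with GCC.

```c
#include <stdint.h>

/* Branchless absolute value.
 * mask is 0 for non-negative x and -1 (all bits set) for negative x.
 *   x >= 0: (x ^ 0)  - 0    == x
 *   x <  0: (x ^ -1) - (-1) == ~x + 1 == -x
 * Caveat: the result overflows for INT32_MIN, which has no positive
 * counterpart in 32 bits. */
int32_t absi(int32_t x) {
    int32_t mask = x >> 31;
    return (x ^ mask) - mask;
}
```

Many of the bit tricks collected in Hacker's Delight follow the same pattern of turning the sign bit into an all-zeros or all-ones mask.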

  • @joesaiditstrue
    @joesaiditstrue 1 year ago +1

    7:08
    "Noooo" 😂

  • @yoshi4980
    @yoshi4980 1 year ago +53

    did you tell chatgpt that this code was running on the n64? chatgpt probably has some basic knowledge on technical details of the n64. it would take a lot of tinkering, but you might be able to "teach" it about the n64 behavior and tweak it to produce more accurate/plausible suggestions. although based off this alone, looks like you've already gone to great lengths to optimize the current code

    • @benedani9580
      @benedani9580 1 year ago +8

      Honestly, this. It might just be thinking up optimization strategies for PC rather than N64 hardware.

    • @fabioferreiradarosaantunes9788
      @fabioferreiradarosaantunes9788 1 year ago +2

      Exactly! From my experience, all the mistakes it made could be explained back to it so the next answers would be far better.

    • @emilywebzone
      @emilywebzone 1 year ago +1

      My guess is that it wouldn't be able to extrapolate much about the emergent properties of the architecture, which doesn't get much documentation, and would just spit out fairly simple technical details instead. Something like making code smaller and less ram access dependent to give the RCP more access to the ram bus and speed up rendering would probably be something it just wouldn't figure out. Possibly if you explained what the bottleneck was it could come up with something like that though.

    • @benedani9580
      @benedani9580 1 year ago +4

      @@emilywebzone Well, you are describing the problem in human language right now. Why not tell this to ChatGPT? It understands human language too.

    • @emilywebzone
      @emilywebzone 1 year ago +5

      @@benedani9580 it's not necessarily a language problem, it's mostly a dataset problem. my guess is that there isn't a large enough sample size of n64 specific optimization techniques on the web for it to accurately describe how one would approach programming for it (notice most of the instructions it gives in this video have to do with modern programming architectures, because that's what it has the most data on). it's possible it could abstract the problem given a description of the architecture, but that seems less likely than it just still spitting out mostly nonsense

  • @JesusDaLawd
    @JesusDaLawd 1 year ago +1

    Shoutout to the simple flips shirt

  • @andersama2215
    @andersama2215 1 year ago

    The ternary operator can help with optimizations. I don't know how it'd translate onto n64 hardware, but it can communicate better to the compiler that an assignment can be made branchless. The reason is that if statements aren't typed and usually have multiple expressions, which can make simplifying to the branchless version difficult. The ternary is an operator which forces two expressions to have matching types, which means a ternary can be thought of as a select operation where two values are calculated and one is picked between them.
    You can conceptualize it like:
    float a = 1.0; //pretend this could all be more complicated
    float b = 2.0;
    float results[2] = {a+b, a-b};
    float result = results[(a

    • @KazeN64
      @KazeN64  1 year ago

      unfortunately, the first version here is worse by a factor of 2 than the second one in MIPS and changing the second code to use ternary compiles equivalently to the if/else version.
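
The three ways of writing a select that this thread compares can be put side by side. The snippet in the comment above is cut off, so this is not a reconstruction of it: the `cond` parameter and the function names are hypothetical, chosen only to show that the three forms compute the same result.

```c
/* if/else form: only one of the two expressions is evaluated. */
float select_if(int cond, float a, float b) {
    float result;
    if (cond) {
        result = a + b;
    } else {
        result = a - b;
    }
    return result;
}

/* Ternary form: same meaning as the if/else above. */
float select_ternary(int cond, float a, float b) {
    return cond ? a + b : a - b;
}

/* Array-indexed form: both values are always computed and stored,
 * then one is picked by index, so there is no branch in the source. */
float select_indexed(int cond, float a, float b) {
    float results[2] = { a - b, a + b };
    return results[cond != 0];
}
```

As the reply above notes for MIPS, the array-indexed form was reported to be about twice as slow, and the ternary compiled the same as if/else, so whether any of these helps depends entirely on the target and the compiler.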

  • @synonys
    @synonys 1 year ago

    I literally thought this was a great concept and you already had the video out!

  • @notarandom7
    @notarandom7 1 year ago +4

    The Colab we didn't know we needed

  • @cobywalker3922
    @cobywalker3922 1 year ago

    @KazeEmanuar Awesome video! At 5:00 I think it is recommending "(diff & 0x8000)" instead of "(diff < 0)". Since it's an S16 the first bit will be set to represent a negative number and the bitwise comparison may be faster to evaluate than the less than.

    • @KazeN64
      @KazeN64  1 year ago +1

      possibly - although using the assembly routine for absi is faster than both methods anyway, so i went with that.

    • @cobywalker3922
      @cobywalker3922 1 year ago

      @@KazeN64 Oh, very cool! Thanks for the response. I've watched almost all your SM64 videos and always share them with people to get them excited about programming and the potential innovations it leads to. 👍🏻

  • @omegapointsingularity6504
    @omegapointsingularity6504 1 year ago

    Have been thinking about this optimized Mario 64 the last few days. Nice to see it's still going strong!

  • @DanielVCOliveira
    @DanielVCOliveira 1 year ago +1

    I can't wait for you to release this and challenge the ABC crew to beat the game with no A presses in an optimized engine

  • @UCs6ktlulE5BEeb3vBBOu6DQ
    @UCs6ktlulE5BEeb3vBBOu6DQ 1 year ago +30

    I'd be beyond proud if ChatGPT said my code is pretty much optimized and that it should be left as is.

  • @BusinessWolf1
    @BusinessWolf1 10 months ago

    I absolutely love these optimisation videos

  • @LostEngineProductions
    @LostEngineProductions 1 year ago +4

    I will never get used to how jacked Kaze is

  • @Hylianmonkeys
    @Hylianmonkeys 1 year ago +3

    I've been using ChatGPT for so much stuff. It's very useful even if it is proudly wrong sometimes.

  • @magnus87
    @magnus87 1 year ago +3

    It doesn't matter if today it only serves to give little programming tips, this is just beginning!

  • @kristian4805
    @kristian4805 1 year ago

    And I am just very amazed with it understanding a simpler question like: Write me a microsoft power automate function that takes the two first letters of four words, then take first letter of second word and first letter of third word from a string variable called output and combine them without space in between. (Imagine how amazed I am with all the code talk in this video, then.)
    And it gives me something useful, with one small error, which i tell it, and it says.. Oh yes.. it's because of....

  • @frizzlefrack253
    @frizzlefrack253 1 year ago

    That's pretty awesome you were able to get something from it

  • @DNVIC
    @DNVIC 1 year ago

    9:47
    Actually, I might be wrong, but the ternary operator might be better in this circumstance, along with removing the 'ret' variable
    since both returns use ret, you might be able to do something like
    "return (target + current) >> 1 + ((diff1 < (absi(target + current + 0x10000))) && (diff1 < (absi(target - current - 0x10000))) ? ret : ret + 0x8000;"
    to avoid having to initialize and set the ret variable, while also not calculating it in two places

    • @KazeN64
      @KazeN64  1 year ago

      "initializing and setting the ret variable" is done on a register level. it won't require a memory access, so it's a free operation.
      i think something went wrong with your code snippet, that snippet would skip the first check entirely if target+current didn't average to 0 or -1

    • @DNVIC
      @DNVIC 1 year ago

      @@KazeN64 oops, i was tired when i wrote that, i put target + current instead of target - current, but what i tried to do was just copy the condition from the if else to the start of the ternary operator
      i also am an idiot and put ret in the statement even though i meant to do something like "? 0 : 0x8000" at the end to avoid having to define ret
      and i totally missed it was a register, oops. didn't even cross my mind for some reason
      i think it might be slower in this case, since when returning from the ternary operator, in the true case, it has to add 0 to the result, though I don't know the compiler well enough to know if it actually adds 0 to the value.
      though there's also only one return statement at the end compared to two...

    • @KazeN64
      @KazeN64  1 year ago

      @@DNVIC i'm fairly confident the code would compile equivalently with those fixes applied to what you wrote, so it won't make a difference here
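
The snippet discussed in this thread mixes `>>`, `+`, and `?:` without full parenthesization. In C, `+` binds tighter than `>>`, and `?:` binds looser than both, so such an expression may not group the way it reads. The sketch below uses hypothetical function names and writes out the grouping the compiler applies next to the grouping that was probably intended; it is not the actual approach-angle code from the video.

```c
#include <stdint.h>

/* `sum >> 1 + extra` is grouped like this by the compiler. */
int32_t shift_as_parsed(int32_t sum, int32_t extra) {
    return sum >> (1 + extra);
}

/* What "halve, then add" needs to look like. */
int32_t shift_as_intended(int32_t sum, int32_t extra) {
    return (sum >> 1) + extra;
}

/* `base + cond ? 0 : 0x8000` is grouped like this by the compiler. */
int32_t ternary_as_parsed(int32_t base, int cond) {
    return (base + cond) ? 0 : 0x8000;
}

/* What "add 0 or 0x8000 to base" needs to look like. */
int32_t ternary_as_intended(int32_t base, int cond) {
    return base + (cond ? 0 : 0x8000);
}
```

With `sum = 100` and `extra = 1`, the first form yields 25 while the second yields 51, which is the kind of silent difference that explicit parentheses avoid.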

  • @NotAUtubeCeleb
    @NotAUtubeCeleb 1 year ago +2

    Great video! I would be interested to see what ChatGPT says about the vanilla SM64 code.

    • @ellaquin
      @ellaquin 1 year ago

      I would love to see that, especially if it uses his videos to teach itself

  • @lambdog
    @lambdog 1 year ago

    we do love shouting certain people out by wearing their funny shirts B)
    congrats on the brilliant sponsor btw!!

  • @lod4246
    @lod4246 1 year ago +9

    I could barely get it to make working python code, and it's always a gamble without coding knowledge. I have a feeling this won't go well lol

  • @StephenOwen
    @StephenOwen several months ago

    Hi kaze, I’m sorry if I missed it but did you ever make a video showing the original Mario 64 running full speed with all of your improvements?
    Love your stuff and especially your style

  • @DessertArbiter
    @DessertArbiter 1 year ago +1

    April Fools video idea: "How I optimized Superman 64"

  • @drygordspellweaver8761
    @drygordspellweaver8761 1 year ago

    I use GPT for explaining and renaming decompiled code in Ghidra. Seems to be pretty accurate most of the time.

  • @Mechanite.
    @Mechanite. 1 year ago

    I had no idea it could parse code, that's insane. I threw it some obscure script from Unreal3 and it understood it completely

  • @NoSpamForYou
    @NoSpamForYou 1 year ago +2

    Hi Kaze, been watching your videos for about a year now.
    Been looking for ideas on how to get better performance out of potato pcs, found out about Asynchronous Reprojection (aka ASW / Spacewarp or Timewarp). It is not really used for 2d monitors yet but is used in VR.
    I was wondering if you could use the technique to increase detail in Mario 64 (widescreen would probably look better since it would push the edge distortion to the peripheral vision).
    You could use it to hide frame rate fluctuations if you can't reach 60fps locked at all times, vs going to 30fps locked.

  • @Zeegoner
    @Zeegoner 1 year ago

    Super interesting video. Thanks for making it, I haven’t used it for coding yet but expect to begin trying it out soon. Just never found the need yet…

  • @alectrona2988
    @alectrona2988 1 year ago +1

    What's the music used in the background of this video? Kinda reminds me of the X-Naut fortress in TTYD for some reason...

    • @DanGRV
      @DanGRV 1 year ago +1

      It's from Undertale, you can find it as Another Medium

    • @alectrona2988
      @alectrona2988 1 year ago +1

      @@DanGRV Ahhh, I see. Never played that game but I know it's quite popular. Thank you!

  • @btarg1
    @btarg1 1 year ago

    Maybe the new Bing with internet access would be even better for this? A sequel to this video would be great!

  • @stan4143
    @stan4143 1 year ago +1

    does anyone know what the music is in the background?

  • @enerjustics
    @enerjustics 1 year ago

    Great video! What was the actual performance impact of implementing the bit shift suggestion?

    • @KazeN64
      @KazeN64  1 year ago

      basically zero lol, we are talking about tiny microoptimizations here to save a few microseconds. if i had to guess this saved maybe 2-3 microseconds per frame.
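
For context on the bit shift suggestion: as the pinned comment explains, signed `/ 2` rounds toward zero while `>> 1` on a negative value rounds down, which is why the compiler needs extra instructions for the division. Here is a minimal C sketch, assuming a compiler such as GCC where right-shifting a negative signed integer is an arithmetic shift (the C standard leaves this implementation-defined). The function names are invented for illustration.

```c
#include <stdint.h>

/* Division rounds toward zero: -3 / 2 gives -1. */
int32_t half_by_division(int32_t x) {
    return x / 2;
}

/* Arithmetic shift rounds toward negative infinity: -3 >> 1 gives -2. */
int32_t half_by_shift(int32_t x) {
    return x >> 1;
}

/* Roughly the fix-up a compiler adds for `/ 2`: add the sign bit
   (1 for negative inputs) before shifting, so the result matches division. */
int32_t half_by_shift_with_fixup(int32_t x) {
    int32_t sign = (int32_t)((uint32_t)x >> 31);
    return (x + sign) >> 1;
}
```

The two simple forms only differ for odd negative inputs, so swapping one for the other is safe only where that off-by-one does not matter.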

  • @galopeian
    @galopeian 1 year ago

    GPT is such a versatile tool. I'm excited to try this out for python code optimization

  • @pauls4522
    @pauls4522 1 year ago

    I have not used chatgpt yet, but if it's possible maybe have chatgpt somehow further optimize your game models to use fewer polygons on the screen at once.
    Or use chatgpt to leverage a better method of compressing the textures.
    I'm not sure if it's possible on N64, but maybe try to modify the code with binary space partitioning, so that areas not seen on screen are not rendered when you are not physically able to see those areas.

  • @SerErris
    @SerErris 1 year ago

    The cos/sin thing was to not really calculate it (and especially not multiple times in a function), but instead to precompute a lookup table for cos/sin (you actually need only one table, as they are 90 degrees apart) and then do a lookup in the table (array) instead of calculating it.
    That was very common practice, especially in the time when CPUs could not even do multiplications and it was a very slow process to calculate cos/sin. Not sure if this is still faster today with modern CPUs.

    • @KazeN64
      @KazeN64  1 year ago

      I have a lookup table - and I use an even better implementation than what you are suggesting here - I'm putting sin/cos of the same angle right next to each other so I can access them in a single dcache miss. I don't think this is what it was suggesting though.
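
The layout described in this reply, with sin and cos of the same angle stored next to each other, can be sketched as follows. This is a toy version under stated assumptions: the `SinCos` struct, the table name, and the 8-entry size are all hypothetical and chosen so the values can be written by hand. A real table would be far larger, and this is not the table from the actual engine.

```c
#include <stdint.h>

/* One entry holds both values for an angle, so a single
   cache line fetch brings in sin and cos together. */
typedef struct {
    float s; /* sine of the angle */
    float c; /* cosine of the angle */
} SinCos;

/* Toy table: 8 steps of 45 degrees around the circle. */
static const SinCos sincos_table[8] = {
    { 0.0f,         1.0f        },
    { 0.70710678f,  0.70710678f },
    { 1.0f,         0.0f        },
    { 0.70710678f, -0.70710678f },
    { 0.0f,        -1.0f        },
    {-0.70710678f, -0.70710678f },
    {-1.0f,         0.0f        },
    {-0.70710678f,  0.70710678f },
};

/* Angles use the full 16-bit range for one turn (0x4000 = 90 degrees);
   the top 3 bits select one of the 8 entries. */
SinCos lookup_sincos(uint16_t angle) {
    return sincos_table[angle >> 13];
}
```

With two separate arrays, sin and cos of one angle can sit in different cache lines; interleaving them trades nothing in table size for a better chance of one miss instead of two.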

  • @AndyGoth111
    @AndyGoth111 8 months ago

    7:18 Very nice absi(), I like!

  • @bi3lmobile2023
    @bi3lmobile2023 1 year ago +1

    Wow!!! Perfect! I love your channel kaze emanuar :) i play your games in my n64 emulator :)

  • @beefnuts2941
    @beefnuts2941 1 year ago

    It might help to be more specific with chatgpt requests. "optimize" could be taken to optimize it for memory and not speed. massaging these AI prompts to get better output is an art in and of itself.

  • @Zalied
    @Zalied 1 year ago

    it would be interesting to use the original functions, you are optimizing your optimizations but i wonder how well it would have been as a starting point for the base unoptimized code

  • @zeronecool
    @zeronecool 1 year ago

    How were you comparing performance after saving the code, before running? Was it the instruction size after compilation? Any links on this method? I'm looking to make performance gains on math functions for Sega Saturn, but having a hard time comparing functions.

    • @KazeN64
      @KazeN64  1 year ago +1

      I kinda "just know" how long everything takes on the N64 so comparing just the instructions in the compiled output is enough to figure this out. I was using the map file to compare sizes before and after.

  • @satibel
    @satibel 1 year ago

    I think it likes ternary operations because modern cpus have branch prediction and conditional operations and may behave better with a ternary (converted to a conditional set) than an if (converted to a conditional jump) though gcc might already optimize ifs with one line to that.
    for approach angle you might want to try
    register s32 delta = target-current
    return current - delta>>1 + [instruction in the if, replacing target - current with delta] ? 0 : 0x8000
    not sure it'd be faster on the N64, since unless the compiler optimizes it, you're adding regardless of the state of the if.
    also you should try saying that this is for the N64 or specify that it's for a mips III processor, might get better results

  • @maritoguionyo
    @maritoguionyo 1 year ago

    You could explain to chatgpt which processor and which compiler you are using

  • @The_Mister_E
    @The_Mister_E 1 year ago

    Maybe ask the AI to write code that takes the constraints and optimizations of the VR4300 in mind.
    Your prompt can be as long as you like, so if you give it the nitty-gritty it would try to keep it in mind.

  • @xdmon1220
    @xdmon1220 1 year ago

    that's already pretty crazy as a tool, i wonder how much better it gets when gpt4 goes public, i remember they said it's gonna happen in early '23

  • @CesarRodriguesdeOliveira
    @CesarRodriguesdeOliveira 1 year ago

    ChatGPT is confidently wrong because of how it's trained with positive reinforcement. When it gets something right, it gets a reward. If it says something wrong or just says it doesn't know, it gets nothing. But if it says something wrong and the person evaluating it doesn't know it's wrong and rewards it, that teaches the AI to bullshit its way into getting rewarded instead of just saying it doesn't know. There's a video on Computerphile's channel that goes into detail about this.

  • @joshp3446
    @joshp3446 1 year ago

    Humans: *Have a Computer issue*
    Humans: *Go ask ChatGPT*
    ChatGPT: *Has An Issue*
    ChatGPT: *Asks Kaze*

  • @WIIRULESMAN
    @WIIRULESMAN 1 year ago

    I understood none of this video but still enjoyed it thoroughly.

  • @alkenstein
    @alkenstein 1 year ago

    After it's given a response, you can ask it to give more ideas for the same function.

  • @defenastrator
    @defenastrator 6 months ago

    I'm not sure that ChatGPT's suggestion is slower for the angle diff. Its solution is branchless, which is quite often faster. Particularly because a modulo of a power of 2 will be optimized to a bit mask by the compiler.

    • @KazeN64
      @KazeN64  6 months ago

      branchless doesn't matter much on the N64 because we have no branch predictor. branches work with 1 delay slot but we already have something useful to put into it
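
To make the "branchless with a bit mask" idea in this thread concrete, here is one common way to wrap an angle difference into the signed 16-bit range without any branch. It is a sketch only: the function name is hypothetical, and it is not necessarily what ChatGPT suggested in the video or what the engine uses.

```c
#include <stdint.h>

/* Wrap an angle difference into [-0x8000, 0x7FFF] without branching.
   Masking with 0xFFFF is the same as taking the value modulo 0x10000,
   which is the power-of-2 modulo a compiler turns into a bit mask. */
int32_t wrap_angle_diff(int32_t diff) {
    uint32_t shifted = (uint32_t)diff + 0x8000u; /* move range to start at 0 */
    return (int32_t)(shifted & 0xFFFFu) - 0x8000; /* wrap, then move back */
}
```

As the reply above notes, avoiding the branch is not automatically a win on a CPU without a branch predictor, so a version like this would still need to be measured against the branching one.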

  • @Waldoe16
    @Waldoe16 1 year ago

    We need a KazeGPT lol! Could ChatGPT be asked about best practices for programming in certain bottleneck scenarios? Or to test and compare certain code for speed?

  • @the_kovic
    @the_kovic 1 year ago

    Can you explain in more detail how keeping the variable ret being calculated and stored before the if/else statement is faster than calculating it on the spot in each of the branches? My naive assumption is that if we presume that storing ret in memory (so that it can be invoked later) costs X extra instructions, then it should be faster to calculate it in place instead. The code will get larger because we duplicate the calculation instructions per branch but at runtime, it should end up faster because we only ever go through one branch at a time and we spare the X extra instructions.
    So where is my naive assumption wrong? Is it perhaps a cache thing where smaller code is always strictly better?

    • @hoo2042
      @hoo2042 1 year ago

      I think the last bit is what he was referring to when he said it would be more instructions. He’s said in other videos that code size *is* a huge factor here

    • @KazeN64
      @KazeN64  1 year ago +1

      ret won't be stored in memory because the compiler has enough registers to work with - calculating it early costs nothing extra. but computing it in 2 spots would increase the total code size so there'd be slightly more cache misses.
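
The two code shapes being compared in this thread look roughly like this. This is a simplified sketch: `flip` is a hypothetical stand-in for the real condition in the function from the video, and the function names are made up. Only the `(target + current) >> 1` midpoint and the `0x8000` offset come from the snippets quoted in the comments.

```c
#include <stdint.h>

/* Shared value computed once, before the branch. With enough free
   registers, `ret` lives in a register and never touches memory. */
int32_t midpoint_hoisted(int32_t target, int32_t current, int flip) {
    int32_t ret = (target + current) >> 1;
    if (flip) {
        ret += 0x8000;
    }
    return ret;
}

/* Same result, but the midpoint expression is written in both branches.
   If the compiler does not merge the copies, the function gets larger. */
int32_t midpoint_duplicated(int32_t target, int32_t current, int flip) {
    if (flip) {
        return ((target + current) >> 1) + 0x8000;
    } else {
        return (target + current) >> 1;
    }
}
```

Both versions execute the midpoint calculation once per call; the difference argued here is code size and the instruction cache misses that come with it, not the number of instructions executed.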

  • @liquidsnake6879
    @liquidsnake6879 1 year ago

    nice that it can at least provide interesting answers, definitely something you could use when you're in a rut to replace the old programming duck, and unlike stackoverflow ChatGPT won't insult you for asking it questions lol

  • @diggoran
    @diggoran 1 year ago

    I like that you used a rubber duck for the thumbnail because that's what this was really, just a bot to prompt you to think about your own code in more detail.

  • @diggoran
    @diggoran 1 year ago +1

    9:31 Why would this be a bad idea? The calculation might happen in two places in the code, but on each call of the function it would only happen once (the if and else are mutex). I'm not sure how the assembly looks currently but you could potentially skip the intermediate variable assignment by inlining the ret calculation in both the if and the else.

    • @diggoran
      @diggoran 1 year ago +1

      I guess you might be worried about the time to load more instructions into memory even if those instructions aren't executed?

    • @hoo2042
      @hoo2042 1 year ago

      @@diggoran yeah, that’s exactly it

    • @KazeN64
      @KazeN64  1 year ago +1

      yes, loading more instructions is exactly what i'm worried about. loading an instruction takes as long as executing 7 to 8 instructions on the n64.

    • @diggoran
      @diggoran 1 year ago

      @@KazeN64 wow, I never would have guessed it would be that bad

    • @danielpope6498
      @danielpope6498 1 year ago

      @Kaze Emanuar wow, I knew memory access was slow but not THAT slow, that really does require a whole different way of looking at code to optimize.

  • @whuzzzup
    @whuzzzup 1 year ago

    Could you show it the unoptimized versions of those functions?

  • @mathematicallywilling
    @mathematicallywilling 1 year ago +61

    In my opinion, when it comes to artistic forms (including video-game development)
    Man-made > AI-made
    Keep it real Kaze, this is what's so beautiful about what you do!

    • @garfreld
      @garfreld 1 year ago +2

      Video games aren't art, the visuals and sounds we put on top of the games are art but the actual game part isn't.

    • @Trimint123
      @Trimint123 1 year ago +7

      It depends on what games we are talking about. And if it's for optimizing a 20 year old game, it's worthless.

    • @Trimint123
      @Trimint123 1 year ago +22

      @@garfreld Video games *are* art, mate. Look it up.

    • @Koutsie
      @Koutsie 1 year ago +7

      @@garfreld nice bait.

    • @dikkie2913
      @dikkie2913 1 year ago +7

      @@garfreld trolling at its finest

  • @freddywondercat1362
    @freddywondercat1362 1 year ago +2

    If anyone knows, please tell me what the name of the song is that's used in this video.

    • @Chaseroni
      @Chaseroni 1 year ago

      I have been trying to figure this out myself, it’s so familiar!!!

  • @IanZamojc
    @IanZamojc 1 year ago

    In your matrix function at the beginning, could you not reduce the number of multiplications by multiplying dest[0][0] * -1 to get the dest[1][0] result? Or even use the pre-multiplied dest[2][0] * dest[0][1] * dest[1][1], etc? Basically just reuse your pre-multiplied values.

    • @KazeN64
      @KazeN64  1 year ago

      GCC does that by itself already so that won't do anything.

    • @IanZamojc
      @IanZamojc 1 year ago

      @@KazeN64 Yeah, I wasn't sure. I know you've had to make some strange code decisions to coax the compiler to optimize a certain way.

  • @Rotzahna
    @Rotzahna 1 year ago

    maybe you could feed it all the technical documentation for the n64 so it could think of new stuff :D