6:30 For me it's that there's char, short, int, long, and long long, and none of them have a well-defined goddamn size. It's only ever "well, it'll be _at least_ this size" (char: ≥8 bits, short: ≥16, int: ≥16, long: ≥32, long long: ≥64). This is especially true for int and long: _sometimes_ int will be 16 bits instead of 32, and _sometimes_ long will be 64 bits instead of 32. As far as I'm aware, the others are more consistent. I'm so glad they eventually added fixed-width integer types (C++11).
I think short was in C to make it compatible with 8-bit microprocessors. We don't use short because the standard GCC compiler is a 16-bit one. A 16-bit microprocessor can perform a 16-bit addition in one cycle; for 32 bits it requires 2 cycles, which is long, and for 64 bits it requires 4 cycles, which is long long.
Few words about int sizes in C++:
* sizeof(short int) is 2
* sizeof(int) can be 2 or 4 (arch dependent)
* sizeof(long int) can be 4 or 8 (compiler and arch dependent)
* sizeof(long long int) is 8
Long longs are great. If you want to add the length of a long to a long long, it becomes a long long long, and if you want to add a long long to a long long it becomes a long long long long long. This is a perfectly obvious and natural thing to happen.
yeah, this one is golden. I remember I was playing Q3 Arena. Approximation? It was perfect. It was FAST.... super FAST (Pentium MMX 200MHz). I even... did not know how fast it was, that is how fast it was. There was no Fraps or anything back then, it was just FAST.
With C, int is confusing because it says it's either 2 or 4. You use short for it to be always 2 and long for it to be always 4. I still don't understand how it can be 2 or 4 at the same time.
long isn't always 4 and short isn't always 2.
char - at least 8 bits, must be the smallest object size supported (almost always exactly 8 in practice)
short - at least 16 bits, and at least as large as char
int - at least 16 bits, and at least as large as short
long - at least 32 bits, and at least as large as int
long long - at least 64 bits, and at least as large as long
so in theory, even on a 64-bit machine, int can be just 16 bits. it isn't different sizes at the same time though; the compiler just gets to pick whatever size it wants within those minimums
It's not 2 or 4 at the same time. The size of int is implementation-defined, so whichever implementation of the C language you use decides whether it's 2 or 4 (or sometimes even 3 on a couple old exotic systems). For most personal computers these days there are pretty solid standards, so an int is pretty much always 4 bytes. But for other kinds of systems it's really helpful to have the platform itself pick out whichever size is fastest/most efficient on it. If the range of numbers is important to specify, the C standard library has typedefs for specific sizes in stdint.h to make sure you always get exactly 1, 2, 4, or 8 byte integers. It's a bit outdated and not exactly intuitive for beginners, but there are good reasons for why it works like this.
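To make the stdint.h suggestion above concrete, here is a minimal C sketch; the sizes it prints for the plain types are implementation-defined (that is the whole point), while the exact-width types are guaranteed:

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        /* exact-width type: the same size on every conforming platform */
        int32_t fixed = 2147483647;   /* always 4 bytes */

        /* plain types: the implementation picks, within the standard's minimums */
        printf("short=%zu int=%zu long=%zu long long=%zu\n",
               sizeof(short), sizeof(int), sizeof(long), sizeof(long long));
        printf("int32_t=%zu (always 4)\n", sizeof fixed);
        return 0;
    }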
09:13 That idea isn't terrible; it's called fixed-point arithmetic and it was used for ages before CPUs got dedicated floating point units. Try doing floating point arithmetic on a Game Boy or a PlayStation 1. Yup, that's right: a PS1 performed all arithmetic using fixed point. Fixed point is incredibly fast, since it's just integer-based operations. It's severely limiting, but it had (and still has) its use-cases. So I definitely wouldn't call it terrible.
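As a sketch of what that looks like in practice, here is a minimal 16.16 fixed-point multiply in C (16 integer bits, 16 fractional bits; PS1-era code used formats along these lines):

    #include <stdint.h>

    typedef int32_t fix16;        /* 16.16 fixed point */
    #define FIX_ONE (1 << 16)     /* the value 1.0 */

    /* multiply two 16.16 numbers: widen to 64 bits so the
       intermediate product can't overflow, then shift back down */
    static inline fix16 fix_mul(fix16 a, fix16 b) {
        return (fix16)(((int64_t)a * b) >> 16);
    }

    /* e.g. fix_mul(3 * FIX_ONE / 2, FIX_ONE / 4) == 3 * FIX_ONE / 8,
       i.e. 1.5 * 0.25 = 0.375 */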
I would not be surprised if some of that code was pulled from a book or something at the time. I mean, I could be wrong. But what I've learned about the programming world is that most of the mathematical innovation that gets turned into code is usually at least 10 years old by the time it's used. Like some actual Newton-level person focused all their time on a specific thing, published it in a book, and then was lost to time. And then some bro-skis from the 90's read it in a book and use it. And then they probably continue to read those books for more info. It also requires a lot of skill and understanding, but all the hard math that was shown in the video was most likely done by some other guy who wasn't a game developer.
Topics like this are always a humbling experience, a reminder that not all of us are pea brains. Maybe one day I'll stop using vscode long enough to understand this.
@@ThePrimeTimeagen I love when standard C types are not so standard when compiling for different CPUs. Just like with AVR: float is 4 bytes, but int is only 2.
Yep, char, short, int and long are different on different platforms. C wants the code to be portable and stay performant. You don't want to assume the size of words and registers; otherwise only one platform benefits from using the correct types, and all the others get crippled performance-wise and use way more memory than necessary.
@@SoKette Yep, came here to say this. That's why there are the fast and least types (int_fast32_t, int_least32_t, and friends). It does mean you have to know what you're deploying to if you get messy with testing for overflow or things like that (and the compiler can remove your tests due to UB, for extra fun and games).
On the topic of long and long long, if you want to see the best error message ever, try to declare a variable as a long long long, and compile with GCC.
“Long long long is too long for GCC”
too long with that attitude maybe
shame that the existence of the long double type prohibits this from being extended to double long, triple long, quadruple long, etc ;)
special type is long cat long int, which would have googol bits
my favorite beatles song is Const Long Long Long
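For anyone who wants to reproduce it, the whole program is one line, and the diagnostic really is GCC's:

    long long long x;  /* gcc: error: 'long long long' is too long for GCC */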
I’ve had to write code that stores 10-12 constants into specific registers to get a RAM memory unit to work. Why those numbers? No one knows. Truly, no one.
I asked an Application Engineer of the company building the chip, no clue. He contacted the “expert” on that reference. No idea either.
I got the numbers from a random yt video of *another different chip*. Not even the guys who designed the chip know, because that module is a bought IP.
I had to explain all of this in comments and face a code review
YES!!! that is so funny
Welcome to my life as a firmware developer. Half of the time I have no f'ing clue what I'm doing. I was always one of the top students in STEM classes, but after 10 years of embedded firmware development I am humbled. There are some really freaking smart people out there, and no matter how hard I work, I will never get to their level. These hacks that seemingly come out of nowhere just seem natural to some people. It's like my brain is running on an old Pentium 3 and they have a fully decked-out i9 with 64 GB of RAM.
Edit: I had a colleague who studied biology in college and was miles ahead of me in hardware design, electrical engineering and programming (I studied computer engineering). He just immediately understood anything he read. He'd read it once and completely grasp everything. Guy went on to become a bitcoin multi-millionaire. I'm not even jealous - he deserved it. The guy was a galaxy brain.
@mithrandirthegrey7644, so true. When you see those people operate on another level, you get humbled quickly. And they do it like it's not a big deal, like they were asked to do 2+2. You can ask them questions, they will provide answers, and it will take a long time for you to decipher what they meant.
// i store these constants here in order to calm the machine spirit of the RAM
Btw, numbers with a fixed fractional part do exist and are super common on DSPs, where you want fractional values but can't really take the performance hit of floats.
oh, very interesting
Back in the day when I was coding assembly, there were never enough floating point units, and we often did part of the calculations in fixed point to balance the load between the integer and floating point arithmetic units. Don't know if this is relevant today.
Or the inaccuracy, I assume
DSPs are also the reason that signed overflows are undefined behavior in C -- since on DSPs, operations are usually _saturating_ meaning that an out-of-range result gets replaced with the min/max value automatically in hardware instead of overflowing to negative values.
@@AJMansfield1 kinda, but also no. C is *very* old, and when the foundations were laid, computers were still in the "bespoke project" stage. This meant that lots of processors did not use two's complement, so defining integer overflow would have penalized them.
The C committee weren't specifically thinking of DSPs, but those are the machines of that kind that have pretty much survived.
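For the curious, a sketch of what saturating arithmetic means, modeled in plain C (a DSP does the equivalent clamping in hardware, as described above):

    #include <limits.h>

    /* saturating 32-bit add: clamp to INT_MAX/INT_MIN instead of
       wrapping around (or, in C terms, invoking undefined behavior) */
    int sat_add(int a, int b) {
        if (b > 0 && a > INT_MAX - b) return INT_MAX; /* would overflow upward   */
        if (b < 0 && a < INT_MIN - b) return INT_MIN; /* would overflow downward */
        return a + b;
    }

    /* sat_add(INT_MAX, 1) == INT_MAX, not a negative wraparound */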
I love your enthusiasm and appreciation for these subjects
hey ty!
This is a nice algorithm, but remember: the simple 1 / sqrt(x) is faster on today's CPUs
Interesting. I assume the division got sorted out in newer CPU instructions, but what about the sqrt? How is that handled faster? I don't know, but I don't think there's a CPU instruction to compute the sqrt...
@@Winnetou17 Actually, there is now! For example on x86, there is the sqrtss instruction. Languages like C/Rust will compile down to it when using the sqrt function from their standard libraries.
@@anderdrache8504 Nice!
Also, don't forget to remind the Quake devs that, at the time, this faster alternative didn't exist
I wonder what happens if you replace that "fast inverse square root" with 1/sqrt in Quake's source code.
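Nothing clever is needed anymore; the naive version below is what you'd write today. With optimization on (and math errno disabled), GCC and Clang on x86-64 compile the body down to a sqrtss followed by a divss:

    #include <math.h>

    /* the "naive approach": one hardware square root, one division */
    float inv_sqrt(float x) {
        return 1.0f / sqrtf(x);
    }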
The "tragedy" of int not having a defined size was both a feature and a bug. It was originally meant to allow easier porting between machines with different word-sizes. It was intentional. In hindsight... yes it was indeed a mistake that we've been correcting for decades.
the tragedy isnt being able to define how many bits we want to use for integers and floats
imagine me in my infinite wisdom, defining a 2048bit number just cuz i want to store something like the 1000000th element of the fibonacci series
@@DatBoi_TheGudBIAS You can do that. It's just more work and extra CPU cycles. Think about how the Score works in Mario for example - the NES was only an 8-bit processor.
Did you ever hear the Tragedy of Darth Int the Wise?"
"No."
"I thought not. It's not a story the JSdevs would tell you. It's a C legend."
Was actually useful from time to time by the ISO standard to allow for char
I tried this in our game engine and profiled it. It wasn't that much faster anymore. CPUs have evolved since this was released, and I'm pretty sure the math lib uses those improvements. One thing that I did notice, though, was characters suddenly dropping through floors because of the imprecision of the algorithm. So in the end, what was fast was me reverting this code.
Yeah, IIRC there are several operations past this part of the algorithm to improve the precision a bit more
They used this, as you see in the video, exclusively for the software rendering. The extra genius of this algorithm is that you can adjust precision by looping Newton's iteration more times, so when calculating hits or player positions they had a more accurate version.
@framegrace1 tried that, added a few iterations, all to no avail. The imprecision was still noticeable. And the code wasn’t faster. It worked in the past, but not anymore.
Hardware instructions (see rsqrtss, for inverse sqrt with 11 bits of precision) rendering this pointless aside, software implementations are also prone to human error. Example here: the long-sized read of a 32-bit float (a write followed by a larger read) causes a store-to-load forwarding failure, which costs an additional 10-11 cycles (architecture dependent), essentially artificially slowing the algorithm down.
lol
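The hardware route mentioned above, sketched with the SSE intrinsic (this maps to the rsqrtss instruction; the result is a low-precision estimate, and one Newton step refines it, just like the Quake code does in software):

    #include <xmmintrin.h>

    /* approximate 1/sqrt(x) via the rsqrtss hardware instruction */
    float rsqrt_hw(float x) {
        return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
    }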
If I'm not mistaken, it was John Carmack who added that fast inverse square root to Quake 3, but what's important this time is that he didn't invent it. I mean, it's important because some people attributed it to him, and while he's a genius, this one wasn't his. Still, way above the average coder for knowing or finding out about it.
It probably wasn't even Carmack that added the code according to himself. At least he doesn't remember doing it. That said, he's still a genius for other things he did in 3D space with Doom, Quake and even Commander Keen back in the day that we take for granted today.
Dave's Garage does a video series that tracks down the source. It's not Carmack.
IIRC HolyC does what you described with the data types; instead of short, long, etc. it's just I16, I64, and so on.
HolyC wins again
About that log... the log function is the inverse of the exponential function, which means we can simplify 2^(e-127) to just (e-127) if we take the log in base 2. That's the reason for using log base 2, and because it's all part of an equation, we also get a log on the other side, which they cleverly replace with an estimation. And the change of position of E was one of the possible ways to remove the division from M while still keeping them grouped together.
Unfortunately the video creator, in his decision not to go deep into the math, makes it sound way more complex than it is.
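Spelled out (all logs base 2 here, with M the mantissa in [1, 2), so M = 1 + m for a small m):

    log(M * 2^(E-127)) = log(M) + log(2^(E-127)) = log(1 + m) + E - 127

and the estimation the video replaces log(1 + m) with is the standard linear one, log(1 + m) ≈ m + mu, where mu ≈ 0.043 is a small constant chosen to minimize the error over the interval.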
I think we can cut the C designers some slack on "int" not having a fixed size. Computer word sizes were in constant fluctuation, so having a type equal to the word size of whatever machine it ran on made sense from a compatibility perspective. The real tragedy came when word sizes went from 32 to 64 bits: they decided to break the pattern and keep int at 32 bits instead of making it 64, because the jump in memory was so big; your programs would use way too much memory if every int suddenly became 64 bits long.
If they could have imagined the downstream effect of doubling a program's memory usage every time the word size increased, maybe they would not have gone with the design. But then again, C was so popular because of how portable it was. If int hadn't been the same size as a word, maybe C wouldn't have had the staying power it has.
Also, "long" and "long int" mean the same thing as all variables in C are defined as int by default so you can just write "long" and it will still give you a "long int". Same goes for "short"/ "short int"
Actually, the real type-naming tragedy is the Windows API, since it was written when the word size was 16 bits, so for 32-bit integers they have the type DWORD. DWORD is still in use today and still means 32 bits, even though on 64-bit systems a "double word" would be 128 bits.
In modern languages you have usize which is an unsigned int the size of a pointer.
@@NoX-512 You do also have size_t and uintptr_t in C.
@@sourestcake True. The difference is they are not native types.
Funny how people think one character is one byte (one char), when UTF-8 can take up to 4 bytes per character due to its variable-length encoding.
Part of why this whole inverse square root thing is so fascinating is that it became attributed to John Carmack, who by his own words doesn't really know anything about it; someone else did it. The maker of the video is aware, but I think it's still a very popular question to ask Carmack. The interview of Carmack by Lex Fridman was something else too, hearing how Carmack as a young student/fresh developer regularly spent his days in the library studying coding books and math. But he's a very special person, so it's hard to set the standard up there.
Someone in the chat said he hates when people worship bad code like this. Sure, it's bad by today's standards, when you can do approximately anything with a negligible performance hit until it grows massive, and your language and environment provide mad libraries and functionality so that you don't have to implement much. But this represents the creativity forced by the extremely limited tools they had back in those days.
This was the only way to make pioneering software work, and coming up with that borderline-illegal hack represents the minds of the engineers and game devs back then; they were some serious shit if they got anything successful done. They had a completely different state of mind. I feel like people in the past were often way more creative with science. Of course, only the most significant of them were left on the pages of history, so it's probably an unfair comparison to think "why am I not that brilliant"; there are still some guys like that. But I feel like today there's so much less time to think freely during your studies or work, and there's plenty to busy your mind in between; there's not much motivation to let your mind fly and explore, and not much time either.
This is probably the best explanation video, but even it goes slowly at first explaining basic stuff and then fast-forwards through so many math steps that are not obvious at all unless you deal with serious math all the time. I don't know if that's the point where the creator realized he didn't understand it well enough to explain, or where he just lost focus and figured "this is gonna take ages if I comment these 5 random-looking operations, so I'll just say the refactoring is obvious". And the viewer is left with "wait, what? You said you were going to explain this, and now there's a completely different formula on the screen than the one we started with". Like, "you wanted to know how this works? Lol, I'll spare you the details; just trust that the inverse square root hack works". The informed viewer is always a math professor.
Nobody at id came up with this. The IEEE 754 sqrt trick was devised by mathematicians William Kahan (Berkeley, very famous numerical analyst) and K.C. Ng (Sun Microsystems).
But to be clear, the portion actually used in the Quake III source code is completely and utterly pedestrian bit-fiddling. Nothing about it is at all remarkable or novel, these are incredibly well-trodden ideas, even for an undergrad course in numerical analysis, let alone research-level work.
Greg Walsh heard about it from mathematician Cleve Moler, who got it from Kahan and Ng.
This is what mathematicians are helpful for: for Cleve Moler this is an absolutely trivial, pedestrian application of theory and techniques that go far above and beyond this.
So yeah, this is not an example of some game programmer thinking really hard about how to calculate something, this is about community-spanning interactions between mathematicians, who think really hard and come up with brilliant ideas with broad usefulness, and practical programmers who are willing to listen to and work with mathematicians to learn those things.
The lesson in this story is: pay attention to mathematicians. Hire them yourselves if you have the money. Work with and communicate with them.
Seethe and cope + yap
thanks for bringing this video back to me again. I forgot how many times I've watched this, but the more I watch this, the more I enjoy this. It reminds me how much I'm looking forward to enrolling in a computer science course
There are 10 kinds of people.
01) Those who know how to do some quick math in binary.
10) Those who don't.
There are 10 kinds of people:
- those who understand binary
- those who don't
- and those who weren't expecting a ternary joke
Lol 😂 but Doubt you LL get more than 10 likes brother
14:27 It's for this very same reason that I got into computer science. I love to nerd out on the most obscure, nerd-intensive critical thinking for the sake of optimization at the lowest level I can get to.
In Javascript, the "evil floating point bit hack" would be the equivalent of writing values into an ArrayBuffer via a Float32Array, then reading that same ArrayBuffer back through an Int32Array. The data is the same, but it doesn't represent the same thing.
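The same reinterpretation in C, sketched with memcpy (the strictly legal spelling of what the Quake code's pointer cast does):

    #include <stdint.h>
    #include <string.h>

    /* return the raw bit pattern of a float as a 32-bit integer */
    uint32_t float_bits(float f) {
        uint32_t i;
        memcpy(&i, &f, sizeof i); /* same 32 bits, different interpretation */
        return i;
    }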
Lol as an Electrical Engineer, I'm so proud of the IEEE standards like 754. I really love the work that was done before my time..
The word mantissa triggers me. I did not have a fun time calculating floating point numbers by hand.
its just the best
its just the worst
mantissa
mantissa
mantissa
@@vibaj16 Noooooooooo my safe space has been violated!
You don't need math for programming... YEA
oops ;)
I practice data structures weekly and have yet to encounter a non-linear data structure in web dev.
I still think one of the most elegant inventions in computer science is two's complement. It's just badass.
True. But I still get OCD thinking about the fact that there's exactly one more negative value than there are positive values. This happens because `0` can't be "centered": it must share the same state-space as all the other values, and binary (by definition) only provides an even count of states. This doesn't happen in balanced ternary, because the count is odd, so `0` can be "centered".
@Rudxain You're right, but it is cool that arithmetic just works. Perhaps it's an argument for zero being positive.
@@stevelk1329 That reminds me of Dual Numbers, where epsilon is the only non-zero number that satisfies `x^2 = 0`
If I remember correctly, a floating point operation was a single FPU instruction, but it took A LOT of CPU clocks to complete. Quake (the first version) did one FPU division at the same time the CPU was doing 16 multiplications and 16 additions. The 486 was way slower at floating point division, which is why it was really slow rendering Quake graphics. Square root is an even slower operation than division in the FPU. One could argue that the original Quake was running on two cores on a single-core CPU (the CPU and FPU could compute things in parallel on the Pentium).
The fast inverse square root algorithm does the division + square root using one single-bit shift, three multiplications and two additions (an addition is identical to a subtraction with the sign reversed), so it was about 5 times faster than a single division on the FPU, never mind the square root operation.
I understand it can be difficult for programmers, but for someone who is math-educated, it is totally possible to come up with this algorithm.
The idea of working with logs comes from the way floats are represented in memory, and the fact that exponents and logs simplify.
The computation steps are a no-brainer for the trained mind.
log(1 + x) has a famous approximation around zero.
The idea of offsetting the line to minimise the mean error (or another error) is a natural optimisation question that arises.
Newton's method is smart but well known in numerical analysis.
I really think it could be posed as an exercise to some students, and several would come up with the same algorithm.
You should see the algorithms used in computing the probability densities of some very common probabilistic laws; they are full of neat tricks and, in my opinion, more impressive!
Agreed. Just pick up a real analysis textbook and have a day trip with all kinds of funky approximations.
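For reference, the Newton's method step in question, applied to f(y) = 1/y^2 - x (whose positive root is y = 1/sqrt(x)): the update works out to y' = y * (1.5 - 0.5 * x * y * y), which is exactly the last line of the Quake routine.

    /* one Newton-Raphson refinement of an estimate y of 1/sqrt(x);
       each iteration roughly doubles the number of correct bits */
    float rsqrt_newton_step(float x, float y) {
        return y * (1.5f - 0.5f * x * y * y);
    }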
When the time comes for me to conduct interviews, if I hear someone mention IEEE 754 or mantissa... instant hire. Converting floating point numbers by hand using IEEE 754 during one of my classes has scarred me.
My most fun school project was making a recommendation system. I had to unwrap the math myself. I just wish I'll do something like that for a living one day
What language did you make it in?
@@dangdrjay3011 Python. It's definitely the wrong choice but I don't know how to make a dynamically linked library.
@@theodorealenas3171you can check it out on internet lol. Cherno has tutorial for c++ if im not mistaken
@@shanewalsch I looked up sources but didn't find any. Okay. That's cool I guess.
@@theodorealenas3171 wdym by sources?
Matrix broke. I was rewatching the Quake video then switched to Primeagen's stream to see the video again!
So good
I'm pretty sure I watched the original video like 2 years ago. I did not remember one bit of the explanation except there was some statistical trickery.
I love this video; I watched it 2-3 times before and then again through Prime.
I learnt this stuff in my A-levels and degree, and it still amazes me how people came up with it. From IEEE 754 to the algorithm as a whole, it makes me want to be a better engineer.
lmao can't believe this is only a tiny tiny fraction of the program, imagine all the shenanigans that went in 😂
I was lurking while working during this stream. Thank you for putting it up here so I could actually give it the attention it deserves!
28:05 This is a humbling point. I majored in math for my Bachelor's, and just following the math required some thought, never mind actually having the mathematical intuition to realize from the starting point that this path would lead anywhere. The forgotten developer at id Software is such a beast.
I’ve never used JS TS or any of the typically web-oriented technologies. Thankfully right out of college I got into embedded and real time dev. I enjoy the embedded / rt world a lot.
Even LaTeX uses fixed-point fractions, because they are independent of the underlying hardware, so every document looks the same *everywhere*.
Floating point numbers are completely standardized in IEEE 754 and have also been independent of hardware since the late '80s.
I wish CSS and SVG did this too. For SVG, it would make more sense if calculations were algebraic, as that would guarantee "infinite" (not "arbitrary") precision
I’ve read a classic book of Andre Lamothe on gamedev when I was in school and he described how to use fixed point numbers instead of floating point numbers, because they are much faster.
In the next edition he removed these chapters because CPUs became fast enough with floats 😀
They're not faster, but they're fast enough to forgo the extra complication of using fixed-point math.
They are also more accurate for calculations where the order of magnitude is constant (or approximately constant)
Oh yeah, I'm doing a two-year degree, what we call a diploma here, and I did study a lot of that stuff. Didn't do anything real with it, though. Nice to know it can be used for such things. It was taught to us in a way that gave the feeling it's only ever used if you're creating an OS or programming a refrigerator.
The size of short/int/long is platform dependent. I remember reading it was done like this so that if you use an int, it's relatively fast for numeric operations on that platform.
yes, you should always use size_t (cpp) or usize for the bestest. no extra operations needed. the entire atomic memory region is used for computation :)
Also, as someone working in embedded audio processing on an audio codec: we are also focused on single bits and on representing/storing groups of numbers as efficiently as possible; look up run-length encoding as an entry point if you're interested. Understanding floating point operations and their influence on precision is also just painful, and something we deal with regularly. FYI, the same set of floating point operations in the same order can give different results depending on compiler/settings; always fun times. And these differences can get quite big in certain audio filters (e.g. IIR filters, which basically do recursive math), so certifying integrations of audio processing libraries, or making sure something is not regressing, can be time-consuming ;)
19:29
That’s a standard multiplication by the identity ( 1 ).
He just wrote 1 as 2^23 divided by 2^23. Because M was divided by 2^23, you now have the same denominator, which results in that 1/2^23 factor. M*1 = M, so it's still M / 2^23, but because we also multiplied by the identity, we now have a 2^23 in the numerator as well.
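Written out, with I = E * 2^23 + M being the integer reading of the float's bits:

    E + M / 2^23 = (E * 2^23) / 2^23 + M / 2^23 = (E * 2^23 + M) / 2^23 = I / 2^23

so multiplying E by that cleverly chosen form of 1 is what lets the exponent and mantissa collapse into the single integer I.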
"I saved a bit" The Y2K bug would like a word...
14:55 "I saved a bit" highlight of their career 😆
when saving a bit is saving a lot
If you’ve ever made games on 8-bit computers, you know how important saving a bit is 😂.
There was a reason for not using Rust-style integer types. int was the architecture's native size, so 16 bits on a 16-bit architecture, 32 bits on a 32-bit architecture... all the others are sized relative to int. Then, when 64-bit architectures came, they felt that not knowing the size of your data types could be problematic, so they stuck with the 32-bit types. It's pretty shit 😅
I wanted to make a correction about when Quake was released and CPU speeds of the time.
Quake was released in mid-1996 with a minimum CPU requirement of a 75 MHz Pentium.
Some have gotten it to run on slightly slower CPUs.
This is Quake 3, not the first Quake.
@@SealedawayOMG. I totally missed that.
I remember the first time I saw the thumbnail for that video and thought "the inverse of a square root is just squaring.. how did he improve multiplication?"
Computer scientists absolutely abusing the term inverse here.
As a physics and computer science major, I was pretty much forced to learn how to build a computer from scratch.
And the log trick is just a really basic Taylor expansion. The kind of thing you use 10x per assignment.
All I'd really need is electrical engineering/chemistry and I could make my own transistor.
Back then, they weren't programmers. They were COMPUTER SCIENTISTS.
8:50 Ah yes, I remember freshman year sitting in Computer Org, converting floating point to decimal by hand on paper.
Or the next year in assembly class writing my own “multiply” and “divide” addition/subtraction loops.
Actually in Assembly class I wrote out Multiply/Divide/ForLoop/BubbleSort/ETC in separate txt files and just copied them in any time I needed. Helped a lot.
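A sketch of the kind of multiply loop that exercise boils down to (the classic shift-and-add software multiply, shown here for unsigned 32-bit operands):

    #include <stdint.h>

    /* multiply using only shifts and adds */
    uint32_t soft_mul(uint32_t a, uint32_t b) {
        uint32_t result = 0;
        while (b) {
            if (b & 1)        /* low bit of b set?          */
                result += a;  /* add this shifted copy of a */
            a <<= 1;          /* a doubles each round       */
            b >>= 1;          /* consume one bit of b       */
        }
        return result;
    }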
About the naming of integers: it's funny because there's a C standard header with integer types that actually use numbers to specify the size.
EDIT: rereading my comment, I made a couple of mistakes. long isn't necessarily half the processor word, int is; long, depending on the compiler, can be half or the whole word, aka sometimes sizeof(long) == sizeof(int) and sometimes sizeof(long) == sizeof(long long).
6:50 In C++ you have it by names to simplify: if you use "long" etc., you don't care about the size, just the spec (long should be half the processor word, long long the whole processor word). That means when scaling between architectures you're not limited by fixed sizes. Don't forget C++ is compiled for microprocessors too. If you compiled for a 16-bit microprocessor and used int32_t, you would need slow multi-word calculations, as the processor only natively supports up to 16-bit values. As such, on Arduino sizeof(int) == 2 and sizeof(long) == sizeof(long long) == 4 (note that on x86, sizeof(long) == sizeof(int) == 4 and sizeof(long long) == 8). This guarantees the flexibility of the code between architectures, especially for libraries.
If, on the other hand, the exact size of the variable matters, inttypes.h provides handy types like int8_t, int16_t... up to int64_t, and on some compilers even a 128-bit type. This is good because you know the exact size of your variables, but on the other hand it is less scalable between processors with different word sizes.
I'm a noob to coding, so I'm not sure I got the explanation right. Do I understand correctly that on the line with the first comment we take the bits representing the float value of y and tell the program to interpret those bits not as a float, but as a long?
yes
very good! exactly!
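For reference, the routine under discussion, as it appears in the released Quake III Arena source (q_math.c, expletive softened); the line asked about is the one labeled "evil floating point bit level hacking":

    float Q_rsqrt( float number )
    {
        long i;
        float x2, y;
        const float threehalfs = 1.5F;

        x2 = number * 0.5F;
        y  = number;
        i  = * ( long * ) &y;                       // evil floating point bit level hacking
        i  = 0x5f3759df - ( i >> 1 );               // what the f---?
        y  = * ( float * ) &i;
        y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
    //  y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed

        return y;
    }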
About that 'magic' and thinking about how stuff happens: there is quite a lot of it if you're writing some extremely low-level code, which usually means an HDL (probably Verilog), since the instructions involved are usually already quite complex.
I love some of these comments in the feed...
one guy thinks it leads to security issues, and another hates it when people worship "shitty code like this".
Hate to be that guy, but just because you don't understand something doesn't mean it's inherently dangerous or a bad idea.
It's an architecture-specific (if I remember correctly) optimization that was a huge gain over the "naive approach", and while confusing, the alternative would have been losing a ton of performance. This "trick" is basically useless today, as it's usually replicated behind the scenes for you via compiler optimizations or CPU opcodes. SOMETIMES, making optimizations means making things more confusing and "overcomplicating" them (although we always try to avoid that when possible).
It’s so funny. My background before becoming a dev was in Biology and chemistry. After studying things from animals to cells to atoms, you get used to drilling down infinitely because there is always more information.
There's a "documentary" a dude made trying to find out the author of this algorithm. He emailed Carmack & others, he traced their sources until an older source. I can't remember who it was or the name of the video. It was a 2 part documentary.
Primeagen just pulled out a plate of cookies from the void, in the middle of someone explaining a super complex algorithm.
I would never have discovered the log hack for making an initial guess for the fast inverse square root algorithm on my own, even given a year to find it. But I did actually discover Newton approximation without having learned about it, in an assignment I did in college. We were learning basic programming and had an assignment to raise one number to the power of another. The professor of course intended us to accept integer values only, but the interface we were given was a double to the power of a double, and dumb me was of course not in the lecture when the assignment was given, so I solved it for decimal powers as well.
Last time I thought about floats was last week, because I wanted to determine how likely it is that my code will have a problem with bad luck on random numbers. If you add a very small random number between 0 and 1 to a relatively big float, you don't actually add that many random bits: the exponent stays the same after the addition, and the mantissa only changes in its last few bits, because the leading bits still represent the much bigger old value. So the problem was: how likely is it that two different random numbers, once added to the bigger number, produce the same new number just because they were too close to each other?
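That effect is easy to demonstrate (a toy example; 2^24 is where a float's spacing between neighbors reaches 2.0):

```c
#include <stdio.h>

int main(void) {
    float big = 16777216.0f;    /* 2^24: neighboring floats are 2.0 apart */
    float r1 = big + 0.3f;      /* two clearly different "random"...      */
    float r2 = big + 0.6f;      /* ...increments                          */
    printf("%d\n", r1 == r2);   /* prints 1: both rounded back to 2^24    */
    return 0;
}
```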
Long long and types like that are not named as sizes because those languages typically use the register sizes, and that used to change a lot more
3:33 Sure, Prime, it was "farm animal" robots
sh....
obviously C is lacking unsigned float
I love how the entire thing gets derailed by the mustache comment.
7:45 actually, these types in C have only a minimum length: char is at least 8 bits, short at least 16, int also at least 16, then long is at least 32 bits and long long at least 64 bits, so the names make sense... That is why long is a different size on different toolchains.
My university lecturer spent like half a lecture on this topic (including storytelling about his years of being young and playing video games).
Watching it the second time yielded the same result.
I still don’t understand it.
yes... still confused :)
I watched it at least 5 times and it's still confusing ^^
Prime second channel finally
welcome
Wait, how the hell did you do fast squirt inverse root again?
I do not stand for this stache slander
It's funny though :)
This is a mustache safe space for all.
14:35 😂Some Dude: if you drop the 1! I saved a bit😂🎉
I thought the whole int, long, char, byte, long long thing was for allowing architecture-dependent sizes. In other words, an int on one platform wasn't necessarily the same size as an int on another platform.
You’re correct on that one.
I implemented this in the Marlin 3D printer firmware once to see if it was close enough (and a hell of a lot faster) for a delta printer.
You could hear the approximation as a buzz and vibration. And the error was cumulative, so ultimately it didn't work.
Isn’t the magic number in the fast inverse square root function derived using the Newtonian method? Or am I misunderstanding?
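Slight mix-up, if I'm reading the history right: the magic number comes from the log2 bit trick, and Newton's method is the separate refinement step applied afterwards. For f(y) = 1/y^2 - x, a single Newton iteration works out to

    y_next = y * (3/2 - (x/2) * y * y)

which is the y = y * (threehalfs - (x2 * y * y)) line in the code. The magic-number guess is just good enough that one iteration suffices.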
6:30 for me it's that there's char, short, int, long, and long long
and none of them have a well-defined god damn size
it's only ever "well, it'll be _at least_ this size" (char: ≥8 bits, short: ≥16, int: ≥16, long: ≥32, long long: ≥64)
this is especially true for int and long. _sometimes_ int will be 16 bits instead of 32. and _sometimes_ long will be 64 bits instead of 32.
as far as I'm aware, the others are more consistent.
I'm so glad they eventually added fixed-width integer types (C++11)
"let's look at paul allen's mustache" ahaha
Since 1999, C does just have numbers, with int8_t up to int64_t and a bunch of other variants depending on what exactly you want.
this blows my mind every time I see it, it's at least the 3rd time :D
I think short was in C to make it compatible with 8-bit microprocessors. We don't use short because on the standard GCC targets, int already does the job. A 16-bit microprocessor can perform a 16-bit addition in one cycle; a 32-bit addition requires 2 cycles, which is long, and a 64-bit one requires 4 cycles, which is long long.
I wonder why the code uses weird pointer casting instead of a union?
In C, the 'weird pointer casting' is actually idiomatic. It's not too bad when you get used to it.
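For completeness, a sketch of the alternatives (the pointer cast in the original technically violates C's strict-aliasing rule, even though it worked fine on the compilers of the day):

```c
#include <stdint.h>
#include <string.h>

/* Union punning: explicitly allowed in C (reading a member other than
   the one last written), though it's undefined behavior in C++. */
static uint32_t bits_via_union(float f) {
    union { float f; uint32_t u; } pun;
    pun.f = f;
    return pun.u;
}

/* memcpy: strictly conforming in both C and C++; compilers optimize it
   down to a plain register move, so there's no runtime cost. */
static uint32_t bits_via_memcpy(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```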
@2:03, hehe, you see this code and the first thing you notice and focus on is the curse word? 🤣
A few words about int sizes in C++ (typical values on common platforms):
* sizeof (short int) is 2
* sizeof (int) can be 2 or 4 (arch dependent)
* sizeof (long int) can be 4 or 8 (compiler and arch dependent)
* sizeof (long long int) is 8
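Easy to check on whatever machine you're on (typical output on 64-bit Linux is 2 4 8 8; on 64-bit Windows, long stays at 4):

```c
#include <stdio.h>

int main(void) {
    /* %zu is the format specifier for size_t, which sizeof yields. */
    printf("%zu %zu %zu %zu\n",
           sizeof(short), sizeof(int),
           sizeof(long), sizeof(long long));
    return 0;
}
```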
Long longs are great. If you want to add the length of a long to a long long, it becomes a long long long, and if you want to add a long long to a long long it becomes a long long long long long.
This is a perfectly obvious and natural thing to happen.
it's like a const * to a const... except it grows forever
"We add 2 because it's 5" had me raving in the living room like a maniac 🤣
Yeah, this one is golden. I remember I was playing Q3 Arena. Approximation? It was perfect. It was FAST... super FAST (Pentium MMX 200MHz). I even... did not know how fast it was, that is how fast it was. There was none of that Fraps stuff back then, it was just FAST.
With C, int is confusing because it says it's either 2 or 4. You use short for it to always be 2 and long for it to always be 4. I still don't understand how it can be 2 or 4 at the same time.
long isn't always 4 and short isn't always 2
char - at least 8 bits, must be the smallest object size supported (almost always exactly 8 bits in practice)
short - at least as large as char
int - at least as large as short
long - at least as large as int
long long - at least as large as long
so the relative ordering alone would allow strange sizes, but the standard also mandates minimum ranges (short/int at least 16 bits, long at least 32, long long at least 64), so long long can never be smaller than 64 bits
it isn't different sizes at the same time though, the compiler just gets to pick whatever size it wants it to be
It's not 2 or 4 at the same time. The size of int is implementation-defined, so whichever implementation of the C language you use decides whether it's 2 or 4 (or sometimes even 3 on a couple old exotic systems). For most personal computers these days there are pretty solid standards, so an int is pretty much always 4 bytes. But for other kinds of systems it's really helpful to have the platform itself pick out whichever size is fastest/most efficient on it. If the range of numbers is important to specify, the C standard library has typedefs for specific sizes in stdint.h to make sure you always get exactly 1, 2, 4, or 8 byte integers. It's a bit outdated and not exactly intuitive for beginners, but there are good reasons for why it works like this.
09:13 that idea isn't terrible, it's called fixed-point arithmetic and it was used for ages before CPUs got dedicated floating-point units. Try doing floating-point arithmetic on a Game Boy or a PlayStation 1. Yup, that's right: a PS1 performed all arithmetic using fixed point. Fixed point is incredibly fast, since it's just integer-based operations. It's severely limiting, but it had (and still has) its use cases. So I definitely wouldn't call it terrible.
The FreeType library also uses exclusively fixed point math. It's used by many programs and platforms to render fonts.
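For anyone who hasn't seen it, a minimal Q16.16 fixed-point sketch (hypothetical helper names; FreeType's own types differ), where the low 16 bits of an int32_t hold the fraction:

```c
#include <stdint.h>

typedef int32_t fix16;              /* Q16.16: 16 integer bits, 16 fraction bits */
#define FIX16_ONE (1 << 16)         /* the value 1.0 in this format */

static fix16 fix16_from_int(int x)   { return (fix16)(x << 16); }
static float fix16_to_float(fix16 x) { return (float)x / FIX16_ONE; }

/* Multiply: widen to 64 bits so the intermediate product can't overflow,
   then shift the extra 16 fraction bits back out. */
static fix16 fix16_mul(fix16 a, fix16 b) {
    return (fix16)(((int64_t)a * b) >> 16);
}

/* Divide: pre-shift the numerator so the quotient keeps its fraction bits.
   Assumes b != 0. */
static fix16 fix16_div(fix16 a, fix16 b) {
    return (fix16)(((int64_t)a << 16) / b);
}
```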
So guys, how do we do this without squrting?
Carmack didn't come up with this solution; it's based on a paper by William Kahan.
I would not be surprised if some of that code was pulled from a book or something at the time. I mean, I could be wrong. But what I've learned about the programming world is that most of the mathematical innovation that gets turned into code is usually at least 10 years old by the time it's used. Like some actual Newton-level person focused all their time on a specific thing, published it in a book, and then was lost to time. And then some bro-skis from the 90s read it in a book and used it. And then they probably continued to read these books for more info. It also requires a lot of skill and understanding, but all the hard math that was shown in the video was most likely done by some other guy who wasn't a game developer.
Longest type definition I've ever written in C: "unsigned long long int", for some reason.
Topics like this are always a humbling experience for remembering not all of us are pea brains. Maybe one day I'll stop using VS Code long enough to understand this.
The Seven bridges was Euler, not Gauss :)
By the way, C has proper number types like int32_t, uint64_t and so on. All of that stuff is defined in stdint.h
yeah, but that came WAY after long long
@@ThePrimeTimeagen I love when standard C types are not so standard when compiling for different CPUs.
Just like with AVR: float is 4 bytes, but int is only 2
Yep, char, short, int, and long are different on different platforms. C wants the code to be portable and stay performant. You don't want to assume the size of words and registers; otherwise only one platform benefits from using the correct types, and all the others get crippled performance-wise and use way more memory than necessary.
@@ThePrimeTimeagen you mean it took a long time or a long long time for that?
@@SoKette yep came here to say this. That's why there's fast and least types.
It does mean you have to know what you're deploying to if you get messy with testing for overflow or things like this (and the compiler can remove your tests due to UB for extra fun and games).
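A sketch of those, in case anyone hasn't met them (the sizes shown are what you'd typically see on x86-64 Linux):

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    int_least16_t a = 1234;  /* smallest type with at least 16 bits       */
    int_fast16_t  b = 1234;  /* whatever >=16-bit type is fastest here    */

    /* Typically prints "2 8" on x86-64 Linux: "least" stays small, while
       "fast" is allowed to widen to the native register size. */
    printf("%zu %zu\n", sizeof a, sizeof b);
    return 0;
}
```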
Ah your second channel finally found its way into my algorithm. Congrats 🎉🎉
those id software guys were geniuses
8:55 Just took an assembly course and it’s not really that hard to implement multiplication in assembly 😅 just loop addi instructions
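Right, and the next step up from the naive addi loop is shift-and-add: one addition per set bit instead of one per unit of b (a sketch in C rather than assembly):

```c
/* Shift-and-add multiplication: O(bit count) additions instead of O(b). */
unsigned mul_shift_add(unsigned a, unsigned b) {
    unsigned result = 0;
    while (b) {
        if (b & 1)
            result += a;    /* current bit of b is set: add the shifted a */
        a <<= 1;            /* double a for the next bit position         */
        b >>= 1;            /* move on to the next bit of b               */
    }
    return result;
}
```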
Square root is a single instruction in modern hardware, and it's pipelined, so throughput is high. It's a unary operator with fast-converging formulas.
I still wonder about that 5 to this day
John Carmack said on the Lex Fridman podcast that he never actually did this and it's been misattributed to him.
After watching this entire video, I can confirm I remember nothing.
27:03 I'm pretty sure you mixed up Gauss and Euler
That log trick is really old, from the days they had to do this stuff by hand.
That box art is $$
i agree, LETS GO CANDY