this is completely incomprehensible. you don't actually understand how to teach. you're rambling and scribbling things that have literally nothing to do with the data you're presenting. everything in a lesson should help to understand that lesson. this is like explaining something in a loud cafe on a napkin, except you've recorded it. your sheer incompetence at your chosen occupation is admirable.
Many years ago when designing the Sheerpower programming language for business applications, we spent a ton of money (over $100K) on this exact problem. We ended up with a data type called "real" with integer and fractional components located in their own memory locations. The hard part was making the runtime performance fast. Once done, it has been enjoyable never worrying about all of the FP pitfalls that you very well explained. In fact, this is the best explanation and clarity I have ever seen! Thank you.
@@simondev758 Fixed point using separate memory locations to speed up things like "convert to an integer" where one just clears the fraction part memory location... no calculations required.
This sounds like "integer" format (with number of bits twice the number of bits in your word length) scaled by 2^(-n) where n is the word length. Why not use double-word integers?
In the 70's and 80's we called floating point computer math: "floating point approximation". Someone in marketing dropped the word "approximation" sometime over the years.
When we designed a language for PLC use in 1984, the language didn't have "Compare Equal" for the REAL (floating point type), but a "Compare Tolerance", with an explicit tolerance argument provided (as described in video). Many customers were confused at first, until they realized that "measurements" are not exact and need to be treated as approximations everywhere. I was young and inexperienced at the time, but the boss were old school veteran in analog computers, sensor technology and much more, so he insisted "no compare equals for REALs. It is not possible!".
@@niclashShould typically be two tolerances, one relative and one absolute. The fun part with subnormals is they have variable relative precision, but their absolute precision remains the minimum available, so with both tolerance checks they don't need special handling.
My favorite floating point hack is that 7/3 - 4/3 - 1 will always give you machine epsilon. I don't quite remember how, but I found a comment in the depths of stackoverflow that claimed it worked regardless of programming language, OS and computer. As long as it's using the IEEE standard it works.
Oh, that makes sense! A third is like the perfect middle step between powers of 2, so the mantissa is all ones. But 7/3 has a one greater exponent than 4/3, so it's "missing" a decimal digit that's presumably rounded up. The difference between them cancels out everything except 1 and the least significant digit of 4/3, making it 1 + ulp. What a cool trick.
It's called rounding. The rounding mode in IEEE defaults to round to nearest even. So your trick only works in some rounding modes. Meaning your condition of IEEE is incorrect. And not understanding the trick is in rounding is a blunder.
I think this is why Sun did that big push to evangelize interval arithmetic. It basically covered for all the imprecision of floats by simply treating them as fuzzy intervals. Things like == comparisons are now interval overlap checks and operations that make the error worse actually make the intervals grow. You basically avoid a lot of these headaches by just assuming that error will always be there and developing your arithmetic around that assumption.
Which incidentally is only of the few non pitfall way of using floats. Keeping an error counter and controlling the interval manually. It's a PITA doing it in C thou
...or do it in integer & know the "error" is zero. The problem with floating-point fuzzy scheme is that the error builds with the number of chained computations, which the math doesn't know about. Of course, if you stack irrational (such as trig) computations, this error appears no matter what the number representation scheme. Floating point disease was so bad because as soon as it was introduced, everybody "had to have it" & it was boasting point for computer manufacturers. It was so much that computers had ONLY that format, even when working with integers. Early desktop HP computer calculated 2^2 = 3 (it used logs to compute exponentials).
@@coopergates9680 Yes they did. HP made a desktop computer in the 1970's that had ONLY floating point format. (When loop counters & other integers were needed, the computer internally TRUNCATED to integer. That's what caused the "2^3 = 7" problem. (I had to add a "+ 0.5" to any exponentiation calculation to get 2^n iterations of the loop.) I guess this "simplified" the machine as there was only ONE type of variable, of a fixed size. Remember in those days a lot of the math was done by dedicated HARDWARE. It is simpler to have fixed-size fields in memory. Most of the calculators also used this format, not changeable. "JavaScript" in 1970 had something to do with coffee & writing, & nothing else.
My "favorite" thing about floats is that float operations are nonassociatve. That is, (a + b) + c need not equal a + (b + c), and same for multiplication.
My rule of thumb with programming using floating point numbers to just assume that two floating point numbers are never equal. The only time a FP is equal to another FP is when they were obtained by copying. FPs can be compared as "less than" or "greater than" as a sort of "inside/outside" check, with "equals" case being implicitly bundled with either one of those two.
It's good never to rely on them being equal, but it doesn't solve all your problems. Like, 50 billion and one is bigger than 50 billion, but if X=50,000,000,000 and Y=50,000,000,001, then Y>X will return false.
@@dylangergutierrez Hence it's a percent error issue, it's more like abs(X/Y - 1.0) < 0.000001. We all know that bug in old Minecraft when the player is far from the origin lol
@@dylangergutierrez 50B + 1 is larger than 50B, if the former can be stored. Otherwise the result of the addition is 50B, and you would be comparing 50B with 50B.
My boss recently told me a story of a game he once worked on. If you left it running for about 28 hours or so, all kinds of weird shit would start happening. Like the rendering would break completely, certain things would stop moving etc. The reason was that certain things in the game kept some kind of on-going timer. This was usually a timer of accumulated delta times and in the range of seconds. Turns out that after the amount of time mentioned above, these accumulated timers got so big that a delta time of 1/60 was no longer large enough to affect them in any way, thus they froze entirely. It's basically one of the floating point issues you mentioned in the video. This specific bug never got a proper fix, just a workaround, which was to simply pause the game on inactivity.
Incremental stuff should always use a char (byte), short, int, or long, and every once in a while it's fine to convert that millisecond or nanosecond figure into a float of seconds. Given that a double has more significant figures than a 32-bit int, if a timer goes far enough to lose this much resolution in a double, it's ticking stupidly far anyway and should be reset or redesigned.
@@coopergates9680 Yeah, switching to integers and using millisecond delta times in general is one of the proposals my boss had to fix this problem for good. Just tedious and dangerous to do in an already existing game, so it's something we'll likely be doing for future games.
Since the time interval is constant (1/60) you should use fix point instead of floating point: use integers and count the number of 1/60ths seconds, i.e. the least significant bit is interpreted as 1/60 of a second. With a 32 unsigned integer you can then run it for 828 days (add more bits if needed!)
It kind of makes sense that floating point values don’t play well with equality, because the real numbers are infinitely divisible. In the real world, when you’re comparing things, you are always working to a certain degree of precision. The only way for two objects to be the exact same length would be for them to have the same number of atoms, which is an integer comparison.
If anyone is having problems with floating point precision errors, consider switching to fixed-point. 32-bit fixed-point might not give enough precision for most problem spaces, but 64-bit fixed-point would and is an easier data structure to deal with as a lot of the precision errors become predictable.
@@SomeStrangeMan All well and good and practical, but I hate when people use "epsilon" (like, from analysis) to mean "really small number". The point of epsilon in analysis is that it's the _arbitrarily_ small number.
@@somestrangescotsman Yeah, but the _name_ of it is obviously a mistaken reference to epsilon from analysis. And epsilon in analysis really means "the smallest number you can possibly imagine, except for zero", sort of like an inverse infinity. It doesn't have an "actual value" that you could in principle write down, whereas the Matlab eps' whole point is to be an actual value, so it's really inappropriate to name one after the other.
Years ago, I recall reading the specs for a Java3D library and I think they had a 256bit fixed point library. IIRC, you could represent Planck lengths in the same model as the observable universe. Though I imagine there would be performance costs for that, with 32 byte numbers. A spacetime coordinate system would use 128 bytes for 3 space and 1 time coordinates. Or even just homogeneous space coordinates.
@@Islacrusez I have notes and stuff jotted down, but I kinda go with what I get excited about at any given time. I was happy to dive back into graphics a bit the last few months.
Unfortunate you haven't made this video on May. I've been learning this for my final examination in university. It's always better to watch someone explain it this way than reading a bunch of papers. Keep up the good work, currently I've seen all of your videos.
12:27 A perfect example of this in effect is Minecraft (specifically something called the farlands on with wiki), back before there was a world border. I'd encourage to go check it out! It's really interesting and works wonderfully to visualise these floating point errors in action.
I don't know if you have ever been a lecturer, but I can tell you you are really good, and all people who have had you around are very lucky. Great coverage, great dissecting of the subject, extremely well presented. Thank you, subscribed just from one video. :)
Never been a lecturer, but I've spent a lot of time as a mentor because apparently I'm good at that. These videos are a great way for me to work on collecting my thoughts into a more cohesive form and working on my presentation skills.
The fundamental problem with FP arithmetic is that Real numbers are not natural fit for binary computers. There's no way to directly map values with moving decimal point in a register, since the register has fixed length, without accumulating large errors. That leaves you with the fixed point format option, where you have to choose between limited range or limited precision, but not both. The convoluted way FP arithmetic is implemented in the binary logic constraints makes it possible to have both cases (range and precision), at a cost of added complexity and a thick book of rules/limitations -- the IEEE-754 standard -- that historically made high-perf FP hardware implementation even more expensive.
The fundamental problem with FP arithmetic is that Real numbers are not natural fit for computers. It doesn't matter what base you're working in. 1/3 is unrepresentable in base 10, since it's 3.33... repeating. You will run into this issue at some point. You have a countably finite space in any case and you need to cram in an uncountable infinity. There are more reals than there are integers, to cover *any* subspace exactly is impossible. Even if you had to exactly represent the space [0.00001, 0.000011] you would undoubtedly have to use either FP or fixed point and in either case lose a lot of precision. What FP does do is provide acceptable precision in the vast majority of cases through the observation that small numbers we work with often have smaller differences between them.
It's fun to think that floating point units were so complex the first processors didn't even have them and you'd have to use a coprocessor that was often larger than the processor itself (like the intel 8087 that had almost double the amount of transistors the 8086 had) and today we have GPUs that have thousands of FPUs in a single die
@@smlgdat some point we are going to have to ditch digital computers and use analog voltages. Just look how terribly inefficient is Machine learning on GPUs.
real numbers are, by definition, not fit for COMPUTERS. the definition of computation and computational problems requires that ALL inputs must be representable with a finite sequence of symbols. otherwise, it literally is not computation. the real number set (or any continuous subset of it) is not entirely representable with finite symbols. however, nothing stops us from picking a few real numbers and sticking some labels on them. and that's what floats are. (smartly ordered) labels for a (smartly picked) finite subset of the real numbers. in case you're wondering "hey, but what if we could have infinite inputs?", that's called hypercomputation. good luck with that.
Hint for business programmers: Use integers if you're dealing with money. (Some languages support a "numeric" data type, which is nothing but an integer with an implied decimal point.) But avoid floating point for monetary values!
A minor niggle: What is described as a “mantissa” here is really a significand, one which is linear within the range allowed for a given exponent value. Mantissas, as in log tables, are logarithmic. If the exponent in a floating point format were represented as a binary fixed point (so that the usual significand would no longer be needed), the fractional part of the exponent would truly be the mantissa (and in the language of log and antilog tables, the integer part of the exponent would be called the “characteristic”). (Watch out for negative exponents, since the mantissa still has a positive sense in log tables. For M=0.113943, C = [−1, 0, 1, 2], 10^(C+M) yields [0.13, 1.3, 13, 130].)
Yes, it really is the significand, but in adopting mathematical techniques into computer engineering, the word mantissa was used, and became the defined word.
Small correction: a number like 1E-9 usually means 1*10^-9 or possibly 1*2^-9 (I'm not 100% sure on rhat one) but 1*e^-9 is something different entirely (e=2.718... is Euler's number)
The capitalization of the "E" doesn't matter in floating-point literals. "1E-9" and "1e-9" both mean "one times ten to the minus ninth power" (though the capital version is preferred to avoid confusion). Euler's number is represented in an entirely different way, depending on the exact programing language in question (e.g. "M_E" for C and most of its descendants).
I really appreciated this video, I have thought about it on and off during the last week, thanks for quality content. I'm always excited for your new videos. Keep it up Simon!
If you can use integers instead of floats without overcomplicating the program, do it. For example, the currency is better handled by integers. Just store 995 cents instead of 9.95f dollars, and convert to dollars only to interact with user. That's the best way to avoid all these issues. Also, one issue not mentioned in the video is that these errors like 0.01f + 0.02f accumulate, if you do thousands or millions of operations on a floating point variable the error may become quite substantial. Again, use integers instead, if it's feasible. I know there are libraries that help to deal with fractions. It's not a bad alternative. Just keep in mind that integers are native type, and calculations on integers are much much faster than on any non-native type.
The problem with Floating Point representation (IEEE 754) is that we're basically trying to force a base 10 number into a base 2 representation. As such, compromises have to be made in order to reduce both computational and memory complexity. Another way of looking at it is using scientific notation: You can describe pretty much any rational number through scientific notation, but the number of significant figures generally increases both complexities on a linear scale. You can bound the complexity by limiting the significant figures, but this leads to a loss of information. Once we had excess memory and computational resources, things like Java's and SQL's decimal for more accurate but memory and computationally more expensive representation.
Whenever I can, I use integers instead of floating point. I just pick a smallest unit, e.g. 1mm, and count how many of those I have in all my measurements. If you work in 64 bit integers, you have enough range to cover a whole lot that way.
That's what KiCad (tool for designing PCBs etc) does. Uses 32 bit integers with nanometer increments and you get a reasonable upper limit of just over 2m for the PCB size.
@@simondev758 Yes, well, it *is* fixedpoint. That is all fixed point is: Choosing a minimal unit that is some specific fraction of your base unit and counting how many such fractions you have.
The issue is the propagation of error. That works if you have to do a few (in computer terms) operations, but if you have to do many, like in the simulation it's one of the worse approaches. Each multiplication, for example, produces a loss of precision "identical" to truncate. It's simply not viable for problems of modern scale.
@@jaimeduncan6167 It is actually exact, that is the whole point of using fixed point. Floating point has a lot of precision problems, but when you are counting a specific number of your minimal units, you are simply counting an exact number of those units. There is no error and thus no error propagation. You have to accept that whatever you are counting is quantized, if you are doing a flight sim, your planes will be snapped to a 1mm grid (or whatever minimal unit you decide to use). As long as that is fine, you have no error and no error propagation. With floating point you do get an issue of error propagation which makes many operations much more complicated. Like the video says, you can't directly compare two floating point numbers for equality. If you are adding an array of numbers, you have to sort them by exponent and add the smallest ones first, before adding larger ones. If you don't, adding a number with an exponent 53 higher than a smaller number will make the smaller number vanish with no effect (the whole mantissa is too small to have an impact). In a summation of many numbers, that small number could have made a contribution if it had been added to an only slightly larger number first and thus been propagated up to the big numbers. This means that the order of addition is important in floating point, you lose commutativity. Without the ability to swap the order of summation freely, a lot of algebra is lost as well, making many other things much harder.
Have you encountered unums / posits? They're an attempt to redo floating point in a way that reduces these problems quite a bit. Obviously they'd need hardware support to be fully performant, but it's possible to implement them for accuracy testing purposes (e.g. Julia has an implementation), and they do extremely well.
For using floating-point numbers as a black box, intermediate calculations should have double the precision of the final result; tests for equality should pass if two numbers are within "epsilon" times one of the numbers (it doesn't matter which you choose) or absolutely the smallest normalized number in the target precision. These, of course, can be given as defined constants. If you actually want to _understand_ floating-point computation, IEEE is not a good place to start. It's great for a standard to put into microchips. But, for learning, a good starting place is to represent sign, exponent, and mantissa as integer values (fixed point) in their own right, so that, by implementing them, you see how you are handling rounding errors.
I’m no computer scientist nor a mathematician, just a casual web dev… but I’ve never understood why floating points is the norm and not rational (like rubys rational class). I get that we can not represent all numbers as rational (because of irrational numbers like pi obviously) but many problems with floating points would be spared. Like the 0.1 + 0.2 == 0.3, in rational 1/10 + 2/10 = 3/10. I guess I’m going deeper in the rabbit hole. Great video!
Hah! Happy to have increased the amount of confusion! I didn't read into the decision process itself, but if I had to guess, to me floating point is a better tradeoff as a general purpose data type with it's massive range compared to fixed point.
The reason (as far as I can tell) is because floating point is insanely simple to implement in hardware, and is extremely fast. I'm not certain, I should probably use the internet to find the answer
Rational datatypes often suffer from representing the same value multiple times. If you're using two 16 bit integers to store the numerator and denominator, then you have 65,000 ways to have 0, 32,000 ways to have 1/2, etc. This can cause problems with comparisons, overflow, etc. So most implementations I've seen simplify rationals into their lowest unique value, which decreases performance and requires prime factorization after each calculation. But you're still left with massive holes in your datatype, and you've slowed down all your algorithms tremendously anyways. Floating point represents a much wider range of values, with higher precision for small values, and there are no duplicate values to deal with (with an asterisk for NaNs and on systems where subnormals are truncated to 0).
Research how it's implemented in the hardware. The hardware limitations give rise to software limitations. This key understanding of hardware is the difference between programmers and computer scientists. With that said, 1/10 and .1 are the same.
@@AnarchistEagle 1e+1 and 10e+0 is also the same and still no problem to understand. They simply get both converted to 0.1e+2 or 10. Something like that could also be done here by converting everything to an integer and a power integer. So 0.032445 would get converted to 00000032445 and -6, 134.31 to 00000013431 and -2, 12300000 to 123 and 5. But i think speed is the key. Maybe mathematical operations aren't that fast with this format.
This is why I like the Decimal type, it stores the integer portion and the decimal portion as 2 integers so there are no precision errors. Especially for money, you can't tell people it may spawn or banish because floating points are weird.
I work in FinTech and we always use a library for these reasons. C# has type Decimal but for JS and Go we use open source libs. IIRC Shopify was the base we built off of. And remember fractional numbers in JSON are Doubles so most(all?) Decimal libraries serialize to/from string.
To visualize the "gap between 1 and 2 is cut up into parts of size 1.19*10⁻⁷)" that's like measuring a distance of 1m with a precision of 12µm (micro meters) = 0.012mm, which is a tad smaller than a thin human hair.
Around 4:01: When you are working in binary, you probably shouldn't call the point "decimal point" or the places "decimal places". If you do, it's just very confusing. Just called them "point" and "places".
A cool way to see the approximation nature of floating point is to do a Mandelbrot Zoom with 32-bit floats, eventually you'll see the image become pixelated and your "continuous" zoom stutters and ultimately stops.
Wow, thanks! I've known about scientific notation, binary, integers, and significant digits for a while; even supported scientific compute where these problems come up; but with the underlying algebra you have shown us exactly why, no more blind attribution to intuitive real and binary conversion errors...
this is very helpful, in a piece of code I wrote recently, I kept running into this issue where when trying to calculate percentages made 3/10 into 31% with the ceiling function, and couldnt figure out the issue, I will try to reimplement it with this in mind and update how it goes in the edits later today
When comparing whether two floats or double are equal, i always use a percentage-wise tolerance, that is suitable for the application. like A is within 0.999*A
I'm honestly surprised how rarely this has actually given me trouble. I know some languages offer types like decimals to go absolutely sure, but I believe I never actually had to use one. Most problems fall into a "if it's roughly right, it's fine" category after all. The only case that's regularly important for me is to use epsilon to check for equality. I usually use a pretty big one like e-4 since false positives tend to be better than false negatives in my experience. One time I was actually diving into the float implementation to encode some bitmask into a texture on the GPU and I was curious if I could avoid bitshifts... only to find out that the framework supported integer texture formats after all 😅
15:03 The best way of doing it is to manually find the exact representation of the float as an array of 32 bits then handle the comparisons yourself, or better yet, just use fixed points!
From my experience, this is how we compare two floats/doubles. You need two tolerances. Relative, and Absolute. abs_tol is the value you accept "as zero" in "this context" of comparison. rel_tol, is the max amount of "relative difference" two numbers can have to judge them as equal. And the formula is: abs(a-b) < rel_tol * abs(a) + abs_tol As you can see, there's an "a" multiplied on the right side. And what that does is, it "scales" your rel_tol to the vicinity of the numbers you're comparing. So, if you are comparing really close to zero, (a is small) rel_tol * a will become smaller and the significant member in the RHS is abs_tol, so, near zero, you are using your abs_tol. If you are comparing two large numbers, rel_tol * a becomes large and now this term (rel_tol * a) is the most significant term of RHS, controlling the comparison result. This is a variation on the simpler version which is: abs(a-b)/abs(a) < rel_tol You take the abs(a) to the right side, but add the abs_tol. From my experience, for "double precision" we set abs_tol to something like 1e-16~20 while rel_tol to something like 1e-8~10. This has worked mostly in the past for me, But I've had cases where even this does not work!!! Right now I'm reading randomascii article and it is fascinating. I'd love to know your thoughts on this. Thanks everyone.
Try looking for functions to extract the parts of a float, as well as functions to reunite them. You get the exponent of whichever value you're treating as dominant, then pack together with epsilon (at least, I THINK it was epsilon, it's been a while since I did this), and that gets you the smallest possible step size for the context you're interested in... more or less. You may want to consider the scale above and below as well... It may also be that extracting the exponent gets you everything you care about, but I've never tried that, so I can't speak to the sanity of attempting it.
Related: when summing numbers with a large range of values you need to sort by abs value in case you have a lot of very small numbers a a few very large ones, in which case an unsorted add of a large value can saturate the available precision and the small values (no matter how many whose sum is large) will be ignored.
Love the video, I'm currently studying for a software degree and they sadly don't teach anything this low level so this is a big help. I was so glad when you just went into an example at the start too, I hate when TH-camrs try to teach a concept and they go all the way back to the stone age just to cover the origin 😂
Remember, the outsider thinks computer science is magic. The novice programmer will tell you about how computer science makes perfect sense. The experienced programmer *knows* computer science is magic.
after watching this video i feel good for choosing to represent collectible crystals in my game code not as a floating point number but as an integer which counts 12ths, only converted to float for display: ``float crystals = crystal_shards/12.0``
A few years ago when I started programming a universe sized environment I first used floating point. I quickly learned that was a big mistake. I switched to 64 bit and 128 bit integers which are 100% accurate.
I remember using something like if ( abs(a - b) < error_value) with error_value = 0.0001 instead of if (a == b) to circunvect this problem with floating point comparison. It was some numerical computing (I think I was playing with a numerical method to finding the roots of an equation, or something), and the "a == b" part was never being triggered...
I wrote a machine controller once, and the position for the steppers were calculated using floating point numbers. When I tested the stepper driver routines the shaft position would be updated by some small value that gave a certain RPM. At first everything sounded normal and smooth, but after about 5 minutes the steppers sounded horrific and choppy. I eventually figured out the compiler for the microcontroller did not support double precision by default, but does not generate a warning during compile. It just silently interprets it as regular floating points. After enabling the right flags and recompiling it finally worked. But the error was simply the problems floating point numbers have in representing certain spans of numbers.
IEEE 754 octuple-precision binary floating-point format: binary256 In its 2008 revision, the IEEE 754 standard specifies a binary256 format among the interchange formats (it is not a basic format), as having: Sign bit: 1 bit Exponent width: 19 bits Significand precision: 237 bits (236 explicitly stored) The format is written with an implicit lead bit with value 1 unless the exponent is all zeros. Thus only 236 bits of the significand appear in the memory format, but the total precision is 237 bits (approximately 71 decimal digits: log10(2237) ≈ 71.344).
I caused a bit of s stir on the old Risks list a few decades ago commenting on the error characteristics of base 2 floating point versus base 10 floating point. This was about the time that people were finding PC spreadsheet programs were making mistakes with currency values because the program was using binary floating point instead of something in base 10. I knew something about this because Texas Instruments had implemented a base 100 floating point system in their home computers - and had documented it in the BASIC manual!
Fantastic video! I'm constantly forgetting what I know about floating point numbers so I'm definitely going to be coming back to remind myself in the future.
Great explanation of Floating point numbers! I designed the FP execution unit on the 387 and 486 processors and this brought back a lot of memories. Handling Denormals and Unnormals were a pain but we got it done. Same with NaNs. Unfortunately, the guys who did the Pentium design after this failed in getting the right division lookup table entry and it led to an interesting story.... The next interesting topic might be a discussion on rounding using Guard, Round, and Sticky bits for numerical correctness.
Woah, you've been around! I was just a kid playing Sierra games back then, would love to hear more about your experiences if you have a blog or something.
@@simondev758I haven't written my memoirs yet, but have had many discussion with other folks about the earlier days of CPU design. After the FP design I was the Design Manager for the P6 (Pentium Pro) and then GM/VP for Pentium II, Pentium III, Pentium 4 and the first Celeron. It was fun until it wasn't and then I left and started a company and then worked at SpaceX for a while. I'm not sure how to do a blog about so much of this since so many other people are intertwined in the history.
great video from a dev for devs, now i see, why it's beneficial to have increasingly big amount of numbers, when you get closer to zero. Was really wondering. Also: always think of my english teacher, which urges me, not to curse . Then i watch one of your videos and smile "3052... and some crap, give or take" 😂 Fun Fact: C# has the very handy decimal type, which is a floating point number with base 10, instead of 2. So you can actually do things like "0.1m + 0.2m == 0,3m" (m is the literal for decimal type). It's a real life saver for LOB applications, not for games or other high performance scenarios of course.
A handy trick I figured out that better handles floating point equality is to xor the integer representation of the two floats data. Comparing the resulting Int giving a rather effective way if telling if two numbers are effectively identical. for example: xor(0.1+0.2,0.3) == 7 (0b111). so anything 7 or below can easily be considered floating point math error (we could say ≤15 (0b1111) to be safe). It's at least more accurate than a direct comparison, with the single cavoite I've found being 0 vs -0.
The way you included that joke about Konrad Zuse without drawing any attention to it and then you actually read and liked my comment with over 100K views makes you one of my favourite people in the world, and you were already pretty high up there. Just and interesting note, tonight in a South African comedy club I saw the actual Darryl Philbin from Dunder Miflin perform live musical comedy. I was supposed to perform but I got bumped to next week. I got to show him the 3D caricatures I made of his coworkers (edit: coSTARS), and now I'm going to make a caricature of him (his name is Craig Robinson) and show it to him before he leaves my country. Can't wait to meet you some day too! I'm working on a game! almost done!
Hah, I mean it doesn't take long to go through the comments, there's not a million of them. If you take the time to write a comment, I'll definitely read it. re: music, that's super neat! I loved the Office when it aired!
I'm aware of (most of) this and it's always surprising. Fun was something like if (a>=0) { b=std::min(1.0/a, 1e6); // since now assume that b is inside range between zero and million }. Program had some weird behaviour. After some debugging, it turned out that a can contain negative zero and b can be negative infinity.
This also implies that in a sum of more than two numbers, the order of the summation might change the result slightly. As a consequence, a perfectly "deterministic" program can have completely different outcomes every time you run it, as soon as you have some section of optimized/parallelized code where you do not have full control over the exact order in which some low-level stuff is computed. I was shocked when I first experienced this first hand as a young student working on physics simulations.
@@Kalumbatsch That is why I have written "deterministic" in quotation marks, LOL. I had naively assumed that some fancy optimized function (also involving some multi-processor stuff) would perform just like an ideal mathematical function, giving you the exact same output for the same input every single time. In floating point reality, not so much.
It's quite amazing the designed floating points to allow for such a surprising failure of 0.1 + 0.2 == 0.3. I was wondering why the C++ QT SDK bothered including a "real" data type and this might explain that.
Back in the day we used fixed point in many games to get around the issues with FP. It takes up more memory, but it speeds calculations up and eliminates some of the issues with comparison and arithmetic. It suffered from accuracy, but IEEE-754 does as well, just in different ways. Also, old processors didn't have a dedicated FP unit. I think the 486 was the first that had a built in FP coprocessor. The problem is that it was a coprocessor, so you had to block your main processor for FP calculations. It wasn't until the Pentium they finally put true FP pipelines in the processors. I don't recall if the FP pipeline was super scalar or not - but modern processors do have super scalar FP pipelines, so you can execute multiple FP instructions at the same time - or more accurately get the results of two calculations on the same clock cycle.
I started my career long after floating point had become the standard, so I know of and have experimented with fixed point, but never shipped a game with it. Worked with plenty of people from those days though, they had all the craziest low level tricks up their sleeves.
@@simondev758 Those days were crazy days for sure. It was all about squeezing every single ounce of processing power where you could. We didn't have massively parallel GPUs to offload things like particle engines and such. All of that was CPU bound, so we had to figure out clever ways to trick the player into thinking they are seeing more than they really are - I mean that's game programming in a nutshell really, but "back in the day" it was an art. Today though you have extremely powerful processors that are super scalar and can execute multiple instructions at the same time. They have separate floating point and integer pipelines, with prefetch, branch prediction, and all the good stuff we know and love today. I honestly don't miss those day though. I remember one time working for almost two weeks to figure out how to squeeze 20 (20!) bytes out of Splinter Cell, so the game would fit on a smaller handset (I was doing mobile porting). Removing levels or content was not an option, so I had to figure out a clever approach to save that 20 bytes so the game would run on certain handsets (this was back in the BREW/J2ME days a few years before smartphones).
@@BitwiseMobile Damn! I came in on the tail end of the Xbox/PS2 era. I did some amount of instruction fiddling with the in-order powerpc's in the xbox360/ps3. Vivid memories of reducing LHS's. Nothing really compared to the engineers who had worked in the era's before. If you have some interesting areas to investigate for videos, I've kinda been wanting to dig into some of those older optimizations. I still have my Black Book from Abrash, been meaning to open it up again.
Back in the day (I was maybe 12), my first contact with fixed point math was a 3d rendering sample written for the 386. I didn't have a 386, and somewhat foolishly backported it by wrapping each 32-bit operation to run on my IIT 2C87 - individual adds and multiplies done by having the FPU do integer conversion and rescaling around them. All a massive detour; it was even doing matrix multiplication that particular FPU had an instruction for! It would have been better to rewrite to float instead of these assembly wrappers. But I did get it running and learned some math in the process.
If one is doing modeling then floating point can introduce chaos at each step on the model code. We used to validate models by increasing numerical precision until the results became comparable ie within tolerance. One would use single, then double precision then quad precision...
10:00 minutes mentions the 8 million floating point units that get halved everytime you go out by a power of two. The high precision double would go some way to helping this such as in OpenCL2 (or 2.2) but mostly it would be for signal processing (like spectral analysis) such as audio and video, but especially (with or without those) in neural networks computed on GPU. The Cache on CPU is large nowadays but for the aforementioned, the GPU would still be relevant as long as it is proprtinal to the system's other specs in CPU _(like using a Radeon RX570 or thereabouts)._ So the performance penalties incurred could easily be for Gaussian kernelized classification in computer science, seeking the outliers (of what "might" be an answer as something small to ensure is worth classifying), so as to then look at Gaussian white noise (Chebyshev polynomials) and Gaussian kernel (density) probability estimation. It isn't that other ways could not solve for X, such as Markov chains (state transition probability matrix) but efficiency in optimisation can depend on what hardware a person has to hand or would need to be "prepared" to have to hand _(hence a use case for heterogenous computing)._ If you are adding an outlier like a millimetre, you might find that gauging when to include it or drop it could be handled in the aforementioned ways (estimation) by means of Gaussian Kernelization classification in computer science, and by that I mean one might estimate _(not the same as approximation, for an estimate can sometimes be on the mark as ironically the exact, precise value)._ So as to mitigate those performance penalties, the Parzen-Rosenblatt window method would be worthwhile using. In its probability _(similar to Gaussian Kernel density estimation, but Gaussian Kernel probability estimation)_ it (the method, as a pre-processing stage) can better ascertain whether it is worth it or not to include the millimetre (or whatever miniscule measure) in the outliers of the Gaussian Kernelized classification. The floating point (mantissa exponent) can be adapted thereby accordingly. As an aside, the description text can do with ieee-754 since it has a typo. This information is the help people. The above information would thereby apply to Gaussian heatmaps, tracking weather whereby an initial 16 or 32bit float is all that is needed and then the high precision Gaussian double is used for the area of deep analysis. Then also disease or health tracking and crops or tree of life can be assisted. The 1.5 C climate aims could then become carbon Zero, by process of elimination. It will be a yes to the question: Have you found what you are looking for? Because it's here. It's what they can't see. Tones of home. Data, with enough interrogation so as to torture it, will yield the correct outcome. It's about ruthless efficiency. _"Suddenly I see."_ or _"Karma's gonna track you down, step by step from town to down"_ My comment has no hate in it and I do no harm. I am not appalled or afraid, boasting or envying or complaining... Just saying. Psalms23: Giving thanks and praise to the Lord and peace and love. Also, I'd say Matthew6.
This really screwed me over when I was trying to do some angle calculations on a coordinate plane. I knew the line AB was parallel to the line CD in my test case, but the angle comparison in my code kept failing. It was infuriating. When I figured out what was going on, it was even more infuriating.
There is also a difference in 32-bit and 64-bit compilers and CPU/FPU (x86/x64 Intel I mean). 32-bit CPUs use an 80-bit intermediate representation of data in the FPU during operator evaluation. The programmer also has access to the 80-bit long double type On 64-bit systems with vector instructions, compilers prefer them even for calculations with single numbers, so that even an explicitly declared variable of type long double is implicitly converted to double. As a result, the same program compiled in 32-bit and 64-bit modes will produce different results!
A (64-bit) double-size floating point can exactly represent all 32-bit integers (and a few more), and the operations match. So if you only need 32 bits, JavaScripts Number works for integers as well.
I believe you mixed up programming exponential notation and Euler's number. (at 6:26) - In programming, E(number) is a shorthand for 10^(number) and you DON'T put the number up on top the same way as writing Euler's number raised to an exponent.
Once had a subnormal issue with a shader dispatch setting a shaders input values randomly from a tiny value to 0, so not only did i have the issue intermittently, it was also hard to see even when it did occur as it just resulted in a black pixel in an already dark shader effect.. Except on some specific hardware sets in release mode it caused the shader to crash, resulting in major graphics glitches and boy was that fun to debug, took awhile to just realize i needed to dump in release mode to get the issue even randomly!
The coordinates in GTA: San Andreas was a floating point number like most things in the internal script. The world spanned a few thousand units in either direction. But someone made a mod with a boring road on the water to the edge that was at 20,000. When approaching that, every part on the car began visibly shifting. I had the great idea to separate the bumper, the license plate, and the lights, etc., so that they could be later selected and copied. In Age of Empires, the money were floating point numbers off by a significant amount, and it was not possible to find them with a simple cheating tool. Floating point matches our perception of the world where small differences become less important as we have more of the stuff.
And this is why I jump through moderate hoops to treat my numbers as integers. "So for this I'm going to count my universe in millimeters." "Why?" "Because it's more precision than I think I'll need, and it's not float." I have seen (and ranted about) using float for *currency*. Please, dear god just use integer pennies...
Btw, please support me for more videos!
My Courses: simondev.teachable.com/
Patreon: www.patreon.com/simondevyt
this is completely incomprehensible. you don't actually understand how to teach. you're rambling and scribbling things that have literally nothing to do with the data you're presenting. everything in a lesson should help to understand that lesson. this is like explaining something in a loud cafe on a napkin, except you've recorded it. your sheer incompetence at your chosen occupation is admirable.
Many years ago when designing the Sheerpower programming language for business applications, we spent a ton of money (over $100K) on this exact problem. We ended up with a data type called "real" with integer and fractional components located in their own memory locations. The hard part was making the runtime performance fast. Once done, it has been enjoyable never worrying about all of the FP pitfalls that you very well explained. In fact, this is the best explanation and clarity I have ever seen! Thank you.
Interesting! It sounds a lot like fixed point?
@@simondev758 Fixed point using separate memory locations to speed up things like "convert to an integer" where one just clears the fraction part memory location... no calculations required.
This sounds like "integer" format (with number of bits twice the number of bits in your word length) scaled by 2^(-n) where n is the word length. Why not use double-word integers?
@@bpark10001 We used two int64s. The use of two memory locations made many frequent operations (truncating numbers, etc) much faster.
@@simondev758 'decimal' type in c#
In the 70's and 80's we called floating point computer math: "floating point approximation". Someone in marketing dropped the word "approximation" sometime over the years.
When we designed a language for PLC use in 1984, the language didn't have "Compare Equal" for the REAL (floating point type), but a "Compare Tolerance", with an explicit tolerance argument provided (as described in video). Many customers were confused at first, until they realized that "measurements" are not exact and need to be treated as approximations everywhere. I was young and inexperienced at the time, but the boss were old school veteran in analog computers, sensor technology and much more, so he insisted "no compare equals for REALs. It is not possible!".
@@niclashShould typically be two tolerances, one relative and one absolute. The fun part with subnormals is they have variable relative precision, but their absolute precision remains the minimum available, so with both tolerance checks they don't need special handling.
@@0LoneTechcould you explain how both of those numbers would be used?
There was that scandal when one of Intel's processors did the approximation incorrectly.
My favorite floating point hack is that 7/3 - 4/3 - 1 will always give you machine epsilon. I don't quite remember how, but I found a comment in the depths of stackoverflow that claimed it worked regardless of programming language, OS and computer. As long as it's using the IEEE standard it works.
Super cool! I found a reference for it here, problem 3: rstudio-pubs-static.s3.amazonaws.com/13303_daf1916bee714161ac78d3318de808a9.html
Oh, that makes sense! A third is like the perfect middle step between powers of 2, so the mantissa is all ones. But 7/3 has a one greater exponent than 4/3, so it's "missing" a decimal digit that's presumably rounded up. The difference between them cancels out everything except 1 and the least significant digit of 4/3, making it 1 + ulp.
What a cool trick.
@@volbla Thanks for the intuition! It makes a lot of sense when you explain it that way.
I remember that the old Windows 3.1 calculator had a bug where 3.11-3.1 (the two major Windows releases at the time) would equal 0.00. Good times.
It's called rounding. The rounding mode in IEEE defaults to round to nearest even. So your trick only works in some rounding modes. Meaning your condition of IEEE is incorrect. And not understanding the trick is in rounding is a blunder.
I think this is why Sun did that big push to evangelize interval arithmetic. It basically covered for all the imprecision of floats by simply treating them as fuzzy intervals. Things like == comparisons are now interval overlap checks and operations that make the error worse actually make the intervals grow. You basically avoid a lot of these headaches by just assuming that error will always be there and developing your arithmetic around that assumption.
Which incidentally is only of the few non pitfall way of using floats.
Keeping an error counter and controlling the interval manually.
It's a PITA doing it in C thou
...or do it in integer & know the "error" is zero. The problem with floating-point fuzzy scheme is that the error builds with the number of chained computations, which the math doesn't know about. Of course, if you stack irrational (such as trig) computations, this error appears no matter what the number representation scheme.
Floating point disease was so bad because as soon as it was introduced, everybody "had to have it" & it was boasting point for computer manufacturers. It was so much that computers had ONLY that format, even when working with integers. Early desktop HP computer calculated 2^2 = 3 (it used logs to compute exponentials).
Interval arithmetic is also incredibly useful for anything scientific
@@bpark10001 "computers had ONLY that format, even when working with integers"
*cough* JavaScript *cough*
@@coopergates9680 Yes they did. HP made a desktop computer in the 1970's that had ONLY floating point format. (When loop counters & other integers were needed, the computer internally TRUNCATED to integer. That's what caused the "2^3 = 7" problem. (I had to add a "+ 0.5" to any exponentiation calculation to get 2^n iterations of the loop.) I guess this "simplified" the machine as there was only ONE type of variable, of a fixed size. Remember in those days a lot of the math was done by dedicated HARDWARE. It is simpler to have fixed-size fields in memory. Most of the calculators also used this format, not changeable.
"JavaScript" in 1970 had something to do with coffee & writing, & nothing else.
My "favorite" thing about floats is that float operations are nonassociatve. That is, (a + b) + c need not equal a + (b + c), and same for multiplication.
My rule of thumb with programming using floating point numbers to just assume that two floating point numbers are never equal. The only time a FP is equal to another FP is when they were obtained by copying. FPs can be compared as "less than" or "greater than" as a sort of "inside/outside" check, with "equals" case being implicitly bundled with either one of those two.
@@piisfun
It's good never to rely on them being equal, but it doesn't solve all your problems. Like, 50 billion and one is bigger than 50 billion, but if X=50,000,000,000 and Y=50,000,000,001, then Y>X will return false.
@@dylangergutierrez Hence it's a percent error issue, it's more like abs(X/Y - 1.0) < 0.000001. We all know that bug in old Minecraft when the player is far from the origin lol
@@dylangergutierrez 50B + 1 is larger than 50B, if the former can be stored. Otherwise the result of the addition is 50B, and you would be comparing 50B with 50B.
that is what unity does in the animation, floats cant have an equal comparisson, only integers can
My boss recently told me a story of a game he once worked on. If you left it running for about 28 hours or so, all kinds of weird shit would start happening. Like the rendering would break completely, certain things would stop moving etc. The reason was that certain things in the game kept some kind of on-going timer. This was usually a timer of accumulated delta times and in the range of seconds. Turns out that after the amount of time mentioned above, these accumulated timers got so big that a delta time of 1/60 was no longer large enough to affect them in any way, thus they froze entirely. It's basically one of the floating point issues you mentioned in the video.
This specific bug never got a proper fix, just a workaround, which was to simply pause the game on inactivity.
Absolutely. This is part of the reason developers do soak tests.
Incremental stuff should always use a char (byte), short, int, or long, and every once in a while it's fine to convert that millisecond or nanosecond figure into a float of seconds.
Given that a double has more significant figures than a 32-bit int, if a timer goes far enough to lose this much resolution in a double, it's ticking stupidly far anyway and should be reset or redesigned.
@@coopergates9680 Yeah, switching to integers and using millisecond delta times in general is one of the proposals my boss had to fix this problem for good.
Just tedious and dangerous to do in an already existing game, so it's something we'll likely be doing for future games.
how do you normally fix it?
Since the time interval is constant (1/60) you should use fix point instead of floating point: use integers and count the number of 1/60ths seconds, i.e. the least significant bit is interpreted as 1/60 of a second. With a 32 unsigned integer you can then run it for 828 days (add more bits if needed!)
I love the little HTML changes you make in the websites.
The description in 7:32 makes the title even better :D
And at 0:36
@@kirbofn524 thanks for pointing that out! haha
This is an amazingly dense video! I have to watch it multiple times to completely absorb it.
Heh yeah I hate repeating myself and figure you can always just rewind.
@@simondev758 thats why you are so clear ;)
It kind of makes sense that floating point values don’t play well with equality, because the real numbers are infinitely divisible. In the real world, when you’re comparing things, you are always working to a certain degree of precision.
The only way for two objects to be the exact same length would be for them to have the same number of atoms, which is an integer comparison.
Yes and even then we're still making a lot of assumptions about the nature of atoms.
If anyone is having problems with floating point precision errors, consider switching to fixed-point. 32-bit fixed-point might not give enough precision for most problem spaces, but 64-bit fixed-point would and is an easier data structure to deal with as a lot of the precision errors become predictable.
This was so helpful! I always wondered why in MatLab, I sometimes have to do a - b < 0.00001 instead of a = b to compare two values.
eps(num) gives you the minimum value that can be added to num. Usually useful to do something like abs(a-b)
@@SomeStrangeMan All well and good and practical, but I hate when people use "epsilon" (like, from analysis) to mean "really small number". The point of epsilon in analysis is that it's the _arbitrarily_ small number.
@@smorrow in MATLAB the eps function returns the smallest number that may be added to the floating point number given to it as an argument.
@@somestrangescotsman Yeah, but the _name_ of it is obviously a mistaken reference to epsilon from analysis. And epsilon in analysis really means "the smallest number you can possibly imagine, except for zero", sort of like an inverse infinity. It doesn't have an "actual value" that you could in principle write down, whereas the Matlab eps' whole point is to be an actual value, so it's really inappropriate to name one after the other.
The smallest number you can imagine is the smallest number you can add to another. Within the rules of floating point numbers, that IS epsilon.
Years ago, I recall reading the specs for a Java3D library and I think they had a 256bit fixed point library. IIRC, you could represent Planck lengths in the same model as the observable universe. Though I imagine there would be performance costs for that, with 32 byte numbers. A spacetime coordinate system would use 128 bytes for 3 space and 1 time coordinates. Or even just homogeneous space coordinates.
Oooh fixed point is awesome, I've been meaning to make a vid on that.
@@simondev758ooh, did you ever get anywhere with that?
@@simondev758 Maybe you can also discuss the Java BigDecimal class.
@@Islacrusez I have notes and stuff jotted down, but I kinda go with what I get excited about at any given time. I was happy to dive back into graphics a bit the last few months.
Unfortunate you haven't made this video on May. I've been learning this for my final examination in university. It's always better to watch someone explain it this way than reading a bunch of papers. Keep up the good work, currently I've seen all of your videos.
12:27 A perfect example of this in effect is Minecraft (specifically something called the farlands on with wiki), back before there was a world border. I'd encourage to go check it out! It's really interesting and works wonderfully to visualise these floating point errors in action.
in Minecraft bedrock, there's also the stripe lands
This is the clearest explanation of what floats are that I have ever, ever, _ever_ seen. Thank you for this. :D
I don't know if you have ever been a lecturer, but I can tell you you are really good, and all people who have had you around are very lucky. Great coverage, great dissecting of the subject, extremely well presented. Thank you, subscribed just from one video. :)
Never been a lecturer, but I've spent a lot of time as a mentor because apparently I'm good at that. These videos are a great way for me to work on collecting my thoughts into a more cohesive form and working on my presentation skills.
The fundamental problem with FP arithmetic is that Real numbers are not natural fit for binary computers. There's no way to directly map values with moving decimal point in a register, since the register has fixed length, without accumulating large errors. That leaves you with the fixed point format option, where you have to choose between limited range or limited precision, but not both. The convoluted way FP arithmetic is implemented in the binary logic constraints makes it possible to have both cases (range and precision), at a cost of added complexity and a thick book of rules/limitations -- the IEEE-754 standard -- that historically made high-perf FP hardware implementation even more expensive.
The fundamental problem with FP arithmetic is that Real numbers are not natural fit for computers. It doesn't matter what base you're working in. 1/3 is unrepresentable in base 10, since it's 3.33... repeating. You will run into this issue at some point. You have a countably finite space in any case and you need to cram in an uncountable infinity. There are more reals than there are integers, to cover *any* subspace exactly is impossible. Even if you had to exactly represent the space [0.00001, 0.000011] you would undoubtedly have to use either FP or fixed point and in either case lose a lot of precision. What FP does do is provide acceptable precision in the vast majority of cases through the observation that small numbers we work with often have smaller differences between them.
It's fun to think that floating point units were so complex the first processors didn't even have them and you'd have to use a coprocessor that was often larger than the processor itself (like the intel 8087 that had almost double the amount of transistors the 8086 had) and today we have GPUs that have thousands of FPUs in a single die
@@smlgdat some point we are going to have to ditch digital computers and use analog voltages.
Just look how terribly inefficient is Machine learning on GPUs.
All machine learning engineers agree
real numbers are, by definition, not fit for COMPUTERS. the definition of computation and computational problems requires that ALL inputs must be representable with a finite sequence of symbols. otherwise, it literally is not computation. the real number set (or any continuous subset of it) is not entirely representable with finite symbols. however, nothing stops us from picking a few real numbers and sticking some labels on them. and that's what floats are. (smartly ordered) labels for a (smartly picked) finite subset of the real numbers.
in case you're wondering "hey, but what if we could have infinite inputs?", that's called hypercomputation. good luck with that.
Hint for business programmers: Use integers if you're dealing with money. (Some languages support a "numeric" data type, which is nothing but an integer with an implied decimal point.) But avoid floating point for monetary values!
A minor niggle: What is described as a “mantissa” here is really a significand, one which is linear within the range allowed for a given exponent value. Mantissas, as in log tables, are logarithmic. If the exponent in a floating point format were represented as a binary fixed point (so that the usual significand would no longer be needed), the fractional part of the exponent would truly be the mantissa (and in the language of log and antilog tables, the integer part of the exponent would be called the “characteristic”). (Watch out for negative exponents, since the mantissa still has a positive sense in log tables. For M=0.113943, C = [−1, 0, 1, 2], 10^(C+M) yields [0.13, 1.3, 13, 130].)
Yes, it really is the significand, but in adopting mathematical techniques into computer engineering, the word mantissa was used, and became the defined word.
An exceptional video about floating point precision. A great teacher right there. He gives a lesson like no problem
Small correction: a number like 1E-9 usually means 1*10^-9 or possibly 1*2^-9 (I'm not 100% sure on rhat one) but 1*e^-9 is something different entirely (e=2.718... is Euler's number)
10^-9 yes, a power-of-two exponent is only used for hexadecimal float literals (0x1p-9 would be 1*2^-9)
The capitalization of the "E" doesn't matter in floating-point literals. "1E-9" and "1e-9" both mean "one times ten to the minus ninth power" (though the capital version is preferred to avoid confusion). Euler's number is represented in an entirely different way, depending on the exact programing language in question (e.g. "M_E" for C and most of its descendants).
I was confused why he was using base e lol
This is a really fantastic video! Floating Point is one of those things in my early Computer Science 101-level coursework that kind of blew my mind.
I really appreciated this video, I have thought about it on and off during the last week, thanks for quality content. I'm always excited for your new videos. Keep it up Simon!
8:23 scared the shit out of me.
Some say I'm a master of horror...
@@simondev758 All we know, is he's called The Stig.
Omg I can't believe you are a time traveller. What other exciting computer science moments have you witnessed?
The end of the gpu shortage was crazy, can't believe how it went. Wait, has that happened yet?
@@simondev758 no wait... There is a shortage TIME TO BUY SOME GPUS!!!
LOL
@@simondev758 all hail Proof of Stake
If you can use integers instead of floats without overcomplicating the program, do it. For example, the currency is better handled by integers. Just store 995 cents instead of 9.95f dollars, and convert to dollars only to interact with user. That's the best way to avoid all these issues.
Also, one issue not mentioned in the video is that these errors like 0.01f + 0.02f accumulate, if you do thousands or millions of operations on a floating point variable the error may become quite substantial. Again, use integers instead, if it's feasible.
I know there are libraries that help to deal with fractions. It's not a bad alternative. Just keep in mind that integers are native type, and calculations on integers are much much faster than on any non-native type.
The problem with Floating Point representation (IEEE 754) is that we're basically trying to force a base 10 number into a base 2 representation. As such, compromises have to be made in order to reduce both computational and memory complexity.
Another way of looking at it is using scientific notation: You can describe pretty much any rational number through scientific notation, but the number of significant figures generally increases both complexities on a linear scale. You can bound the complexity by limiting the significant figures, but this leads to a loss of information.
Once we had excess memory and computational resources, things like Java's and SQL's decimal for more accurate but memory and computationally more expensive representation.
I understand the basics of float, and I've decided to just use integers whenever possible. Far less issues.
A basic principle of writing business application software, especially involving money...
Yep, but games historically had to put all their chips in the "performance" pile
Whenever I can, I use integers instead of floating point. I just pick a smallest unit, e.g. 1mm, and count how many of those I have in all my measurements. If you work in 64 bit integers, you have enough range to cover a whole lot that way.
Yeah that approach works super well, basically a simplified fixed point?
That's what KiCad (tool for designing PCBs etc) does.
Uses 32 bit integers with nanometer increments and you get a reasonable upper limit of just over 2m for the PCB size.
@@simondev758 Yes, well, it *is* fixedpoint. That is all fixed point is: Choosing a minimal unit that is some specific fraction of your base unit and counting how many such fractions you have.
The issue is the propagation of error. That works if you have to do a few (in computer terms) operations, but if you have to do many, like in the simulation it's one of the worse approaches. Each multiplication, for example, produces a loss of precision "identical" to truncate. It's simply not viable for problems of modern scale.
@@jaimeduncan6167 It is actually exact, that is the whole point of using fixed point. Floating point has a lot of precision problems, but when you are counting a specific number of your minimal units, you are simply counting an exact number of those units. There is no error and thus no error propagation. You have to accept that whatever you are counting is quantized, if you are doing a flight sim, your planes will be snapped to a 1mm grid (or whatever minimal unit you decide to use). As long as that is fine, you have no error and no error propagation. With floating point you do get an issue of error propagation which makes many operations much more complicated. Like the video says, you can't directly compare two floating point numbers for equality. If you are adding an array of numbers, you have to sort them by exponent and add the smallest ones first, before adding larger ones. If you don't, adding a number with an exponent 53 higher than a smaller number will make the smaller number vanish with no effect (the whole mantissa is too small to have an impact). In a summation of many numbers, that small number could have made a contribution if it had been added to an only slightly larger number first and thus been propagated up to the big numbers. This means that the order of addition is important in floating point, you lose commutativity. Without the ability to swap the order of summation freely, a lot of algebra is lost as well, making many other things much harder.
It was Concise: Giving a lot of information clearly and in a few words; brief but comprehensive.
Thank you Sir 😊👍
Have you encountered unums / posits? They're an attempt to redo floating point in a way that reduces these problems quite a bit. Obviously they'd need hardware support to be fully performant, but it's possible to implement them for accuracy testing purposes (e.g. Julia has an implementation), and they do extremely well.
Only read about them, haven't had a chance to try them out though
For using floating-point numbers as a black box, intermediate calculations should have double the precision of the final result; tests for equality should pass if two numbers are within "epsilon" times one of the numbers (it doesn't matter which you choose) or absolutely the smallest normalized number in the target precision. These, of course, can be given as defined constants.
If you actually want to _understand_ floating-point computation, IEEE is not a good place to start. It's great for a standard to put into microchips. But, for learning, a good starting place is to represent sign, exponent, and mantissa as integer values (fixed point) in their own right, so that, by implementing them, you see how you are handling rounding errors.
12:25 Ahh! That's why every second block disappears when you go to X/Z = 16777216 in Minecraft Bedrock Edition
DEC64 is a proposal for a floating point implementation with decimal exponent, which would fix operations with decimal fractions.
I’m no computer scientist nor a mathematician, just a casual web dev… but I’ve never understood why floating points is the norm and not rational (like rubys rational class). I get that we can not represent all numbers as rational (because of irrational numbers like pi obviously) but many problems with floating points would be spared. Like the 0.1 + 0.2 == 0.3, in rational 1/10 + 2/10 = 3/10. I guess I’m going deeper in the rabbit hole. Great video!
Hah! Happy to have increased the amount of confusion!
I didn't read into the decision process itself, but if I had to guess, to me floating point is a better tradeoff as a general purpose data type with it's massive range compared to fixed point.
The reason (as far as I can tell) is because floating point is insanely simple to implement in hardware, and is extremely fast.
I'm not certain, I should probably use the internet to find the answer
Rational datatypes often suffer from representing the same value multiple times. If you're using two 16 bit integers to store the numerator and denominator, then you have 65,000 ways to have 0, 32,000 ways to have 1/2, etc. This can cause problems with comparisons, overflow, etc. So most implementations I've seen simplify rationals into their lowest unique value, which decreases performance and requires prime factorization after each calculation. But you're still left with massive holes in your datatype, and you've slowed down all your algorithms tremendously anyways.
Floating point represents a much wider range of values, with higher precision for small values, and there are no duplicate values to deal with (with an asterisk for NaNs and on systems where subnormals are truncated to 0).
Research how it's implemented in the hardware. The hardware limitations give rise to software limitations. This key understanding of hardware is the difference between programmers and computer scientists.
With that said, 1/10 and .1 are the same.
@@AnarchistEagle 1e+1 and 10e+0 is also the same and still no problem to understand. They simply get both converted to 0.1e+2 or 10. Something like that could also be done here by converting everything to an integer and a power integer. So 0.032445 would get converted to 00000032445 and -6, 134.31 to 00000013431 and -2, 12300000 to 123 and 5.
But i think speed is the key. Maybe mathematical operations aren't that fast with this format.
This is why I like the Decimal type, it stores the integer portion and the decimal portion as 2 integers so there are no precision errors. Especially for money, you can't tell people it may spawn or banish because floating points are weird.
I work in FinTech and we always use a library for these reasons. C# has type Decimal but for JS and Go we use open source libs. IIRC Shopify was the base we built off of. And remember fractional numbers in JSON are Doubles so most(all?) Decimal libraries serialize to/from string.
To visualize the "gap between 1 and 2 is cut up into parts of size 1.19*10⁻⁷)" that's like measuring a distance of 1m with a precision of 12µm (micro meters) = 0.012mm, which is a tad smaller than a thin human hair.
If i am not mistaken 1.19*10⁻⁷ meters is actually 0.00012mm which is around the diameter of a single coronavirus according google.
Around 4:01: When you are working in binary, you probably shouldn't call the point "decimal point" or the places "decimal places". If you do, it's just very confusing. Just called them "point" and "places".
When I clicked on this I wasn't expecting Bob from Bob's burgers to educate me on some complex concepts
This is really great knowledge with regards to writing audio plugins for digital audio workstations as well.
Hey, thanks for the quick refresher. It's been over 25 yrs I'm out of school and all I remembered was "stay the hell away from floats, they be crazy"
A cool way to see the approximation nature of floating point is to do a Mandelbrot Zoom with 32-bit floats, eventually you'll see the image become pixelated and your "continuous" zoom stutters and ultimately stops.
It was super cool of Bob from Bob's Burgers to take some time out of his day to teach us all this
Wow, thanks! I've known about scientific notation, binary, integers, and significant digits for a while; even supported scientific compute where these problems come up; but with the underlying algebra you have shown us exactly why, no more blind attribution to intuitive real and binary conversion errors...
this is very helpful, in a piece of code I wrote recently, I kept running into this issue where when trying to calculate percentages made 3/10 into 31% with the ceiling function, and couldnt figure out the issue, I will try to reimplement it with this in mind and update how it goes in the edits later today
Your explanation and the way you talk throuth like in 0:32 really useful and entertainning😁, thank you for the hard work!
I love this channel for all my uninitialized variable needs. But you can really get anything here.
When comparing whether two floats or double are equal, i always use a percentage-wise tolerance, that is suitable for the application. like A is within 0.999*A
I'm honestly surprised how rarely this has actually given me trouble. I know some languages offer types like decimals to go absolutely sure, but I believe I never actually had to use one. Most problems fall into a "if it's roughly right, it's fine" category after all.
The only case that's regularly important for me is to use epsilon to check for equality. I usually use a pretty big one like e-4 since false positives tend to be better than false negatives in my experience.
One time I was actually diving into the float implementation to encode some bitmask into a texture on the GPU and I was curious if I could avoid bitshifts... only to find out that the framework supported integer texture formats after all 😅
I feel like this is one of those things where you could go years without ever coming near an issue.
15:03 The best way of doing it is to manually find the exact representation of the float as an array of 32 bits then handle the comparisons yourself, or better yet, just use fixed points!
From my experience, this is how we compare two floats/doubles. You need two tolerances. Relative, and Absolute. abs_tol is the value you accept "as zero" in "this context" of comparison. rel_tol, is the max amount of "relative difference" two numbers can have to judge them as equal. And the formula is:
abs(a-b) < rel_tol * abs(a) + abs_tol
As you can see, there's an "a" multiplied on the right side. And what that does is, it "scales" your rel_tol to the vicinity of the numbers you're comparing.
So, if you are comparing really close to zero, (a is small) rel_tol * a will become smaller and the significant member in the RHS is abs_tol, so, near zero, you are using your abs_tol. If you are comparing two large numbers, rel_tol * a becomes large and now this term (rel_tol * a) is the most significant term of RHS, controlling the comparison result. This is a variation on the simpler version which is:
abs(a-b)/abs(a) < rel_tol
You take the abs(a) to the right side, but add the abs_tol.
From my experience, for "double precision" we set abs_tol to something like 1e-16~20 while rel_tol to something like 1e-8~10. This has worked mostly in the past for me, But I've had cases where even this does not work!!! Right now I'm reading randomascii article and it is fascinating.
I'd love to know your thoughts on this. Thanks everyone.
Try looking for functions to extract the parts of a float, as well as functions to reunite them. You get the exponent of whichever value you're treating as dominant, then pack together with epsilon (at least, I THINK it was epsilon, it's been a while since I did this), and that gets you the smallest possible step size for the context you're interested in... more or less. You may want to consider the scale above and below as well...
It may also be that extracting the exponent gets you everything you care about, but I've never tried that, so I can't speak to the sanity of attempting it.
Bookmark
Related: when summing numbers with a large range of values you need to sort by abs value in case you have a lot of very small numbers a a few very large ones, in which case an unsorted add of a large value can saturate the available precision and the small values (no matter how many whose sum is large) will be ignored.
Good point, wish I had thought of that for the video.
Best explanation of the floating point binary format.
Love the video, I'm currently studying for a software degree and they sadly don't teach anything this low level so this is a big help.
I was so glad when you just went into an example at the start too, I hate when TH-camrs try to teach a concept and they go all the way back to the stone age just to cover the origin 😂
Heh yeah, I hate including a bunch of unrelated info to pad the video out.
That was my favorite history section ever, thank you
Remember, the outsider thinks computer science is magic. The novice programmer will tell you about how computer science makes perfect sense. The experienced programmer *knows* computer science is magic.
You need quantum physics to understand semiconductors and that's the closest thing to magic.
dark magic, possibly made of bees.
after watching this video
i feel good for choosing to represent collectible crystals in my game code
not as a floating point number
but as an integer which counts 12ths,
only converted to float for display:
``float crystals = crystal_shards/12.0``
"Konrad Zuse, loyal subscriber to SimonDev's youtube channel" XD
A few years ago when I started programming a universe sized environment I first used floating point. I quickly learned that was a big mistake. I switched to 64 bit and 128 bit integers which are 100% accurate.
I remember using something like
if ( abs(a - b) < error_value)
with error_value = 0.0001 instead of
if (a == b)
to circunvect this problem with floating point comparison. It was some numerical computing (I think I was playing with a numerical method to finding the roots of an equation, or something), and the "a == b" part was never being triggered...
I wrote a machine controller once, and the position for the steppers were calculated using floating point numbers. When I tested the stepper driver routines the shaft position would be updated by some small value that gave a certain RPM. At first everything sounded normal and smooth, but after about 5 minutes the steppers sounded horrific and choppy. I eventually figured out the compiler for the microcontroller did not support double precision by default, but does not generate a warning during compile. It just silently interprets it as regular floating points. After enabling the right flags and recompiling it finally worked. But the error was simply the problems floating point numbers have in representing certain spans of numbers.
I happened upon your videos 2-3 weeks ago. You're f'n crushing it dude. Just sub'd.
IEEE 754 octuple-precision binary floating-point format: binary256
In its 2008 revision, the IEEE 754 standard specifies a binary256 format among the interchange formats (it is not a basic format), as having:
Sign bit: 1 bit
Exponent width: 19 bits
Significand precision: 237 bits (236 explicitly stored)
The format is written with an implicit lead bit with value 1 unless the exponent is all zeros. Thus only 236 bits of the significand appear in the memory format, but the total precision is 237 bits (approximately 71 decimal digits: log10(2237) ≈ 71.344).
That's a big float
This is so underrated channel
I caused a bit of s stir on the old Risks list a few decades ago commenting on the error characteristics of base 2 floating point versus base 10 floating point. This was about the time that people were finding PC spreadsheet programs were making mistakes with currency values because the program was using binary floating point instead of something in base 10.
I knew something about this because Texas Instruments had implemented a base 100 floating point system in their home computers - and had documented it in the BASIC manual!
Fantastic video! I'm constantly forgetting what I know about floating point numbers so I'm definitely going to be coming back to remind myself in the future.
Great explanation of Floating point numbers! I designed the FP execution unit on the 387 and 486 processors and this brought back a lot of memories. Handling Denormals and Unnormals were a pain but we got it done. Same with NaNs. Unfortunately, the guys who did the Pentium design after this failed in getting the right division lookup table entry and it led to an interesting story.... The next interesting topic might be a discussion on rounding using Guard, Round, and Sticky bits for numerical correctness.
Woah, you've been around! I was just a kid playing Sierra games back then, would love to hear more about your experiences if you have a blog or something.
@@simondev758I haven't written my memoirs yet, but have had many discussion with other folks about the earlier days of CPU design. After the FP design I was the Design Manager for the P6 (Pentium Pro) and then GM/VP for Pentium II, Pentium III, Pentium 4 and the first Celeron. It was fun until it wasn't and then I left and started a company and then worked at SpaceX for a while. I'm not sure how to do a blog about so much of this since so many other people are intertwined in the history.
Integers with bit flipping algorithms is the only reason I haven’t lost my mind
0:30 We owe everything to SimonDev
great video from a dev for devs, now i see, why it's beneficial to have increasingly big amount of numbers, when you get closer to zero. Was really wondering.
Also: always think of my english teacher, which urges me, not to curse . Then i watch one of your videos and smile "3052... and some crap, give or take" 😂
Fun Fact: C# has the very handy decimal type, which is a floating point number with base 10, instead of 2.
So you can actually do things like "0.1m + 0.2m == 0,3m" (m is the literal for decimal type).
It's a real life saver for LOB applications, not for games or other high performance scenarios of course.
That's interesting, the language specification has some overlap with fixed point.
My english teacher never encouraged my swearing either :(
C23 has Decimal numbers too.
A handy trick I figured out that better handles floating point equality is to xor the integer representation of the two floats data. Comparing the resulting Int giving a rather effective way if telling if two numbers are effectively identical. for example: xor(0.1+0.2,0.3) == 7 (0b111). so anything 7 or below can easily be considered floating point math error (we could say ≤15 (0b1111) to be safe). It's at least more accurate than a direct comparison, with the single cavoite I've found being 0 vs -0.
>desperately changing to doubles and hoping that your problems magically go away
I feel personally attacked.
The way you included that joke about Konrad Zuse without drawing any attention to it and then you actually read and liked my comment with over 100K views makes you one of my favourite people in the world, and you were already pretty high up there. Just and interesting note, tonight in a South African comedy club I saw the actual Darryl Philbin from Dunder Miflin perform live musical comedy. I was supposed to perform but I got bumped to next week. I got to show him the 3D caricatures I made of his coworkers (edit: coSTARS), and now I'm going to make a caricature of him (his name is Craig Robinson) and show it to him before he leaves my country. Can't wait to meet you some day too! I'm working on a game! almost done!
Hah, I mean it doesn't take long to go through the comments, there's not a million of them. If you take the time to write a comment, I'll definitely read it.
re: music, that's super neat! I loved the Office when it aired!
I'm aware of (most of) this and it's always surprising. Fun was something like if (a>=0) { b=std::min(1.0/a, 1e6); // since now assume that b is inside range between zero and million }. Program had some weird behaviour. After some debugging, it turned out that a can contain negative zero and b can be negative infinity.
“Under the supervision of a time traveller known only as SimonDev…”
This also implies that in a sum of more than two numbers, the order of the summation might change the result slightly. As a consequence, a perfectly "deterministic" program can have completely different outcomes every time you run it, as soon as you have some section of optimized/parallelized code where you do not have full control over the exact order in which some low-level stuff is computed. I was shocked when I first experienced this first hand as a young student working on physics simulations.
"where you do not have full control over the exact order in which some low-level stuff is computed" Doesn't sound very deterministic to me.
@@Kalumbatsch That is why I have written "deterministic" in quotation marks, LOL. I had naively assumed that some fancy optimized function (also involving some multi-processor stuff) would perform just like an ideal mathematical function, giving you the exact same output for the same input every single time. In floating point reality, not so much.
Yep, floating point isn't associative heh
It's quite amazing the designed floating points to allow for such a surprising failure of 0.1 + 0.2 == 0.3. I was wondering why the C++ QT SDK bothered including a "real" data type and this might explain that.
Back in the day we used fixed point in many games to get around the issues with FP. It takes up more memory, but it speeds calculations up and eliminates some of the issues with comparison and arithmetic. It suffered from accuracy, but IEEE-754 does as well, just in different ways. Also, old processors didn't have a dedicated FP unit. I think the 486 was the first that had a built in FP coprocessor. The problem is that it was a coprocessor, so you had to block your main processor for FP calculations. It wasn't until the Pentium they finally put true FP pipelines in the processors. I don't recall if the FP pipeline was super scalar or not - but modern processors do have super scalar FP pipelines, so you can execute multiple FP instructions at the same time - or more accurately get the results of two calculations on the same clock cycle.
I started my career long after floating point had become the standard, so I know of and have experimented with fixed point, but never shipped a game with it. Worked with plenty of people from those days though, they had all the craziest low level tricks up their sleeves.
@@simondev758 Those days were crazy days for sure. It was all about squeezing every single ounce of processing power where you could. We didn't have massively parallel GPUs to offload things like particle engines and such. All of that was CPU bound, so we had to figure out clever ways to trick the player into thinking they are seeing more than they really are - I mean that's game programming in a nutshell really, but "back in the day" it was an art. Today though you have extremely powerful processors that are super scalar and can execute multiple instructions at the same time. They have separate floating point and integer pipelines, with prefetch, branch prediction, and all the good stuff we know and love today. I honestly don't miss those day though. I remember one time working for almost two weeks to figure out how to squeeze 20 (20!) bytes out of Splinter Cell, so the game would fit on a smaller handset (I was doing mobile porting). Removing levels or content was not an option, so I had to figure out a clever approach to save that 20 bytes so the game would run on certain handsets (this was back in the BREW/J2ME days a few years before smartphones).
@@BitwiseMobile Damn! I came in on the tail end of the Xbox/PS2 era. I did some amount of instruction fiddling with the in-order powerpc's in the xbox360/ps3. Vivid memories of reducing LHS's. Nothing really compared to the engineers who had worked in the era's before. If you have some interesting areas to investigate for videos, I've kinda been wanting to dig into some of those older optimizations. I still have my Black Book from Abrash, been meaning to open it up again.
And to think the semiconductor industry is chasing TOPs with FP8 and FP4.
Back in the day (I was maybe 12), my first contact with fixed point math was a 3d rendering sample written for the 386. I didn't have a 386, and somewhat foolishly backported it by wrapping each 32-bit operation to run on my IIT 2C87 - individual adds and multiplies done by having the FPU do integer conversion and rescaling around them. All a massive detour; it was even doing matrix multiplication that particular FPU had an instruction for! It would have been better to rewrite to float instead of these assembly wrappers. But I did get it running and learned some math in the process.
In Java you could and should use BigDecimal if you work with currencies, to avoid the issues with double and float. How does that work?
Thanks this is a great explanation
If one is doing modeling then floating point can introduce chaos at each step on the model code. We used to validate models by increasing numerical precision until the results became comparable ie within tolerance. One would use single, then double precision then quad precision...
Loved the wiki history. now we know time travel is possible.
Love the H Jon Benjamin impression
10:00 minutes mentions the 8 million floating point units that get halved everytime you go out by a power of two.
The high precision double would go some way to helping this such as in OpenCL2 (or 2.2) but mostly it would be for signal processing (like spectral analysis) such as audio and video, but especially (with or without those) in neural networks computed on GPU. The Cache on CPU is large nowadays but for the aforementioned, the GPU would still be relevant as long as it is proprtinal to the system's other specs in CPU _(like using a Radeon RX570 or thereabouts)._
So the performance penalties incurred could easily be for Gaussian kernelized classification in computer science, seeking the outliers (of what "might" be an answer as something small to ensure is worth classifying), so as to then look at Gaussian white noise (Chebyshev polynomials) and Gaussian kernel (density) probability estimation. It isn't that other ways could not solve for X, such as Markov chains (state transition probability matrix) but efficiency in optimisation can depend on what hardware a person has to hand or would need to be "prepared" to have to hand _(hence a use case for heterogenous computing)._
If you are adding an outlier like a millimetre, you might find that gauging when to include it or drop it could be handled in the aforementioned ways (estimation) by means of Gaussian Kernelization classification in computer science, and by that I mean one might estimate _(not the same as approximation, for an estimate can sometimes be on the mark as ironically the exact, precise value)._ So as to mitigate those performance penalties, the Parzen-Rosenblatt window method would be worthwhile using. In its probability _(similar to Gaussian Kernel density estimation, but Gaussian Kernel probability estimation)_ it (the method, as a pre-processing stage) can better ascertain whether it is worth it or not to include the millimetre (or whatever miniscule measure) in the outliers of the Gaussian Kernelized classification. The floating point (mantissa exponent) can be adapted thereby accordingly. As an aside, the description text can do with ieee-754 since it has a typo. This information is the help people. The above information would thereby apply to Gaussian heatmaps, tracking weather whereby an initial 16 or 32bit float is all that is needed and then the high precision Gaussian double is used for the area of deep analysis. Then also disease or health tracking and crops or tree of life can be assisted. The 1.5 C climate aims could then become carbon Zero, by process of elimination. It will be a yes to the question: Have you found what you are looking for? Because it's here. It's what they can't see. Tones of home.
Data, with enough interrogation so as to torture it, will yield the correct outcome. It's about ruthless efficiency. _"Suddenly I see."_ or _"Karma's gonna track you down, step by step from town to down"_
My comment has no hate in it and I do no harm. I am not appalled or afraid, boasting or envying or complaining... Just saying. Psalms23: Giving thanks and praise to the Lord and peace and love. Also, I'd say Matthew6.
This really screwed me over when I was trying to do some angle calculations on a coordinate plane. I knew the line AB was parallel to the line CD in my test case, but the angle comparison in my code kept failing. It was infuriating. When I figured out what was going on, it was even more infuriating.
I’ve run into this many times! & combined with my logic dyslexic, it would drive me crazy !
There is also a difference in 32-bit and 64-bit compilers and CPU/FPU (x86/x64 Intel I mean).
32-bit CPUs use an 80-bit intermediate representation of data in the FPU during operator evaluation. The programmer also has access to the 80-bit long double type
On 64-bit systems with vector instructions, compilers prefer them even for calculations with single numbers, so that even an explicitly declared variable of type long double is implicitly converted to double. As a result, the same program compiled in 32-bit and 64-bit modes will produce different results!
A (64-bit) double-size floating point can exactly represent all 32-bit integers (and a few more), and the operations match. So if you only need 32 bits, JavaScripts Number works for integers as well.
Good video! I appreciate your effort.
12:22 Is that why Minecraft Bugrock has the stripe lands and why in Minecraft Java blocks eventually lose their collision box?
I believe you mixed up programming exponential notation and Euler's number. (at 6:26) - In programming, E(number) is a shorthand for 10^(number) and you DON'T put the number up on top the same way as writing Euler's number raised to an exponent.
Once had a subnormal issue with a shader dispatch setting a shaders input values randomly from a tiny value to 0, so not only did i have the issue intermittently, it was also hard to see even when it did occur as it just resulted in a black pixel in an already dark shader effect..
Except on some specific hardware sets in release mode it caused the shader to crash, resulting in major graphics glitches and boy was that fun to debug, took awhile to just realize i needed to dump in release mode to get the issue even randomly!
excellent content. senior developer here, never thought much about it
The coordinates in GTA: San Andreas was a floating point number like most things in the internal script. The world spanned a few thousand units in either direction. But someone made a mod with a boring road on the water to the edge that was at 20,000. When approaching that, every part on the car began visibly shifting. I had the great idea to separate the bumper, the license plate, and the lights, etc., so that they could be later selected and copied.
In Age of Empires, the money were floating point numbers off by a significant amount, and it was not possible to find them with a simple cheating tool.
Floating point matches our perception of the world where small differences become less important as we have more of the stuff.
Very informative video!
And this is why I jump through moderate hoops to treat my numbers as integers. "So for this I'm going to count my universe in millimeters." "Why?" "Because it's more precision than I think I'll need, and it's not float."
I have seen (and ranted about) using float for *currency*. Please, dear god just use integer pennies...
Yes, the problem of the granularity of the float numbers. It always come around and byte you in the back when you least expect it.