Floating Point Numbers - This is Where Things Get Weird!

  • Published 15 Jul 2024
  • Following on from my video about Fixed Point Numbers, now it is time to look at Floating Point Numbers. On the way we will look at binary fractions and why in C you can't increment a float above 16777216! Strap in, things are about to get weird!
    ---
    Let Me Explain T-shirt: teespring.com/gary-explains-l...
    Twitter: / garyexplains
    Instagram: / garyexplains
    #garyexplains
  • Science & Technology

Comments • 43

  • @gregholloway2656
    @gregholloway2656 2 months ago

    Great video, Gary. Glad you pointed out the classic 7 sig-fig rounding problem. One other snag, for programmers, is:
    if (floata == floatb) // dangerous comparison
    👍

  • @TheEulerID
    @TheEulerID 2 months ago

    For transactional financial calculations decimal fixed point representation is generally preferred because of things like rounding errors which accumulate in such things as running totals, and balances. Sometimes these can involve tens or hundreds of thousands of accumulations. That's one of the reasons that many older computers had decimal number representations, usually packed decimal, along with the instructions to support it. Typically those could support up to 31 decimal digits of precision.
    Also, IEEE 754 (since the 2008 revision) allows for base-10 as well as base-2 exponents.
    I'd also add that cumulative FP rounding errors can quickly become apparent, even in something like an Excel spreadsheet, and you have to be very careful to avoid that sort of problem.

  • @Chalisque
    @Chalisque 2 months ago +1

    While floating point numbers are stored as binary fractions between 1 and 2 with an exponent, it's actually equivalent to an integer multiplied by a power of 2. For example 1.5 is equal to 3×2^-1, and 1.25 is equal to 5×2^-2 and so on. So for example if we have a binary fraction with 8 bits after the 'decimal point', together with an exponent, we can shift the mantissa to the left by 8 places and subtract 8 from the exponent to get the same number. (I find this picture especially useful in digital audio when considering the difference between 24bit integer and 32bit float PCM.) The advantage of the binary fraction approach is one extra bit of mantissa precision for the same total number of bits. The clever observation is that for binary numbers that are not identically zero, the leading digit is _always_ 1. Thus if the number is nonzero, you don't have to actually use a bit to store the leading digit. (So they use the entirely-zero bit pattern as a special case to store zero.)

  • @aleksandardjurovic9203
    @aleksandardjurovic9203 2 months ago

    This is great. Thank you!

  • @lalmuanpuiamizo
    @lalmuanpuiamizo 2 months ago +3

    1:42 The picture is not correct; it should be 32768, not 32786.

  • @drfrancintosh
    @drfrancintosh 2 months ago +1

    Another great explainer. I recently read there is a new format specifically for numbers between -1 and +1 - used in AI / ML. Would you be able to get into that? It's supposed to be an order of magnitude faster than IEEE-754

  • @thelastofthemartians
    @thelastofthemartians 2 months ago

    I've only ever used floating point numbers as a last resort (for the reasons you have pointed out). If, for example, you want your microcontroller to monitor room air temperature, then you can easily represent -50°C to +50°C in a 2-byte integer with two decimal places of precision. As a bonus, your program will be smaller and faster (as if anyone cares about that these days :D ) and will exhibit a lesser "astonishment factor".

    • @toby9999
      @toby9999 2 months ago

      You could do that, but the floating point numbers as implemented in most languages are more than accurate enough for storing temperatures. For instance, the "double" type in C++ is 8 bytes and is accurate to at least 14 digits for whole numbers, with a 52-bit mantissa. I work on commercial software where all numbers are represented internally by the 64-bit floating point type "double", with absolutely no problems. That said, I'll never use a 32-bit "float", for the reasons presented in this video.

  • @stuartajc8141
    @stuartajc8141 2 months ago +1

    There is FP64 too, for big scary numbers (AKA Float64, or Double-Precision)

    • @GaryExplains
      @GaryExplains 2 months ago +1

      I thought I covered that in the video as long double?

    • @alatnet
      @alatnet 2 months ago

      @@GaryExplains You did, around 5:20.
      FP64 seems to just be a 64-bit representation of a floating point number, exactly as you described double precision in the video.

    • @stuartajc8141
      @stuartajc8141 2 months ago

      @@GaryExplains Whoops, I missed that

    • @Chalisque
      @Chalisque 2 months ago

      And FP128 and FP256, but these are rarely implemented in hardware. And if you really want to go crazy, there's arbitrary precision done in software (e.g. via the GMP library), so you can have as many bits as memory permits. (Those deep Mandelbrot zooms make copious use of this.)

  • @IvanToshkov
    @IvanToshkov 2 months ago

    10:08 Don't use floating point for currency! It's not going to be OK. Use fixed point instead.
    Floating point was developed so that it can efficiently store both very big and very small numbers. The precision goes down as the numbers get bigger, and errors can easily crop up even with smaller numbers when you do arithmetic on them. This is not acceptable for monetary computations.

  • @lale5767
    @lale5767 2 months ago

    11:53

  • @roysigurdkarlsbakk3842
    @roysigurdkarlsbakk3842 2 months ago

    There's FP4 too ;)

  • @Eugensson
    @Eugensson 2 months ago

    Would you cover Posit floating point format?

    • @GaryExplains
      @GaryExplains 2 months ago

      I think that might be a little too niche. Sorry.

    • @Apocalymon
      @Apocalymon 2 months ago

      @@GaryExplains It definitely is not, if my local meathead cobbler knows about it, and he dislikes STEM subjects.

  • @Benjiq8787
    @Benjiq8787 2 months ago

    I want to hear you say "if you want to understand the 3 Body Problem, please let me explain".

    • @GaryExplains
      @GaryExplains 2 months ago +1

      That would be after my Nobel prize ceremony!

    • @Benjiq8787
      @Benjiq8787 2 months ago

      @@GaryExplains look forward to that ;)

    • @GaryExplains
      @GaryExplains 2 months ago +1

      😂

  • @ernstoud
    @ernstoud 2 months ago

    Whereby 42 is an integer, namely what you get if you multiply 6 by 9.

    • @GaryExplains
      @GaryExplains 2 months ago

      eh?

    • @ernstoud
      @ernstoud 2 months ago

      @@GaryExplains Your thumbnail. And the answer to life, the universe and everything. Douglas Adams' book The Hitchhiker's Guide to the Galaxy.

    • @GaryExplains
      @GaryExplains 2 months ago

      I know what 42 means, I just didn't understand your comment.

    • @ernstoud
      @ernstoud 2 months ago

      @@GaryExplains Humor, always difficult.

    • @Apocalymon
      @Apocalymon 2 months ago

      @@ernstoud 42 is too cliché now as a nerdy joke.

  • @electrodacus
    @electrodacus 2 months ago

    2^15 is 32768 not 32786 :)

    • @GaryExplains
      @GaryExplains 2 months ago

      Yeah, it is called a typo.

  • @CStoph1979
    @CStoph1979 2 months ago

    Only the most astute fans know the true meaning behind the number 42.
    Yes, it has one.
    No, you can't look it up.
    The answer is hilarious, profound, and superbly, sublimely simple; it's amazing that it wasn't figured out decades ago.
    No, I can't tell you. Yes, I can be persuaded to give a hint if you'd like to figure it out for yourself.

  • @bpark10001
    @bpark10001 2 months ago

    Your 2^15 column is in error. It should be 32768, not 32786. Floats are used WAY TOO MUCH.
    Currency is NOT represented as floats & rounded. That's why there continues to be packed BCD representation in computers.

    • @GaryExplains
      @GaryExplains 2 months ago

      Thanks for spotting the typo, but several other people have spotted it as well. But it is just a typo and not really worth much fuss.

  • @lezbriddon
    @lezbriddon 2 months ago

    I propose dropping all this complication and storing everything in BCD: that's 2 digits per byte, and storage is cheap now. You could even assign one byte to state the length of the BCD string and a second byte to give the length of anything fractional, so that's 255 bytes, or 510 digits, then '.', then 510 digits... It would make for interesting routines for math...

    • @Kabodanki
      @Kabodanki 2 months ago +1

      Storage is cheap, but we are not talking about RAM or SSD here; we are talking about CPU registers.

    • @lezbriddon
      @lezbriddon 2 months ago

      @@Kabodanki Irrelevant, because in BCD you only deal in pairs of digits for any math, just like with pencil and paper... A 5-bit accumulator can hold the result of two digits, as the biggest result you will see is 9+9. Of course, division is tricky, but not that hard.

    • @toby9999
      @toby9999 2 months ago

      @@lezbriddon It's not irrelevant if you value performance. A 64-bit floating point value can be held in a 64-bit CPU register, whereas something huge like you describe cannot. And calculations would require software emulation... even slower. But if performance is not an issue, yes, that or one of many other approaches can be and are used. I do remember using BCD way back in the 70s, because registers were only 8-bit, and 8-bit was insufficient for any decent-sized numeric representation.

    • @lezbriddon
      @lezbriddon 2 months ago

      @@toby9999 Yup, my method is simply taken from clown computing 101, but it works and, unlike bitwise math, has infinite granularity, if you have infinite storage...