Double it Like Dekker: A Remarkable Technique to Double Floating Point Precision (Part 1)

  • Published on 20 Jun 2023
  • This video is an introduction to Double Double arithmetic, a technique used to greatly improve the precision of floating point computations in programming. It was first proposed by T. J. (Dirk) Dekker in his paper from 1971.
    Link to Dekker's original paper: csclub.uwaterloo.ca/~pbarfuss...
    Support What's a Creel? on Patreon: / whatsacreel
    Facebook: / whatsacreel
    Software used to make this vid: Visual Studio 2019 Community: www.visualstudio.com/downloads/
    Audacity: www.audacityteam.org/
    Davinci Resolve 16: www.blackmagicdesign.com/prod...
    OpenOffice: www.openoffice.org/
    Gimp: www.gimp.org/

Comments • 189

  • @TerjeMathisen
    @TerjeMathisen 1 year ago +141

    Not only is having twice as many mantissa bits pretty nice, it is so useful that the last time the IEEE 754 FP standard was updated (2016-2019), we added two new instructions:
    AugmentedAddition() and AugmentedMultiplication(). Both take the usual two inputs, but they then deliver two results, the high and the low part, so that together the operation is always exact. This can typically be implemented in FPUs with no more than one extra cycle of latency, and they are also very good building blocks for even more precision, i.e. using arrays of doubles to implement arbitrary precision.
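
    A rough software sketch of the same high/low idea (this is Knuth's branch-free TwoSum, not the new instructions themselves, which round ties slightly differently):

    #include <cstdio>

    // Returns hi and writes lo such that hi + lo == a + b exactly.
    // Requires strict IEEE semantics (no -ffast-math style reordering).
    double two_sum(double a, double b, double *lo) {
        double hi = a + b;                 // rounded sum
        double bb = hi - a;                // portion of b absorbed into hi
        *lo = (a - (hi - bb)) + (b - bb);  // exactly what rounding discarded
        return hi;
    }

    int main() {
        double lo;
        double hi = two_sum(1.0, 1e-20, &lo);
        std::printf("hi = %g, lo = %g\n", hi, lo);  // hi = 1, lo = 1e-20
    }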

    • @marcomoreno6748
      @marcomoreno6748 1 year ago +5

      Commenting to bookmark this. I enjoy reading about the IEEE 754 standard for purposes of deterministic lockstep networked applications, and somehow I missed this. Will check out later.

    • @oscarsmith3942
      @oscarsmith3942 1 year ago +4

      Note that `AugmentedMultiplication` isn't really necessary, because you can compute it with a single fused multiply-add instruction (FMA). Once you have computed c = a*b, you can compute c_lo = fma(a, b, -c), which will produce the exactly correct low bits of the multiplication.
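
      As a sketch, the whole exact product then looks like this (std::fma is from <cmath>; the split is exact barring overflow/underflow):

      #include <cmath>

      // c_hi + c_lo == a * b exactly: fma(a, b, -c_hi) evaluates
      // a*b - c_hi with a single rounding, so it is the exact low part.
      void two_prod(double a, double b, double &c_hi, double &c_lo) {
          c_hi = a * b;                  // rounded product
          c_lo = std::fma(a, b, -c_hi);  // exact remainder of the product
      }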

    • @TerjeMathisen
      @TerjeMathisen 1 year ago +10

      @@oscarsmith3942 You are of course correct. It is in fact the existence of fused multiply-add which makes the Dekker add/sub algorithm really useful, even though it typically takes at least three times as long as a single FADD: FMUL + FMA is "only" twice as slow as a single FMUL, while both Augmented operations can realistically run in zero or one cycle more than the plain ops.
      In the end this would make extended-precision operations at least twice as fast as what is possible today.

  • @FloydMaxwell
    @FloydMaxwell 1 year ago +18

    I studied chemical engineering at the University of British Columbia in the late 70s and early 80s. It was a demanding bachelor's program with many labs, including an 8-hour chem lab, and you had to do a bachelor's thesis (very unusual in itself). In addition, you had to do a summer project (before your final year). In the project I chose, I took over the bachelor's thesis work of two previous students. Their efforts ...had not worked, and Dr. Janis Lielmezs wondered if I could get it to work. Long story short: the program needed double double precision instead of just double precision. Dr. Lielmezs was most pleased with the result. He passed away in 2021.

  • @friendlyoctopus9391
    @friendlyoctopus9391 1 year ago +83

    One place these "old tricks" are very useful is when programming GPUs. Some GPUs are significantly faster with floats than with doubles (Nvidia) and some don't support doubles at all (Intel Xe) - identifying where your precision bottlenecks are and applying these methods gives you much better memory use and performance while keeping precision... As long as you're not using a device compiler that doesn't let you turn off unsafe floating point optimizations that throw them away (looking at you, Intel).

    • @Spielix
      @Spielix 1 year ago +13

      For most Nvidia GPUs you are right that the peak double precision performance (i.e. FLOP/s) is worse by a factor of 64. Just two refinements of that statement: 1. High-end server accelerators (P100, V100, A30, A100, H100) have good (i.e. only a factor 2 worse than single precision) double precision performance. 2. That performance difference only has practical implications for algorithms that are not memory-bound (low arithmetic intensity) anyway. E.g. for properly implemented dense matrix multiplication (DGEMM) you should be able to see it, but not for vector addition (DAXPY).

    • @lpi3
      @lpi3 1 year ago +10

      It's also useful for microcontroller programming. The Cortex-M4 has 32-bit floating point instructions (float), but no 64-bit ones (double).

  • @Florian-sh9nf
    @Florian-sh9nf 1 year ago +25

    Just when you think he couldn't get any cooler, he suddenly plays the piano as an intro. Legend!

  • @Intermernet
    @Intermernet 1 year ago +61

    Love the skills on the piano my dude. Also, thanks for explaining the topic so well.

    • @FadkinsDiet
      @FadkinsDiet 1 year ago +2

      With those skills I hope he can get the piano tuned.

    • @lorenzodiambra5210
      @lorenzodiambra5210 11 months ago +1

      3:56 *The death of a 0.3 floating point is a tragedy, the death of millions of people is a statistic*
      (quote)

  • @andersjjensen
    @andersjjensen 1 year ago +3

    The legend returns. He comes bearing his usual gifts of entertaining madness sprinkled with useful knowledge. And we are grateful.

  • @jacob_90s
    @jacob_90s 1 year ago +20

    I've used compensated arithmetic on a number of occasions. It's very handy on embedded devices that have hardware float support but don't support doubles.

  • @warmCabin
    @warmCabin 11 months ago +2

    Incredible! With twice the memory, we can store twice the information!

    • @MrRobertX70
      @MrRobertX70 11 months ago

      LOL, that was my reaction too

  • @johnteran8889
    @johnteran8889 1 year ago +14

    This is pretty cool. I can see how it extends to quad double but with significantly more bookkeeping.

    • @WhatsACreel
      @WhatsACreel  1 year ago +3

      Yep, it just keeps going! Cheers for watching mate :)

  • @imrank340
    @imrank340 1 year ago +3

    Never quite got my head around Dekker's Double Double computation, but you made it significantly easier, particularly the carry over or under. Thank YOU Very Very Much, my Aussie friend, I always look out for your videos.

  • @dainxor
    @dainxor 1 year ago +3

    Curiously enough, a while ago when I was wandering around the C++ std lib, I found some functions using this method, and even got to understand some of it. But now I know how ingenious it really is. Great video man!

  • @oresteszoupanos
    @oresteszoupanos 11 months ago

    Can't BELIEVE that I missed a new Creel vid for 2 weeks!
    For shame! Thanks for the cool info as always, matey!
    P.S. In the UK, we honour Dekker's algos all over the country with our DoubleDekker buses. ^_^

  • @Jhat
    @Jhat 1 year ago +4

    Man I wish your upload schedule was more frequent 😅 all your content is so interesting and insightful.

  • @hjups
    @hjups 1 year ago +23

    Neat! I had always wondered how this was done, but never took the time to understand it. A lot of GPU-based computational physics codes use this method, given a limited number of FP64 units (depending on the problem scale, the exponent in FP32 is sufficient, but we really want higher decimal precision).
    That does lead to an obvious question though: does it need to be a "round to nearest" operation? Or will this work with any rounding mode assuming it's consistent, such as round to 0 (truncation)?
    Anyway, looking forward to the multiplication and division!

  • @slow_learner
    @slow_learner 1 year ago +2

    Ty Creel.... love how precise and in depth your videos are. Piano skills are killer as well

  • @Rottingflare
    @Rottingflare 1 year ago

    This was a fascinating topic you brought up! Really appreciate all the concepts you find worthy enough to discuss!!

  • @MartinPHellwig
    @MartinPHellwig 1 year ago

    That intro is brilliant, thanks for brightening my day, now on to the actual content.

  • @AT-zr9tv
    @AT-zr9tv 1 year ago

    This was thoroughly entertaining! Your delivery is clear and super fun. Love your humor. Nerd out

  • @MAli-wu4rx
    @MAli-wu4rx 1 year ago +1

    Welcome back man! You are the best!

  • @codemonkey6173
    @codemonkey6173 11 months ago

    Nice to see you back!

  • @Antonio-yy2ec
    @Antonio-yy2ec 1 year ago

    Just what I needed, your videos are pure gold

  • @slembcke
    @slembcke 1 year ago +3

    Interesting! I've implemented 256-bit fixed point before (64.192), and figuring out an efficient-ish assembly implementation for multiplication and reciprocal was tricky! I figured it was possible to chain multiple floating point values together as well, and it's nice to have it spelled out. Thanks! :)

  • @osakaandrew
    @osakaandrew 1 year ago +18

    Canadian programmers of almost every skill level know all about double-double. Many employ it literally daily today. And for us, the OG is A Cruller. Tim Horton's got most of the credit for popularizing this, but it was around for yonks before that.

    • @bytesandbikes
      @bytesandbikes 1 year ago +10

      Now I need to order a Tim Horton's "Quad precision float". I wonder if I'll end up with a milkshake.

  • @dankillinger
    @dankillinger 1 year ago +1

    babe wakeup, Creel vid dropped

  • @WyrdieBeardie
    @WyrdieBeardie 8 months ago

    Your videos are absolutely fantastic!

  • @mhcbon4606
    @mhcbon4606 1 year ago

    It has been a while since I've seen your videos! I am delighted.

  • @barakap2372
    @barakap2372 1 year ago +2

    Yessss ur back what took so long

    • @WhatsACreel
      @WhatsACreel  1 year ago +1

      Nothing in particular mate, cheers for watching :)

  • @rstewart2702
    @rstewart2702 several months ago

    Hmmm. Well, I guessed correctly about Dekker, and would not have had to ask my earlier question if I had bothered to watch the whole thing first. He was indeed Dijkstra’s colleague, another one of those early titans of computing science…

  • @tbird81
    @tbird81 1 year ago +7

    Why did they not call it "double Dekker" arithmetic?!

    • @unvergebeneid
      @unvergebeneid 1 year ago

      Nice :)

    • @wrc1210
      @wrc1210 11 months ago +1

      Came to say the same thing!

  • @rjones6219
    @rjones6219 1 year ago

    I first encountered the concept of double precision when reading about programming the DEC PDP-8/E. That was nearly 50 years ago. 😊

  • @ktownjunkie
    @ktownjunkie 1 year ago

    The first equation reminds me of the equations used to encode and decode video or audio signals that have multiple channels... It always looked weird to me, but it is making more sense now. Thank you for this video!

  • @criticalpoint2779
    @criticalpoint2779 1 year ago

    My goodness! Finally some good f***ing content

  • @timpalmer4939
    @timpalmer4939 4 months ago

    The font on his slides is terrible. But this video is very helpful! One of the best videos I've watched on floats so far

  • @VioletGiraffe
    @VioletGiraffe 1 year ago

    Never heard of this technique, thanks!

  • @ciberman
    @ciberman 1 year ago

    I didn't watch the video yet, but that is the best intro ever

  • @angeldude101
    @angeldude101 1 year ago +1

    I wasn't sure what the result would be initially, but I quickly remembered Posit documentation on a use of the proposed "quire" to round the value, subtract that value from the quire, and then round again and repeat for however much precision the quire can store. Here we don't have a higher precision register to use, but the core principle seems to be the same: representing a higher precision value as the sum of multiple values with differing levels of precision.

  • @kenhaley4
    @kenhaley4 11 months ago +6

    It would have been interesting to see what happens if the decimal point locations in the two operands were farther apart. E.g., adding 470 + 0.0022. I'm sure this method works, but it would be interesting to see how the smaller number completely disappears in the result.

    • @damian_smith
      @damian_smith 11 months ago +2

      Sure, but I also think that he's provided enough info for us to experiment with Excel or pen and paper.
      Just gave it a go! Your example actually yields C = 470, c = 0.0022 -> no loss of precision: the lower half just isn't adjacent to the upper.
      However, if you use 4709 and 0.002216, you get 4700 and 9.0, so all the lower digits disappear.

  • @rogerfroud300
    @rogerfroud300 11 months ago

    The idea of changing the order of calculations to reduce the error works for integer and floating point arithmetic too.

  • @algorithminc.8850
    @algorithminc.8850 1 year ago

    Hahaha ... enjoyed the piano ... checked out your music channel too ... good stuff. Great video on this ... back in the days of early microcontrollers and the like, worked out or used all sorts of methods to compute various precisions. Thanks much for the video. Cheers from sunny Florida, USA.

  • @child_of_god_
    @child_of_god_ 1 year ago

    I bumped into this method when I got different results for double additions on the CPU vs using Nvidia CUDA. Now I know the history of this. Cool

  • @steubens7
    @steubens7 1 year ago

    Even without the extra precision, being able to easily check how far away you are from the precise answer is great. It can be a sanity check for code modifications, or let you know when long-term drift starts becoming huge in accumulated values.

  • @electrolyteorb
    @electrolyteorb 11 months ago

    ❤ please upload more often

  • @esra_erimez
    @esra_erimez 1 year ago

    This is a doubly good video

  • @romancharak3675
    @romancharak3675 1 year ago

    Fascinating!

  • @leyasep5919
    @leyasep5919 1 year ago

    This structure reminds me a bit of the "lifting scheme" of reversible wavelet transforms. Noyce !

  • @IanBLacy
    @IanBLacy 11 months ago

    I’d like to propose doubly-imprecise floating point. Use one floating point number to hold the mantissa and one to hold the exponent

  • @MaxTSanches
    @MaxTSanches 1 year ago

    Early in the 1970's, one of the first computer languages, IBM BASIC, required that you set your variables to Binary, Single Precision, or Double Precision. Binary only used one bit, Single used eight bits, and Double used sixteen bits. This was when our school's computer only used an eight-bit chip and only had 4K of memory.

  • @esra_erimez
    @esra_erimez 1 year ago

    I really really liked this video

  • @hobbified
    @hobbified 1 year ago +1

    I haven't worked with this, but I have used Kahan summation, which is based on the same core idea: you're adding together a bunch of regular doubles, and your output is a regular double, but you keep double-double precision on the running sum (and just forget about the lower part once you're done; that's your un-representable roundoff error). In this special case, you can skip about half of the floating point operations compared to doing full Dekker addition each time.
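
    For reference, a minimal sketch of that special case (plain Kahan summation; comp is the running low part that gets dropped at the end):

    #include <vector>

    // Compensated (Kahan) summation: the running total behaves like the
    // high half of a double double, with comp carrying the rounding error.
    double kahan_sum(const std::vector<double> &xs) {
        double sum = 0.0, comp = 0.0;
        for (double x : xs) {
            double y = x - comp;   // re-inject the error lost last step
            double t = sum + y;    // big + small: low bits of y may vanish
            comp = (t - sum) - y;  // recover exactly what vanished
            sum = t;
        }
        return sum;                // the leftover comp is simply discarded
    }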

  • @asandax6
    @asandax6 1 year ago

    Very interesting that this video came out right after I was playing with adding or subtracting and dividing numbers by 2. I was trying to figure out a naive way to efficiently store the result. The reason is that odd numbers give a .5 in the result, so they need an extra bit of data to store, but modern computers use 8 bits as 1 byte, so I'd be wasting 7 bits at best and 8 bits at worst. I ended up just using 1 byte to store the .5s for the next 8 bytes, and that's the best way I've found so far, but I was also considering investigating variable-length encoding.

  • @wumi2419
    @wumi2419 1 year ago +1

    I learned about this when I had way too much error and wanted to reduce it. There was a lot of summation, so Kahan sums were very helpful (they basically do the same, but the compensator is carried over instead of being stored as part of the value).

  • @IcyMidnight
    @IcyMidnight 10 months ago

    Nice video 🙂
    Woulda been nice to have a quick run through with overflow.

  • @ChrisM541
    @ChrisM541 1 year ago

    What a super interesting video, thanks Creel. I wonder if some game engines include this as an option in order to increase e.g. 3D precision?

  • @alecgamer420
    @alecgamer420 11 months ago

    Your videos are goated

  • @activelow9297
    @activelow9297 1 year ago

    The bucket of fish is back! I don't use this technique. I use two 64-bit integers. One of them covers all integral values between -2^63 and +2^63-1. The second integer covers all fractional values from 1/1e17 to 1-(1/1e17). Much easier and faster!
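
    If I follow, a minimal sketch of that layout (my guess at the scheme, with the fraction held in units of 1e-17):

    #include <cstdint>

    struct Fix128 {
        std::int64_t ipart;  // integral part, -2^63 .. 2^63-1
        std::int64_t frac;   // fractional part, 0 .. 1e17-1, in units of 1e-17
    };

    constexpr std::int64_t ONE = 100000000000000000LL;  // 1e17

    // Addition for non-negative fractions; a carry propagates into ipart.
    Fix128 add(Fix128 a, Fix128 b) {
        Fix128 r{a.ipart + b.ipart, a.frac + b.frac};
        if (r.frac >= ONE) { r.frac -= ONE; ++r.ipart; }
        return r;
    }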

  • @stuartmcconnachie
    @stuartmcconnachie 11 months ago

    Double Dekker arithmetic.
    So famous they named a type of British bus after it.

  • @ianboard544
    @ianboard544 11 months ago

    I've done this for years - just never knew what it was called - especially in multiply-accumulate expressions like polynomial evaluation or FIR filters.

  • @yash1152
    @yash1152 9 months ago

    5:26 that ligature with 'fl' at the baseline, while YouTube makes the 'fl' ligature at their heads. Woah.

  • @zxborg9681
    @zxborg9681 1 year ago +2

    Very interesting, and well presented. I know of Dekker from the mutex work but had not seen this before. Fun fact: up here in Canada, a double double usually refers to a coffee with double sugar and double cream, usually from a coffee-and-donuts store like Tim Hortons.

    • @PaulPassarelli
      @PaulPassarelli 1 year ago +1

      You spel't "Dunkin Donuts" wrong. ;^)

  • @Czeckie
    @Czeckie 1 year ago

    mad piano playing my friend

  • @ZomB1986
    @ZomB1986 1 year ago

    This brings me back to 2006, when I ported dhbailey's doubledouble to Java and found and fixed a lot of bugs, like corner cases in exp(), and found some very specific "Devil's values", as I like to call them, that really stress the algorithm.

    • @ZomB1986
      @ZomB1986 1 year ago

      After watching the video: the algorithm I have (and verified on paper) does not use IF statements, and is more complicated. It has as many operations as your C++ code, if the if and else statements are both counted, and then some.
      public DoubleDouble add(DoubleDouble y) {
          double a, b, c, d, e, f;
          e = this.hi + y.hi;  // rounded sum of the high parts
          d = this.hi - e;     // negated portion of y.hi absorbed into e
          a = this.lo + y.lo;  // rounded sum of the low parts
          f = this.lo - a;     // negated portion of y.lo absorbed into a
          d = ((this.hi - (d + e)) + (d + y.hi)) + a;  // error of the high sum, plus the low sum
          b = e + d;
          c = ((this.lo - (f + a)) + (f + y.lo)) + (d + (e - b));  // error of the low sum, plus error of b
          a = b + c;           // renormalize into a proper (hi, lo) pair
          return new DoubleDouble(a, c + (b - a));
      }

  • @AnotherPointOfView944
    @AnotherPointOfView944 10 months ago

    Hi, great video as always. You present very well!
    Can you do one on QUATERNIONS please, in your inimitable style!!!

  • @Crush_Netics
    @Crush_Netics 11 months ago

    You should upload videos on solving crackme(s). That would be awesome!

  • @ruadeil_zabelin
    @ruadeil_zabelin 1 year ago +2

    I play the piano too. I love being explained things that have nothing to do with the piano.

  • @aonodensetsu
    @aonodensetsu 1 year ago

    now i need this extended to infinite (arbitrary) precision with some clever loops and arrays

  • @rstewart2702
    @rstewart2702 several months ago

    Sorry I’m so late to the party! Just discovered your marvelous AVX512 introduction. Is TJ Dekker the same Dekker who derived the correct mutual exclusion algorithm at the Dutch Mathematical Center in the 1950’s, when EW Dijkstra was building the THE operating system?

  • @Shifticek
      @Shifticek 11 months ago +1

      When you rounded when calculating A+B, did you mean an actual round or a ceil?
      E.g. if the result was 154, would you go to 160 or 150?
      Thanks for your videos, you explain complicated stuff in a really understandable way

    • @Erhannis
      @Erhannis 10 months ago +1

      I think it was more of a ceil or truncation - not sure how it'd deal with negative numbers, but your example would be 150 (with 4 as the remainder in the other half of the computation), I think

  • @user-xe8oi5oq6c
    @user-xe8oi5oq6c 1 year ago

    I'll watch the vid. It's interesting to compare with the Kahan-Babuška-Neumaier and Shewchuk algorithms.
    P.S. Watched already. Nice. You actually use the KBN algo for adding the higher and lower parts of the double double.

  • @sb_dunk
    @sb_dunk 1 year ago +1

    Gotta get eyes in the sky so that we can high in the sky, get a birds eye view.

  • @Sonnentau1
    @Sonnentau1 1 year ago

    Question: at one point we made sure to have the bigger number first for the subtraction of R. But your C++ code did not show any parentheses to make sure the order is carried out that way? Because if I am not mistaken, either clang or gcc does its operations from right to left... Or won't this matter?
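
    For what it's worth, here is Dekker's Fast2Sum written one operation per statement, so the order can't be ambiguous (a sketch, not the video's exact code; C and C++ evaluate a chain like x + y - z left to right anyway, and compilers may not re-associate floating point unless fast-math is enabled):

    // Precondition: |a| >= |b| (swap the inputs first if not).
    void fast_two_sum(double a, double b, double &hi, double &lo) {
        hi = a + b;         // rounded sum
        double v = hi - a;  // the part of b that actually made it into hi
        lo = b - v;         // the part of b that was rounded away
    }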

  • @GMHarpov
    @GMHarpov several months ago

    great piano.

  • @Datamining101
    @Datamining101 10 months ago

    7:43 I was thinking Jemaine Clement from Flight of the Conchords

  • @OpenGL4ever
    @OpenGL4ever 11 months ago

    @Creel
    Could you make a comparison video between AMD's 3DNow! and Intel's SSE (1st SSE Version)? Which one was better? What were the pros and cons of each? Which was easier to use from a programming point of view, more powerful, more versatile etc.?

  • @GrayBlood1331
    @GrayBlood1331 11 months ago

    Surely there are op codes to handle this quickly. Just like how if you wanted multiplication (or floating point) for the 6502, it could be done, but slowly. So CPU designers added op codes to later chips to greatly speed these operations up.

  • @yash1152
    @yash1152 9 months ago

    6:43 so, it's like a tuple, right? In the same way a complex number, and also a rational number, is. It's nice to think about how to do operations on 2 of these dd's: the result of the op on the error has to be added to the main part, then the leftover propagated back.
    Feels like how cascading in electronic full adders works.

  • @MaxCareyPlus
    @MaxCareyPlus 1 year ago +1

    Soo... All aboard the double Dekker bus? 🚌😂

  • @SteveBakerIsHere
    @SteveBakerIsHere 1 year ago

    A practical problem that I have is in using a GPU. It has double precision numbers, but in 3D graphics we have to use the available hardware interpolators, which can only cope with single precision. I'm trying to get my head around whether a similar Dekker trick would allow me to use hardware interpolators to interpolate double precision using single precision hardware.

  • @andrewporter1868
    @andrewporter1868 1 year ago

    1:48 I'm convinced I'm watching a legitimate madman, and this is him showing his true colors. If you haven't found it already, you've got to find yourself the "real instructions ||| mental disorders" meme image for x86. I think you've personified it 😂

  • @bbbl67
    @bbbl67 8 months ago

    In Canada, a double-double is a coffee with 2 cream and 2 sugar.

  • @c4ashley
    @c4ashley 1 year ago

    No but it does have something to do with playing piano, because... like, if you... well you see, the... it's like when you... the tone of the string contains harmonics of various factors of the fundamental frequency, which is analogous to the variable containing subvariables of various factors of precision of the number. Boom. Nailed it.

  • @PaulPassarelli
    @PaulPassarelli 1 year ago

    Hmmm. Many years ago I wrote a floating point emulator for the 8087 family of Intel co-processors. Since it was a full blown but granular co-processor emulator, I had to implement the chip family's 80-bit registers. So it was just as accurate as common ISO 64-bit doubles. Since then I've also had to code numerous 'fixed point' math systems as well to process hard real-time data for ECU systems.
    If total flexibility was the goal, then I'd simply extend my co-processor library out to arbitrarily many bits in the registers, increasing both the absolute magnitude and the bits of precision, whether that'd be 128 bits (double-double), 160 bits (double-80), 240 bits (triple-80), 256 bits (quad-double) or more, perhaps 512 bits (64-byte, or octo-double). The code is almost unchanging, and with early-out detectors for rational data the speed penalty for such super precision is quite minimal.
    What's even better, assuming the C compiler still supports the '/Emulator' switch, the original (simple) compiled code would run on any computer without additional modification, or could make use of the co-processor for faster first pass estimations.
    Anyway, thanks for the trip down memory lane -- no pun intended

  • @macsmola
    @macsmola 1 year ago

    Now I'm thinking: how do I go about outputting this nicely formatted?

  • @fredrikbergquist5734
    @fredrikbergquist5734 1 year ago

    Is this a way to better the result when subtracting two numbers that are very close?

  • @chickenduckhappy
    @chickenduckhappy 1 year ago

    Great video, thanks! 🙂 Just one tiny nitpick: why is your metronome obviously 17 times as expensive as your camera? 😎

  • @cmyk8964
    @cmyk8964 11 months ago

    A “double” precision solution? Devised by a Mr. “Dekker”? If God exists, he must absolutely love puns.

  • @1verstapp
    @1verstapp 1 year ago +2

    >picture
    obviously fake - no pocket protector. 😀. your channel, you post when you want to.

  • @uplink-on-yt
    @uplink-on-yt 1 year ago

    Does that second number have a name? If it were integer division, it would be called the "remainder". This value in floating point rounding plays a similar role. I could see it being called a "remainder" too, as maths doesn't shy away from reusing terms as long as it's clear from context what the meaning is.

  • @jimf2525
    @jimf2525 11 months ago

    Very nice video. It was a bit frustrating that you didn’t include the information that X has to be the ‘larger’ number, or more specifically the number with digits in the same places as R.

    • @jimf2525
      @jimf2525 11 months ago

      SORRY! You explained that in minute 14. I just started thinking about it on my own b4 I got there. Yea me and you!

  • @X_Baron
    @X_Baron 1 year ago

    There has to be a display/print/string conversion algorithm too, right?

  • @aleksandersabak
    @aleksandersabak 11 months ago

    Does this addition algorithm always produce the most accurate double double result? One edge case that wasn't shown here: when the overflow in r happens, it loses information about the least significant digit of the sum due to the overlap between r and R. This will not be recovered by the computation of c, and we will be left with a c which will definitely have a zero in the lowest digit, losing one bit of precision.

  • @KipIngram
    @KipIngram 23 days ago

    15:48 - Well, if the real answer is 157 and the computed answer is 160, I'd say the error is +3, not -3. So I'd call r the "correction" or something like that rather than the error. Or perhaps the "residual" - maybe that's why R and r are used for it in the first place.

  • @dejaphoenix
    @dejaphoenix 1 year ago +9

    Sounds like a lot of toil and trouble.

    • @WhatsACreel
      @WhatsACreel  1 year ago

      Hahaha, nice! :)

    • @SameAsAnyOtherStranger
      @SameAsAnyOtherStranger 1 year ago

      Not really. It just seems like that when something so simple is dragged out into a 22 minute video.

  • @codenamelambda
    @codenamelambda 1 year ago

    I do wonder how it compares performance-wise with emulated floating point using integer arithmetic to get the same precision...

  • @unvergebeneid
    @unvergebeneid 1 year ago +1

    Wait, if _r_ overflows, don't we lose precision already? How are we going to get it back if we don't split _r_ itself into an upper and a lower part? I'm sure I missed something but I can't see it right now.

    • @Axman6
      @Axman6 1 year ago

      The idea is that if r ended up being 15.2 in the example, we'd need to round it to 15, losing the 0.3, but the 1 in the tens place can also be stored in R by adding it, and then finding that we'd be left with 5. Applying the final clean up, you'd recover the .3 and end up with R=170 and r=5.3

  •  11 months ago

    How does this compare to using 128-bit floating point numbers instead?

  • @INT41O
    @INT41O 1 year ago +2

    There is a better formula by Knuth where we do not need to order the values by magnitude:
    R = a + b
    c = R - a
    r = (a - (R - c)) + (b - c)
    now you can add stuff like 1 + 1e100
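
    A quick way to check that claim (a throwaway sketch):

    #include <cstdio>

    int main() {
        // Knuth's TwoSum: no magnitude ordering needed before subtracting.
        double a = 1.0, b = 1e100;
        double R = a + b;                     // rounds to 1e100
        double c = R - a;
        double r = (a - (R - c)) + (b - c);   // recovers the lost 1.0
        std::printf("R = %g, r = %g\n", R, r);  // prints R = 1e+100, r = 1
    }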

    • @michaeldeakin9492
      @michaeldeakin9492 11 months ago

      in my experience, Dekker's formula is faster on modern CPUs. I'd guess it's because branch prediction is good enough for this case so it's faster than adding extra floating point operations.

    • @INT41O
      @INT41O 11 months ago

      @@michaeldeakin9492 I doubt that, but maybe I will check when I have time. I don't think branch prediction actually works well in this case, since the relative size of the values is unknown and basically impossible to predict.
      It all comes down to what the compiler actually does. The Intel compiler, for instance, uses logical masks to avoid branching (fp-model=precise needs to be used, otherwise the error value is optimized to zero).
      For SIMD, branching is out of the question anyway.

    • @michaeldeakin9492
      @michaeldeakin9492 11 months ago

      @@INT41O I have actually tried it somewhat recently; for my use case (exact predicate evaluation) it was faster. I'd link my project, but YouTube doesn't seem to let any post through pointing to it at all.
      And while branches won't work with SIMD, blending will work. I don't know if that will be faster than Knuth's formula.
      BTW, Chandler Carruth's talk isn't quite the same but seems relevant here: CppCon 2017: Chandler Carruth “Going Nowhere Faster”
      Skip to the cmov section for the relevant part.

    • @michaeldeakin9492
      @michaeldeakin9492 10 months ago

      @@INT41O Okay, I reran some scalar benchmarks I had with 3 different methods, and implemented vector versions: scalar Dekker summation, scalar branchless Dekker summation via array indexing, scalar Knuth summation, vector (branchless) Dekker summation, and vector Knuth summation.
      Scalar performance ranking from best to very much the worst (as in absolutely awful) was as follows: Dekker summation, Knuth summation, branchless Dekker summation
      Vector performance was similar, Knuth summation was much better than branchless Dekker summation. I'm unsure why at this point

  • @siquod
    @siquod 1 year ago

    The funniest part was the super subtle sponsor segue spoof.

  • @Lexxaro
    @Lexxaro 11 months ago

    Would double double work in a case like 5700.018 + 1800.053 ?
    Because then it would actually have really weird, specific positions where it is much more than double the precision of a normal double.

  • @vaakdemandante8772
    @vaakdemandante8772 1 year ago

    Nice trick, trading speed and space for precision. Weird choice of example values that result in no carry happening.

  • @dunda563
    @dunda563 11 months ago

    I wonder how double dekker doubles would compare to a single floating point number with double the bits? Are there any advantages perhaps? At the very least the method allows double the precision on hardware not built for it, and the two groups of bits don't have to be consecutive