CppCon 2018: JF Bastien “Signed integers are two's complement”

  • Published on Jul 4, 2024
  • CppCon.org
    -
    Presentation Slides, PDFs, Source Code and other presenter materials are available at: github.com/CppCon/CppCon2018
    -
    There is One True Representation for signed integers, and that representation is two's complement. There are, however, rumors of a fantasy world, before C++20, where ones' complement, signed magnitude and "pure binary representations" dwell. That world boasts Extraordinary Values, Padding Bits, and just like our world it hosts swaths of Undefined Behavior.
    Join me in exploring this magnificent fantasy world, and discover its antics. Together we'll marvel at how the other representations were forever banished from real-world C++, doomed to cast mere shadows onto our reality.
    -
    JF Bastien, Compiler engineer
    Apple
    JF is a compiler engineer. He leads C++ development at Apple.
    -
    Videos Filmed & Edited by Bash Films: www.BashFilms.com
    *-----*
    Register Now For CppCon 2022: cppcon.org/registration/
    *-----*

Comments • 89

  • @Bourg
    @Bourg 5 years ago +41

    Post-CppCon update! The final approved wording for C++20 is present in P1236R1 (as voted by the committee on November 2018 in San Diego). It has math-y wording (instead of my engineering wording), leaves a bit more implementation freedom for bool, and doesn't resolve LWG3047 atomic compound assignment (Library will resolve it separately for C++20, including resolving the same issue in atomic_ref and atomic).

    • @MatthijsvanDuin
      @MatthijsvanDuin 5 years ago +1

      Is the text online somewhere? The link I find is wg21.link/p1236r1 but it's not (publicly) accessible.

    • @Bourg
      @Bourg 5 years ago +1

      @@MatthijsvanDuin It's available now. Mailings come out a few weeks after each meeting.

    • @movax20h
      @movax20h 4 years ago

      Leaving bool a bit implementation-defined is good. Making the storage two's complement is a good move forward; it simplifies the language a lot. Thanks for your work on this.

  • @OnWhenReady
    @OnWhenReady 5 years ago +17

    Cool story :-) Awesome talk btw!!

    • @Bourg
      @Bourg 5 years ago +3

      Exactly the response I was looking for :)

  • @TranscendentBen
    @TranscendentBen 7 months ago

    There's so much here ... I learned sign-magnitude, ones and twos complements in the late 1970s, and from the 8 and 16 bit microprocessors at the time, it was clear things were leaning toward twos complement. I learned (or started learning) C in 1986 and indeed by then virtually everything used twos complement. I knew that things such as dereferencing a null pointer was "undefined behavior" but not that signed integer overflow was. I forget when I finally learned that but it was years or decades later. It always bothered me because I KNOW what the equivalent assembly/machine code does, and I saw the purpose of a compiler was to generate equivalent code to the source code, and overflow was a natural occurrence of exceeding the bounds of the integer size, and everyone knows what the twos complement result will be.
    That's another thing, somewhere along the way - "integer" size was "clearly defined" by what original C standards there were as the size of the register word, but at least 16 bits (thus compilers generated 16 bit code even for 8 bit processors), and it commonly became 32 bits as compilers targeted newer 32 bit processors. C99 introduced the int8_t, uint8_t, int16_t, uint16_t etc. types and I thought to myself 20 years ago, why do people still use int? If you're not sure what processor you're targeting (my career has been embedded, so it could be 8-bit to 32-bit wordlength), or you're targeting several(!), you don't know how big an int is! So I started using the new types exclusively, so I always know variable size at a glance.
    Interesting that you mention MATLAB doing saturation, but that's also the standard operation for most DSPs. I was reading in the late 1990s how "mainstream" processors were adding DSP instructions (such as MAC, multiply-accumulate, multiply two numbers and add the product to a register), and as I recall, they may have been adding a saturation mode as well. Saturation is a much more appropriate way (better approximation to what the signal "should be") to handle overflow than "wraparound" in signal processing. Of course saturation is NOT part of any C or C++ standard that I've heard of, yet C and C++ are used almost exclusively for DSP programming. Programmers just know and accept that that's how DSPs work.
    But I can (now) see where different people expect certain things, and the standard committees have to somehow take these things into account. I've read such things about Microsoft Windows, people wrote production code that called Win system functions with wrong values, but the code still did something useful, and rather than "fixing" things MS has to make sure newer versions of Windows still work with such improper calls so that older apps don't break.

  • @TranscendentBen
    @TranscendentBen 7 months ago

    58:18 Bool always guaranteed to be 0 or 1 may seem insignificant at first, but it's a great feature; you can now write:
    out_of_bounds_count += (value > max_value); // comparisons always return type bool
    which compiles to straight-line code, rather than
    if (value > max_value)
        out_of_bounds_count++;
    which (unless the compiler is really smart, often true but not always) compiles to branching code, which on modern processors may flush the pipeline on a misprediction, slowing things down substantially. The first expression is also quite easy to read, once you see that the comparison returns 0 or 1.

  • @movax20h
    @movax20h 4 years ago +3

    12:25 Damn. What a prophet. I did watch this video, checked how the D compiler deals with this on my platform, and it actually did well on "obvious" code (primarily because D has defined behavior on integer overflow and a defined integer representation), but not so well on "workaround" cases. So yes, I did file some bugs against gcc and llvm. :D Fortunately I can use __builtin_sadd_overflow in gdc very easily, and yes, it generates optimal code (especially after inlining).

  • @User-cv4ee
    @User-cv4ee 1 year ago

    So, does this mean one can now rely on and assume a two's complement implementation once this passes the committee?

  • @fdwr
    @fdwr 5 years ago +6

    55:30 As an American, there is no need to apologize. That is the sane way to write dates.
    Nice to see the practical reality (that integers are 2's complement) and the spec align.

  • @movax20h
    @movax20h 4 years ago +3

    I think it is good that signed integers are being defined as two's complement, but that by itself is not going to make signed integer overflow defined. The representation was always implementation-defined, and the alternatives are just cruft worth removing from the standard. There have been no machines using them for the last 30 years. Maybe there were some emulators for the PDP-11 using ones' complement, but that is all. If you want old code on those machines (and there are probably fewer than 10 people using them), just stick to an old compiler version. Done.

    • @TheMrKeksLp
      @TheMrKeksLp 3 years ago +1

      I really don't get their reasoning. If integers are stored as two's complement why not also define them as wrapping? That's what every platform will do, so why not define it. Processors with non-wrapping arithmetic are even rarer than ones not supporting two's complement, in fact I doubt _any_ exist

    • @hemerythrin
      @hemerythrin 2 years ago

      @@TheMrKeksLp I know this comment is 1 year late, but for anyone else reading this in the future:
      Overflow on signed types is not defined, because that provides optimization opportunities for the compiler. Basically, the compiler assumes that every math operation on signed integers cannot overflow, and so it can generate more efficient code. This is not just theoretical, compilers really do use this. So if you just defined overflow for signed types, that would make existing code slower, sometimes much, much slower.
      Now you might say, "Why did they decide to tie the wrapping behavior to the signedness? Instead of forcing `unsigned` to also always implicitly mean `wrapping`, and `signed` to also mean `overflow_impossible`, why didn't they just add distinct `wrapping signed int` and `wrapping unsigned int` types?" And you would be completely right, that's probably what they should have done. But now C exists, and C++ is compatible with it, and people don't want their existing code to get slower.
      But hey, maybe one day they could fix this design mistake that's still haunting us so many decades later. I'm not holding out much hope, though.

    • @TranscendentBen
      @TranscendentBen 7 months ago

      I did mention DSPs and their use of saturation (it's not just MATLAB), and there's a lot more than 10 people using saturation, but still it's apparently too niche of a feature.

  • @thomasweller7235
    @thomasweller7235 1 year ago

    3:22 how's that supposed to work? What about overflows(2,-1)? Don't consider UB at this stage. That code won't work in the first place.
    6:40 won't that give a compiler warning since an unsigned is compared to a signed?

  • @styleisaweapon
    @styleisaweapon 7 months ago

    Good evidence from the math folks who sum infinite series that two's complement is more fundamental than programmers realize.

  • @iddn
    @iddn 5 years ago +1

    Wouldn't having both unsigned and signed overflow be UB break some std::hash algorithms?

    • @seditt5146
      @seditt5146 4 years ago

      @Peterolen There are some optimizations for ring buffers as well that rely on overflow behaving properly, to avoid needing a check on every single iteration. Much easier to make the container a power of 2 in size and use bit operations to wrap, as it can greatly increase performance: you save on every single lookup or write into the buffer.

    • @TheMrKeksLp
      @TheMrKeksLp 3 years ago

      Unsigned and signed overflow ARE always UB

    • @yasserarguelles6117
      @yasserarguelles6117 2 years ago +3

      @@TheMrKeksLp No, only signed integer overflow

  • @filippol1138
    @filippol1138 5 years ago +3

    So, if you make the storage two's complement, but integer overflow is still UB, then you cannot really rely on the fact that addition wraps on overflow? So I do not really see the point... Unless you do the addition yourself, but then that is way less expressive than writing a+b or using builtins.
    I do not really see the point of many suggestions. The overflow thing, for example: the only thing it would fix is that the overflow check (which to me is too weird anyway; it is much more expressive to cast to unsigned and then check) would be nicer, and a bunch of infinite loops would disappear due to optimization, but is it really worth it?
    In the end, if I write something like (a+b) < a for natural numbers, I just wrote a statement which is always false for positive b, and integers are supposed to represent integer numbers. So the overflow check at the beginning is just madness to me. Because you are reasoning in terms of internal storage, instead of what an integer is supposed to represent...

  • @seditt5146
    @seditt5146 4 years ago +5

    18:50 Atomic Gandhi was pretty much disproven by the developers. The game just did not store aggression in a way where overflow could have mattered. Cool story, just sadly not real.

  • @TranscendentBen
    @TranscendentBen 7 months ago

    Modern Use of Something Other Than 2's Complement (and it's not just MATLAB):
    en.wikipedia.org/wiki/Saturation_arithmetic#Implementations

  • @timothymusson5040
    @timothymusson5040 5 years ago +5

    If volatile goes away, how does memory mapped IO work?

    • @nullplan01
      @nullplan01 5 years ago +1

      The way it works right now, using inline assembly to force reads and writes to happen in program order. And the inline assembly is typically portable, because it contains no actual code.

    • @timothymusson5040
      @timothymusson5040 5 years ago +2

      Could you elaborate or point to an example?
      Setting up with ‘volatile uint32_t* m_StatusReg = m_BaseAddr + m_STATUS_REG_OFFSET’ and then using m_StatusReg in a straightforward and obvious way has been working great. Is there unexpected code generated for this direct memory access?

    • @styleisaweapon
      @styleisaweapon 7 months ago

      Memory-mapped {fill in blank} is driven by either exceptions or hardware translation tables on all modern hardware... there really isn't anything about any programming language here.

  • @cbehopkins
    @cbehopkins 5 years ago +3

    sizeof(void *) == 8: is this implying that C++ is not for use in the embedded (32-bit) world?

    • @chrishopkins2506
      @chrishopkins2506 5 years ago

      It's an interesting world if you're bothered enough about optimisation to use C++, but not bothered enough to use 32-bit pointers when you can get away with it. I'm not saying it could not exist, but the embedded world is almost certainly decades off being able to abandon 32-bit pointers.

    • @flatfingertuning727
      @flatfingertuning727 5 years ago

      @Peterolen If an application needs to store a large number of pointers, but accesses less than four gigs of storage, keeping everything needed by the program within a 4-gig region of address space and using 32-bit pointers would likely improve cache performance even on a 64-bit machine. Given that many applications would have no need to access even four megs of storage--much less four gigs--I would expect that the performance benefits of 32-bit pointers to remain on any platforms that continue to support them.

  • @obfuscator2
    @obfuscator2 2 years ago +1

    5:32 how is that code working? If lhs is INT_MAX and rhs is 1, you'll end up with an unsigned int with the value of "INT_MAX +1", which is roughly UINT_MAX/2, and isn't less than INT_MAX. So you're not detecting overflow from positive to negative ints, are you?

    • @styleisaweapon
      @styleisaweapon 7 months ago

      It's your last assertion (that it isn't less than INT_MAX) that is in error. The result of that addition is INT_MIN, which is most certainly less than INT_MAX.

  • @NicolayGiraldo
    @NicolayGiraldo 1 year ago

    I would like to have a fixed-point representation for numbers between 0 and 1. Seems both very fast and relevant now for neural networks.

  • @MatthijsvanDuin
    @MatthijsvanDuin 5 years ago +9

    54:10 _what_? Why on earth would you consider "char" to be signed, given that in practice it means "a byte from a UTF-8 string, or maybe a string that uses some legacy 8-bit encoding"?

    • @Bourg
      @Bourg 5 years ago +8

      Because in practice it is signed?

    • @MatthijsvanDuin
      @MatthijsvanDuin 5 years ago +7

      @@Bourg In practice it is architecture-dependent. char is unsigned on ARM for example.

    • @MatthijsvanDuin
      @MatthijsvanDuin 5 years ago +3

      @@Bourg It is also unsigned on PowerPC.

    • @Hauketal
      @Hauketal 5 years ago +1

      It was the original sin of the IBM PC. K&R wrote in their C manual (before the ANSI C editions), in one of the first paragraphs, that char is unspecified with regard to signedness, but that all machine character-set values are positive. The PC extended ASCII to 8 bits, but C compilers for it never acknowledged that as an extension, so they continued with the traditional char for Intel being signed. That's what you get for not RTFM. Now, about 40 years later, we still have to deal with it. :-(

    • @MatthijsvanDuin
      @MatthijsvanDuin 5 years ago +1

      @@Hauketal Bad or careless decisions getting enshrined really sucks. I also really hate that integer division is specified as being round towards zero rather than round down (with the obnoxious fallout that -1 % 4 is -1 instead of 3, and x/2 is not the same as x>>1)

  • @Verrisin
    @Verrisin 2 years ago

    EDIT: EVERY fing time..... I write a comment, and like magic, it's addressed a minute after XDXD
    Regarding overflow... It's insane people would have to write code to check it. I remember from school, CPU will _tell you*_ in a register. Shouldn't there be a built in way to check?
    Some kind of "add with check" like:
    add a b
    rslt = eax
    didOverflow =
    ... I always assumed this is how it is implemented...
    * and I remember the nice diagram, showing the carry bit setting the overflow flag

    • @Verrisin
      @Verrisin 2 years ago +1

      yeah: status register; just looked it up
      - There is no way to access it from C++ without target specific asm ???

  • @ssl3546
    @ssl3546 2 years ago

    Better solution: we sell a cheaply available CPU that uses ones' complement and one that uses sign-magnitude (cheap like the Raspberry Pi) so that people can test and fix their non-portable code. If code does not work on a big-endian, sign-magnitude machine, it is broken.

  • @kwinzman
    @kwinzman 5 years ago +4

    Good talk. But could you make the font on the slides a bit smaller? It's still readable sometimes.

    • @Bourg
      @Bourg 5 years ago +9

      ᴵ ᶜᵒᵘˡᵈ ᵐᵒˢᵗ ᶜᵉʳᵗᵃⁱⁿˡʸ ᵐᵃᵏᵉ ᵗʰᵉ ᵗᵉˣᵗ ˢᵐᵃˡˡᵉʳ!

  • @user-ni2od5lu6j
    @user-ni2od5lu6j 5 years ago

    Deprecating volatile-qualified member functions (P1152) is a mistake. If you miss one volatile qualifier in old code, you either get a compiler error (calling a non-volatile member function through a volatile ref/pointer) or still-correct behavior (volatile lost near the root pointer, but the member function called is still volatile-qualified, and maybe a warning from some tools).
    But if you delete all volatiles from the top of the hierarchy and from member-function qualifiers, putting them only on built-in data types, you could easily lose one on a std::byte, for example (and a checking tool could miss it too).
    Then, if you are really unlucky, all tests and even test rocket launches pass OK, but some years later a new compiler may decide to optimize access to that std::byte the other way around, and you get a rocket blowup.
    If a whole region of memory is marked as volatile, any pointer/reference that points inside it (including custom aggregate types with member functions) should have a volatile qualifier (no tool would warn about using memcpy from such a pointer if the volatile qualifier is removed).

    • @TranscendentBen
      @TranscendentBen 7 months ago

      I've done lots of embedded, and I don't know how a compiler would know not to optimize away or not to delay a write to a register (that otherwise looks like any memory location) without using the volatile keyword. Otherwise it thinks "the value at this location is never read back anywhere else, so I don't have to write it."

  • @joshingaboutwithjosh
    @joshingaboutwithjosh 1 year ago

    Ah, Atomic Gandhi, we meet again

  • @BlackBeltMonkeySong
    @BlackBeltMonkeySong 5 years ago +2

    Listened to the talk, still not sure why this is important.

    • @Bourg
      @Bourg 5 years ago +7

      Read your comment, still not sure how it's relevant.

    • @User-cv4ee
      @User-cv4ee 1 year ago

      @@Bourg The talk was great! However, it did leave me wondering what we gained by defining the storage while still not being able to rely on it, since the arithmetic is undefined. Can you expand on that, please? Much appreciated.

  • @FalcoGer
    @FalcoGer 11 months ago

    Name one CPU architecture, not even one targeted by C++, that has a signed integer representation and doesn't use two's complement for it. Two's complement is the natural choice because it allows for easy addition, which in turn means fewer transistors, which makes chips smaller, cheaper, and less power-hungry. I think we can ditch the 0.000000000002% of programmers that deal with super special and niche hardware and make them do the workaround (or buy sane hardware), instead of everyone else having to deal with the compiler destroying our code. What are the most common architectures? Intel and Intel-compatible amd64 make up nearly the entire market, then the ARM family, PowerPC, Z80, whatever crap Apple produces, RISC, Atmel, and then a whole array of other microcontrollers, most, if not all, of which use two's complement. Why would you want to support 70-year-old processors? Anything before 1970 doesn't exist anyway; time began on the first of January 1970, after all. If overflow is a bug, then it's a bug. But unsigned integer overflow, when it's a bug, is also still a bug. The compiler silently optimizing out parts of my code is simply worse.
    Even if wrapping or trapping doesn't fix it, it would at least be a more noticeable error. If you have a bug in your program, you want it to give a positive signal that the programmer or the user has to handle, not silently do something. Throw an exception or trap. Spit out a stack trace. How would you fix Pac-Man or Donkey Kong? Using a larger integer type would just push the problem back. Would players actually reach level 2^32? Probably not. But it's still a bug.

  • @dipi71
    @dipi71 5 years ago +1

    I wish C++ had an elementary, built-in and highly optimized Integer type that never overflows but transparently expands the range of a specific integer value - like in Ruby.
    At 47:03 the objection to this is »I don’t want my addition to allocate«. Well, I don’t mind - especially if this extra allocation occurs about once in millions of additions.
    As soon as such a »BigInt« is to be incorporated into a fixed-size data structure you’d have to check its size, but for storage purposes BigInts can be thought of just as variable-sized Unicode strings. Computers have gotten pretty good at that, or so I’ve heard. Cheers!

    • @brenogi
      @brenogi 5 years ago +1

      Is it possible to implement that without doing a check for every operation to know if it will overflow and allocate?

    • @dipi71
      @dipi71 5 years ago +1

      Breno Guimarães replied (although his comment is not showing up here): »Is it possible to implement that without doing a check for every operation to know if it will overflow and allocate?« - I don't think it's possible to avoid _some_ kind of check if you're aiming for correct results. The cases where you have to squeeze every bit of performance out of integer arithmetic may be less common than you think, though: maybe data moshing for fast randomization, or some kinds of real-time DSP where overflows just become part of the noise; or algorithms where you can afford to perform a large amount of unchecked integer calculations and then check the overflow bit once at the end.
      Exceptions aside, and if overflow bugs like those listed by JF Bastien in this video from 16:30 to 24:24 are to be avoided, we ought to value _robust and correct code_ over the theoretical extremes of the fastest execution possible. (This guideline ought to be applied to hardware as well - consider debacles like Rowhammer, Meltdown and the variations of Spectre having one thing in common: fetishizing speed over safety and putting performance over security.)

    • @brenogi
      @brenogi 5 years ago +3

      @@dipi71 Well, in my world, those checks are unacceptable. I use C++ (also) because I need every bit of performance I can get. I don't want to pay for what I don't care about.
      And if I need the BigInt, there are library solutions for that, which are more than enough for where I need it.
      But I have no idea what type of code is out there, so I can't say what is the preference of the majority of the C++ codebases.
      I can only add my 2 cents.

    • @zekilk
      @zekilk 5 years ago +1

      The datatype you're looking for is a bignum. The C++ Standard Library doesn't have a bignum class, but you can easily create your own with basic C++. Bignums incur a lot of overhead, since most processors don't have built-in mechanisms to support them, and they're overkill for most projects. If you are that crazy for a number that'll never overflow in proper program execution, you can always use a signed 64-bit integer. Those things can represent numbers lower than -9'000'000'000'000'000'000 to numbers higher than 9'000'000'000'000'000'000. It'll still perform much faster than the most optimal bignum implementation.

    • @dipi71
      @dipi71 5 years ago +2

      @@zekilk Again, I stress the importance of safe and robust and correct code over maximum execution speed. Yes, lacking a proper native data type, you can use a Bignum class, but it will affect the readability of the code. And of course, every CPU I know has a mechanism for fast overflow checks: the overflow bit.