CppCon 2016: Chandler Carruth “Garbage In, Garbage Out: Arguing about Undefined Behavior..."

  • Published 11 Jul 2024
  • CppCon.org
    -
    Presentation Slides, PDFs, Source Code and other presenter materials are available at: github.com/cppcon/cppcon2016
    -
    There has been an overwhelming amount of tension in the programming world over the last year due to something that has become an expletive, a cursed and despised term, both obscene and profane: **undefined behavior**. All too often, this issue and the discussions surrounding it descend into unproductive territory without actually resolving anything.
    In this talk, I'm going to try something very bold. I will try to utterly and completely do away with the use of the term "undefined behavior" in these discussions. And I will unquestionably fail. But in the process of failing, I will outline a framework for understanding the actual root issues that the software industry faces here, and try to give constructive and clear paths forward, both for programmers and the programming language.
    And, with luck, I will avoid being joined on stage by any unruly nasal demons.
    -
    Chandler Carruth
    Google
    C++ Lead
    San Francisco Bay Area
    Chandler Carruth leads the Clang team at Google, building better diagnostics, tools, and more. Previously, he worked on several pieces of Google’s distributed build system. He makes guest appearances helping to maintain a few core C++ libraries across Google’s codebase, and is active in the LLVM and Clang open source communities. He received his M.S. and B.S. in Computer Science from Wake Forest University, but disavows all knowledge of the contents of his Master’s thesis. He is regularly found drinking Cherry Coke Zero in the daytime and pontificating over a single malt scotch in the evening.
    -
    Videos Filmed & Edited by Bash Films: www.BashFilms.com

Comments • 132

  • @jacob_90s · 3 years ago · +40

    Not gonna lie; hearing Chandler criticize the standard put a big ass smile on my face. I've lost count of the number of times I've tried to discuss both possible and legitimate issues with languages with other developers over the years, only to have them fanboy up and refuse to admit there could be anything wrong with their perfect little language.
    This isn't to say I was right in all cases, but the general stubbornness to admit there could possibly be an issue, or that something could be done better, has just absolutely driven me nuts over the years. So frankly it's refreshing to just hear someone else criticize one for a while, much less someone with as much weight and authority as Chandler.

  • @teekwick · 7 years ago · +61

    Having Chandler in the committee gives me hope for the future of the language. Good talk as always!

  • @MrAbrazildo · 7 years ago · +28

    I guess the best answer to "narrow vs. wide contracts" is compiler warnings: they keep C/C++ narrow, but not without trying to save us.

  • @NoNameAtAll2 · 3 years ago · +5

    The 47th slide is so confusing... the "good" and "bad" words don't correspond to UB...

  • @UGPepe · 5 years ago · +21

    People are not upset because the effects of violations are latent (hence, write more sanitizers) but because the contracts themselves are stupid.

    • @flatfingertuning727 · 4 years ago · +2

      Not only that, but compilers' interpretation of the C Standards is in direct defiance of the Committee's intentions as stated in the published Rationale: "Undefined behavior gives the implementor license not to catch certain program errors that are difficult to diagnose. It also identifies areas of possible conforming language extension: the implementor may augment the language by providing a definition of the officially undefined behavior." The authors of the Standard note (in the discussion of translation limits) that it makes no attempt to forbid someone from contriving an implementation that is "conforming" but "succeeds at being useless"; the Standard's failure to mandate that all implementations process something usefully hardly implies any judgment that quality implementations shouldn't do so anyway when practical.

  • @gregkrimer1000 · 3 years ago · +6

    Very insightful talk. Thank you!

    • @CppCon · 3 years ago · +1

      You are so welcome!

  • @UGPepe · 5 years ago · +11

    x

  • @jmille01 · 7 years ago · +6

    "Presentation Slides, PDFs, Source Code and other presenter materials are available at: github.com/cppcon/cppcon2016"
    I don't see the presentation at the supplied location.

  • @dizekat · 1 year ago

    Regarding integer overflow, the sensible behavior would be for the compiler to apply optimizations when it can prove that the overflow won't occur. When it can't prove that overflow won't occur, and it is doing that optimization, that's a security exploit waiting to happen.
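When the optimizer cannot prove the absence of overflow, the programmer can make the check explicit before doing the operation. The following is a minimal C++17 sketch; `checked_add` is a hypothetical helper, not a standard facility.

```cpp
#include <limits>
#include <optional>

// Hypothetical helper: adds two ints only when the result is representable.
// The comparisons are arranged so that no intermediate expression can overflow.
constexpr std::optional<int> checked_add(int a, int b) {
    constexpr int kMax = std::numeric_limits<int>::max();
    constexpr int kMin = std::numeric_limits<int>::min();
    if (b > 0 && a > kMax - b) return std::nullopt;  // would exceed INT_MAX
    if (b < 0 && a < kMin - b) return std::nullopt;  // would go below INT_MIN
    return a + b;                                    // provably in range here
}
```

GCC and Clang also provide `__builtin_add_overflow` for the same purpose. The hand-written comparisons keep the sketch portable.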

  • @kostikvl · 7 years ago · +16

    Interesting talk. The example of why one should prefer signed integers is really great.
    I think the key problem with UB (and why it is so hated) is that the compiler is allowed not only to do something terrible, but also to not do something. For instance:
    for (int i = 0; i < 10; ++i) cout

    • @PieterKockx · 7 years ago · +3

      Great example! GCC does warn about "iteration 3 invoking undefined behavior" under "aggressive-loop-optimizations" but I haven't managed to actually trigger the optimization.

    • @JuddMan03 · 7 years ago · +1

      Konstantin Vladimirov, great example. I can think of how this might happen, but if a compiler likes to optimise code so aggressively, surely it can see that a loop of length 10 is more efficient than one of length infinity? It should clearly choose to unroll the loop and discard all iterations that invoke undefined behaviour, which is a true improvement down to 3 iterations. But to take examples from the Twitter posts in the slides, it could also be excused for discarding the loop itself, the function that called it, or the program containing the function. It would stop short of deleting the whole OS, because that would increase the number of operations required.

    • @jonesconrad1 · 5 years ago · +1

      Could you explain a bit more, please? I don't understand why the loop would become infinite.

    • @animowany111 · 5 years ago

      +Conrad Jones
      A value of `i` larger than 3 is "impossible", because the multiplication would then overflow, and the compiler can assume UB never happens.
      Since with that assumption `i` is always smaller than 4, `i < 10` always evaluates to true, the compiler then "simplifies" the conditional, making the loop infinite.

    • @MsJavaWolf · 5 years ago · +1

      @animowany111 The loop is actually fine; the value of i doesn't change in the print statement.
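The loop in the original comment is cut off after `cout`, so its body is unknown. Assuming, purely for illustration, that it printed `i * 1000000000` (a hypothetical multiplier, not recovered from the comment), this sketch computes where `int` overflow would begin and shows a loop body that stays defined by widening before multiplying. It assumes a 32-bit `int`.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical multiplier: the original snippet is truncated, so this value is an assumption.
constexpr int kFactor = 1'000'000'000;

// First non-negative i for which i * factor no longer fits in int (requires factor > 0).
constexpr int first_overflowing_index(int factor) {
    return std::numeric_limits<int>::max() / factor + 1;
}

// Wide-contract loop body: widening i to 64 bits before multiplying means
// i * kFactor cannot overflow for i in [0, 10), so there is no UB to reason from.
constexpr std::int64_t scaled(int i) {
    return static_cast<std::int64_t>(i) * kFactor;
}
```

With a 32-bit `int`, `first_overflowing_index(kFactor)` is 3. That would fit the "iteration 3" warning from GCC quoted in the replies. Printing `scaled(i)` inside `for (int i = 0; i < 10; ++i)` gives all ten values without undefined behavior.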

  • @sinom · 1 year ago

    What happened to expects/ensures/assert? It's still not in the standard afaik. Was there a reason for that?

  • @MrMidjji · 4 years ago · +8

    Why not make shifting by more than the number of bits in the type a compile error, then? It's super common for the shift count to be known at compile time. For the runtime case, make the operator autocast to a range type with compiler-configurable behaviour that is either free and undefined if wrong, throws an exception, or performs modulo sizeof(type)*8?

    • @joestevenson5568 · 11 months ago

      It is; that's why he had to slap volatiles on it to stop the static analysis from telling him it was a bug.
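The wide-contract runtime alternatives suggested in this thread can be written as ordinary functions. This sketch assumes a 32-bit `unsigned int`. The names `shl_modulo` and `shl_saturating` are hypothetical and do not belong to any standard library.

```cpp
#include <cstdint>

// Reduces the shift count modulo the type width, similar to how x86 masks
// the count of a 32-bit shift in hardware.
constexpr std::uint32_t shl_modulo(std::uint32_t x, unsigned n) {
    return x << (n % 32u);
}

// Treats counts >= 32 as shifting every bit out instead of invoking UB.
constexpr std::uint32_t shl_saturating(std::uint32_t x, unsigned n) {
    return n >= 32u ? 0u : x << n;
}
```

Either variant costs only a mask or a compare-and-select per shift. That is the price of the wider contract compared with the plain `<<` operator.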

  • @grisevg · 7 years ago · +11

    What if you have a "char*" that you increment? Does it do the same nasty wrap-around handling, or is it as fast as signed int?

    • @Berdes1 · 4 years ago · +10

      For the people still watching this and reading the comments, this is as fast as signed int. If you increment a "char*", I'm assuming your char* is pointing to an element of an array of char. In that case, the result of the increment must be pointing to an element of that array or point to one-past-the-end. If the result is not one of these, it is undefined behavior. Given that your array is contiguous in memory, it cannot wrap around.

  • @dannystoll84 · 9 months ago

    The example on slide 48 actually produces the mathematically correct value when n=0. You would get the same value if the numbers were signed. The issue isn’t the overflow, it’s that 0 should never have been input in the first place (as the resulting 8 bytes are insufficient space for the rtvec_def object).

  • @Yupppi · 6 months ago

    The biggest issue with undefined behavior, to me, is that it's poorly named. Undefined behavior doesn't sound scary. It sounds like "I don't know if it's cloudy tomorrow": nobody's scared of it being cloudy or not tomorrow; they'll still wake up and go to work despite not knowing beforehand which it is, and they dress accordingly. And then you also see these "this is actually undefined behavior" moments, like in some Sean Parent talks I remember, and it ends up being something that doesn't do anything interesting, really nothing to worry about, except that the standard has not defined accurately what it should do. And some code depends on undefined behavior. It just doesn't sound too scary because it's such an enigma: nobody has said what it should be doing, so technically it could do anything imaginable and unimaginable (but many times it also won't do anything bad, possibly even something desired).
    So what I'm understanding is that the committee can't fix bad programming and illegal use of the language?

  • @kered13 · 4 years ago · +5

    37:00 Doesn't the math actually work out here to produce the expected result? Chandler says that unsigned multiplication is defined as modular arithmetic, so the calculation should go as follows:
    Promote -1 to unsigned -> 2^32 - 1 (or 2^64 - 1, doesn't matter)
    (2^32 - 1) * 8 mod 2^32 = 2^32 - 8
    8 + (2^32 - 8) mod 2^32 = 0
    And 0 is the exact size you would expect when the input is n = 0.

    • @smiley_1000 · 3 years ago · +2

      Exactly. But it's still a memory error, because you'll be allocating 8 bytes for a 16-byte struct, so the error isn't related to the overflow at all.

  • @MrMidjji · 4 years ago

    It would be a good thing if the compiler could find contract violations statically at compile time, though, e.g. a function which takes an AcyclicGraph rather than a Graph.

  • @EvanED · 7 years ago · +3

    I don't know if you're reading this, Chandler, but regarding the signed/unsigned optimization discussion around 48:00: if you had your druthers, would you suggest using signed types even for sizes and such, over size_t? E.g. it looks like the LLVM SmallVector templates' size/max_size/capacity/etc. use size_t. Is that more legacy stuff, or is it doing the right thing?

  • @TheLeontheking · 4 years ago · +1

    Good documentation of the library and language features, careful and conscious programming, and good compilers should be the ways to stay out of undefined territory. A library which constantly does runtime checking, or tries to circumvent UB even in obvious cases of contract misuse, does not sound like a good option to me.

  • @smiley_1000 · 3 years ago · +3

    36:36 This error is actually not related to overflows at all. It behaves *exactly* as expected: we allocate
    16 + (0-1)*8 = 16 + (-1)*8 = 16 - 8 = 8
    bytes. So basically, we say that since we don't want any of the 8-byte rtunions, we'll just not allocate any, not even the one in the latter 8 bytes of the 16-byte rtvec_def. This means we'll try to allocate 8 bytes for a 16-byte struct, which is of course an error. But the error is a completely logical error and is not related to overflows at all.

    • @mononix5224 · 2 years ago · +1

      It is due to overflow, since `typeof(sizeof(x)) = size_t`, so `(−1) * 8 ≠ −8`.
      The `−1` is converted to a `size_t`, which results in the value `2**bitsize(size_t) − 1` (** is used for exponentiation).
      On a 64-bit arch (assuming 2's complement) this will result in `((2**64 − 1) * 8) % 2**64 = (2**64 − 8) % 2**64 = 2**64 − 8`,
      then we still need to add 16: `(2**64 − 8 + 16) % 2**64 = (2**64 + 8) % 2**64 = 8`.
      So the result is still 8, but it was _due_ to overflow.
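The arithmetic debated in this thread can be checked at compile time. This sketch assumes `sizeof(struct rtvec_def) == 16` and `sizeof(rtunion) == 8`, as the comments state. The constants below stand in for the real GCC types, which are not reproduced here, and `rtvec_alloc_size` is a hypothetical name for the slide's size expression.

```cpp
#include <cstddef>

// Stand-ins for the sizes quoted in the thread (assumptions, not the real GCC types).
constexpr std::size_t kRtvecDefSize = 16;
constexpr std::size_t kRtunionSize = 8;

// The size expression as the thread describes it: (n - 1) is an int, which the
// usual arithmetic conversions turn into size_t before the multiplication,
// so the whole computation is modular (well-defined) unsigned arithmetic.
constexpr std::size_t rtvec_alloc_size(int n) {
    return kRtvecDefSize + (n - 1) * kRtunionSize;
}
```

For `n == 0`, `n - 1` becomes `SIZE_MAX` and the product wraps to `SIZE_MAX - 7`. Adding 16 then wraps to 8 on both 32-bit and 64-bit targets. The result is well-defined, but it is smaller than the 16-byte struct.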

  • @SolomonUcko · 4 years ago · +1

    30:54 Isn't left-shifting a negative number always UB? "Otherwise, if E1 has a signed type and non-negative value, and E1*2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined." (expr.shift, 5.8.0.2, www.open-std.org/JTC1/SC22/WG21/docs/papers/2010/n3092.pdf#page=131, N3092 page 116)
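The N3092 rule quoted above can be turned into a predicate that reports whether `x << n` is defined. This sketch assumes a 32-bit `int` with no padding bits, and `left_shift_is_defined` is a hypothetical name. It encodes only the quoted C++0x wording. Later standards relaxed the rule, and C++20 defines `x << n` for every `x` as long as `n` is in range.

```cpp
#include <limits>

// Hypothetical predicate encoding the N3092 rule quoted above:
// E1 must be non-negative and E1 * 2^E2 must be representable in int.
constexpr bool left_shift_is_defined(int x, int n) {
    constexpr int kWidth = std::numeric_limits<int>::digits + 1;  // value bits + sign bit
    if (n < 0 || n >= kWidth) return false;  // shift count out of range
    if (x < 0) return false;                 // negative E1: undefined under this wording
    // For x >= 0, x * 2^n <= INT_MAX exactly when x <= (INT_MAX >> n).
    return x <= (std::numeric_limits<int>::max() >> n);
}
```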

  • @iamvfx · 7 years ago · +7

    47:53 This should be a compile error. Like, "Error: you can't use 32-bit values for 64-bit pointer indexing". The compiler has all the information for that. No silent promotion to 64 bits is needed. One can use (u)int_fast32_t types for platform-specific sizes.

    • @MrAbrazildo · 7 years ago

      And what if you want to pack two 32-bit values into a 64-bit variable?

    • @iamvfx · 7 years ago · +2

      Value packing is not related to pointer indexing, or at least I can't think of any example of that right now.

    • @BlairdBlaird · 2 years ago

      Well yes and modern languages do that (the downside being that it often leads to very noisy code).
      However C++ inherits integer promotion rules from C, and because integer promotion is a thing its use is *ubiquitous*. Furthermore, because it doesn't technically trigger a bug (with respect to abstract interpretation) I don't think any compiler will warn on this.
      On implicit narrowing conversions maybe, optionally (because there's a risk of information loss), but not widening. In fact there are regularly proposals of adding "convenience" implicit integer widening on languages which are currently stricter than that (like Go, Swift, Rust, ...).

  • @ManOfThrills · 7 years ago · +1

    I thought the title referred to SG12 meetings...

  • @valetprivet2232 · 7 years ago · +5

    Why, when we multiply an int and an unsigned int at 35:39, do we convert the int to unsigned and not the other way around, which seems more logical?

    • @smiley_1000 · 3 years ago

      You're right that it'd make sense to convert to signed int, but both in fact produce the exact same output with two's complement.
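The conversion asked about above comes from the usual arithmetic conversions. `int` and `unsigned int` have the same rank, so the `int` operand is converted to `unsigned int`. This compile-time sketch assumes a 32-bit two's-complement `int`. The variable names are illustrative only.

```cpp
#include <limits>
#include <type_traits>

// int * unsigned int: the int operand is converted to unsigned int.
static_assert(std::is_same<decltype(-1 * 8u), unsigned int>::value,
              "int times unsigned yields unsigned");

constexpr unsigned unsigned_product = -1 * 8u;  // -1 becomes UINT_MAX; product wraps mod 2^32
constexpr int signed_product = -1 * 8;          // ordinary signed arithmetic

// Converting the unsigned result back to int yields -8: guaranteed since C++20,
// implementation-defined (but -8 on mainstream compilers) before that.
constexpr int round_trip = static_cast<int>(unsigned_product);
```

This is the sense in which both conversion orders give the same bits under two's complement. The difference shows up in comparisons and division: `unsigned_product > 0u` is true, while `signed_product > 0` is false.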

  • @ephimp3189 · 4 years ago · +7

    The simplest way to detect a cyclic graph is to keep a single counter for node traversal and check it against the total graph size.

    • @ChristianBrugger · 1 year ago · +4

      What if you don't know the graph size? In the example, I think only a pointer to a node was given.

    • @J-Random-Luser · 1 year ago

      @ChristianBrugger I feel if you're passing an arbitrary node instead of an overall "graph" structure that doesn't keep track of the number of nodes, then you have bigger problems. As nodes are added to the graph, you can just increment it.

    • @dannystoll84 · 9 months ago

      @J-Random-Luser Having an arbitrary node in a graph is a common situation. Often, the graph is not even a latent structure in memory, but rather a mathematical object that arises from iteratively applying a function. As a concrete example, consider Pollard's rho algorithm for integer factorization.
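When only a starting node and a successor function are available, as in Pollard's rho, Floyd's tortoise-and-hare algorithm finds the cycle without knowing the graph size and with constant extra memory. This is a compile-time C++17 sketch. The successor function `x -> (x * x + 1) % 10` is an arbitrary example, not something from the talk.

```cpp
// Result of cycle detection on the sequence x0, f(x0), f(f(x0)), ...
struct CycleInfo {
    int tail;          // steps before entering the cycle (mu)
    int cycle_length;  // length of the cycle (lambda)
};

// Floyd's tortoise-and-hare: no node count or visited set is needed.
template <typename F>
constexpr CycleInfo floyd_cycle(F f, int x0) {
    int tortoise = f(x0);
    int hare = f(f(x0));
    while (tortoise != hare) {  // phase 1: the two pointers meet inside the cycle
        tortoise = f(tortoise);
        hare = f(f(hare));
    }
    int tail = 0;
    tortoise = x0;
    while (tortoise != hare) {  // phase 2: find the first node of the cycle
        tortoise = f(tortoise);
        hare = f(hare);
        ++tail;
    }
    int length = 1;
    hare = f(tortoise);
    while (tortoise != hare) {  // phase 3: walk once around the cycle
        hare = f(hare);
        ++length;
    }
    return {tail, length};
}

// Arbitrary example successor function on the values 0..9.
constexpr int square_plus_one_mod10(int x) { return (x * x + 1) % 10; }
```

Starting from 3, the sequence is 3, 0, 1, 2, 5, 6, 7, 0, ..., so the sketch reports a tail of 1 and a cycle of length 6.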

  • @szirsp · 2 years ago · +1

    I understand that not all UB can be defined, but a lot of them could at least be implementation defined (or require diagnostic output, and not be silent UB).
    For example, just saying that casting a byte/character pointer type to a pointer to float type is undefined behavior could mean that the compiler can just stop generating instructions for the code after it encounters UB (since it's UB anyway, it doesn't matter what the rest of the code would do). But we know that this should be fine on all platforms if the pointer is properly aligned, and it's fine on some platforms even if it is not aligned. This could/should be implementation-defined, even if it just says the behavior is architecture dependent, and the compiler (or toolchain) can guarantee that it will at least output machine code/instructions and not just give up on you.
    I'm fine if the compiler says to me "Hey, we don't know what this will do, but at least we tried. Maybe you should look at this once more if this is really what you want." instead of the compiler doing 'Look at this stupid human! I recognize that this is UB, so I'm just gonna stop trying to even generate code, and I'm not gonna tell anyone, not gonna notify the user/programmer'.

    • @199t8 · 1 year ago

      Firstly, it is impossible to read a float from a char * unless you explicitly say that this is what you want to do via reinterpret_cast. Also, unless you know exactly how your compiler and platform work (unlikely for any modern tech stack), you do not know whether this is fine. You can somewhat achieve the behavior you want in your last paragraph by turning off optimizations.
      It is pointless to make stuff like this implementation-defined, because implementors would have two options: maintain docs on the inner workings of their entire compiler, OS, and hardware (impractical for both author and reader when considering code optimization, address randomization, speculative execution, etc.), or just define it as UB in the implementation spec, making things worse, since now you have to read the docs for the compiler, the OS, and the CPU in addition to the standard to determine whether something is UB.

  • @SolomonUcko · 4 years ago · +4

    10:50 Isn't an infinite loop without side effects UB in C++? 6.5.0.1 (in stmt.iter) states: "A loop that, outside of the for-init-statement in the case of a for statement, makes no calls to library I/O functions, and does not access or modify volatile objects, and performs no synchronization operations (1.10) or atomic operations (Clause 29) may be assumed by the implementation to terminate. [Note: This is intended to allow compiler transformations, such as removal of empty loops, even when termination cannot be proven. - end note]" (www.open-std.org/JTC1/SC22/WG21/docs/papers/2010/n3092.pdf#page=143, N3092 page 128)

  • @JackAdrianZappa · 5 years ago

    47:20
    "Yes, if you [use] size_t, that will also avoid the problem."
    How does it avoid the problem? Is there something special about size_t that says that it won't wrap, stopping the compiler from adding extra code?

    • @DaNikeTrations · 5 years ago · +12

      If you use size_t there, on that specific platform, it will be 64 bits, and will wrap at the 64 bit boundary, just like the 64 bit addressing mode. It still wraps, it just wraps at the right size.
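The difference shows up when an index is incremented at the 32-bit boundary. This sketch assumes a 32-bit `unsigned int` and uses `std::uint64_t` to stand in for the 64-bit `size_t` of x86-64. The function names are hypothetical.

```cpp
#include <cstdint>

// A 32-bit unsigned index must wrap from 0xFFFFFFFF back to 0...
constexpr std::uint32_t next_index32(std::uint32_t i) { return i + 1u; }

// ...while a 64-bit index (like size_t on x86-64) simply keeps counting.
constexpr std::uint64_t next_index64(std::uint64_t i) { return i + 1u; }
```

Once `i` reaches `0xFFFFFFFF`, `base + next_index32(i)` and `base + next_index64(i)` point to different places. So a compiler cannot turn a 32-bit unsigned index into a plain 64-bit pointer increment unless it proves the wrap never happens. With a 64-bit index, the wrap point coincides with that of the address arithmetic itself.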

  • @OneWheelGuy1 · 7 years ago · +8

    47:08 - Today it's fairly easy to get enough bits, without having to use the sign bit.
    47:26 - The downside of using size_t is if ... is in a data structure it uses more space.
    So it is easy to get enough bits, except that it's not.
    I think that using size_t is the ideal solution. It gives you efficient code *and* it ensures that the code will work if you ever need to sort 10 GB of data. If your structures don't need to support more than 4 GB of data then you can store uint32_t in your structures and load into a size_t local variable.

    • @naxaes7889 · 3 years ago · +1

      But he's talking about the sign bit. `ssize_t` is signed and still gives you 2^63 bytes of addressable data, which is much larger than 10 GB.

  • @Asdayasman · 4 years ago

    Yo I'm only 21:00 in but what about dual implementation of things that can exhibit UB? The wide contract shitty slow one runs when a compiler flag is given to say "yo be slow and tell me if I'm stupid", and the narrow contract good and sexy one runs without that flag. Compile with that flag, run the test suite (you wrote full coverage, right?), and away you go.

  • @d.m.3316 · 7 years ago · +1

    Isn't incrementing a uint32 the same as incrementing a uint64 but ignoring the higher 32 bits? If so, why can't the compiler generate the optimised code, and make sure it doesn't go on to rely on the higher 32bits?
    [I'm talking about slides 49-50]

    • @ManOfThrills · 7 years ago · +1

      How can the compiler not rely on the higher 32 bits when it needs a full 64 bit address? It's adding a 32 bit unsigned int to a 64 bit address on slide 49, and that base address can have some lower bits set. The compiler cannot easily emulate overflow wrapping of the original 32 bit index after it has added the index to the 64 bit base address.

    • @OMGclueless · 7 years ago · +3

      It could. The problem is that "make sure it doesn't go on to rely on the higher 32bits" means applying a bitmask, probably with the AND bitwise operator. It turns out this is even slower than the "leal" instruction which is why the compiler does that instead.

    • @douggale5962 · 7 years ago · +1

      x86-64 zeros the upper 32 bits whenever it writes a 32-bit value to a register. Consider: movabs $0x123456789ABCDEF0,%rax; mov %eax,%eax. This mov will zero the upper 32 bits of %rax. If the second instruction were add $0,%eax, it would also zero the upper 32 bits of %rax.
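The bitmask approach from this thread can be expressed in C++ too. A 32-bit unsigned addition equals a 64-bit addition followed by discarding the upper 32 bits, which is what the zero-extending `mov %eax,%eax` achieves. This sketch assumes a 32-bit `unsigned int`. `add32_via_64` and `add32` are hypothetical helpers.

```cpp
#include <cstdint>

// 32-bit wrap-around emulated in a 64-bit value: add, then keep only the low 32 bits.
constexpr std::uint64_t add32_via_64(std::uint64_t a, std::uint64_t b) {
    return (a + b) & 0xFFFFFFFFu;
}

// Reference result using a genuine 32-bit unsigned addition.
constexpr std::uint32_t add32(std::uint32_t a, std::uint32_t b) { return a + b; }
```

This per-step truncation is the kind of extra work the thread discusses. With a signed index, whose overflow is undefined, the compiler may assume no wrap occurs and skip it.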

  • @MrAbrazildo · 7 years ago

    - I got a crash on gcc/Linux, 64-bit, using: for ( ; bits; bits >>= 1). It was fixed by for (int j=0; j < MAX_MEANINGFUL_BITS; j++, bits >>= 1)
    - A gcc header file warns us that the reverse iterator (pointing) to the (reverse) end of a container may be an INVALID pointer!
    - Also on gcc/Linux, I got a Segmentation Fault (not a compile error!) writing:
    int some_f () {
    blablablablablabla;
    extensive calculation ready to be sent; //WITHOUT the return keyword.
    }
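One plausible explanation for the first item, assuming `bits` was a signed `int` holding a negative value: right-shifting a negative signed value usually replicates the sign bit. That was implementation-defined before C++20 and is an arithmetic shift since. So `-1 >> 1` stays `-1`, and `for ( ; bits; bits >>= 1)` never terminates. The last item is undefined behavior because flowing off the end of a non-void function is UB in C++. The following `count_set_bits` is a hypothetical example, not the commenter's code. With an unsigned operand, the loop terminates for every input.

```cpp
#include <cstdint>

// With an unsigned operand, >>= shifts in zeros, so the loop always reaches 0.
constexpr int count_set_bits(std::uint32_t bits) {
    int count = 0;
    for (; bits; bits >>= 1) {
        count += static_cast<int>(bits & 1u);
    }
    return count;  // explicit return: falling off the end of a non-void function is UB
}
```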

  • @annazolkieve9235 · 2 years ago

    Why UB instead of IB?

  • @richardbarrell4043 · 7 years ago · +7

    I'm having trouble understanding the bug on slide 48.
    (size_t)-1 is SIZE_MAX. (SIZE_MAX * 8) rolls over to (SIZE_MAX - 7). 16 + (SIZE_MAX - 7) should roll over to 8? under the assumption that all arithmetic on size_t values takes place modulo some power of two.
    Are there multiple conversions that I'm missing or something, please?

    • @Som1Lse · 7 years ago · +1

      The bug is that the value passed to `obstack_alloc` is huge when, in fact, no more memory is needed.

    • @OMGclueless · 7 years ago · +2

      I think the problem is just that the code allocated 8 bytes of memory when sizeof(struct rtvec_def) is 16. He doesn't show this happening, but it's easy to assume someone later dereferences that memory as a (struct rtvec_def *) and runs into a problem.

    • @ManOfThrills · 7 years ago · +2

      What's the point of the example on slide 48? Is it that unsigned calculations here are correct by "mere coincidence"? Had they been all signed, it would have neither fixed the bug, nor hit signed's undefined behavior, nor let Chandler's hypothetical tool give a warning about signed/unsigned multiplication. So what is the beneficial narrow contract he's talking about? The only thing that would help here is his suggested distinct unsigned integer types with undefined overflow behavior, but they are not mentioned in this example.

    • @vaughncato · 7 years ago · +2

      Slide 48 is a bit unclear to me as well. He seems to be arguing along the lines that if unsigned overflow weren't legal, then you could have a tool that would show that the code is incorrect in the case of n==0. I think this is supposed to support the general idea that narrow contracts can be useful, as opposed to being a specific recommendation about how to improve the code.

    • @OneWheelGuy1 · 7 years ago

      That was my analysis also. And, since rtvec_def contains a rtunion, and it isn't needed (because size is zero) then everything works out perfectly.
      Now, on a 64-bit system it would not work as well, I don't think, because -1 would convert to UINT_MAX-1 not SIZE_MAX-1, and then we end up trying to allocate ~32 GB of RAM.

  • @GeorgeTsiros · 2 years ago

    I am curious whether any of you know who "rygorous" is...

  • @cptroot · 5 years ago · +2

    It's worth noting that swapping the uint32_t on slide 49 for size_t also does the job. Which one you prefer probably depends on how register limited the other parts of the code are.

    • @JackAdrianZappa · 5 years ago

      Why is that? What is special about size_t that would prevent modulo arithmetic? Is it because it must be the max width of the register size, thus it will just automatically wrap?

    • @cptroot · 5 years ago · +1

      @JackAdrianZappa Chandler mentions the reason for the behavior change is that size_t wraps at the same point that the register does. This means that they can use the instruction that wraps from 2^64 to 0.

  • @kwinzman · 7 years ago

    What's the easiest way to get an std library that uses ssize_t instead of size_t ? Edit: ssize_t may or may not be the type I am looking for (it seems to be intended for negative error codes).

  • @derekli3604 · 7 years ago

    can't find slides for this talk

    • @aaaab384 · 7 years ago

      They're basically empty slides... Why would anyone want them?

  • @WilhelmDrake · 1 year ago

    @~30min
    I think the lesson here is don't do bit manipulation with anything but unsigned.

  • @MrMidjji · 4 years ago

    Still think signed int should wrap around. In practice today, portable code needs to use a wrapper around signed integer types which guarantees wraps when needed. That has a penalty, but which processor architecture does not wrap signed ints on addition?

  • @andik70 · 3 years ago

    Actually, asserts should stay in production releases (or production asserts should exist) and not become preconditions which, if violated, become UB.

    • @lunakid12 · 2 years ago

      "Production asserts" do exist: they are just our normal runtime error checks.

  • @grisevg · 7 years ago · +1

    I can't reproduce it - signed and unsigned assembly is nearly identical on both latest gcc and clang: godbolt.org/g/KysL1T

    • @ManOfThrills · 7 years ago · +1

      Clang's assembly for signed and unsigned are very much different and correspond to what Chandler shows, especially if you add a couple more iterations. GCC's assembly is inefficient in signed case, I think because it doesn't treat signed overflow as undefined and faithfully implements two's complement signed wrap-around.

  • @GRHmedia · 8 months ago

    I've never really had an issue with it. It is my job as the programmer to make sure the program does what I want it to do. That means using the language properly.
    I've noticed, though, that a lot of the current generation of programmers are the ones who have issues with it. I think in part this is because of how we were raised.
    They are looking for other people or a system to make it easier on them, whereas we were taught to make do with the tools we had and, if we didn't like them, to create our own.
    Don't expect others to solve your problems so much.
    I've been programming since 1983. I've never had an issue with pointers of that nature. It is now 2023; that is 40 years without an issue. I don't think I am someone spectacularly exceptional.
    So how is this a problem? To me it is just making an issue out of something that has never been an issue for me.

    • @sirhenrystalwart8303 · 6 months ago

      Tens of thousands of CVEs written by your generation would like a word.

  • @connorhorman · 5 years ago · +2

    What I would really like is for signed integer overflow/underflow to be well defined. It's well defined in Java, and I have hashCode functions that can easily overflow a signed integer, which have to be signed because of interop.

    • @connorhorman · 5 years ago

      And then you show me the optimization of that code, and it makes me question that statement. (47:00 - 50:00) Oof. Still kind of annoying that I either need to write UB, or have additional code to cast out of signed. Interop with C++ is hard.

    • @flatfingertuning727 · 4 years ago · +1

      @connorhorman What's needed for real optimizations is a class of situations with *loosely*-defined behavior, which *programmers could safely allow to occur* in cases where all of the allowable behaviors would meet requirements. For example, recognizing a class of implementations where "x+y > z" would be guaranteed never to do anything other than yield 0 with no side-effects or yield 1 with no side-effects would mean that a compiler could transform the above into "y > 0" in cases where it could prove that x and z are equal--something which it wouldn't be able to do if the programmer had written the expression as "(int)((unsigned)x+y) > z" for purposes of avoiding UB.
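For the Java interop case in this thread, a common workaround is to do the arithmetic in unsigned types, where wrap-around is defined, and convert back to a signed type at the end. This C++17 sketch assumes a 32-bit `int`. Converting an out-of-range `std::uint32_t` to `std::int32_t` is modular since C++20 and implementation-defined before that (modular on GCC, Clang, and MSVC). `java_style_hash` is a hypothetical helper that applies the `31 * h + c` recurrence over bytes, so it matches Java's `String.hashCode` only for ASCII input.

```cpp
#include <cstdint>
#include <string_view>

// Signed add/multiply with two's-complement wrap-around, done in unsigned
// arithmetic so that overflow is never undefined behavior.
constexpr std::int32_t wrapping_add(std::int32_t a, std::int32_t b) {
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(a) + static_cast<std::uint32_t>(b));
}

constexpr std::int32_t wrapping_mul(std::int32_t a, std::int32_t b) {
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(a) * static_cast<std::uint32_t>(b));
}

// Hash recurrence h = 31 * h + c over the bytes of s (matches Java only for ASCII input).
constexpr std::int32_t java_style_hash(std::string_view s) {
    std::int32_t h = 0;
    for (char c : s) {
        h = wrapping_add(wrapping_mul(31, h), static_cast<unsigned char>(c));
    }
    return h;
}
```

For "hello" the sketch yields 99162322, the same value Java's `"hello".hashCode()` returns.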

  • @metallitech · 2 years ago

    Damn, this makes c++ look like a dog's breakfast. I have to decide whether to get into Rust now.

  • @UGPepe · 5 years ago · +5

    Defining behavior for all platforms doesn't mean that the behavior has to be the same for all platforms; how on earth did that implication come about?

    • @BatmanAoD · 4 years ago · +3

      Yeah, this completely baffles me, especially given that someone actually asks "why not make it implementation defined".

  • @Fetrovsky · 7 years ago

    48:50 Aren't 32-bit registers available in AMD64 long mode?
    Also, the comments about wrapping don't make sense, because int32_t's will also wrap, except at half the distance (from 2^31 - 1 to -2^31). So there's really no advantage to using int over uint.

    • @Myriachan · 7 years ago

      The difference is that the compiler is allowed to assume that signed integers can't overflow, because to do so would be undefined behavior.

    • @Fetrovsky · 7 years ago

      Does C++ not assume 2's complement representations in CPUs?

    • @Myriachan · 7 years ago · +3

      Quoting the Standard: "this International Standard permits 2’s complement, 1’s complement and signed magnitude representations for integral types." (Note that it theoretically could support others.)

    • @Fetrovsky · 7 years ago

      Ah, ok. It now makes sense.

    • @Fetrovsky · 7 years ago

      Thanks!

  • @0xCAFEF00D · 7 years ago · +21

    Well, the conclusion I can draw here is that C++ is a very poor language for most people, because almost nobody writes programs that stay on the narrow path that avoiding all the UB/programming issues lets you walk. Not knowingly, anyway, because almost nobody knows enough to manage that or can even hold all the edge cases in their head.
    So most of us are stuck dealing with the latent bugs, always.
    Unless you provide (for example) an expensive shift and a cheaper shift where I have to deal with the potential issues myself, leaving more room for consistent results across platforms or whatever situation you're in.
    Also, Chandler seems really upset over people making jokes on Twitter. Quite likely nobody thinks your compiler will delete files when there's UB, or make a male cat pregnant.
    Edit:
    Watching this again (4 months later), because it's a good talk, I realise that Chandler makes the exact suggestion about a slow vs. fast shift, except he does it for the compression example. I wouldn't mind having a lot of these options. But maybe there should be facilities in the language to express expectations rather than having a bunch of different types that imply behavior. If programmers could annotate whether or not they expect modular integer behavior, you'd have a more pleasing consistency between unsigned and signed types while still having all the benefits of narrow contracts.

  • @pointlessone3702
    @pointlessone3702 4 years ago +2

    The bzip example is dishonest. bzip was conceived when there were no 64-bit architectures around. When compiled for 32-bit x86, it's as efficient as the modified code on a 64-bit architecture. If anything, this example demonstrates that C's portability is not quite as perfect as one could wish.

  • @UGPepe
    @UGPepe 5 years ago

    "narrow contract" easily includes the set of all absurdly narrow contracts

  • @andreashohmann5994
    @andreashohmann5994 7 years ago

    Is the video broken? I always see a grey half of the screen.

  • @origamibulldoser1618
    @origamibulldoser1618 7 years ago +5

    Maybe I'm missing the point, but this sounds like the same kind of defensive talk on C++'s behalf that Bjarne gave this year, where it all boils down to: "we can't fix it because we can't do both a and b if a and b aren't orthogonal features." I understand some of the reasons given, but I do not see a possible solution to any of this.
    Re the shift examples: if this language is supposed to be platform-agnostic, why does the shift operator even exist? It sounds like C is violating it's own principle of abstracting the assembly code. Shift is an intrinsic and has no place in a platform-agnostic language.
    Well. I'm only a regular developer, so maybe it isn't for me to understand what the point of all this is supposed to be.

    • @kostikvl
      @kostikvl 7 years ago +3

      There is nothing wrong with the shift operation. Bitwise operations should be present in every language, because they are present in every architecture and they are useful. The problem is how to define those operations in the most portable way. And sometimes undefined behavior for corner cases is the best answer.

    • @origamibulldoser1618
      @origamibulldoser1618 7 years ago +4

      But they're apparently not implemented the same way, and that abstraction leaks through, if I understand Chandler correctly. Anyway, I thought about what I said, and I don't really have the experience to make these kinds of sweeping, general statements, so someone else will have to continue this discussion, if there's a point to be made.

    • @origamibulldoser1618
      @origamibulldoser1618 7 years ago +9

      aa haha. oh look, a troll.

    • @aaaab384
      @aaaab384 7 years ago

      _"C is violating _*_IT'S_*_ own principle"_

    • @origamibulldoser1618
      @origamibulldoser1618 7 years ago +7

      Hahahah, go fuck yourself.

  • @UGPepe
    @UGPepe 5 years ago +3

    but I am paying for something I'm not using by having to mask the shift because you're throwing portability down my throat

    • @smiley_1000
      @smiley_1000 3 years ago +2

      well yes, you'll have to do the thing the compiler would otherwise do for you. but if you hide it behind an #ifdef and make it depend on the architecture (basically what the compiler would do otherwise), it incurs no overhead
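The masking discussed in this exchange can be sketched in C++ as follows. The function names and the choice of architecture macros are illustrative assumptions, not anything from the talk. Shifting a 32-bit value by 32 or more is undefined behavior, so each variant picks an explicit meaning for oversized counts: `shl_masked` mirrors x86, which masks 32-bit shift counts to 5 bits in hardware, and `shl_zero` returns 0, roughly what 32-bit ARM does for register-specified shift amounts.

```cpp
#include <cstdint>

// Mask the count to 5 bits, matching what x86 SHL does for 32-bit operands.
// Every count is now defined: 33 behaves like 1.
constexpr std::uint32_t shl_masked(std::uint32_t x, unsigned n) {
    return x << (n & 31u);
}

// Alternative wide contract: oversized shifts produce 0. The extra compare
// may cost a little on targets whose shift instruction already masks.
constexpr std::uint32_t shl_zero(std::uint32_t x, unsigned n) {
    return n < 32u ? x << n : 0u;
}

// Hypothetical per-architecture choice, in the spirit of the #ifdef idea:
// pick whichever definition the target's shift instruction implements.
constexpr std::uint32_t shl_native(std::uint32_t x, unsigned n) {
#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
    return shl_masked(x, n);
#else
    return shl_zero(x, n);
#endif
}
```

On x86, GCC and Clang usually recognize `n & 31` and emit a bare shift, which is the "no overhead" case described above; the catch is that `shl_native(x, 33)` then gives different answers on different targets, which is exactly the portability trade-off being argued about.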

  • @UGPepe
    @UGPepe 5 years ago +7

    standards leave too much undefined while compilers act like an AI that exploits every loophole to reach its simplistic objective function which is to make code run faster.

    • @UGPepe
      @UGPepe 5 years ago +3

      then compiler writers have to justify their eagerness to optimize with strawmen such as "since you can't define every single behavior anyway... might as well run wild"

  • @Myriachan
    @Myriachan 7 years ago +4

    The problem with undefined behavior in C++ is not that there's a problem with undefined behavior itself. The problem is that C++ considers too many things to be undefined behavior that should not be. Signed integer overflow comes to mind.

    • @Spillerrec
      @Spillerrec 7 years ago +3

      Back when C was created, different platforms had different representations of signed integers so there was no way to define the behavior of signed overflow. Just think of the performance penalty if you had to check for overflow on every single arithmetic expression (which isn't unsigned) on certain platforms... Maybe not that relevant today, but I don't know the consequences of trying to change it now.

    • @andik70
      @andik70 5 years ago +3

      @Spillerrec Then why not make it implementation-defined? UB is a very different beast.

    • @Spillerrec
      @Spillerrec 5 years ago +2

      @andik70 That is certainly a valid point, and I don't know why they decided on that. Do you have any specific use cases for signed overflow, by the way?
      However, due to recent experiences with finding several unexpected overflows using UBSan, I'm actually happy that it is UB, and I wish I had a similar tool for unsigned overflow. Since overflow is rarely what we expect from a random arithmetic expression, I think it would be neater if it were always an error and you had to annotate somehow that it should have a defined overflow behavior (and which one, for signed) for those special cases, like the fallthrough attribute for switch statements. It would help the programmer spot the intended use of overflow, it would make it easier to catch incorrect arithmetic at run time, and it would make otherwise implementation-defined behavior cross-platform.
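One way to spell out overflow intent in today's C++ is sketched below, assuming a C++17 compiler; `checked_add` and `wrapping_add` are hypothetical helpers, not standard functions. On the tooling side, Clang's `-fsanitize=signed-integer-overflow` (part of UBSan) reports signed overflow, and Clang also offers `-fsanitize=unsigned-integer-overflow`, which flags unsigned wrap-around even though it is well defined.

```cpp
#include <climits>
#include <cstdint>
#include <optional>

// Checked add: the overflow test happens before the addition, so no
// undefined behavior is ever executed. Returns std::nullopt on overflow.
constexpr std::optional<int> checked_add(int a, int b) {
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return std::nullopt;
    }
    return a + b;
}

// Wrapping add for the rare places that really want modular arithmetic:
// add in unsigned, where wrap-around is defined, then convert back.
// The conversion back is modular since C++20; before that it was
// implementation-defined, though GCC and Clang already wrapped.
constexpr std::int32_t wrapping_add(std::int32_t a, std::int32_t b) {
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(a) +
                                     static_cast<std::uint32_t>(b));
}
```

The names themselves carry the annotation the comment asks for: a reader sees `wrapping_add` and knows the wrap is intended, while plain `+` keeps its narrow contract and stays checkable by the sanitizer.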

  • @UGPepe
    @UGPepe 5 years ago +3

    the fact that you have programming errors that you cannot detect or would be prohibited to detect is not an argument for anything. it doesn't give you license to make languages unusable by humans by riddling them with UB

  • @UGPepe
    @UGPepe 5 years ago +2

    a+b is defined behavior on every platform for the native number types that they support. i'm ok with that, i don't want you to abstract over that. my contract is with the platform. you're in my way. just do the optimizations that work best for each platform and stop there. the C standard is misinforming you about what your role should be and it's giving me, the user, a crappy language

  • @FalcoGer
    @FalcoGer 10 months ago

    I don't see what the problem is. Define what it means for a null pointer to be dereferenced. On a C6502 it means dereferencing address 0. On Linux running on x86 it means a segfault; in kernel code it means a panic. In fact, why bother defining it? Let the hardware manufacturer or the operating system developers decide what it means. Because at the end of the day, this will generate some assembly instructions. xor eax, eax; mov edi, [eax]; or similar will be generated. The CPU will attempt to run that code. What happens next is not your concern anymore. Just generate the assembly like a good little compiler and be done with it.
    10:00 well you just defined all the cases. congratulations, your UB is now defined. and what defined it? the implementation and the platform it runs on. that's a programmer's job. writing a program to run on a platform. The program defines the behavior for the platform it was written for. Again, no problem.
    20:00 fine, let's talk language features then. Say a standard library function with a narrow API that accepts only sorted containers as an input, take std::equal_range, for example. It's undefined to give it a container which is not sorted. Or is it? I say it is very well defined because the code is right there in the library! It probably gives you the subrange from the first occurrence to the last occurrence that is consecutive to that first occurrence. Whatever it does, just describe it and you're done. Boom, defined behavior. Yes, you might be using it wrong, or you might want exactly what it does anyway. And what it does is described in the code.
    25:40 fine. then define it individually for each platform. generate the assembly and let the CPU deal with it. the programmer knows the platform he's dealing with.
    27:00 but it already IS implementation defined. on a 32 bit machine this is apparently a bug, even though it does what I would expect on an x86 architecture if I run it on such. and on 64 bit it just runs fine anywhere. What about integer overflow? not a single cpu since at least 1980 has had anything other than 2's complement. why is integer overflow STILL not defined? It's silly is what it is.
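For reference, the standard's precondition for `std::equal_range` is weaker than "sorted": the range must be partitioned with respect to `e < value` and `!(value < e)`. The sketch below uses a hypothetical helper, `equal_index_range`, that states that narrow contract in code and checks it with `assert` in debug builds (compiled out under `NDEBUG`), assuming C++17.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: returns the [first, last) indices of the elements
// equal to `value`. Narrow contract: `v` must be partitioned with respect to
// `value` (a sorted vector satisfies this). The asserts check that
// precondition in debug builds; they cost O(n) and vanish under NDEBUG.
std::pair<std::size_t, std::size_t> equal_index_range(const std::vector<int>& v,
                                                      int value) {
    assert(std::is_partitioned(v.begin(), v.end(),
                               [&](int e) { return e < value; }));
    assert(std::is_partitioned(v.begin(), v.end(),
                               [&](int e) { return !(value < e); }));
    auto [lo, hi] = std::equal_range(v.begin(), v.end(), value);
    return {static_cast<std::size_t>(lo - v.begin()),
            static_cast<std::size_t>(hi - v.begin())};
}
```

For example, `{1, 0, 2, 2, 5, 3}` is not sorted but does satisfy the precondition for `value == 2`, so the call is well defined and yields the index range `{2, 4}`; only inputs that break the partitioning fall outside the contract.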

  • @akashpatel2898
    @akashpatel2898 5 years ago

    The actual Car
    Here is something tricky involving undefined behaviour, and no, it is not “To Nasal Demons”. By the way, I do NOT work here.

    Too Low Performance and they pick high-power electronics:
    We created a blackout… yes, the energy is used up. Maybe put up the no-outlet sign on the road with dead ends, or live with blackouts. And yes, the second part is nonsense… it really is plugs and power.
    Why It's Not the Prisoner's Dilemma:
    If it were the Prisoner's Dilemma and I were player 2, then I would have the choices there, but it's not, so it can't be the Prisoner's Dilemma… before someone argues it is. By the way, game theory is actually something I am fairly good at… I know in about a second what the best move is.
    The Pointers in Functions Approach:
    Well, this program itself has quite a few optimizations left in it, but it is already using very low-level C-style code… the kind most people believe to be faster but isn't, and is also less maintainable. Pointers without restrict aliasing, and not efficient algorithms… I have the second most efficient algorithm.
    By the way, data structures and algorithms really are not quantifiable… text and hexadecimal and “colors” (or just a theoretical construct) are not quantitative on the level of measurement. Unless you're a computer scientist and take a = 64 and b = another number, with text being a list of numbers… then yes, it's quantitative.
    The Simplex Algorithm
    Without a known center of mass. Builds off the work of Prof. Elegnbogen… his research into linear programming. This works on doubles or floats or whatever was written as scalars in linear algebra. Without splitting vector quantities it's 1D… split the vectors to get 2D.
    Experimental Heuristic
    When one part moves, the whole rest of it is still. It also uses 2D Newtonian physics on that one part with the rest still. It worked okay on a robot for a few years… later on I would be worried that it breaks or something goes horribly wrong.
    Semi-Definite Quadratic Constraint Quadratic Objective Function
    These are allowed to be non-negative on each number.
    This one corrects the simplex algorithm for forces that are either quadratic or linear. By the way, for a moving car with a moving part, this also has something to do with relative motion and possibly adding another force. Anyway, adding another force is only another row or column in the linear algebra (after the vectors are split up).
    This still uses the simplifying assumption that the center-of-mass point is somewhere on the diagram rather than anywhere. Without it… normative question: should the car continue self-driving or be put in manual?
    Undefined, Implementation-Defined, and In the Standard
    First, our code is required to meet the C++ standard fully for this, including preconditions and postconditions.
    If it is undefined behaviour, then the optimizer is a little bit less compared to undefined behaviour.
    The optimizer… uses undefined behaviour (which I am okay with). Now here comes the tricky part: it shaves bits off the double or floating point so that it is lower power. Hence, the exception to R^3. By the way, it is pretty much never a scalar type like that… a polynomial for the real number that it is.
    This is where the disagreement is… undefined behaviour here, with the shaving of bits to reduce the energy used.
    By the way, in C++ the choices include the float and double types. Use tolerance analysis on the distances of bridges built, and now you have a choice to make.
    Which is affected by that other one… I will pick double and let bits be shaved off.
    Or, if we are allowed this as an alternative: -ffast-math without the bits being shaved. Go ask them; I am perfectly fine with the first one.

  • @CaseyC104
    @CaseyC104 7 years ago +3

    Sounds like a use case for a language like Rust to me

    • @VioletGiraffe
      @VioletGiraffe 5 years ago +3

      Sounds to me like Rust is not needed, we have C++ and it's a better language with huge code base.

    • @lunakid12
      @lunakid12 2 years ago +1

      @VioletGiraffe sounds like another fanboy to me.