Undefined Behavior in C++: What Every Programmer Should Know and Fear - Fedor Pikus - CppCon 2023

  • Published on Feb 7, 2024
  • cppcon.org/
    ---
    github.com/CppCon/CppCon2023
    This talk is about You-Know-What, the thing in our programs we don’t mention by name.
    What is this undefined behavior every C++ programmer has grown to fear? Just as importantly, what isn’t it? If it’s so scary, why is it allowed to exist in the language?
    The aim of this talk is to approach undefined behavior rationally: without fear but with due caution. We will learn why the standard allows undefined behavior in the first place, what actually happens when a program does something the standard calls “undefined,” and why it must be taken seriously even when the program “works as-is.” As this is a practical talk, we will have live demos of programs with undefined behavior and sometimes unexpected outcomes (if you are very lucky, you might see demons fly out of the speaker’s nose). Also, as this is a practical talk, we will learn how to detect undefined behavior in one’s programs, and how to take advantage of the undefined behavior to gain better performance.
    ---
    Fedor Pikus
    Fedor G Pikus is a Technical Fellow and head of the Advanced Projects Team in Siemens Digital Industries Software. His responsibilities include planning the long-term technical direction of Calibre products, directing and training the engineers who work on these products, designing and architecting the software, and researching new design and software technologies.
    His earlier positions included a Chief Scientist at Mentor Graphics (acquired by Siemens Software), a Senior Software Engineer at Google, and a Chief Software Architect for Calibre PERC, LVS, and DFM at Mentor Graphics. He joined Mentor Graphics in 1998 when he made a switch from academic research in computational physics to the software industry.
    Fedor is a recognized expert in high-performance computing and C++. He is the author of two books on C++ and software design, has presented his works at CPPNow, CPPCon, SD West, DesignCon, and in software development journals, and is also an O'Reilly author. Fedor has over 30 patents and over 100 papers and conference presentations on physics, EDA, software design, and C++ language.
    ---
    Videos Filmed & Edited by Bash Films: www.BashFilms.com
    YouTube Channel Managed by Digital Medium Ltd: events.digital-medium.co.uk
    ---
    Registration for CppCon: cppcon.org/registration/
    #cppcon #cppprogramming #cpp
  • Science & Technology

Comments • 52

  • @ArminHasitzka
    @ArminHasitzka 6 months ago +35

    Love Fedor's unique presentation style, and very important talk to listen to for everyone!

  • @dickheadrecs
    @dickheadrecs 4 months ago +13

    GCC “wait forever? you’re the boss!”
    Clang “Don’t be ridiculous”

  • @temdisponivel
    @temdisponivel 4 months ago +28

    I don't think the compiler (current or future) would optimize the original g() to return true.
    The compiler would optimize it to:
    bool g(int i) { if (i == INT_MAX) return false; else return true; }
    The compiler can only assume that "i" is not INT_MAX inside f(), not inside g(), because g() has an explicit check for the INT_MAX case.
    What am I missing?

    • @bernb
      @bernb 4 months ago

      That's exactly what I got as a result when I tested it with the latest gcc and clang.

    • @francoisandrieux7954
      @francoisandrieux7954 4 months ago +3

      You are correct. I suspect the example used was oversimplified from an originally correct example. Calling `g(INT_MAX)` is required to return `false`.

    • @DMStern
      @DMStern 3 months ago +4

      I suspect there's a typo in the slides, and that the first line of g() was supposed to read "if (i == INT_MAX) return true;"

    • @JacksonBockus
      @JacksonBockus 2 months ago +1

      I think the intention was to say that f(i) always returns true, since any time the result is defined it is true, and the compiler doesn't care about what happens if the result is undefined.
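For reference, the example this thread is arguing about can be reconstructed from the comments roughly as follows. This is only a sketch: the bodies of `f` and `g` are inferred from the descriptions above rather than copied from the slide, `g_simplified` is a hypothetical name, and `constexpr` (C++14 or later assumed) is added only so the claims can be checked at compile time.

```cpp
#include <climits>

// Inferred from the comments: whenever i + 1 does not overflow the
// comparison is true, so a compiler may fold f() to "return true".
constexpr bool f(int i) { return i + 1 > i; }

// The INT_MAX check runs before f() is called, so f(INT_MAX) is never
// evaluated and no path through g() has undefined behavior.
constexpr bool g(int i) {
    if (i == INT_MAX) return false;
    return f(i);
}

// The simplification the commenters expect (hypothetical name):
// f() folds to true, but the guard has to stay.
constexpr bool g_simplified(int i) { return i != INT_MAX; }

static_assert(!g(INT_MAX), "the guard must survive optimization");
static_assert(g(0) && g(INT_MAX - 1), "every other input yields true");
```

Because undefined behavior is not allowed during constant evaluation, the first `static_assert` compiling at all is a small confirmation that `g(INT_MAX)` never reaches the overflowing addition.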

  • @feisty-trog-12345
    @feisty-trog-12345 4 months ago +11

    In the example shown at 5:00, `g(INT_MAX)` is blatantly not UB and must return false. That supposed second half of the optimization that no compiler currently performs would be a miscompilation. Something clearly went wrong with the presentation here; I'd expect anyone giving a talk on UB to notice that the example they're currently testing on multiple compilers simply doesn't cause any UB.

  • @sjswitzer1
    @sjswitzer1 4 months ago +4

    The classic example of a contract that’s too expensive to validate is that a binary search must be given sorted data. Validating the ordering is as expensive as a linear search, making the binary search pointless.
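A minimal sketch of that trade-off, assuming a `std::vector<int>` and a hypothetical helper named `contains`: the sortedness check is written as an `assert`, so debug builds pay the O(n) validation while builds with `NDEBUG` defined skip it and simply trust the caller.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: O(log n) lookup whose contract is "data is sorted".
bool contains(const std::vector<int>& data, int value) {
    // Validating the contract is a full O(n) pass, as expensive as a linear
    // search. With NDEBUG defined this line compiles to nothing.
    assert(std::is_sorted(data.begin(), data.end()));
    return std::binary_search(data.begin(), data.end(), value);
}
```

Calling `contains` on unsorted data in a release build violates the precondition of `std::binary_search`, which is exactly the kind of unchecked contract the comment describes.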

  • @gamekiller0123
    @gamekiller0123 4 months ago +11

    Isn't the first example wrong? If i is INT_MAX, then there is no undefined behavior because the else branch won't be taken. If i is any other value, then calling f is fine.

  • @johnmcleodvii
    @johnmcleodvii 4 months ago +4

    I've written a line of code with undefined behavior that destroyed my hard disk twice. The second time I was single stepping through the code and went one line too far. Long = short * short without casts. So the 2 shorts could multiply to be too large for a short. Sometimes that would set the sign bit. The next step was to seek from the start of the file, then write some data. That hit the engineering sectors of the disk.
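The same bug pattern on a modern platform, as a hedged sketch: the function and parameter names are hypothetical, and the commented-out line assumes a platform where `int` is 32 bits. The multiplication happens in the operands' type, so widening the result afterwards is too late; one operand has to be widened first.

```cpp
#include <cstdint>

// Bug pattern (left commented out on purpose): both operands are 32-bit, so
// the product is computed in 32 bits and overflows before it is widened.
//   std::int64_t offset = record * record_size;   // UB if it does not fit

// Fix: widen one operand so the multiplication itself is done in 64 bits.
constexpr std::int64_t file_offset(std::int32_t record, std::int32_t record_size) {
    return static_cast<std::int64_t>(record) * record_size;
}

static_assert(file_offset(100000, 100000) == 10000000000LL,
              "product does not fit in 32 bits but is computed correctly");
```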

  • @gfasterOS
    @gfasterOS 4 months ago +6

    I'm still not convinced that making g never return false in the first example is a valid optimization.
    f assumes the input will never be INT_MAX, but the check before it diverts control flow and strictly dominates the call.
    If that were valid, that would allow the compiler to optimize out all checks for UB, including null pointer checks. It would be literally impossible to guard against any operation capable of causing UB.

  • @Kriby88
    @Kriby88 4 months ago

    Fedor is a fantastic speaker, always love to see talks from him!

  • @djouze00
    @djouze00 4 months ago

    This topic is always interesting! Thank you!!

  • @paradox8425
    @paradox8425 4 months ago +1

    Great talk! UB affecting previous code and what happens with a debugger were truly eye-opening

  • @LucasSantos-ji1zp
    @LucasSantos-ji1zp 4 months ago +14

    The code at 6:05 does not have undefined behavior. The branch checks if it is legal to call f, and, if so, calls it. If this were undefined behavior, it would be impossible to prevent undefined behavior in any program. The generated machine code is correct, I just checked on godbolt (and it always will be, unless the compiler has a bug).

    • @bernb
      @bernb 4 months ago +1

      Thanks for pointing it out. That's what I wondered. "If the check gets optimized away, how do you even avoid UB?".

  • @n0ame1u1
    @n0ame1u1 4 months ago +4

    I still don't understand how the example with f and g is undefined behavior. As written, f is never called if i is INT_MAX, and f is valid for all other i, so there is no case in which UB happens. What am I missing?

    • @n0ame1u1
      @n0ame1u1 4 months ago +1

      I also couldn't get the optimization to happen on godbolt

  • @Bolpat
    @Bolpat 4 months ago +1

    23:30 I don’t think it’s UB to cast away const, it’s UB to cast away const _and_ change the value. A common pattern for getters is to overload a non-static member function of a class on the const’ness of the instance and in the mutable version, you cast the object to const, call the member function for const, and cast away const of the result. That is valid as the original object wasn’t const. An example could be std::vector::operator[]. For a given index, it returns a reference to the exact same integer only typed const if the vector was const. The mutable version of the function doesn’t actually mutate anything, it differs from the const version only by preserving the non-const’ness in the type system.
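The getter pattern described above might look like this. `Buffer` and `at` are hypothetical names chosen for illustration, not code from the talk; the point is that the `const_cast` only restores a non-constness the object already had.

```cpp
#include <cstddef>

// Hypothetical container with one real accessor and one forwarding overload.
class Buffer {
public:
    // The const overload holds the actual lookup logic.
    const int& at(std::size_t i) const { return data_[i]; }

    // The non-const overload forwards to it. This overload can only be called
    // on a non-const Buffer, so the int it refers to was never const and
    // writing through the returned reference is well defined.
    int& at(std::size_t i) {
        return const_cast<int&>(static_cast<const Buffer&>(*this).at(i));
    }

private:
    int data_[4] = {0, 1, 2, 3};
};
```

Usage: `Buffer b; b.at(2) = 42;` goes through the non-const overload, while a `const Buffer&` can only reach the read-only one.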

  • @Roibarkan
    @Roibarkan 4 months ago +3

    23:56 [slide 29] I think Fedor meant that a compiler might have optimized this code in case the y variable was declared to be “const int” AND the call f(x) would have been changed to f(y)

  • @Digrient
    @Digrient 4 months ago +2

    Thanks for that talk, very interesting! I’m still not entirely clear though on why compilers don’t emit more warnings when they optimize away code based on the assumption of the absence of undefined behavior, when it in fact seems much more likely that the programmer has intended something else or made a mistake.

  • @AlfredoCorrea
    @AlfredoCorrea 4 months ago +2

    23:55 at the end of slide 29, it is clear that at some point Fedor mixed up x and y: what he said about x, he really meant for y.

  • @TerjeMathisen
    @TerjeMathisen 4 months ago +4

    UB is a good (maybe even sufficient?) reason to switch to Rust. I have written C since around 1983, C++ a bit later, and in the beginning C was in fact defined to be a "machine-independent, portable assembler replacement", and early compilers did just that, i.e. they would output the expected asm for pretty much all constructs. In that world incrementing an int until it wrapped around was perfectly fine. The same goes for the classic pointer check to make sure it was non-NULL, it would always be there in the compiled program unless the code was inlined and the compiler could see that in this particular instance, it could not be NULL.
    What happened a lot later was that C was coopted to be this compiler research exercise where someone/some group thought it was a great idea to use UB to silently remove a lot of code, even though the actual speed improvements for real production code have been shown to be trivial. As Fedor stated, some sanity is returning, in the form of (too slowly) moving stuff from UB to Implementation Defined which does at least obey the least surprise principle.

  • @austinsiu2351
    @austinsiu2351 4 months ago +1

    9:53 I remember having to put `asm("nop");` inside the while loop to state that it is intentional. I had an ncurses program where I simply wanted to make sure it initialized the screen properly. I put in an empty infinite loop and clang decided to remove it.
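A portable alternative to the `asm("nop")` trick, sketched under the assumption that some other thread or handler eventually sets a flag (`wait_until_ready` and `ready` are hypothetical names): an atomic load counts as forward progress, so the compiler cannot treat the loop as one that may be assumed to terminate and delete it.

```cpp
#include <atomic>

// Spins until `ready` becomes true. Each iteration performs an atomic
// operation, so the loop is not "side-effect free" and has to be kept.
// If nothing ever sets the flag, this waits forever, intentionally.
void wait_until_ready(const std::atomic<bool>& ready) {
    while (!ready.load(std::memory_order_acquire)) {
        // busy-wait; real code would likely sleep or yield here
    }
}
```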

  • @polmarcetsarda
    @polmarcetsarda 4 months ago +2

    Great presentation! I just wanted to point out that the code snippet about integer overflow is not true; otherwise pointer guards would be useless. I'm sure that was a small mistake while changing the code to fit in the slides, and this does not change at all the point of the presentation

  • @kuhluhOG
    @kuhluhOG 2 months ago

    18:05 So, what's with hardware where (on kernel level) you have to access things at address 0 (aka null) for certain operations because the hardware dictates it?
    Does this mean that you theoretically just can't use C++ on such hardware?

  • @Bolpat
    @Bolpat 4 months ago

    IIRC, I remember your name from the conspiracy talk. That one was hilarious and I hope this one gets some good laughs out of me as well. The topic most definitely allows for it.

  • @X_Baron
    @X_Baron 4 months ago

    Does Example 01 on slide 9 (4:17) imply that, to be completely correct, the numeric limits check must always be inside the function that uses the int (or in another function called by that function)? This seems like a pretty severe limitation.

  • @aniketbisht2823
    @aniketbisht2823 4 months ago +1

    The first example does not have any UB whatsoever. No precondition of any invoked expression is being violated. If "i" equals INT_MAX then g() returns false; otherwise f() is invoked, given that (i+1) will not overflow and is hence a valid expression. The compiler knows that (i+1) is always greater than "i" (because signed integer overflow would mean UB); therefore, it simplifies the call to f() to returning true. With that, the invocation of g() is simplified to (i != INT_MAX).
    Now the optimization that Fedor is talking about might be triggered if somehow f() is called unconditionally, because then the compiler can assume that (i != INT_MAX) and return true.
    Something like this ...
    bool f(int i) { return i + 1 > i; }
    bool g(int i) {
        f(i);                 // unconditional call: the compiler may assume i != INT_MAX
        return i != INT_MAX;  // ...and so may fold this to "return true"
    }

  • @ConceptInternals
    @ConceptInternals 4 months ago +3

    Can someone explain how g returned true? I get that f returned true, but that should result in the compiler turning g() into `return i != INT_MAX;` instead of `return true;`, correct?

    • @sverkeren
      @sverkeren 4 months ago

      g() cannot simply return true. He is WRONG, you are right.

  • @GeorgeTsiros
    @GeorgeTsiros 4 months ago

    omg Fedor 🥰

  • @rssszz7208
    @rssszz7208 4 months ago

    Please add timestamps to every video; it would be helpful

  • @Peregringlk
    @Peregringlk 4 months ago

    In the first example I think Fedor meant `i > INT_MAX`, or maybe he was thinking about INT_MAX as the "next after the last", as if it were the upper bound of a range.

  • @LaserFur
    @LaserFur 4 months ago +1

    4:19 I don't think that is a good example. The compiler should not be making an assumption on a code path that never happens. f() never gets called with INT_MAX due to the check and return. So it can't assume that i can't be INT_MAX when g() is called. I agree that the optimization of f() is correct as it can only return true, but in this case the compiler would be guessing at UB that can't happen.

  • @zachansen8293
    @zachansen8293 4 months ago +1

    It sure seems like there are better talks on this topic in many different years of cppcon.

  • @kwitee
    @kwitee 4 months ago

    It's a shame that the first example (at 5:00, with f and g functions) is analysed wrongly (as others have pointed out). There are valuable optimizations that can result from UB. A Fortran example:
    program fortran
      implicit none
      integer :: az, xw
      xw = 42
      call test(az, xw)
      stop xw
    contains
      subroutine test(a, x)
        integer, intent(out) :: a ! starts undefined
        integer, intent(inout) :: x
        integer :: h
        read(*,*) h
        if (even(h+1)) a = 666 ! legs akimbo
        if (even(h)) x = a
      end subroutine test
      logical pure function even(h)
        integer, intent(in) :: h
        even = mod(h,2) == 0
      end function even
    end
    can be optimized down to
    program fortran
      read (*,*)
      stop 42
    end
    and I am not even sure whether the read can be omitted (probably not).

  • @clementdato6328
    @clementdato6328 4 months ago +1

    Why is const cast-able to non-const? Does that mean if I see a function taking const ref as input, it is not correct to assume it does not alter the input?

    • @Digrient
      @Digrient 4 months ago

      Trying to modify a const value via const_cast is undefined behavior. The only legitimate use of const_cast that I remember is when you need to use a legacy function/API (like a C function) where the parameter is not defined as const but you know from the documentation that the argument value will not be modified.

    • @woodandgears2865
      @woodandgears2865 4 months ago

      Yes, a poor programmer might do a const cast and mess with the const & value. I think you'll have general acceptance from the C++ community to block that code at review time. The interesting bit here is whether the object behind the reference was originally declared const. If it was, UB. If not, just sad code.
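The distinction drawn in these replies can be sketched as follows. `legacy_count_spaces` stands in for a hypothetical legacy C-style API whose documentation is assumed to promise that it only reads its argument; neither function comes from the talk.

```cpp
#include <cstddef>

// Hypothetical legacy function: declared with char*, but it only reads.
std::size_t legacy_count_spaces(char* text) {
    std::size_t count = 0;
    for (; *text != '\0'; ++text) {
        if (*text == ' ') ++count;
    }
    return count;
}

// Const-correct wrapper. Casting away const is fine here because nothing
// is ever written through the resulting pointer.
std::size_t count_spaces(const char* text) {
    return legacy_count_spaces(const_cast<char*>(text));
}

// By contrast, writing to an object that was declared const is UB:
//   const int limit = 10;
//   const_cast<int&>(limit) = 20;   // undefined behavior
```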

  • @ABaumstumpf
    @ABaumstumpf 4 months ago

    Why did you first give example functions "f" and "g", and then go and introduce A DIFFERENT "g" ? Cause the original "g" does not introduce UB as it explicitly prevents that by checking against INT_MAX. This is just unnecessarily error-prone.
    Undefined behaviour would be a lot less of a problem if it wasn't silently introducing problems and also being so corrosive - many naive checks and attempts to avoid it will be optimised away.
    This would actually be a good use of attributes or some other alternatives (if attributes weren't also fundamentally broken and useless as of C++23). Give us something that allows programmers to influence (aka defining it) undefined behaviour:
    For the signed integer overflow that would be "overflow_saturating", "overflow_wrapping", "overflow_exception" or even "overflow_unspecified". Now the compiler, for that specific section of code, must check the target platform against the given specifier and act accordingly. With "unspecified" we don't care what the actual behaviour of that operation is, the compiler is just not allowed to introduce UB into the rest of the code. With "wrapping" on a lot of hardware it wouldn't need to do anything. This would be a simple mechanism to allow all the optimisations of UB to still exist while also giving programmers better control (and especially preventing UB from causing bugs).
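No such attributes exist in standard C++ today; the nearest equivalent is to spell the overflow policy out as a function. A minimal sketch of two of the proposed policies, with hypothetical names `add_saturating` and `add_wrapping`, assuming C++14 or later:

```cpp
#include <climits>

// "Saturating": clamp to the nearest representable value. The range checks
// themselves use only subtractions that cannot overflow.
constexpr int add_saturating(int a, int b) {
    if (b > 0 && a > INT_MAX - b) return INT_MAX;
    if (b < 0 && a < INT_MIN - b) return INT_MIN;
    return a + b;
}

// "Wrapping": do the arithmetic in unsigned, where wraparound is defined.
// The conversion back to int is modular since C++20 and implementation-
// defined before that (mainstream compilers wrap).
constexpr int add_wrapping(int a, int b) {
    return static_cast<int>(static_cast<unsigned>(a) + static_cast<unsigned>(b));
}

static_assert(add_saturating(INT_MAX, 1) == INT_MAX, "clamps at the top");
static_assert(add_saturating(INT_MIN, -1) == INT_MIN, "clamps at the bottom");
```

On typical two's complement hardware `add_wrapping` compiles to a plain add instruction, which matches the comment's remark that "wrapping" often would not need to do anything.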

  • @davidsicilia5316
    @davidsicilia5316 4 months ago +1

    that first example of UB makes no sense to me

  • @anon_y_mousse
    @anon_y_mousse 4 months ago

    I still don't get why there's so much hoopla because of overflow. Every major platform defines it in basically the same way and it's a natural function of 2's complement negation. It's easy to account for it at the compiler level because x86 and ARM processors both set a flag that can be conditionally jumped due to it being set, and it's easy to avoid it in your own code by simply checking any important calculations either before or after a series of operations. This covers at least 95% of platforms in regular use, maybe more, and yet people keep complaining about it. If a calculation needs to be error free, and you don't know the possible outcome based on the inputs, then check it, but ultimately, I think this boils down in large part to not sanitizing user input and that's far more problematic than the possibility of integer overflow.

    • @err6910
      @err6910 4 months ago

      My opinion on integer overflow is that it does not matter if it's UB or not, if your operation overflows, then it's most probably a bug anyway (99% of the time).

  • @robertolin4568
    @robertolin4568 4 months ago +20

    This shows how awkward C++ has become. For a high-level language, the only obvious way to prove or disprove that undefined behavior exists before runtime is "manually inspecting the assembly output." And people do that in every CppCon talk, completely giving up the fact that it should be a high-level language. What an irony!

    • @kristiannyfjell8097
      @kristiannyfjell8097 4 months ago +7

      Always been this way, even the first C standard had UB, etc. This is why people stress "follow the (ISO) standard when programming."
      Also, C and C++ are not meant to be high-level languages. They were meant to be 'system languages': fast, efficient, and only assembly underneath.
      C++ has its Core Guidelines. If you follow them, you will not get any form of UB.

    • @robertolin4568
      @robertolin4568 4 months ago

      @kristiannyfjell8097 C is not meant to be high level, but C++ is. Otherwise you don't need everything after C++14 and all those alleged "memory safety" features.
      C is much better in that it is rather consistent in its concepts and standards. There are only so many ways things could go wrong (although quite recurring ones). And most new features have been in, say, GNU C and the Linux kernel for a long time before being added to the standard.
      C++ is not. Every year you see the "How C++XX changes the way we write code" talks. Most of them reject at least some of the "best practices" mentioned at CppCon the year before. C can at least have consistency. C++ is a total disaster.

    • @johannesschneider1784
      @johannesschneider1784 4 months ago +2

      But most modern compilers have sanitizers, right?

  • @voxel1554
    @voxel1554 4 months ago

    I love hrt 🏳️‍⚧️