Purging Undefined Behavior & Intel Assumptions in a Legacy C++ Codebase - Roth Michaels CppCon 2022

  • Published on Nov 24, 2022
  • cppcon.org/
    ---
    Case Study: Purging Undefined Behavior and Intel Assumptions in a Legacy C++ Codebase - Roth Michaels - CppCon 2022
    github.com/CppCon/CppCon2022
    For large C++ codebases, adding support for a new platform (e.g. Apple Silicon/ARM) can be a scary, expensive endeavor. One of the biggest causes for alarm is undefined behavior (UB), which is an unfortunate part of many legacy codebases; luckily there are tools to help. After a brief review of what undefined behavior (UB) is we will discuss what issues it can cause and why it should be avoided. We will look at a few real-life bugs caused by UB in our codebase and discuss a common type of UB in legacy codebases: "it works on Intel". We’ll discuss how eliminating undefined behavior from a cross platform codebase can reduce maintenance costs and make it less stressful to support new platforms for your codebase. Then, we’ll go over the specific cultural and tooling initiatives we used to eliminate undefined behavior in our C++ codebase, including how we used static analysis and clang sanitizers to identify and address issues.
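A well-known example of an "it works on Intel" assumption is converting an out-of-range floating-point value to an integer. It is offered here as an illustration and is not necessarily one of the bugs from the talk. In C++ that conversion is undefined behavior. In practice, x86-64 conversion instructions tend to produce INT_MIN for such inputs, while AArch64 saturates. Code that quietly relied on one of those results can therefore change behavior on Apple Silicon.

The sketch below performs the conversion with a defined result for every input. The helper name `saturating_float_to_int32` and the choice to map NaN to 0 are assumptions. Clang's `-fsanitize=float-cast-overflow`, which is included in `-fsanitize=undefined`, can flag the unchecked conversions at runtime.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper: float -> int32_t with defined behavior for every input.
// NaN maps to 0 (an assumed policy); out-of-range values saturate.
std::int32_t saturating_float_to_int32(float x) {
    if (std::isnan(x)) {
        return 0;
    }
    constexpr float kUpper = 2147483648.0f;   // 2^31, exactly representable as float
    constexpr float kLower = -2147483648.0f;  // -2^31, exactly INT32_MIN
    if (x >= kUpper) {
        return std::numeric_limits<std::int32_t>::max();
    }
    if (x < kLower) {
        return std::numeric_limits<std::int32_t>::min();
    }
    // x is in [-2^31, 2^31), so truncation toward zero always fits in int32_t.
    return static_cast<std::int32_t>(x);
}
```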
    ---
    Roth Michaels
    Roth Michaels is a Principal Software Engineer at iZotope/Soundwide, an industry leader in real-time audio software for music production and broadcast/film post-production. In his current role on the Audio Research Team at iZotope's parent company, Soundwide, he is focused on developing new fast prototyping frameworks. When he joined iZotope, Roth was the lead library designer of a new internal cross-platform "Glass", part of which is now available as open source. More recently, in his former role as Mix/Master Software Architect, Roth helped develop the reference implementation to move iZotope's products to subscription and led the team that launched the company’s first SaaS offering for music producers. Roth studied music composition at Brandeis University and continued his studies in the Dartmouth Digital Musics program. Roth began his career in software development writing software for his own compositions and for the works of other composers and artists, and teaching MaxMSP to composers and musicians through both private instruction and university courses he designed. Before joining iZotope, he worked as a consultant for small startups building mobile applications, specializing in location services and Bluetooth.
    ---
    Videos Filmed & Edited by Bash Films: www.BashFilms.com
    YouTube Channel Managed by Digital Medium Ltd events.digital-medium.co.uk
    #cppcon #programming #code
  • Science & Technology

Comments • 12

  • @JohnDlugosz
    @JohnDlugosz 1 year ago +7

    I think the culture change is linked not to tooling that finds and points out such errors, but to the optimizer exploiting such things as a way to generate much better code.

  • @billpodolak7754
    @billpodolak7754 1 year ago

    Great talk!

  • @ABaumstumpf
    @ABaumstumpf 1 year ago +2

    Automatic formatting and automated static analysis... I wish we used those correctly, but they added automated checks about 2 years ago and they are STILL excluding all pre-existing code and a lot of their new code from those checks, because there are too many errors. We have a couple hundred generated classes with double underscores in their names, and they still refuse to change the name generation. Even if the chance of that causing any problems is slim, there is no sane reason to stick with those names, since they are only used in other generated code anyway.
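Here is why double-underscore names are a problem. The C++ standard ([lex.name]) reserves for the implementation every identifier that contains `__`, and every identifier that begins with an underscore followed by an uppercase letter. A program that uses them is ill-formed, but no diagnostic is required, which is why it usually just compiles.

The sketch below is a check that a code generator or CI step could run over generated names. The function name `is_reserved_identifier` is hypothetical. It also does not cover the extra rule that names beginning with a single underscore are reserved in the global namespace.

```cpp
#include <string_view>

// Hypothetical CI helper: true if `name` is reserved for the implementation
// in every scope, i.e. it contains "__" anywhere, or it starts with '_'
// followed by an uppercase ASCII letter.
// Not checked: names starting with a single '_' are also reserved, but only
// in the global namespace.
constexpr bool is_reserved_identifier(std::string_view name) {
    if (name.find("__") != std::string_view::npos) {
        return true;
    }
    return name.size() >= 2 && name[0] == '_' && name[1] >= 'A' && name[1] <= 'Z';
}

// Compile-time spot checks with made-up generated class names.
static_assert(is_reserved_identifier("Order__Line"));
static_assert(is_reserved_identifier("_Widget"));
static_assert(!is_reserved_identifier("OrderLine"));
static_assert(!is_reserved_identifier("order_line"));
```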

  • @AbelShields
    @AbelShields 1 year ago +2

    Can we get the links from the slides please?

  • @JohnDlugosz
    @JohnDlugosz 1 year ago +2

    "Do not invoke Undefined Behavior" in your fork...
    Wait, why isn't that in the C++ Core Guidelines proper?

    • @rothmichaels
      @rothmichaels 1 year ago

      Good question; perhaps I'll open a PR against the upstream guidelines.

  • @khatdubell
    @khatdubell 1 year ago +3

    28:55
    The transform function shows a lack of forethought as to how people tend to write code.
    If you are accepting a `const T&`, there is a 100% chance someone will pass you a temporary.

    • @MarekKnapek
      @MarekKnapek 1 year ago +4

      Accepting a `const&` and using it is completely fine. Storing a pointer to it and using it later is the problem. This could be fixed by adding an overload taking `&&` or `const&&` and marking it deleted. Something in the standard does the same; I guess the regex constructor.

    • @khatdubell
      @khatdubell 1 year ago +1

      @MarekKnapek Yes, I didn't mean to imply you shouldn't write functions with such a signature, just that you shouldn't assume the argument isn't an rvalue (a temporary).
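MarekKnapek's fix can be sketched as follows. `Observer` is a hypothetical class, not the `transform` function from the talk. An rvalue argument binds to a `const T&&` parameter in preference to `const T&`. Deleting that one overload therefore rejects both temporaries and xvalues such as `std::move(x)` at compile time, while the lvalue constructor stays usable.

The standard library uses the same trick. `std::ref` and `std::cref` delete their `const T&&` overloads. Since C++14, the `std::regex_match` and `std::regex_search` overloads that take a `std::match_results` together with a temporary string are deleted too.

```cpp
#include <string>
#include <type_traits>

// Hypothetical non-owning wrapper: it keeps a pointer to the caller's object,
// so it must never be constructed from a temporary.
template <typename T>
class Observer {
public:
    explicit Observer(const T& value) : value_(&value) {}

    // Rvalues prefer this overload over const T&, so passing a temporary
    // selects a deleted function and fails to compile instead of dangling.
    explicit Observer(const T&&) = delete;

    const T& get() const { return *value_; }

private:
    const T* value_;
};

static_assert(std::is_constructible_v<Observer<std::string>, std::string&>);
static_assert(std::is_constructible_v<Observer<std::string>, const std::string&>);
static_assert(!std::is_constructible_v<Observer<std::string>, std::string&&>);
static_assert(!std::is_constructible_v<Observer<std::string>, const std::string&&>);
```

This only catches the mistake at the call site. It cannot stop an `Observer` from outliving a named object it refers to.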

  • @jamesburgess9101
    @jamesburgess9101 1 year ago

    16:13 your asserts are inverted

  • @szirsp
    @szirsp 1 year ago +4

    17:15 This is one of the reasons why people hate C(++): the compiler is an a**hole.
    "I detected undefined behavior so I'm going to punish you for it. You made a mistake so I'm going to skip your work and not tell you about it. F*** you for not being perfect and not reading every word of the language standard."
    Maybe the compiler shouldn't just "assume" things, or it should at least tell you about it instead of lazily removing code the developer spent time writing.
    The compiler is welcome to optimize away well-defined behavior if it can evaluate things at compile time.
    But it shouldn't silently optimize away code because it found undefined behavior. The code should either be considered ill-formed with a required diagnostic, or do what the code is supposed to do and maybe issue a warning asking the programmer "Is this what you really wanted?"
    The goal should be to produce safer, more reliable code, not less.
    We want to notice problems as early as possible; we want bad code to fail fast. (Preferably at compile time, then link time, then testing... and not in live systems.)
    The compiler cannot notice every mistake programmers make, and that's fine. But when it is capable of noticing, when it does notice, the most a**hole thing it can do is not tell you and silently produce a buggy object file. Even an error message "You are so stupid, don't you know this is undefined behavior!?" would be more helpful than silently changing your code.
    If the language standard committee and/or compiler developers want to make their superiority felt by saying "You stupid filth, how can you be this bad? You deserve to be punished. Don't you ever make a mistake and invoke undefined behavior! You are not even worthy for us to talk to you, to help you, to let you know you made a mistake..." that's their choice.
    But then don't be surprised if your precious language that only gods can use correctly disappears... (because it produces the most security vulnerabilities)
    "NSA advises organisations to consider making a strategic shift from programming languages that provide little or no inherent memory protection, such as C/C++, to a memory safe language when possible."
    www.tribuneindia.com/news/science-technology/us-national-security-agency-tells-developers-to-shun-c-and-c-programming-language-450295
    By the way it should be
    array table;
    not
    array table[4];
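The sketch below is not taken from the talk. It shows one way to make an out-of-bounds table read fail loudly instead of silently. `std::array::at` checks its index and throws `std::out_of_range`. Reading past the end with `operator[]` is undefined behavior, which the optimizer may assume never happens. The function `exists_in_table` and its `count` parameter are hypothetical. Code that keeps `operator[]` can instead build its tests with `-fsanitize=address` in Clang or GCC, which typically reports such reads at runtime.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Hypothetical lookup over the first `count` entries of a fixed-size table.
// table.at(i) throws std::out_of_range when i >= 4, so a bad `count`
// surfaces as an exception rather than as undefined behavior.
bool exists_in_table(const std::array<int, 4>& table, int value, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        if (table.at(i) == value) {
            return true;
        }
    }
    return false;
}
```

When the whole table should be searched, a range-based `for` loop over `table` avoids the index entirely.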

    • @ChristopherKankare
      @ChristopherKankare 1 year ago +11

      The compiler isn't an asshole on purpose, and it is for _sure_ not implemented as
      "I detected undefined behavior, so now I'm just going to remove everything."
      What the compiler does is try to optimize the code as much as possible, and these kinds of undefined-behavior surprises are a side effect of that. In some cases the optimization opportunity is deliberate. For example, overflowing an int is undefined, which allows checks for int overflow in loops to be removed because the compiler assumes the developer won't overflow an int. In other cases it is about removing, e.g., unnecessary branches and calculations (which gives rise to even more optimization opportunities), because the compiler can make assumptions about the code (again, because it assumes the developer doesn't invoke undefined behavior).
      For example, let's say that we have this function:
      int do_stuff(foo* bar) {
        if (!bar)
          return 0;

        // use 'bar' to calculate a value.
        ...
      }
      This function is then called from hundreds of different places; in some places 'bar' is null and in others it is not. Let's say that we then have this piece of code:
      int do_many_stuff(foo* bar) {
        if (!bar)
          return 0;
        int value = bar->initial();
        for (int i = 0; i < 100000; ++i)
          value += do_stuff(bar);

        return value;
      }
      Now, assuming 'do_stuff' gets inlined, we certainly want the unnecessary null checks on 'bar' (in 'do_stuff') to be deleted from the loop, right? Why should we check it on each iteration if we already _know_ that 'bar' can't be null? (Think about all the branches that are removed; the compiler might even be able to vectorize the code.) But what if, instead of a null check at the top of 'do_many_stuff', the developer chooses to document that the function should not be called with null:
      // @param bar non-null pointer to foo
      int do_many_stuff(foo* bar) {
        assert(bar != nullptr);
        int value = bar->initial();
        for (int i = 0; i < 100000; ++i)
          value += do_stuff(bar);

        return value;
      }
      Shouldn't the compiler still be allowed to optimize away the null check? 'bar->initial()' has already dereferenced 'bar', so if 'bar' were null the behavior would already be undefined. And what if asserts are disabled (and then 'do_many_stuff' is called thousands of times)?
      Should the compiler warn if the user invokes undefined behavior? Yes, if the compiler can easily detect it, but in many cases it can only be detected at runtime. Note also that, at compile time, it is often not clear that undefined behavior is invoked. It might only surface after several optimization passes (when any resemblance to C++ has been removed), and detecting it there would probably be a huge burden for the compiler: it would need to know how the final IR maps back to C++ and whether that C++ invokes undefined behavior.
      Should the compiler warn at every place where undefined behavior might be invoked? I don't think that is possible, as that would be basically everywhere (e.g. each time a pointer is accessed, each time integers are involved in calculations, each time several threads are involved).
      Should the compiler optimize less and thus remove all these "crazy" side effects of undefined behavior? Many people, businesses and projects are counting on compilers being incredible at optimizing code; in some cases even 0.1% faster code can lead to huge gains.
      Should we remove undefined behavior from C++? Some cases could probably be removed at some performance cost (e.g. defining how int overflow behaves) or made opt-in (note that many compilers already have options to opt out of some undefined behavior or optimization passes). Other cases probably can't be removed (e.g. we can't remove pointers).
      There are many blog posts and C++ talks about undefined behavior, with many examples of where its side effects come from. I think they are quite interesting. At the very least, they might help with the "the compiler is doing this on purpose to destroy my program" attitude.
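The signed-overflow point above can be made concrete. Below is a minimal sketch of detecting `int` overflow without ever evaluating an overflowing `a + b`; the name `checked_add` is hypothetical. It compares against the limits before adding, so every operation it performs is well defined.

There are also compiler-level options. GCC and Clang provide `__builtin_add_overflow`. Their `-fwrapv` flag makes signed overflow wrap, which is one of the opt-outs mentioned above. `-fsanitize=signed-integer-overflow` reports overflows at runtime.

```cpp
#include <limits>
#include <optional>

// Hypothetical helper: returns a + b, or std::nullopt if the exact sum
// does not fit in an int. The limit checks run before the addition,
// so no expression here can overflow.
constexpr std::optional<int> checked_add(int a, int b) {
    constexpr int kMax = std::numeric_limits<int>::max();
    constexpr int kMin = std::numeric_limits<int>::min();
    if (b > 0 && a > kMax - b) {
        return std::nullopt;  // sum would exceed INT_MAX
    }
    if (b < 0 && a < kMin - b) {
        return std::nullopt;  // sum would go below INT_MIN
    }
    return a + b;
}
```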