CppCon 2016: Nicholas Ormrod “The strange details of std::string at Facebook"

แชร์
ฝัง
  • เผยแพร่เมื่อ 26 ธ.ค. 2024

ความคิดเห็น • 116

  • @NeonStorm5
    @NeonStorm5 5 ปีที่แล้ว +141

    Wow, I didn't think a 30 minute talk about string optimisation could be so entertaining, but here we are.

  •  8 ปีที่แล้ว +248

    If there's such a thing as a beautiful bug, that jemalloc + kernel + null terminator bug is beautiful :)

    • @zvpunry1971
      @zvpunry1971 5 ปีที่แล้ว +9

      I had to pause the video because of the laughter in the video at 24:10 ;)

    •  4 หลายเดือนก่อน

      I think our speaker is just mistaken here.
      There's no such thing as conditionally returning memory to the kernel. At least not insofar as the kernel is involved: you either munmap or you don't.
      The conditional return is something that completely internal to libc.
      So the kernel can't do anything clever about 'undefined behaviour' here. That's all your C userland programme's doing.

  • @sergi0YT
    @sergi0YT ปีที่แล้ว +14

    That bug with the last byte of string going into the new page of memory is beyond scary

  • @Ratstail91
    @Ratstail91 2 ปีที่แล้ว +4

    This guy is so passionate! I'm just glad he's not selling MLMs.

  • @ancientapparition1638
    @ancientapparition1638 7 ปีที่แล้ว +154

    Ahhhhhhhhhhh thats the cleverest 0 I've ever seen

  • @0xCAFEF00D
    @0xCAFEF00D 7 ปีที่แล้ว +72

    He's really good at talking. Fun subject too.

  • @TheThirdAttractor
    @TheThirdAttractor 8 ปีที่แล้ว +47

    one of the best talks at CppCon 2016!!!

  • @kim15742
    @kim15742 6 ปีที่แล้ว +19

    9:27 oh my fucking god is that genious! I literally said that out loud. Wow.

  • @Peregringlk
    @Peregringlk ปีที่แล้ว +2

    The part starting at 22:50 is hilarious Love the characterization.

  • @williamchamberlain2263
    @williamchamberlain2263 6 ปีที่แล้ว +33

    10:00

  • @chrisanderson687
    @chrisanderson687 6 ปีที่แล้ว +74

    20:35 "Most people write good code" ... it's sad that this statement made me laugh...

    • @yackawaytube
      @yackawaytube 6 ปีที่แล้ว +13

      may be most people at FB write good code

  • @kered13
    @kered13 4 ปีที่แล้ว +17

    What was the conclusion!? Does FB use fbstring or gcc string now?

    • @BrianLanders
      @BrianLanders 4 ปีที่แล้ว +7

      mostly std::string in new code since newer versions of the language have fixed most of the problems.

  • @matiasbatero
    @matiasbatero 6 ปีที่แล้ว +8

    The historical pascal string is a fixed-size of 256 bytes and allocated in the stack. ;)

    • @duminicad
      @duminicad 4 ปีที่แล้ว +3

      yes, and string[0] indicated the size, it was indexed starting from 1, I think they are called "ShortString" in delphi and freepascal

    • @stefanalecu9532
      @stefanalecu9532 5 หลายเดือนก่อน

      Yet another case of people reinventing what Pascal has done 30-40 years ago because C has plagued the world and so has its null terminated strings

  • @CP-vj6yn
    @CP-vj6yn ปีที่แล้ว +1

    So i wonder if they still use fbstring or std::string?

  • @IsmeGenius
    @IsmeGenius 8 ปีที่แล้ว +34

    data[size()] = '\0' was UB in the first place.
    "Favorite thing to do is to put base10 serialized randoms in strings". Gosh, make a class for that.

    • @valetprivet2232
      @valetprivet2232 8 ปีที่แล้ว +17

      What they were doing is basically "If this allocated uninitialized byte is not happen to be zero, then make it so" and after execution of that piece of code they were "safe" to assume that last byte is initialized by '\0'. Even if this check was UB. And that teaches us how careful we should be about UB and never use it. The bug is fascinating and interesting though

    • @radioleta
      @radioleta 6 ปีที่แล้ว +1

      What is UB?

    • @ZiadHatahet
      @ZiadHatahet 6 ปีที่แล้ว +6

      radioleta UB = undefined behavior

    • @Cromius771
      @Cromius771 3 ปีที่แล้ว +6

      data[size()] was allocated but never initialized. The operating system knew you never wrote to that page so when you request a read it says here's a 0 to represent this uninitialized memory. But then if another piece of the program writes to that page the operating system returns the actual value after that and your c_str breaks.

  • @alexmiller4642
    @alexmiller4642 8 ปีที่แล้ว +8

    Does anyone follow the bug that he was describing? The behavior that he describes sounds like the behavior of madvise DONTNEED, but this would mean that jemalloc would have to be trying to purge a dirty *used* page. That should instead be a jemalloc bug.
    So for this to happen, does jemalloc explicitly track the requested allocation size vs size class given to be able to prove that the byte used for the null terminator shouldn't be able to be written to?

    • @nicksmith9521
      @nicksmith9521 8 ปีที่แล้ว

      See my comment below :)

    • @alexmiller4642
      @alexmiller4642 8 ปีที่แล้ว +1

      Thank you for the reply. I had seen your comment below, and was trying to better understand your line of:
      > If the address, with undefined mem, that you are trying to read from doesn't already live in the cache then the kernel is free to make an optimization by not loading the page you are trying to read into cache and instead returning whatever it wants.
      The kernel side of this, I believe we agree on, but let me link www.kernel.org/doc/gorman/html/understand/understand007.html as I'll be using the description contained in its second paragraph in my clarification.
      The explanation of the bug involves two assertions:
      1) That a page returned back to the system, can then be loaded back in with non-zero content.
      2) That jemalloc will return the page back to the system.
      I'm willing to leave (1) alone for now, although I'd love to have a better understanding of that. I'm more perplexed by (2). Even if we assume that madvise(DONTNEED) is allowed to change the underlying page back to its previous contents when written to again, it still seems incorrect to me for jemalloc to have returned that page back to the system. There was a live allocation that partially existed on that page, and returning it back to the system in a fashion that would allow reads from it to return zero would cause havoc! If this occurred, any time an allocation was freed one might see other allocations on the same page have their data zeroed until the next write, which would crash most programs. I assert that this cannot be the case.
      For this to work safely, jemalloc would need to understand that:
      1) It rounded an allocation up to a higher size class, and maintain the allocation size on the sized
      2) purging pages needs to look at the real allocation size, and not the size class size, when determining if a page can be returned back to the system.
      I cannot seem to find evidence of this logic existing within jemalloc, but I'm perhaps just not very well versed in skimming such code. But given that only `extent`s can be returned back to the system, which is generally a word used for large allocations of blocks of data, I'm suspicious that I've overlooked something here.
      With the longer explanation, would you happen to see where I've gone wrong?

    • @nicksmith9521
      @nicksmith9521 8 ปีที่แล้ว +4

      You can correct me if I'm wrong because I have never used jemalloc before, but I think it is completely in user space so it doesn't really know exactly when the kernel is mapping physical pages. If I'm understanding the bug correctly, jemalloc really has nothing to do with it and it is more about undefined memory in general. Consider the following example:
      char* buf = malloc(sizeof(char));
      assert(buf[0] == '\0');
      The above assertion is undefined behavior and may or may not pass. What made the bug (and potentially the bug in my example) hard to detect, is that the kernel sees a memory read request to a location that has never been written to and decides to load a 0 rather than loading the data that is actually at that address. _The kernel likely hasn't even allocated a physical page for this address since we haven't written anything to it_. So let's say that the above assertion spuriously passes because the kernel was able to make that optimization and we add the assertion:
      assert(*((volatile char *)buf) == '\0');
      By casting out pointer to be volatile, we force the kernel to load the data at our buf address and this time we get the actual memory contents. Again, this assertion may or may not pass because this time we will simply get whatever happens to live at that memory address, it very well could spuriously pass again. Some kernels will zero out newly allocated pages for security. Anyway, hopefully this clears things up, and while we're talking about undefined behavior, I'd definitely recommend watching Chandler's cppcon talk, I learned a lot from that one!
      Edit: Trying my best to use asterisks in the code segment without having youtube format bold! :P

    • @alexmiller4642
      @alexmiller4642 8 ปีที่แล้ว +2

      Yes! I personally find godbolt to illustrate these things well, so I'll leave links for future readers.
      Undefined memory does not require a load: godbolt.org/g/QexoHK
      Volatile does: godbolt.org/g/KuroXV
      However, the bug described in this video did not seem to be one of undefined/initialized memory. It clearly could be, as you illustrate, and I cannot tell from the video why it wasn't, but he appeared to be rather certain that the read of the last maybe-a-null-terminator byte did indeed happen, so let's assume it did. The assertion made with this bug is that by malloc returning a page back to the system, one can set up a situation in which, given two pointers `char *x, *y;` such that `x != y`, a write to `*x` will affect the what is stored at `*y`. That's a very interesting claim. And It's specifically what he talks about starting at 24:27 in this video that I'm asking about.
      I've looked around a bit more since first asking. I've still struggled to find anything that has the semantics of what he describes. madvise() is how one would generally return memory back to the system. (Man page for the lazy: man7.org/linux/man-pages/man2/madvise.2.html) MADV_DONTNEED will either give you zeros or the data you had on the page previously on the next access, which includes a read. MADV_FREE will give you back your previous data or zeros on an access, which again includes a read. lkml.org/lkml/2007/4/30/565 is probably a more reputable explanation.
      I don't believe Nicholas Ormrod is deceiving us about the details of this bug, I just can't figure out any way to set up a situation such that in `char *x, *y; assert(x != y); char c = *y; *x = 'a'; assert(c != *y);`, both asserts can pass. If FB happens to be running a kernel with non-standard return-memory-to-the-system semantics, or a kernel that doesn't zero pages for security, then I could understand how this bug would work. I'd surprise (and interest) me though if that's the case?
      I was hoping someone would wander by eventually and go, "Ah! ioctl(/dev/mem, DO_QUESTIONABLE_PAGE_RETURN) is at page.cc:123 in jemalloc" and would reveal the code behind this magical trick described as part of the bug.

    • @nicksmith9521
      @nicksmith9521 8 ปีที่แล้ว

      What's interesting about the first godbolt link you provided, the non-volatile one, is that the compiler is smart enough to detect the undefined behavior and it not only doesn't load, it doesn't do a compare or even make a call to malloc! It just returns 0 and that's it!

  • @MrAbrazildo
    @MrAbrazildo 2 ปีที่แล้ว +1

    And what about the new benchmark? How fbstring performed vs the new GCC 1, for overall Facebook?

  • @Tondadrd
    @Tondadrd 2 ปีที่แล้ว

    29:52 "We are gonna run an experiment" (on if they should use fb strings or gcc 5 strings.)
    Q: What's the experiments conclusion?

  • @GeatMasta
    @GeatMasta 6 ปีที่แล้ว +17

    Wouldn’t valgrind pick up that bug in unit tests just immediately?

    • @stefanalecu9532
      @stefanalecu9532 5 หลายเดือนก่อน

      If you're lucky enough to catch it

  • @为民程
    @为民程 5 หลายเดือนก่อน

    really great talk, learns a lot, thanks!

  • @davidbrown7313
    @davidbrown7313 8 ปีที่แล้ว +16

    Great talk, it made the issues around current strings really clear.
    It inspired me to comment; and then research; and then rework my comments somewhat. Especially the undefined behaviour stuff near 25:00 :)
    Ok that undefined behaviour setup at 25:00; so 129 bytes is allocated with the first 128 bytes on one page and the last byte on another page. And the last byte is not written.
    The bug is explained to be caused by malloc conditionally allocating a page (I think 'conditionally' means, the numa node the page is allocated on will be determined by the numa node of the thread that first touches a virtual address in the page). Then the kernel returns zero for the read because the page has not been assigned a node yet; and on the next write the page is allocated a node, and the address space is allocated a page on that node...
    Ok so that requires the OS to have very surprising behaviour: a read fault on a page that is pending assignment is satisfied by a temporary mapping of a zero filled page, a write fault is satisfied by a persistent mapping of a page of data (from memory, swap or etc) that contains non-zero data being mapped into the physical memory of the numa node address space.
    Ok; so I googled a bit around conditional page allocation, and found stuff about page allocation based on numa_node_id() ... and then did some searching, and found some more under numa first touch policy, and even numa_next_touch().
    stackoverflow.com/questions/12196553/is-there-numa-next-touch-policy-in-modern-linux
    The text above suggests that the OS developers define 'touch' as read or write.
    This is the sort of conservative approach that I expect from a working OS.
    So I understand this to be a theoretical bug, not an actual bug. That is some vendor **could** implement some numa policy as defined (ie reads go to a special zeros page until a write happens, then non zero data is mapped the same address space for the program). No client of such a vendor would be happy, as it would break programs in ways to difficult to imagine. It would be widely parodied as a broken OS. (I don't use windows, please tell me this is not how windows does this).
    The next message is about why I think the undefined behaviour of accessing uninitialised POD values should be limited to stating that the initial value is an unknown, but otherwise valid, value.
    Argh.. excluding floating point values that might have some codes that generate hardware exceptions.

    • @davidbrown7313
      @davidbrown7313 8 ปีที่แล้ว +1

      On 'undefined behaviour' is evil; i rather enjoyed Chandler's talk. Which inspired me to question the assertion that the neat trick presented and taken away as buggy.
      I personally find the assertion that comparing the value of a memory location, that is allocated to the program, to zero is buggy to be contrary to the mental model that I use when programming. So I am inclined to question if the assertion might be wrong.
      I dont read the standard for pleasure; or even with the expectation that I will understand it...
      Ok; why is accessing uninitialised memory undefined behaviour?.
      My mental model is that if it is undefined, because there are so many reasonable values for it to hold, that choosing any value for it to be set to would disadvantage some platform or other.
      Plausibly memory may be set to some chosen value by the kernel (on linux it is zero to avoid information leaks), not set to any chosen value (to avoid wasting
      CPU time on micro-controllers, or the last value stored in the memory before it was last freed by the program itself.
      This is why initialising memory to a known state is required to KNOW what that state is. And hence why I thought the undefined behaviour section of accessing uninitialised memory is in the standard; with the potential for amplification of the uncertain/undefined result when the uninitialised memory is mapped to some complex object.
      To me, extending this 'lack of definition' of initial state to a 'lack of definition of operation of POD until written to' is an over-stretch.
      I seem to recall that the standard has a get out of jail clause for accessing memory within a container where the exact value is not important. So memcpy of structures with padding is legal, for example.
      Given the need to deal with memory initialised by actors unknown to essentially noise values; and a high penalty (cache coherency pain) for actual writes, the change submitted seems entirely reasonable.

    • @NXTangl
      @NXTangl 5 ปีที่แล้ว

      @@davidbrown7313 It's probably a simpler issue than all that...
      "We make it UB because it lets us optimize [common scenario x] and this results in a 2% speedup in most codebases."

    • @spunit262
      @spunit262 5 ปีที่แล้ว +1

      Conditional allocation isn't relevant to this case, this is about conditional freeing. However my reading of the doc's on madvise(..., MADV_FREE) implies that a page can only go from non-zero (exactly was originally written) to zero (a afresh page). So it still sounds like it might only be theoretical.

    •  4 หลายเดือนก่อน

      I think our speaker is just mistaken here.
      There's no such thing as conditionally returning memory to the kernel. At least not insofar as the kernel is involved: you either munmap or you don't.
      The conditional return is something that completely internal to libc.
      So the kernel can't do anything clever about 'undefined behaviour' here. That's all your C userland programme's doing.

  • @lbgstzockt8493
    @lbgstzockt8493 ปีที่แล้ว

    This guy is a great speaker

  • @saeedradmehr1976
    @saeedradmehr1976 6 ปีที่แล้ว +4

    How is it while there is a handle in scope which points to a result of c_str(), the the page containing part of that allocated memory is returned to kernel, why isn't this a bug?

    • @BaddeJimme
      @BaddeJimme 3 ปีที่แล้ว +1

      Because a well formed program will initialize memory before depending on it's contents. And the act of initialization, ie writing to the memory, will require the OS to give you real memory. The bug is in the code that depends on the contents of uninitialized memory, something not supported by the C++ standard.
      Note that reading uninitialized memory is most likely to be accidental, for example because you have a struct that doesn't require everything to be initialized but uses the default copy constructor/assignment, which copies everything. In this case, requiring an actual memory page just to represent some uninitialized data would be a waste of memory.

  • @jasplaysbass
    @jasplaysbass 5 ปีที่แล้ว +3

    Why do they store the 64 bit random numbers as base 10 encoded string? He says that’s one of their favorite strings and gcc’s smaller string can’t accommodate their random numbers. Well don’t store it as a string then! Store them as a long ie 8 bytes.

    • @keris3920
      @keris3920 4 ปีที่แล้ว +6

      Depends what the use case is. Many web services depend on random string generation... Things like keys and hashes. There usually isn't much you can do about that if it's a standard protocol. Even if you store it as an integer of some kind, you'll need to convert it anyway, which may be slower than initially creating it to begin with.

  • @churchianity
    @churchianity 2 ปีที่แล้ว +2

    Something I don't understand is how you get away with the data pointer in the 'normal string' case being 8 bits. Probably missing something obvious? I understand there might be a custom allocator and the pointer is really an index into some table but it still seems far too small. What am I missing?

    • @alexparker7791
      @alexparker7791 2 ปีที่แล้ว +5

      In the normal string, the data pointer, size, and capacity are all 64 bits.
      In the small string, the last 64 bits are 56 bits of string data followed by eight bits of size.
      This means the small string "spare capacity" and the normal string capacity representations share that last byte. Flags stored in that byte need to be stripped out. Another commenter said the implementation has two flags, so I'll explain that more restrictive case.
      For little endian, the bit numbers in capacity go from 0, 1, ... 62, 63. You can reserve bits 62 & 63, which are in the last byte, and mask those out after reading capacity. (capacity & 0x3fffffffffffffff)
      For big endian, the bits go 63, 62, ... 1, 0. You can reserve bits 0 & 1 in the last byte, and right-shift capacity twice after reading it.
      So the flag bits are in the same byte either way but have different places values on different endian architectures.
      The implementation needs to ensure that valid states of the string don't alias those bits. In the small string, the "spare capacity" field can range from 0b00010111 (23) to 0, so the upper two bits don't interfere. In the normal string the capacity can be no more than 2^62-1, freeing up two bits at the beginning/end of the number.

    • @alexparker7791
      @alexparker7791 2 ปีที่แล้ว +1

      Another way of looking at it: of all 2⁶⁴ permutations of the last 64 bits, 2⁶² are reserved for small strings, and 3×2⁶² are reserved for normal strings. (This counts cases where bits in the "spare capacity" change as distinct cases but ¯\_(ツ)_/¯ )

  • @PamirTea
    @PamirTea 3 ปีที่แล้ว +2

    Great talk!

  • @ignatloskutov2792
    @ignatloskutov2792 3 ปีที่แล้ว +3

    I don't get the "conditionally return the second page to the kernel" part :(. How can the allocator return a page to the kernel if it intersects with an allocated memory block? How can it know whether there was a write inside the intersection?
    However, if this code is executed concurrently, technically it's a classic data race, which is UB, as well as reading uninitialized memory. However, as long as we're talking about x86_64 and the code the compiler has generated is more or less straight-forward, I still can't imagine how one read can return zero and another one, later -- non-zero. Except a case when the compiler knows that byte is uninitialized (it came from malloc after all!) and pretends it's zero just to skip the write, but that has nothing to do with either kernel of jemalloc.
    Am I missing something?

    • @SianaGearz
      @SianaGearz 3 ปีที่แล้ว +1

      I don't know the specifics of how any given kernel actually handles it, because it's been decades since i was last really curious, but here's my vague idea. The page table entry for a page can temporarily point to the designated zero-page, set read-only, this is actually common and sensible because a lot of mmapped memory will never actually be used, so there's no need to actually allocate unique memory for them right up front. A malloc implementation will often take a 4MB mmap per bucket, but some buckets will barely have a few kilobytes in them by the end of the day. Similarly the stack is an 8MB mapped chunk but odds are most of the pages in it aren't really needed. Then when a write occurs, the kernel gets invoked to handle the page fault condition. Then it can allocate an actual unique page in RAM and continue program execution, and so it will make the write succeed retroactively. So from then on, a unique page is allocated and undefined behaviour is cured. However, the contents of the page at that point can be arbitrary, with the exception of the bytes that you have actually written to.
      Another interesting trait of page table is the per-page Dirty bit, which exists for writeable pages. It's fundamentally possible that the kernel can allocate a page, and seeing that it wasn't written to, collect it later, making the conclusion that its contents is irrelevant, because it's UB for a process to read any bytes it hasn't written previously. But i suspect this isn't necessarily what happens, though it is at least fundamentally a possibility.
      This is not a case of concurrent data race. Reading uninitialised memory is fundamentally UB even well without. The compiler usually has no proof at all that the memory was never written to, as the function receives a pointer, and C++ compiler is incapable of running extensive proof where it really came from and who might or might not have written to which bytes before, and it's dealing with dynamic data. I imagine the first thing they did was to check, whether the compiler optimised away the read and the write, and disassembled the code in question, and determined that the compiler did no such thing. Furthermore had the compiler optimised it away, it would fail reliably in at least one location in the program, but hitting the page boundary of unallocated memory is almost astronomically rare, but it will happen, making for a very evasive bug.

    • @BaddeJimme
      @BaddeJimme 3 ปีที่แล้ว +4

      Reads and writes to memory on a modern CPU go through a Memory Management Unit that matches virtual addresses to physical memory. If there is no corresponding physical memory, it calls the OS to sort it out. One use for this is paging memory to disk in order to pretend the computer has more RAM than it really has.
      When a program requests memory from the OS, it just gets address space, not physical memory. When the program tries to write to that memory, the OS finds a page of physical memory for the program to use. The memory is usually zeroed out for security reasons.
      If a program tries to read from such memory before writing to it, the OS just returns zero, without allocating any physical memory.
      Now if the allocator returns memory to the OS, then later asks for it back, it will once again get address space, not physical memory. Reading from it before writing will return zero. But if a page is written to, the OS may return the old physical memory and not zero it out.
      The moral of the story is that if you don't initialize your data it's value can actually change. This is why reading uninitialized data is undefined behaviour.

    • @doodspav6843
      @doodspav6843 3 ปีที่แล้ว

      @@SianaGearz that’s amazing

    • @RobotronSage
      @RobotronSage 2 ปีที่แล้ว +1

      This is the Facebook way of reinventing the wheel buddy.

    • @RobotronSage
      @RobotronSage 2 ปีที่แล้ว

      @@BaddeJimme And this is more optimised than just using string as per recommended practice? :'D
      No wonder FB is falling apart at the seems.
      ''Hey guys we saved like 8 bytes of data''
      >Errors everywhere
      ''Yeah don't worry i asked Zuck he said don't worry about those''

  • @chedatomasz
    @chedatomasz 4 ปีที่แล้ว +6

    So if I get it correctly, the bug is that reading uninitialised memory, in that setup, can first return 0, and then return a different value for the same location?

    • @yashas9974
      @yashas9974 3 ปีที่แล้ว +2

      Yes. It doesn't even require that setup. It's UB and case-closed. But even with reasonable assumptions that the compiler/system doesn't do anything insane (and punish you for UB code), you can still end up in trouble.

  • @yackawaytube
    @yackawaytube 6 ปีที่แล้ว +27

    wow... there surely a lot of smart C++ programmers at FB

  • @MaceUA
    @MaceUA 4 ปีที่แล้ว +4

    Great talk!
    BTW, Alexandrescu is a genius :)

  • @Kyzuuh
    @Kyzuuh 7 ปีที่แล้ว +1

    Can someone explain how flags are stored for the normal sized string? I've been trying to wrap my head around it but I can't seem to understand it. Starts at 11:28 or so. Also why can't the flag bits be place somewhere in the size or capacity blocks? Couldn't either of those spare a couple of bits?

    • @BJovke
      @BJovke 7 ปีที่แล้ว +7

      If you're thinking about fbstring, here's how it's setup:
      First thing member methods of fbstring need to do is to figure out if it's small string or heap allocated string (there's also a "medium" string). The only part of structure not mangled by the actual small string data is the last byte of data, that is, the last byte of capacity in case of heap data string (not highest/lowest byte of capacity, this depends of endianness). If you look at the Github code of fbstring, you will see this code:
      constexpr static uint8_t categoryExtractMask = kIsLittleEndian ? 0xC0 : 0x3;
      enum class Category : category_type {
      isSmall = 0,
      isMedium = kIsLittleEndian ? 0x80 : 0x2,
      isLarge = kIsLittleEndian ? 0x40 : 0x1,
      };
      Category category() const {
      // works for both big-endian and little-endian
      return static_cast(bytes_[lastChar] & categoryExtractMask);
      }
      So if it's little-endian system (Intel x86) the information is in the highest two bits of last byte and in the lowest two bits in case of big-endian.
      After that size/capacity is retrieved in this way:
      - For small string the remaining 6 bits of last byte are size, giving 2^6=63 max capacity which is OK because it can be at most 23.
      - For heap allocated string data flags will occupy highest two bits of capacity field in case of little-endian and the lowest two bits otherwise. In little-endian case highest two bits are ignored (zeroed) when retrieving the value. In big-endian case the capacity field is shifted 2 bits to the right. This limits the maximum fbstring length to 2^62.
      "isMedium" type is being allocated in the same way as "isLarge", differences are not related to this story.

  • @gubgubbubbub
    @gubgubbubbub 2 ปีที่แล้ว +1

    Love this talk!!!

  • @DmitriNesteruk
    @DmitriNesteruk 8 ปีที่แล้ว +12

    Great talk, thanks!

  • @jim0_o
    @jim0_o 8 ปีที่แล้ว +2

    25:00 was that just a optimization? (for the random occurrence of aligning exactly on a page boundary, a 1 byte optimization in a blue moon?) (NinjaEdit: I understand is would save more than 1-byte but only in a blue moon. by 1-Byte I refer to the null terminator.)

    • @nicksmith9521
      @nicksmith9521 8 ปีที่แล้ว +14

      It wasn't an optimization in the string implementation, it was just a bug. It took me a bit to think about it, but really what it boils down to is: reading from uninitialized memory is undefined behavior. If the address, with undefined mem, that you are trying to read from doesn't already live in the cache then the kernel is free to make an optimization by not loading the page you are trying to read into cache and instead returning whatever it wants. The unfortunate bit in this case is that the kernel returns 0, so the string terminator check spuriously succeeds and doesn't actually write the terminator. I do wonder if a static analyzer could have detected this.

  • @Serenadio
    @Serenadio 6 ปีที่แล้ว +7

    Has Nicholas played in theater? Very good talk!

  • @delanyinspiron6400
    @delanyinspiron6400 8 ปีที่แล้ว +4

    Wow, a really interesting talk and absolutely hilarious! :D

  • @gast128
    @gast128 8 ปีที่แล้ว +4

    Nice talk. I think that Plauger's STL shipped with Visual Studio 2003 already contained a SSO basic_string implementation.

    • @RobotronSage
      @RobotronSage 2 ปีที่แล้ว +1

      lol
      Forget about Plauger we have Zuckerberg to reinvent the internet with facebook style ''optimisations'' lmfao

  • @mwilby3027
    @mwilby3027 2 ปีที่แล้ว

    i thought programs get their own page to themselves?

  • @wCupOfTea
    @wCupOfTea 8 ปีที่แล้ว +1

    Very useful video, thank you.

  • @kamidphish
    @kamidphish 6 ปีที่แล้ว +5

    Anyone have a link to the Alexandrescu "Sheer Folly" talk he mentions?

    • @chrisanderson687
      @chrisanderson687 6 ปีที่แล้ว +4

      A google search found it here: www.downvids.net/andrei-alexandrescu-sheer-folly-184197.html
      Not sure why it's not on YT, kinda odd... maybe someone could download this and post it?

    • @chrisanderson687
      @chrisanderson687 6 ปีที่แล้ว +2

      Also appears to be here: metavideos.com/video/582841/andrei-alexandrescu-sheer-folly

    • @SolomonUcko
      @SolomonUcko 4 ปีที่แล้ว

      @@chrisanderson687 Maybe too old for YT? Maybe someone could upload it unofficially.

  • @ali.sheharyar.sheikh
    @ali.sheharyar.sheikh 7 ปีที่แล้ว +2

    No Q/A ? why
    or was there one?

  • @BrookMonroe
    @BrookMonroe 6 ปีที่แล้ว +15

    Two observations:
    1) Clever and wise are two different things. The c_str() "optimization") was clever, but not wise.
    2) Tightly coupling fbstring to jemalloc() is an anti-pattern. They should have known better.

    • @nullplan01
      @nullplan01 6 ปีที่แล้ว +7

      The coupling isn't all that tight. Almost all malloc() implementations I know the details of can only ever return 16-byte or 32-byte chunks, depending on architecture. Only OpenBSD's malloc will ever return a smaller allocation (it has a special bitmap allocator for tiny allocations, so that malloc(1) will indeed allocate a single byte), but the tiny allocation threshold is 16 bytes (I think), so as soon as SSO is no longer valid, they will call the medium allocator, which rounds sizes again.
      Assuming the capacity to be an even number isn't unreasonable. I only wonder how they determine capacity though. Because that is usually managed by the application: Capacity is what you stuffed into malloc. Then the implementation itself never matters.

    • @fishmd1
      @fishmd1 4 ปีที่แล้ว +3

      it has nothing to do with fbstring, the global allocator is jemalloc...when you call new[] or malloc you'll get a jemalloc allocated string.

  • @RobotronSage
    @RobotronSage 2 ปีที่แล้ว +3

    ''it's technically illegal''
    ''but we did it anyways, and it works''
    >Facebook engineers in a nutshell

  • @marcusaurelius6607
    @marcusaurelius6607 7 ปีที่แล้ว +33

    great talk, and very aggressive attitude on the scene, wtf. it looks like he's about to punch someone in the face

    • @mohammedj2941
      @mohammedj2941 3 ปีที่แล้ว +2

      The content is great but the presentation is a bit too theatrical in an annoying way.

  • @PixelPulse168
    @PixelPulse168 7 ปีที่แล้ว +2

    great talk.

  • @ramakrln
    @ramakrln 8 ปีที่แล้ว +1

    Amaaaaazing talk :)

  • @movax20h
    @movax20h 4 ปีที่แล้ว

    Serialize 64-bit ints as hex. Would fit probably.

    • @atimholt
      @atimholt 4 ปีที่แล้ว +2

      Why not just store and use it as a `uint64_t` (inside a class)? If you ever need it as a string, write a to_string() for it.

    • @iProgramInCpp
      @iProgramInCpp 4 ปีที่แล้ว +2

      No, that'd be 16 bytes

  • @SrIgort
    @SrIgort 3 ปีที่แล้ว

    That null terminator bug was something unexpected lol, I will never again believe in uninitialized data comparison

  • @Anonymouspock
    @Anonymouspock 6 ปีที่แล้ว +1

    :( audio is clipping slightly.

  • @__hannibaalbarca__
    @__hannibaalbarca__ ปีที่แล้ว

    Best and only best string is char* const char* and char[].

  • @sparant1
    @sparant1 2 ปีที่แล้ว

    This talk reads like a bunch of students who are just discovering c++ and how data structures are implemented in memory for the first time.

  • @Pavel1915
    @Pavel1915 8 ปีที่แล้ว +1

    I still don't understand, why 24 bytes exactly? Why not 28 or 48?

    • @aholder97
      @aholder97 8 ปีที่แล้ว +19

      24 bytes is because the pointer is 8 bytes, the size is 8 bytes, and the capacity is 8 bytes, which adds to 24.

  • @perfectionbox
    @perfectionbox 3 ปีที่แล้ว +2

    You know Moore's Law has plateaued when a one percent performance gain in software is celebrated

    • @CianMcsweeney
      @CianMcsweeney 3 ปีที่แล้ว +2

      1% still isn't huge in regular consumer pc levels of optimization, but when you have as many servers as facebook that could save you a ton of cash. Also moore's law isn't the only thing that defines how fast CPU's can be, CPU's have become massively parallelized lately, not only in terms of the number of cores but in how much parallel work a single core can do via SIMD instructions

    • @mohammedj2941
      @mohammedj2941 3 ปีที่แล้ว

      How can the law plateau?

  • @gsilos
    @gsilos 5 ปีที่แล้ว

    22:17 mark willians

  • @Superlokkus5
    @Superlokkus5 6 ปีที่แล้ว +4

    Yay, I picture all those C++ beginners guessing the must do things like facebook and write their own buggy string implementation, which will be slower and buggy. Not just because I'm a sadist, but also because I know it sadly happens.

    • @foxikira
      @foxikira 5 ปีที่แล้ว +2

      i am a cpp beginner and i wasnt planning to write my own string implementation and after hearing about that null terminator trick (where capacity becomes a null terminator) i realized i shouldnt do it anyway because i cant think that clever

  • @mintx1720
    @mintx1720 7 หลายเดือนก่อน

    It's pretty amazing that myself as a very mid rust dev knows this is UB in 2024, where it takes a genius c++ dev to figure this out in 2016. Society has come a long way.