"Whoops! I Rewrote it in Rust" by Brian Martin

  • Published on 20 Nov 2024

Comments • 28

  • @konga8165 2 years ago +420

    Watching this in 2022 just after Elon fired half of Twitter

    • @rutabega306 2 years ago +77

      Lol he looks so happy when he says he's an engineer at Twitter 🥲

    • @KirinDave 2 years ago +67

      Watching this later in 2022 when Elon fired half of Twitter, then half the remainder quit, then 10% came back, and half the remaining people got fired.

  • @nekomakhea9440 2 years ago +56

    It's not "duplicating previous efforts" it's "paying technical debt" lol

  • @frikkied2638 2 years ago +175

    Would be nice to at least give the PhD student a name to recognize their effort.

    • @DavidLindes 2 years ago +34

      According to another comment, Albert Rustein. Agree, though, that at 27:30, it would have been nice to name that person.

    • @PaulSebastianM 1 year ago +24

      @DavidLindes Rust-stein! EPIC!

    • @hamu_sando 1 year ago +2

      @PaulSebastianM 'ein' translates (German->English) to 'one'.

  • @rocketnewton 2 years ago +60

    Or to rephrase the title - own the whole stack if you care about performance.

  • @jebarchives 2 years ago +50

    I guess he no longer works at twitter

  • @pas. 3 years ago +239

    That PhD student? Albert Rustein! :)
    Also ~2 months to get better than memcached performance? (And more importantly safer, simpler, a lot more maintainable software!) Ridiculous. Especially considering that Memcached has been painstakingly optimized by pouring years and years into it. (Since ~2003.) These rewrites pay for themselves in months.

    • @julkiewitz 3 years ago +14

      I don't think that's what was said here. They had an internal Memcached alternative, written in C. They rewrote it in Rust.

    • @Spiritusp 3 years ago +44

      @julkiewitz I think he said memcached fork, so based on the real memcached

    • @PaulFisher 3 years ago +17

      It’s pretty common for the tech majors to have internal forks of external software. Often (in my experience) it’s largely about adapting it to their particular tech infrastructure.

    • @markotikvic 3 years ago +44

      Most of the rewrite benefits come from having a better picture of what needs to be done the second time around. You fix existing bugs, optimize code that had been sitting there for years, you make incremental improvements as you're rewriting it and there you have it. You end up with faster software.
      People usually don't change the existing, working codebase because that software has already been paid for and, more importantly, it's getting the job done. Also, no sane professional is going to start rewriting perfectly fine, working code just for the fun of it :) That's the main reason why you can see these drastic improvements over decades-old codebases in just a few months of work.
      Don't get me wrong, having a more modern language and tooling helps with this (and I would argue is even worth sacrificing a couple of benchmark points for), but that's only part of the story.
      Anyway, people should be very cautious about trying to rewrite their software in a different language because they think it's somehow magically going to make their code faster if they do a 1:1 port.
      It would be interesting to see the benchmarks of a 1:1 feature-complete implementation when they're done with it.

    • @ABaumstumpf 3 years ago +17

      memcached is a general-purpose solution that is written in C.... it would be a sad story if creating your own purpose-built solution in a more modern language would not work out.
      (and even for the general-purpose solutions memcached is not bad, but not outstanding either).
      We did try memcached but just the overhead of adding more C-code was not worth it, and it didn't take a month to write a custom solution in C++ that is way easier to handle and faster. Could you use our code for more general stuff? Sure, but it would then not really have the benefit anymore.

  • @garfieldnate 1 year ago +20

    Would have been nice to dig into why the performance of your Rust implementation was better than Memcached in the tail end.

  • @ujin981 3 years ago +79

    All you need to know is at 14:18. The choice of programming language is a religious act.

    • @musicdev 3 years ago +49

      While I largely agree with the sentiment, I think it tends to be a bit more pragmatic (or not, depending on how you look at it) than that. I’ve found that often, the choice of programming language boils down to one thing:
      What language do I know best?
      Seriously, that’s the story behind Quora, Facebook, Twitter, university research, hell, even at my lab our choice of language is literally just what do we know? Unfortunately this means pretty much everything I write is in TypeScript…luckily we do a lot of web stuff anyway

    • @antonlee0 3 years ago +33

      Probably wouldn't even have happened without the storage algorithm change at 22:37. They swapped to a better-suited storage algorithm and got the same performance as with C on the old algorithm.

  • @xy4489 2 years ago +1

    Excellent talk.

  • @GeorgeTsiros 2 years ago +6

    1:25 does the database have no caching itself? Why would the database need "protection" from frequent requests?

    • @DavidLindes 2 years ago +13

      It may, but it would still likely benefit, because the database is going to be designed to be able to handle arbitrary queries, whereas the caching solution will be tailored to a particular frequent query. For example, an SQL database for a social networking site might theoretically have tables for users, posts, responses, etc., and be able to query based on username, user id, post id, post content, etc. etc. etc... and might use complicated joins to answer a query. A cache could be built in front of that for just looking up, say, all the information about a post based on its id, and that would include all the joined-in data. For example.

    • @tissuepaper9962 2 years ago +3

      Separation of concerns. You already need a separate server to receive and redirect incoming requests to individual database machines, why not avoid routing the request entirely by just storing the most frequently used data right on the "greeter" server? It needs to be an infinitely scalable solution, routing a bunch of packets to get the same exact data between the same two machines over and over is obviously wasteful.

    • @z11i 1 year ago

      Database caching is in most cases useless. Any change to a table invalidates all caches on the table.

    • @OMGclueless 1 year ago

      @z11i 1. A change to a table does not invalidate all caches on that table, just the ones that were dependent on that data. Updating a row with id=5 does not invalidate a cache of the row with id=6. 2. Not all requests need fresh data, e.g. your Netflix profile name needs to be updated immediately in a DB somewhere when you submit a change on your profile page, but it's OK if your TV still uses your old name for a few minutes even for new requests.
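
The ideas in this thread (a cache tailored to one frequent lookup, invalidating only the entry for the row that changed, and tolerating slightly stale reads for a while) can be sketched roughly as below. This is only an illustration and has nothing to do with the system described in the talk: `PostCache`, `fetch_post_from_db`, `ttl_seconds` and the fake `db` dictionary are all hypothetical names invented for the example.

```python
import time
from typing import Callable, Dict, Tuple


class PostCache:
    """Read-through cache keyed by post id, with per-key invalidation and a TTL.

    Hypothetical sketch: the names and structure are assumptions made for
    illustration, not an API of memcached or of any system from the talk.
    """

    def __init__(
        self,
        load: Callable[[int], dict],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load        # stand-in for the expensive, join-heavy DB query
        self._ttl = ttl_seconds  # how long a possibly stale value may be served
        self._clock = clock      # injectable so expiry can be exercised in tests
        self._entries: Dict[int, Tuple[float, dict]] = {}

    def get(self, post_id: int) -> dict:
        now = self._clock()
        entry = self._entries.get(post_id)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]              # hit: the database is not touched
        value = self._load(post_id)      # miss or expired: go to the database
        self._entries[post_id] = (now, value)
        return value

    def invalidate(self, post_id: int) -> None:
        # Only the changed row's entry is dropped; other ids stay cached.
        self._entries.pop(post_id, None)


if __name__ == "__main__":
    db = {5: {"id": 5, "text": "hello"}, 6: {"id": 6, "text": "world"}}
    db_reads = []

    def fetch_post_from_db(post_id: int) -> dict:
        db_reads.append(post_id)         # record every trip to the "database"
        return dict(db[post_id])

    cache = PostCache(fetch_post_from_db, ttl_seconds=300.0)
    cache.get(5)
    cache.get(6)
    cache.get(5)                         # served from the cache
    cache.invalidate(5)                  # row 5 was updated
    cache.get(6)                         # still served from the cache
    cache.get(5)                         # reloaded from the database
    print(db_reads)                      # [5, 6, 5]
```

The TTL stands in for the "old name for a few minutes" case: an entry that is never explicitly invalidated is still refreshed once it is older than `ttl_seconds`.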