The Hardest .NET Bug I've Ever Fixed

  • Published on Nov 25, 2024

Comments • 9

  • @Petabridge
    @Petabridge  3 months ago +1

    Just a heads up - we reference a bunch of other videos throughout this one; we've added the appropriate YouTube info cards but for the past year or so they've not worked properly. No idea why.
    So, every other video we reference:
    Akka.Cluster Simply Explained: th-cam.com/video/8PenRoEjZKc/w-d-xo.html
    Distributing State Reliably with Akka.Cluster.Sharding: th-cam.com/video/2apFt9v0Vjw/w-d-xo.html
    Split Brains Explained: th-cam.com/video/aTu7WUJfGo8/w-d-xo.html
    Consistent Hash Distributions Explained: th-cam.com/video/byL_Cs0dGO0/w-d-xo.html
    Introduction to Petabridge.Cmd: th-cam.com/video/b7Qxg2YiOTI/w-d-xo.html
    Introduction to Phobos 2.0: th-cam.com/video/feExYBcqAtc/w-d-xo.html
    Introduction to Akka.Hosting - HOCONless, "Pit of Success" Akka.NET Runtime and Configuration: th-cam.com/video/Mnb9W9ClnB0/w-d-xo.html
    These, the original GitHub issues we fixed, and more are also linked in the description.

  • @hugebug4ever
    @hugebug4ever several months ago

    Great story, learned a lot from the journey, thanks! One thing to clarify: on the conclusion page, I think the term "smoke test" should be rephrased as "integration test" or "automated test", since "smoke test" specifically means a test that finishes in a few seconds or so.

  • @jackkendall6420
    @jackkendall6420 3 months ago +1

    What an odyssey! This is a tour de force kind of video alright

    • @Petabridge
      @Petabridge  2 months ago

      Thanks!

  • @mitzrael2k6
    @mitzrael2k6 3 months ago +1

    Nice video, Aaron!
    Just one thing that wasn't clear to me at the end: if the split-brain situation had happened to my system before the fix, what should I have done to remedy it? Just restart the server?

    • @Petabridge
      @Petabridge  3 months ago +1

      That's a great question - so you'd need at least a 3-node cluster in order to even run into this problem (something I probably should have mentioned in the video) and the issue would be that at least two of the surviving nodes would have duplicates. The most robust solution would be to SLOWLY restart both of them - waiting at least 20 seconds between each. That'd remedy the current situation and prevent it from happening again. Detecting the duplicate would be the hardest part, probably, unless you had good OpenTelemetry support that could prove that more than 1 instance of the same entity actor was alive concurrently.

  • @KvapuJanjalia
    @KvapuJanjalia 3 months ago +2

    Fixed a ".NET bug" - Do you mean a bug in .NET Runtime? Is this a clickbait title?

    • @Petabridge
      @Petabridge  3 months ago +2

      It's a bug in a popular .NET distributed programming framework? How on earth would that title be clickbait?