Advanced Topics: Hardware Memory Barriers

Comments • 19

  • @MiriBenNissan
    @MiriBenNissan 4 months ago +1

    Such a great lecture! This is the best explanation of HW barriers I've heard. Thanks for that!

  • @blipman17
    @blipman17 3 years ago +5

    As always, awesome content! Also very informative on how and why the CPU reorders loads and stores.

    • @CoffeeBeforeArch
      @CoffeeBeforeArch 3 years ago +3

      Thanks - glad you found it informative!

  • @markusbuchholz3518
    @markusbuchholz3518 3 years ago +2

    Thanks, Nick, for sharing your passion and knowledge. As I mentioned some time ago about your video performance: the way you "discuss with the audience" is probably taken directly from Broadway theatres (or you are even far beyond that). Amazing for all the senses. It's great that you present advanced topics, since it pushes the community to pick up new knowledge. It is probably obvious that programming concepts and techniques keep advancing, so the complexity has to grow. Great to see Intel, which develops state-of-the-art compilers and libraries (I mean the latest oneAPI TBB releases). Thanks && have a nice day!

  • @archanasampath4809
    @archanasampath4809 1 year ago +1

    That's the best explanation of barriers!

  • @abhishekpandey71
    @abhishekpandey71 2 years ago +1

    awesome, exactly what i was looking for.

  • @__karthikkaranth__
    @__karthikkaranth__ 3 years ago +2

    1) Why does the store buffer have 56 entries? Is this just some heuristic chosen by Intel?
    2) Would it make sense to have more granular fences? E.g., mfence(0x456688ff) to flush just that one write? Or is that too granular to be efficient?
    Thanks for making these videos, I've learnt so much from them!

    • @CoffeeBeforeArch
      @CoffeeBeforeArch 3 years ago +2

      1. Like the size/configuration of any hardware structure, it'll be determined by some sort of "common case" analysis. 56 entries is probably just "good enough" for most cases.
      2. Depending on the specifics of what you mean, that would break the x86 processor-ordering memory model. If the write you want to flush is not next to be drained to the L1$ in what would logically be a FIFO store buffer, you would be reordering that write past earlier writes in program order that have not become globally visible yet. There are more relaxed memory models that allow more than just the reordering of older writes with later reads, but x86's does not allow this.
      Glad you are enjoying the videos!
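The FIFO argument in the reply above can be sketched with a toy model. This is only an illustration, not how any real core is built: the class `ToyStoreBuffer` and all of its member names are hypothetical, and the default capacity of 56 is simply the number quoted in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <map>
#include <utility>

// Toy model only: names and structure are illustrative, not real hardware.
class ToyStoreBuffer {
 public:
  explicit ToyStoreBuffer(std::size_t capacity = 56) : capacity_(capacity) {
    assert(capacity_ > 0);
  }

  // Retire a store into the buffer; if it is full, the oldest entry drains first.
  void store(std::uint64_t addr, int value) {
    if (pending_.size() == capacity_) drain_oldest();
    pending_.emplace_back(addr, value);
  }

  // Drain exactly one entry, always the oldest, so stores become globally
  // visible in program order.
  void drain_oldest() {
    assert(!pending_.empty());
    memory_[pending_.front().first] = pending_.front().second;
    pending_.pop_front();
  }

  // Model of a full fence: drain every pending store, oldest first.
  void fence() {
    while (!pending_.empty()) drain_oldest();
  }

  // What another core could observe: drained stores only (0 if never written).
  int globally_visible(std::uint64_t addr) const {
    auto it = memory_.find(addr);
    return it == memory_.end() ? 0 : it->second;
  }

  std::size_t pending() const { return pending_.size(); }

 private:
  std::size_t capacity_;
  std::deque<std::pair<std::uint64_t, int>> pending_;  // oldest at the front
  std::map<std::uint64_t, int> memory_;                // stands in for L1/memory
};
```

In this model, if a store to `0x10` is queued ahead of a store to `0x456688ff`, the only way to make the second one visible is to drain the first one too. A fence that flushed just `0x456688ff` would have to skip over the older entry, which is the reordering the reply describes.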

  • @goobensteen
    @goobensteen 3 years ago +1

    Great content, as always. Do you have a Discord server or something similar for questions? It's kinda hard to elaborate in the comment section here.

    • @CoffeeBeforeArch
      @CoffeeBeforeArch 3 years ago

      Nothing set up at the moment. The easiest way to chat is by email (coffeebeforearch@gmail.com) or to schedule a meeting through Google for something like a video call.

  • @jankeshchakravarthy9389
    @jankeshchakravarthy9389 1 year ago +1

    Thanks, Nick, for the very informative videos. I wonder why the software memory barrier did not work? Thanks

    • @CoffeeBeforeArch
      @CoffeeBeforeArch 1 year ago

      Software barriers ensure that the compiler does not reorder memory accesses, but they make no guarantees about what the hardware does at runtime (it's free to execute some operations out of program order).
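A minimal sketch of that distinction, written as the classic store-buffer litmus test. The function name `count_both_zero` and its parameters are hypothetical and not taken from the video. `std::atomic_signal_fence` stands in for a compiler-only (software) barrier, and `std::atomic_thread_fence` for a hardware barrier.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs the store-buffer litmus test `iterations` times and returns how often
// both threads read 0, i.e. each load was satisfied before the other thread's
// store became visible.
int count_both_zero(bool hardware_fence, int iterations) {
  assert(iterations >= 0);
  int both_zero = 0;
  for (int i = 0; i < iterations; ++i) {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;

    std::thread t1([&] {
      x.store(1, std::memory_order_relaxed);
      if (hardware_fence) {
        // Orders the store before the load at runtime as well.
        std::atomic_thread_fence(std::memory_order_seq_cst);
      } else {
        // Only stops the compiler from reordering; emits no fence instruction.
        std::atomic_signal_fence(std::memory_order_seq_cst);
      }
      r1 = y.load(std::memory_order_relaxed);
    });

    std::thread t2([&] {
      y.store(1, std::memory_order_relaxed);
      if (hardware_fence) {
        std::atomic_thread_fence(std::memory_order_seq_cst);
      } else {
        std::atomic_signal_fence(std::memory_order_seq_cst);
      }
      r2 = x.load(std::memory_order_relaxed);
    });

    t1.join();
    t2.join();
    if (r1 == 0 && r2 == 0) ++both_zero;
  }
  return both_zero;
}
```

With `hardware_fence` set to `true`, the C++ memory model rules out the both-zero outcome. With the compiler-only barrier, the outcome is allowed and may show up occasionally, although spawning fresh threads on every iteration makes the two stores overlap only rarely, so a count of zero in that mode proves nothing.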

  • @93Mosfet
    @93Mosfet 2 years ago

    Really good video. Thanks!

  • @archanasampath4809
    @archanasampath4809 11 months ago

    Instead of the hardware barrier instruction, we can also read back the value we just wrote (from the same address). This will force the CPU to flush the writes before the read.

    • @leonwoestenberg6001
      @leonwoestenberg6001 5 months ago

      Not in the general case, as the compiler might not do the actual read from memory, as it already has the value in a register. Remember that the compiler only sees one thread of execution and will optimize that. (With volatile, this might work, but at a cost.)
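The compiler side of this reply can be sketched as follows. The function names `readback_plain` and `readback_volatile` are hypothetical. Both return the same value; the difference is only in which memory accesses the compiler is obliged to emit.

```cpp
// Plain pointer: the compiler may return `v` straight from a register and
// never emit a load from *p.
int readback_plain(int* p, int v) {
  *p = v;
  return *p;
}

// Volatile pointer: the store and the load must both be emitted, in this
// order, but no fence instruction is implied.
int readback_volatile(volatile int* p, int v) {
  *p = v;
  return *p;
}
```

Even in the volatile version, the C++ standard gives no guarantee that the earlier write has become visible to other threads by the time the read completes. As far as I know, an x86 core will typically satisfy such a load from the pending store in its own store buffer (store-to-load forwarding) rather than draining the buffer first, so this should not be treated as a substitute for a fence or an atomic operation.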

  • @nisachannel7077
    @nisachannel7077 3 years ago

    Awesome, awesome!!

  • @qubasaqube1112
    @qubasaqube1112 3 years ago +2

    Uhh ohh that’s something that nobody knows about. God damn concurrency is hard. Thanks a lot!

    • @CoffeeBeforeArch
      @CoffeeBeforeArch 3 years ago

      Never a dull moment in parallel programming :^)

  • @raghul1208
    @raghul1208 2 years ago

    awesome!