CppCon 2018: Olivier Giroux “High-Radix Concurrent C++”

  • Published on 12 Sep 2024
  • CppCon.org
    -
    Presentation Slides, PDFs, Source Code and other presenter materials are available at: github.com/Cpp...
    -
    In this talk we will share the joy of seeing ordinary C++ concurrent code, doing ordinary concurrent things in 100000 concurrent threads. I’ll analyze how the code is behaving and point out what is similar and different about the execution of that code at this scale. This talk is the code-heavy continuation of last year’s English-heavy “Designing C++ Hardware” about the Volta architecture.
    Featuring:
    Multiple compilers.
    Godbolting.
    C++20 concurrency predictions.
    Live demo (attempt).
    -
    Olivier Giroux, NVIDIA
    Distinguished Architect
    Olivier Giroux has worked on nine GPU and five SM architecture generations released by NVIDIA. Lately, he works to clarify the forms and semantics of valid GPU programs, present and future. He was the programming model lead for the NVIDIA Volta architecture. He is the chair of SG1, the Concurrency study group of the ISO C++ committee, and is a passionate contributor to C++'s forward progress guarantees and memory model.
    -
    Videos Filmed & Edited by Bash Films: www.BashFilms.com

Comments • 19

  • @piotrarturklos 5 years ago +4

    This video contains a crash course on CUDA, as well as a great explanation of what CUDA is and what it is capable of.

  • @krytharn 5 years ago

    Awesome presentation, well prepared and very interesting.

  • @zhaoli2984 5 years ago

    impressive talk, thank you.

  • @miketag4499 5 years ago

    Great video. Thanks for sharing.

  • @OperationDarkside 5 years ago +14

    What a surprising change from a heavy French accent to a barely noticeable one

  • @llothar68 5 years ago +1

    9 min in and upvoted. Finally a very interesting topic for me. ..... Later: still nice, but unfortunately we have to wait until this hardware becomes mainstream before we can use it in our programs. Lucky server-side programmers :-(

    • @BillyONeal 5 years ago

      If your algorithm is wait-free, it works on older GPUs too.

  • @OperationDarkside 5 years ago

    Will there be a re-upload? Because the microphones seemed to be off.

    • @piotrarturklos 5 years ago +1

      The microphones are only off for the questions. Furthermore, if you are in a quiet room, you can hear the questions anyway.

  • @TruthNerds 5 years ago

    I didn't even watch the whole talk TBH, but 7:28 triggered me[1]… a trie is *not* characterized by a "large fanout". Many tries have a (potentially) large fanout, but binary tries are perfectly possible, and occasionally very useful, particularly for CIDR[2], even though actual hardware routers on the Internet usually use TCAM[3] instead.
    [1] www.xkcd.com/386
    [2] de.wikipedia.org/wiki/Classless_Inter-Domain_Routing
    [3] en.wikipedia.org/wiki/Content-addressable_memory#Ternary_CAMs
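
To make the binary-trie point concrete, here is a minimal sketch of longest-prefix matching for IPv4 CIDR routes with a fan-out of exactly two. It is not from the talk or from any router implementation; `BitNode`, `add_route`, `lookup`, `ipv4` and the integer route ids are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical binary trie node: fan-out is 2, one child per address bit.
struct BitNode {
    std::unique_ptr<BitNode> child[2];
    int route = -1;  // -1 means "no route ends at this prefix"
};

// Pack four octets into a 32-bit address, most significant octet first.
constexpr std::uint32_t ipv4(unsigned a, unsigned b, unsigned c, unsigned d) {
    return (std::uint32_t(a) << 24) | (std::uint32_t(b) << 16) |
           (std::uint32_t(c) << 8) | std::uint32_t(d);
}

// Store `route` under the first `length` bits of `prefix` (CIDR prefix/length).
void add_route(BitNode& root, std::uint32_t prefix, int length, int route) {
    assert(length >= 0 && length <= 32);
    BitNode* node = &root;
    for (int i = 0; i < length; ++i) {
        int bit = (prefix >> (31 - i)) & 1u;
        if (!node->child[bit]) node->child[bit] = std::make_unique<BitNode>();
        node = node->child[bit].get();
    }
    node->route = route;
}

// Walk the address bit by bit and remember the deepest route seen,
// which is the longest matching prefix. Returns -1 if nothing matches.
int lookup(const BitNode& root, std::uint32_t addr) {
    const BitNode* node = &root;
    int best = node->route;
    for (int i = 0; i < 32; ++i) {
        int bit = (addr >> (31 - i)) & 1u;
        node = node->child[bit].get();
        if (!node) break;
        if (node->route != -1) best = node->route;
    }
    return best;
}
```

With routes for 10.0.0.0/8 and 10.1.0.0/16 installed, a lookup of 10.1.2.3 follows the longer /16 path, while 10.2.0.1 falls back to the /8 entry.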

  • @the1969huff 5 years ago

    *its title

  • @Voy2378 5 years ago +1

    I get that this guy works for NVIDIA, but what he did with the CPU is really deceptive... nobody would write such dumb code for a CPU. If you really wanted to parallelize this on a CPU, you would have one trie per CPU core and partition the words across cores by first letter (or something smarter). The way he wrote it, contention is horrific.

    • @catlakprofesormfb 5 years ago +2

      But it is the same for the GPU code

    • @BillyONeal 5 years ago +2

      Fan out here is like 26x, that’s the high radix part. Actual contention is very low.

    • @Voy2378 5 years ago

      @BillyONeal If only there was this thing called false sharing...

    • @Voy2378 5 years ago

      @BillyONeal also he probably uses the same allocator for all 40 cores... lol

    • @BillyONeal 5 years ago

      @Voy2378 these things are way bigger than cache lines, so false sharing isn’t really an issue.
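
For readers following the contention argument, here is a rough sketch of what a shared 26-way trie with atomic child pointers might look like. It is not the code from the talk: `TrieNode`, `insert`, `insert_all` and `occurrences` are hypothetical names, words are assumed to be lowercase a-z only, and nodes are never freed. The `static_assert` encodes the assumption of a 64-byte cache line that the "bigger than cache lines" reply relies on.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical node: one atomic child pointer per letter (fan-out 26).
struct TrieNode {
    std::atomic<TrieNode*> next[26];
    std::atomic<int> count{0};  // how many inserted words end here
    TrieNode() {
        for (auto& n : next) n.store(nullptr, std::memory_order_relaxed);
    }
};

// Assumes 64-byte cache lines: a node spans more than one line.
static_assert(sizeof(TrieNode) > 64, "node is larger than one cache line");

// Insert without locks: threads only collide when they try to create
// the same missing child at the same time, and then one CAS wins.
void insert(TrieNode* root, const std::string& word) {
    TrieNode* node = root;
    for (char c : word) {
        assert(c >= 'a' && c <= 'z');
        std::atomic<TrieNode*>& slot = node->next[c - 'a'];
        TrieNode* child = slot.load(std::memory_order_acquire);
        if (child == nullptr) {
            TrieNode* fresh = new TrieNode;  // leaked in this sketch
            if (slot.compare_exchange_strong(child, fresh,
                                             std::memory_order_acq_rel,
                                             std::memory_order_acquire)) {
                child = fresh;  // we published the new node
            } else {
                delete fresh;   // another thread won; child now holds its node
            }
        }
        node = child;
    }
    node->count.fetch_add(1, std::memory_order_relaxed);
}

// Stripe the word list across `threads` workers sharing a single trie.
void insert_all(TrieNode* root, const std::vector<std::string>& words,
                unsigned threads) {
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([=, &words] {
            for (std::size_t i = t; i < words.size(); i += threads)
                insert(root, words[i]);
        });
    }
    for (auto& th : pool) th.join();
}

// Read back how many times `word` was inserted (0 if absent).
int occurrences(const TrieNode* root, const std::string& word) {
    const TrieNode* node = root;
    for (char c : word) {
        node = node->next[c - 'a'].load(std::memory_order_acquire);
        if (node == nullptr) return 0;
    }
    return node->count.load(std::memory_order_relaxed);
}
```

In this sketch two threads touch the same atomic only when their words share a prefix up to that node, which is the sense in which a 26-way fan-out spreads accesses out. Whether that makes contention negligible on a given CPU or GPU would still have to be measured.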