The Art of SIMD Programming by Sergey Slotin

  • Published on 24 Nov 2024
  • Modern hardware is highly parallel, but not only in terms of multiprocessing. There are many other forms of parallelism that, if used correctly, can greatly boost program efficiency - and without requiring more CPU cores. One such type of parallelism actively adopted by CPUs is "Single Instruction, Multiple Data" (SIMD): a class of instructions that can perform the same operation on a block of 16, 32, or 64 bytes of data in one go, yielding a proportional speedup over scalar code.
    While SIMD shares many similarities with classic multiprocessor computing, it is quite different and often requires creative use of the instruction set. In this talk, we will give a general introduction to the technology (focusing on x86/AVX2), derive and implement several state-of-the-art SIMD algorithms, and discuss their use in impactful open-source projects.
    skillsmatter.c...

Comments • 9

  • @yuangchen905 a year ago +3

    Great video. Thank you very much for your enlightening example and insightful explanation!

  • @martingeorgiev999 a year ago +3

    I don't understand why GCC doesn't use these architecture-specific instructions automatically at -O3.

    • @bouazzase4202 a year ago +10

      They are, when you pass the -march= argument. Otherwise the compiler doesn't know which instruction sets are allowed and falls back to a default (usually baseline x86-64, without AVX).
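
      The reply above can be checked with a small experiment. The sketch below is not from the talk; the function names `sum_array` and `enabled_simd` and the file name `sum.c` are made up for illustration. It relies on the predefined macros `__SSE2__`, `__AVX2__`, and `__AVX512F__`, which GCC and Clang define when the corresponding instruction set is enabled for the build.

```c
#include <stddef.h>

/* Build it twice and compare the generated assembly (sum.c is a
 * hypothetical file name):
 *   gcc -O3 -S sum.c                 baseline x86-64, typically SSE2 only
 *   gcc -O3 -march=native -S sum.c   whatever the build machine supports
 */

/* Plain scalar loop. At -O3 the compiler may auto-vectorize it, but only
 * with instructions that the selected -march= target permits. */
int sum_array(const int *a, size_t n) {
    int s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Reports the widest x86 vector extension enabled at compile time,
 * judging by the compiler's predefined macros. */
const char *enabled_simd(void) {
#if defined(__AVX512F__)
    return "AVX-512";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "none detected";
#endif
}
```

      On a typical x86-64 machine, `enabled_simd()` would be expected to report SSE2 for the first build and a wider extension for the second, provided the CPU supports one. The result of `sum_array` is the same either way; only the instructions chosen differ.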

  • @OptimusVlad several months ago

    I don't understand why, in the masking intro slide (at 22:42), the author says that the following has no branches:
    for (int i = 0; i < N; i++)
        s += (a[i] < 50 ? a[i] : 0);
    That's a ternary operation, which branches between left and right expressions. What am I missing?

    • @az09letters92 several months ago

      The compiler can evaluate both options and discard (mask out) the one that is not selected, so no branch is needed. Counterintuitively, this can yield a large speedup.

    • @OptimusVlad several months ago

      @az09letters92 Is that because the compiler can prove that executing both sides has no side effects? Because if the left or right were expressions that could have side effects, then it would be a short-circuiting branch, correct?
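
      The exchange above can be illustrated in plain C. This is a minimal sketch, not code from the talk, and every function name in it is hypothetical. It shows the ternary loop from the slide, an equivalent version with the mask written out by hand (assuming two's-complement integers, so that -1 has all bits set), and a ternary whose operands do have side effects. In the C language only the selected operand of `?:` is evaluated; a compiler may compute both only when the program cannot observe the difference, as in the slide's loop, where `a[i]` is already loaded for the comparison and `0` is a constant.

```c
#include <stddef.h>

/* The loop from the slide: sums the elements that are below 50. */
int sum_below_50_ternary(const int *a, size_t n) {
    int s = 0;
    for (size_t i = 0; i < n; i++)
        s += (a[i] < 50 ? a[i] : 0);
    return s;
}

/* The same computation with the mask made explicit. The comparison
 * yields 1 or 0; negating it gives -1 (all bits set) or 0, and the
 * bitwise AND keeps or clears the element without a branch. */
int sum_below_50_masked(const int *a, size_t n) {
    int s = 0;
    for (size_t i = 0; i < n; i++) {
        int mask = -(a[i] < 50);
        s += a[i] & mask;
    }
    return s;
}

/* A ternary whose operands have a visible side effect (a call counter).
 * The language guarantees that only the selected call happens, so the
 * compiler is not free to run both here. */
static int calls = 0;

static int counted(int x) {
    calls++;
    return x;
}

int pick(int cond) {
    return cond ? counted(1) : counted(2);
}

int call_count(void) {
    return calls;
}
```

      Both sum functions return the same result for any input, and each call to `pick` increases the counter by exactly one. Whether the compiler turns the ternary loop into masked or blended vector instructions still depends on the optimization level and target flags.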

  • @Roxas99Yami a year ago +1

    Thanks, much appreciated, especially the examples in C. Is this directly compatible with Cython?

    • @Roxas99Yami a year ago

      The intrinsics, I mean.

  • @petrvset1960 3 months ago

    Hard to understand English and unpleasantly small text...