Incredible talk! Very clear and good examples
Great talk, it's kind of hard to find good information on DSP optimization as it is a very specific problem compared to general purpose optimizations. Wish more people saw this
Here is a summary of the talk with timestamps:
Introduction @00:00
Speaker is Gustav Andersson, senior software developer at Elk Audio
Elk Audio makes Elk Audio OS (Linux-based OS for musical instruments) and Elk Live (low latency online music collaboration tool)
Optimizing DSP code with compiler assistance @01:00
Readability and clarity should be prioritized, but code should express intent so the compiler can optimize it
Collapsing abstractions: abstractions like loops, functions, and classes should disappear in the compiled code
What affects performance @05:29
Code can be CPU bound, memory bound, or pipeline bound (long dependency chains)
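To make the "long dependency chains" point concrete, here is a minimal sketch of my own (not code from the talk). Both function names are made up for illustration. The first sum has one chain where every addition waits for the previous one; the second splits the work over four independent accumulators so the CPU can overlap the additions.

```cpp
#include <cstddef>

// One accumulator: each addition depends on the result of the previous one.
float sum_serial(const float* x, std::size_t n)
{
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        acc += x[i];
    return acc;
}

// Four independent accumulators: four shorter chains that can overlap.
float sum_four_chains(const float* x, std::size_t n)
{
    float a0 = 0.0f, a1 = 0.0f, a2 = 0.0f, a3 = 0.0f;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
    {
        a0 += x[i];
        a1 += x[i + 1];
        a2 += x[i + 2];
        a3 += x[i + 3];
    }
    for (; i < n; ++i) // leftover samples when n is not a multiple of 4
        a0 += x[i];
    return (a0 + a1) + (a2 + a3);
}
```

The two versions add the samples in a different order, so for general float data the results can differ in the last bits. That is also why a compiler normally will not do this rewrite on its own unless you relax floating-point rules (for example with -ffast-math on GCC/Clang).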
What compilers do @11:29
Inlining, loop optimization, auto-vectorization, rearranging statements, removing unneeded variables
Compilers are good at optimizing, so avoid sacrificing readability prematurely
Promoting auto-vectorization @17:39
Loop count should be known at compile time, with no control flow inside the loop
Fixed buffer sizes allow better optimization, but many plugin APIs don't support this well
Fixed-size arrays are often faster than dynamic ones like std::vector
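A rough sketch of what the fixed buffer size advice could look like in practice. This is my own illustration, not code from the talk: kBlockSize, Block, apply_gain and process_any_length are hypothetical names, and 64 is an arbitrary block size. The inner loop has a compile-time trip count and no branches, and a small wrapper chops a host buffer of arbitrary length into fixed blocks.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

constexpr std::size_t kBlockSize = 64; // assumed block size, pick your own

using Block = std::array<float, kBlockSize>;

// Trip count is a compile-time constant and the body has no control flow,
// which is the shape auto-vectorizers handle best.
inline void apply_gain(const Block& in, Block& out, float gain)
{
    for (std::size_t i = 0; i < kBlockSize; ++i)
        out[i] = in[i] * gain;
}

// Hosts often deliver an arbitrary number of samples, so process whole
// fixed-size blocks first and handle the remainder with a plain loop.
inline void process_any_length(const float* in, float* out,
                               std::size_t n, float gain)
{
    Block a{}, b{};
    std::size_t done = 0;
    for (; done + kBlockSize <= n; done += kBlockSize)
    {
        std::copy_n(in + done, kBlockSize, a.begin());
        apply_gain(a, b, gain);
        std::copy_n(b.begin(), kBlockSize, out + done);
    }
    for (; done < n; ++done) // tail shorter than one block
        out[done] = in[done] * gain;
}
```

The extra copies are only there to keep the sketch simple. Whether the fixed-size path actually wins for a given effect is something to check in Compiler Explorer and with a benchmark.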
Inlining and constants @21:01
Functions in headers, especially member functions, are usually inlined
Avoid virtual functions in inner loops
Be careful with float/double division - multiply by the reciprocal when dividing by a constant value
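The reciprocal trick is easy to show. A minimal sketch with function names I made up (normalize_div and normalize_mul), assuming the divisor is the same for the whole buffer:

```cpp
#include <cstddef>

// One division per sample.
inline void normalize_div(float* x, std::size_t n, float divisor)
{
    for (std::size_t i = 0; i < n; ++i)
        x[i] = x[i] / divisor;
}

// One division in total, then only multiplications inside the loop.
inline void normalize_mul(float* x, std::size_t n, float divisor)
{
    const float inv = 1.0f / divisor; // computed once, outside the loop
    for (std::size_t i = 0; i < n; ++i)
        x[i] = x[i] * inv;
}
```

x / c and x * (1 / c) are not guaranteed to round identically, so the compiler generally will not make this substitution for you unless the divisor is a power of two or you allow relaxed floating-point math. For audio the difference is usually in the last bit, but it is a change in results that you are choosing to accept.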
Branchless code @26:06
Branchless code has deterministic execution time that doesn't depend on the data
Compilers can often make simple code branchless, like max(), even in loops - trust the compiler
Use optimized math libraries before trying to hand-code branchless algorithms
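As an example of the "trust the compiler" point, here is a small hard clipper written with plain std::min and std::max. The function name hard_clip is my own and this is only a sketch. Written like this there is no explicit if in the source, and optimizing compilers commonly turn the min/max calls into min/max instructions rather than jumps, which you can confirm in Compiler Explorer for your target.

```cpp
#include <algorithm>
#include <cstddef>

// Clamp every sample to [-limit, +limit] without hand-written branches.
inline void hard_clip(float* x, std::size_t n, float limit)
{
    for (std::size_t i = 0; i < n; ++i)
        x[i] = std::max(-limit, std::min(x[i], limit));
}
```

Usage: calling hard_clip(buffer, num_samples, 1.0f) leaves samples inside the range untouched and pins the rest to -1 or +1.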
Reducing memory access @31:29
Fastest data is in registers, then cache, then main memory
Copying state variables to the stack (local variables) may allow better compiler optimization
Aliasing (having multiple pointers to same data) can prevent optimization
C++ has no standard "restrict" keyword to indicate no aliasing - raw pointers or compiler extensions are needed
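Here is how the last three points might fit together in a one-pole lowpass. This is a sketch of my own, not the speaker's code: the OnePole struct and its coefficient are hypothetical, and __restrict is a non-standard extension (accepted by GCC, Clang and MSVC) rather than part of C++.

```cpp
#include <cstddef>

struct OnePole
{
    float a = 0.5f; // hypothetical smoothing coefficient, 0 <= a < 1
    float z = 0.0f; // filter state (previous output)

    // __restrict promises the compiler that in and out never overlap.
    void process(const float* __restrict in, float* __restrict out,
                 std::size_t n)
    {
        // Local copies: the compiler can keep these in registers instead of
        // reloading the members after every store to out[].
        const float coeff = a;
        float state = z;

        for (std::size_t i = 0; i < n; ++i)
        {
            // y[n] = x[n] + a * (y[n-1] - x[n])
            state = in[i] + coeff * (state - in[i]);
            out[i] = state;
        }

        z = state; // write the state back once, after the loop
    }
};
```

Because of the __restrict promise, this version must not be called in place with in == out. If the plugin needs in-place processing, drop the qualifier and keep only the local copies.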
Handling recursive filters @38:50
Recursive algorithms like IIR filters are hard to optimize due to dependencies
Interleaving samples from parallel instances (e.g. stereo) can help
Adding delays between cascaded sections allows parallelizing higher-order filters
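A minimal sketch of the interleaving idea for the stereo case, using the same kind of one-pole recursion. It is my own illustration with hypothetical names (StereoOnePole, z_left, z_right), not code from the talk. Each channel still has its own serial dependency chain, but the two chains are independent, so the CPU can work on both in the same loop iteration.

```cpp
#include <cstddef>

struct StereoOnePole
{
    float a = 0.5f;       // hypothetical coefficient shared by both channels
    float z_left = 0.0f;  // left channel state
    float z_right = 0.0f; // right channel state

    // Buffer layout is interleaved frames: L0 R0 L1 R1 ...
    void process(float* interleaved, std::size_t frames)
    {
        const float coeff = a;
        float l = z_left;
        float r = z_right;

        for (std::size_t i = 0; i < frames; ++i)
        {
            const float xl = interleaved[2 * i];
            const float xr = interleaved[2 * i + 1];

            // Two independent recursions; neither waits on the other.
            l = xl + coeff * (l - xl);
            r = xr + coeff * (r - xr);

            interleaved[2 * i] = l;
            interleaved[2 * i + 1] = r;
        }

        z_left = l;
        z_right = r;
    }
};
```

The same pattern extends to more parallel instances (for example several voices), where the independent states can be packed into one SIMD register.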
Summary @42:29
Use fixed buffer sizes
Ensure functions are inlined
Write clear branchless code
Exploit parallelism opportunities
Benchmark standard library functions
Avoid float divisions
More resources @43:52
Compiler Explorer, Agner Fog's optimization guides, Elk Audio blog posts, CppCon/ADC talks