ModCon 2023 Breakout Session: MAX Engine Performance
- Published on Feb 11, 2025
- In this session, Modular engineers Abdul Dakkak and Hengjie Wang discuss Modular AI Engine performance across models and hardware architectures. They dive deep into how the AI Engine works, compare its performance against PyTorch and TensorFlow, and demonstrate how it scales to models of all sizes, including LLMs.
00:00 Introduction and performance numbers
04:33 Runtime, compiler, and kernels working in unison
05:30 Runtime parallelism and memory management
05:50 Moving transforms out of inference to model initialization
06:04 Automatic fusion of graphs to a single op
06:30 Specialized kernels on dimensions
08:49 Simplification with Mojo
10:26 Generality across hardware
11:51 Cross-platform development example
13:22 Kernel JIT
13:43 Developer friendly
14:02 Autotuning, Custom Ops, Multi-model support
14:58 Stable Diffusion example