Ethernet Fabrics for GenAI workloads

  • Published on 19 May 2024
  • In this video, Sharada Yeluri, Senior Director of Engineering at Juniper Networks, describes the traffic patterns between the GPUs during LLM and GenAI model training and how to optimize the network topologies for these traffic patterns. She compares the different switch options and the challenges in controlling congestion and improving the performance of training workloads.
    Read more about GPU Fabrics for GenAI Workloads:
    www.linkedin.com/pulse/gpu-fa...
  • Science & Technology

Comments • 2

  • @nabromov 29 days ago +1

    insightful! thank you

  • @jasoniannone9675 29 days ago +2

    Thank you for sharing!
    RE: ~15:30 Modular vs. Fixed Switches - Are deep buffers desirable in RDMA? Jitter, latency variation, and the associated frame reordering over parallel flows seem like a sensitivity. Don't these absurdly integrated systems (GPU to GPU) have their own transmission control mechanisms and real-time knowledge of the state of the flows?
    Is the Ethernet network doing ECN or Pause in these deployments?
    I think the slide at 21:50 addresses my questions. Sorry. Impatient.