Crossroads 3D-FPGA Academic Research Center
Lowering the barrier to entry to use customized hardware for cloud systems [Invited]
Speaker: Moein Khazraee, Nvidia
With the increasing demand for cloud computing, there is a vital need to efficiently transfer data and process it on high-performance, scalable systems within the data center. However, network bandwidth is outpacing our ability to process packets in software, forcing cloud providers to resort to specialized hardware. Unfortunately, hardware development is an inherently intricate, laborious, and costly procedure. Furthermore, integrating specialized hardware into a networked application requires hardware-software co-design, exacerbating the situation as developers with markedly different specializations have to collaborate.
In this talk, I will discuss how we can build frameworks to systematically tackle these challenges and lower the barrier to entry for hardware customization in cloud systems. I will then focus on two frameworks: Rosebud for wired networks, and SparSDR for wireless networks in base stations. The Rosebud framework brings software-like control and debugging to FPGA-based middleboxes, which enabled us to port a state-of-the-art intrusion detector in less than a month and double its throughput to 200 Gbps. The SparSDR framework makes the backhaul and computation of software-defined radios more efficient while maintaining their universality. It enables backhauling a 100 MHz frequency band over only 224 Mbps instead of 3.2 Gbps, and decoding BLE packets in real time on a low-end processor.
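As a rough sanity check on the backhaul figures quoted above, the sketch below reproduces the 3.2 Gbps raw rate and the roughly 14x reduction to 224 Mbps. It assumes the raw stream is complex (I/Q) samples at 100 Msps with 16 bits per component, which the abstract does not state explicitly.

```python
# Back-of-the-envelope check of the backhaul numbers quoted above.
# Assumption (not stated in the abstract): the raw stream is complex (I/Q)
# samples at 100 Msps with 16 bits per component.

sample_rate_hz = 100e6           # 100 MHz band captured at 100 Msps (complex)
bits_per_sample = 2 * 16         # 16-bit I + 16-bit Q per complex sample

raw_backhaul_bps = sample_rate_hz * bits_per_sample
sparse_backhaul_bps = 224e6      # SparSDR figure quoted in the abstract

print(f"raw backhaul:    {raw_backhaul_bps / 1e9:.1f} Gbps")             # 3.2 Gbps
print(f"sparse backhaul: {sparse_backhaul_bps / 1e6:.0f} Mbps")          # 224 Mbps
print(f"reduction:       {raw_backhaul_bps / sparse_backhaul_bps:.0f}x") # ~14x
```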
Speaker Bio: Moein Khazraee is a Senior Architect at NVIDIA, focusing on applied research in networking for large-scale systems such as high-performance computing and machine learning. Previously, he was a postdoctoral research associate at the MIT Computer Science & Artificial Intelligence Laboratory, where he focused on network optimizations for machine learning, as well as on leveraging emerging silicon photonics technology to scale performance beyond single-chip limits. He received his PhD in Computer Science and Engineering from UC San Diego.
His research interests lie primarily at the intersection of network systems and computer architecture. He has worked on bringing hardware customization to different parts of the cloud infrastructure, such as building data centers from ASICs, co-optimizing network topology and ML parallelization strategy, simplifying FPGA development for high-bandwidth network middleboxes, and developing backhaul- and compute-efficient software-defined radios for mobile base stations.
----------------------------------------------------------
For more videos, subscribe to the YouTube channel: th-cam.com/channels/GXVSHxK6PZKFiP9-XtjxtA.html
For more information visit www.crossroadsfpga.org/seminars.html
----------------------------------------------------------
Website: www.crossroadsfpga.org
The Intel/VMware Crossroads 3D-FPGA Academic Research Center is jointly supported by Intel and VMware. The center is committed to the public and free dissemination of its research outcomes.
----------------------------------------------------------
Views: 236

Videos

Under the Hood of OpenFPGA [Invited]
686 views · 11 months ago
Speaker: Pierre-Emmanuel Gaillardon, The University of Utah In this talk, we will introduce the OpenFPGA framework, whose aim is to generate highly customizable Field Programmable Gate Array (FPGA) fabrics and their supporting EDA flows. Following in the footsteps of the RISC-V initiative, OpenFPGA brings reconfigurable logic into the open-source community and closes the performance gap with commerc...
Evading Datacenter Tax Using MangoBoost’s Customizable FPGA Data Processing Units [Invited]
553 views · 11 months ago
Speaker: Eriko Nurvitadhi, MangoBoost Modern datacenter servers rely on an increasing number of devices to improve efficiency in data-centric tasks, such as data storage (SSDs), movement (NICs), and processing (GPUs, NPUs, etc.). Moreover, they offer advanced infrastructure for users to access resources easily/flexibly (e.g., via virtual machines, containers) and deploy/scale applications (e.g., v...
CAD and Architecture Exploration Tools for Next-Generation Reconfigurable Acceleration Devices
229 views · 1 year ago
Speaker: Andrew Boutros, University of Toronto Field-programmable gate arrays (FPGAs) have evolved beyond a fabric of soft logic and hard blocks surrounded by programmable routing to also incorporate high-performance networks-on-chip (NoCs), general-purpose processor cores and application-specific accelerators. These new reconfigurable acceleration devices (RADs) open up a myriad of architectur...
How hard is it to use an FPGA for compute acceleration in 2023?
1.9K views · 1 year ago
Speaker: James C. Hoe, Carnegie Mellon University In this talk I want to explore the question: how hard is it to use an FPGA in a computer system in 2023? Secondarily, there is the question: what application domain would most profit from FPGA acceleration if the historical programmability and usability challenges are removed. With advances in new single-source heterogeneous programming language...
Terminus: Moving the Center of Cloud Servers from Cores to SmartNICs and Beyond
437 views · 1 year ago
Speaker: Derek Chiou, The University of Texas at Austin Since the start of computing, server design has been core-centric. As infrastructure functionality, such as network virtualization, storage virtualization, encryption, etc. consumes more and more computational power, the center of the server is moving from cores/processors to SmartNICs. This talk motivates this move, describes how SmartNIC...
Re-envisioning generic server architectures for I/O-driven compute
208 views · 1 year ago
Speaker: Justine Sherry, Carnegie Mellon University In this talk, I will explore how traditional server architectures are "CPU-driven" rather than "I/O-driven", and why this architecture is a poor fit for a wide range of networked applications. I will highlight three Crossroads RV1 projects targeting I/O-driven compute, and discuss how the Crossroads 3D FPGA will help server architectures better ma...
Capturing Realistic Architectures for Field Programmable Gate Array Optimization
260 views · 1 year ago
Speaker: Kimia Talaei, University of Toronto In this talk, we will present a VPR-compatible architecture description of Intel's Stratix 10 device. This capture enables benchmarking and optimization of FPGA CAD flows on a more complex architecture, and serves as a baseline architecture on which researchers can evaluate architectural enhancements. We will describe how the primitives, functional b...
CHARM: Composing Heterogeneous AcceleRators for Matrix Multiply on Versal ACAP [Invited]
1K views · 1 year ago
Speaker: Peipei Zhou, University of Pittsburgh Dense matrix multiply (MM) serves as one of the most heavily used kernels in deep learning applications. To cope with the high computation demands of these applications, heterogeneous architectures featuring both FPGA and dedicated ASIC accelerators have emerged as promising platforms. For example, the AMD/Xilinx Versal ACAP architecture combines g...
The Future of Computing Beyond Moore’s Law [Invited]
734 views · 1 year ago
Speaker: John Shalf, Lawrence Berkeley National Laboratory Moore's Law is a techno-economic model that has enabled the information technology industry to double the performance and functionality of digital electronics roughly every 2 years within a fixed cost, power, and area. Advances in silicon lithography have enabled this exponential miniaturization of electronics, but, as transistors reach...
Soft Embedded FPGA Fabrics: Top-down Physical Design and Applications [Invited]
405 views · 1 year ago
Speaker: Prashanth Mohan, Carnegie Mellon University Embedded FPGA (eFPGA) fabrics are increasingly used in modern System-on-Chip (SoC) designs as their programmability can be leveraged to accelerate a variety of workloads and enable upgradeability, feature addition, and security. As technology scales down to sub 5nm nodes, designing eFPGA fabrics using custom layout techniques requires extensi...
Groq’s Software-Defined Hardware for Dataflow Compute [Invited]
1.2K views · 1 year ago
Speaker: Andrew Bitar, Groq With the end of Dennard scaling and the explosion of dataflow compute in the domains of AI and HPC, there has been a new renaissance in domain-specific architectures (DSAs) to help meet today's compute demands. A large swath of these architectures are spatial in nature, where compute is unrolled in space to expose more parallelism for dataflow-heavy workloads. With the...
HPIPE-NX: Leveraging Tensor Blocks for High-Performance CNN Inference Acceleration on FPGAs
298 views · 1 year ago
Speaker: Marius Stan, University of Toronto HPIPE is a state-of-the-art sparse-aware CNN accelerator for FPGAs. Through building deeply pipelined, customized hardware for every layer in the CNN and modelling the physical device characteristics, HPIPE can achieve very high compute density while maintaining high operating frequency. HPIPE also leverages sparsity, allowing it to skip multiplicatio...
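To make the sparsity point above concrete, here is a minimal software analogue of the zero-skipping idea (my illustration, not HPIPE's actual per-layer pipelined hardware): multiplications whose weight operand is zero contribute nothing to the output, so they can be skipped entirely.

```python
# Toy software analogue of the zero-skipping idea used by sparsity-aware
# accelerators such as HPIPE: multiplies with zero weights are never issued.
# (Illustration only; HPIPE builds this skipping into customized hardware.)

def sparse_dot(weights, activations):
    """Dot product that only performs multiplies for nonzero weights."""
    acc = 0
    for w, a in zip(weights, activations):
        if w != 0:              # pruned weight -> skip the multiply entirely
            acc += w * a
    return acc

weights     = [0, 3, 0, 0, -2, 0, 1, 0]   # 5 of 8 weights pruned to zero
activations = [5, 1, 7, 2,  4, 9, 3, 8]
print(sparse_dot(weights, activations))   # 3*1 + (-2)*4 + 1*3 = -2
```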
OverGen: Improving FPGA Usability through Domain-specific Overlay Generation [Invited]
495 views · 1 year ago
Speaker: Tony Nowatzki, University of California at Los Angeles The mainstream programming approach for FPGAs is high-level synthesis (HLS). Unfortunately, HLS leaves a significant programmability gap in terms of reconfigurability, customization and versatility: 1. FPGA physical design can take hours, 2. FPGA reconfiguration time limits the applicability of HLS to workloads with little dynamic ...
Crossroads FPGA Seminar: RAD-Sim - Rapid Architecture Exploration for Novel Reconfigurable Devices
270 views · 2 years ago
Speaker: Andrew Boutros, University of Toronto To improve the efficiency of FPGAs for new datacenter use cases and data-intensive applications, a new class of reconfigurable acceleration devices (RADs) is emerging. In these devices, the FPGA fine-grained reconfigurable fabric is a component of a bigger monolithic or multi-die system-in-package that can incorporate general-purpose software-progr...
FPGAs are (not) Good at Deep Learning [Invited]
21K views · 2 years ago
Near-Storage Acceleration in Practice: Opportunities and Challenges [Invited]
235 views · 2 years ago
Crossroads FPGA Seminar: High Performance CNN Inference Acceleration on FPGAs
688 views · 2 years ago
Re-thinking Data Center Hardware Architectures from the Ground-up [Invited]
585 views · 2 years ago
Towards Predictable and Efficient Datacenter Storage [Invited]
262 views · 2 years ago
Unleashing the Potential of In-Network Computing [Invited]
609 views · 2 years ago
Future of Programmable Hardware (CONIX)
2.5K views · 2 years ago
AIgean: An Open Framework for Deploying Machine Learning on Heterogeneous Clusters [Invited]
264 views · 3 years ago
Crossroads FPGA Seminar: FPGA Placement - Recent Progress and Road Ahead
759 views · 3 years ago
RV1 Overview: Exploring Data on the Move Applications
252 views · 3 years ago
Crossroads FPGA Seminar: Verilog to Routing (VTR) A Flexible CAD Flow to Explore FPGA Architectures
1.8K views · 3 years ago
RV5 Overview: From "Field Programmable" to "Programmable"
442 views · 3 years ago
Crossroads FPGA Seminar: Pigasus - Efficient Handling of Input-Dependent Streaming on FPGAs
586 views · 3 years ago
RV2 Overview: Soft Processor Overlays to Improve Time-to-Solution
565 views · 3 years ago
Crossroads FPGA Seminar: High-Performance Code Generation for Graph Applications
218 views · 3 years ago

Comments

  • @chriswysocki8816
    @chriswysocki8816 · 3 months ago

    did I hear that right, mr. presenter? you did this project while working at Intel? And you were not using Intel/Altera FPGAs but Xilinx. Why???? As a former Altera/Intel manager in the FPGA group I feel disappointed :)

  • @enkidughom2508
    @enkidughom2508 · 6 months ago

    Excellent!! Is there a technical report following this? Would love to dive into the details and try to reproduce some results.

  • @eafindme
    @eafindme · 7 months ago

    Imagine that you have 3 binary files, each representing an FPGA binary for a different DNN model, and you have one FPGA. Instead of making the hardware architecture universal so that it could support all 3 DNN models, like a GPU or ASIC, you could optimize each DNN model for the FPGA via hardware-software co-design and reprogram the FPGA on the fly, so that each of the 3 DNN models gets its own distinctive hardware optimization. Now the FPGA gets ASIC-like specialization for each model yet costs far less space and money. This is where the fun begins.
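A hypothetical sketch of the "one bitstream per DNN model" flow described in the comment above, using the Linux kernel's FPGA manager sysfs interface. The model names and .bit file names are made up for illustration, and a real board may instead require the vendor's own reconfiguration tooling.

```python
# Hypothetical sketch: swap model-specific bitstreams at runtime via the Linux
# FPGA manager. Bitstream/model names below are illustrative only.

from pathlib import Path

BITSTREAMS = {                      # one pre-built, model-specific bitstream each
    "resnet":    "dnn_resnet.bit",
    "mobilenet": "dnn_mobilenet.bit",
    "lstm":      "dnn_lstm.bit",
}

def load_model_bitstream(model: str, manager: str = "fpga0") -> None:
    """Reprogram the FPGA fabric with the bitstream specialized for `model`.

    Assumes the .bit files were copied into /lib/firmware, which is where the
    FPGA manager looks for the file name written to its `firmware` attribute.
    """
    firmware_attr = Path(f"/sys/class/fpga_manager/{manager}/firmware")
    firmware_attr.write_text(BITSTREAMS[model])

# Example: switch the fabric to the MobileNet-optimized design before inference.
# load_model_bitstream("mobilenet")
```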

  • @aqf0786
    @aqf0786 · 7 months ago

    If you knew the fundamental difference in area, speed and power of an FPGA vs ASIC, why not just focus on the key architectural improvements and make an ASIC? Surely, Intel would be able to do so?

  • @BenjaminGatti
    @BenjaminGatti · 8 months ago

    Sing song la de da.

  • @BharatIndiaHindustan628
    @BharatIndiaHindustan628 · 9 months ago

    Hi Mohamed, I'm a beginner at AI and deep learning. And I have just started to learn these things. In order to build some deep learning hardware applications/IPs for practice and hands on purpose. I'm really fascinated with the things that AI can do in field of health monitoring and medical diagnostics. I'll be really grateful and happy if you can provide me your mail id. I would like to keep in touch with you for guidance and mentorship. Thanks

  • @User_1795
    @User_1795 · 10 months ago

    The 90's called they want their mic back.

  • @rossoneill5576
    @rossoneill5576 · 10 months ago

    🎯 Key Takeaways for quick navigation:
    00:00 📅 Introduction to FPGA and Deep Learning
    - FPGA's initial attempt in deep learning in 2016.
    - Comparison to GPUs and the development of deep learning accelerators.
    - Overview of FPGA's early optimization strategies.
    10:00 🧠 Challenges and Competition from GPUs
    - The rapid evolution of GPUs in deep learning with low precision arithmetic.
    - FPGAs falling behind ASICs and GPUs in performance.
    - The limitations of FPGA customization and adaptability.
    20:00 🔄 Exploring Future Possibilities for FPGAs in Deep Learning
    - The concept of co-design in deep learning hardware and software.
    - Advocating for a flexible approach to optimizing both hardware and neural network architecture.
    - The potential of automated machine learning for FPGA-based deep learning.
    20:46 🧠 Automated Machine Learning and Neural Architecture Search
    - Different deep neural networks (DNNs) can perform the same task.
    - Automated machine learning and neural architecture search are common practices in industry.
    - FPGAs offer the ability to customize both neural network architecture and hardware parameters simultaneously, leading to better performance.
    28:09 ⚙️ Logic Neural Networks and Logic Shrinkage
    - Logic neural networks involve transforming DNNs into circuits using look-up tables (LUTs).
    - Logic shrinkage optimizes circuit netlists for FPGA implementation, resulting in higher efficiency.
    - Fine-grained pruning and training in the LUT domain can lead to significant FPGA area efficiency improvements.
    36:00 🖥️ FPGA DLA Devices and End-to-End Deep Learning Workloads
    - Deep learning workloads are heterogeneous and consist of more than just DNNs.
    - Accelerating the entire end-to-end deep learning workload, including pre-processing and post-processing, is essential.
    - Optimizing system architecture for hardware acceleration beyond DNNs is necessary for overall performance improvements.
    42:33 🚀 FPGA's role in deep learning acceleration
    - FPGA's suitability for accelerating deep neural networks (DNNs).
    - The importance of reconfigurable hardware in data centers.
    45:32 🔄 Hybrid FPGA-DLA devices for heterogeneous workloads
    - The need for hybrid FPGA-DLA devices for heterogeneous workloads.
    - Implementing custom pre-processing and post-processing on FPGAs.
    - Research questions and challenges in developing these hybrid devices.
    50:18 🌐 Embedded Networks-on-Chip (NoC) for FPGAs
    - Introduction to Embedded Networks-on-Chip (NoC) for FPGAs.
    - Solving FPGA designer challenges in system-level interconnects.
    - Benefits and efficiency of using an embedded NoC in FPGAs.
    53:21 💡 Using NoC-enabled FPGAs for pre- and post-processing in deep learning
    - Leveraging NoC-enabled FPGAs for pre- and post-processing in deep learning workloads.
    - Connecting efficiently to deep learning accelerators.
    - The potential of FPGA devices in accelerating deep learning applications.
    Made with HARPA AI

  • @sergiosimonis
    @sergiosimonis · 11 months ago

    *promo sm* 🎶

  • @rulekop
    @rulekop · 1 year ago

    Very interesting and clearly presented!

  • @cmporeddy
    @cmporeddy · 1 year ago

    Where can I download this presentation PPT?

  • @jasenq6986
    @jasenq6986 · 1 year ago

    Software defined data movement is a great way to put it

  • @moeinmaleki7859
    @moeinmaleki7859 · 1 year ago

    If I could clap virtually, I would stand up and clap at the end. What an amazing presentation and interesting topic. Thank you for sharing this video!

  • @prat1024
    @prat1024 · 1 year ago

    The presentation was extraordinary!! I am a student at the University of Stuttgart as well and this post randomly came across my feed.

  • @wayne1950
    @wayne1950 · 1 year ago

    Most Important for me 🙏😙😙💘!!! Grow your online following - *promo sm* !!!

  • @littlecandle328
    @littlecandle328 · 1 year ago

    Please can u help me in XSG matlab by fpga

  • @MrTweetyhack
    @MrTweetyhack · 2 years ago

    "If you can build it in ASIC, it won't be competitive on an FPGA" So what can't be built in ASIC? Actually, this has been know for a long long time

    • @gm7361
      @gm7361 · 1 year ago

      it means if you have the resources and the budget.

    • @vicktorioalhakim3666
      @vicktorioalhakim3666 · 9 months ago

      The problem is that ML engineering is a dynamic discipline: models change all the time, and are updated. So, if one wants to map their model in an efficient way to hardware wrt power usage, resource usage, throughput, latency, etc, then the hardware must also be flexible and dynamic. If you design an ASIC-based accelerator, you kinda have to make it as general as possible to support various changes to topology and parameters of the model. Because the architecture of this accelerator is fixed, this means that often you will have underutilization (resource waste, higher power usage, etc..) or overutilization (lower throughput, higher latency, etc). Now, if you have to tape out many ASICs for different types of models, then this will become costly quite quickly, and quite frankly a waste since newer models will come up, quickly deprecating the design. This is where the power of FPGAs can come in handy: here you have the power to customize your HW arch on the fly, such that it suits the given model best. The biggest difficulty is coming up with a good HW "compiler", so that you minimize the amount of manual labor involved in mapping a model to the HW, including the pre and post-processing stages.

  • @jasoncheung1388
    @jasoncheung1388 · 2 years ago

    Awesome work! Thanks for the talk.

  • @harishabibullah1286
    @harishabibullah1286 · 2 years ago

    Thanks for the talk, Mr. Abdelfattah. Is there any course / training to learn these stages of custom h/w kernel development for deep learning? I am also in a similar field, and my approach is simply to import the hardware from the synthesis tool, like Vitis HLS. I am interested in defining or tweaking some parameters to make more customized hardware.

    • @mabdelfattah88
      @mabdelfattah88 · 1 year ago

      My course on ML HW & SYS (www.youtube.com/@mabdelfattah88) could help give you an overview but we don't really go deep into the hardware design part of it. I am preparing a new FPGA-focused course now which should cover the detailed design of HW accelerators - I hope to also post parts of it online. Stay tuned!

    • @vatsan2483
      @vatsan2483 · 1 year ago

      @@mabdelfattah88 Looking forward to this course.. but based on the above presentation, a quick question, sir: on the topic of co-design for DNNs, you suggested that FPGA-X can achieve 100 imgs/s for ImageNet classification whereas the DLA can achieve 80 imgs/s for ResNet-50, i.e. a more generic model for a larger class rather than one specialised/tuned for a specific test case. But isn't the underlying purpose of a DNN itself rather specific than of a broader notion? Like, tuning of parameters is by nature subject to its input data, isn't it?

    • @jacoblin0820
      @jacoblin0820 · 1 year ago

      @@mabdelfattah88 Looking forward to the new course!

  • @shaikon5617
    @shaikon5617 · 2 years ago

    Great presentation. Thanks a lot for sharing. Is the Intel project publicly available ?

  • @shashwatkhandelwal367
    @shashwatkhandelwal367 · 2 years ago

    Loved the talk!👏 Some very cool ideas!