FPGAs are (not) Good at Deep Learning [Invited]

  • Published 14 Oct 2024

Comments • 17

  • @eafindme
    @eafindme 6 months ago +5

    Imagine that you have 3 binary files, each representing an FPGA bitstream for a different DNN model. Then you have one FPGA. Instead of making the hardware architecture universal so it can support all 3 DNN models, like a GPU or ASIC does, you can optimize each DNN model for the FPGA via hardware/software codesign and reprogram the FPGA on the fly, so that each of the 3 DNN models gets its own distinctive hardware optimization. Now the FPGA has the specialization of an ASIC yet costs far less in space and money. This is where the fun begins.
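
A minimal sketch of the reprogram-per-model flow this comment describes. The bitstream file names and the `reprogram()` function are hypothetical stand-ins; on a real board you would go through a vendor API (for example, PYNQ's `Overlay` loads a bitstream onto a Xilinx device):

```python
# Hypothetical per-model bitstream table: each DNN model maps to a
# bitstream whose hardware was co-designed for that specific model.
BITSTREAMS = {
    "resnet50": "resnet50_opt.bit",        # conv-heavy dataflow
    "lstm": "lstm_opt.bit",                # recurrent, memory-bound layout
    "transformer": "transformer_opt.bit",  # attention-oriented layout
}

def reprogram(bitstream_file: str) -> str:
    """Stand-in for a device-specific reconfiguration call.

    Real code would stream the bitstream to the FPGA here, e.g. via
    a vendor overlay/partial-reconfiguration API.
    """
    return f"FPGA now running {bitstream_file}"

def run_model(model_name: str) -> str:
    """Pick the model-specific bitstream and reconfigure on the fly."""
    return reprogram(BITSTREAMS[model_name])
```

Switching models is then just `run_model("lstm")` followed by `run_model("resnet50")`; the same fabric serves all three designs sequentially.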

  • @prat1024
    @prat1024 1 year ago +6

    The presentation was extraordinary!! I am a student at the University of Stuttgart as well, and this post randomly came across my feed.

  • @shaikon5617
    @shaikon5617 2 years ago +3

    Great presentation. Thanks a lot for sharing. Is the Intel project publicly available?

  • @MrTweetyhack
    @MrTweetyhack 1 year ago +9

    "If you can build it in ASIC, it won't be competitive on an FPGA" So what can't be built in ASIC? Actually, this has been know for a long long time

    • @gm7361
      @gm7361 1 year ago +4

      It means: if you have the resources and the budget.

    • @vicktorioalhakim3666
      @vicktorioalhakim3666 8 months ago +2

      The problem is that ML engineering is a dynamic discipline: models change all the time and are updated. So, if one wants to map a model to hardware efficiently with respect to power usage, resource usage, throughput, latency, etc., then the hardware must also be flexible and dynamic. If you design an ASIC-based accelerator, you kind of have to make it as general as possible to support various changes to the topology and parameters of the model. Because the architecture of this accelerator is fixed, you will often see underutilization (resource waste, higher power usage, etc.) or overutilization (lower throughput, higher latency, etc.). Now, if you have to tape out many ASICs for different types of models, this becomes costly quite quickly, and frankly a waste, since newer models will come up and quickly deprecate the design. This is where the power of FPGAs comes in handy: you can customize your HW architecture on the fly so that it suits the given model best. The biggest difficulty is coming up with a good HW "compiler", to minimize the amount of manual labor involved in mapping a model to the HW, including the pre- and post-processing stages.
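
The under-/over-utilization point can be made concrete with a toy calculation. The MAC count and layer widths below are made-up numbers, not from the talk: a fixed-width accelerator wastes units on layers narrower than its datapath and needs extra passes on wider ones.

```python
import math

FIXED_MACS = 256  # ASIC-style accelerator with a fixed datapath width

def utilization(layer_width: int) -> float:
    """Fraction of MAC units doing useful work when processing one layer.

    A layer wider than the datapath needs multiple passes; a layer that
    doesn't fill the final pass leaves MAC units idle.
    """
    passes = math.ceil(layer_width / FIXED_MACS)
    return layer_width / (passes * FIXED_MACS)

# A layer exactly matching the datapath uses 100% of the hardware;
# a 300-wide layer needs 2 passes and idles ~41% of the units.
print(utilization(256))            # 1.0
print(round(utilization(300), 3))  # 0.586
```

An FPGA lets you resize the datapath per model, keeping utilization near 1.0 for each design instead of averaging over everything the fixed ASIC must support.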

  • @enkidughom2508
    @enkidughom2508 6 months ago

    Excellent!! Is there a technical report following this? Would love to dive into the details and try to reproduce some results.

  • @shashwatkhandelwal367
    @shashwatkhandelwal367 2 years ago +3

    Loved the talk!👏
    Some very cool ideas!

  • @harishabibullah1286
    @harishabibullah1286 2 years ago +3

    Thanks for the talk, Mr. Abdelfattah. Is there any course/training to learn these stages of custom h/w kernel development for deep learning?
    I am also in a similar field, and my approach is simply to import the hardware from the synthesis tool, like Vitis HLS. I am interested in defining or tweaking some parameters to make more customized hardware.

    • @mabdelfattah88
      @mabdelfattah88 1 year ago +6

      My course on ML HW & SYS (www.youtube.com/@mabdelfattah88) could help give you an overview but we don't really go deep into the hardware design part of it. I am preparing a new FPGA-focused course now which should cover the detailed design of HW accelerators - I hope to also post parts of it online. Stay tuned!

    • @vatsan2483
      @vatsan2483 1 year ago

      @@mabdelfattah88 Looking forward to this course. But based on the above presentation, a quick question, sir: on the topic of co-design for DNNs, you suggested that FPGA-X can achieve 100 imgs/s for ImageNet classification, whereas the DLA achieves 80 imgs/s for ResNet-50; basically, a more generic design for a larger class of models versus one specialized/tuned for a specific test case. But isn't the underlying purpose of a DNN itself specific rather than broad? Tuning of parameters is by nature subject to its input data, isn't it?

    • @jacoblin0820
      @jacoblin0820 1 year ago

      @@mabdelfattah88 Looking forward to the new course!

  • @aqf0786
    @aqf0786 7 months ago +1

    If you knew the fundamental differences in area, speed, and power between an FPGA and an ASIC, why not just focus on the key architectural improvements and make an ASIC? Surely Intel would be able to do so?

  • @rulekop
    @rulekop 1 year ago

    Very interesting and clearly presented!

  • @chriswysocki8816
    @chriswysocki8816 3 months ago

    Did I hear that right, Mr. Presenter? You did this project while working at Intel? And you were not using Intel/Altera FPGAs but Xilinx. Why???? As a former Altera/Intel manager in the FPGA group, I feel disappointed :)

  • @BharatIndiaHindustan628
    @BharatIndiaHindustan628 8 months ago

    Hi Mohamed, I'm a beginner at AI and deep learning, and I have just started learning these things in order to build some deep-learning hardware applications/IPs for practice and hands-on experience. I'm really fascinated by the things AI can do in the field of health monitoring and medical diagnostics.
    I'll be really grateful and happy if you can provide your email ID. I would like to keep in touch with you for guidance and mentorship. Thanks.