Scalable Parallel Computing Lab, SPCL @ ETH Zurich



How to Leverage Dataflow To Find and Squash Program Optimization Bugs

Paper Title: FuzzyFlow: Leveraging Dataflow To Find and Squash Program Optimization Bugs
Conference: The International Conference for High Performance Computing, Networking, Storage, and Analysis 2023 - #Supercomputing '23 (#SC23), Denver, Colorado, USA
Speaker: Philipp Schaad
Authors: Philipp Schaad, Timo Schneider, Tal Ben-Nun, Alexandros Nikolaos Ziogas, Alexandru Calotoiu, Torsten Hoefler
Abstract:
The current hardware landscape and application scale are driving performance engineers towards writing bespoke optimizations. Verifying such optimizations, and generating minimal failing cases, is important for robustness in the face of changing program conditions, such as inputs and sizes. However, isolating minimal test cases from existing applications and generating new configurations are often difficult due to side effects on the system state, mostly related to dataflow. This paper introduces FuzzyFlow: a fault localization and test case extraction framework designed to test program optimizations. We leverage dataflow program representations to capture a fully reproducible system state and area-of-effect for optimizations to enable fast checking for semantic equivalence. To reduce testing time, we design an algorithm for minimizing test inputs, trading off memory for recomputation. We demonstrate FuzzyFlow on example use cases in real-world applications where the approach provides up to 528 times faster optimization testing and debugging compared to traditional approaches.
Learn more:
Conference paper: dl.acm.org/doi/abs/10.1145/3581784.3613214
FuzzyFlow GitHub Repository: github.com/phschaad/fuzzyflow
Timestamps:
00:00 Introduction
02:20 Formalizing Optimizations
05:00 Automated Optimization Testing
08:40 Speeding Up Differential Fuzzing
09:59 Program Cutouts
14:05 Side Effect Analysis Using Parametric Dataflow
18:57 Reducing Input Configurations
22:46 Constraining Inputs
24:43 Coverage Guided Fuzzing
25:44 FuzzyFlow
27:11 Evaluation
30:32 Conclusion
#Fuzzing #Testing #Correctness #HPC #Optimization
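The abstract describes checking an optimized program against the original for semantic equivalence and extracting minimal failing cases. The Python sketch below illustrates only that general idea (differential fuzzing followed by naive input shrinking). It is not FuzzyFlow's algorithm, which works on dataflow program representations. The functions `original`, `optimized`, `differential_fuzz`, and `shrink` are hypothetical names made up for this illustration, and the bug in `optimized` is planted on purpose.

```python
import random


def original(xs):
    """Reference implementation: sum of squares with a plain loop."""
    total = 0
    for x in xs:
        total += x * x
    return total


def optimized(xs):
    """Hypothetical 'unrolled by 2' version with a planted bug:
    the last element is dropped when the input length is odd."""
    total = 0
    for i in range(0, len(xs) - 1, 2):
        total += xs[i] * xs[i] + xs[i + 1] * xs[i + 1]
    return total


def differential_fuzz(ref, opt, trials=200, max_len=8, seed=0):
    """Run both versions on random inputs; return the first input
    on which they disagree, or None if no mismatch was found."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-5, 5) for _ in range(rng.randint(0, max_len))]
        if ref(xs) != opt(xs):
            return xs
    return None


def shrink(ref, opt, xs):
    """Naive test-case reduction: repeatedly delete contiguous chunks
    (largest first) as long as the two versions still disagree."""
    xs = list(xs)
    size = len(xs)
    while size >= 1:
        i = 0
        removed = False
        while i + size <= len(xs):
            candidate = xs[:i] + xs[i + size:]
            if ref(candidate) != opt(candidate):
                xs = candidate      # still failing: keep the smaller input
                removed = True
            else:
                i += 1              # chunk is needed: try the next position
        if not removed:
            size -= 1               # nothing removable at this chunk size
    return xs


if __name__ == "__main__":
    failing = differential_fuzz(original, optimized)
    print("failing input:", failing)
    print("reduced input:", shrink(original, optimized, failing))
```

Comparing outputs on random inputs can only show that a mismatch exists, never prove equivalence, and the reduction step here simply re-runs both versions on every candidate.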

Views: 86

Videos

[SPCL_Bcast #48] Improving Cloud Security with Hardware Memory Capabilities

1:00:32

Views: 57 · 14 days ago

Speaker: Peter Pietzuch Venue: SPCL_Bcast #48, recorded on 18th April, 2024 Abstract: More and more data-intensive applications, e.g., micro-service architectures and machine learning workloads, move from on-premise deployments to the cloud. Traditional cloud security mechanisms focus on strict isolation, but applications also require the efficient yet secure sharing of data between components ...

[SPCL_Bcast #49] Programming Groq LPUs without IEEE Floating Point

59:28

Views: 103 · 21 days ago

Speaker: Oskar Mencer Venue: SPCL_Bcast #49, recorded on 2nd May, 2024 Abstract: The IEEE standard has been a great advance in the early days of software. In these early days, the speed of software development was imperative. The Intel x86 instruction set became a standard as well as IEEE Floating point. Today, we have the first commodity computing application, the LLM, and others are rapidly f...

[SPCL_Bcast #50] Hardware-aware Algorithms for Language Modeling

1:05:15

Views: 226 · a month ago

Speaker: Tri Dao Venue: SPCL_Bcast #50, recorded on 17th October, 2024 Abstract: Transformers are slow and memory-hungry on long sequences, since the time and memory complexity of self-attention are quadratic in sequence length. In the first half, we describe recent advances in FlashAttention, including optimizations for Hopper GPUs, exploiting asynchrony of the Tensor Cores and TMA to (1) over...

How to find Relevant Items using Approximate Nearest Neighbor Search

22:08

Views: 162 · a month ago

We motivate the problem of nearest neighbor search, and we discuss exact and approximate algorithms to solve this problem.
Timestamps:
00:00 Introduction
00:14 Motivation
03:01 KD-Tree
08:06 HNSW
13:05 IVF-PQ
20:10 Comparison
21:09 Conclusion
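To make the exact-versus-approximate distinction in this description concrete, here is a minimal Python sketch. It is not taken from the video: `exact_knn` is a brute-force baseline, and `build_cells` / `approx_knn` show only the coarse partitioning idea behind inverted-file (IVF) indexes, leaving out product quantization (the "PQ" in IVF-PQ) entirely. All function names and the `nprobe` parameter are assumptions chosen for illustration.

```python
import math
import random


def exact_knn(points, query, k):
    """Exact search: rank every point by its distance to the query."""
    order = sorted(range(len(points)),
                   key=lambda i: math.dist(points[i], query))
    return order[:k]


def build_cells(points, centroids):
    """Partition the points: each index goes to its closest centroid."""
    cells = {c: [] for c in range(len(centroids))}
    for i, p in enumerate(points):
        closest = min(range(len(centroids)),
                      key=lambda j: math.dist(p, centroids[j]))
        cells[closest].append(i)
    return cells


def approx_knn(points, centroids, cells, query, k, nprobe=1):
    """Approximate search: scan only the nprobe cells whose centroids
    are closest to the query, so true neighbors may be missed."""
    cell_order = sorted(range(len(centroids)),
                        key=lambda j: math.dist(query, centroids[j]))
    candidates = [i for c in cell_order[:nprobe] for i in cells[c]]
    candidates.sort(key=lambda i: math.dist(points[i], query))
    return candidates[:k]


if __name__ == "__main__":
    rng = random.Random(0)
    points = [(rng.random(), rng.random()) for _ in range(1000)]
    centroids = rng.sample(points, 16)   # stand-in for trained centroids
    cells = build_cells(points, centroids)
    query = (0.5, 0.5)
    print("exact: ", exact_knn(points, query, 5))
    print("approx:", approx_knn(points, centroids, cells, query, 5, nprobe=2))
```

Raising `nprobe` scans more cells and moves the result closer to the exact answer at the cost of more distance computations. Probing every cell scans all points and returns the same neighbors as the brute-force search.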

Exascale Cloud Computing - A Foggy Tale of Networks, AI, Containers, and Ultra Ethernet

32:44

Views: 383 · a month ago

Torsten Hoefler's talk presented at the Salishan 2024 meeting featuring Acceleration as a Service (XaaS), Datacenter and HPC network convergence, performance studies of networking across many datacenter providers, network noise analyses, latency sensitivity, and Ultra Ethernet news.

Swing: Short-cutting Rings for Higher Bandwidth Allreduce

25:52

Views: 158 · 5 months ago

Paper Title: Swing: Short-cutting Rings for Higher Bandwidth Allreduce Conference: NSDI 2024 Speaker: Daniele De Sensi Authors: Daniele De Sensi, Tommaso Bonato, David Saam, Torsten Hoefler Abstract: The allreduce collective operation accounts for a significant fraction of the runtime of workloads running on distributed systems. One factor determining its performance is the distance between com...

14:33

Neural Graph Databases

Views: 135 · 5 months ago

Paper Title: Neural Graph Databases Conference: First Learning on Graphs Conference (LoG'22) Speaker: Maciej Besta Authors: Maciej Besta, Patrick Iff, Florian Scheidl, Kazuki Osawa, Nikoli Dryden, Michal Podstawski, Tiancheng Chen, Torsten Hoefler Abstract: Graph databases (GDBs) enable processing and analysis of unstructured, complex, rich, and usually vast graph datasets. Despite the large si...

HOT - Higher-Order Dynamic Graph Representation Learning with Efficient Transformers

20:16

Views: 128 · 5 months ago

Paper Title: HOT - Higher-Order Dynamic Graph Representation Learning with Efficient Transformers Conference: Second Learning on Graphs Conference (LoG'23) Speaker: Maciej Besta Authors: Maciej Besta, Afonso Claudino Catarino, Lukas Gianinazzi, Nils Blach, Piotr Nyczyk, Hubert Niewiadomski, Torsten Hoefler Abstract: Many graph representation learning (GRL) problems are dynamic, with millions of...

LRSCwait: Enabling Scalable and Efficient Synchronization in Manycore Systems

14:02

Views: 103 · 5 months ago

Paper Title: LRSCwait: Enabling Scalable and Efficient Synchronization in Manycore Systems through Polling-Free and Retry-Free Operation Conference: Design, Automation and Test in Europe Conference (DATE 2024) Speaker: Samuel Riedel Authors: Samuel Riedel, Marc Gantenbein, Alessandro Ottaviano, Torsten Hoefler, Luca Benini Abstract: Extensive polling in shared-memory manycore systems can lead t...

Compressing Multidimensional Weather and Climate Data Into Neural Networks

15:50

Views: 143 · 6 months ago

Title: Compressing multidimensional weather and climate data into neural networks Speaker: Langwen Huang Authors: Langwen Huang, Torsten Hoefler Abstract: Weather and climate simulations produce petabytes of high-resolution data that are later analyzed by researchers in order to understand climate change or severe weather. We propose a new method of compressing this multidimensional weather and ...

VENOM: A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores

25:05

Views: 612 · 6 months ago

Paper Title: VENOM: A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores Venue: International Conference for High Performance Computing, Networking, Storage, and Analysis (#SC23) Speaker: Roberto L. Castro Authors: Roberto L. Castro, Andrei Ivanov, Diego Andrade, Tal Ben-Nun, Basilio B. Fraguela, Torsten Hoefler Abstract: The increasing success and scaling of Deep Learning mo...

Motif Prediction with Graph Neural Networks

22:41

Views: 341 · 7 months ago

Paper Title: Motif Prediction with Graph Neural Networks Conference: 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'22) Speaker: Maciej Besta Authors: Maciej Besta, Raphael Grob, Cesare Miglioli, Nicola Bernold, Grzegorz Kwaśniewski, Gabriel Gjini, Raghavendra Kanakagiri, Saleh Ashkboos, Lukas Gianinazzi, Nikoli Dryden, Torsten Hoefler Abstract: Link prediction is one of...

Demystifying Chains, Trees, and Graphs of Thoughts

6:18

Views: 285 · 7 months ago

Paper Title: Demystifying Chains, Trees, and Graphs of Thoughts Speaker: Maciej Besta Authors: Maciej Besta, Florim Memedi, Zhenyu Zhang, Robert Gerstenberger, Guangyuan Piao, Nils Blach, Piotr Nyczyk, Marcin Copik, Grzegorz Kwaśniewski, Jürgen Müller, Lukas Gianinazzi, Ales Kubicek, Hubert Niewiadomski, Aidan O'Mahony, Onur Mutlu, Torsten Hoefler Abstract: The field of natural language process...

[SPCL_Bcast #47] The digital revolution of Earth system modelling

1:12:22

Views: 175 · 7 months ago

Speaker: Peter Dueben Venue: SPCL_Bcast #47, recorded on 4th April, 2024 Abstract: This talk outlines three revolutions that happened in Earth system modelling in the past decades. The quiet revolution has leveraged better observations and more compute power to allow for constant improvements of prediction quality of the last decades, the digital revolution has enabled us to perform km-scale si...

[SPCL_Bcast #46] Capturing Computation with Algorithmic Alignment

1:01:01

Views: 193 · 7 months ago

Co-design Hardware and Algorithm for Vector Search

24:58

Views: 287 · 7 months ago

7:10

Demystifying Graph Databases

Views: 139 · 8 months ago

21:28

Fortran is dead - Long live Fortran!

Views: 1.8K · 8 months ago

Hot Interconnects - EtherNET: the present and future of datacenter and supercomputers

25:03

Views: 359 · 9 months ago

[SPCL_Bcast] Can I Cook a 5 o'clock Compiler Cake and Eat It at 2?

1:02:20

Views: 222 · 10 months ago

44:49

AI-Driven Performance Metaprogramming

Views: 547 · 10 months ago

HammingMesh: A Network Topology for Large-Scale Deep Learning

33:59

Views: 702 · 10 months ago

GDI: Scaling Online Transactional and Analytical Graph Workloads to Hundreds of Thousands of Cores

24:42

Views: 122 · a year ago

[SPCL_Bcast] Scalable Graph Machine Learning

59:51

Views: 173 · a year ago

[SPCL_Bcast] Heterogeneous multi-core systems for efficient EdgeML

1:12:46

Views: 324 · a year ago

[SPCL_Bcast] Evaluating Large-Scale Learning Systems

58:43

Views: 249 · a year ago

ML for High-Performance Climate: Data Post Processing, Compression, and Earth Virtualization Engines

1:08:09

Views: 522 · a year ago

HexaMesh: Scaling to Hundreds of Chiplets with an Optimized Chiplet Arrangement

14:05

Views: 453 · a year ago

How to Adjust Network-on-Chip Topologies to Design Goals and Architectures

13:13

Views: 1K · a year ago

Comments

@joeyng9754 4 days ago
Thank you very much.
@wenguang6597 10 days ago
at 15:34, the speaker said that Intel has fixed floating-points. but i cant find any information about that? Principally there is either fixed-point or floating-point. What is fixed floating point? Can someone help me?
@patrickkearney1577 3 months ago
First two not verbatim quotes, real programmers can write FORTRAN code in any language and computer scientists solve yesterday's problems with tomorrow's hardware. i am as old as FORTRAN and have coded in FORTRAN, C, APL ALGOL, Forth, LISP, Basic, Pascal, Mathematica, MATLAB, various scripting languages ,... even run scientific computation on laser printers overnight using POSTSCRIPT. Also I have written real time operating systems in C and assembly language. I firmly believe that the design philosophy of modern computer languages became decoupled from consideration of current and future hardware capabilities and the more general resources available prior and during execution of a program. Also operating system support for client processes is mostly very poor. For example, parallel code execution was not possible on early computers. Modern vector processors, FPGAs or multi core CPUs can handle concurrent parallel computation but efficient formalized code design is sorely lacking.
@mikgigs 3 months ago
everything is nice, super-duper, but show a software example that uses AIE for..AI, a code that starts from c/c++ level....no FIR filter, no RGB conversion, but really something related to AI...show at least something!
@jameschums 4 months ago
thank you for introducing some concepts i had not really considered before. will future compiler mix languages and optimise codes to use different hardware for operations. great talk,thank you, now i am thinking AI tools could optimise fortran, c/cuda, python... ?
@pichulinojitoojete7387 2 months ago
That said, why not write an artificial intelligence code in Fortran? A good challenge to pass the time.
@abhinavghosh725 5 months ago
is this planned to be released in a general purpose release/integration with current kafka versions. Is this usable for production use-cases or is it still under some testing?
@maryamsamami6974 6 months ago
Dear Mr. Hoefler! Thanks for offering the useful video. May I ask if you could please share the slide of the video with me?
@ChrisPollitt 6 months ago
TIOBE Index for May 2024: Fortran in the top 10
@Machineman2500 7 months ago
Fortran is still used today, particularly in scientific, engineering, and high-performance computing applications where numerical computation and performance are critical. While newer languages like Python and Julia have gained popularity for general-purpose programming and rapid prototyping, Fortran remains widely used in fields such as computational physics, climate modeling, computational chemistry, and finite element analysis
@FindecanorNotGmail 7 months ago
Correction: 33.45. When he says, "4 ecks speed up", he means "four _times_ "
@simonpeter9617 7 months ago
good work
@ΜιχαήλΣάπκας 8 months ago
i really doubt you can run yolo on versal :P
@kamertonaudiophileplayer847 8 months ago
My friend claimed that he can program in Fortran everything. How it is true! I also converted many original weather model calculations from Algol to Fortran. They work great.
@superkaran20 9 months ago
Thank you so much, it was very helpful.
@vinayakkesharwani7769 9 months ago
great explaination, this video saved my hours of spending in understanding it from docs.
@backToFreedom 9 months ago
the sound is terrible! Fix it if you want to listened
@HarishNarayanan 9 months ago
Thank you very much for this talk, and especially for providing context for where it sits in the field.
@bhamadicharef a year ago
Excellent presentation ... the AI Engine (AIE) looks great !
@mar-xpro a year ago
Nice talk! Super interesting to see this DL-HPC double perspective. Btw the email address of the website mentioned at 57:55 for hiring seems unreacheable. The emails sent are bounced back after 2-3 days without being delivered.
@sanaulislam2354 a year ago
Which software did u used for designing
@spcl a year ago
All results are obtained using our custom NoC cost and performance prediction toolchain (see spclgitlab.ethz.ch/iffp1/sparse_hamming_graph_public ) - Does this answer your question?
@infinite-saaswath a year ago
Great stuff!
@Reskareth a year ago
But what happens when one node has multiple connected nodes which have a lower ID. Then one node would need to point to multiple other nodes. What am I missing?
@serpantleo8490 a year ago
Very good paper, love from 🇨🇳
@kipropcollins4220 a year ago
wasn't there an interesting way to deliver this? i mean, seriously?
@bobl557 a year ago
Most of the paths in a large system are 4 hops long, not three. In the 545 group example he uses, there is only one link between each global group. So, the first two hops get you to the correct global bus. The next two hops get you to the terminal switch. The largest system that would have a three hop maximum is 9,216 comprising 18 groups.
@danieledesensi5532 a year ago
Hops are counted as switch-to-switch hops. Switches within a group are fully connected, thus you need in the worst case one hop to reach another switch in the source group, one hop to reach the destination group, and one hop in the destination group to reach the destination switch.
@congchuatocmay4837 a year ago
Well, yeh, you are really not interested DDL.
@howwway4999 2 years ago
That's really cool, hope I can get an opportunity for the possible PhD postion in your lab😃
@mprone 2 years ago
Is there any open PhD position at ETH on these topics ?
@spcl 2 years ago
Yes, see spcl.inf.ethz.ch/Jobs/
@elliot2456 2 years ago
@@spcl is that a PhD position or a job for phd students ? why does it say "Contracts will be 12-month renewable, with an initial probatory period" ? I thought PhD were supposed to last at least 3 years.
@Qmpi 2 years ago
and where am I?
@wassimmuna 2 years ago
Society 5.0 ... Is that where we finally get a for-loop to iterate through the entire population to serve every inhabitant's needs and desires, instead of passing policies in a top-down approach and wondering why there are still dissatisfied people left behind somehow... or is this going to be another instance of promising technology that only entrenches preexisting distributions of security, opportunities, comforts and luxuries. And for the record, karma is illegal vigilantism. Most promising technology starts with idealistic intentions and ends up being misused to dish out varying degrees of harm. Pardon me if I don't understand why my for-loop hasn't already been implemented on a 486. Maybe that'll be Society 6.0. But obviously, great work by the researchers. Let's just hope the decision-makers live up to the same standard of effort and quality of intent.
@shikharjain3536 2 years ago
What is the difference between a program dependence graph[by Ferrante & ottenstein] and contextual flow graph?
@prithvivelicheti287 2 years ago
Insightful
@zeyuli3258 2 years ago
Could you please upload your source code again?It seems to be 404 now:(
@spcl 2 years ago
The code has been released few minutes after your message. Please, check again. Thanks!
@vedanshverma6854 2 years ago
The best tutorial to get idea of how cool actual programming is for hpc using hls fpga
@kowsalyas5259 2 years ago
Y f
@alle9ro 2 years ago
she is amazing
@wolfgangmitterbaur3942 2 years ago
Good day Mr. Hoefler, a very good and extensive overview of this huge topic. Thanks a lot.
@qwmp 2 years ago
This is just truly a great gem!
@sanjeewaweerage9407 2 years ago
can I have this ppt?
@paulthompson9668 2 years ago
This is very informative content, but you need to slow down because you end up mispronouncing words at times.
@oscarsandoval9870 2 years ago
Excellent review of the state of the art, well explained and concise, thank you Torsten!
@hitmanonstadia1784 2 years ago
Nice slides! However the speaker speaks too fast like a rapper, leaves me with painful headaches after the talk. : ((((
@byliu5200 2 years ago
Very helpful! Thank you!
@alexxx4434 3 years ago
Very nice presentation
@SandipJadhavcctech 3 years ago
Thanks a ton. Very helpful 👌
@ayushchaturvedi5203 3 years ago
where can i find the slides of this presentation
@hossamfadeel 3 years ago
Thanks for your efforts.
@shihlien 3 years ago
GPT-2 model memory will saturate one of the WSC SRAM, right?
@spcl 3 years ago
At 1:50 Prof. Hoefler says we will not use linearizability in the lecture. To clarify: We do not use linearizability in this lecture, but we will introduce linearizability in a later lecture.
