One world theoretical machine learning
Joined Jun 3, 2020
This is the YouTube channel of the One World Seminar Series on the Mathematics of Machine Learning. Past seminars will be made available here.
Molei Tao - Optimization, Sampling, and Generative Modeling in Non-Euclidean Spaces
Machine learning in non-Euclidean spaces has been rapidly attracting attention in recent years, and this talk will give some examples of progress on its mathematical and algorithmic foundations. A sequence of developments that eventually leads to non-Euclidean generative modeling will be reported.
More precisely, I will begin with variational optimization, which, together with delicate interplays between continuous- and discrete-time dynamics, enables the construction of momentum-accelerated algorithms that optimize functions defined on manifolds. Selected applications, namely a generic improvement of Transformers and a low-dimensional approximation of high-dimensional optimal transport distances, will be described. Then I will turn the optimization dynamics into an algorithm that samples from probability distributions on Lie groups. This sampler provably converges, even without a log-concavity condition or its common relaxations. Finally, I will describe how this sampler leads to a structurally pleasant diffusion generative model that allows users, given training data that follow any latent statistical distribution on a Lie group, to generate more data exactly on the same manifold and following the same distribution. If time permits, applications such as molecule design and generative innovation of quantum processes will be briefly discussed.
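For a concrete picture of "momentum-accelerated algorithms that optimize functions defined on manifolds", here is a minimal generic sketch (not the speaker's method): heavy-ball gradient descent on the unit sphere for the Rayleigh quotient x^T A x, using tangent-space projection and renormalization as the retraction. The matrix A, step size, and momentum coefficient are illustrative.

```python
import numpy as np

# Minimal sketch: heavy-ball (momentum) gradient descent on the unit sphere,
# minimizing the Rayleigh quotient f(x) = x^T A x. Generic retraction-based
# scheme for illustration only, not the talk's algorithm.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
A = (A + A.T) / 2                      # symmetric test matrix

def grad(x):
    g = 2 * A @ x                      # Euclidean gradient of x^T A x
    return g - (g @ x) * x             # project onto the tangent space at x

x = rng.standard_normal(5)
x /= np.linalg.norm(x)                 # start on the sphere
v = np.zeros_like(x)
lr, beta = 0.05, 0.9                   # step size and momentum (illustrative)

for _ in range(500):
    v = beta * v + grad(x)             # momentum in the tangent space
    x = x - lr * v
    x /= np.linalg.norm(x)             # retraction: project back to the sphere
    v = v - (v @ x) * x                # keep the momentum tangent to the sphere

print("f(x) =", x @ A @ x, "  min eigenvalue =", np.linalg.eigvalsh(A)[0])
```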
Views: 302
Videos
Difan Zou - Implicit Bias of Learning Separable Data
165 views · 1 month ago
Full Title: Implicit Bias of Learning Separable Data: the Role of Optimization Algorithm and Model Architecture Abstract: This presentation seeks to study the implicit biases inherent in learning separable data by investigating the roles of optimization algorithms and model architectures. First, I will discuss the implicit bias of batch normalization when trained by gradient descent in linear mod...
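As generic background for the "implicit bias on separable data" theme (not the talk's batch-normalization setting), here is a toy sketch: gradient descent on the logistic loss over linearly separable data. The weight norm keeps growing, while the direction w/||w|| stabilizes. Data, step size, and iteration count are illustrative.

```python
import numpy as np

# Toy illustration of implicit bias on separable data: the norm of w diverges,
# but the direction w/||w|| converges.
rng = np.random.default_rng(1)
n = 200
X = np.vstack([rng.standard_normal((n, 2)) + [4.0, 0.0],
               rng.standard_normal((n, 2)) - [4.0, 0.0]])
y = np.concatenate([np.ones(n), -np.ones(n)])    # linearly separable labels

w = np.zeros(2)
lr = 0.5
for t in range(1, 20001):
    m = y * (X @ w)                              # margins
    p = np.exp(-np.logaddexp(0.0, m))            # sigmoid(-m), numerically stable
    w -= lr * (-(y * p) @ X) / len(y)            # gradient of the mean logistic loss
    if t % 5000 == 0:
        print(f"t={t}  ||w||={np.linalg.norm(w):.2f}  "
              f"direction={np.round(w / np.linalg.norm(w), 3)}")
```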
Yasamin Jalalian: Data-Efficient Kernel Methods for PDE Discovery
272 views · 1 month ago
Title: Data-Efficient Kernel Methods for PDE Discovery Abstract: For many problems in computational science and engineering, observational data exist for which the underlying physical models are not known. PDE discovery methods provide systematic ways to infer these physical models directly from data. We introduce a framework for identifying and solving PDEs using kernel methods. In particular,...
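To make "inferring physical models directly from data" concrete, here is a generic finite-difference/least-squares sketch (not the talk's kernel framework): it recovers the diffusion coefficient of a heat equation from gridded values of an exact solution. The candidate library, grid, and coefficient are illustrative.

```python
import numpy as np

# Generic PDE-discovery sketch: recover nu in u_t = nu * u_xx from data.
nu_true = 0.7
x = np.linspace(0.0, np.pi, 101)
t = np.linspace(0.0, 0.5, 201)
X, T = np.meshgrid(x, t, indexing="ij")
U = (np.exp(-nu_true * T) * np.sin(X)
     + 0.5 * np.exp(-4 * nu_true * T) * np.sin(2 * X))   # exact two-mode solution

dx, dt = x[1] - x[0], t[1] - t[0]
u_t = np.gradient(U, dt, axis=1)
u_x = np.gradient(U, dx, axis=0)
u_xx = np.gradient(u_x, dx, axis=0)

# Candidate library [u, u_x, u_xx]; fit u_t as a linear combination.
A = np.stack([U, u_x, u_xx], axis=-1).reshape(-1, 3)
b = u_t.reshape(-1)
coef, *_ = np.linalg.lstsq(A, b, rcond=None)
print("coefficients of [u, u_x, u_xx]:", np.round(coef, 3))   # ~[0, 0, 0.7]
```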
Hongru Yang - Understanding Transformers via Gradient Flow Dynamics
301 views · 2 months ago
Abstract: Understanding the training dynamics of transformers is important to explain the impressive capabilities behind large language models. In this talk, I will present our recent work on the training dynamics of a shallow transformer on a task of recognizing co-occurrence of two designated words. In the literature of studying training dynamics of transformers, several simplifications are c...
Elena Celledoni - Shape analysis, structure preservation and deep learning
106 views · 2 months ago
Abstract: Shape analysis is a framework for treating complex data and obtaining metrics on spaces of data. Examples are spaces of unparametrized curves, time signals, surfaces and images. In this talk we discuss structure preservation and deep learning for classifying, analysing and manipulating shapes. A computationally demanding task for estimating distances between shapes, e.g. in object recogn...
Sho Sonoda - Deep Ridgelet Transform: Harmonic Analysis for Deep Neural Network
168 views · 2 months ago
Abstract: The ridgelet transform has been developed to study neural network parameters, and it can describe the distribution of parameters. Mathematically, it is defined as a pseudo-inverse operator of neural networks. Namely, given a function $f$, and network $NN[\gamma]$ with parameter $\gamma$, the ridgelet transform $R[f]$ for the network $NN$ satisfies the reconstruction formula $NN[R[f]]=...
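For context, the classical shallow-case formulas in standard integral-representation notation (not necessarily the deep extension discussed in the talk) are:

```latex
% Classical (shallow) ridgelet transform, for context:
\begin{align*}
  NN[\gamma](x) &= \int_{\mathbb{R}^m\times\mathbb{R}} \gamma(a,b)\,\sigma(a\cdot x - b)\,\mathrm{d}a\,\mathrm{d}b,\\
  R[f;\rho](a,b) &= \int_{\mathbb{R}^m} f(x)\,\overline{\rho(a\cdot x - b)}\,\mathrm{d}x,\\
  NN[R[f;\rho]] &= c_{\sigma,\rho}\, f \quad\text{(reconstruction, up to an admissibility constant).}
\end{align*}
```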
Daniel Lengyel: An Uphill Battle: Exploring Data Optimality Conditions in Gradient Estimation
48 views · 3 months ago
Date: 30 October 2024 Speaker: Daniel Lengyel Title: An Uphill Battle: Exploring Data Optimality Conditions in Gradient Estimation Abstract: In this talk, Daniel will present part of his PhD work on gradient estimation methods, specifically characterizing the optimal location of function evaluations (referred to as the sample set) for noisy black-box functions. This work is motivated by the obser...
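A minimal sketch of the kind of estimator being analyzed, assuming a least-squares (simplex-gradient style) fit over a sample set of noisy evaluations; it does not address the optimality questions of the talk, and the test function, noise level, and sample set are illustrative.

```python
import numpy as np

# Least-squares gradient estimate from noisy black-box evaluations at a
# sample set {x0 + d_i}. Illustration only.
rng = np.random.default_rng(2)

def f(x, noise=1e-3):
    return np.sin(x[0]) + x[1] ** 2 + noise * rng.standard_normal()

x0 = np.array([0.3, -0.5])
h = 0.05
D = h * rng.standard_normal((8, 2))              # sample-set directions around x0
df = np.array([f(x0 + d) - f(x0) for d in D])
g_hat, *_ = np.linalg.lstsq(D, df, rcond=None)   # solve D g ≈ f(x0 + d) - f(x0)

print("estimate:", np.round(g_hat, 3), "  true:", np.round([np.cos(0.3), -1.0], 3))
```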
Zhong Li - On the Generalization Properties of Diffusion Models
198 views · 3 months ago
Abstract: Diffusion models are a class of generative models that serve to establish a stochastic transport map between an empirically observed, yet unknown, target distribution and a known prior. Despite their remarkable success in real-world applications, a theoretical understanding of their generalization capabilities remains underdeveloped. This work embarks on a comprehensive theoretical ex...
Yaoyu Zhang - Optimistic Sample Size Estimate for Deep Neural Networks
185 views · 3 months ago
Abstract: Estimating the sample size required for a deep neural network (DNN) to accurately fit a target function is a crucial issue in deep learning. In this talk, we introduce a novel sample size estimation method inspired by the phenomenon of condensation, which we term the "optimistic estimate." This method quantitatively characterizes the best possible performance achievable by neural netw...
Jan Gerken - Emergent Equivariance in Deep Ensembles
195 views · 6 months ago
Abstract: We demonstrate that a generic deep ensemble is emergently equivariant under data augmentation in the large width limit. Specifically, the ensemble is equivariant at any training step, provided that data augmentation is used. Crucially, this equivariance also holds off-manifold and therefore goes beyond the intuition that data augmentation leads to approximately equivariant predictions...
Hao Ni - Path development network for sequential data analysis
121 views · 6 months ago
Abstract: The path signature, a mathematically principled and universal feature of sequential data, leads to a performance boost of deep learning-based models in various sequential data tasks as a complementary feature. However, it suffers from the curse of dimensionality when the path dimension is high. To tackle this problem, we propose a novel, trainable path development layer, which exploit...
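For readers unfamiliar with signatures, here is a small sketch of the first two signature levels of a discretized 2D path via Riemann-sum approximations of the iterated integrals. This is the classical signature feature, not the trainable path development layer proposed in the talk.

```python
import numpy as np

# First two signature levels of a discretized 2-d path (Riemann-sum approximation).
t = np.linspace(0, 1, 500)
path = np.stack([np.cos(2 * np.pi * t), np.sin(2 * np.pi * t)], axis=1)  # a closed loop

inc = np.diff(path, axis=0)                     # increments dX_t
S1 = inc.sum(axis=0)                            # level 1: total increment
# level 2: S2[i, j] ≈ ∫ (X^i_t - X^i_0) dX^j_t
centered = path[:-1] - path[0]
S2 = centered.T @ inc

print("level 1:", np.round(S1, 3))              # ~0 for a closed loop
print("level 2:\n", np.round(S2, 3))            # antisymmetric part = Lévy area
```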
Simon Du - How Over-Parameterization Slows Down Gradient Descent
402 views · 7 months ago
Abstract: We investigate how over-parameterization impacts the convergence behaviors of gradient descent through two examples. In the context of learning a single ReLU neuron, we prove that the convergence rate shifts from $\exp(-T)$ in the exact-parameterization scenario to an exponentially slower $1/T^3$ rate in the over-parameterized setting. In the canonical matrix sensing problem, specifica...
Micah Goldblum - Bridging the Gap between Deep Learning Theory and Practice
690 views · 7 months ago
Abstract: Despite the widespread proliferation of neural networks, the mechanisms through which they operate so successfully are not well understood. In this talk, we will first explore empirical and theoretical investigations into neural network training and generalization and what they can tell us about why deep learning works. Then, we will examine a recent line of work on algorithm learning...
Lei Wu - Understanding the implicit bias of SGD: A dynamical stability perspective
386 views · 7 months ago
Abstract: In deep learning, models are often over-parameterized, which leads to concerns about algorithms picking solutions that generalize poorly. Fortunately, stochastic gradient descent (SGD) always converges to solutions that generalize well even without needing any explicit regularization, suggesting certain “implicit regularization” at work. This talk will provide an explanation of this s...
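For context, the classical linear-stability heuristic for plain gradient descent around a minimizer, which the dynamical-stability viewpoint refines for SGD (notation is generic, not taken from the talk):

```latex
% Linearized gradient descent with step size \eta near a minimizer \theta^*
% with Hessian H; the talk refines this picture for SGD and relates it to flatness:
\[
  \theta_{t+1} - \theta^* \approx (I - \eta H)(\theta_t - \theta^*),
  \qquad \text{stable} \iff \eta\,\lambda_{\max}(H) \le 2 .
\]
```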
Bobak Kiani: On the hardness of learning under symmetries
229 views · 8 months ago
Speaker: Bobak Kiani Date: 29 May 2024 Title: On the hardness of learning under symmetries Abstract: We study the problem of learning equivariant neural networks via gradient descent. The incorporation of known symmetries ("equivariance") into neural nets has empirically improved the performance of learning pipelines, in domains ranging from biology to computer vision. However, a rich yet separ...
Hemant Tyagi - Dynamic ranking and translation synchronization
82 views · 8 months ago
Hemant Tyagi - Dynamic ranking and translation synchronization
Nicolas Boulle - Elliptic PDE learning is provably data-efficient
229 views · 9 months ago
Nicolas Boulle - Elliptic PDE learning is provably data-efficient
Yuan Cao - Understanding Deep Learning Through Phenomena Discovery and Explanation
912 views · 9 months ago
Yuan Cao - Understanding Deep Learning Through Phenomena Discovery and Explanation
Mufan Li - Infinite-Depth Neural Networks as Depthwise Stochastic Processes
1.1K views · 9 months ago
Mufan Li - Infinite-Depth Neural Networks as Depthwise Stochastic Processes
Aditya Varre - On the spectral bias of two-layer linear networks
264 views · 10 months ago
Aditya Varre - On the spectral bias of two-layer linear networks
Shuyang Ling - Neural collapse phenomenon for unconstrained feature model with imbalanced datasets
228 views · 10 months ago
Shuyang Ling - Neural collapse phenomenon for unconstrained feature model with imbalanced datasets
Marius Zeinhofer - Error Analysis and Optimization Methods for Scientific Machine Learning
327 views · 11 months ago
Marius Zeinhofer - Error Analysis and Optimization Methods for Scientific Machine Learning
Keaton Hamm - Manifold Learning in Wasserstein Space
296 views · 11 months ago
Keaton Hamm - Manifold Learning in Wasserstein Space
Tan Nguyen - Transformers Meet Image Denoising: Mitigating Over-smoothing in Transformers
358 views · 11 months ago
Tan Nguyen - Transformers Meet Image Denoising: Mitigating Over-smoothing in Transformers
Lisa Kreusser - Unveiling the role of the Wasserstein Distance in Generative Modelling
370 views · 11 months ago
Lisa Kreusser - Unveiling the role of the Wasserstein Distance in Generative Modelling
Ting Lin - Universal Approximation and Expressive Power of Deep Neural Networks
287 views · 1 year ago
Ting Lin - Universal Approximation and Expressive Power of Deep Neural Networks
Theo Bourdais - Computational Hypergraph Discovery, a Gaussian Process framework
142 views · 1 year ago
Theo Bourdais - Computational Hypergraph Discovery, a Gaussian Process framework
Sebastian Goldt - Gaussian world is not enough: Analysing neural nets beyond Gaussian models of data
270 views · 1 year ago
Sebastian Goldt - Gaussian world is not enough: Analysing neural nets beyond Gaussian models of data
Tan Nguyen - Transformers Meet Image Denoising: Mitigating Over-smoothing in Transformers
177 views · 1 year ago
Tan Nguyen - Transformers Meet Image Denoising: Mitigating Over-smoothing in Transformers
Jakwang Kim - Understanding adversarial robustness via optimal transport perspective
209 views · 1 year ago
Jakwang Kim - Understanding adversarial robustness via optimal transport perspective
Why hasn't the Dec. 4th Molei Tao talk been uploaded to this channel? I'm looking for it.
Great work! Quite interested in...
Very nice!
good job keep it up
Very interesting talk! Seems like a surprisingly easy shaping method you found. That should be huge! This is perhaps only indirectly related, but: I wonder if we could somehow go from a "discrete" layer-for-layer evaluation to one that's akin to various Monte Carlo techniques which just sample from the continuum solution. Not sure how you'd take care of or replace the discrete parameters of an NN in such a setting, but I'm kinda picturing the difference between "radiosity" and "path tracing" when it comes to rendering. In path tracing, if done correctly, you can kinda directly and unbiasedly approximate the continuum limit of the distribution of light in a scene, and it's all built on stochastic processes. You can even take care of "infinitely deep paths" correctly by stochastically cancelling paths at a *finite* depth through a Russian Roulette procedure, and you can combine many sampling procedures optimally through multiple importance sampling. More recently, that's even possible for a *continuum* of sampling methods in the form of *stochastic* importance sampling. I'd imagine something similar could be used for *actually* training and evaluating *"infinite"* (both in width and depth) NNs by simply evaluating them to some finite but task-dependent depth. The main question to me is how to even set or store weights in such a setting in a finite amount of memory. I'm guessing you'd somehow have weights be defined through something like a Gaussian mixture process, but it's probably much easier said than done.
great talk thanks! very interested in more about the effect of discretization on SGD
You said two hidden layers in the neural network, right?
Thanks a lot ❤ greetings from Germany
Very interesting talk! Why is this called hypergraph? I see directed graphs in all the examples.
A hypergraph is a generalized directed graph in which edges can connect groups of nodes (creating hyper-edges and hyper-nodes). It would be a bit cumbersome to represent, so I used the convention to separate each hyper-edge into as many classical edges as needed while keeping the same labels on all edges. You should see this convention on slide 4 around the 5-minute mark. Thank you for your interest in this talk!
@@TheoBourdaisCaltech Thank you, I see!
Hi Yu, great video! How are you not interpolating the whole data set (16:52) in the train-train method?
Operator regression:
• a class of statistical models that aim to estimate the parameters of an operator that maps inputs to outputs
• combines ideas from machine learning, functional analysis, and operator theory.
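One simple instance of this idea, sketched under strong assumptions: discretized input/output functions, an RBF kernel, and a toy antiderivative operator, all chosen here for illustration rather than taken from the talk.

```python
import numpy as np

# Toy operator regression: kernel ridge regression from discretized input
# functions to discretized output functions. The "operator" is f -> ∫_0^x f.
rng = np.random.default_rng(3)
x = np.linspace(0, 1, 50)

def random_function():
    a, b, c = rng.standard_normal(3)
    return a * np.sin(2 * np.pi * x) + b * np.cos(2 * np.pi * x) + c * x

F = np.array([random_function() for _ in range(100)])      # inputs  (100, 50)
G = np.cumsum(F, axis=1) * (x[1] - x[0])                   # outputs: antiderivatives

def rbf(A, B, gamma=0.01):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

K = rbf(F, F)
alpha = np.linalg.solve(K + 1e-6 * np.eye(len(F)), G)      # ridge-regularized fit

f_new = random_function()
g_pred = rbf(f_new[None, :], F) @ alpha                    # predicted output function
g_true = np.cumsum(f_new) * (x[1] - x[0])
print("max abs error:", np.round(np.abs(g_pred[0] - g_true).max(), 4))
```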
Very good seminar
Thank you for the talk! It is really helpful!
I believe the standard argument for the preference for flat minima in SGD goes as follows:
* Treat SGD as GD + eps N(0,1).
* GD is not preconditioned by the inverse Hessian, and therefore will (with a given step size) "skip over" or quickly exit regions of too high curvature.
* In this way, the "odds" of landing in a high-curvature minimum vanish (particularly with the noise injected by stochasticity, which essentially places your destination point in a small ball around the GD destination point).
All of these concepts have, of course, been stated more formally in the various frameworks used to analyze SGD training. (Per your discussion around min 56.)
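A toy numerical version of this argument, with an explicitly sharp and an explicitly flat minimum; the loss, step size, and noise level are made up for illustration.

```python
import numpy as np

# GD + Gaussian noise on a 1-d loss with a sharp minimum at x = -1
# (curvature 100) and a flat one at x = +1 (curvature 1). With this step size
# the sharp basin is linearly unstable, so nearly all runs end up flat.
rng = np.random.default_rng(4)

def grad(x):
    return 100.0 * (x + 1.0) if x < 0 else 1.0 * (x - 1.0)

lr, noise, runs = 0.03, 0.1, 1000
ends_flat = 0
for _ in range(runs):
    x = rng.uniform(-2.0, 2.0)              # random initialization
    for _ in range(300):
        x -= lr * (grad(x) + noise * rng.standard_normal())
    ends_flat += (x > 0)
print(f"{ends_flat}/{runs} runs ended near the flat minimum")
```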
Thank you for the excellent seminar!
Hello guys, thank you Soufiane for this presentation and congratulations. I am myself an engineer in actuarial finance and I started studying machine learning 3 months ago. I am from Morocco!
very interesting topic
Nice talk!
This is interesting. If you don't mind me asking a question: there's a mention in the talk that even if there's no symmetry on the data manifold, one could impose one. I was wondering whether that's realistically feasible for really big datasets, and whether it could unintentionally introduce bias in the process?
Fantastic!
Nice talk!
Impressive !
How does it compare to Adam?
Is there a learning rate schedule? I ask because even the standard momentum SGD has a periodic pattern of sudden improvement followed by stagnation...
Great talk!
What is the definition of mean-field?
First of all, thank you for the nice video. Would you please implement the same techniques in Python, step by step, with an explanation if possible?
This is too good.
ReLU is a switch. f(x)=x is connect. f(x)=0 is disconnect. A ReLU net is a switched composition of dot products. If all the switch states become known, the net collapses to a simple matrix, upon which you can apply various metrics if you are curious. Also, never forget that the variance equation for linear combinations of random variables applies to the dot product (to test noise sensitivity, etc.).
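A small sketch of this "collapse to a matrix" observation, with random weights chosen purely for illustration:

```python
import numpy as np

# For a fixed input, record the ReLU "switch" states; the network then acts as
# one plain matrix on that input (and on any input with the same switch pattern).
rng = np.random.default_rng(5)
W1, W2, W3 = (rng.standard_normal(s) for s in [(16, 8), (16, 16), (4, 16)])

def forward(x):
    h1 = np.maximum(W1 @ x, 0.0)
    h2 = np.maximum(W2 @ h1, 0.0)
    return W3 @ h2, (h1 > 0), (h2 > 0)

x = rng.standard_normal(8)
y, s1, s2 = forward(x)

# Collapse: replace each ReLU by a 0/1 diagonal "switch" matrix.
M = W3 @ np.diag(s2.astype(float)) @ W2 @ np.diag(s1.astype(float)) @ W1
print("same output:", np.allclose(y, M @ x))     # True
```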
You can swap what is adjustable in deep neural networks. You can have fixed dot products (enacted with fast transforms) and adjustable (parametric) activation functions. The fast Walsh Hadamard transform is ideal. See Fast Transform fixed-filter-bank neural networks.
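A minimal sketch of that architecture, assuming an orthonormal Walsh-Hadamard transform as the fixed stage and a simple two-slope parametric activation; the parameterization is illustrative, not the referenced paper's exact design.

```python
import numpy as np

# Fixed, parameter-free fast Walsh-Hadamard transform as the "dot product"
# stage, followed by an adjustable per-element activation.
def fwht(x):
    """Fast Walsh-Hadamard transform; len(x) must be a power of 2."""
    x = x.copy()
    h = 1
    while h < len(x):
        for i in range(0, len(x), 2 * h):
            a, b = x[i:i + h].copy(), x[i + h:i + 2 * h].copy()
            x[i:i + h], x[i + h:i + 2 * h] = a + b, a - b
        h *= 2
    return x / np.sqrt(len(x))            # orthonormal scaling

def layer(x, pos_slope, neg_slope):
    z = fwht(x)                           # fixed transform: no trainable weights
    return np.where(z > 0, pos_slope * z, neg_slope * z)   # parametric activation

x = np.random.default_rng(6).standard_normal(8)
params = (np.full(8, 1.0), np.full(8, 0.1))   # adjustable activation parameters
print(layer(x, *params))
```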