Very interesting talk!
The shaping method you found seems surprisingly simple; that should be huge!
This is perhaps only indirectly related, but:
I wonder if we could somehow go from a "discrete", layer-by-layer evaluation to something akin to the various Monte Carlo techniques that just sample from the continuum solution.
Not sure how you'd take care of (or replace) the discrete parameters of an NN in such a setting, but I'm picturing the difference between "radiosity" and "path tracing" in rendering: in path tracing, if done correctly, you can directly, and without bias, approximate the continuum limit of the distribution of light in a scene, and it's all built on stochastic processes.
You can even handle "infinitely deep paths" correctly by stochastically terminating paths at a *finite* depth through a Russian Roulette procedure, and you can combine many sampling procedures near-optimally through multiple importance sampling. More recently, that has even become possible for a *continuum* of sampling methods, in the form of *stochastic* multiple importance sampling.
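To make the Russian Roulette part concrete (this is just textbook Monte Carlo, nothing specific to the talk): you can estimate an infinite sum by terminating at a random finite depth and reweighting the surviving terms, and the estimate stays unbiased:

```python
import random

def russian_roulette_estimate(term, cont_prob=0.9, rng=random):
    """Unbiased estimate of sum_{k=0}^inf term(k).

    At each depth we continue with probability `cont_prob` and divide the
    surviving contributions by that probability, so the expected value of
    the (always finite) sample equals the infinite sum.
    """
    total, weight, k = 0.0, 1.0, 0
    while True:
        total += weight * term(k)
        if rng.random() >= cont_prob:   # terminate the "path" here
            return total
        weight /= cont_prob             # compensate for having survived
        k += 1

# Example: sum_k 0.5**k = 2, estimated with finite-depth samples only.
term = lambda k: 0.5 ** k
samples = [russian_roulette_estimate(term) for _ in range(100_000)]
print(sum(samples) / len(samples))      # ~ 2.0
```

That same trick is what lets a path tracer handle arbitrarily long light paths with a finite amount of work per sample.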
I'd imagine something similar could be used for *actually* training and evaluating *"infinite"* (both in width and depth) NNs by only ever evaluating them to some finite, but stochastically chosen and task-dependent, depth.
The main question to me is how you'd even set or store the weights in such a setting in a finite amount of memory. I'm guessing you'd somehow have the weights be defined through something like a Gaussian (mixture) process, but it's probably much easier said than done.
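Purely to illustrate the "finite memory for a continuum of weights" part (my own toy construction, not anything from the talk): you could define the weights as a *function* of a continuous depth coordinate using a fixed number of random Fourier features (a cheap stand-in for sampling from a Gaussian process), and then evaluate the "infinitely deep" net at however many depths you feel like per forward pass:

```python
import numpy as np

rng = np.random.default_rng(0)

d, n_feat = 16, 64                        # hidden width, number of Fourier features (fixed memory)
omega = rng.normal(size=n_feat)           # random frequencies over the depth coordinate t
phase = rng.uniform(0, 2 * np.pi, n_feat)
coef  = rng.normal(size=(n_feat, d, d)) / np.sqrt(n_feat)

def weight_at(t):
    """Weight matrix W(t) for any continuous depth t in [0, 1].

    A finite random-Fourier-feature expansion, i.e. an approximate sample
    from a stationary Gaussian process over depth; the memory cost is fixed
    no matter how many depths we later query.
    """
    feats = np.sqrt(2.0) * np.cos(omega * t + phase)   # shape (n_feat,)
    return np.tensordot(feats, coef, axes=1)           # shape (d, d)

def forward(x, n_steps):
    """Evaluate the depth-continuous residual net with n_steps Euler steps."""
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = (k + 0.5) * dt
        x = x + dt * np.tanh(weight_at(t) @ x)
    return x

x0 = rng.normal(size=d)
print(forward(x0, 8))     # coarse, cheap evaluation
print(forward(x0, 512))   # finer evaluation approaches the continuum (ODE) limit
```

How you'd actually *train* something like this, and whether a Russian-Roulette-style random truncation of the depth can be made unbiased given that the layers compose nonlinearly, is exactly the part I'm hand-waving over.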