Fourier Neural Operator for Parametric Partial Differential Equations (Paper Explained)

  • Published 28 Dec 2024

Comments • 133

  • @DavenH · 4 years ago · +78

    The intro is cracking me up, had to like.

  • @RalphDratman · 1 year ago · +2

    "Linearized ways of describing how a system evolves over one timestep" is BRILLIANT!
    I never heard PDEs described in such a beautiful, comprehensible way,
    Thank you Yannic Kilcher.

  • @AE-cc1yl · 4 years ago · +50

    Navier-Stonks equations
    📈

  • @errorlooo8124 · 4 years ago · +20

    So basically what they did is kind of like taking a regular neural network layer, adding JPEG compression before it and JPEG decompression after it, then building a network out of that and training it on Navier-Stokes images to predict the next images. The reason I say JPEG is that the heart of JPEG is transforming an image into the frequency domain using a Fourier-like function. The extra processing JPEG does is mostly non-destructive (duh, you want your compressed version to be as close to the original as possible), a neural network would probably not be impeded by it, and their method throws away some of the modes of the Fourier transform too.

    • @errorlooo8124 · 4 years ago · +4

      @Pedro Abreu Yeah, the DCT is derived from the DFT, which is basically the Fourier transform adapted to sampled data instead of needing a continuous function. (The DCT is just the real component of the DFT, with a bit of offsetting (it uses n+1/2) and less rotation (it uses pi instead of 2pi); a quick numerical check follows below.)
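
    A quick numerical check of that DCT/DFT relationship (my own sketch, using NumPy/SciPy, not from the paper): the type-II DCT falls out of a 2N-point DFT of the mirrored signal combined with a half-sample phase twist.

        import numpy as np
        from scipy.fft import dct, fft

        x = np.random.rand(8)
        N = len(x)
        k = np.arange(2 * N)
        # Even (mirror) extension, then a 2N-point DFT with a half-sample phase shift
        V = fft(np.concatenate([x, x[::-1]]))
        dct_from_dft = np.real(np.exp(-1j * np.pi * k / (2 * N)) * V)[:N]
        assert np.allclose(dct_from_dft, dct(x, type=2, norm=None))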

  • @taylanyurtsever · 4 years ago · +30

    Vorticity is the cross product of the nabla operator and the velocity vector field, which can be thought of as the rotational flow in that region (blue clockwise and red counter-clockwise); see the sketch after this thread.

    • @judgeomega · 4 years ago · +6

      or more simply: twisting

    • @CharlesVanNoland · 3 years ago · +2

      AKA "curl" en.wikipedia.org/wiki/Curl_(mathematics)

  • @dominicisthe1 · 4 years ago · +13

    Cool to see a paper like this pop up on my YouTube. I did my MSc thesis on the paper's first reference, which solves ill-posed inverse problems using iterative deep neural networks.

  • @이현빈학생공과대학기 · 1 year ago · +1

    This is an excellently clear description. Thanks for the help.

  • @soudaminipanda · 1 year ago · +1

    Fabulous explanation. Crystal clear.

  • @lestroarmonico · 4 years ago · +2

    6:26 Vorticity is a derivative of viscosity? No, it is not. Viscosity is a property of the fluid; vorticity is ∇×V (the curl of the velocity). Edit: and at 8:18, that is not the vorticity equation, that is the continuity equation, which is about conservation of mass (the full system is written out below). Very helpful video, as I am currently studying this very paper myself, but there are a few mistakes you've made that need correction :)
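
    For reference, the incompressible system the paper works with, in standard notation ($\nu$ is the viscosity, $f$ a forcing term):

        \partial_t \omega + u \cdot \nabla \omega = \nu \, \Delta \omega + f,
        \qquad \nabla \cdot u = 0,
        \qquad \omega = \nabla \times u

    The second equation is the continuity (mass conservation) constraint the comment above refers to; the third defines the vorticity $\omega$ as the curl of the velocity $u$.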

  • @shansiddiqui8673 · 3 years ago · +4

    Fourier Neural Operators aren't limited to periodic boundary conditions: the linear transform W works as a bias term which keeps track of non-periodic BCs. (A sketch of a Fourier layer including W follows below.)
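
    A minimal sketch of one Fourier layer with that bypass W, assuming PyTorch; this is an illustration in the spirit of the paper's code, not the authors' exact implementation (SpectralConv2d, modes, and the ReLU placement are my choices):

        import torch
        import torch.nn as nn

        class SpectralConv2d(nn.Module):
            # FFT -> keep the lowest `modes` frequencies -> learned complex
            # multiplication -> inverse FFT, plus a pointwise linear bypass W.
            def __init__(self, channels, modes):
                super().__init__()
                self.modes = modes
                scale = 1.0 / channels
                self.weights = nn.Parameter(scale * torch.randn(
                    channels, channels, modes, modes, dtype=torch.cfloat))
                self.w = nn.Conv2d(channels, channels, kernel_size=1)  # bypass W

            def forward(self, x):              # x: (batch, channels, h, w)
                x_ft = torch.fft.rfft2(x)
                out_ft = torch.zeros_like(x_ft)
                m = self.modes
                # Multiply retained low modes by learned weights (sum over input channels).
                # The paper's code also keeps a negative-frequency block; omitted here.
                out_ft[:, :, :m, :m] = torch.einsum(
                    "bixy,ioxy->boxy", x_ft[:, :, :m, :m], self.weights)
                x_spec = torch.fft.irfft2(out_ft, s=x.shape[-2:])
                return torch.relu(x_spec + self.w(x))  # spectral path + bypass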

  • @channuchola1153 · 4 years ago · +3

    Wow... simply awesome. Fourier and PDEs, good to see them together.

  • @clima3993 · 2 years ago

    Yannic always gives me the illusion that I understand things that I actually don't. Anyway, a good starting point, and thank you so much!

  • @PatatjesDora · 4 years ago · +3

    Going over the code is really nice!

  • @kazz811 · 4 years ago · +5

    Cool video as usual. Quick comment: vorticity is simply the curl of the velocity field and doesn't have much to do with "stickiness". Speaking of which, viscosity (which measures forces between fluid molecules) is not actually related to "stickiness" either, a property that is measured by surface tension (how the fluid interacts with an external solid surface). You can have highly viscous fluids which don't stick at all.

  • @Mordenor · 4 years ago · +37

    Normal broader impact: This may have negative applications on society and military applications
    This paper: I AM THE MILITARY

  • @mansisethi8127 · 5 months ago

    Thank you for the paper presentation!!

  • @simoncorbeil4081 · 11 months ago · +1

    Great video, however I would like to correct a few facts. If the Navier-Stokes equations need the development of new and efficient methods like neural networks, it's essentially because they are strongly nonlinear, especially at high Reynolds number (low viscosity, as with air and water, the typical fluids we meet daily), where turbulence is triggered. Also, I want to rectify: the Navier-Stokes system shown in the paper is in the incompressible regime, and the second equation is the divergence of velocity, which is the mass conservation equation, nothing related to vorticity (it's more the opposite: vorticity is the cross product of the nabla operator with the velocity field).

  • @pradyumnareddy5415 · 4 years ago · +4

    I like it when Yannic throws shade.

  • @herp_derpingson · 4 years ago · +6

    36:30 I like the idea of throwing away high FFT modes as regularization. I wish more papers did that. (A sketch of what that truncation does is below.)
    37:35 IDK if throwing out the little jiggles is a good idea, because Navier-Stokes is a chaotic system and those little jiggles possibly contribute chaotically. However, perhaps the residual connection corrects for that.
    46:10 XD
    I wish the authors had ablated the point-wise convolution and shown how much it helps, and the same for throwing away modes.
    Also, I wish the authors had shown an error-accumulation-over-time graph.
    I really liked the code walkthrough. Do it for other papers too if possible.
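
    For intuition, a small NumPy sketch (mine, not from the paper) of truncating the high Fourier modes of a 2-D field; keep=12 is an arbitrary choice:

        import numpy as np

        def truncate_modes(field, keep):
            # Zero everything above the `keep` lowest frequencies per axis.
            f = np.fft.rfft2(field)
            mask = np.zeros_like(f)
            mask[:keep, :keep] = 1    # low positive frequencies
            mask[-keep:, :keep] = 1   # matching negative frequencies (axis 0)
            return np.fft.irfft2(f * mask, s=field.shape)

        smoothed = truncate_modes(np.random.rand(64, 64), keep=12)  # low-pass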

  • @DavenH · 4 years ago · +10

    I hope this is going to lead to much more thorough climate simulations. Typically these require vast amounts of supercomputer time and are run just once a year or so. But it sounds like just a small amount of cloud compute would run them on this model.
    Managing memory would then be the challenge, however, because I don't know how you could afford to discretize into isolated cells the fluid dynamics of the atmosphere, where each part affects and flows into other parts. It's almost like you need to do it all at once.

    • @PaulanerStudios · 4 years ago · +2

      Well, from what I have seen, climate simulations are at the moment also discretized into grids for memory management... at least the ones where I have looked at the code... I guess it's more of a challenge to enforce boundary conditions in this model such that neighbouring cells don't diverge at their shared boundaries... I guess traditional methods for dealing with this would suffice though... you'd still have to blend the boundaries occasionally, so the timesteps can't be arbitrarily large.

    • @DavenH · 4 years ago · +1

      @@PaulanerStudios Hmm. Maybe take a page from CNNs and calculate 3x3 grid cells, so you get a centre cell with boundaries intact, then stride 1 cell and do another 3x3 calculation; hopefully the interaction falloff is steep enough to then stitch the centre cells together without discontinuities. Or maybe you need to do 5x5 cells, throwing away all but the centres.
      Another thing: I thought the intra-cell calculations in these climate simulations were hand-made heuristics, not actually Navier-Stokes. Could be wrong, but if not, even eliminating those heuristics and putting in "real" simulations would be a good improvement.

    • @PaulanerStudios · 4 years ago · +2

      @Mustache Merlin The thing with every compute job is the von Neumann bottleneck... running massively parallel compute jobs on CPU or GPU, the limiting factor is always memory bandwidth... since neural networks are, in the most basic sense, matrix multiplications interspersed with nonlinearities, VRAM is the limiting factor for how large a given multiplication/network, and thus network input, can be... there is really no sense in streaming anything from a drive, no matter how fast, because performance will tank by orders of magnitude for backprop and such if the network (and computation graph) can't be held in graphics memory at once... If you're arguing the case for regular simulations, well, supercomputers already have terabytes or petabytes of RAM... the issue is swapping the data used for computation in and out of cache and subsequently registers... Optane drives will not solve the memory bottleneck there either... the only thing they can solve is maybe memory price, which really is not a limiting factor in HPC (most of the time).

  • @antman7673 · 4 years ago · +1

    Vorticity is derived from "vortex".
    The triangle pointing down is the nabla operator; it points toward the lowest value.

  • @DamianReloaded · 4 years ago · +1

    47:00 If they wanted to predict longer sequences, they could use the solver for the first tensor they input and then just feed the last 11 steps of the latest prediction back in, right? I wonder after how many steps it would begin to diverge if they used the maximum possible resolution of the data. (A rough sketch of that loop follows this thread.)

    • @YannicKilcher · 3 years ago · +1

      True, but as you say, the problems would pile up
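
    A rough sketch of that feedback loop (hypothetical names; assumes a trained model mapping the last t_in frames to the next frame):

        import torch

        @torch.no_grad()
        def rollout(model, init_frames, n_steps):
            # init_frames: (batch, t_in, h, w), produced by the reference solver
            frames = [init_frames[:, i] for i in range(init_frames.shape[1])]
            t_in = init_frames.shape[1]
            for _ in range(n_steps):
                window = torch.stack(frames[-t_in:], dim=1)  # latest predictions
                frames.append(model(window).squeeze(1))      # one step ahead
            return torch.stack(frames, dim=1)  # errors compound as steps grow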

  • @JurekOK · 4 years ago · +16

    So . . . they have taken an expensive function (which is itself already an approximation of an even more expensive function) and trained up an approximation of it.
    Then there is no comparison of the predictions with any experiment (let alone a rigorous one), only with that original "reference" approximated function.
    Is this a big deal? I was doing that during the 2nd year of my undergrad in mechanical engineering, 18 years ago. Come on.
    How about the long-term stability of their predictor? How does it deal with singularities at corners? Moving or deforming objects? Convergence rate? Is the damping spectrally correct? My point is that this demo is really unimpressive to a person who actually uses fluid dynamics for product design. It might be visually impressive for the entertainment industry.
    Hyped titles galore.

  • @kristiantorres1080 · 3 years ago

    Thank you! I was just reading this paper and somewhere around page 5, I started to fall asleep. Your video will help me to understand this paper better.

  • @dawidlaszuk · 4 years ago · +7

    Coming from signal processing and getting my head into the Deep™ world, I'm happy to see Fourier showing up. Great paper and a good start, but I agree about the overhype. For example, throwing away modes is the same as masking with a rectangular function, which in signal space is like convolving with a sinc function, a highly "ripply" function (see the sketch after this thread). Navier-Stokes in general is chaotic, and small perturbations will change the output significantly over time. I'm guessing they don't see/show these effects because of their data composition. But it is a good start and maybe an idea for others: for example, replace the Fourier kernel with Laplace and use proper filtering techniques.

    • @DavenH · 4 years ago · +1

      Hey Dawid, do you produce any YT content? I'm also from DSP and doing deep learning; curious what you're working on.
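
    The rect/sinc point above is easy to demonstrate (my own NumPy example, not from the paper): hard truncation of the spectrum of a sharp-edged signal produces Gibbs-style ripples.

        import numpy as np

        n = 256
        x = np.zeros(n)
        x[96:160] = 1.0                 # box signal with sharp edges
        f = np.fft.rfft(x)
        f[20:] = 0                      # keep only the 20 lowest modes
        x_lp = np.fft.irfft(f, n)
        print(x_lp.min(), x_lp.max())   # under/overshoot beyond [0, 1]: ripples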

  • @markh.876 · 4 years ago · +4

    This is going to be lit when it comes to Quantum Chemistry

  • @diegoandrade3912 · 2 years ago

    Fabulous, thank you for the explanation and for the time taken to create this video. Keep it coming.

  • @idiosinkrazijske.rutine · 4 years ago · +5

    Looks similar to what is done in so-called "spectral methods" for simulating fluids. I'm sure this is where they drew their inspiration from.

  • @yusunliu4858 · 4 years ago · +4

    The process Fourier transform -> multiplication -> inverse Fourier transform seems like a low-pass filter. If that is so, why not just apply a low-pass filter to the input A'? Maybe I didn't get the idea correctly.

    • @YannicKilcher · 3 years ago · +1

      I think one of the steps is actually explicitly a low pass filter, so you're right

    • @weishkysiliy4420 · 2 years ago

      @@YannicKilcher Training at a lower resolution, then evaluating directly at a higher resolution: I don't understand how it can do that?

    • @YannicKilcher · 2 years ago · +2

      @@weishkysiliy4420 the architecture is somewhat agnostic to the resolution, unlike traditional image classifier models

    • @weishkysiliy4420 · 2 years ago

      ​@@YannicKilcher
      After training at a small size (64x64) and loading the model directly, change the input dimensions to 256x256? Can I understand it this way? (A sketch of why this works follows this thread.)

    • @weishkysiliy4420 · 2 years ago

      @@YannicKilcher I really like your song. Nice prelude
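
    A sketch of why training at 64x64 transfers to 256x256, assuming a layer like the SpectralConv2d sketched earlier in these comments: the learned weights only touch the lowest `modes` frequencies, which exist in the FFT of any sufficiently large grid, so the same parameters apply at either resolution and the inverse FFT maps back to whatever grid was given.

        import torch

        layer = SpectralConv2d(channels=4, modes=12)  # from the earlier sketch
        x64, x256 = torch.randn(1, 4, 64, 64), torch.randn(1, 4, 256, 256)
        # Same parameters, different grids: only the lowest 12x12 modes are used.
        print(layer(x64).shape, layer(x256).shape)    # (1,4,64,64) (1,4,256,256)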

  • @airealguy · 4 years ago · +7

    So I think this approach has some flaws and has been hyped too much. The crux of the problem is the use of FFTs, which impose some severe constraints on CFD problems. First, consider complex geometries (i.e., those that are not rectangular). How does one take an FFT of something that is not rectangular? You can map the geometry to a rectangular coordinate system using a spatial transform, but then the learned parameters are specific to that transform and thus that geometry. Secondly, there are no good ways to do FFTs efficiently at large scales (i.e., scales above the memory space of one processor). Even the best algorithms, such as heFFTe, which can achieve 90% of the theoretical max performance, are quite poor in comparison to the algorithmic performance of standard PDE solvers; heFFTe only achieves an algorithmic performance of 0.05% of peak on Summit. So while this is fast on small-scale problems, it will likely suffer major performance problems at large scales and will be difficult if not impossible to apply to complex non-rectangular geometries. The neural operator concept is probably a good one, but the basis function makes this difficult to apply to general-purpose problems. We need a basis function which is expanded in perception but not global like an FFT. Even chopping the FFT off can have issues. If you want to compute a N

    • @crypticparadigm2180 · 3 years ago

      Great points... On the topic of memory consumption and allocation in neural networks: what are your thoughts on Neural Ordinary Differential Equations?

  • @mohsensadr2719 · 3 years ago

    Very nice work explaining the paper. I was wondering if you have any comments on the following:
    - Fourier works well if you have equidistant grid points. I think if the initial data points are randomly placed in space (or on an unstructured grid), one has to include more and more terms in the Fourier expansion given the irregularity of the mesh.
    - FNO has to be coupled with an exact solver, since the solution of the first several time steps has to be given as input.
    - I think it is not possible to train FNO on a small solution domain and then use it on larger ones. Any comments on that?

    • @weishkysiliy4420 · 2 years ago

      Training at a lower resolution, then evaluating directly at a higher resolution: I don't understand how it can do that?

  • @CoughSyrup · 1 year ago

    This is really huge. I see no reason this couldn't be extended to solve for the magnetohydrodynamic behavior of plasma, and made to work for the 3-D equations. That currently requires supercomputers to model; imagine making it run on a desktop PC.
    This means modeling of plasma instabilities inside fusion reactors.
    Maybe with fast or real-time modeling, humanity can finally figure out an arrangement of magnets in 3-D for plasma that is stable and robust to excursions.

  • @raunaquepatra3966 · 4 years ago · +2

    I wish the authors had shown the effects of throwing away modes in some nice graphs 😔.
    Also, show the divergence of this method from the ground truth (using the simulator) when it is used in an RNN fashion (i.e., feeding the final output of this method back into itself to generate time steps, possibly to infinity), and show at what point it starts diverging significantly.

  • @billykotsos4642 · 4 years ago · +5

    Damn the opening title blew my mind

  • @sui-chan.wa.kyou.mo.chiisai · 4 years ago · +10

    8:30 Isn't the triangle the Laplace operator?

    • @sui-chan.wa.kyou.mo.chiisai · 4 years ago · +3

      www.wikiwand.com/en/Laplace_operator

    • @machinelearningdojo · 4 years ago · +1

      😀😀😀😀 pwned

    • @finite-element · 4 years ago · +1

      Also, Navier-Stokes should be nonlinear, not linear (around the same timestamp).

    • @JM-ty6uq · 4 years ago · +2

      that is the dorito operator

  • @esti445 · 8 months ago

    8:30 It is the Laplacian operator, the second derivative with respect to space (written out below).
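
    Written out, the operator in question:

        \Delta u = \nabla \cdot \nabla u = \sum_{i=1}^{d} \frac{\partial^2 u}{\partial x_i^2}

    i.e. the sum of the second spatial derivatives, as the comment above says.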

  • @lucidraisin · 4 years ago · +3

    Woohoo! New video!

  • @Andresc93 · 1 year ago

    Thank you, you just saved me a bunch of time.

  • @konghong3885 · 4 years ago · +3

    Jokes aside, as a physics student I wonder:
    Is it possible to apply a periodic boundary condition on the FNO?
    How does one actually estimate the error of the solver? For MCMC, the error can be estimated probabilistically, but not in the ML case.

    • @artyinticus7149 · 4 years ago · +1

      Highly unlikely

    • @dominicisthe1 · 3 years ago

      I think it is non-periodic boundary conditions you should be worried about.

  • @sujithkumar824 · 4 years ago · +11

    Download this video to save a personal copy, because it could be taken down due to pressure from the author, for stupid reasons.

    • @herp_derpingson · 4 years ago · +2

      Why?

    • @judgeomega · 4 years ago · +2

      @@herp_derpingson I think the author can neither confirm nor deny any reasoning for a takedown.

    • @sujithkumar824 · 4 years ago · +1

      @@judgeomega Yes, I'm glad Yannic didn't even respond publicly to her; this is exactly the treatment every attention seeker should get.

    • @matthewtang1489 · 4 years ago · +1

      What?? The paper author or an article author? Is there a fiasco about this?

    • @amarilloatacama4997 · 4 years ago · +1

      ??

  • @surbhikhetrapal1975 · 4 months ago

    Hi, I found this review of the paper very helpful. I could not locate the code at the link shared in the video description. Does anyone know under what name this code appears in the github neuraloperator repository?

  • @meshoverflow2150 · 4 years ago · +1

    Would there be any advantage to doing convolution in frequency space with a conventional CNN for, say, image classification? On the surface it seems like it could be faster (given that an FFT is very fast) than regular convolution, but I assume there's a good reason why it isn't common practice.

    • @nx6803 · 4 years ago · +1

      Octave convolutions are sort of based on the same intuition, yet don't actually use an FFT.

    • @andrewcutler4599 · 4 years ago · +1

      Convolution preserves spatial relationships which makes it useful for images. Neighboring pixels are often related to one another. A CNN in FFT world would operate on frequency. Not clear that there is a window where only near frequencies should be added together to form feature maps.

    • @meshoverflow2150 · 4 years ago · +4

      @@andrewcutler4599 The CNN wouldn't operate on frequencies, though. Multiplication in frequency space IS convolution, so a feed-forward network in frequency space should do the exact same thing as a conventional CNN (see the numerical check after this thread). I feel like the feed-forward version should be smaller than the equivalent CNN, hence the question.

    • @DavenH · 4 years ago · +1

      @@meshoverflow2150 Interesting observation.
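
    The convolution-theorem claim in this thread is easy to check numerically (my own example): pointwise multiplication of DFTs equals circular convolution.

        import numpy as np

        rng = np.random.default_rng(0)
        a, b = rng.standard_normal(64), rng.standard_normal(64)
        N = len(a)
        # Direct circular convolution: c[n] = sum_m a[m] * b[(n - m) % N]
        c_direct = np.array([sum(a[m] * b[(n - m) % N] for m in range(N))
                             for n in range(N)])
        # Convolution theorem: pointwise product in frequency space
        c_fft = np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))
        assert np.allclose(c_direct, c_fft)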

  • @tedonk03 · 2 years ago · +2

    Thank you for the awesome explanation, really clear and helpful. Can you do one for PINN (Physics Informed Neural Network)?

  • @andyfeng6 · 4 years ago · +2

    The triangle means the Laplace operator.

  • @digambarkilledar003 · 9 months ago

    What are the numbers of input and output channels?

  • @MaheshKumar-iw4mv · 1 year ago

    Can FNO be used to train data from Reaction-Diffusion dynamics with no-flux boundary conditions?

  • @antman7673 · 4 years ago

    So this is kind of like an approximation of the development of the fluid with pixels, instead of the infinite-resolution "vector graphic" provided by the equation.

  • @boffo25 · 4 years ago · +1

    Nice explanation

  • @beginning_parenting · 3 years ago

    On line 87 of the FNO3D code, it is mentioned that the input is a 5-D tensor (batch, x, y, t, in_channels). What do the in_channels represent? Does that mean each point in (x, y, t) is a vector containing 13 channels? (A sketch of the likely layout is below.)
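
    If I recall the repository's fourier_3d.py convention correctly (an assumption, not verified here), the 13 channels at each (x, y, t) point are the first 10 input time steps of the solution plus 3 normalized coordinates; roughly:

        import torch

        batch, s, t_out, t_in = 8, 64, 40, 10
        u_in = torch.randn(batch, s, s, t_in)   # first 10 solution snapshots
        # Broadcast the input frames along the output-time axis and append a
        # normalized (x, y, t) grid: 10 + 3 = 13 channels per space-time point.
        u_rep = u_in.unsqueeze(3).expand(batch, s, s, t_out, t_in)
        gx, gy, gt = torch.meshgrid(
            torch.linspace(0, 1, s), torch.linspace(0, 1, s),
            torch.linspace(0, 1, t_out), indexing="ij")
        grid = torch.stack([gx, gy, gt], dim=-1).expand(batch, s, s, t_out, 3)
        x = torch.cat([u_rep, grid], dim=-1)    # shape: (batch, x, y, t, 13)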

  • @davenovo69 · 4 years ago · +1

    Great channel!
    What app do you use to annotate PDFs?

  • @weishkysiliy4420 · 2 years ago

    Training at a lower resolution, then evaluating directly at a higher resolution: I don't understand how it can do that?

  • @reinerwilhelms-tricarico344 · 4 years ago · +1

    I found this article quite abstract (which may explain why it's interesting ;-). I could sort of get it after first reading an article by the same authors where they explain neural operators for PDEs in general (Neural Operator: Graph Kernel Network for Partial Differential Equations, 2020). There they show that the kernel they learn is similar to learning the Green's function of the PDE.

    • @kristiantorres1080 · 3 years ago

      It is abstract and there are some things that I don't understand. Is this the paper you are referring to? arxiv.org/abs/2003.03485

    • @reinerwilhelms-tricarico344 · 3 years ago

      @@kristiantorres1080 Yes. I read that paper and it somehow helped me understand the paper presented here.

  • @konghong3885 · 4 years ago · +1

    Behold, the new title format for the ML community.

  • @JM-ty6uq · 4 years ago

    24:40 I suppose it's worth mentioning that you can make a cake with 0.5 eggs or 2 eggs.

  • @southfox2012 · 4 years ago · +1

    great

  • @perlindholm4129 · 4 years ago

    Idea: scale down the ground-truth video. Then train a model on a small 4x4 part of a frame and learn the expanded 16x16 submatrix of the original frame. This way you can train two models, each on a different aspect of the calculation: one learning the scaled-down time evolution and one learning the upscaling.

  • @Neomadra · 4 years ago · +1

    I don't quite get why you said (if I understood you correctly) that the prediction cannot be made arbitrarily far into the future. Couldn't you just use the output of the forward propagation as the new input for the next round of forward propagation? So you apply a chain of forward propagations until you reach the time you want. If memory is a problem, then you can simply clear the memory of the previous outputs.

    • @seamusoblainn · 4 years ago · +1

      Perhaps because the network is making predictions, as opposed to the ground-truth sim, which is using physics. In the latter there is only what its rules generate, while in the former you are using 'feedforwarding', which must by necessity diverge, and probably does from the beginning at a fine degree of granularity.

    • @YannicKilcher · 3 years ago

      it's true, but you regress to the problem you have when running classic simulations

  • @sinitarium · 11 months ago

    Amazing! This must be how Nvidia DLSS works!?

  • @cedricvillani8502 · 2 years ago

    Should update your video

  • @jean-pierrecoffe6666 · 4 years ago · +1

    Hahahahaha, excellent intro

  • @acharyavivek51 · 2 years ago

    Very scary how AI is progressing.

  • @Beingtanaka · 1 year ago

    Here for MC Hammer

  • @RalphDratman · 1 year ago

    All those little bumps could be creating the digital environment in which the upper layers of GPTx are doing their magic.

  • @kesav1985 · 4 years ago · +4

    So much fuss about curve-fitting!
    Curve-fitting is not a numerical scheme for solving PDEs. :-)

  • @artyinticus7149 · 4 years ago · +1

    Imagine using the intro to politicize the paper.

    • @artyinticus7149 · 4 years ago · +1

      @adam smith Imagine using the military for non-political purposes.