The best video with a clear idea❤
Now this is what I call teaching. I'm studying these concepts for a completely different field (statistics in psychology), yet this video was really helpful because of the connection to more general ideas like Einstein and Occam's razor; it is inspiring. I can't believe my own teachers have never mentioned such examples.
Interestingly, Ptolemy's approach of modeling orbits with epicycles (circles upon circles, etc.) is a kind of Fourier analysis of the orbits. So I wonder if some kind of sparsity-based approach could derive the Copernican model as a sparser solution than the epicycle/Fourier approach to modeling orbits.
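A toy illustration of that idea (my own sketch, not anything from the video): fit a signal with an L1-penalized regression over a large dictionary of Fourier "epicycles," and let the penalty keep only the few harmonics that matter. The signal, harmonic count, and penalty strength below are made-up illustrative values.

```python
# Sketch: sparse selection of Fourier modes via an L1 (Lasso) penalty.
import numpy as np
from sklearn.linear_model import Lasso

t = np.linspace(0, 2 * np.pi, 500)
# "Orbit-like" signal built from just two harmonics, plus a little noise.
y = 1.0 * np.cos(t) + 0.3 * np.cos(2 * t) + 0.01 * np.random.randn(t.size)

# Dictionary of candidate epicycles: many sine/cosine harmonics.
n_harmonics = 20
X = np.column_stack(
    [np.cos(k * t) for k in range(1, n_harmonics + 1)]
    + [np.sin(k * t) for k in range(1, n_harmonics + 1)]
)

# The L1 penalty drives most coefficients to exactly zero,
# keeping only the few harmonics that actually explain the signal.
model = Lasso(alpha=0.01, fit_intercept=False).fit(X, y)
print("nonzero coefficients:", np.count_nonzero(model.coef_), "of", X.shape[1])
```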
Loved the example you gave about how Newton's F = ma is more general and sparse compared to the earlier models of Ptolemy, Kepler, and Tycho Brahe. But it is also important to note that as the number of important length and time scales in a dynamical system grows, it becomes difficult to sparsify the system.
This is exactly my area of research for my master's thesis ❤️ Beautiful series. Would love to see you discuss the state-of-the-art algorithms and how explainable AI would help as a parallel. Beautiful articulation.
Thank you for this lovely video!
Wonderful lesson! Thanks!
Hi Professor Steve, that was very nice and interesting. Thanks, professor 👍 💓
I'd like to ask an honest question... Do mathematicians remember everything? For example, if I asked an algebraist to prove that there exists a homeomorphism between two linear dynamical systems, would they be able to do it with ease? I forget a lot of math and I cannot remember theorems and proofs for everything, and it's dissuading me from pursuing mathematics graduate studies after undergrad.
This is a really great question. To be honest, I remember very few details, especially for proofs and derivations. I focus more on trying to understand the fundamental principle, so that I can re-derive a result when I need it. This is something that gets much easier when you teach a subject, as a good lecture will show "how" or "why" something is derived, not just the derivation. Much easier to remember this way.
I certainly hope not 😅 I go back and review (even some of the really simple theorems and algorithms) regularly whenever I need them. Similar to what Steve said, I think just having a good understanding of "when" you're touching on a particular subject is good enough. If I know I need to address a specific problem (say, where to optimally place sensors), I should have a hint at what methods to use ("oh, I remember I can use RPCA for this!"), but I don't need to have all the details of those methods memorized (just a general idea of what they do).
Note: I work in physics/Earth science, not mathematics, but I assume not all mathematicians are whiz kids who remember everything either.
Great video! I think you may have mistakenly said "Tower of Giza" when you meant "Tower of Pisa".
Haha, nice catch! Yes, not the mythical Tower of Giza :)
To the extent that an ML model implements a program, sparsity minimizes its description length. Shorter programs account for more outputs of a given length than longer programs, suggesting better generalization.
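For reference, a minimal sketch of the two-part minimum description length (MDL) principle that this kind of argument usually leans on (my addition, not the commenter's own derivation):

$$\hat{M} = \arg\min_{M} \big[\, L(M) + L(D \mid M) \,\big]$$

where L(M) is the number of bits needed to describe the model and L(D | M) the bits needed to describe the data given the model; a sparse model keeps L(M) small while still compressing the data well.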
Parsimonious, from some online dictionary: "unwilling to spend money or use resources; stingy or frugal." Perhaps a wise use of resources.
"With four parameters I can fit an elephant, and with five I can make him wiggle his trunk." – John von Neumann
If a sparse model is better, why is there a trend in deep learning to obtain tons and tons of data? Is this a practice that will change in the future and be reshaped as we understand more about deep learning?
Those are not sparse models. They have millions of parameters and that's why you need tons of data.
This is a great question. Sparse isn't always better, and there are often tradeoffs. But promoting sparsity is a big part of modern deep learning (pruning, regularization, etc.) to help prevent overfitting. It is also a key observation that biological neural networks tend to be sparsely connected.
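As a hedged sketch of one common way sparsity is promoted in practice (my own toy example, assuming PyTorch; the layer sizes, data, and penalty strength are made up), an L1 penalty on the weights can be added to the training loss:

```python
# Sketch: promoting sparsity in a small network via an L1 weight penalty.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
mse = nn.MSELoss()
l1_strength = 1e-3  # hypothetical value; tuned per problem in practice

x = torch.randn(64, 10)   # dummy inputs
y = torch.randn(64, 1)    # dummy targets

for step in range(100):
    optimizer.zero_grad()
    pred = model(x)
    # Data-fit term plus an L1 penalty that pushes many weights toward zero.
    l1 = sum(p.abs().sum() for p in model.parameters())
    loss = mse(pred, y) + l1_strength * l1
    loss.backward()
    optimizer.step()
```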
Ptolemy had the last laugh after all. His epicycles can fit far more data than just planetary orbits, as they amount to a Fourier transform.
"fits more data" doesn't follow from "fourier transform"
@tolkienfan1972 I mean that Fourier transforms can be applied to many domains, not just planetary orbits.
Being greedy is good... in ML :P