I had never known about Padé approximations, and you did such a good job motivating and explaining them. Also, the way you mentioned how it seems almost unreasonably effective, like getting something for nothing, definitely felt like it was speaking directly to the thought passing through my mind at that moment.
Thanks so much! Padé approximants still feel kind of magic to me and it's definitely that they aren't well known that made me want to make a video on them. Also, does this mean I have bragging rights that "I taught 3b1b some mathematics" :-D
Qualitatively and intuitively, is it because when you truncate a Taylor series at some order O(n) you are doing precisely that but when you have a rational polynomial approximation O(m)/O(n) it really could be order greater than m or n if it is divided out? I.e. if you do 1/(1-x), the series is infinite, so you must be accruing all that accuracy by doing O(m)/O(n) because it is not really truncated at order O(m) or O(n) but is potentially an infinite series, even though you write it compactly as a rational fraction.
It is not quite for nothing. You are calculating twice as many interrelated coefficients (minus 1), and the overall function is more complicated (which makes computation for any given x harder and further dependent on precision). And that is AFTER having to calculate the Taylor expansion, so it is triple the amount of work, and is 'refining' an approximation which was already rather nice and effective in some respects. It still seems unreasonably effective to me, a bit, but it is not entirely for free. There probably are also some theoretical costs associated with going from polynomials (extremely nice, highly-constrained functions) to rational functions (which are just slightly less so).
@@curtiswfranks I don't think it's fair to say that you calculate twice as many coefficients; in general the number of coefficients would be roughly equal. In the Padé approximation you have m coefficients in the numerator and n coefficients in the denominator, and you construct it from an (m+n)-order Taylor series, so they literally should be equal. In the example the Taylor series only had fewer because some of the coefficients were 0, but you still have to calculate those coefficients anyways.
As far as the "something from nothing", I think what is meant is that you get this extra precision without taking any more information from the original function. I could literally give you a Taylor series without telling you what it is supposed to be approximating, and you could build a better approximation even though you don't even know what it is you are approximating.
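If you want to play with this yourself, SciPy ships a helper for exactly this construction. A minimal sketch, assuming NumPy and SciPy are available, building a [2/2] Padé approximant of e^x from its Taylor coefficients:

```python
import numpy as np
from scipy.interpolate import pade

# Taylor coefficients of e^x about 0: 1, 1, 1/2!, 1/3!, 1/4!
taylor = [1.0, 1.0, 1/2, 1/6, 1/24]

# Denominator order 2 -> numerator order 2, i.e. the [2/2] approximant.
# p and q are np.poly1d objects (numerator and denominator).
p, q = pade(taylor, 2)

x = 1.5
print(p(x) / q(x))  # Padé estimate of e^1.5
print(np.exp(x))    # true value, for comparison
```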
We also use them in control theory, for the same reasons (a finite-order modeling of a time delay). It's becoming an old man trick though, now that most of what we're doing practically is in discrete time (where delays can be treated as a linear system with the right multiplicity!)
In general, expressions of the form at 0:23 are interesting since they have a few nice properties:
1.) Much like how polynomials are capable of expressing any function composed from addition, subtraction, and multiplication (for a finite number of operations, at the very least), expressions of the form at 0:23 do the same thing, but for addition, subtraction, multiplication, *and division*. Another way of putting it is that they are capable of describing any function defined on a field. This might help towards explaining why Padé approximants can be so much more effective than Taylor series.
2.) Approximants of that form are used all the time in highly realistic graphics shaders. This is because they can be used to create fast approximations of functions whose real values could not be calculated in the time it takes to render a frame. Unlike polynomials, they can behave very well over their entire domain, and they avoid large exponents that could introduce floating-point precision issues, both of which are important when you need to guarantee that a shader will not create graphical artifacts in a limited environment where all you have to work with is 32-bit floating point precision. They also avoid calls to advanced functions like sin() or exp(), which again makes their execution especially fast.
3.) You don't always need the derivatives of a function to find such an approximant. For instance, if you know that a function has an asymptote, or that it assumes a certain value at 0, or that it's symmetric, or that it tends towards a number at ±∞, then that automatically tells you something about the coefficients within the approximant. It then becomes much easier for you to run an optimization algorithm on a dataset to find good values for the remaining coefficients. Christophe Schlick gives an excellent example of this approach in "An Inexpensive BRDF Model for Physically-based Rendering" (1994).
4.) Multivariate versions of the approximant are a thing, too. To see how such a thing can be done, simply start from the proof for the statement in 1.), but now instead of working with real values and a variable "x" as elements, you'll also be working with another variable "y". As an example, for 3rd-order bivariate approximants you'll wind up with polynomials in your numerators and denominators that have the form p₁xxx + p₂xxy + p₃xyy + p₄yyy + p₅xx + p₆xy + p₇yy + p₈x + p₉y + p₁₀
Wow, that's so awesome! All news to me as well! Thanks for taking the time to write this post I appreciate it, as will people watching this video I'm sure.
@@DrWillWood is there any result about the error estimation for a given Padé approximant? Something like Lagrange's form for the remainder of a Taylor polynomial?
I don’t see how (1) could be true. The Weierstrass function is defined on a commutative field (ℝ) but doesn’t have a Taylor approximation, and therefore (if I understand correctly) doesn’t have a Padé approximation. Maybe you meant that Padé approximations can express any functions which are derived from adding, subtracting, multiplying and dividing the identity f(x) = x?
We used these in Control Theory to approximate time delays (exponentials) in the transfer function
This is an excellent explanation of something I didn't know existed. Yet it's so simple and elegant. I'm working on a Machine Learning playlist on linear regression and kernel methods and I wish I had seen this video earlier! I'll play around with Padé approximants for a while and see where this leads me. Thank you for this interesting new perspective!
Oh nice. e^-x is a very common function to Padé approximate in linear control theory because it's the Laplace transform of a uniform time delay. Notably, x in this context is a complex number, yet it still works. I've never understood how it was computed until now. I think the aha moment is realizing we are discarding all higher order terms when we perform the equation balance. This is the key reason why the Padé approximation isn't just equal to the Taylor approximation.
The case you're talking about can be computed much more easily actually. Just write e^(-x) = e^(-x/2)/e^(x/2), and use Taylor expansion for e^(-x/2) and for e^(x/2) :)
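A rough numerical check of this trick in Python (the 4-term truncations are chosen just for illustration):

```python
import math

def taylor_exp(t, terms=4):
    # Truncated Taylor series of e^t about 0
    return sum(t**k / math.factorial(k) for k in range(terms))

x = 1.0
ratio = taylor_exp(-x / 2) / taylor_exp(x / 2)
print(ratio)           # ratio of the two truncated series
print(math.exp(-x))    # true value
print(taylor_exp(-x))  # plain 4-term Taylor of e^-x, noticeably worse
```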
Having spent a great deal of time reading up on Pade approximants and struggling to find easy to understand introductory examples it is extremely exciting to see content such as this being put out there for people to learn. Fantastic job motivating the need and demonstrating the utility for these rational approximations. In my personal explorations, I have found multipoint Pade approximations to be very cool, being able to capture asymptotic behaviors for both large and small x, or around poles / points of interest is very cool. Keep up the awesome work!
Ratios of polynomials are used in aircraft flight controls. Normally, flight test attempts to measure aircraft handling qualities in a ratio of two second order polynomials, even though modern digital flight controls may be much higher order functions.
It seems natural to choose M = N. What are the situations where there is an advantage in choosing M > N or N > M, where I have a "budget" of M+N coefficients that I want to work with?
Having M=N means you can choose a finite non-zero limit for the Padé approximant, matching the function whose behavior you want to capture. In this case the denominator had a larger order and therefore made the fraction go to 0 at infinity. If it were the other way around (N>M) it would blow up and you would lose one of the motivations for using Padé, but having polynomials with roots in the denominator allows you to better describe poles located at finite x, I would say, where the Taylor expansion fails to even exist.
Edit: it seems this reply was liked a few times, so I would like to add that going to infinity is not an actual loss of motivation. It is great to have an expression that also captures the asymptotic behavior even when there is no finite limit, whereas Taylor blows up without following the divergence of the function.
M=0 is just an ordinary Taylor series, so that's at least 1 use for N>M. From that perspective, it can also be seen that the Taylor series is a special case for pade approx.
Your function will behave like x^(N-M) asymptotically. For example, with a function like sine that stays bounded, you might want M=N. With a function like e^-x that tends to zero, you might want M>N. And so on.
The Padé Approximant might be closer to the actual function in the long run, but it actually has a larger relative error compared to the Taylor series around x=0. Since we only care about approximating sin(x) from x=0 to x=pi/4, because we can then use reflection and other properties to get the value for other angles, the benefits are overcome by the disadvantages (i.e. you have to do more arithmetic operations, including a division).
@@ere4t4t4rrrrr4 I know there are algorithms that can compute the value of a function at a given point with (arbitrarily) better precision, but I don't know about any other closed algebraic formula which locally approximates a given function better than the Taylor series.
Yeah, if you skip the mostly unnecessarily long proofs and theorems, and use a more accessible language, most 1st-2nd year college mathematics could be explained this way.
Concepts like eigenvalues and eigenvectors, Taylor polynomials and series, or Lagrange multipliers could easily be taught in 10-20 minutes (of course, if you already know matrices, determinants, derivatives and some multivariable calculus) but easily take up entire lectures, because of the excessive attention on proofs or "preliminary definitions" that are not necessary to understand the concepts in the first place (only to rigorously define them).
The sad reality is that most students get lost or mentally exhausted on the theoretical chit-chat and they end up NOT learning the methods or even understanding what they're doing.
A hidden gem of a channel! Never really considered other approximations because the Taylor ones are so commonly used in computation. I remember reading about polynomial approximations of trigonometric functions for low-end hardware but maybe those were less general than the Padé approximation.
Thanks a lot! Ah yeah that's a really nice application for this sort of stuff (how can we pack the most powerful approximation into the smallest memory/compute). I don't know much about it but definitely cool! I remember a lecturer saying splines were important in this area but I'll be honest I can't remember the details!
@@DrWillWood Found the book that I read! It is "The Art of Designing Embedded Systems (2nd ed)" by Jack Ganssle. Chapter 4.4 has floating point approximations for common functions like exponent, log, and trigonometric.
The best and yet the simplest explanation for Padé approximation I have seen! We use it a lot in finite element simulation software in engineering, but I was always in search for a more intuitive explanation for its merits over default Taylor series! I am happy today.
Such a great video! I'm cramming for exams in my uni right now, and this was super useful and pleasant to listen to! Way more understandable than our professor's notes lol
I have friends who live in a house in rue Paul Padé near Paris. I'm going to ring them and explain what the video has done so well. I'm sure it will make their day.
Definitely interesting, but if I get it right: if you decide you need a higher order, you cannot re-use the low-order coefficients that you already have. That I'd consider a disadvantage.
Doesn't seem like much of a disadvantage: you calculate the approximant only a few times compared to how many times you use that approximation function.
@@beeble2003 Well, I'd assume that this is true for many---definitely not all---applications using approximations, and I agree. Then it is "not much" of a disadvantage, but it is one. ;) Cheers.
@@nikolaimikuszeit3204 Well, it's not like those new coefficients are impossible to find. You'll have a recurrence relation, at least, for the coefficients of the next higher-order Padé; otherwise you wouldn't even have a Padé to begin with. See it as problematic as you'd like: Padé gives you information in proportion to the work you give it. That's just a commonality of asymptotic analysis.
Superb introduction. I was browsing thru Hall's book on continued fractions and happened upon a section on Padé approximants, which piqued my curiosity and led me to this video. I can't wait to study these further. Thank you.
I really didn't like calculus in University but I find this very interesting. I can appreciate the beauty much more now that I'm not suffering through it
A Padé (rational function) approximation can sometimes be obtained without using the Taylor approximation and without using any knowledge of derivatives, if other properties of the function are known, such as: locations where f(x) is zero, the value of f(x) at x = 0, whether f(x) has an asymptote as x → ∞ (or −∞), etc.
As a famous and practical example, at 4:58 there is a somewhat better (2,2)-order rational approximation for sin(x) given by Bhāskara I's sine approximation formula, which for radians is:
sin(x) ≈ 16x(pi - x)/[5pi^2 - 4x(pi - x)]
This (2,2)-order rational approximation is constructed to have value zero at x = 0 and x = pi, as required for sin(x).
@4:49 You can do this Padé of sin(x) in online Wolfram Alpha:
Just input: [2/2] pade of sin(x)
In general you need to use the correct syntax *[N/M] pade of f(x)* where f(x) is your target function.
I got [3/3] pade of sin(x) = (x - (7 x^3)/60)/(x^2/20 + 1)
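Both rational approximations of sin mentioned in the last few comments are easy to sanity-check numerically; a quick sketch:

```python
import math

def bhaskara(x):
    # Bhāskara I's approximation, intended for x in [0, pi]
    return 16 * x * (math.pi - x) / (5 * math.pi**2 - 4 * x * (math.pi - x))

def pade_sin(x):
    # The [3/3] Padé approximant reported by Wolfram Alpha above
    return (x - 7 * x**3 / 60) / (1 + x**2 / 20)

for x in (0.5, 1.0, math.pi / 2):
    print(x, math.sin(x), bhaskara(x), pade_sin(x))
```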
Thanks for this video! I'd never heard of Pade approximants, but they're clearly worth the attention! I usually watch videos like this solely for entertainment, but I think this content might actually see practical application in my day job. I appreciate the simplicity of constructing the approximant using the generally already widely-known Taylor series as a starting point. It makes it easier for old dogs like myself to remember a new trick.
Remarkable that in my university Maths degree, I didn't meet Pade Approximants. You explain the topic very clearly. I believe the B_0 = 1 assumption is fine unless the denominator polynomial vanishes at x = 0, which would imply a vertical asymptote in the overall function.
I found this video a year ago, before I knew calculus, so I wasn't able to follow through the part about "Taylor series(s)". Having just finished calc 1 and 2, I'm so glad I found this video again! Very insightful, thank you for sharing.
I appreciate that you switched to stressing the correct syllable midway through the video. Not only is the man French, but there's even an explicit diacritic in the last syllable of his name to make stress extra clear.
I've met the Padé approximation at university in my 5th semester. The name of the course was - as you can guess - "Approximation". :D There are other very interesting methods as well.
Nice video from you. :)
@@algorev8679 One is Chebyshev polynomials; another is Lagrange polynomials (they are used if you want to minimize the error over a large interval of the function, rather than just approximating around a point like Taylor series or Padé approximants). Check out the approximation theory article on Wikipedia
After seeing this video I remembered that I actually did learn about this in an advanced mathematics class during my Master's, although then with much higher-order terms. However, they never explained very well how useful it actually is, for simple approximations as well, so I quickly forgot about this. Thank you for the refresher, very well explained, and I will definitely keep the Padé approximation in mind as a useful tool in the future!
EDIT: I remember now that the assignment was to find a singularity in a polynomial of order 130. Using Mathematica and gradually decreasing the orders of N and M in the Padé polynomial allowed the finding of a function with the same symmetry as the original giant polynomial, but with much reduced terms. The derivative of this approximated polynomial could then be used to find the singularity, which did not change location during approximation. Just a cool example of a more complicated application of the Padé approximation method for those who are interested.
Great explanation. These are the kinds of topics you want to share with everyone, as a scientist, but want to keep quiet, as a company, in order to have an edge. Thank you much.
I frickin love Padé Approximants!
I came up with this back when I needed to numerically evaluate the exponential to high precision:
Given x∈[−.5*ln2, .5*ln2]
Let y=x²
Let a=(x/2)(1+5y/156(1+3y/550(1+y/1512)))/(1+3y/26(1+5y/396(1+y/450)))
Then e^x≈(1+a)/(1−a)
Accuracy is ~75.9666 bits.
7 increments, 1 decrement
6 constant multiplications
6 general multiplications
2 general divisions
1 div by 2 (right shift or decrement an exponent)
*Note:* range reduction is relatively easy to get x into that range.
The trick was when I used a Padé Approximant for e^x, carried it out to infinity, I realized it was of the form (1+f(x))/(1-f(x)). (Turns out f(x) is a scaled hyperbolic tangent) So then I found a Padé Approximant for hyperbolic tangent, and got that beautiful monstrosity ^~^ So essentially a Padé Approximant of a Padé Approximant !
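For anyone curious, here is that scheme transcribed directly into Python, taking the coefficients above at face value (only the range reduction is left out):

```python
import math

def exp_pade(x):
    # Valid for x in [-0.5*ln2, 0.5*ln2]; range-reduce first for general x.
    y = x * x
    # a approximates tanh(x/2), as described in the comment above
    a = (x / 2) * (1 + 5*y/156 * (1 + 3*y/550 * (1 + y/1512))) \
                / (1 + 3*y/26  * (1 + 5*y/396 * (1 + y/450)))
    return (1 + a) / (1 - a)

x = 0.3
print(exp_pade(x))  # should agree with math.exp(x) to near double precision
print(math.exp(x))
```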
I always regretted not taking a splines class when i was a grad student because I didn't understand NURBS (Non-uniform rational B-splines) and how to compute their coefficients. In very few minutes, you made it apparent.
This is what I call intuitive and useful math.
Diagnose a problem: Taylor series approximations eventually diverge to positive or negative infinity, but many important functions you want to approximate stay near or approach 0.
Find a solution: put the dominant term in the denominator.
I'll definitely keep this trick in my back pocket, thank you!
Physicists often use power series expansions because it's so easy to just cut off terms higher than some order we want for small values of x, saying that those are "negligible". I imagine picking a polynomial order "just high enough" would be tougher if it's in a ratio like the Padé approximant.
@@fatcatzero By construction, a Padé approximant is at least as accurate as a Taylor series, because we force it to match the coefficients of the Taylor series. It is just a lot more work to find them via an (N+M)-equation linear system. If the approximant reduces to something nice, as it does for sin, they can be very useful, but that is probably a rare case. In numerical analysis, you often trade simplicity for accuracy, which is what is happening here with Taylor vs Padé.
@@brandongroth4569 ah, I misinterpreted the original comment (mistook the point about "just high enough" to mean "bounding it to smaller than O(x^n) from the actual solution", not "do the easiest thing that will get me within O(x^n)").
@@fatcatzero oh yeah, my point was close to that. What I'm saying is: because a Taylor series results in a *sum*, you can just terminate the sum at the nth power to get high-enough accuracy, using a polynomial of order n. Now how can you do something like that if the approximant is a ratio? Do we terminate the powers at both the numerator and the denominator? By no means obvious to me.
@@geoffrygifari3377 if we start with our n'th degree Taylor series and set N+M=n for the Padé Approximant, it seems like it's always going to be at least as good of an approximation as O(x^n) since it's equal-to-or-better than our O(x^n) Taylor approximation. I literally just learned about this concept by watching this video so I by no means know the specifics, but if it is true that the M+N approximant is always at least as good the associated n-degree Taylor series, yes it's by definition more work and it could be hard to determine how much better it is, but the upper bound on deviation from the actual function seems to be the Taylor series where we know very well how to determine the size of our error. Do you think any of that is incorrect and/or am I still missing something about your concern with applying this method?
A great explanation of the Padé approximation. It is clear and simple. Allow me only to point out a small error in the recap (at 6:27). In the last line the equation shows {0 = C_{2n} + B_1*C_{2n+1} + ...} and I think it should be written as {0 = C_{n+m} + B_1*C_{n+m+1} + ...}, because the total number of equations is n+m; unless you were implying that n = m.
I once needed to look into a dataset that seemed to have an asymptotic line. I remembered rational functions from high-school algebra, did a regression analysis with those instead of polynomials, and it worked wonders. I never expected this to also be a thing for approximating existing functions, and I am so happy to learn about it here.
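That kind of fit is easy to reproduce with SciPy; a minimal sketch with a made-up [1/1] rational model and synthetic data (both are my own illustration, not the poster's setup):

```python
import numpy as np
from scipy.optimize import curve_fit

def rational(x, a, b, c):
    # [1/1] rational model: horizontal asymptote y -> a as x grows
    return (a * x + b) / (x + c)

# Synthetic data approaching an asymptote, with a little noise
rng = np.random.default_rng(0)
xdata = np.linspace(0.1, 10, 50)
ydata = (2 * xdata + 1) / (xdata + 3) + rng.normal(0, 0.01, xdata.size)

params, _ = curve_fit(rational, xdata, ydata, p0=(1.0, 1.0, 1.0))
print(params)  # should recover roughly a=2, b=1, c=3
```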
My story with this is, I've fit rational functions to curves before -- but largely by the "sit and stare at it for a while" method. In particular, I found a quite good fit for the transfer curve of an electronic temperature sensor (NTC thermistor resistor divider), using a low-order rational. I did not know about the method equating it to a Taylor series (and, I think, never read the Wikipedia article in great detail) -- fascinating!
However, the unfortunate reality is, division is expensive on computers. As I was working with a small one (embedded microcontroller), this would be unacceptable. I ended up using a 5th-degree polynomial solution -- made practical by the CPU's fast multiply instruction, allowing me to do five multiply-add operations in about a third the time of the rational method. (Some machines don't even have MUL, let alone DIV, and arithmetic has to be built entirely from add and shift operations. In that case, the rational might perform better!)
And that polynomial in turn, I actually arrived at using the set of Chebyshev polynomials: these have the property of smooth, orthogonal curves within a bounded (unit) region -- perfect for approximating a (mostly) smooth, bounded set of points. The curve-fitting optimizer didn't converge at all with the bare polynomial, while this orthogonal series converged quickly to good accuracy.
I'm not sure if I understand correctly, but regarding the final comment about curve-fitting: fitting a polynomial can be solved by least squares, so you can always get a solution.
@@MrAngryCucaracha I was just using Excel's Solver plugin, so, a general purpose, multidimensional optimizer. I didn't want to take the trouble to write something out. Ah, I looked it up, I remembered poorly it seems! The rational was 3:2 order, in canonical form; a divided-out form was used (1:2 + 1) only needing two multiplies and one divide, and addition is for free in comparison. It took about 420 CPU cycles, versus the flat polynomial's 310.
"Some machines don't even have MUL" - I'm looking at you, Enhanced Midrange PIC Microcontroller! Not only does this micro family lack MUL, the shift instruction only shifts by one bit. Want to multiply by 20? Shift, shift, store 4x, shift, shift, add 4x to 16x.
I'm missing the point. It's cool and all, but there are two buts: 1) yes, Taylor series do not extrapolate well, but that's not the point of Taylor series; they are specifically used to approximate the function in some small area near some point. 2) the [N/0] Padé approximant is the same as the Taylor series, and then you have the other N versions of Padé approximants - [N-1/1], [N-2/2], etc. It seems unfair to say that Padé approximants work better than Taylor series, since Padé approximants are a direct extension of Taylor series, plus you can cheat by freely choosing how to split N into [N-M, M].
Not to mention that the Taylor expansion, if it exists, is guaranteed to be analytic in the complex plane, since it's a polynomial. [N/M] Padé approximations will introduce poles if M > 0.
I agree. I would like to see a better example than sin(x), because if your goal is to extrapolate an oscillating function, you want it to continue to oscillate; that behavior is completely lost and you are left with a much more troubling outcome: expecting some accuracy but getting none. The Padé approximation implies that at a certain point sin is basically 0, which is not true at all. Maybe it works better for some functions, but I don't know of a result that gives some kind of bound on how much "longer" it stays accurate and what kind of accuracy you are given. I'm sure there's a use for it, or this approximation would be buried in numerical analysis textbooks and never reach the light of YouTube.
I guess the point is that it is a meaningful way to extend the Taylor expansion for a better approximation with the same number of parameters. Often the limiting behavior tells you something about the behavior near a point. Choosing among [N - M/M] also depends on the limiting behavior. In the examples given here, e^-x and sin x, the limiting behavior is of constant order. So it's natural to choose N=2M, and I have a feeling that it gives the best local approximation among all combinations, including the Taylor approximation.
Edit: the asymptotic behavior of e^-x should be approaching zero faster than any finite polynomial. So the best one should be [0/N].
Worst-case analysis: there are functions for which Padé approximation is no better than Taylor series approximation. Most functions: Padé approximants are better, especially for functions with singularities.
Also, a Taylor series is more suitable for both taking derivatives and integrating, because it's just a sum of easy functions. It's much harder to do this with Padé approximations.
Pretty cool video! The general idea for the approximation reminds me a lot of the process one uses to expand a fraction into a p-adic sum, to get its p-adic representation. After years of doing math, this whole idea of actually using rational functions, a thing that we commonly ignore due to them being "ugly", is shining a new light. Keep up the good work!
Very interesting explanation, I'm currently studying electrical engineering and I've just been slightly introduced to a Padé approximant during a Control Systems practice, but never learned the algebra behind it in previous calculus classes.
This was really interesting. A professor at my college did a lot of research with approximants known as “Chromatic Derivatives”, and these share a similar motivation.
Another great thing is that rational functions are often procedurally integrable. We can always factor the denominator into linear and quadratic factors (because of conjugate pairs), then apply partial fractions. Terms of the form linear/quadratic can be dealt with using u-sub and arctan. If the quadratic is a perfect square then we only need u-sub. If there is an (ax+b)^2 - c form with positive c, we can factor further using difference of squares.
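SymPy will walk through exactly that procedure; a small sketch, using the [3/3] sin approximant quoted earlier in the thread as the integrand:

```python
from sympy import symbols, apart, integrate, Rational

x = symbols('x')
# The [3/3] Padé approximant of sin(x) mentioned in an earlier comment
f = (x - Rational(7, 60) * x**3) / (1 + x**2 / 20)

print(apart(f, x))      # polynomial part plus a proper rational remainder
print(integrate(f, x))  # closed form: here a polynomial plus a log term
```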
I have read about Padé schemes to discretize a partial differential operator in Computational fluid dynamics a while ago but thanks for making the advantage over Taylor series more visible. This could be of great use in computational engineering simulations to avoid divergence.
Very nice and very simple, makes perfect sense. I feel that these Padé approximants can greatly improve approximate series solutions to difference and differential equations in general.
You can use the same concept with Legendre or Chebyshev polynomials. There are two interesting ways to do it, too. You can actually work from the Legendre or Chebyshev series like with a Taylor series, and the average/max error will still be much smaller than using a Taylor series. Secondly, you can actually calculate regression over the whole series, and these will behave even slightly better; you can apply this same concept to a Padé approximant too, and it does work, but results can vary depending on the series. Probably the best way to do it, however, is by using a matrix transform of integrals, but that's a little complicated to explain; it's basically equivalent to rational regression.
For Chebyshev polynomials, it's best if you can work from f(cos(pi*x)) instead of f(x) so you don't have to deal with the weight function. That way it reduces to the Fourier transform over a finite interval. I could explain either algorithm in more detail or share some MATLAB code if anybody cares, but it's a bit of effort if nobody is gonna read/use this.
Oh, and there's one more related concept that's pretty cool. You can calculate a spline from a multipoint Taylor series and it will pass through the points like a Lagrange polynomial, but it will be much better behaved than a Lagrange polynomial. You can have as many continuous derivatives as you like, so it will behave very nicely. The math behind this is kind of complicated: getting the Taylor series to match across multiple points involves some tedious matrix transformations, but the result is still really cool.
@@DrWillWood No problem. Orthogonal polynomials (and approximation theory in general) are some of my favorite topics. I could talk your ear off for hours about them. I could write a book about all the cool properties of Jacobi polynomials, but I feel like I'm the only person who gets excited about that stuff, even in math nerd world. The math behind it is just so elegant.
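For the Chebyshev-series side of this, NumPy has the machinery built in; a quick sketch of a least-squares fit in the Chebyshev basis (the target function e^x is just my example):

```python
import numpy as np
from numpy.polynomial import Chebyshev

# Least-squares fit of e^x on [-1, 1] in the Chebyshev basis
x = np.linspace(-1, 1, 200)
y = np.exp(x)

cheb = Chebyshev.fit(x, y, deg=5)

# The error is spread far more evenly across the interval than a degree-5
# Taylor polynomial's error, which is tiny near 0 and grows at the ends.
print(np.max(np.abs(cheb(x) - y)))
```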
I'm gonna guess Padé approximations win when a function does not diverge to infinity, but Taylor approximations win when it does (like e^x). Cool stuff!
You have such a nice style, in the communication (both voice and visual) but also, maybe even more so, a nice "pace" of thinking and a swell difficulty level.
I remember there was a Fortran course where we had to compute all sorts of fits using matrix algebra. Padé was one of them (as was Fourier, etc..). I had all but forgotten about it, thanks for sharing!
have you "nearly forgotten about it" or "not forgotten about it"? I'm dying to know, as I never get what "all but" is supposed to mean (I am not a native speaker). Thank you!
Thank you! But a few comments: you start with the limitations of Taylor series, but constructing the Padé approximation still requires a Taylor series to yield the coefficients. What about Fourier series? The space of L^p integrable functions is much larger, and the Fejér kernel also guarantees uniform convergence.
Actually, the Padé approximation is often used in control theory as a supplement to the Laplace transform, which is the more complete version of the Fourier transform.
When watching this I asked myself, why can’t we just construct the first M+N terms of the Taylor series of x^M*f(x), then divide by x^M? Then I realized the extra 1 in the denominator helps us avoid vertical asymptotes by moving those asymptotes into the complex plane.
Saw this in complex analysis lessons. When a question asks for a gap between numbers and the Taylor series becomes infinite there, we had to use this so it goes to 0 instead. Didn't know it had a name and thought it was a part of the Taylor series.
The Taylor series is by definition only valid for values close to zero. So when Padé-modelling a function via this Taylor series, together with the inherently lower net order of the Padé function compared to O(x^n), it is logical that the Padé approximation stays closer in many cases; but it is actually sheer luck what value one ends up with when x is farther from zero.
I really like this video. Very well explained. The concept could obviously (so I'm sure someone did it, and that it has a fancy name) be generalized to the idea of also thinking about how we want the approximation to behave as x → ∞, by modelling things as a sum of functions F(x) (fractions of polynomials here) whose limit as x → ∞ defines how you want the approximation to behave for extrapolation. That way we could also think about the derivatives of the approximations at infinity.
A few notes: a Taylor series actually can roughly follow a function. To test one for "quality" we use convergence tests. They are very much the reason people hate Taylor series: as easy as calculating a series is, it is much harder to find a good one that won't shoot into orbit. There are many ways to measure a series' "quality", but mostly people test for convergence itself and the fluctuation of each individual step. Padé (being basically two series glued together) has more room for counterbalancing this very problem. And also, though more resource-consuming to calculate, Padé not only gives a more accurate result but also has quite an interesting beginning of the series, and I believe on some occasions you can optimise the calculations, without harming accuracy, by influencing how it starts.
Great little video bringing a less well-known topic to a larger audience! I too was wondering why you get something for nothing by just using Padé approximants, but I think I have it figured out:
1. Taylor series aren't meant to approximate functions well far away from the centre of expansion, and so the behaviour near infinity should not have been the basis for comparison. If you just look at how good the two approximations at 5:00 are near 0, they seem to diverge from sin(x) around the same time. (In fact, the Taylor series actually does *better*: the leading term for the error of the Taylor series is x^5/120, while that for the Padé approximant is 7x^5/360.)
2. Furthermore, the behaviour of the Padé approximant at infinity is basically determined by the order [N/M]; the approximant you gave for sin(x) stays bounded because you chose N < M.
I was just thinking the same thing - and surely if you wanted a closer approximation of the original function (in either case) you'd just add more terms? Obviously it becomes infeasible at some point (already after like 6 terms it's just annoying), but that's just the nature of approximations. Maybe it's just because I did a physics degree and not a mathematics one, but we're dealing with approximations here anyway - the slight difference close to the relevant point is probably not significant, and if it is, you just add another term.
It's really interesting that you choose the degree of the numerator and denominator, which sort of means you're deciding the asymptotic behavior. Then again, if a coefficient turns out to be 0 then the degree might not be what you wanted. On that note, you set this up as a 0th degree approximation but ended up with a -1st degree approximation. That's crazy to me. It's like the approximation knew that the function oscillates around 0 even though the first few terms of the Taylor series don't suggest this at all.
Hmm, isn't the reason why the Padé approximation tends not to shoot to infinity because you chose N < M here? That is, now you've got the higher polynomial in the denominator, and that will eventually make it go to zero. I suspect choosing N > M will once again make it go to infinity, and M = N will make it go towards some fixed limit value.
Achieving a better approximation with almost the same number of operations makes this Padé approximation very appealing! At the same time, the zeros of the polynomial in the denominator are scary. It would be horrible to have a discontinuity in unwanted places (e.g. a discontinuity somewhere on x>0). Nevertheless, cool stuff!
That's another interesting behaviour: the zeros in the denominator often come up at wanted places. Test it with ln(x+1)... all the zeros are to the left of -1, the rightmost one converging to -1 as the orders rise. AND in contrast to the Taylor series, the Padé approximants also converge for large positive x! That's not only true for this example; Padé approximants do not have the convergence radius problem.
That's actually a strength, not a weakness. Because Padé approximations will also match the poles of the function. Whereas Taylor series stop working as soon as you hit the distance to the nearest pole.
@@matthiash6764 Right. Got confused with extrapolation methods in the study of phase transitions, which make use of _both_ conformal maps to shift the poles away, and Padé approximants. I'll fix my post rn. Edit: not sure of the name for the method in question. I think it's just called extrapolation
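The ln(x+1) behaviour described a few comments up is easy to see numerically; a rough sketch, again leaning on SciPy's pade helper:

```python
import numpy as np
from scipy.interpolate import pade

# Taylor coefficients of ln(1+x): 0, 1, -1/2, 1/3, -1/4, 1/5, -1/6
c = [0.0] + [(-1.0) ** (k + 1) / k for k in range(1, 7)]
p, q = pade(c, 3)  # [3/3] Padé approximant

x = 3.0  # well outside the Taylor radius of convergence (|x| < 1)
print(p(x) / q(x))    # Padé estimate
print(np.log(1 + x))  # true value, ln(4)
print(sum(ci * x**i for i, ci in enumerate(c)))  # truncated Taylor: way off
```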
"One polynomial divided by another," is not a polynomial of any kind; it is called a "rational function." "Rational polynomial" makes it sound like it's a polynomial with rational coefficients. That said, this is a method I hadn't run across. It's a wonderful idea! And you've presented it beautifully and concisely. Great job! The "unreasonableness" of its effectiveness, reminded me of approximating a real number with a continued fraction, as opposed to using, say, a decimal approximation. I echo Grant Sanderson's (3B1B) thoughts on that. Fred
Evaluating the Taylor series costs three multiplications. Evaluating the Padé-approximation costs four multiplications and one division. For a fair comparison you should give the Taylor expansion an additional term.
I saw this and thought it said Padme approximations and was supremely disappointed it had nothing to do with Star Wars, but very interesting math conversation.
Amazing! It's great to see a topic that feels like I could've invented it by myself but I never thought to ask the right question. Also, do the first [m,n] coefficients in the [m+1,n+1] Padé approximant match the [m,n] approximant?
I think so! My thought would be that this is true because the higher orders in the Padé come from the higher orders in the Taylor. I'm not certain in all cases though!
I'm excited!
Padé approximations shine in analog filter design where you have poles and zeros. They are particularly effective in analog delay lines.
Beautiful
Nerd
As soon as he mentioned the fractional polynomial, my mind went to Laplace and filters. It's such a natural fit.
Thanks, I was looking for a comment mentioning its use-case areas.
The algorithm recommended this video to me. I'm so thankful, because this is beautiful and very useful.
@@chonchjohnch Which part?
Thank you @carl. You almost made me shed a tear with your post.
This comment should be pinned - it adds so much extra information to an already excellent video!
I've never heard of this before, and after judging so many bad entries, this is a breath of fresh air
Thanks a lot! It's one of my favourite things about maths YouTube, coming across concepts you wouldn't have otherwise! Also, good luck with SoME1 :-)
@@DrWillWood thank you for the good luck. Check out mine if you have time (it's a 5-part series, but for judging, only the first part is required).
Hello there! It's been a year. I just watched the video, and now I wonder what you managed to do since then
For a detailed description you should also see the video series of lectures on YouTube by Carl Bender, "Mathematical Physics".
Finally! I encountered these so often in physics papers. Finally I get it!
What physics papers? I was just wondering how this could be applied in physics. Can you give some reference please? Thanks!
In which fields did you find those Padé approximations in use?
That's interesting. Are there any approximations that are locally better than the Taylor series, around some specific point?
Is that true in general for periodic functions? What about non-periodic functions?
@@ichigonixsun That's interesting, could you please share the names of those algorithms? I want to test them.
@@Solution4uTx CORDIC, for example. There are many others with different use cases.
Mech Eng grad student here. This is my "did you know?!" flex for the next couple weeks. Amazing video, thanks!!
Let's be realistic though...you and I both know we're still gonna regard *sin(x)≈x* _for small x_ lol
Basically you are using M+N terms of the Taylor series but only M and N terms for evaluation. This is computationally very efficient.
@@isaackay5887 MechEs that work in Rotordynamics: *small* x? what you mean!?
🤣
If you want to be a summation god then check out the book by Carl Bender. Elite 💪
I really like how fast you managed to explain it! Only few math videos get a topic like this explained in under 7 minutes
@@DeathStocker Awesome!! thanks for that :-)
Suffering indeed 😂🎉
They are widely used whenever you need to evaluate sin(x) or log up to a fairly high accuracy.
What other interesting methods did you see?
We know Spanish people have no contributions to math or science in general; is this the first one?
@@algorev8679 One is Chebyshev polynomials; another is Lagrange polynomials (they are used if you want to minimize the error over a large interval of the function, not just approximate around a point like Taylor series or Padé approximants do). Check out the approximation theory article on Wikipedia.
This stuff is like the authors you never heard of, but were notable for their time :-)
I'm stunned I've never heard of this before, given how important Taylor series approximations are.
After seeing this video I remembered that I actually did learn about this in an advanced mathematics class during my Master's, although with much higher-order terms. However, they never explained very well how useful it actually is, for simple approximations as well, so I quickly forgot about it. Thank you for the refresher, very well explained, and I will definitely keep Padé approximation in mind as a useful tool in the future!
EDIT: I remember now that the assignment was to find a singularity in a polynomial of order 130. Using Mathematica and gradually decreasing the orders N and M of the Padé polynomial allowed us to find a function with the same symmetry as the original giant polynomial, but with far fewer terms. The derivative of this approximated polynomial could then be used to find the singularity, which did not change location during approximation. Just a cool example of a more complicated application of the Padé approximation method, for those who are interested.
Great explanation. These are the kinds of topics you want to share with everyone, as a scientist, but want to keep quiet, as a company, in order to have an edge. Thank you much.
I frickin love Padé Approximants!
I came up with this back when I needed to numerically evaluate exponential to high precision:
Given x ∈ [−0.5·ln 2, 0.5·ln 2]
Let y = x²
Let a = (x/2)·(1 + (5y/156)·(1 + (3y/550)·(1 + y/1512))) / (1 + (3y/26)·(1 + (5y/396)·(1 + y/450)))
Then e^x ≈ (1 + a)/(1 − a)
Accuracy is ~75.9666 bits.
7 increments, 1 decrement
6 constant multiplications
6 general multiplications
2 general divisions
1 div by 2 (right shift or decrement an exponent)
(Ugh, this looks way uglier typed out without formatting)
*Note:* range reduction is relatively easy to get x in that range.
The trick was that when I took a Padé approximant for e^x and carried it out to infinity, I realized it was of the form (1+f(x))/(1-f(x)).
(Turns out f(x) is a scaled hyperbolic tangent)
So then I found a Padé Approximant for hyperbolic tangent, and got that beautiful monstrosity ^~^
So essentially a Padé Approximant of a Padé Approximant !
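In case anyone wants to sanity-check this, here is a Python sketch using mpmath, so the error measurement isn't capped by double precision. The nesting is how I read the formula above, so treat it as a transcription rather than gospel:

    from mpmath import mp, mpf, exp, log

    mp.dps = 40  # enough guard digits that the measurement is meaningful

    def exp_pade(x):
        # assumes x in [-0.5*ln2, 0.5*ln2]; nesting as read from the comment above
        y = x*x
        num = 1 + (5*y/156)*(1 + (3*y/550)*(1 + y/1512))
        den = 1 + (3*y/26)*(1 + (5*y/396)*(1 + y/450))
        a = (x/2)*num/den
        return (1 + a)/(1 - a)

    half_ln2 = log(2)/2
    worst = mpf(0)
    for k in range(-1000, 1001):
        x = half_ln2*k/1000
        worst = max(worst, abs(exp_pade(x)/exp(x) - 1))
    print(-log(worst, 2))  # bits of accuracy; should land near the ~76 claimed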
I always regretted not taking a splines class when I was a grad student, because I didn't understand NURBS (non-uniform rational B-splines) and how to compute their coefficients. In very few minutes, you made it apparent.
This is what I call intuitive and useful math.
Diagnose a problem: Taylor series approximations always diverge quickly to positive or negative infinity but many important functions you want to approximate stay near or approach 0.
Find a solution: put the dominant term in the denominator.
I'll definitely keep this trick in my back pocket, thank you!
As a calc 2 TA, I absolutely loved this
This video is way better than the book I read which skipped a lot of the information relating to the Pade approximation. Thank you! This is brilliant!
Thanks so much!
Physicists often use power series expansions because it's so easy to just cut off terms higher than some order for small values of x, saying that those are "negligible". I imagine picking a polynomial order "just high enough" would be tougher if it's in a ratio like the Padé approximant.
Is the Padé Approximant ever less accurate than the same-order Taylor series?
@@fatcatzero By construction, a Padé approximant is at least as accurate as the Taylor series, because we force it to match the coefficients of the Taylor series. It is just a lot more work to find it, via an (N+M)-dimensional linear system of equations. If the approximant reduces to something nice, like it does for sin, it can be very useful, but that is probably a rare case. In numerical analysis you often trade simplicity for accuracy, which is what is happening here with Taylor vs Padé.
@@brandongroth4569 ah, I misinterpreted the original comment (mistook the point about "just high enough" to mean "bounding it to smaller than O(x^n) from the actual solution", not "do the easiest thing that will get me within O(x^n)").
@@fatcatzero oh yeah, my point was close to that. what i'm saying is because taylor series results in a *sum* , you can just terminate the sum at nth power to get high-enough accuracy, using polynomial of order n. Now how can you do something like that if the approximant is a ratio? do we terminate the powers at both numerator and denominator? by no means obvious to me
@@geoffrygifari3377 if we start with our n'th degree Taylor series and set N+M=n for the Padé Approximant, it seems like it's always going to be at least as good of an approximation as O(x^n) since it's equal-to-or-better than our O(x^n) Taylor approximation.
I literally just learned about this concept by watching this video, so I by no means know the specifics. But if it is true that the [M+N] approximant is always at least as good as the associated n-degree Taylor series, then yes, it's by definition more work, and it could be hard to determine how much better it is; but the upper bound on the deviation from the actual function seems to be the Taylor series, where we know very well how to determine the size of our error.
Do you think any of that is incorrect and/or am I still missing something about your concern with applying this method?
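If it helps make this thread concrete, here is a quick numerical check in Python, pitting the standard [2/2] Padé of e^-x against the 4th-order Taylor polynomial (same N+M = 4 budget; the [2/2] choice here is mine, not necessarily the video's):

    import math

    def taylor4(x):   # e^-x truncated after the x^4 term
        return 1 - x + x**2/2 - x**3/6 + x**4/24

    def pade22(x):    # the standard [2/2] Pade approximant of e^-x
        return (1 - x/2 + x**2/12) / (1 + x/2 + x**2/12)

    for x in (0.5, 1.0, 2.0, 4.0):
        exact = math.exp(-x)
        print(x, abs(taylor4(x) - exact), abs(pade22(x) - exact))
    # the Pade error is smaller at every one of these points,
    # and dramatically so once x > 1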
A great explanation for Padé approximation. It is clear and simple.
Allow me only to point out a small error in the Recap (at 6:27). In the last line the equation shows {0 = C_{2n} + B_1*C_{2n+1} + ...}, and I think it should be written as {0 = C_{n+m} + B_1*C_{n+m+1} + ...}, because the total number of equations is n+m; unless you were implying that n = m.
I once needed to look into a dataset that seemed to have an asymptote. I remembered rational functions from high-school algebra, did a regression analysis with those instead of polynomials, and it worked wonders. I never expected this to also be a thing for approximating existing functions, and I am so happy to learn about it here.
My story with this is, I've fit rational functions to curves before -- but largely by the "sit and stare at it for a while" method. In particular, I found a quite good fit for the transfer curve of an electronic temperature sensor (NTC thermistor resistor divider), using a low-order rational. I did not know about the method equating it to a Taylor series (and, I think, never read the Wikipedia article in great detail) -- fascinating!
However, the unfortunate reality is, division is expensive on computers. As I was working with a small one (embedded microcontroller), this would be unacceptable. I ended up using a 5th degree polynomial solution -- made practical by the CPU's fast multiply instruction, allowing me to do five multiply-add operations in about a third the time of the rational method. (Some machines don't even have MUL, let alone DIV, and arithmetic has to be built entirely from add and shift operations. In that case, the rational might perform better!)
And that polynomial in turn, I actually arrived at using the set of Chebyshev polynomials: these have the property of smooth, orthogonal curves within a bounded (unit) region -- perfect for approximating a (mostly) smooth, bounded set of points. The curve-fitting optimizer didn't converge at all with the bare polynomial, while this orthogonal series converged quickly to good accuracy.
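For anyone curious, numpy makes that kind of Chebyshev-basis fit easy to try. A minimal sketch with made-up thermistor-like data (the real transfer curve, degree, and domain were specific to that project):

    import numpy as np

    # made-up stand-in for a thermistor divider transfer curve (illustration only)
    adc = np.linspace(0.05, 0.95, 200)               # normalized ADC reading
    temp = 25 + 60*np.log(adc/(1 - adc))/np.log(9)   # hypothetical degrees C

    # degree-5 fit in the Chebyshev basis; numpy rescales the data domain
    # to [-1, 1] internally, which is what keeps the fit well-conditioned
    cheb = np.polynomial.Chebyshev.fit(adc, temp, 5)
    print(np.max(np.abs(cheb(adc) - temp)))          # worst-case fit error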
I'm not sure if I understand correctly, but regarding the final comment about curve-fitting: fitting a polynomial can be solved by least squares, so you can always get a solution.
@@MrAngryCucaracha I was just using Excel's Solver plugin, so, a general purpose, multidimensional optimizer. I didn't want to take the trouble to write something out.
Ah, I looked it up; it seems I remembered poorly! The rational was of 3:2 order in canonical form; a divided-out form was used (1:2 + 1), needing only two multiplies and one divide (addition is free in comparison). It took about 420 CPU cycles, versus the flat polynomial's 310.
"Some machines don't even have MUL" - I'm looking at you, Enhanced Midrange PIC Microcontroller! Not only does this micro family lack MUL, the shift instruction only shifts by one bit. Want to multiply by 20? Shift, shift, store 4x, shift, shift, add 4x to 16x.
The first time I saw Padé approximants was in a paper proving the transcendence of e. Thanks for the useful discussion!
I'm missing the point. It's cool and all, but there are two buts:
1) Yes, Taylor series do not extrapolate well, but that's not the point of Taylor series; they are specifically used to approximate the function in some small area near some point.
2) The [N/0] Padé approximant is the same as the Taylor series, and then you have the other N versions of Padé approximants: [N-1/1], [N-2/2], etc.
It seems unfair to say that Padé approximants work better than Taylor series, since Padé approximants are a direct extension of Taylor series, plus you can cheat by freely choosing how to split N into [N-M/M].
Not to mention that the Taylor expansion, if it exists, is guaranteed to be analytic in the complex plane, since it's a polynomial.
[N/M] Padé approximations will introduce poles if M > 0.
I agree. I would like to see a better example than sin(x), because if your goal is to extrapolate an oscillating function, you want it to keep oscillating; that behavior is completely lost here, and you are left with a much more troubling outcome: expecting some accuracy but getting none.
The Padé approximation implies that beyond a certain point sin is basically 0, which is not true at all.
Maybe it works better for some functions, but I don't know of a result that says how much "longer" it stays accurate or what kind of accuracy you are given.
I'm sure there's a use for it, or this approximation would be buried in numerical analysis textbooks and never reach the light of YouTube.
I guess the point is that it is a meaningful way to extend the Taylor expansion into a better approximation with the same number of parameters. Often the limiting behavior tells you something about the behavior near a point. Choosing among the [N-M/M] options also depends on the limiting behavior. In the examples given here, e^-x and sin x, the limiting behavior is of constant order, so it's natural to choose N=2M, and I have a feeling that this gives the best local approximation among all combinations, including the Taylor approximation.
Edit: the asymptotic behavior of e^-x is to approach zero faster than all finite polynomials, so the best choice should be [0/N].
Worst-case analysis: There are functions for which Padé approximation is no better than Taylor series approximation.
Most functions: Padé approximants are better, especially for functions with singularities.
Also, a Taylor series is more suitable for both taking derivatives and integrating, because it's just a sum of easy functions. It's much harder to do this with Padé approximations.
Excellent. Hadn't known about that method. I like that the estimate (typically) tends to zero in the long term (though nothing was said about how the choice of N and M affects this).
Forget about Padé approximants, I'm sold on this guy's accent.
Pretty cool video! The general idea of the approximation reminds me a lot of the process one uses to expand a fraction into a p-adic sum, to get its p-adic representation. After years of doing math, this whole idea of actually using rational functions, a thing we commonly ignore because they're "ugly", is shining in a new light. Keep up the good work!
This is a great video. Well made, simple, clearly explained, genuinely interesting. Awesome.
Very interesting explanation. I'm currently studying electrical engineering and was briefly introduced to a Padé approximant during a Control Systems practical, but never learned the algebra behind it in previous calculus classes.
If I understand Taylor and Padé approximations, then the essence of why Padé follows the curve for a longer time is essentially given at 2:37
This was really interesting. A professor at my college did a lot of research with approximants known as “Chromatic Derivatives”, and these share a similar motivation.
Very cool. I'm somewhat surprised I was never introduced to this in numerical analysis
this is a really neat method of approximation explained very clearly and practically! nice vid
Immediately subbed. Definitely great content, keep up the good work.
You can enable playlist saving. Probably was a problem on my end, then.
Another great thing is that rational functions are often procedurally integrable. We can always factor the denominator into linear and quadratic factors (because of conjugate pairs) and then apply partial fractions. Terms of the form linear/quadratic can be dealt with using u-substitution and arctan. If the quadratic is a perfect square, then we only need the u-substitution. If there is an (ax+b)^2 - c form with positive c, we can factor further using the difference of squares.
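A quick sympy illustration of this, using the [3/2] sin approximant from earlier in the thread; apart() does the partial-fraction step, and the antiderivative comes out in elementary terms:

    import sympy as sp

    x = sp.symbols('x')
    R = (x - 7*x**3/60) / (1 + x**2/20)   # the [3/2] Pade of sin from earlier
    print(sp.apart(R, x))      # a linear polynomial part plus an x/quadratic term
    print(sp.integrate(R, x))  # elementary closed form with a log term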
I read about Padé schemes for discretizing partial differential operators in computational fluid dynamics a while ago, but thanks for making the advantage over Taylor series more visible. This could be of great use in computational engineering simulations to avoid divergence.
Very nice and very simple, makes perfect sense. I feel that these Padé approximants can greatly improve approximate series solutions to difference and differential equations in general.
Rewatching, because this thing has a lot of potential and I might apply it in my projects.
youtube algorithm brought me here. strong work, keep going!
Awesome! thanks a lot!
This is absolutely brilliant.
This was fun to watch! Definitely good points on the typical Taylor series divergence too. How cool!
I saw some clips on this topic recently. They use AI to build a best-fit function, and it is much more exact than earlier fitted formulas.
Thanks, never heard of this approximation before.
You can apply the same concept to Legendre or Chebyshev polynomials. There are two interesting ways to do it, too.
You can work from the Legendre or Chebyshev series just as with a Taylor series, and the average/max error will still be much smaller than with a Taylor series. Secondly, you can calculate a regression over the whole series, and those behave even slightly better; you can apply the same concept to a Padé approximant too, and it does work, but results can vary depending on the series.
Probably the best way to do it, however, is with a matrix transform of integrals, but that's a little complicated to explain; it's basically equivalent to rational regression. For Chebyshev polynomials, it's best if you can work from f(cos(pi*x)) instead of f(x) so you don't have to deal with the weight function. That way it reduces to the Fourier transform over a finite interval.
I could explain either algorithm in more detail or share some Matlab code if anybody cares, but it's a bit of effort if nobody is going to read/use this.
Oh, and there's one more related concept that's pretty cool. You can calculate a spline from a multipoint Taylor series, and it will pass through the points like a Lagrange polynomial, but it will be much better behaved than a Lagrange polynomial. You can have as many continuous derivatives as you like, so it behaves very nicely. The math behind this is kind of complicated; getting the Taylor series to match across multiple points involves some tedious matrix transformations, but the result is still really cool.
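For anyone who'd like to see the f(cos θ) trick from the comment above in action, here is a minimal numpy sketch (Gauss-Chebyshev node sampling, so the weight function never appears explicitly):

    import numpy as np

    def cheb_coeffs(f, n):
        # sample f at the Chebyshev nodes cos(theta_k), theta_k = pi*(k+1/2)/n;
        # the coefficient formula is then just a discrete cosine transform
        theta = np.pi*(np.arange(n) + 0.5)/n
        fx = f(np.cos(theta))
        c = (2.0/n)*np.array([np.sum(fx*np.cos(j*theta)) for j in range(n)])
        c[0] /= 2   # after halving c[0], f(x) ~ sum_j c[j]*T_j(x) on [-1, 1]
        return c

    # Chebyshev series of e^x on [-1, 1]: the coefficients decay very fast
    print(cheb_coeffs(np.exp, 8))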
Awesome! thanks a lot :-)
@@DrWillWood No problem. Orthogonal polynomials (and approximation theory in general) are some of my favorite topics. I could talk your ear off for hours about them. I could write a book about all the cool properties of Jacobi polynomials, but I feel like I'm the only person who gets excited about that stuff, even in math nerd world. The math behind it is just so elegant.
@@cbbuntz You are most definitely not the only one! Thank you for posting.
@@dlevi67 yay! I don't have anyone to talk to about this stuff so it's good to find another nerd
Never heard of Pade Approx. Thank you.
I'm gonna guess Padé approximations win when a function does not diverge to infinity, but Taylor approximations win when it does (like e^x). Cool stuff!
You have such a nice style in communicating (both voice and visual), but also, maybe even more so, a nice "pace" of thinking and a swell difficulty level.
Oh thank you for the explanation, I will try to apply the padé approximants
Thank you very much! All the explanations I found on the Internet were quite difficult for me to understand. You've done really cool work!
I remember there was a Fortran course where we had to compute all sorts of fits using matrix algebra. Padé was one of them (as was Fourier, etc..).
I had all but forgotten about it, thanks for sharing!
have you "nearly forgotten about it" or "not forgotten about it"? I'm dying to know, as I never get what "all but" is supposed to mean (I am not a native speaker). Thank you!
Very cool channel, thanks for making these videos!
Thank you! But a few comments: you start from the limitations of the Taylor series, yet constructing the Padé approximation still requires the Taylor series to supply its coefficients.
What about Fourier series? The space of L^p integrable functions is much larger, and the Fejér kernel also guarantees uniform convergence.
Actually, the Padé approximation is often used in control theory as a supplement to the Laplace transform, itself the more complete version of the Fourier transform.
Excellent Presentation. Concise, clear, and thoroughly enjoyable.
Thanks a lot!
When watching this I asked myself, why can’t we just construct the first M+N terms of the Taylor series of x^M*f(x), then divide by x^M?
Then I realized the extra 1 in the denominator helps us avoid vertical asymptotes by moving those asymptotes into the complex plane.
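A tiny sympy check of that, using the sin approximant from earlier: the denominator's roots are purely imaginary, so the poles sit safely off the real axis:

    import sympy as sp

    x = sp.symbols('x')
    # denominator of the sin approximant (x - 7x^3/60)/(1 + x^2/20)
    print(sp.solve(1 + x**2/20, x))   # [-2*sqrt(5)*I, 2*sqrt(5)*I]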
Saw this in complex analysis lessons. When a question asked about the region between numbers where the Taylor series blows up, we had to use this so it goes to 0 instead. Didn't know it had a name; I thought it was part of the Taylor series.
The Taylor series is by definition only valid for values close to zero, so when Padé-modelling a function via this Taylor series, together with the inherently lower net order of the Padé function compared to O(x^n), it is logical that the Padé approximation stays closer in many cases; but it is actually sheer luck what value one ends up with when x is far from zero.
only valid for values close to x*
@@Doomeiner I think he got it confused with the Maclaurin series?
I'm not sure though, I have only really learned these approximations recently.
I really like this video. Very well explained.
The concept could obviously be generalized (so I'm sure someone did it, and that it has a fancy name) to the idea of also deciding how we want the approximation to behave as x -> inf: model things as a sum of functions F(x) (fractions of polynomials here) whose limits as x -> inf define how you want the approximation to behave for extrapolation. That way we could also think about the derivatives of the approximation at infinity.
Incredibly useful for curve fitting with functions that you know should tend to zero as x-> inf
A few notes: a Taylor series actually can roughly follow the function; to test one for "quality" we use convergence tests. This is very much why people hate Taylor series: as easy as computing a series is, it's much harder to find a good one that won't shoot off into orbit. There are many ways to measure a series' "quality", but mostly people test for convergence itself and for the fluctuation of each individual term. Padé (being basically two series glued together) has more room for counterbalancing this very problem.
And also, Padé, though more resource-consuming to calculate, not only gives a more accurate result but also has quite an interesting beginning of the series, and I believe on some occasions you can, without harming accuracy, optimize the calculation by influencing how it starts.
Very interesting. I had heard of the Padé approximation a few times, but I did not know why we use it.
Great little video bringing a less well-known topic to a larger audience!
I too was wondering why you get something for nothing by just using Padé approximants, but I think I have it figured out:
1. Taylor series aren't meant to approximate functions well far away from the centre of expansion, and so the behaviour near infinity should not have been the basis for comparison. If you just look at how good the two approximations at 5:00 are near 0, they seem to diverge from sin(x) around the same time. (In fact, the Taylor series actually does *better*: the leading term for the error of the Taylor series is x^5/120, while that for the Padé approximant is 7x^5/360.)
2. Furthermore, the behaviour of the Padé approximant at infinity is basically determined by the order [N/M]; the approximant you gave for sin(x) stays bounded because you chose N ≤ M (and since sin is odd, the numerator's effective degree is actually below the denominator's, so the approximant even decays to zero).
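For what it's worth, both leading-error claims in point 1 are easy to confirm with sympy, assuming the video's sin approximant is x/(1 + x²/6), the [2/2] Padé of sin (that's the choice that reproduces the 7x⁵/360 figure):

    import sympy as sp

    x = sp.symbols('x')
    print(sp.series(sp.sin(x) - (x - x**3/6), x, 0, 7))    # x**5/120 + O(x**7)
    print(sp.series(sp.sin(x) - x/(1 + x**2/6), x, 0, 7))  # -7*x**5/360 + O(x**7)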
I was just thinking the same thing; and surely, if you wanted a closer approximation of the original function (in either case), you'd just add more terms? Obviously it becomes infeasible at some point (already after about 6 terms it's just annoying), but that's just the nature of approximations.
Maybe it's just because I did a physics degree and not a mathematics one, but we're dealing with approximations here anyway; the slight difference close to the relevant point is probably not significant, and if it is, you just add another term.
It's really interesting that you choose the degree of the numerator and denominator, which sort of means you're deciding the asymptotic behavior. Then again, if a coefficient turns out to be 0 then the degree might not be what you wanted.
On that note, you set this up as a 0th degree approximation but ended up with a -1st degree approximation. That's crazy to me. It's like the approximation knew that the function oscillates around 0 even though the first few terms of the Taylor series don't suggest this at all.
"Then again, if a coefficient turns out to be 0 then the degree might not be what you wanted."
Same is true for Taylor series, of course.
I didn't know about this, and I liked it: so simple and useful. Thanks!
Very well explained. Thank you!
I already knew the Padé approximants (see e.g. the Bender-Orszag book), but I never understood why they give an advantage. Thanks for explaining.
Ironically, this is *exactly* what I needed for an extrapolation project I'm doing
Hmm, isn't the reason why the Padé approximation tends not to shoot off to infinity that you chose N < M here? That is, now you have the higher polynomial in the denominator, and that will eventually make it go to zero. I suspect choosing N > M will once again make it go to infinity, and N = M will make it go towards some fixed limit value.
Achieving a better approximation with almost the same number of operations makes this Padé approximation very appealing! At the same time, the zeros of the polynomial in the denominator are scary. It would be horrible to have a discontinuity in unwanted places (e.g. a discontinuity somewhere on x > 0). Nevertheless, cool stuff!
That's another interesting behaviour: the zeros in the denominator often come up in wanted places. Test it with ln(x+1): all the zeros are to the left of -1, the rightmost one converging to -1 as the orders rise.
AND in contrast to the Taylor series, the Padé approximants also converge for large positive x! That's not only true for this example; Padé approximants don't suffer from the radius-of-convergence problem.
That's actually a strength, not a weakness. Because Padé approximations will also match the poles of the function. Whereas Taylor series stop working as soon as you hit the distance to the nearest pole.
@@Ricocossa1 Could you elaborate on your last comment concerning the conformal map interpretation? :)
@@matthiash6764 Right. I got confused with extrapolation methods in the study of phase transitions, which make use of _both_ conformal maps to shift the poles away _and_ Padé approximants. I'll fix my post right now.
Edit: I'm not sure of the name of the method in question. I think it's just called extrapolation.
This works very nicely with complex poles. If you can, move the poles to a complex pair.
The poles are very important in electronic filter design.😊
"One polynomial divided by another," is not a polynomial of any kind; it is called a "rational function."
"Rational polynomial" makes it sound like it's a polynomial with rational coefficients.
That said, this is a method I hadn't run across. It's a wonderful idea! And you've presented it beautifully and concisely.
Great job!
The "unreasonableness" of its effectiveness, reminded me of approximating a real number with a continued fraction, as opposed to using, say, a decimal approximation.
I echo Grant Sanderson's (3B1B) thoughts on that.
Fred
here from the will wood subreddit 😭😭😭 wth
This is incredible. I never knew about this, thank you!
This is fantastic, genuinely might use these at work 👍🏻
Evaluating the Taylor series costs three multiplications. Evaluating the Padé-approximation costs four multiplications and one division. For a fair comparison you should give the Taylor expansion an additional term.
Maybe, but the video is less oriented toward algorithmic complexity and more toward the functions just being similar.
Short, clear, and instructive. Congratulations for the great work 👍🏾👍🏾
Thank you very much!
Very cool. This looks super useful in implementing DSP and FPGA based algorithms (my wheelhouse).
Yeah, it very much reminds me of FIR versus IIR filters.
This is so cool??? How is this the first time I'm hearing of this after a decade of knowing about Taylor Series?????
I saw this and thought it said Padmé approximations and was supremely disappointed that it had nothing to do with Star Wars, but it's a very interesting math conversation.
I don't like Tailor approximations, they're coarse and too rough and don't get anywhere near!
Padé: *blushes*
Never heard of it; it may enhance the results I am currently getting with Taylor series alone. Thank you.
Very interesting and very well explained ! Thank you
Amazing! It's great to see a topic that feels like I could've invented it by myself but I never thought to ask the right question. Also, do the first [m,n] coefficients in the [m+1,n+1] Padé approximant match the [m,n] approximant?
I think so! My thought would be that this is true because the higher orders in the Padé come from the higher orders in the Taylor. I'm not certain about all cases though!
Look up Pade tables to see how the coefficients compare
My mind 🧠 🤯