Very important topic. With gradient descent, when the step I chose was too small, the calculations didn't converge and I didn't understand why. Thanks Steve.
Thanks Steve. You are the best in this field.
A slight remark, as I see it. A roundoff error of exactly 10^(-16) would be true for "fixed"-point numbers, yet most computers compute with "floating"-point numbers, where it would be more correct to say a precision of "at most 16 significant digits". With floating-point numbers you can easily store, say, 1.2345678 * 10^(-100): the precision of 1.2345678 is limited to 16 digits, while the number itself can be very small (of order 10^(-100) in my example). The interesting part is why this doesn't solve the truncation problems stated in this video. When performing arithmetic between floating-point numbers of different magnitudes, the smaller number is rescaled to the exponent of the larger one, so its leading digits get pushed below the 16 digits that are kept, and after truncation only zeros are left; thus, e.g., 10^(-50) + 10^(-66) == 10^(-50). Now even as we reduce dt -> 0, the values of the function f remain of the same order, so at some point t + dt will evaluate to exactly t, the computed difference will be 0, and the derivative estimate becomes undefined.
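A minimal Python sketch of this absorption effect (illustrative values only):

print(1e-50 + 1e-66 == 1e-50)   # True: the smaller addend is absorbed entirely

t, dt = 1.0, 1e-17
print(t + dt == t)              # True: dt is below half a ULP of t, so f(t + dt) - f(t) would be exactly 0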
Great presentation! I loved the error analysis that you did in the last half of the lecture. That was really fabulous!
Your videos are fantastic. Thank you so much for taking the time to make them and share them!
A float is 4 bytes = 32 bits and a double is 8 bytes = 64 bits, but much software now uses 80-bit extended precision. You quoted the limitation of this 80-bit extended precision.
The IEEE 754 standard describes all of the binary representations. The 2008 revision is valid for basically all existing hardware (controllers, CPUs, GPUs, ...); the newer 2019 revision covers mixed precision as well. That will be used in hardware accelerators like the IBM AIU for PCs/servers, or DSPs on ARM/RISC.
This is another enlightening session! I'm not sure about the conclusion of 1e-5 as the best trade-off time step for the higher-order derivative calculation approach. We have been using 1e-8 for some reason; maybe that is an illusion?
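A quick experiment (a sketch, using f = sin at t = 1) suggests both step sizes are right for different schemes: roughly sqrt(eps) ≈ 1e-8 is near-optimal for the first-order forward difference, while eps^(1/3) ≈ 1e-5 is near-optimal for the second-order central difference used in the video:

import numpy as np

t = 1.0
exact = np.cos(t)  # true derivative of sin at t
for k in range(1, 13):
    dt = 10.0 ** -k
    fwd = (np.sin(t + dt) - np.sin(t)) / dt             # first-order forward difference
    cen = (np.sin(t + dt) - np.sin(t - dt)) / (2 * dt)  # second-order central difference
    print(f"dt=1e-{k:02d}  fwd err={abs(fwd - exact):.1e}  cen err={abs(cen - exact):.1e}")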
Thanks a lot, professor, for your lectures. I am from Algeria.
Cleve Moler's "complex step" approach seems useful, but it has the disadvantage that the function being evaluated must accept a complex argument and produce a complex result.
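For reference, a minimal Python sketch of the complex-step idea (assuming f is analytic and accepts complex arguments):

import numpy as np

def complex_step(f, t, h=1e-20):
    # f'(t) ≈ Im(f(t + i*h)) / h: no subtraction, hence no cancellation,
    # so h can be taken far below machine epsilon.
    return np.imag(f(t + 1j * h)) / h

print(complex_step(np.sin, 1.0))  # matches cos(1) to full double precision
print(np.cos(1.0))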
Double precision typically uses 64 bits to represent a number. The smallest normalized magnitude is about 10^-308, while the associated relative roundoff error is about 10^-16.
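Both limits can be read off directly in Python:

import numpy as np

info = np.finfo(np.float64)
print(info.eps)   # ~2.22e-16: relative roundoff, i.e. the ~16-significant-digit limit
print(info.tiny)  # ~2.23e-308: smallest normalized double, the 10^-308 figure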
Very nice and informative video!
Mathematics is divided into two parts: the computational and the theoretical world. We can find an analogy with Aristotle and his sublunar and supralunar worlds. The first, concerning everything situated under the orbit of the Moon (the Earth and its atmosphere), is a symbol of uncertainty, continually altered and unstable (like computation). The second, on the other hand, is immutable, perfect, stable and eternal (like a beautiful mathematical theory).
Just to save everyone some steps:
Doubles are 8 bytes, single precision floats are 4
The standard is IEEE 754
Thank you!
Many thanks...
Homework: at time 5:46, you can take some numbers, such as sqrt(2) and (sqrt(2)+(10^-16)/2) (in Matlab) or sqrt(2) and (sqrt(2)+(10**-16)/2) (in Python), type them in the command window, and convince yourself that these numbers have the same decimal representation.
P.S.:
1- In Matlab, you can use the "format long" command to display the long format of a decimal number.
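A quick Python version of the homework:

import math

a = math.sqrt(2)
b = math.sqrt(2) + (10**-16) / 2
print(a)       # 1.4142135623730951
print(b)       # 1.4142135623730951 -- identical representation
print(a == b)  # True: the increment is below half a ULP and rounds away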
M(Δt) = sup{ |d³f/dt³(c)| : c ∈ (t-Δt, t+Δt) } is not a constant but actually a function of Δt, so the graph of E(Δt) = Δt²M(Δt)/6 is not a parabola, even if it is something similar. It is true that M(Δt) is a monotonically non-decreasing function (for bigger Δt, the interval (t-Δt, t+Δt) is bigger and hence M(Δt) can only get bigger). Either d³f/dt³ is bounded, in which case M(Δt) -> sup{ |d³f/dt³(x)| : x ∈ R }, or it is unbounded, in which case M(Δt) -> +∞.
Hence for Δt -> +∞, in both cases (whether M(Δt) is bounded or not) we have E(Δt) = Δt²M(Δt)/6 -> +∞.
It is also true that if d³f/dt³ is continuous, then for Δt -> 0 we have M(Δt) -> |d³f/dt³(t)| and hence E(Δt) = Δt²M(Δt)/6 -> 0.
But the derivative of E with respect to Δt, dE/d(Δt), is a little bit more complicated.
The fact that M is a function of Δt implies that dE/d(Δt) = ΔtM(Δt)/3 + Δt²M'(Δt)/6, and M'(Δt) may not always exist.
In your calculation of |Error|, should M not be a variable depending on Δt instead of a constant? You explained it as "the max of f''' over the interval t-Δt to t+Δt", which is an interval that narrows as Δt shrinks (and therefore M may shrink as well, unless the maximum is exactly in the middle of the interval).
Could you analyze in a video the error behavior of first-order numerical differentiation techniques for analytic functions using the "Complex Step Differentiation" showcased on MathWorks Blogs?
Hello everyone. Can we change the precision of calculation in Simulink (MATLAB)?
This error graph resembles the bias/variance trade-off diagram in machine learning...
Is that not, in shorthand speak, a singularity at that point?
Does the same logic hold in numerical integration?
No. Integration is well-conditioned. Adjacent areas have the same sign so adding them keeps all of the precision. As dt gets smaller, f(t+dt) and f(t-dt) have many of their most significant digits in common. Subtracting them to approximate the derivative causes all of these digits to vanish and greatly reduces the relative precision of the result. Eventually, the 'signal' gets subtracted away and leaves only the 'noise' of the data.
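A small illustration of that cancellation (a sketch with f = sin at t = 1):

import numpy as np

t, dt = 1.0, 1e-9
a, b = np.sin(t + dt), np.sin(t - dt)
print(a)      # 0.8414709853482...
print(b)      # 0.8414709842675...  -- shares ~9 leading digits with a
print(a - b)  # ~1.08e-09: of the ~16 digits carried, only ~7 survive the subtraction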
If you measure f(t+dt) and f(t) for a process that does not change by even one bit in the dt time increment, then the integral of df would be zero for that interval! In 8-bit microprocessors, this is a real danger for slow process control. The remedy is to decrease the sampling rate of the process.
@@hamidrezaalavi3036 The derivative estimate would be zero, but the integral estimate would be zero only if both measurements were zero.
@@byronwatkins2565 In control systems, s = s + ds and ds = df*dt; if the sampling rate is high, df might be zero, and that creates a problem in the control system.
Just a hint for micro control designers.
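A toy Python sketch of that failure mode (hypothetical 8-bit quantizer and made-up rates, purely illustrative):

def adc8(volts):
    # hypothetical 8-bit ADC over a 1 V full scale
    return int(volts * 255)

slow_ramp = lambda t: 0.001 * t  # process drifts 1 mV per second
dt = 0.01                        # fast sampling: every 10 ms
df = adc8(slow_ramp(dt)) - adc8(slow_ramp(0.0))
print(df)  # 0 -- the process moved less than one count, so the difference reads zero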
cool 😎
but we can employ autograd