Love the video, but I think it can be a bit confusing going from 5:30 where you just have "f(x_k, A_2, ..., A_M)" to 9:40 where you have "f_k(x_k, A_2, ..., A_M)." Looking at this, I initially thought that you might have a new fitting function for each data point, rather than the fitting function with the kth data point plugged in.
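To make the notation concrete, here is a small sketch of the second reading. It assumes a made-up two-parameter model, A1 * exp(A2 * x), and made-up data points; the fitting function in the video may look different. The point is only that f_k is shorthand for the one shared function f with the kth data point plugged in.

```python
import math

def f(x, A1, A2):
    """One fitting function shared by every data point (hypothetical form)."""
    return A1 * math.exp(A2 * x)

# Hypothetical data points x_1, ..., x_n (not taken from the video).
xs = [0.0, 0.5, 1.0]

def f_k(k, A1, A2):
    """Shorthand: the same f with the kth data point plugged in (k is 1-based)."""
    return f(xs[k - 1], A1, A2)

# Evaluating f_k for every k just evaluates f at every data point.
values = [f_k(k, 2.0, 1.0) for k in range(1, len(xs) + 1)]
```

There is still only one set of parameters (A1, A2) to fit; the subscript k only picks the data point.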
Beautiful explanation, simply beautiful !
Very well explained,
And the production quality is out of this world
Thank you sir 🙏🏼
GREAT channel! Subscribing!
Currently struggling through global multivariate nonlinear optimization of a fortunately-differentiable nonconvex, high-dimensional scalar function.
Hi Professor Kutz, do you still have a bounty on errata? I think your going rate used to be around $0.25 per instance 😊. I believe there might be a minor erratum around timestamp 7:23. At that point you state that f is the map from input to output (effectively the forward propagation through the network), but in the context of gradient descent you actually want to use the gradient of the error function that you mentioned in the previous slide (around timestamp 6:55).
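A minimal sketch of the distinction, assuming a hypothetical one-weight model f(x, w) = w * x, a sum-of-squares error, and made-up data (none of this is taken from the slides). The update step uses the gradient of the error E with respect to the weight; the forward map f only enters through the chain rule.

```python
def f(x, w):
    """Forward map from input to output (hypothetical one-weight model)."""
    return w * x

def error(w, xs, ys):
    """Sum of squared errors between the model output and the data."""
    return sum((f(x, w) - y) ** 2 for x, y in zip(xs, ys))

def error_gradient(w, xs, ys):
    """dE/dw by the chain rule; df/dw = x appears only as a factor."""
    return sum(2.0 * (f(x, w) - y) * x for x, y in zip(xs, ys))

# Made-up data generated by w = 2.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w = 0.0        # initial guess
delta = 0.05   # step size (assumed value)
for _ in range(100):
    w -= delta * error_gradient(w, xs, ys)   # step along -dE/dw, not -df/dw
```

With this data the loop should settle near w = 2, where the error is zero.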
Appreciate your very clear explanation!
Stochastic gradient descent is magical thinking that actually works. Sometimes you get lucky.
ty
19:00
Can you explain what the "weights" are to a non-computer science person? (There's so much jargon, my brain)
Weights are the strengths of all the connections between the neurons (the neurons themselves are some sort of nonlinear functions). So the weights are the memory of the network, and they have to be trained. The weights are initialized with random values before training starts. It's important that the weights are not set symmetrically before training starts.
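A small sketch of why the starting values should not be symmetric, assuming "symmetric" means that the hidden neurons start with identical weights. The network (one input, two tanh hidden neurons, one linear output), the single training sample, and the step size are all made up for illustration.

```python
import math
import random

def train(w1, w2, steps=50, lr=0.1, x=1.0, y=0.5):
    """Gradient descent on a 1-input, 2-hidden-neuron (tanh), 1-output network."""
    w1, w2 = list(w1), list(w2)
    for _ in range(steps):
        h = [math.tanh(w * x) for w in w1]           # hidden activations
        out = sum(v * hj for v, hj in zip(w2, h))    # linear output
        err = out - y                                # derivative of 0.5 * err**2 w.r.t. out
        g2 = [err * hj for hj in h]                  # gradient for output weights
        g1 = [err * v * (1.0 - hj ** 2) * x for v, hj in zip(w2, h)]  # hidden weights
        w1 = [w - lr * g for w, g in zip(w1, g1)]
        w2 = [v - lr * g for v, g in zip(w2, g2)]
    return w1, w2

# Symmetric start: both hidden neurons get the same gradient at every step.
sym_w1, sym_w2 = train([0.3, 0.3], [0.2, 0.2])

# Random start: the two neurons receive different gradients and can specialize.
random.seed(0)
rnd_w1, rnd_w2 = train([random.uniform(-1, 1) for _ in range(2)],
                       [random.uniform(-1, 1) for _ in range(2)])
```

In the symmetric run the two hidden neurons stay exact copies of each other, so the network behaves as if it had only one hidden neuron.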
"weights" can be considered as coefficients "a" and "b" in the following equation:
y = a*x + b.
Weights have specific "positions" that determine the mathematical structure of the computational model they construct.
For example, in the above equation, "a" represents the linear term (the slope), and "b" represents the constant (the y-intercept): "a" occupies the position of the linear term, and "b" occupies the position of the constant. The tuple (a, b) does not define a linear model until we assign "a" and "b" to the specific positions mentioned above.
"Weights" in a neural network model are something like this, but in a much higher dimensional space. Hope this helps.
Weights tell you how much each attribute matters. For example, a house has attributes such as # of bedrooms, color of door, # of washrooms, # of floors (add whatever else you want, such as # of windows), and each attribute gets a single weight. You then want to see how much each attribute affects the house price: # of bedrooms may have a higher weighting, and thus a higher coefficient value, while color of door may have a lower weighting. These weights then make up a formula such that you can input the attributes of your own house and estimate the price.
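As a rough sketch of such a formula: every number below (the weights, the base price, the example house) is invented for illustration, and the linear form is an assumption rather than anything from the video.

```python
# Hypothetical weights: price contribution per unit of each attribute.
weights = {
    "bedrooms": 50_000.0,   # high weighting
    "washrooms": 20_000.0,
    "floors": 15_000.0,
    "red_door": 500.0,      # low weighting
}
base_price = 100_000.0      # hypothetical constant term

def estimate_price(attributes):
    """Weighted sum of the attributes plus the constant term."""
    return base_price + sum(weights[name] * value
                            for name, value in attributes.items())

# Plug in the attributes of your own (hypothetical) house.
house = {"bedrooms": 3, "washrooms": 2, "floors": 1, "red_door": 1}
price = estimate_price(house)
```

Training is the process of finding weight values like these from many houses with known prices, instead of making them up by hand.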
Are you related to Matthew McConaughey?
What is a lot of weights? The problem I have had is that it is hard to find a gradient step that reduces the error along all of the parameters at once. Usually there is a parameter or two that will significantly increase the sum of squared errors or mean squared error, so the step can't be big, and one of the many parameters will keep the parameter set from moving towards a minimum. In other words, the valley that the parameter set is trying to walk down is very narrow. I have lots of real data, and GD always has problems optimizing it. There are many algorithms that are much better if the number of parameters is less than 25. The professor's visual example is OK for teaching but is too simple.
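The narrow-valley effect can be reproduced with a toy error surface. This sketch assumes a made-up two-parameter quadratic, E(p, q) = 0.5 * (p**2 + 100 * q**2), which is steep in q and shallow in p; it is not the commenter's data or the professor's example. For this surface, plain gradient descent is only stable when the step size is below 2/100.

```python
def descend(lr, steps=100, p=1.0, q=1.0):
    """Plain gradient descent on E(p, q) = 0.5 * (p**2 + 100 * q**2)."""
    for _ in range(steps):
        grad_p, grad_q = p, 100.0 * q   # partial derivatives of E
        p -= lr * grad_p
        q -= lr * grad_q
    return p, q

# Step just under 2/100: q converges, but p (the shallow direction) crawls.
p_ok, q_ok = descend(lr=0.019)

# Step just over 2/100: q overshoots the valley floor more on every step.
p_bad, q_bad = descend(lr=0.021)
```

The steep parameter caps the step size for everyone, so the shallow parameter makes slow progress. Methods that rescale each direction separately are one common response to this.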