These lectures are very useful for understanding optimization techniques. I highly recommend these videos to anyone doing research in this field.
For BPP (backpropagation), deep learning, and the general structure of neural networks, the following comments may be useful.
To begin with, note that instead of partial derivatives one can work with derivatives as the linear transformations they really are.
It is also possible to look at the networks in a more structured manner. The basic ideas of BPP can then be applied in much more general cases. Several steps are involved.
1.- More general processing units.
Any continuously differentiable function of inputs and weights will do; these inputs and weights can belong, beyond Euclidean spaces, to any Hilbert space. Derivatives are linear transformations and the derivative of a neural processing unit is the direct sum of its partial derivatives with respect to the inputs and with respect to the weights; this is a linear transformation expressed as the sum of its restrictions to a pair of complementary subspaces.
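As a minimal sketch of point 1, assuming a single sigmoid unit f(x, w) = sigmoid(w·x) on finite-dimensional spaces (the unit and the names below are just illustrative choices, not the only ones possible):
```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def unit(x, w):
    """A single processing unit: continuously differentiable in (x, w)."""
    return sigmoid(w @ x)

def unit_derivative(x, w):
    """Derivative of the unit as one linear map on the direct sum of the
    input space and the weight space: the block [df/dx | df/dw]."""
    s = sigmoid(w @ x)
    d = s * (1.0 - s)            # derivative of the sigmoid at w.x
    dfdx = d * w                 # restriction to the input subspace
    dfdw = d * x                 # restriction to the weight subspace
    return np.concatenate([dfdx, dfdw])  # the direct sum, as one row vector

x = np.array([0.5, -1.0])
w = np.array([2.0, 0.3])
print(unit(x, w), unit_derivative(x, w))
```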
2.- More general layers (any number of units).
Single-unit layers can create a bottleneck that renders the whole network useless. Putting several units together in a single layer is equivalent to taking their product (as functions, in the set-theoretical sense). Each layer is then a function of the inputs and of the weights of all of its units. The derivative of a layer is the product of the derivatives of its units; this is a product of linear transformations.
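Continuing the same illustrative setup (sigmoid units, finite-dimensional spaces), a sketch of point 2: the layer derivative is just the units' derivatives taken together, and the block-diagonal shape with respect to the weights reflects that each unit sees only its own weights.
```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer(x, W):
    """A layer of m units acting on the same input x.
    W has shape (m, n): row i holds the weights of unit i."""
    return sigmoid(W @ x)

def layer_derivative(x, W):
    """Product of the unit derivatives: Jacobian with respect to the
    inputs and Jacobian with respect to all the weights."""
    s = sigmoid(W @ x)
    d = s * (1.0 - s)                        # one scalar per unit
    J_x = d[:, None] * W                     # shape (m, n): rows are the units' df/dx
    m, n = W.shape
    J_W = np.zeros((m, m * n))               # block-diagonal: unit i depends only on its own weights
    for i in range(m):
        J_W[i, i * n:(i + 1) * n] = d[i] * x
    return J_x, J_W

x = np.array([0.5, -1.0, 2.0])
W = np.random.randn(4, 3)
J_x, J_W = layer_derivative(x, W)
print(layer(x, W).shape, J_x.shape, J_W.shape)   # (4,), (4, 3), (4, 12)
```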
3.- Networks with any number of layers.
A network is the composition (as functions, in the set-theoretical sense) of its layers. By the chain rule, the derivative of the network is the composition of the derivatives of its layers; this is a composition of linear transformations.
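And a sketch of point 3 under the same assumptions: the derivative of the composed network with respect to its input is the composition (here, the matrix product) of the layer Jacobians, exactly as the chain rule prescribes.
```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def network(x, weights):
    """Composition of layers: x -> layer_1 -> ... -> layer_k."""
    for W in weights:
        x = sigmoid(W @ x)
    return x

def network_input_jacobian(x, weights):
    """Chain rule: compose the layer derivatives with respect to their inputs."""
    J = np.eye(x.size)
    for W in weights:
        s = sigmoid(W @ x)
        J_layer = (s * (1.0 - s))[:, None] * W   # Jacobian of this layer at its input
        J = J_layer @ J                          # compose the linear maps
        x = s                                    # feed the output forward
    return J

weights = [np.random.randn(4, 3), np.random.randn(2, 4)]
x = np.array([0.5, -1.0, 2.0])
print(network(x, weights), network_input_jacobian(x, weights).shape)  # output of size 2, Jacobian (2, 3)
```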
4.- Quadratic error of a function.
...
---
Since this comment is becoming too long I will stop here. The point is that a very general viewpoint clarifies many aspects of BPP.
If you are interested in the full story and have some familiarity with Hilbert spaces please google for papers dealing with backpropagation in Hilbert spaces. A related article with matrix formulas for backpropagation on semilinear networks is also available.
For a glimpse into a completely new deep learning algorithm that is orders of magnitude more efficient, controllable, and faster than BPP, search this platform for a video about deep learning without backpropagation; its description contains links to demo software.
The new algorithm is based on the following very general and powerful result (google it): Polyhedrons and perceptrons are functionally equivalent.
For the elementary conceptual basis of NNs see the article Neural Network Formalism.
Daniel Crespin
Thank you, this is very clear! I love CMU; I was there for Tom Mitchell's class :)
1:07:12 The slide should read y = f(g1(x), g2(x), ...)
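Assuming the slide at that timestamp concerns the chain rule for a function of several intermediate functions (an assumption on my part, based only on the corrected formula), the corresponding derivative would be:
```latex
% chain rule for y = f(g_1(x), g_2(x), \dots, g_n(x))
\frac{dy}{dx} = \sum_{i=1}^{n} \frac{\partial f}{\partial g_i} \, \frac{dg_i}{dx}
```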
Done!!!