Very clear explanation! Thanks and hope you'll get more views
Thanks for the kind feedback 😊
Feel free to share it with your network :)
Thank you for the clear explanation! I wonder where the term "cotangent" comes from. A Google search shows it comes from differential geometry; do I need to learn differential geometry to understand it ...?
You're welcome 🤗
Glad you liked it. The term cotangent is indeed borrowed from differential geometry. If you are using reverse-mode autodiff to compute the derivative of a scalar-valued loss, you can think of the cotangent associated with a node in the computational graph as the derivative of the loss with respect to that node. More abstractly, it is just the auxiliary quantity associated with each node.
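To make that a bit more concrete, here is a minimal sketch (assuming JAX is installed; the function `loss` and the variable names are purely illustrative) that exposes the cotangent of an intermediate node via jax.vjp:

```python
import jax
import jax.numpy as jnp

def loss(x):
    y = jnp.sin(x)           # intermediate node y
    return jnp.sum(y ** 2)   # scalar-valued loss

x = jnp.array([0.1, 0.2, 0.3])

# jax.grad seeds the cotangent of the scalar output with 1.0 and
# propagates cotangents backwards through the computational graph.
print(jax.grad(loss)(x))

# The cotangent associated with the intermediate node y is d(loss)/dy.
# We can make it explicit by splitting the graph at y and using jax.vjp:
y, pullback = jax.vjp(jnp.sin, x)
dloss_dy = 2.0 * y                 # cotangent of y, since loss = sum(y**2)
(dloss_dx,) = pullback(dloss_dy)   # the pullback maps y's cotangent to x's
print(dloss_dx)                    # matches jax.grad(loss)(x)
```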
Can you please share the resources you'd recommend for learning math? Or how did you learn math?
Hi,
it's a great question, but very hard to answer. I can't pin it down to one approach, one textbook, etc.
I have an engineering math background (I studied mechanical engineering for my bachelor's degree). Generally speaking, I prefer the approach taken in engineering math classes, which is more algorithmically focused than theorem-proof focused. Over the course of my undergrad, I used various YouTube resources (which also motivated me to start this channel). The majority were in German; some English-language ones include the vector calculus videos by Khan Academy (which were done by Grant Sanderson) and, of course, 3b1b.
For my graduate education, I found that I really liked reading documentation and exploring the API interfaces of various numerical computing packages. JAX and TensorFlow have amazing docs. This is also helpful for PDE simulations. Usually, I guide myself by Google, forum posts, and a general sense of curiosity. 😊
@MachineLearningSimulation thanks for the reply
You're welcome 😊
Good luck with your learning journey
Thanks for the great content as always. One question and one comment. How would you handle it if the computational graph is a DAG instead of a chain? Any reference (book/paper) you can share? I noted that for symbolic differentiation, you pay the price of redundant computation (quadratic in the chain length) but with constant memory. Autodiff, on the other hand, caches the intermediate values and has linear computation but also linear memory.
Hi,
Thanks for the kind comment :), and big apologies for the delayed reply. I just started working my way through a longer backlog of comments; it's been a bit busy in my private life the past months.
Regarding your question: For DAGs, the approach is either to record the operations on a tape (a Wengert list) or to overload the operations themselves. It's a bit hard to give a clean taxonomy here because different autodiff engines all have their own style (source transformation at various stages in the compiler/interpreter chain vs. pure operator overloading in the high-level language, restrictions to certain high-level linear algebra operations, etc.). I hope I can finish the video series with some examples of it in the coming months.
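To illustrate what I mean by the operator-overloading style, here is a rough, hypothetical Python sketch (not taken from any particular library); the key point is that cotangents are accumulated during the backward pass, which is exactly what handles the fan-out of a DAG where one node feeds several consumers:

```python
class Var:
    def __init__(self, value, parents=(), local_grads=()):
        self.value = value
        self.parents = parents          # nodes this one depends on
        self.local_grads = local_grads  # d(self)/d(parent) for each parent
        self.grad = 0.0                 # accumulated cotangent

    def __add__(self, other):
        return Var(self.value + other.value, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        return Var(self.value * other.value, (self, other),
                   (other.value, self.value))

def backward(output):
    # Topologically order the DAG, then propagate cotangents in reverse.
    order, visited = [], set()
    def visit(node):
        if node not in visited:
            visited.add(node)
            for p in node.parents:
                visit(p)
            order.append(node)
    visit(output)
    output.grad = 1.0
    for node in reversed(order):
        for parent, local in zip(node.parents, node.local_grads):
            parent.grad += local * node.grad  # accumulation handles fan-out

# Example DAG: x is consumed by two branches, z = x*y + x
x, y = Var(2.0), Var(3.0)
z = x * y + x
backward(z)
print(x.grad, y.grad)   # 4.0 (= y + 1) and 2.0 (= x)
```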
These are some links that directly come to mind: The "Autodidact" repo is one of the earlier tutorial implementations of a simple (NumPy-based) autodiff engine in Python, written by Matthew Johnson (co-author of the famous HIPS autograd package): github.com/mattjj/autodidact . The HIPS autograd authors are also involved in the modern JAX package (which is featured quite often on the channel). There is a similar tutorial called "Autodidax": jax.readthedocs.io/en/latest/autodidax.html
The micrograd package by Andrej Karpathy is also very insightful: github.com/karpathy/micrograd . It takes a "PyTorch-like" perspective. His video on "Becoming a Backprop Ninja" can also be helpful.
In the Julia world: you might find the documentation of the "Yota.jl" package helpful: dfdx.github.io/Yota.jl/dev/design/
Hope that gives you some first resources. :)