Absolutely beutifully articulated video, it felt like a poem. Great work.
Oh wow, thanks for the kind words really appreciate it.
Saw a random screenshot about this video on twitter, so glad I came to watch, thanks for the insights!
Glad it was useful Samuel!
As an AI/ML practitioner with no proper math education, I find this video very helpful for understanding the complexity of the algorithms and ensuring proper implementation for my use case. Would love to see more content like this!
Glad it was helpful, do let me know if you have topic requests!
I literally paste the paper itself, or parts of it, into a machine learning model and it explains it. Super helpful
That's one way to go about it!
This process would be super useful with a context aware LLM about cutting edge research like Galactica:
arxiv.org/abs/2211.09085
Not sure if that model is still available though given the controversy it had.
With each minute, this video just keeps getting better!
I’ve already subscribed and decided it’s the best video I’ve seen in months.
If it keeps going like this, I might have to drop everything and dedicate my life to printing papers and making notes there!
Thank you 🙏
Such a nice comment, thanks it's really motivating!
Game changer, thank you. I've been studying for years and all I've ever needed was something like this to complement my evaluation and provide integrity for my assumptions. You're much appreciated, my good sir.
Glad it was useful man! 🌹
Started 2025 with this video. Finally got a good grasp on PPO and RLHF today following your advice. Can't thank you enough! 👑
Awesome work, keep it up! Great start to 2025! 👏👏👏
Man, you are doing a great job on this channel. Wish you the best of luck with developing it!
Many thanks, I’m glad the content is helpful!
I really wish there were courses in CS master’s degrees teaching how to decipher the math in AI research papers
Yeah, that would have been helpful haha.
However I had a cool course during my PhD in which every week we were reviewing a computational neuroscience paper.
It was cool because we were digging into the code + we had a lecture with a similar style to the one I adopted on my channel.
It really helped see behind the veil of the papers' method sections!
@@deeplearningexplained link?
Yeah, in Italy, by the completion of a bachelor's degree in Computer Engineering you are in a position to read, understand, and in some cases write papers, at least ones published on arXiv
@@jerahmeelsangil247 that’s cool, but do they actually teach you how to do each of those things before requiring you to understand them? I feel this is much of the problem: there’s an element of show-off that keeps things closed off. 😂 this video is amazing ❤
You shouldn’t be trying to “decipher” it…. You should actually know math
I don't do CS or ML, but am getting into advanced Lattice Boltzmann stuff for fluid simulation, where a lot more basic understanding is required than here (obviously, as the video is intended for a wider audience), but it was great to see the steps I usually take actually formalized, and to first reason through this particular problem myself before watching you do it, which worked pretty well. Overall great video, especially for 'beginners'
Thanks for sharing! I have looked through the video, and what I learned from it is to form my own intuition about the formulas (step by step, from the first formula in the paper), and to summarize my intuition for the next time I read the paper.
Exactly, then you hold onto this vivid intuition every time you read the formula back.
If you do this with enough of the core formulas in your field, reading research will become a breeze.
Sincerely, after taking Calculus 3, 4, and Numerical in college, it feels like a trauma that will last for the rest of my life. Every time I use a gradient, I remember having to calculate it by hand on paper.
Haha, at least you have a deep understanding of the material, which is essential for understanding the topics that build on it.
Worth the PTSD!
What is calculus 4 ?
Honestly, just work through a math book and you will be able to read math in the ML domain pretty easily. I read "Mathematics for Machine Learning"; it was a struggle for me, but it taught me a lot of very useful skills.
Nice, will give it a read thanks for the recommendation!
really enjoyed this video. Would love more content like this. Maybe you could look at some interesting papers and break them down in this way. That would really help people get better at reading these papers and practise intuitively understanding them. Subscribed ❤
I’m happy you enjoyed the content and yes, I’ll be breaking down some more papers in the following weeks! :)
Thanks for teaching us, Rudo!
You are very welcome! 🌹
Make more content dude. You’re good at this.
Thanks man, really appreciate the comment. Will do! 🫡
1. What's your sketching software that you put screenshots into?
2. How did you know to look in the "Adam" paper for the missing formula, and how did you find it?
3. What papers should I prioritize reading if I want to become a research engineer? And should I tinker with the papers' concepts and put out my own blogs/mini-papers to present at workshops / to potential employers?
The SW is microsoft whiteboard
Also I'm interested in his answer to your 3rd question, so please someone @ me if he replies
For sure
CC: @MahmoudSayed-hg8rb
1. It's TLDRAW, it's free over here: www.tldraw.com/
2. I knew to look into the Adam paper because they mentioned that phrase in the article: "Note that only the last expression differs from vanilla Adam". So I followed the reference for Adam, and then in that paper I followed the flow until I hit the algorithm section (as you saw, it was pseudo-code, not formulas).
3. My two cents is to start out with the classical architectures or core discoveries of the last decade in the specific field you are interested in. Read them and reproduce the results gradually. These are great to start out with because they have already been implemented in different ways in bigger software packages (PyTorch, TensorFlow). So you won't feel too alone.
Once you are getting the hang of it you can start reading and tinkering with more recent results.
I would suggest setting up a GitHub repo for these reproductions and working on them gradually. No need to reproduce 100% of the results in a paper, but by gradually working through the most important ones you will start to get a hang of how the authors were thinking while getting their results. Plus you will have a set of nice projects to walk through with potential employers!
@@deeplearningexplained Thanks for @ing me and thanks for your answer 🫡🫡
@@deeplearningexplained Thanks for @ing me
and thanks for the answer, really appreciated.
Highest viewed video after 5 years 😊 Congratulations 🎉
Yeah, people seem to like this one, glad it is useful!
Great video man! Very well explained!
Thanks, glad it was useful!
Let me know if you have request for the next tutorial.
Thank you for your time and effort, you got my subscription too. Please make more like this
I'm glad you found it useful, will do! 🫡
In IISc, students from any dept. are not allowed to touch any course from the Intelligent Systems pool/AI dept. unless they are done with the Linear Algebra, Stochastic Models/Random Processes, and Optimization and Analysis courses.
No AI for you unless you are cracked in Math.
Those are great prerequisites for undergraduate or graduate AI courses 👍
are you at IISc dub161?
woooow was a cool video man! keep it up
Glad it was useful!
yesssss, this is what we mere humans need, to understand weird symbols and Greek characters. That's what prevents us from quickly understanding scientific research that could help us tremendously in our own domain.
Haha yes that’s the spirit.
A cool trick that I learned from the founder of fast.ai is to rewrite the formula with very descriptive names.
The formula ends up looking ugly, but it’s MUCH more understandable.
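For example, take the momentum buffer update g_{t+1} = beta * g_t + (1 - beta) * grad: it reads very differently once every symbol gets a full name. A tiny Python sketch (the variable names and numbers are my own picks, not the paper's notation):

momentum_decay = 0.9        # beta in the formula
smoothed_gradient = 1.0     # g_t, the running average of past gradients
current_gradient = 2.5      # grad L(theta_t), whatever the loss gives this step

smoothed_gradient = (
    momentum_decay * smoothed_gradient
    + (1 - momentum_decay) * current_gradient
)
print(smoothed_gradient)    # 1.15 = 90% old average + 10% new gradient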
This is gold.
Glad it was useful :)!
Very good explanation! You’re the Jon Snow of mathematics.😅
Haha thanks for the kind words!
You forgot to mention that some journals have their own conventions; sometimes the authors rely on that and you can become confused because some operations, terms, and symbols can mean different things depending on the journal.
Really true, always check out that sort of background context if the methodology doesn’t seem to make sense!
Very insightful!
Glad it was! :)
you got my sub!
Thanks Igor, don't hesitate to let me know if you have feedback on the content!
As an alternative to paper, I may suggest using an e-ink tablet with a 13.3" screen and drawing support, e.g. the Boox Max Lumi, which allows you to draw directly on the PDF.
Ah good idea, I also heard great things about the reMarkable!
that was great, you are a legend! thank you so much!
I’m glad it helped!
Thanks for sharing ❤
Thanks for watching!
great explanation
I'm glad it was useful!
Thanks to my exposure to advanced macroecon, the formulas don't seem crazy.
The more exposed to math you are the easier it gets for sure!
Ok, as a coding/math enthusiast who is looking into ML, I have an understanding of Calc 1, 2, 3, some stats, and linear algebra. How long does it take you guys to read a paper like this (30 pages) from top to bottom and implement it? I know on Kaggle there are tournaments and they tend to use research papers for solutions.
Implementing the idea behind QHM and QHAdam is very fast, less than 1h. This paper is also very straightforward since it’s well written.
Reproducing all the results in this paper though can take more time (to set up the experiments).
But generally, reproducing a paper's main results can take anywhere from a few hours to a full month depending on the complexity and how much I know about that subfield.
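To give a rough feel for the quick part, here's a minimal sketch of one QHM step in plain Python (my own toy code; the lr, beta and nu values are illustrative picks rather than tuned settings, and real code would work on tensors instead of floats):

def qhm_step(theta, grad, buf, lr=0.1, beta=0.9, nu=0.7):
    # QHM: update the running average of gradients, then take a step
    # that mixes the averaged gradient with the raw gradient.
    buf = beta * buf + (1 - beta) * grad
    theta = theta - lr * (nu * buf + (1 - nu) * grad)
    return theta, buf

# Toy usage: minimize f(theta) = theta^2, whose gradient is 2 * theta.
theta, buf = 5.0, 0.0
for _ in range(100):
    theta, buf = qhm_step(theta, 2 * theta, buf)
print(theta)  # ends up close to 0, the minimum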
I did not read the paper and I did not watch your video completely either, but it seems (as you present it) that the Momentum algorithm and the QHM algorithm lead to the same result. This is because in the QHM algorithm you are introducing another parameter (nu) that does not appear in the Momentum algorithm, but you are again taking a weighted average. I.e. if you expand the update rule for QHM you get:
theta_{t+1} = theta_t - alpha * [nu*beta * g_t + (1 - nu*beta) * grad L_t(theta_t)]
which is effectively the same as the Momentum algorithm with a parameter nu*beta.
I thought the exact same, until I read appendix A.8 (they show it's not equivalent) on page 18.
-> arxiv.org/pdf/1810.06801
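For anyone curious, the non-equivalence is easy to see numerically: the QHM buffer decays with beta, while momentum with coefficient nu*beta would decay its buffer with nu*beta, so the two trajectories split right after the first step. A quick Python sketch (my own toy check, using the EMA form of momentum from the expansion above and made-up numbers):

lr, beta, nu = 0.1, 0.9, 0.7
grads = [1.0, 1.0, 1.0, 1.0]      # synthetic gradient sequence

theta_qhm, buf_qhm = 0.0, 0.0     # QHM: buffer decays with beta
theta_mom, buf_mom = 0.0, 0.0     # momentum (EMA form) with decay nu*beta

for g in grads:
    buf_qhm = beta * buf_qhm + (1 - beta) * g
    theta_qhm -= lr * (nu * buf_qhm + (1 - nu) * g)

    buf_mom = (nu * beta) * buf_mom + (1 - nu * beta) * g
    theta_mom -= lr * buf_mom

    print(round(theta_qhm, 4), round(theta_mom, 4))
# First printed line matches (-0.037 -0.037); after that the two drift apart.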
great, thanks for the video
Glad you enjoyed!
Thank You❤
You’re welcome!
what about harder stuff when they talk about data manifolds or proving the convergence of stochastic gradient descent?? That stuff is way too difficult unless you have taken graduate math courses after 4 years of undergrad maths
Great question!
What you need to do is start in reverse, go from the intuition and figure out the path from the primitives.
A manifold is named like this for a reason that makes intuitive sense to people who understand the mechanism behind it.
What I would do is first look at the path from primitives to the result as laid out by many different people/educators.
Then I would make sure I understand what we are starting with.
Finally, I would take a step-by-step approach just like we did in this video and go fetch the information I’m lacking externally.
Knowing the math well sure helps speed the process up, but you can still figure out complex topics like that no matter your level.
@@deeplearningexplained thank you!
I'm under the impression that formulas have disappeared from DL papers since foundation models were introduced. Now most people build systems around these huge models. This also applies to big institutions such as Google, Microsoft, IBM, MIT, Stanford etc.
What do you think?
True, there are usually fewer formulas (at least in the main paper). However, they can still usually be found in the appendix.
Some of the early DL papers were a bit too math-heavy too, so I think it's a balance. But definitely, the LLM papers are light on math in general since the discovery is more related to experiments on these huge models than to an algorithmic change.
What software do you use to take visual notes?
Hey I’m using TLDRAW, it’s free and pretty slick!
:) gentle bump
@@IgorStassiy Hey sorry man, thought I answered the question!
It's TLDRAW: www.tldraw.com/
Very solid app + it's free.
It's finally the whole explanation. Can I go relax now?
Yes you can 👍
You know nothing, Jon Snow... but deep learning?
Hahaha
What application do you use to list the main aspects of the paper?
It's TLDRAW: www.tldraw.com/ ! Great app and it's free!
@@deeplearningexplained Thanks. Appreciate it!
which sketch program is this?
It’s TLDRAW, it’s free and really solid!
OMG. You exist!
I do!
I have completed LA, Probability, and ML courses. Have not done DL. I want to learn transformers and LLMs to conduct research on them. Can you give me some directions?
Andrej Karpathy's content is absolutely awesome for getting started with transformers and LLMs:
th-cam.com/video/zjkBMFhNj_g/w-d-xo.htmlfeature=shared
Do you have any paper recommendations for someone that is just getting started with DL?
What background knowledge do you have, and what aspect of deep learning interests you most?
@@deeplearningexplained I have a bachelors in software engineering and a bit of experience using SAM. Don’t have a specific interest but the image generation models like MJ are cool to me.
Okay neat, if you have a general interest then I would recommend the Deep Learning book.
It’s not a paper per se, but it’s written with a similar flow to a research paper and has pretty good references to the literature.
It’s accessible though, so start with that and whenever you see a result that catches your attention, dive into the base research it references.
This way you get both the benefit of context and the depth of deep learning research.
Hope it helps!
@@deeplearningexplained thanks i’ll check it out!
Yacine's always cooking 😂
😂
Thank you Jon Snow
Haha you are welcome!
Damn, Kit Harington!
challenge for you AI experts: please develop a model that can take in a picture of a math formula, then go through and explain step by step how to interpret or solve the equation, highlighting the symbols and variables while running a speech synthesizer or text generation to explain the logic.
Very interesting project indeed.
If I had to run it though, I would split that into 3 different sub-systems:
1. OCR specialized in mathematical notation to extract the symbols and put them into a computer-friendly format.
2. Use a specialized model like AlphaProof (or an open-source variant) to do proofs or to generally break down the formula into steps.
3. Finally, an LLM to summarize the structured output into something a layperson can understand.
This way you avoid as much as possible the potential hallucination from a general-purpose LLM, while keeping its natural conversational power.
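A very rough Python skeleton of how those three stages could be glued together (every function here is a placeholder I made up, not a real library call):

def ocr_math_formula(image_path):
    # Stage 1: math-specialized OCR -> machine-readable formula (e.g. LaTeX).
    raise NotImplementedError("plug a math-OCR model in here")

def break_down_formula(formula_latex):
    # Stage 2: symbolic/proof system -> ordered list of derivation steps.
    raise NotImplementedError("plug a symbolic or proof model in here")

def summarize_steps(steps):
    # Stage 3: LLM turns the structured steps into a plain-language walkthrough.
    raise NotImplementedError("plug an LLM call in here")

def explain_formula(image_path):
    formula = ocr_math_formula(image_path)
    steps = break_down_formula(formula)
    return summarize_steps(steps)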
is that you John Snow?
😂😂😂
You know something, John Snow
😂😂😂
Reading math notation is a huge obstacle for me.
Did the tips in the video help a bit?
The video helped a lot. Thank you!
Your mic makes sudden boom sounds which shock the ears from time to time and are not good for them, please fix it. Other than that, super awesome stuff
Really sorry for that, will add an audio processing step to my recording workflow! 🙏
@@deeplearningexplained Thank you, great stuff in general :) keep it up
What app do you use to split the PDF into sections and drag them around? Noob haha
I take screenshots of them and then paste them into TLDRAW, a free app!
@@deeplearningexplained thanks.
Already at the end of my dual degree in computer science and applied maths.. This video came a little too late :(
Learning never stops! 👍
Now you know how to read the math in deep learning papers + you have a solid theoretical foundation.
Any Udemy course. :D
On which topic?
nice yapcine
:)
can u please please setup a discord community? 🙏🏾
if u need help on this lemme know
Hey there, it's already set up over here :)
📌 discord.com/invite/QpkxRbQBpf
@@deeplearningexplained thank you :)
oh, so that's another Yacine?
Haha yes, I’m not the X.com Yacine 😂
All I saw was some voodoo magic
Haha, which part felt like this?
Learn deep learning math.
Nothing beats strong fundamentals, that's for sure!
Cool
:)
Simply go learn math: calculus, statistics, differential equations, and then jump into AI. Why are you in AI when math and statistics are the basics of AI?
Well I think most ML practitioners have just a CS degree, which typically doesn't require anything beyond Calc 3 and Linear Algebra 1. It's just the way it is :/
reading deep learning papers is a pain in the ass
It gets better as you read more!
Wtf was that thumbnail
Did my best haha
Broken maths. Solely to tackle a trivial optimization but lacks fundamental analysis.
Hey, thanks for the feedback!
How would you have run this tutorial differently?
You know nothing John snow
😂😂😂
No bullshit, I think in this "math" at times, but cannot even begin drawing a formula. 😂
Try to code it instead, it's much easier in my opinion than using formulas to start out.
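For example (a toy of my own, not something from the video): the gradient descent rule theta <- theta - lr * grad L(theta) turns into a tiny Python loop once you code it:

theta, lr = 0.0, 0.1
for _ in range(50):
    grad = 2 * (theta - 3)   # gradient of L(theta) = (theta - 3)^2
    theta -= lr * grad
print(theta)                 # crawls up to ~3, the minimizer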
@@deeplearningexplained Yes! This is it. I've actually come up with some pretty interesting algorithms. I think in algorithms, it's strange, but yes I prefer to code it, then I can make a formula for what is happening. Perhaps it's kind of like sheet music. Also I'm working on a neurosymbolic PHP-only model, it's doing pretty well so far.
@@imaspacecreature thinking in algos. please teach me how to!!
@@redwingbeast1396 a while ago, when I was younger, my cousin took me to Kyoto, and while there he decided to give me a pop quiz. He asked "In Japan, what is the tallest mountain in Japan?". I immediately knew I had access to it, but couldn't draw up the memory in that instant. In that moment I thought to myself, "what if I rearrange my memories and sort by letter?". I did, and upon reaching "F", "Mount Fuji" popped up in my mind.