To learn more about Lightning: lightning.ai/
Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
NOTE: Since an LSTM is a type of neural network, we find the best Weights and Biases using backpropagation, just like for any other neural network. For more details on how backpropagation works, see: th-cam.com/video/IN2XmBhILt4/w-d-xo.html th-cam.com/video/iyn2zdALii8/w-d-xo.html and th-cam.com/video/GKZoOHXGcLo/w-d-xo.html The only difference with LSTMs is that you have to unroll them for all of your data first and then calculate the derivatives. In the example in this video, that means unrolling the LSTM 4 times (as seen at 17:49), calculating the derivatives for each variable, starting at the output, for each copy, and then adding them together. (A rough sketch of this unrolling is shown after these notes.)
ALSO NOTE: A lot of people ask why the predictions the LSTM makes for days other than day 5 are bad. The reason is that in order to illustrate how, exactly, an LSTM works, I had to use a simple example, and this simple example only works if it is trained to predict day 5 and only day 5.
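Below is a minimal sketch of that unrolling, assuming PyTorch (which is not mentioned above, so this is my own illustration, not the code from the video): the same LSTM unit is applied once per input value, the loss is computed on the final short-term memory (the day-5 prediction), and autograd computes the derivatives for each unrolled copy and adds them together. The price and target values are placeholders, not necessarily the exact values from the video.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

cell = nn.LSTMCell(input_size=1, hidden_size=1)        # one value in, one value out
prices = torch.tensor([[0.0], [0.5], [0.25], [1.0]])   # placeholder prices for days 1-4
target = torch.tensor([[0.0]])                         # placeholder observed price on day 5

h = torch.zeros(1, 1)  # short-term memory
c = torch.zeros(1, 1)  # long-term memory

for x in prices:                         # unroll the same cell 4 times
    h, c = cell(x.unsqueeze(0), (h, c))

loss = ((h - target) ** 2).sum()         # squared error on the day-5 prediction
loss.backward()                          # derivatives from each unrolled copy are
                                         # calculated and added together automatically
print(cell.weight_ih.grad)               # summed gradient for the input weights
```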
you deserve a shameless promotion for this lecture dude.
@@enggm.alimirzashortclipswh6010 Thank you! :)
Yow, a new paper dropped
"Were RNNs All We Needed?"
I feel like this needs an update
Please make a video on Backpropagation Through Time for RNNs/LSTMs.
@@arunkennedy9267 It's in my new book that will come out in early January.
"At first I was scared of how complicated the LSTM was, but now I understand."
"TRIPLE BAM!!!"
Thanks Dr. Starmer for teaching in a way I could follow. I am placing an order for your book today.
Hooray!!! Thank you very much! :)
As someone who has watched a ton of videos on these topics, I can say that you probably do the best job of explaining the underlying functionality in a simple to follow way. So many other educators put up the standard flowchart for a model and then talk about it. Having the visual examples of data going in and changing throughout really helps hammer the concept home.
Thank you very much!
great great great great great great great great great great great great great great video!
@@suzhenkang Thank you very much! :)
yes definitely
@@statquest also cool sound effects
Some rare teachers have instant cred. The moment they start talking you are convinced they really understand the subject and are qualified to teach it. As an experienced teacher of extremely challenging tech myself, I confess that I've never seen more complete and polished preparation. You are changing people's lives at just the moment when this is so critical. Best of everything to you.
Thank you very much! I really appreciate it.
2 years into Data Science, many paid and unpaid courses, and I never understood the underlying functionality of LSTMs, but today I do. Thank you Mr. Josh Starmer for being in my life.
BAM! :)
I have been working in the ML industry for 5 years now, but I never had this clear an understanding. Not only have you explained this clearly, you have also sparked a curiosity to understand everything with this much clarity. Thanks Josh!!
Thank you! :)
The ease with which you explain these topics has inspired me to pursue a masters in data science. Thank you for helping me unveil my passion.
One can learn anything out of passion, but one should invest money and time only in employable courses.
BAM! :)
Thanks StatQuest for everything!
TRIPLE BAM!!! Thank you so much for supporting StatQuest!!!
As a beginner, I find your videos easy to follow and understand. I love the way you use visual examples with different colors, which makes it easier to follow. And the curiosity to learn more is what makes your videos really impressive to me. Thank you, Josh!
Awesome, thank you!
The first stage of the LSTM unit determines what percentage of the long-term memory is remembered ... you are absolutely amazing!
Bam! :)
I have been waiting for your LSTM video for so long! No other videos can explain ML concepts as good as you do, you sir deserve a thousand BAMs!!
Thank you very much! BAM! :)
I had never fully understood the workings of LSTMs and tried many blogs and videos until I watched your video. This is by far the best explanation of LSTMs I have seen on the internet. Thank you so much for putting so much hard work into creating these types of videos.
Glad it helped!
Thank you very much sir. I was hired as an undergraduate research assistant earlier this year, and took this opportunity to discover and learn about Deep Learning. I am currently learning about RNNs, and this video was of great value to me.
Thank you very much for this.
Glad it was helpful! :)
Mad respect for putting in the hours to prepare the material for the course. These topics are some of the more complicated ones, and yet your illustrations + explanations + awesome songs make them easy and enjoyable.
Thank you very much! :)
I read and watched various articles and videos about LSTM, but none of them explained as well and simply as you. I learned a lot from your video and I am indebted to you. Thank you for taking the time to make this video. I hope you are always healthy and happy. BAM :)
Thank you!
Thank you, you are the best teacher I have seen. 🎉🎉🎉 Hurray. I learn more from you than from my actual teachers, who just waste my time and wear my nerves down ....
Thank you 🙏🙏🙏🙏🙏🙏
I'm glad my videos are helpful! :)
I'm currently studying for my NLP exam. Sending all my gratitude from Italy for such a clear and in-depth explanation.
Good luck! BAM! :)
Watched a lot of videos on LSTM, but BAM…this was the one. I really liked the part where you put in real numbers and showed the calculations taking place. This really helped me get in the perspective of the inner workings. Kudos for putting in the hard work in making this amazing video.
Thank you! :)
Thank you Josh! Watching this animation increased my respect for Jürgen Schmidhuber for thinking of and implementing this beautiful model, and for you who made it so easy to understand.
Thanks!
Damn, these are very well explained. The somewhat silly humor isn't quite for me, but with these high quality explanations I couldn't care less about that. Great job!
Thanks! :)
This is the best tutorial so far. Thank you for your clear explanation! I watched every episode of your NN series. I am a CS student building a voice cloning app for my honours project. Your tutorials are truly helpful!!!
Thanks! I'm really glad my videos are helpful! :)
Requested a video on NLP some time ago and here is StatQuest with a better explanation than I expected! (Other TH-camrs and courses taught me 'How does an LSTM work?' but your explanation taught me 'Why does an LSTM work?' The clarification between sigmoid and tanh solved many of my questions)
Hooray! :)
I think this is the clearest explanation of LSTMs I've ever seen. I've watched many other videos that teach LSTMs, but none of them made me feel this clear about it!
Thank you!
BEST TEACHER EVER! seriously how do you make such complicated matters so simple and easy to understand? You're amazing and even tho I didn't plan on learning Machine learning, I'm soooo gonna watch every last video on this channel! thank you for this!
Thank you very much! :)
A huge thanks for all the effort that you put on this series of videos. You are changing the course of people’s life.
Thanks!
Josh, your videos and book have been an incredible discovery for me. The visual explanation is much easier to understand. Thank you!
Thank you very much! :)
Hello Mr. Josh, I hope you are reading this. I noticed that year by year, as time progresses, the amount of things to learn rises exponentially. For example, if we go back to 1993 there was no such thing as RNNs and machine learning was a tiny domain; in just two decades we developed these complex algorithms. Our brains need to keep up, and down the line our great-grandchildren will have to learn even more just to reach the point of "ok, now I know everything that has been discovered in this field". So I am worried about the future of humans; they will have to learn so much more than us. That's why I wrote this comment: to thank you from the bottom of my heart and tell you that your work is THE MOST essential to human progress. Mr. Josh, in my opinion you have revolutionized the way we learn Deep Learning and ML. We need more people like you to evolve our ways of learning. Thank you so much and TRIPLE BAM :)❤❤❤❤❤
Wow! Thank you very much! I'm really glad you like my videos. :)
Man the timing! I just saw your RNN video yesterday and was waiting for your LSTM video. Your timing is just impeccable
Perfect!
This is what teaching should be. I have tried watching a bunch of videos on youtube and almost all of them were technical jargon. Didn't understand the why part!
Thank you Dr. Starmer for making such videos
Thank you!
Such a wonderful explanation. Have been learning about LSTMs in my course but finally understand how it works now. Looking forward to the next step of the Quest!
Thanks!
"Simplicity is the ultimate sophistication". Wonder how many hours go in to make the explanation so simple and smooth.
Great work!! Thank you!!
Thank you very much! I do spend a lot of time on these, but I enjoy it.
How does this channel not have 100 million subscribers already? What a beautiful content. Love the way things are presented.
Thank you! :)
Are these topics covered in "The StatQuest Illustrated Guide to Machine Learning"? The video is hands down the BEST explanation of LSTM I have seen anywhere!!!
The chapter on neural networks does not cover LSTMs. Just the basics + backpropagation.
@@statquest Would you consider a book explaining deep learning concepts? It would be a major life saver for all of us
@@amitpraseed I'm, slowly, working on one.
As usual, the best intuitive explanation I have seen for LSTMs so far! I have banged my head against this topic in countless papers and videos that try to explain the same block diagrams over and over.. I got frustrated beyond a certain point. Thankfully Josh made this.. At least concept-wise I am clear now. What Josh does for the community is commendable..
Thank you very much! :)
This is the best video to clearly explain the concept of LSTM I have ever seen!
BAM!
I have no clue why on earth such content is FREE!
bam! :)
This weekend I'll try going to church to thank god for your existence Josh, seriously
bam! :)
I clicked like before watching the video but after 5 seconds of scrolling through the visualizations.
You have some of the best visualizations on this topic on TH-cam.
Glad I found your channel.
I can't go to the next video without saying a big thanks to you here !! loved this explanation !! 👏
Thank you! 😃!
Can't find a better explanation than this of LSTMs across TH-cam, Thanks Josh !
Wow, thanks!
You use simple words to help me understand complex concepts! Really appreciate that! Looking forward to learning about Transformers from you soon!
Thanks! :)
Dr. Starmer, you're a rockstar! Your videos are a life-saver. I use your videos as supplementary training as I go through other ML/DL/AI courses. The visualizations are amazing and your explanations are equally amazing. 😎👊
Glad to help!
Just WOW! Can't wait to see the third part of this series (Transformers). Thank you, Josh.
Thanks! :)
You are the bestest teacher I've ever seen. I am a teacher myself, and currently also a student taking an AI course.
Thank you! :)
This is an awesome video to explain LSTM. I have a little knowledge about LSTM (I felt that is required to understand this video) - but you made it really clear and eloquent. Your voice is perfect and clear. Hats off !! Thank you so much
Glad it was helpful!
Can't thank you enough! The dedication you put into this video is amazing. You are my guru (or shall I call you Yoda).
Wow, thank you!
This tutorial was too good!!
Now I clearly know how LSTMs work, and how they solve the Vanishing/Exploding Gradient problem.
Thank you StatQuest!!
BAM! :)
I'm taking an NLP class, we learned about LSTMs a couple weeks ago. I have already forgotten much. This was a very clear and well illustrated example of how they work. Hopefully the percentage of what I now know about LSTMs that is added to my long term memory is now approaching one. Thank you! I'm waiting, with great attention, for the transformer video!!!
Awesome! I'm glad to hear the video was helpful! :)
Attention is all you need 😃😃
@@otsogileonalepelo9610 :)
The best explanation how LSTM cell works.
Thank you!
An exceptional explanation! I finally understand LSTMs after 6 months of trying to get my head around them! Thank you so much.
Glad it was helpful!
Best explanation ever, I can't express how glad I am to have found this channel. 100% better than the paid course I am doing right now. Thank you :).
Glad you enjoy it! :)
Your videos should begin with "universities hate this guy, learn how you increase your knowledge with Josh" 😂
BAM! :)
The university I'm in is hiring new professors, I wish Josh Starmer is my statistics professor, lol
Universities actually love that they can use these videos as material.
Literally got recommended the channel by my professor, so I wouldn't say that necessarily.
@@Ragnarok540 Yes, they can afford to give poor classes because students will study on their own. Essentially making degrees useless.
Your teaching techniques are just magical! Keep up this amazing job! BAM!!!
TRIPLE BAM! Thank you so much for supporting StatQuest!
BAAAM, First. Gotta Thank Professor Josh before I even watch the video
bam! :)
Truly a magical way of explaining such complex topics!
Thank you!
Im a native spanish speaker and when this video played speaking spanish my face was genuine horror, mainly because im used to josh’s voice. I’m glad i could switch it back
Ha! That's funny. Well, to be honest, one day my dream is to record my own Spanish overdubs. I'm still very far away from that dream coming true, but maybe one day it will happen.
Definitely the best visual explanation of LSTM I have ever seen!! Can't wait for your video for Transformers.
Glad you liked it!
@@statquest Do you have any timeline in mind for a video on Transformers?
Thank you professor, you are the best!
Thanks!
Amazingly explained! Can't wait to watch transformer!
Thank you! :)
You explain the concepts extremely well and in a simple manner! Thank you very much!
Thanks! :)
Thanks so much! It is really the best tutorial! However, I do have a tiny problem. I really appreciate it if you could give me some help.
At 18:00, when using the LSTM to predict company A, you said that the final short-term memory represents the predicted price on day 5. So, when you input the price on day 3 (which is 0.25), the short-term memory (which is -0.2) should represent the predicted price on day 4. However, the real price on day 4 is 1. There seems to be a problem.
In order to keep this example as simple as possible (so I could illustrate it), this model was only trained to predict the value on day 5. It wasn't trained to predict the value on day 3 or any other day.
@@statquest Got it. Thanks for the swift and helpful reply! I'm truly grateful for your help and your tutorial!
I seldom comment on videos, but credit lies where it's due. Hands down the best video on LSTM I have watched
Thank you!
great series of videos, please make some for "transformers" too!! Thanks in advance
I'm working on them.
Thank you so much for continuing to upload videos on this machine learning topic!! Your videos saved my grade a year ago, and now they have helped my team members understand the concept very easily!
Thanks!
Thank you so much. It clicked. At LONG last :]
I have understood it like so:
Stage 1: we decide "how relevant" the LTM is (based on the STM & input)
i.e, how much to remember/forget
==================================================================================
Stage 2: we update the LTM (which will be the "input" for the next stage, hence input gate)
i) the update value is created via the tanh function, based on the STM & the input
ii) we decide how relevant the update value is (or, what percentage to keep), using the sigmoid function.
Now we add this to the LTM from the previous stage.
==================================================================================
Stage 3: determines the output based on the LTM as its input
i) tanh(LTM) will be the initial output
ii) based on the STM & the input, we decide what percentage of the generated output to keep (i.e., how relevant it is)
It seems weird to me, how it really works. The notion of long-term memory, short-term memory, we update it like so. It really is weird.
It's weird, but at least you understand it. Bam!
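For anyone who wants to see those three stages as concrete arithmetic, here is a minimal numeric sketch (my own illustration with made-up weights and biases, NOT the trained values from the video) of a single LSTM unit run over a short example sequence:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_unit(x, stm, ltm, w, b):
    # Stage 1 (forget gate): what percentage of the long-term memory to remember
    forget_pct = sigmoid(w["f1"] * stm + w["f2"] * x + b["f"])
    # Stage 2 (input gate): build a candidate update with tanh, decide what
    # percentage of it to keep with a sigmoid, and add it to the scaled LTM
    candidate  = math.tanh(w["c1"] * stm + w["c2"] * x + b["c"])
    input_pct  = sigmoid(w["i1"] * stm + w["i2"] * x + b["i"])
    ltm        = forget_pct * ltm + input_pct * candidate
    # Stage 3 (output gate): the new short-term memory is a percentage of tanh(LTM)
    output_pct = sigmoid(w["o1"] * stm + w["o2"] * x + b["o"])
    stm        = output_pct * math.tanh(ltm)
    return stm, ltm

# Placeholder weights and biases (arbitrary values, just to show the flow):
w = {k: 0.5 for k in ["f1", "f2", "c1", "c2", "i1", "i2", "o1", "o2"]}
b = {k: 0.0 for k in ["f", "c", "i", "o"]}

stm, ltm = 0.0, 0.0
for x in [0.0, 0.5, 0.25, 1.0]:    # run the unit over a short example sequence
    stm, ltm = lstm_unit(x, stm, ltm, w, b)
print(stm, ltm)                     # the final short-term memory is the prediction
```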
Finally, I understood LSTM. Clean and simple explanation. Probably the best one about LSTM.
Thank you! :)
Thank You professor. Now I can proudly say that I understand LSTM and can calculate the output of LSTM using Microsoft Excel
Glad it was helpful!
This was a GOD level explanation of LSTM!!! Hats off!!!
Thanks!
I have watched almost all of your videos from the beginning ... I found that your teaching skills and visualization skills become better and better in every single video. This is the best quality of a data scientist, which I do not find in many data scientists.
Wow, thanks!
I must say I have never understood a concept so well, thank you so much Dr. Starmer. TRIPLE BAM!! Will come back here to learn about transformers next.
Thank you!
You are absolutely slept on. I love how you talk to me like I'm an idiot but sometimes I think you're just being nice.
These videos are just how I teach myself a topic. So, any tone is directed towards myself.
@@statquest totally joking it's an awesome vid
Gosh, Josh. You make learning such a breeze. Thank you very much for every single BAM!
BAM! :)
Hands down the best explanation for LSTM!
Wow, thanks!
Agreed.
I am making a comment on a video after a really really long time. And for me this is the best criteria to show myself how useful this video is. Thanks :)
Thank you very much! :)
It was the easiest explanation of LSTMs ever, Thank You So Much...
Thanks!
amazing. The best LSTM explanation possible. Period.
Thank you!
HOLY SMOKES the concept is now crystal clear 🔥🔥🔥🔥🔥
Hooray! :)
THIS IS THE BEST VIDEO TO UNDERSTAND LSTM!! KEEP UP THE GREAT WORK!!!
Thanks, will do!
You are insane!!!!!!!!!!!!!!!!!!!!!!!!!
Never thought I would understand LSTMs by looking at a diagram, but you did it.
Hooray! :)
Your videos always put a smile on my face while I'm learning.
Hooray! :)
Excellent introduction to LSTMs. Thank you Josh
Thank you!
Sir, this is the topic I have been eagerly awaiting. A lot of thanks. I know that after watching this LSTM video my doubts will be cleared.
BAM! :)
Thank you so much! I became a paid member. I wish you great success in your passion for teaching.
Awesome, thank you!
This was the best video about LSTMs that I've ever seen! Thanks!
Thank you! :)
Thank you so much for your videos! I never knew my brain could handle such COMPLICATED CONCEPTS!
TRIPLE BAM!!!!!!!!!
🥰😍🤩
Happy to help!
So far the best explanation for LSTMs. Thanks a lot for this video. Eagerly waiting for the next stage, 'Transformer'.
Thanks!
Josh, this video really clearly explains the processes of LSTM, thank you so much! I would really appreciate it if you could also explain:
1. The intuition behind calling them long and short-term memories -> it's not just that short-term memories are the final output to consider right?
2. Mathematically, what makes the output avoid the exploding/vanishing gradient problem -> Your example demonstrates that, but it would be awesome to understand how it's a rule, not just an exception set up by how your example was constructed.
1) Typically, when people use LSTMs, they only use the short-term memories (also called the "hidden states").
2) I show the math that creates the exploding and vanishing gradients in my video on RNNs: th-cam.com/video/AsNTP8Kwu80/w-d-xo.html and you'll notice that that problem can't happen in LSTMs because there are no trainable weights on the long-term memory path.
@@statquest Thank you :)
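To make point 2 concrete, here is a tiny numeric illustration (my own simplified sketch, not something from the video): in a vanilla RNN the gradient through T time steps multiplies the same recurrent weight T times, while along the LSTM's long-term memory path each step only multiplies by the forget gate's output, a value between 0 and 1 with no trainable weight on that path.

```python
T = 50

# Vanilla RNN: the same recurrent weight is multiplied T times,
# so the gradient factor explodes (w > 1) or vanishes (w < 1).
rnn_weight = 1.5
print(rnn_weight ** T)             # roughly 6e8 -> exploding gradient

# LSTM long-term memory path: d(LTM_t)/d(LTM_{t-1}) is just the forget-gate
# output (ignoring the gates' indirect dependence on the LTM), so the factor
# is a product of data-dependent values between 0 and 1, not a fixed weight.
forget_gates = [0.95] * T           # example forget-gate outputs
factor = 1.0
for f in forget_gates:
    factor *= f
print(factor)                       # stays in a reasonable range (~0.08 here)
```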
Thanks for the BAM explanation.
I explained it in my exam today the same way you explained it in the video and felt satisfied with my answer 😅.
BAM!!!
This explanation do be giving me so much more insight than mundane class lectures and other yt tutorials. The level of math and logical work behind any Machine Learning or Deep Learning concept is CRAZY and quite hard to comprehend at times, but after being a regular visitor and enjoyer of StatQuest, I've gotten a much deeper understanding of the beauty behind it all, in a simple and easily understandable manner! Thanks Josh for it!! ((:
Ps* Love the tiny bams and random yet cute songs in between!
BAM!!! Thank you very much! :)
Great video! It is super easy to watch and understand!
Also, it would be really helpful if you made a video where you clearly explain backpropagation in LSTMs, because there are almost no reliable and understandable videos on this topic on youtube...
Thank you!
Edit: just saw your pinned comment with all the stuff about backpropagation in LSTM, so thanks again :)
bam! :)
@@statquest By the way, I wanted to ask you about backpropagation in LSTMs. At first, I wanted to do the math to calculate each weight's/bias's derivative (like you said in the pinned comment), but soon after I found out that it would take a long time, since the formulas get more complicated the deeper you go. So, I decided to calculate the derivative of the error of the output of the last LSTM (its STM) with respect to the output of the previous one, so that I get the "gradient" of the previous LSTM's STM, and then calculate all the local derivatives for all weights and biases (and repeat this algorithm for each time step). Thus, using those gradients I can just locally calculate all the derivatives without any problem for each previous LSTM block and then finally add them together and adjust the weights and biases. Is it correct to do so?
@@KloiUA To be honest, I'm having a hard time imagining exactly how your process works. A much easier approach would be to just do a proof-of-concept gradient using a very simple, vanilla RNN like this: th-cam.com/video/AsNTP8Kwu80/w-d-xo.html In that case, it's much easier to calculate everything and you can validate that your method is correct.
@@statquest Well, yes, you are right, it would be great to try to prove it with some smaller models. But it seems like this method for LSTMs is not really applicable to a vanilla RNN, since while LSTM blocks are fully used in the calculation at each time step (each weight and bias is taken into account), in a vanilla RNN we skip some of them (like w3 and b2 in that RNN from your example in the video you mentioned) and use them only in the last iteration. However, I think there is a method, and the idea is worth a try. I guess I will start with the RNN then, and after that come back to the LSTM (hopefully with new ideas and better understanding). :)
In general, thank you for your responses. In my opinion, the fact that you respond to each comment and help people with their questions is super cool. And you really deserve huge respect for that. Keep doing the best! :)
Edit: I decided to write the LSTM from scratch (in C++) with this method of backpropagation to see how it works, and after some hyperparameter tuning and training it performed quite well (92% precision) on a simple task (similar to the task in your video). Although I did not manage to find any detailed and accessible sources that explain backpropagation in LSTMs, I guess this method is likely correct, given that the network actually produced the result. :))
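For anyone else trying the same thing, one easy way to follow the "proof of concept" suggestion above is a numerical gradient check: compare your hand-derived gradient to a finite-difference estimate on a tiny model. The sketch below uses a one-weight toy recurrence as a stand-in (not the exact RNN or LSTM discussed here), but the same check works for a hand-written LSTM.

```python
def forward(w, xs):
    h = 0.0
    for x in xs:                   # toy recurrence: h = w * h + x
        h = w * h + x
    return (h - 1.0) ** 2          # squared error against a target of 1.0

def analytic_grad(w, xs):
    # Unroll and apply the chain rule by hand; dh/dw picks up one term per copy.
    h, dh_dw = 0.0, 0.0
    for x in xs:
        dh_dw = h + w * dh_dw      # product rule on h_new = w * h + x
        h = w * h + x
    return 2.0 * (h - 1.0) * dh_dw

xs = [0.0, 0.5, 0.25, 1.0]         # example inputs
w, eps = 0.3, 1e-6
numeric = (forward(w + eps, xs) - forward(w - eps, xs)) / (2 * eps)
print(analytic_grad(w, xs), numeric)   # the two values should agree closely
```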
Your videos are very accessible, I love them. I'll definitely recommend them when I'm asked for introductions to Machine Learning content.
Awesome! Thank you! :)
Impressive teaching. I watched several videos and this was the one that made it clearest.
Thank you very much!
I had been learning about LSTMs for the last two months and still struggled to understand what exactly is happening inside, until I watched this video. Huge thanks to the content creator. Still, I am struggling with the weights and bias values. Please make your next video on that, if possible.
Once again, Thank you so much.
The weights and biases are optimized with backpropagation, which I explain in these videos: th-cam.com/video/IN2XmBhILt4/w-d-xo.html th-cam.com/video/iyn2zdALii8/w-d-xo.html and th-cam.com/video/GKZoOHXGcLo/w-d-xo.html
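As a rough illustration of that (my own sketch, assuming PyTorch, not code from StatQuest), this is what "optimize the weights and biases with backpropagation" looks like for an LSTM in practice: the forward pass makes a day-5 prediction, backward() computes the derivatives, and the optimizer takes a gradient-descent step. The training values are loosely based on the Company A / Company B setup in the video and may not match it exactly.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.LSTM(input_size=1, hidden_size=1, batch_first=True)
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)

# Two example sequences (days 1-4) with their day-5 targets:
x = torch.tensor([[[0.0], [0.5], [0.25], [1.0]],
                  [[1.0], [0.5], [0.25], [1.0]]])
y = torch.tensor([[0.0], [1.0]])

for step in range(500):
    optimizer.zero_grad()
    output, (h_n, c_n) = model(x)             # h_n holds the final short-term memories
    loss = ((h_n.squeeze(0) - y) ** 2).mean()
    loss.backward()                           # backpropagation computes the derivatives
    optimizer.step()                          # gradient descent nudges weights & biases

print(h_n.squeeze())                          # day-5 predictions after training
```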
Excellent! StatQuest explains XGBoost and DL the most clearly of anything I have ever seen. I can't wait for your new videos on attention and transformers.
Thank you!
The best explanation on LSTM ever! Thank you so much!
Glad it was helpful!
@@statquest Thank you for your reply, Josh. One thing I am a little confused about is the difference between the short-term memory and the prediction. At around 18:00, when you explain the day 5 prediction, you said that the final short-term memory is the day 5 prediction. Does that mean the input value is the actual price on a certain day and the short-term memory is the price prediction for a certain day? If that's the case, then the short-term memories should be very close to the input values, but they are not (0, -0.1, -0.1, -0.2 vs 0, 0.25, 0.5, 1).
@@leejo5160 The model was only trained to predict the output on day 5. And, as such, only makes good predictions for day 5. However, we could train it to predict every day if we wanted to. We'd probably need more data or a more complicated model (more layers or a fully connected network at the end).
Wow! These videos are absolutely incredible! What a presentation!
Thank you!
dodgerblue and orange have always been my favorite color scheme for 2 colors. well done
Great explanation!!! Now I understand LSTM. Thanks a lot !!! 🙏
Thanks!
That is the best explanation I've seen anywhere!
Thank you!
I am working on my bachelor's thesis, for which I am trying to implement an LSTM model for anomaly detection in an IoT vehicle data project. I wanted to understand more of the math behind it, and the disclaimers about which videos you recommend watching first helped me keep everything in mind, even though most of the code won't be self-developed but merely adjusted for the use case. It's primarily based on previous literature... anyways, this certainly helped me gain an overview :D
bam!