I really like the way you explain the paper. A lot of concepts I was confused about have been touched on, but I wish the block parts had been explained in more detail, like why the modules are used that way in those blocks. Anyway, thank you so much for the video, +1 subscriber, and I hope to see more from you in the future.
Sure. So are you more interested in papers and theory? Or would you like more hands-on content on LLMs, RAG, etc.? Just trying to understand the audience better. :)
@@AIBites I'm more interested in the research papers and theories, and any insightful implications you can contribute along the way. What you did here is a nice baseline. Thanks!
I could only grasp the sLSTM on the first read. So the exponential activation pushes everything up, and we use the log to bring every activation back into a smaller range? Damn, pretty interesting.
Thank you. Yes, whenever I don't understand equations, I plug in numbers to push the values to the extremes. That paints a better picture for understanding! :)
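Plugging in extreme numbers, as suggested above: here is a minimal, hypothetical Python sketch (not the exact sLSTM update from the paper) of why a log-domain max stabilizer keeps exponential gates in a safe range:

```python
import math

def stabilized_exp_gates(i_pre, f_pre):
    """Hypothetical sketch: exponential gates with a log-domain stabilizer.

    Naively, exp() of a large pre-activation blows up (math.exp(1000)
    overflows a Python float). Subtracting the running max m in the log
    domain keeps every exponentiated value at or below 1.
    """
    m = max(i_pre, f_pre)          # stabilizer state (max in log space)
    i_gate = math.exp(i_pre - m)   # exp(i_pre - m) <= 1
    f_gate = math.exp(f_pre - m)   # exp(f_pre - m) <= 1
    return i_gate, f_gate, m

# Push the values to the extremes: huge pre-activations stay finite.
i_g, f_g, m = stabilized_exp_gates(1000.0, 980.0)
print(i_g, f_g, m)
```

Without the subtraction, `math.exp(1000.0)` raises an `OverflowError`; with it, the larger gate becomes exactly 1 and the other becomes a small positive number.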
It's so much fun looking under the hood. Thanks for explaining it so well! 😎🤖
my pleasure :)
Thank you so much for providing this video!!!!!
My pleasure, Yuan! 🙂
Well, the graphs at 2:18 are incorrect: sigmoid and tanh have different ranges, so the tanh output should have the range -1 to 1, not the sigmoid's 0 to 1.
That's a great spot. A copy-pasting oversight, I guess 🙂 I'll pay more attention while making the videos on attention. Thank you 😀
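For concreteness, a tiny Python check of the two ranges the comment is pointing at (`sigmoid` here is a hand-rolled helper, not from any library):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# sigmoid squashes to (0, 1); tanh squashes to (-1, 1).
# So a tanh-activated curve can go negative, while a sigmoid gate
# cannot, and the two plots should not look identical.
for x in (-5.0, 0.0, 5.0):
    print(f"x={x:+.1f}  sigmoid={sigmoid(x):.4f}  tanh={math.tanh(x):+.4f}")
```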
🎉🎉🎉🎉🎉🎉🎉
🙂🙂🙂