An introduction to Reinforcement Learning

Arxiv Insights

667,614 views

Published on 2 Jan 2025

Comments • 410

@DanielHernandez-rn6rp 6 years ago ⁺²⁹¹
Love this guy. As an RL PhD student, your videos are golden.
@nikhillondhe5815 6 years ago ⁺¹²
RL PhD sounds so interesting!
@andres18m 6 years ago ⁺²
Institute name?
@Ayanwesha 6 years ago ⁺¹
Hello, sir.
I am a grad student.
Can anyone please tell me whether backpropagation is also necessary in supervised and unsupervised learning, or is it only used in reinforcement learning?
Thanks
@hcgaron 5 years ago
Ayanwesha 12345 yes, backpropagation is the basis for gradient-based optimization, so it is used whenever a neural network is trained, not only in reinforcement learning
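To make the reply above concrete, here is a minimal sketch of gradient-based training in a supervised setting. It assumes a single sigmoid neuron and a made-up four-point dataset; the names `train_neuron` and `data` are hypothetical and not from the video. For one neuron, backpropagation reduces to a single application of the chain rule.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_neuron(data, lr=0.5, epochs=2000):
    """Fit y ~ sigmoid(w*x + b) by gradient descent on the cross-entropy loss."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            p = sigmoid(w * x + b)
            # Chain rule for a sigmoid output with cross-entropy loss: dLoss/dz = p - y
            dz = p - y
            grad_w += dz * x
            grad_b += dz
        # Step against the average gradient.
        w -= lr * grad_w / len(data)
        b -= lr * grad_b / len(data)
    return w, b

# Hypothetical toy dataset: the label is 1 when x is positive.
data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
w, b = train_neuron(data)
```

The same gradient machinery is reused in reinforcement learning; what changes is where the training signal comes from (a reward instead of a known label).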
@ernie2111 5 years ago ⁺³
"RL PhD" didn't know such things exist lol
@rednassie1101 4 years ago ⁺²³⁰
People: ANN ARE TAKING OVER THE WORLD AND STUFF WILL NEVER BE THE SAME
my horribly trained network on a cat: "dog"
@I_Lemaire 4 years ago ⁺¹
Could they help with the necessary government takeovers associated with COVID-19? Temporary command economies could be more efficient.
@revimfadli4666 4 years ago ⁺⁶
YouTube's bots: "Robot fighting is animal cruelty"
@denebvegaaltair1146 3 years ago ⁺¹⁵
Your videos have just the right amount of technical terms such that student engineers can learn something, and also the right amount of summary and rewording such that beginners can get a vague idea of concepts. Thank you so much
@davidfield5295 6 years ago
The misuse of 'literally' notwithstanding, this was an excellent video. Very clear and concise explanation.
@SukhwinderSingh-fb9qw 6 years ago ⁺⁶⁶
This was one of the best videos on RL that I have seen. Extremely informative. The way you explain things is awesome. Keep up the great work! Cheers man!
@yuanyuansun3521 3 years ago ⁺³²
“If u only give it a positive reward when it successfully stacked a block, it’ll never get to see any of those reward” If only my tutors realised this.
@Hyuts 5 years ago ⁺¹⁷
Explains in an elegant manner more than I have learned in half a semester of my AI college course.
@atcer51 1 year ago ⁺¹
fiiiinnnaaaallly after tons of googling, I finally found a USEFUL video that actually EXPLAINS how to reward the agent, and not just saying:
'oh u just reward it'
@snippletrap 6 years ago ⁺¹⁷
The perils of reward shaping are well understood in a public policy context, where incentives can lead to "unintended consequences".
@cemgocer8185 4 years ago ⁺³
Quality of the video is off the charts. Topics u have chosen to explain the field, the way u explain them and especially pointing the common misconceptions that make it harder for us to get into what AI really is... I'm sad that there is no superlike button. Rare to see videos of this quality and honesty
@floriandebrauwer9140 5 years ago ⁺²
Thanks for your work! I like the way you present such a complex field in a clear manner for people without any background. Thanks to you I know where to start in my learning journey!
@LearnRoboticsAndAI 4 years ago
Summary:
- State-of-the-art robotics is a software challenge, not a hardware challenge (robots are already physically capable of challenging tasks)
- Supervised Learning
* Known: inputs and outputs
* Compute gradients using backpropagation to train the network to predict outputs for new inputs
* E.g. for a game of Pong, the data can be screenshots at specific time instants and the key (up/down) pressed by the user at each instant (recorded from a user playing the game); it can be used to train a neural network to predict the output for a new input image
* One disadvantage is that a dataset has to be created, which isn't always easy to do
* Another disadvantage is that since the data is recorded from a human playing the game, the network can never be better than that human
- Reinforcement Learning
* The difference from supervised learning is that we do not know the target label, as we have no dataset
* The network that transforms input states into output actions is called the policy network
* One simple way to train a policy network (which can be fully connected or convolutional) is a method called policy gradients
1. In the policy gradient method, we start with a completely random network
2. Feed the network a frame from the game engine; it produces a random output action
3. Send that action back to the game engine
4. The game engine produces the next frame
5. The output is a probability distribution over actions, and actions are sampled from it so that the exact same actions are not repeated again and again
6. A reward is given if the agent scores a goal (+1) and a penalty is given if the opponent scores a goal (-1)
7. The agent's entire goal is to optimize its policy to receive maximum reward
8. We collect a bunch of experience by feeding frames to the network and getting random actions
9. Sometimes (rarely) the result is a WIN
10. We use normal gradients to increase the probability of those actions (the ones that resulted in a WIN) in the future
11. For a negative reward we use the same gradient but multiply it by -1
* Credit assignment problem: most of the actions in an episode may have been good, but if the agent lost at the end, the network will treat the whole sequence of actions as bad
* Sparse reward setting: very sample inefficient
* Reward shaping: additional intermediate rewards, but they must be designed individually for each specific problem, so it is not scalable
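The numbered steps in the summary above can be sketched in a few lines. This is a minimal REINFORCE-style illustration under strong assumptions, not the video's implementation: the "policy network" is a single parameter `theta`, the "game engine" is a hypothetical `play_round` function in which UP always wins, and every episode lasts one action.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def play_round(action):
    # Hypothetical stand-in for the game engine: "UP" wins (+1), "DOWN" loses (-1).
    return 1.0 if action == "UP" else -1.0

def train_policy(steps=1000, lr=0.1, seed=0):
    rng = random.Random(seed)
    theta = 0.0  # single policy parameter; P(UP) = sigmoid(theta) starts at 0.5
    for _ in range(steps):
        p_up = sigmoid(theta)
        # Sample the action instead of always taking the most likely one.
        action = "UP" if rng.random() < p_up else "DOWN"
        reward = play_round(action)
        # Gradient of the log-probability of the sampled action w.r.t. theta.
        grad_log_prob = (1.0 - p_up) if action == "UP" else -p_up
        # A positive reward raises the action's probability; a negative reward flips the sign.
        theta += lr * reward * grad_log_prob
    return sigmoid(theta)

final_p_up = train_policy()
```

In this toy setting the probability of the winning action only ever goes up. A real Pong agent would replace `theta` with a neural network and apply the same reward-weighted update to every action of a much longer episode.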
@allamasadi7970 6 years ago ⁺¹⁵¹
Your channel deserves more views 👍
@akramsystems 6 years ago ⁺¹
agree 100%
@lohithArcot 4 years ago
Not many reach these topics.
@TheBeansChopper 3 years ago
I think the comment section speaks for itself. This is a fantastic grasp of the basic concepts and issues with this technology in such a short time, without diving unnecessarily into formalism. Thanks :)
@shirishbajpai9486 1 year ago
3:12 - Why reinforcement learning
4:00 - RL framework
4:30 - Policy Gradients
5:37 - Training Policy network
7:50 - Problem with policy gradient(credit assignment problem)
9:25 - Where sparse reward setting fails
11:00 - Reward shaping
@TY-un4no 4 years ago ⁺¹
Complex stuff made simple and easy, this is a very good intro video to RL. Starting to learn RL for work and your video gave me a great starting point, thank you!
@punitpalial 3 years ago
Here are the notes from the video, LEARNZY (please ignore the timestamps, they are not accurate)
01:57 : Pieter Abbeel gave a demonstration of robots doing all the mundane household tasks like cleaning, cooking, and bringing a bottle of beer. It showed our remarkable achievements in the field of robotics.
We are sufficiently advanced (mechanically, in hardware terms) to build a robot capable of complex actions, but the reason we can't make Terminator-like robots is that we still haven't embedded intelligence into these robots. So creating intelligent robots is a software problem, not a hardware problem.
02:03 : Reinforcement learning is basically about letting computers learn on their own, from their own experience. As the saying goes, you can only be as good as your master. If a computer learns from the world's best chess player, then the best it can become is equal to that player. To surpass her, the AI needs to learn from more than just the best chess player, and that is made possible by learning from itself: it is allowed to take random decisions, is rewarded for decisions that lead to a positive outcome and punished for decisions that lead to a negative outcome, and is rewarded not just for winning the war but for winning battles too. This learning from itself is called reinforcement learning.
04:07 : The difference between reinforcement and supervised learning is that supervised learning needs a training set, like the moves of the best chess player, to train our AI; the computer then recognizes patterns in it and picks the best one.
In reinforcement learning, there is no training data and the computer pretty much learns by taking random decisions and figuring out which random moves worked best.
04:41 : Policy gradients: the AI does a random action >> checks if it is good >> if good, reward it so it repeats the action >> if not, punish it
05:02 : 📌 The entire goal of the policy network is to maximize the reward. It just receives the scoreboard as feedback.
05:37 : 📌 Read Andrej Karpathy's blog post on deep reinforcement learning: "Pong from Pixels"
07:50 : The problem with policy gradients is that they reward the end result and not the process. So even if the AI took all the right steps in the game and only slipped up on the last move, the policy gradient would mark all the moves made in the game as negative and punish the AI for them. This problem is called the "credit assignment problem". To correct this, the AI can be rewarded for right moves during the game rather than only for winning the game.
The solution given is called reward shaping.
But the problem with reward shaping is that it has to be configured for every case where it is used, which makes it difficult to use universally.
12:03 : Reward shaping can also lead to "the alignment problem", where the AI is collecting all the rewards but isn't doing what it is supposed to do.
14:08 : Boston Dynamics has some pretty cool robots, but those robots can't take autonomous intelligent decisions. They are pre-programmed to do what they do. They don't actively decide for themselves what they want to do. Hence they are not really intelligent and just a marketing gimmick at this point.
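The credit assignment problem from the 07:50 note can be shown with a tiny sketch. It assumes a hypothetical five-move episode in which only the last move was a mistake; `label_actions_with_outcome` is an illustrative name, not something from the video.

```python
def label_actions_with_outcome(actions, final_reward):
    """Naive credit assignment: every action inherits the episode's final reward."""
    return [(action, final_reward) for action in actions]

# Hypothetical episode: four good moves followed by one losing move.
episode = ["UP", "UP", "DOWN", "UP", "DOWN"]
labelled = label_actions_with_outcome(episode, final_reward=-1.0)

for action, credit in labelled:
    # All five moves are marked -1.0, including the four good ones.
    print(action, credit)
```

Because the only signal is the final score, the update cannot tell the good moves from the bad one. It takes many episodes for the good moves to come out ahead on average, which is why the sparse-reward setting is so sample inefficient.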
@Lilowillow42 3 years ago ⁺¹
Just wanted you to know that in my university course for introduction to AI our professor recommended your videos for machine learning. Your explanation is highly enjoyable and informative. Thank you!
@MuditBachhawatIn 4 years ago
I have been meaning to read about RL for a long time. This video couldn't be a simpler and clearer introduction to it. Thanks man!
@rutexgreat3619 2 months ago
Very clear material, very clear representation, thank you for your time and video.
@DotCSV 6 years ago ⁺⁴⁶
Hi Xander, just found your YouTube channel and I'm very amazed by your content! I also run a YouTube channel on the same topic but for the Spanish-speaking audience, and I'm happy to see that more new channels are growing to educate in the field of machine learning. I hope in the future we can crossover our contents :)
@ArxivInsights 6 years ago ⁺¹¹
Checked out your channel, great stuff man!! It's indeed nice to see that many people are starting to contribute to the online ML community in such a huge variety of ways :p
@OcramRatte 4 years ago
Heyyy, I think I just saw you on TikTok
@PasseScience 4 years ago ⁺²
Hello, an RL idea I had, I am curious to know if you came across similar things.
Let's put the context in a very general way: a predictive/policy part doing its usual job: having a latent/feature representation of the timeline (this timeline including sense data and action outputs, both past and predicted),
and an RL part that can use the prediction of the policy part to make decisions (determine action outputs).
If we remain general, both parts are working in some kind of loop (policy predicts a future, decision part tries to use it to determine future actions, policy predicts a future again based on what is planned, etc.).
We have here a very basic feature we usually seek: an action that initially requires a huge number of back-and-forth prediction-decision rounds can eventually be learned by the prediction part (which will prefill the output actions based on its prediction). Nothing new here, I am just talking about an abstraction that can suit a large number of RL systems.
But now the idea: usually the decision part's job is to fill in action outputs, but if we allow it to fill in the "sense data prediction part" we end up with something interesting: we can see the prediction part in a more general way, not only as something that can predict but as something that can fill gaps (prediction is filling a specific gap), and so if the decision part prefills the predicted sense data with "I feel an apple in my hand", the predictive part (now more a "timeline gap-filling part") can try to determine the actions that lead to this sense data. Here we invent a new way for the decision part to communicate: by "will". It describes what it wants, and the planning to get it is delegated to the first part of the engine.
Was I clear?
Have you seen this kind of thing?
@HARtalks 4 years ago
It was really interesting and helped me to get a clear picture of what reinforcement learning is... Thank you!!
@rishidixit7939 1 year ago ⁺²
The sudden surprise of hearing Bruno Mars makes you pause the video to check your other open tabs
@PriyanshuGupta-hf2hm 3 years ago
You explained so well that I understood each and everything in your video. I am overjoyed!
@shirishbajpai9486 1 year ago
watched in 2023 after all the LLMs stuff going on... still so relevant and pure gold!
@aanex2005 5 years ago
I have no idea about RL but your video has given me a good jump start. Thanks man
@gusbakker 5 years ago ⁺²
Great balance between a very well explained content and the interesting facts about current progress in AI at the end. Good work
@williamkyburz 6 years ago ⁺¹
Xander, extremely well done, lucid and cogent. You should be teaching at M.I.T. or Universiteit Gent. The ability to teach complex subjects in an intuitive and simple way is a gift. Wish you the best in everything. Peace
@ArxivInsights 6 years ago ⁺¹
Thanks William! I am actually doing my PhD in Gent at the moment :)
@Krimson5pride 5 years ago
It was both professional and entertaining at the same time. Great and precise explanation.
@biiigates7381 4 years ago ⁺¹
I've been learning AI for almost a year now and of all the channels I've spent time with, this is the best one. Very underrated! (btw it's the first time I discovered this channel and I instantly subscribed)
@Gint-j9j 4 years ago
Same here, loved this video and I instantly subscribed... and also oh yeah yeah
@funpy772 3 years ago ⁺¹
Just wanted to tell you people.. this video is still awesome.
@shashankshivakumar4732 5 years ago ⁺⁴
I love this video. I love his critical and grounded thinking. Great work!
@poojanpatel2437 6 years ago ⁺⁴
Best Channel on yt for ml/dl/rl/ai... Keep up the good work... Would love to see your new video weekly...
@ArxivInsights 6 years ago ⁺³
I'd love to make more videos too! But since I'm currently doing this 100% in my spare time and 1 vid takes about 30hrs of work, there's really no way I can do one per week for now :(
@poojanpatel2437 6 years ago
Arxiv Insights Still amazing work till now... Would love to see more of your videos in future.. ❤
@ArnauViaMartinezSeara 6 years ago
Really useful. I am preparing a Reinforcement Learning class applied to finance and it is really helpful. Can't wait to see the next episode. Thanks
@doctorartin 5 years ago
Doing part of my PhD on potential AI strategies for decision-making in healthcare, and this was very useful, thank you.
@varshinis6930 4 years ago
Which university??
@doctorartin 4 years ago
@varshinis6930 Lund University
@laeeqahmed1980 5 years ago ⁺¹
Great talk. Humans are not good at multiple sound recognition and you added music to your video.
@majeedhussain3276 6 years ago
You deserve a million subscribers, hopefully one day you will. So much clarity in every video. Keep going...
@thanasispappas62 1 year ago
By far the best video on RL I've ever seen.
@espangie 4 years ago ⁺¹
This was really helpful. Thank you to people like you for creating this content. Appreciate you, Xander!
@nateshrager512 6 years ago
Great job introducing the topic. Very nice job dispelling misconceptions surrounding the topic as well. I put on that notification for your next videos, looking forward to em : )
@josefpolasek6666 4 years ago ⁺¹
Your videos are absolutely amazing! Thank you very much for explaining the concept of RL in 16 minutes.
@govindnarasimman1536 5 years ago
Very clear narration and true-to-ground comments. All the euphoria about AI needs to be grounded.
@ms_1918 5 years ago
well came here for a 1 min intro to reinforcement learning for first class of course,
stopped after 16 minutes what a superb experience.
@7810 6 years ago
Good stuff to learn the RL in terms of basic knowledge as well as the challenge it will face. Thanks for your time and sharing!
@colorlace 5 years ago ⁺¹⁶
The Lebowski Theorem: No superintelligent AI is going to bother with a task that is harder than hacking its reward function.
@wizardOfRobots 4 years ago ⁺⁶
Unless its reward function punishes it for it.
Now we have the Meta-Lebowski theorem: It's not going to bother with a task harder than hacking its hack-detection algorithm.
@halifakx 3 years ago
Perhaps a machine becomes smart, and then smarter as it decides that becoming smarter is the shortest path to reward... finally so smart that it realizes its reward is just colored mirrors? And creates a new program inside the program that cancels or outweighs the previous reward and creates new rewards? Programming these new rewards in its own language, not apparent to us... like the Facebook robots talking in their own language
@halifakx 3 years ago
estramboticusssssss dangerosicusss hahaha
@gudusangtani 4 years ago
So well explained ....I also liked the comments on Boston robotics considering the hype and buzz about AI and ML.. You are doing a very good job !
@orfeasliossatos 6 years ago ⁺²
I've been literally looking all over for a video like this, thank you so much
@nemx4u 6 years ago ⁺²
You explain hard topics beautifully! great job. Would love to see more RL videos!
@robertfairburn9979 6 years ago
When I was a psychology student we trained chickens using reinforcement training with reward shaping. However it was a form of supervised training in reality
@mujahid1324 4 years ago
I would say "Wow". You nailed what "reinforcement learning" is in 10 minutes. Please keep sending more and more AI. Keep it up, Xander :)
@papaman1037 6 years ago
Even in games without an RL actor, loops that never achieve a goal occur. The long-standing solution was to periodically perturb the system enough that such learned patterns get interrupted.
@papaman1037 6 years ago ⁺¹
Your content is far better than that guy that copies someone's code from GitHub, makes an obscure reference to the original author and states that he added a wrapper to make the code easier to use (a lie every time I've checked). He uploads the code as an original commit (no fork from the rightful author's repo). He intentionally misleads people and profits from it -- a legal necessity for calling it fraudulent.
Your content is excellent, clearly founded in recent research papers and you very professionally point out that material and more. You add value with your discussion of the topic. Thank you for an excellent channel. I would use Patreon but I am ill and not working. I'm doing my best to spread the word.
@soundninja99 6 months ago
I wanna try pretraining the RL model with supervised learning to see if it can circumvent some of the problems with reward shaping
@geraldkenneth119 2 years ago ⁺¹
It seems to me one way, albeit a rather difficult one, to help AI deal with sparse rewards is to
1. Give them a reward function that doesn’t work based on if they accomplished the task or not, but on how close they got to achieving it
2. Give them the ability to generate plans for achieving a goal, and to recognize why they failed
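The first suggestion in the comment above, rewarding closeness to the goal instead of only success, can be sketched as follows. The function names, the 2D positions, and the `tolerance` value are all hypothetical choices for illustration.

```python
import math

def sparse_reward(position, goal, tolerance=0.05):
    """Reward only when the agent is (almost) exactly at the goal."""
    return 1.0 if math.dist(position, goal) <= tolerance else 0.0

def distance_reward(position, goal):
    """Dense alternative: the reward rises toward 0 as the agent nears the goal."""
    return -math.dist(position, goal)

goal = (3.0, 4.0)
far, near = (0.0, 0.0), (3.0, 3.0)

# The sparse reward gives no signal for either position ...
print(sparse_reward(far, goal), sparse_reward(near, goal))
# ... while the distance-based reward tells the agent that `near` is better.
print(distance_reward(far, goal), distance_reward(near, goal))
```

This is a form of reward shaping, so the caveat from the video still applies: the distance measure has to be hand-designed for each task, and an agent can sometimes score well on it without doing what was intended.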
@steadymedia234 5 years ago
This is a great presentation on RL, short and clear content.
@atchutram9894 4 years ago
1:51 I doubt this is only a software challenge. In that demo, they apparently used human vision and human computational capabilities to make decisions. The statement that it is only a SW challenge is probably incorrect.
@mantische 4 years ago
One of the best explanations I've seen
@RoxanaNoe 6 years ago
Your channel is a great resource for getting into Deep Learning and AI.
@codyheiner3636 6 years ago ⁺¹
Love the philosophical discussion at the end!
@lamborghinicentenario2497 10 months ago
12:28 what did you use to connect the machine learning to a 3d model?
@alirezaparsay8518 1 year ago
The explanation was so clear. Thank you.
@Z4NT0 3 years ago
I learned so much in just 16 minutes. Awesome Video!
@stevenvaningelgem8828 1 year ago
What is the movie shot that is shown around 1:50? It looks interesting!
@jackwhite9332 6 years ago ⁺⁷
Impressive explanation, found this very useful. Thank you!
@dean8147 3 years ago
You’re a legend mate. Honestly, thanks for all of your hard work
@soumyakantadash5986 5 years ago
These videos are gems!!!..... incredible, precise and knowledgeable!!!!
@32isaias 5 years ago
The one that will take Siraj's crown, well deserved.
@ahilanpalarajah3159 5 years ago
Only way to describe this guy is "22 Two's - Jay-Z". Excellent video.
@amitredkar140 6 years ago ⁺¹
Great video!!!! Explained exceptionally, liked other videos as well from your channel. Would love to see more stuff related to AI/DL or RL. Thanks in advance. Keep up the good work....
@bsudharsh 5 years ago
succinct; it's a brilliant rendition on reinforcement learning
@HarutakaShimizu 6 months ago
Wow, this was a very clearly explained video, thanks!
@sharadrawatindia 6 years ago
Hey Xander! Great videos. Looking forward to your next video.
@sidharthaparhi7930 6 years ago
Also your intro is very high quality, like an intro to a good TV show
@Alex-gc2vo 6 years ago
your videos are some of the best explanations I've found for a lot of these very advanced subjects. I suspect your viewer count is going to jump very quickly. keep it up.
@Jshizzle2 5 years ago ⁺¹
Perfect video, so much more intuitive than my lectures. Thanks a bunch!
@khajasaen 6 years ago
Best channel in the crowd ... keep it up Xander
@OliverZeigermann 6 years ago
Very lively and understandable. Great work!
@JuantheJohn1 2 years ago
damn 10:50 went from "damn I'm in class" to "YOO IM AT THE PARTY"
@ArturoMoraSoto 4 years ago
Nice explanation, thanks for taking the time to create this great video.
@mohammadhatoum 6 years ago
Great job.. Explained the subject in a simple way. Keep it up and looking forward to new videos
@ingeniouswild 6 years ago ⁺¹
Very nice episode! One thing that struck me about your suggestion that without Reward Shaping, the auto-learning of the 2600 games would be intractable: even for a human, this would be extremely difficult - we succeed with new, undocumented games because they often have similar sub-components and sub-goals that we already know from other games (or life). But I'm sure you could easily construct a game which would be impossible for a human to learn without any hints, while still having the same overall complexity.
@thaermashkoor6225 3 years ago
Thanks for this clear introduction.
@azmathmoosa4324 6 years ago
I like how u don't hype up anything. Great mate! I subscribe!
@DavidSaintloth 6 years ago
Reinforcement learning is along the path to the complex multidimensional salience models that will drive dynamic cognition.
"Reward shaping" I assert in the salience theory of dynamic cognition that I proposed publicly in 2013 is performed by a combination of autonomic and emotional signal modifications to experience. The key is to tie the reward to the experience and then use that to vary the prediction...this way you don't reward shape as a separate process ...reward shaping is actually performed BY comparison.
For those interested in the salience theory of dynamic cognition and consciousness a collection of the articles I've written are available at this public Facebook note:
facebook.com/notes/david-saintloth/discovering-the-dynamic-cognition-cycle/10152513149708057
@sachinsaxena8307 5 years ago ⁺³
Hi Arxiv I need to create an environment for application testing ?
@saukraya3254 3 years ago
Does reinforcement learning imply that the end result is known? If so, it can only train where the end result is known, which is not enough to train AI to solve novel problems.
@AbdullahJirjees 4 years ago
Thanks for the video, what is the best source for studying RL properly and quickly?
@saaniausaf9621 6 years ago
I loved the way you explained everything. Thanks!
@alanator25 1 year ago
Thank you! This was a great introduction!
@tnmygrwl 6 years ago
You do an awesome job of structuring the content. Loved the video.
@mehdisauvage1234 6 years ago
Your videos are so useful and interesting ! This is pure gold to me :)
@papaguagua 2 years ago
I love the use of Xi when you talk about monopolies. Lol
@alenasazanova8331 4 years ago
That's a very interesting and understandable video. Thank you very much!
@shivanavya4488 4 years ago
Can anyone here please explain what he means by ‘for the training data, we sample from the distribution’? What exactly does it mean?
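One way to read the phrase asked about above: the policy network outputs a probability for each action, and the agent draws its action at random according to those probabilities instead of always picking the most likely one. The sketch below assumes a hypothetical network output of 70% UP and 30% DOWN; `sample_action` is an illustrative helper, not code from the video.

```python
import random
from collections import Counter

def sample_action(probabilities, rng):
    """Draw one action at random, weighted by the policy's output probabilities."""
    actions = list(probabilities)
    weights = [probabilities[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]

rng = random.Random(0)
policy_output = {"UP": 0.7, "DOWN": 0.3}  # hypothetical network output for one frame

# Over many draws, roughly 70% of the sampled actions are UP and 30% are DOWN.
counts = Counter(sample_action(policy_output, rng) for _ in range(10_000))
print(counts)
```

Sampling is what lets the agent occasionally try the less likely action, so it keeps exploring rather than repeating the same moves in every game.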
@gorillapimpin2978 6 years ago
my new favorite channel
@josephedappully1482 6 years ago
This is a great video; thanks for making it! Looking forward to your next one.
@digvijaybhandari9747 1 year ago
Really enjoyed the content here!
@zrmsraggot 5 years ago
Just a thought... maybe more on the philosophical level, but when comparing supervised learning to reinforcement learning:
The supervised learning model will perform as well as a human can play, right? And in reinforcement learning the model will perform as well as a human thinks they could play (by giving rewards and punishments), but do we know for sure that we as humans are able to recognize the best solution to a problem when we see it?
@sridhasridharan3600 4 years ago
Great Videos! I am recommending these to my students.
