Shouting out Two minute papers in a Brunton video - best crossover episode ever!
what a time to be alive
I love the Two minute series on YT but it is so hard to pronounce that dude's name! lol
I held on to my papers hearing that
Shouldn't his channel be called 5 minute papers?
Really love your videos!
Would love to see more about deep reinforcement learning used in the field of Robotics.
Really great video.
The most complicated problem that I’ve seen RL solve is openAI dota 2. Mind blowing.
Very valuable resources that get uploaded here, thank you!
I'm immersed in this. I read a book with a similar theme and was completely absorbed: "The Art of Saying No: Mastering Boundaries for a Fulfilling Life" by Samuel Dawn.
Your work is so inspiring, Steve.
Thanks Steve, very beautifully explained. From my point of view, you are the best teacher I have ever seen.
Please teach us or upload a lecture on designing our own custom environments.
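For anyone who wants to experiment before such a lecture exists, here is a minimal custom-environment sketch, assuming the Gymnasium library and an invented toy corridor task (both are illustrative choices, not anything from the video):

```python
import gymnasium as gym
from gymnasium import spaces

class Corridor(gym.Env):
    """Hypothetical toy environment: walk right along a corridor to the goal."""
    def __init__(self, length=10):
        self.length = length
        self.observation_space = spaces.Discrete(length)  # agent's position
        self.action_space = spaces.Discrete(2)            # 0 = left, 1 = right

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.pos = 0
        return self.pos, {}

    def step(self, action):
        self.pos = (min(self.pos + 1, self.length - 1) if action == 1
                    else max(self.pos - 1, 0))
        terminated = self.pos == self.length - 1
        reward = 1.0 if terminated else -0.01   # small step penalty, goal bonus
        return self.pos, reward, terminated, False, {}

env = Corridor()
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(1)
```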
Easy to digest information. Enjoying learning from this account.
thank you for your amazing content! I learn so much about the world with your videos
Someday I would love to attend your lectures live. :))))
Thank you! I've learned so much from your videos. Please cover machine learning, reinforcement learning, and MPC with fuzzy logic.
Thank you. Very instructive.
Thank you Prof. Brunton for the valuable content. I will be a bit greedy and ask if you can upload a video including an example coded in Matlab or Python.
Thank you again for all your efforts
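In the meantime, here is a tiny tabular Q-learning sketch in Python, with a made-up 1-D corridor task and hyperparameters chosen purely for illustration (not an example from the lectures):

```python
import numpy as np

n_states, n_actions, goal = 10, 2, 9      # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1        # learning rate, discount, exploration
rng = np.random.default_rng(0)

for episode in range(500):
    s = 0
    while s != goal:
        # Epsilon-greedy action selection.
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
        s_next = min(s + 1, goal) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == goal else -0.01
        # Q-learning update: bootstrap from the best next action.
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

# Learned policy: 1 (move right) everywhere except the never-updated goal state.
print(Q.argmax(axis=1))
```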
lol you're the eigensteve. All other Steves are just a linear combination of your properties XD. Love it.
I love that your sequence for bringing up children is: Tic-Tac-Toe --> Checkers --> Chess --> ready for the Real World.
Great video! Would like to see a video about "Hindsight Experience Replay" in Reinforcement Learning.
Thanks Steve! I believe it will be also interesting to compare model-based and model-free RL.
Great Insight. Thank you
Thumbs up and subscribed. Keep up the valuable work 👍
Very good material. Thank you
Thank you too!
There were only around 55,000 transistors in the quite useful Z80 CPU, which was already available in the late 1970s. That would certainly have been enough for a specialized fast Hadamard transform chip, and possibly even a fast neural network chip based on that transform. Lost opportunities to do things early. Certain realizations about randomization, distribution, and dot products could also have made associative memory a thing in general use today, allowing efficient experimentation with external memory blocks for neural nets.
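For scale, the fast (Walsh-)Hadamard transform mentioned above needs only O(n log n) additions and subtractions, which is what makes the chip idea plausible. A minimal in-place Python sketch (illustrative, obviously not chip-level code):

```python
import numpy as np

def fwht(a):
    """Fast Walsh-Hadamard transform; len(a) must be a power of two.
    Uses only additions and subtractions; no multiplications needed."""
    a = np.asarray(a, dtype=float).copy()
    n, h = len(a), 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y   # butterfly step
        h *= 2
    return a

print(fwht([1.0, 0.0, 1.0, 0.0]))  # -> [2. 2. 0. 0.]
```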
Also, even if it is not very popular, I want to mention training by evolution rather than back-propagation. The continuous Gray-code optimization algorithm works rather well. You can split the training over many cores very easily, and it appears you get much better generalization, perhaps because the full training set is used in its entirety rather than in batches. Obviously a negative point is that it works far better with fast networks; however, the easy distribution over (say, cheap ARM) cores offsets some of the pain. Also, some problems are better framed using evolution.
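To make "training by evolution" concrete, here is a minimal (1+1) evolution strategy on a toy regression problem; it is a simplified stand-in for the continuous Gray-code optimizer named above, not that algorithm itself:

```python
import numpy as np

def evolve(loss, dim, iters=20_000, sigma=0.1, seed=0):
    """(1+1) evolution strategy: perturb the best weights, keep improvements.
    No gradients, so cores can evaluate candidates independently, and the
    loss is computed on the FULL training set rather than mini-batches."""
    rng = np.random.default_rng(seed)
    best = rng.standard_normal(dim)
    best_loss = loss(best)
    for _ in range(iters):
        trial = best + sigma * rng.standard_normal(dim)
        trial_loss = loss(trial)
        if trial_loss <= best_loss:
            best, best_loss = trial, trial_loss
    return best, best_loss

# Toy problem: recover linear weights from data (invented for illustration).
X = np.random.default_rng(1).standard_normal((100, 5))
y = X @ np.arange(5.0)
w, final_loss = evolve(lambda w: np.mean((X @ w - y) ** 2), dim=5)
```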
Thank you Professor! Are we starting a new video series on machine learning control?
WE LOVE YOU STEVE!!!
Thumbs up Steve! :)
You rock!
This man is just awesome!
What does the expectation in the value function represent? Over what random variable?
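For reference, in the standard definition the expectation is over whole trajectories: the random actions $a_t \sim \pi(\cdot \mid s_t)$ chosen by the policy and the random transitions $s_{t+1} \sim P(\cdot \mid s_t, a_t)$ of the environment,

```latex
V^{\pi}(s) \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_t \;\middle|\; s_0 = s\right].
```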
Thanks, Steve.
@6:59 "only a small percentage of humans learn"... I thought that was the whole point of the game...
@12:44 - "because they are extremely expressive" ... should that be "expensive" instead of "expressive"?
Referring to your point about a trained agent only being good at the game it was trained on: are they actually learning, or simply memorizing?
They are "learning" by estimating how good a state is. Thus they can estimate unobserved states based on previously seen data.
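A toy illustration of that point: fit a value function with simple features and it can score states it has never visited (the data and features here are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
phi = lambda s: np.array([1.0, s, s**2])   # hand-picked polynomial features

# Pretend these are returns observed from 50 visited states.
seen_states = rng.uniform(-1, 1, size=50)
seen_values = 0.5 + 2.0 * seen_states - seen_states**2 \
              + 0.01 * rng.standard_normal(50)

# Least-squares fit of V(s) ~ w . phi(s).
A = np.stack([phi(s) for s in seen_states])
w, *_ = np.linalg.lstsq(A, seen_values, rcond=None)

print(w @ phi(0.37))   # value estimate for a state never seen during "training"
```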
I think the issue stems from the bot's inability to recognize the dynamics and nonlinearities in the state, and from it not being able to apply these predictive dynamic patterns to other environments. Using SINDy and Q-learning in conjunction may provide a path to an adaptive model. The issue, of course, is understanding how to apply the learned dynamics of the system to the Q-values in other models. Going from tic-tac-toe to, say, Connect Four would mean aligning inputs, recognizing the opponent's position and maneuvering around it, and learning the new vertical structure. The dynamics of needing to align pieces would already be learned, but the new upright, vertically stacking environment would need to be learned. Some rules of certain game elements would obviously be more transferable for the bot.
There needs to be some way for the bot to map what it knows can already be controlled onto what is observed, and to recognize the area that still needs to be explored, maybe with classical control systems or something BMR.
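To make the SINDy half of the suggestion above concrete, here is a minimal sequentially-thresholded least-squares sketch (a toy, not the reference PySINDy implementation):

```python
import numpy as np

def sindy_stlsq(X, dX, threshold=0.1, iters=10):
    """Sparse regression dX ~ Theta(X) @ Xi over a small candidate library.
    X: (samples, states); dX: matching time derivatives."""
    n = X.shape[1]
    cols = [np.ones(len(X))]                                # constant term
    cols += [X[:, i] for i in range(n)]                     # linear terms
    cols += [X[:, i] * X[:, j] for i in range(n) for j in range(i, n)]
    theta = np.column_stack(cols)
    xi = np.linalg.lstsq(theta, dX, rcond=None)[0]
    for _ in range(iters):
        xi[np.abs(xi) < threshold] = 0.0                    # enforce sparsity
        for k in range(n):                                  # refit active terms
            big = np.abs(xi[:, k]) >= threshold
            if big.any():
                xi[big, k] = np.linalg.lstsq(theta[:, big], dX[:, k],
                                             rcond=None)[0]
    return xi

# Tiny demo: recover dx/dt = -2x, dy/dt = x - y from noiseless data.
X = np.random.default_rng(0).standard_normal((200, 2))
dX = X @ np.array([[-2.0, 0.0], [1.0, -1.0]]).T
print(np.round(sindy_stlsq(X, dX), 2))
```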
Hi prof. Can you please tell me what the difference is between MPC and RL?
From what I gather, MPC is a deep learning algorithm that extends RL to better predict the state resulting from a sequence of actions. Quora has some answers, which do challenge each other but will give you an idea: www.quora.com/What-is-the-difference-between-Machine-Learning-and-Model-Predictive-Control
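A caveat: MPC is more commonly described as a model-based control method than as a deep learning algorithm; it re-plans at every step by rolling a dynamics model forward over a short horizon. A minimal random-shooting sketch, where the dynamics `f` and stage cost `cost` are hypothetical user-supplied functions:

```python
import numpy as np

def mpc_action(f, cost, state, horizon=10, n_candidates=256, seed=None):
    """Random-shooting MPC: sample action sequences, roll out the model f,
    and return the first action of the cheapest sequence. Needing f at all
    is the key contrast with model-free RL, which learns from reward alone."""
    rng = np.random.default_rng(seed)
    best_seq, best_cost = None, np.inf
    for _ in range(n_candidates):
        seq = rng.uniform(-1.0, 1.0, size=horizon)
        s, total = state, 0.0
        for a in seq:
            s = f(s, a)
            total += cost(s, a)
        if total < best_cost:
            best_seq, best_cost = seq, total
    return best_seq[0]   # apply one action, then re-plan at the next step

# Toy use: scalar model, drive the state toward zero.
f = lambda s, a: s + 0.1 * a
cost = lambda s, a: s**2 + 0.01 * a**2
print(mpc_action(f, cost, state=1.0, seed=0))
```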
Nice video! Well done!
Amazing video
Your videos are amazing
Great videos! Thanks. The only thing I think could be improved is the sound.
Hey Steve! Nice video! Could you please link the things you mentioned in the description?
Thanks for the video
I wonder if Steve, after seeing the current major advances in AI such as GPT-4 or Stable Diffusion, still thinks that general AI is a problem that will be solved hundreds of years from now?
Hello Steve
I appreciate your videos.
Is it possible to create a video about an architecture that combines RL and supervised learning, like the methods Google used in the AlphaZero and MuZero game bots? Please?
Thank you
Thank you for your amazing lecture. I am confused about the policy. If it's a neural network that takes states as input and commits an action to the environment, why do you represent it as pi, a function of a state and an ACTION? Is the action an input or an output for the agent?
It's confusing, but the action generated by the neural network leads to a new state that is then fed into the neural network again. So in a way you could say it depends on an action too, except for the very first state, which wouldn't have any action that led to it unless you implicitly define a "do nothing" action.
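Another way to see it: in the common stochastic-policy notation, pi(s, a) is the probability the policy assigns to action a in state s. The network takes only the state as input, outputs that distribution, and the action is sampled from it, so the action is an output. A tiny sketch with made-up weights:

```python
import numpy as np

rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((4, 2))   # toy "network": 4-dim state -> 2 actions

def pi(state):
    """Returns pi(a|s) for every action a; pi(s, a) is this vector indexed at a."""
    logits = state @ W
    exp = np.exp(logits - logits.max())  # softmax over action logits
    return exp / exp.sum()

state = rng.standard_normal(4)
probs = pi(state)                            # distribution over actions
action = rng.choice(len(probs), p=probs)     # the agent's OUTPUT: a sample
print(probs, action)
```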
I was expecting more technical details! Nice video though.
I know this is not related to this topic, but can you please explain sliding mode control?
How did you record the videos that show you behind the transparent slides of your presentations?
Please add the suggested sources in the comments
Very helpful
Can you complete this RL series with your great explanations?
Thank you for your video
Really nice exposé video!
as always, your explanations are fantastic!
Do you think you can give some practical examples with code too?
Amazing
Do you still think that we are lifetimes away from AGI?
Steve, I would love to see an example of how to implement reinforcement learning! Discrete control and z-transform videos would also be great!
It takes a lot of effort to work out what we can use this for and how.
is "Real general artificial intelligence" hard to get, mainly because computer resources or reinforcement learning techniques?
Which world should get it faster in years?:
1) we have super computers, but human can't model "real intelligence" yet.
2) we have powerful scalable reforcement learning tecniques but the actual computers can't run it.
Does transfer learning not carry over to reinforcement learning?
If expert knowledge is inherently limiting how can we ever reach AGI?
"Long story short", starts from 19:00
Very interesting video! I had a couple of questions related to physical systems. Is there a way to quantify the robustness of such a controller to disturbances? Also, is there a way to “tune” the performance of such a controller without having to re-run the entire training step? Thanks!
GPT-3 Transformers can generalize and think abstractly. I taught an agent how to apply a dialectical analysis to any given political or philosophical question and extract the Marxist position on it -- and it did it well. It can apply the method of dialectical deconstruction to ANY topic, ask the right questions, uncover the contradictions, and arrive at a dynamic understanding of the topic. Transformers are the future.
As always: almost uncanny production value. Will certainly recommend to others. I'd love it if you'd follow up on this and get more technical, take us by the hand and explore the content in the niches and pockets within the field, maybe driven by curiosity? :p
I wonder if humans are really that good at transfer learning or is there something else at play.
Transfer learning isn't the only explanation of the fact that humans can learn new things from just a few examples.
I mean it could be that we are strongly pre-wired for certain tasks like language or movement, so in reality what we see as "learning" could just be the finishing touch on something that is mostly already within us.
I am no behavioral scientist but I would be curious what an expert in human learning would say.
My gut says that neural nets are extremely good at "training", but I doubt that's all there is to "learning".
Movements, for example, are really difficult. You can show or tell someone a movement and many can pull it off in a few attempts, but mastering it takes years of training.
I think we use biological neural nets to do all the basic stuff.
But there is something else that lets us design, evaluate, and test a cost function while we are doing something for the first time. There is something that lets us use information from others to discard most possibilities and train in a much smaller solution space. Also, I think we have a pretty good gut feeling for whether we are "under- or over-fitted", even without a test set.
I took 6 semesters of Spanish, was married to a native Mexican (aka illegal alien), spent a great deal of time in Mexico, and barely know a few phrases.
Do humans really take what they learn from playing Go to make them better at some other non-Go related task?
I'm not so sure about your claim that transfer learning/general intelligence is 100 years off.
I think all it will take is a single key insight into model generation (which might come in months... or may even have already been made and just hasn't gotten into the right hands yet) to really make a breakthrough into AGI.
I really wish math notation were better. "Just grab whatever symbol from the Greek alphabet and use it for whatever variable you want," and rely on the reader to know what the asinine mapping of variables to Greek letters stands for.
Nice video though.
So your videos of Google's robots walking are actually based on evolutionary algorithms, not RL…
Steve senpai!!!
Uploaded. Is it possible to set up real-time data sharing between two Bayesian reinforcement learning algorithms to explore the same data space?
Is there a stack for this? Can I get a grant for this 😏😔
Why do we want these to transfer to other games? So they can beat us at everything?
Nice
ahh.. so that's where the Neural Network is implemented
It would be awesome if the talk were spent on the ML coding stuff... talking about the news is boring when it's already reported almost everywhere...
Hi Steve, come to clubhouse please!
I'll be honest, this was less informative than the last RL video. Definitely took the right direction working towards implementing Neural Networks into RL, but didn't really explain a whole lot. Title could have been "What has Been Accomplished with Neural Networks for Deep RL".
No one knows when AGI will happen: 10, 20, 30 years. Giving a made-up timeline like this is harmful to the public.
Not impressed with this. Very little information; it mostly says "this is interesting" or "this paper is interesting".