When you found the gizmo, it was a good metaphor on how you are freeing up the agent in the world with technology.
Minute 11: Our interaction with the environment is not continual. There are special training periods: sleep -- a crucial step in all mammals, might even extend back to all vertebrates.
There are total insomniacs who cannot sleep for years, but they do not exhibit significant learning-related disabilities. Hence sleep should not be considered the only factor for unlearning the falsehoods.
@@erkinalp You might want to wiki that: total insomnia (also called fatal insomnia, because you die from it) causes hallucinations.
@@erkinalp Sources? I thought sleep deprivation and disorders were pretty universally harmful to cognitive abilities. You cannot simply not sleep and be healthy and functional.
This entire comment section looks AI generated 😂😂
It seems a contradiction to say you want a model with no domain knowledge that still has a reward function. Doesn't knowledge of a reward imply knowledge of the domain of that reward?
The amount of knowledge in the universe is nigh infinite, and we need that reward to anchor our focus on just that which has utility with respect to our goals (rewards).
I guess that's just semantics, and the point is that the reward function should encode all that is relevant about the domain?
I was wondering the very same thing. What's your reward function? With ChatGPT, the score comes from "did I predict the next word accurately?" I have no idea what this system is going to use. One possibility is -- is it going to be an auto-decoder? Don't know.
Here I believe he means the "value function" defines the reward, specifically whether it is getting better or worse. It's not inputting an external reward. Reward is part of perception and is learned by the value function (if you understand TD learning).
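For anyone who hasn't seen TD learning, here is a minimal sketch of tabular TD(0), not taken from the talk. The random-walk task, the function name `td0_random_walk`, and all parameter values are assumptions chosen for illustration. Note that in this sketch the reward still arrives from the environment as a number; the part that matches the "getting better or worse" idea is the TD error, which compares the value estimate before and after a step.

```python
import random


def td0_random_walk(n_states=5, episodes=2000, alpha=0.05, gamma=1.0, seed=0):
    """Estimate state values for a simple random walk with tabular TD(0).

    Hypothetical toy task: states 1..n_states sit in a row, with a terminal
    state on each end. Stepping off the right end gives reward 1, stepping
    off the left end gives reward 0.
    """
    rng = random.Random(seed)
    # Index 0 and index n_states + 1 are terminal; their values stay at 0.
    values = [0.0] * (n_states + 2)
    for _ in range(episodes):
        state = (n_states + 1) // 2  # start in the middle
        while 0 < state < n_states + 1:
            next_state = state + rng.choice((-1, 1))
            reward = 1.0 if next_state == n_states + 1 else 0.0
            # TD error: did things look better or worse after one step?
            td_error = reward + gamma * values[next_state] - values[state]
            values[state] += alpha * td_error
            state = next_state
    return values[1:-1]


if __name__ == "__main__":
    for i, v in enumerate(td0_random_walk(), start=1):
        print(f"state {i}: estimated value {v:.2f}")
```

For this toy task the true values are 1/6, 2/6, ..., 5/6, so the estimates should land near those numbers.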
"paltry worries like THE ECONOMY IS IN TROUBLE".
The slides can be found on my web site richsutton.com.