At 1:20, I guess that Policy = pi(a | s).
You deserve a Nobel Prize.
Can you explain the rationale for BELLMAN_STEPS (instead of taking every step)? Also, how do you tune this parameter?
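For anyone else wondering about this: a minimal sketch of what an n-step target usually looks like, assuming BELLMAN_STEPS is the number of steps unrolled before bootstrapping from the value estimate (that reading is a guess from the name, not confirmed by the video). The function name `n_step_target` and the numbers below are hypothetical.

```python
def n_step_target(rewards, bootstrap_value, gamma):
    """Discounted n-step return: r_0 + gamma*r_1 + ... + gamma^n * V(s_n).

    rewards         -- the n rewards collected along the unrolled steps
    bootstrap_value -- value estimate of the state reached after n steps
    gamma           -- discount factor
    """
    target = bootstrap_value
    # Fold the rewards in from the last step back to the first.
    for reward in reversed(rewards):
        target = reward + gamma * target
    return target


# Hypothetical numbers: 3 unrolled steps, V(s_3) = 10, gamma = 0.5.
example_target = n_step_target([1.0, 1.0, 1.0], 10.0, 0.5)
```

With more steps the target leans on real rewards and less on the (possibly wrong) value estimate, at the cost of higher variance, which is the usual trade-off when tuning the step count.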
At 03:11, what is the use of s prime?
Thank you so much, I really need it!!
The code seems to perform the same, if not better, without the entropy loss (ENTROPY_BETA = 0). Also, I don't really understand the reason for the entropy loss.
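On the reason for the entropy term, a rough sketch, assuming ENTROPY_BETA scales a negative-entropy penalty on the policy's action distribution, as is common in actor-critic code (the exact form used in the video's code is not shown here). The helpers `softmax`, `entropy`, and `entropy_loss` are written out by hand for illustration.

```python
import math


def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    # Shannon entropy in nats; zero-probability actions contribute nothing.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_loss(logits, entropy_beta):
    # Negative entropy scaled by beta: minimizing this term raises entropy,
    # which discourages the policy from collapsing onto one action too early.
    return -entropy_beta * entropy(softmax(logits))


uniform_entropy = entropy(softmax([0.0, 0.0]))   # both actions equally likely
peaked_entropy = entropy(softmax([10.0, 0.0]))   # almost always the first action
```

A near-deterministic policy has entropy close to zero, so the penalty mostly matters on tasks where exploration is hard. On an easy environment it is plausible that setting the coefficient to 0 changes little.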
Seems like you are just reading the slides in all the videos
I feel like the PTAN lib introduces unnecessary complexity. Not ideal for a tutorial...