TensorFlow and deep reinforcement learning, without a PhD (Google I/O '18)

  • Published Sep 29, 2024

Comments • 63

  • @EndAtDay · 6 years ago · +3

    The code is here.
    github.com/GoogleCloudPlatform/tensorflow-without-a-phd/tree/master/tensorflow-rl-pong

  • @fahemhamou6170 · 2 years ago · +1

    My sincere greetings. Thank you very much.

  • @ajaykumarbharaj3283 · 6 years ago · +3

    Great explanation by Martin and Yu-Han. Great work!

  • @gmnboss · 6 years ago · +5

    Still need a PhD

  • @rodamira · 5 years ago · +6

    My parents' scarf store in San Francisco was just about to go bankrupt because nobody buys scarves in SF. Then Martin Gorner moved there, and the business is thriving again!

  • @MartinGorner · 6 years ago · +6

    You can find the slides and the code for this talk, as well as all the other talks in the "Tensorflow without a PhD" series at this URL: github.com/GoogleCloudPlatform/tensorflow-without-a-phd

    • @baotran7626 · 2 years ago

      Wonderful, man. Yeah!

  • @davidm.johnston8994 · 3 years ago · +3

    I don't know if the two people teaching us in this presentation will ever get this message, but if you guys are reading this, I wanted to thank you so much! Not only for the clear explanations, but also for giving me a bit of perspective as to why this is an interesting topic to learn. I feel like we're a community, and maybe this is what I needed most in these pandemic times. Thanks again!

  • @axelsolhall5830 · 5 years ago · +2

    I can't quite get the loss function to work with TF2.0,
    loss = tf.reduce_sum(R * cross_entropies)
    model.compile(optimizer="Adam", loss=loss, metrics=['accuracy'])
    TypeError: Using a `tf.Tensor` as a Python `bool` is not allowed. Use `if t is not None:` instead of `if t:` to test if a tensor is defined, and use TensorFlow ops such as tf.cond to execute subgraphs conditioned on the value of a tensor.
    Anyone got some advice? Thanks!
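
    In TF 2.x, `model.compile` expects a loss *function* (a callable), not a tensor computed ahead of time, which is what triggers that TypeError. A minimal sketch of one common workaround, a custom training step (assuming a Keras `model` that outputs action logits, plus integer `actions` and discounted float `rewards` tensors):

        import tensorflow as tf

        optimizer = tf.keras.optimizers.Adam()

        @tf.function
        def train_step(observations, actions, rewards):
            with tf.GradientTape() as tape:
                logits = model(observations, training=True)
                # -log pi(a|s), one value per game step
                cross_entropies = tf.nn.sparse_softmax_cross_entropy_with_logits(
                    labels=actions, logits=logits)
                # reward-weighted policy-gradient loss
                loss = tf.reduce_sum(rewards * cross_entropies)
            grads = tape.gradient(loss, model.trainable_variables)
            optimizer.apply_gradients(zip(grads, model.trainable_variables))
            return loss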

  • @shishirnarayan3900 · 6 years ago · +1

    This is adapted from Karpathy's blog. The original post, with the code and everything, is here: karpathy.github.io/2016/05/31/rl/

    • @MartinGorner · 6 years ago · +1

      Yes it is. Credit where credit is due.

  • @mayankpj · 6 years ago · +7

    Here we go....... "without a PhD"! I love your sessions!!
    Thank you for doing this ....

  • @maoryatskan6346 · 5 years ago · +19

    I must say something about the math. There are two ways of teaching ML:
    1. Require the underlying math from your audience beforehand.
    2. Explain the needed math during the lecture.
    Ignoring the math, or showing it without explaining it carefully and in detail, is not helpful.
    I'm a 3rd-year college student and this was not a clear lecture for me.
    All I got is that TensorFlow can do reinforcement learning with NNs, and that we use softmax in the last layer.
    What I'm missing is a full understanding of the pipeline/graph and the derivation part.

    • @biddls · 4 years ago

      I think it's more that this is basic RL; look to TF-Agents for more complex implementations of RL.

  • @miraclemaxicl · 5 years ago · +3

    Thank you for the insight. I was able to successfully apply your approach to problems in OpenAI Gym.

  • @redberries8039 · 6 years ago · +4

    I like this double-act thing they have going on

  • @carlaiau242 · 6 years ago · +2

    Great stuff!
    Typo:
    tf.losses.softmax_cross_entropy(one_hot_labels,
    should be:
    tf.losses.softmax_cross_entropy(onehot_labels,
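
    For reference, the TF 1.x keyword really is `onehot_labels` (no underscore after "one"), so the corrected call looks like this (with `labels` and `logits` standing in for whatever tensors the graph uses):

        loss = tf.losses.softmax_cross_entropy(onehot_labels=labels, logits=logits)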

  • @bingeltube · 6 years ago · +2

    Highly recommended! I liked the robot arm that learned how to flip pancakes.

  • @crykrafter · 6 years ago · +6

    Great explanation. Finally understood all the math behind DRL.

    • @0GRANATE0 · 2 years ago · +1

      You joking? Didn't watch it yet. So I should?

    • @0GRANATE0 · 2 years ago · +1

      Ok you joked.

  • @jtfidje · 5 years ago · +1

    But when they take the reward and multiply by the cross_entropy, won't a negative reward (loss) turn the cross-entropy negative? And by minimizing this, don't they actually encourage the algorithm to lose? I notice in the slides that they do loss = - R( ... ), but I can't see this reflected in the code?

    • @ludovicsenez8038 · 1 year ago

      Hello, I don't know if you still care about it, but as I understand it: the cross-entropy is already -log(pi(a|s)), which is positive, so the minus sign from the slides is baked into the cross-entropy term. When the reward is negative, minimizing R times the cross-entropy pushes the probability of that move down, meaning the network learns that it didn't choose a good option.
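
      A short sketch of that sign convention (with `pi` the probability the network assigned to the sampled move):

          # cross_entropy = -log(pi)  -> always >= 0, the minus sign lives here
          # loss = R * cross_entropy = -R * log(pi)
          # R > 0 (won the point): minimizing pushes log(pi) up   -> move encouraged
          # R < 0 (lost the point): minimizing pushes log(pi) down -> move discouraged
          loss = tf.reduce_sum(rewards * cross_entropies)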

  • @tikke8511 · 6 years ago · +1

    Does somebody know, is this actually the "REINFORCE" algorithm? (Williams, 1992)
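
    For reference, the REINFORCE loss (Williams, 1992) is -sum_t R_t * log pi(a_t|s_t), and since the cross-entropy against the sampled move is exactly -log pi(a_t|s_t), the talk's reward-weighted cross-entropy appears to be just that (a sketch using the repo's variable names):

        loss = tf.reduce_sum(processed_rewards * cross_entropies)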

  • @vaizerdgrey · 2 years ago

    UP bhau (brother) teaching TF and RL

  • @czehoul · 6 years ago

    From the code, refer to the following line:
    loss = tf.reduce_sum(processed_rewards * cross_entropies + move_cost)
    Could I know why processed_rewards is passed in as-is instead of being negated? To my understanding, even normalised, a negative or small reward indicates a lost point or a bad action, and it should be discouraged. The code minimises this loss in the optimizer, so it seems to encourage bad actions?

    • @MartinGorner · 6 years ago · +1

      The minus sign has already been applied inside "cross_entropies".

  • @chethan93 · 6 years ago · +2

    Inspiring!!

  • @AkshayAradhya · 6 years ago · +6

    Amazing talk. Just what I was looking for.

  • @nik15oc · 6 years ago · +1

    How do we decide between SOFTMAX and SIGMOID as a function for the final layer?

    • @osaxma · 6 years ago · +7

      Based on the number of outputs at the final layer. Generally, if you have more than 2 outputs you use softmax, and if you have two outputs or fewer you choose sigmoid.
      By the way, in the hidden layers ReLU is preferred over sigmoid in deep learning. I encourage you to take the Deep Learning Specialization on Coursera if you would like a "deeper" understanding...

    • @nik15oc · 6 years ago · +1

      Thanks so much

    • @joshuat6124 · 6 years ago · +2

      Completely echo what @Osama Alraddadi said.

    • @MartinGorner · 6 years ago · +1

      For classification (putting things into categories): SOFTMAX on the last layer. For regression (producing numbers with continuous values): most often you use no activation, just the weighted sum. You could use SIGMOID for this if you are OK with output values between 0 and 1 (or TANH for values between -1 and 1). There is a good example of this in the YOLO algorithm for object detection, explained in this session: youtube.com/watch?v=vaL1I2BD_xY

    • @tobe259 · 6 years ago

      Actually, softmax is just a generalization of sigmoid to multiple classes. In the case of 2 classes, softmax is equivalent to sigmoid.
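
      A quick numerical check of that equivalence (a sketch; any x works):

          import numpy as np

          x = 1.7
          sigmoid = 1.0 / (1.0 + np.exp(-x))
          softmax = np.exp([x, 0.0]) / np.sum(np.exp([x, 0.0]))
          print(np.isclose(sigmoid, softmax[0]))  # True: softmax over [x, 0] is sigmoid(x)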

  • @tikke8511 · 6 years ago

    Can anybody explain why there is no dtype, e.g. in
    observations = tf.placeholder(shape=[None, 80 * 80])

    • @tikke8511 · 6 years ago

      The code is in: github.com/GoogleCloudPlatform/tensorflow-without-a-phd/tree/master/tensorflow-rl-pong
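
      For what it's worth, `dtype` is the first required argument of TF 1.x's `tf.placeholder`, so a runnable version of that slide line has to spell it out (float32 here is an assumption):

          observations = tf.placeholder(dtype=tf.float32, shape=[None, 80 * 80])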

  • @rubencg195 · 6 years ago · +1

    Where is the code for this session?

    • @MartinGorner · 6 years ago

      right here: github.com/GoogleCloudPlatform/tensorflow-without-a-phd/tree/master/tensorflow-rl-pong

  • @Crystalzht · 5 years ago

    Policy Gradient

  • @ezioalditore5346 · 5 years ago

    Please introduce TensorFlow for 32-bit PCs.

  • @constantinen.mbufung1618 · 6 years ago · +1

    Great talk.

  • @himansuodedra2201 · 6 years ago

    Where is the code for this? And where is the game environment? Does anyone know where I can find them? Thank you

    • @MartinGorner · 6 years ago

      code with installation instructions: github.com/GoogleCloudPlatform/tensorflow-without-a-phd

  • @cyrilfurtado · 6 years ago

    Nice, will check the code.

  • @ronnypolle4350 · 5 years ago · +2

    This is incredible!! Thank you for the great talk!!

  • @proweiqi · 6 years ago

    How do you choose the number of neurons, e.g. 200 or 20?

    • @sciencemanguy · 5 years ago

      I've been doing a lot of reading on this, and it's quite subjective, but it comes down to underfitting and overfitting: the more neurons there are, the more the model tends to overfit, and with too few it will underfit. Striking that balance is where neural architecture design comes into play.

    • @blipdeliveries1145 · 5 years ago

      Trial and error lol
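
      That trial and error can at least be scripted. A minimal sketch (Keras; the 80*80 input size and 3 output classes are illustrative assumptions):

          import tensorflow as tf

          def build(hidden_units):
              # one hidden layer; its width is the knob being discussed above
              return tf.keras.Sequential([
                  tf.keras.layers.Dense(hidden_units, activation="relu",
                                        input_shape=(80 * 80,)),
                  tf.keras.layers.Dense(3, activation="softmax"),
              ])

          for units in (20, 200):
              model = build(units)
              print(units, model.count_params())  # compare capacity, then validate each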

  • @sevi95100 · 5 years ago

    One day to converge? That's a lot! Imagine a more complex problem.

  • @OleksandrFialko · 6 years ago · +4

    Of course you do not need a PhD to understand simple linear algebra.

    • @MartinGorner · 6 years ago · +2

      Thank you for that :-)

  • @joppegeluykens1904 · 6 years ago · +2

    Is the example code published?

    • @MartinGorner · 6 years ago · +1

      Right here: github.com/GoogleCloudPlatform/tensorflow-without-a-phd

    • @MichaelFortner1989 · 6 years ago · +4

      Joppe Geluykens follow the link on the screen at 39:30

    • @joppegeluykens1904 · 6 years ago · +1

      Michael Fortner Sweet, thanks!

  • @Gamexx1000 · 6 years ago · +1

    BRASILLL