I can't believe it. It's now 2020 and this presentation is still relevant. This is now a classic.
The whole talk is about single-node training, so the "distributed training" title is misleading. It barely mentions distributed training at the end, and only covers the async parameter-server approach.
Can the one they're calling the sync all-reduce architecture also be called a swarm?
At 15:43 you are saying that adding the distribution strategy to the RunConfig is sufficient to run on GPUs as well. But when I run this with tensorflow-gpu version 1.10, I get an error that optimizer.minimize uses apply_gradients(), which doesn't work, and that I should use _distributed_apply(). Is this because I am not using the proper versions or something?
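For reference, the pattern shown around that point in the talk looks roughly like the sketch below. This is only my reconstruction against the TF 1.x contrib API (the model_fn / input_fn names are placeholders, not from the talk), and the apply_gradients issue may just be a rough edge in the early DistributionStrategy releases, so trying a release newer than 1.10 is worth a shot.

```python
# Minimal sketch, assuming the TF 1.x Estimator API with the contrib
# DistributionStrategy; model_fn and input_fn are hypothetical placeholders.
import tensorflow as tf

def train(model_fn, input_fn):
    # MirroredStrategy does synchronous all-reduce training across the
    # GPUs visible on this single machine.
    strategy = tf.contrib.distribute.MirroredStrategy()

    # Passing the strategy via RunConfig is what tells the Estimator to
    # replicate the model onto each GPU.
    config = tf.estimator.RunConfig(train_distribute=strategy)

    estimator = tf.estimator.Estimator(model_fn=model_fn, config=config)
    estimator.train(input_fn=input_fn, steps=1000)
```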
Can I find the slides somewhere? They would be handy for a quick revision.
What is the difference between distributed TensorFlow with GPUs and with TPUs? Can TensorFlow with GPUs produce the same performance as a TPU?
That's cool. Definitely gonna try that. Would you please point to some tutorial which will help me try out using my CPU/GPU for building ML models?
I'm trying to build this on Google Cloud Compute Engine, but there's no way for me to get 8 NVIDIA V100s, nor even 4 NVIDIA P100s :(
You can try a cluster of single-board computers with CUDA cores, like the Jetson Nano.
At 22:28 she says the training is more effective with more GPUs, but the graph is in step mode. Why are more GPUs more effective per step? I would have only expected it to be more effective per unit time.
My thoughts: more GPUs -> bigger batch size -> learning from more data per step -> more effective training. (Although in practice a bigger batch doesn't always yield better performance, in this case it seems to.)
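If it helps, here's a tiny sketch of that arithmetic (made-up numbers; it assumes the benchmark kept the per-GPU batch fixed and scaled the global batch with the number of GPUs, which I believe is the usual setup for these charts):

```python
# Toy illustration: under synchronous all-reduce, each replica processes
# its own batch, so one global step sees per_gpu_batch * num_gpus examples.
# The numbers below are made up for illustration only.
per_gpu_batch = 64
for num_gpus in (1, 4, 8):
    examples_per_step = per_gpu_batch * num_gpus
    print(f"{num_gpus} GPU(s): {examples_per_step} examples per step")
```

So on a per-step x-axis the multi-GPU curves can look "more effective" simply because each step covers more data.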
Perfect! Big Thanks for this presentation
This one was amazing
I am the reason why you have to be successful
lol why all those empty seats
Because the main things that draw developers to I/O are app and web development. Not everyone is really into AI stuff...
finished watching
The Tesla P100D is a GPU, really?
great talk really amazing
Thanks !
That’s cool 😎
nice talk
Priya Ma'am is an IIT graduate with a gold medal and also secured second rank in the IIT JEE exam, one of the toughest exams in the world. She is such a legend ❤️❤️❤️❤️💓💓