Thanks for including links to articles in your talk. I was especially happy to learn about your future plans with distributed TensorFlow and Horovod. Thank you for mentioning that.
the best software engineer in the world
This is awesome!
Good to hear that it is not tied to estimators. It is very hacky to build reinforcement learning systems with the Estimator API, since the "input" is often generated and depends on previous model output, the label may be extracted from internal state (prioritized memory for DQN), and the number of steps depends on environment feedback.
Looking forward to this becoming available for the Graph/Session API!
Great presentation! Interesting to see a mention of Uber Horovod as a way to embrace the Open Source commitment from Google/TF team.
I wanted to see an example of MirroredStrategy for, say, ImageNet. Are there any links?
Steady so !
Does someone have information/links on how the inter-node communication works? MPI? NCCL? Both? Were these results obtained using TCP over LAN or Infiniband/RoCE?
Honestly, I have little idea how the distributed gradients can be combined together.
You might imagine a few ways, but I don't think there is anything super high-tech. You need to "resolve" the gradient values. This could just mean taking the average, the min, or the max across nodes. Or you might try to see which node's gradients give the best cost and then use those gradients for all nodes.
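To make the "take the average" option concrete, here is a minimal sketch in plain Python. It is only an illustration, not how TensorFlow or Horovod implement it internally: the function names (`average_gradients`, `sgd_step`), the gradient values, and the learning rate are all made up for the example. As far as I know, averaging is also what all-reduce based setups do by default, just done efficiently across machines.

```python
def average_gradients(worker_grads):
    """Element-wise mean of the gradient vectors reported by each worker.

    worker_grads: list of equal-length lists, one per worker (hypothetical layout).
    """
    num_workers = len(worker_grads)
    # zip(*...) groups the i-th gradient component from every worker together.
    return [sum(values) / num_workers for values in zip(*worker_grads)]


def sgd_step(weights, grad, lr):
    """Plain SGD update: every worker applies the same averaged gradient."""
    return [w - lr * g for w, g in zip(weights, grad)]


# Two workers, each computed gradients on its own mini-batch (made-up numbers).
worker_grads = [
    [0.5, -1.0, 2.0],
    [1.5, -3.0, 0.0],
]

avg = average_gradients(worker_grads)

# All workers start from the same weights and apply the same update,
# so their model copies stay in sync after the step.
weights = [1.0, 1.0, 1.0]
new_weights = sgd_step(weights, avg, lr=0.5)

print(avg)
print(new_weights)
```

Swapping the mean for `min` or `max` per component would give the other variants mentioned above, though averaging is the one I have seen used in practice.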
Great talk.
Great Talk
COOL, THANKS