TensorFlow Multi-GPU - NCCL
I have been wanting to increase my batch size to improve the generalization of my model (it's very batch size sensitive). The solution for this is to go multi-GPU in order to utilize more than one GPU for a larger effective batch size.
Solution 1:
One solution from issue 21470 is to build NCCL for Windows x64. MyCaffe provides instructions for that here: https://github.com/MyCaffe/NCCL/blob/master/INSTALL.md
You'll need VS 2015 and 2017, the CUDA development package, and you'll have to put the produced .dlls in the correct location once they are compiled.
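Once the DLLs are in place, a quick sanity check (a minimal sketch, not part of the original instructions) is to confirm that TensorFlow actually sees all of your GPUs before attempting a distributed run:

import tensorflow as tf

# An empty list here means the CUDA/NCCL setup is not being picked up
# by TensorFlow and multi-GPU training will not be possible.
gpus = tf.config.list_physical_devices('GPU')
print("Visible GPUs:", gpus)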
Solution 2:
In my experience, some cross_device_ops options would not work and would produce errors.
This option was meant for the NVIDIA DGX-1 architecture and might underperform on other architectures:
strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.HierarchicalCopyAllReduce())
Should work:
strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.ReductionToOneDevice())
Would not work with my configuration:
strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.NcclAllReduce())
So it is advised to try the different options and see which one works with your configuration.
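For completeness, here is a minimal end-to-end sketch (the model and data below are placeholders, not from the original setup) showing how the chosen cross_device_ops plugs into a Keras training loop:

import numpy as np
import tensorflow as tf

# ReductionToOneDevice was the option that worked in my configuration;
# swap in HierarchicalCopyAllReduce or NcclAllReduce to test the alternatives.
strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.ReductionToOneDevice())
print("Replicas in sync:", strategy.num_replicas_in_sync)

# Model creation and compilation must happen inside the strategy scope
# so that the variables are mirrored across all GPUs.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation='relu', input_shape=(32,)),
        tf.keras.layers.Dense(1)
    ])
    model.compile(optimizer='adam', loss='mse')

# The global batch size is split evenly across the replicas, which is what
# allows the larger effective batch size mentioned above.
x = np.random.rand(1024, 32).astype('float32')
y = np.random.rand(1024, 1).astype('float32')
model.fit(x, y, batch_size=256, epochs=2)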