
TensorFlow Multi-GPU - NCCL

I have been wanting to increase my batch size to improve the generalization of my model (it's very batch-size sensitive). The solution for this is to go multi-GPU in order to utilize the memory of several GPUs and fit a larger batch.
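For context, with tf.distribute the batch you feed the model is the global batch, which gets split across replicas, so the per-GPU batch stays the same while the effective batch grows. A minimal sketch of that relationship (the batch size here is a placeholder):

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()

# Each GPU processes per_replica_batch examples per step;
# the model is fed the global batch, split across the replicas.
per_replica_batch = 64  # placeholder value
global_batch_size = per_replica_batch * strategy.num_replicas_in_sync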

Solution 1:

One solution from issue 21470 is to build NCCL for Windows x64. MyCaffe provides instructions for that here: https://github.com/MyCaffe/NCCL/blob/master/INSTALL.md

You'll need VS 2015, 2017, and the CUDA development package, and you'll need to put the produced .dlls in the correct location once they are compiled.
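If TensorFlow still cannot find the library at runtime, one thing to check on Python 3.8+ (Windows) is that the directory containing the compiled DLLs is on the DLL search path. A minimal sketch, assuming the DLLs ended up in C:\tools\nccl\bin (the path is hypothetical):

import os

# Hypothetical location of the NCCL DLLs built from the MyCaffe instructions.
nccl_dir = r"C:\tools\nccl\bin"

# On Python 3.8+ for Windows, extra DLL directories must be added explicitly.
os.add_dll_directory(nccl_dir)

import tensorflow as tf  # import after the DLL search path is set up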

Solution 2:

In my experience, some cross_device_ops options would not work and produced errors.

This option was designed for the NVIDIA DGX-1 architecture and might underperform on other architectures:

strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.HierarchicalCopyAllReduce())

This should work:

strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.ReductionToOneDevice())

This would not work with my configuration:

strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.NcclAllReduce())

So it is advisable to try the different options.
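Whichever cross_device_ops ends up working, the strategy is used the same way: build and compile the model inside strategy.scope() and feed it the larger global batch. A minimal sketch with Keras, using ReductionToOneDevice since that worked here (the model and dataset are placeholders):

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.ReductionToOneDevice())

with strategy.scope():
    # Variables and the optimizer must be created under the strategy's scope.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

# Feed the larger global batch; it is split across the GPUs automatically.
# model.fit(train_dataset, epochs=...)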
