
TensorFlow MirroredStrategy() looks like it may only be working on one GPU?

Asked by sectechguy on August 4, 2020 (Data Science)

I finally got a computer with two GPUs and tested out https://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html and https://github.com/tensorflow/models/tree/master/tutorials/image/cifar10_estimator, confirming that both GPUs are utilized in each case (wattage rises to 160-180 W on both, memory is nearly maxed out on both, and GPU-Util climbs to about 45% on both at the same time).

So I decided to try out TensorFlow’s MirroredStrategy() on an existing neural net I had previously trained with one GPU.

What I don’t understand is that the wattage increases on both, and the memory is pretty much maxed out on both, but only one GPU shows about 98% utilization while the other just sits at 3%. Am I messing something up in my code, or is this working as designed?

import tensorflow

# X_train / y_train come from my existing preprocessing (853 input features)
strategy = tensorflow.distribute.MirroredStrategy()
with strategy.scope():
    model = tensorflow.keras.models.Sequential([
        tensorflow.keras.layers.Dense(units=427, kernel_initializer='uniform', activation='relu', input_dim=853),
        tensorflow.keras.layers.Dense(units=427, kernel_initializer='uniform', activation='relu'),
        tensorflow.keras.layers.Dense(units=1, kernel_initializer='uniform', activation='sigmoid')])
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    model.fit(X_train, y_train, batch_size=1000, epochs=100)
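
For reference, this is the kind of sanity check I would run first (just a sketch; on older TF releases list_physical_devices sits under tensorflow.config.experimental):

import tensorflow

# Sketch: confirm TensorFlow sees both cards and that MirroredStrategy uses them.
gpus = tensorflow.config.experimental.list_physical_devices('GPU')
print('Visible GPUs:', gpus)                             # expect two entries

check = tensorflow.distribute.MirroredStrategy()
print('Replicas in sync:', check.num_replicas_in_sync)   # expect 2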

nvidia-smi:

Fri Nov 22 09:26:21 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 435.21       Driver Version: 435.21       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN Xp COLLEC...  Off  | 00000000:0A:00.0 Off |                  N/A |
| 24%   47C    P2    81W / 250W |  11733MiB / 12196MiB |     98%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN Xp COLLEC...  Off  | 00000000:41:00.0  On |                  N/A |
| 28%   51C    P2    64W / 250W |  11736MiB / 12187MiB |      3%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2506      C   python3                                    11721MiB |
|    1      1312      G   /usr/lib/xorg/Xorg                            18MiB |
|    1      1353      G   /usr/bin/gnome-shell                          51MiB |
|    1      1620      G   /usr/lib/xorg/Xorg                           108MiB |
|    1      1751      G   /usr/bin/gnome-shell                          72MiB |
|    1      2506      C   python3                                    11473MiB |
+-----------------------------------------------------------------------------+

One Answer

I'm seeing the same thing; here I enabled just the first two GPUs. [nvidia-smi screenshot]

If I force it to use just one GPU, the memory utilization drops, but the time per epoch is unchanged.
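
For the single-GPU comparison I pass an explicit device list to the strategy, roughly like this (a sketch; device names may differ on your machine):

import tensorflow

# Sketch: pin the strategy to a single GPU to compare epoch times.
one_gpu = tensorflow.distribute.MirroredStrategy(devices=['/gpu:0'])
print('Replicas in sync:', one_gpu.num_replicas_in_sync)  # expect 1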


Answered by gt1485a on August 4, 2020
