
Tensorflow MirroredStrategy() looks like it may only be working on one GPU?

Data Science Asked by sectechguy on August 4, 2020

I finally got a computer with 2 GPUs and tested out https://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html and https://github.com/tensorflow/models/tree/master/tutorials/image/cifar10_estimator, and confirmed that both GPUs are being utilized in each (the wattage increases to 160-180 W on both, memory is almost maxed out on both, and GPU-Util rises to about 45% on both at the same time).

So I decided to try out TensorFlow's MirroredStrategy() on an existing neural net I had previously trained with one GPU.

What I don't understand is that the wattage increases on both and the memory is pretty much maxed on both, but only one GPU appears to be utilized (98%) while the other just sits at 3%. Am I messing something up in my code, or is this working as designed?

import tensorflow

strategy = tensorflow.distribute.MirroredStrategy()
with strategy.scope():
    # Build and compile the model inside the strategy scope
    model = tensorflow.keras.models.Sequential([
        tensorflow.keras.layers.Dense(units=427, kernel_initializer='uniform', activation='relu', input_dim=853),
        tensorflow.keras.layers.Dense(units=427, kernel_initializer='uniform', activation='relu'),
        tensorflow.keras.layers.Dense(units=1, kernel_initializer='uniform', activation='sigmoid')])
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    model.fit(X_train, y_train, batch_size=1000, epochs=100)
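For reference, one way to sanity-check whether MirroredStrategy actually created a replica on each GPU is to print strategy.num_replicas_in_sync and enable device-placement logging. A minimal sketch, assuming TensorFlow 2.x (note that with Keras model.fit the batch_size is the global batch, so 1000 is split into roughly 500 samples per replica per step):

import tensorflow as tf

# Confirm TensorFlow can see both physical GPUs
print("Visible GPUs:", tf.config.experimental.list_physical_devices('GPU'))

# Log which device each op runs on (verbose, but useful for a quick check)
tf.debugging.set_log_device_placement(True)

strategy = tf.distribute.MirroredStrategy()
# Should print 2 when both GPUs are picked up by the strategy
print("Replicas in sync:", strategy.num_replicas_in_sync)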

nvidia-smi:

Fri Nov 22 09:26:21 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 435.21       Driver Version: 435.21       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN Xp COLLEC...  Off  | 00000000:0A:00.0 Off |                  N/A |
| 24%   47C    P2    81W / 250W |  11733MiB / 12196MiB |     98%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN Xp COLLEC...  Off  | 00000000:41:00.0  On |                  N/A |
| 28%   51C    P2    64W / 250W |  11736MiB / 12187MiB |      3%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2506      C   python3                                    11721MiB |
|    1      1312      G   /usr/lib/xorg/Xorg                            18MiB |
|    1      1353      G   /usr/bin/gnome-shell                          51MiB |
|    1      1620      G   /usr/lib/xorg/Xorg                           108MiB |
|    1      1751      G   /usr/bin/gnome-shell                          72MiB |
|    1      2506      C   python3                                    11473MiB |
+-----------------------------------------------------------------------------+

One Answer

I'm seeing the same thing; here I enabled just the first 2 GPUs (see the nvidia-smi screenshot).

If I force it to use just 1 GPU, the memory utilization drops but the time per epoch is unchanged.
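One way to force a single GPU for that comparison (an illustrative sketch, assuming TensorFlow 2.x: hide the second device via CUDA_VISIBLE_DEVICES before TensorFlow initializes):

import os

# Must be set before TensorFlow is imported/initialized
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import tensorflow as tf
print(tf.config.experimental.list_physical_devices('GPU'))  # should now list only one GPU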


Answered by gt1485a on August 4, 2020
