TransWikia.com

How to adjust NVIDIA GPU fan speed on a headless node?

Unix & Linux Asked by Aleksandr Dubinsky on December 16, 2020

How is it possible to control the fan speed of multiple consumer NVIDIA GPUs such as Titan and 1080 Ti on a headless node running Linux?

2 Answers

The following is a simple method for controlling the fans of multiple NVIDIA GPUs. It requires no scripting, no fake monitors, and no other fiddling, and it can be run entirely over SSH. It has been tested on Arch Linux.

Create xorg.conf

sudo nvidia-xconfig --allow-empty-initial-configuration --enable-all-gpus --cool-bits=7

This creates /etc/X11/xorg.conf with an entry for each GPU, much like the file written by hand in the manual method.

Note: Some distributions (Fedora, CentOS, Manjaro) ship additional config files (e.g. in /etc/X11/xorg.conf.d/ or /usr/share/X11/xorg.conf.d/) which override xorg.conf and set AllowNVIDIAGPUScreens. This option is not compatible with this guide, whichever way you create xorg.conf, so those extra files should be modified or deleted. The X11 log file shows which config files have been loaded.
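To find those extra files quickly, a small Python sketch like the one below can scan the two directories named in the note for active (uncommented) lines mentioning AllowNVIDIAGPUScreens. The directory list, the *.conf filter, and the function name are assumptions; other distributions may keep snippets elsewhere, so the X11 log remains the authoritative record of what was loaded.

```python
from pathlib import Path

# The two directories named in the note; other locations are possible (assumption).
DEFAULT_DIRS = ("/etc/X11/xorg.conf.d", "/usr/share/X11/xorg.conf.d")


def files_mentioning(option="AllowNVIDIAGPUScreens", dirs=DEFAULT_DIRS):
    """Return the *.conf files in `dirs` that mention `option` outside comments."""
    needle = option.lower()  # xorg.conf option names are case-insensitive
    hits = []
    for directory in map(Path, dirs):
        if not directory.is_dir():
            continue  # this distribution does not use that directory
        for conf in sorted(directory.glob("*.conf")):
            for line in conf.read_text(errors="replace").splitlines():
                # Everything after '#' is a comment in xorg.conf.
                if needle in line.split("#", 1)[0].lower():
                    hits.append(conf)
                    break
    return hits


if __name__ == "__main__":
    for path in files_mentioning():
        print(path)
```

Anything it prints is a candidate to edit or move aside before starting X.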

Alternative: Create xorg.conf manually

Identify your cards' PCI IDs:

nvidia-xconfig --query-gpu-info

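Find the PCI BusID fields. Note that these are not written the same way as the bus IDs reported by the kernel (e.g. by lspci): X uses decimal bus:device:function numbers, while lspci prints them in hexadecimal.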
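For example, a card that lspci lists at 0a:00.0 (hexadecimal) appears in xorg.conf as PCI:10:0:0 (decimal). A minimal Python sketch of that conversion, assuming PCI domain 0 and the lspci address format shown in the comment (the function name is illustrative):

```python
import re

# lspci prints [domain:]bus:device.function in hexadecimal, e.g. "0000:0a:00.0".
_LSPCI = re.compile(
    r"^(?:([0-9a-fA-F]{4}):)?([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$"
)


def lspci_to_xorg(address):
    """Convert an lspci-style PCI address to an xorg.conf BusID string."""
    m = _LSPCI.match(address.strip())
    if m is None:
        raise ValueError(f"unrecognized PCI address: {address!r}")
    domain, bus, dev, fn = m.groups()
    if domain is not None and int(domain, 16) != 0:
        # This sketch only handles PCI domain 0.
        raise ValueError(f"non-zero PCI domain not handled: {address!r}")
    # xorg.conf expects decimal numbers: PCI:bus:device:function.
    return f"PCI:{int(bus, 16)}:{int(dev, 16)}:{int(fn, 16)}"


if __name__ == "__main__":
    print(lspci_to_xorg("0a:00.0"))
```

The addresses to feed it can be listed with lspci | grep -i nvidia.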

Alternatively, run sudo startx, open /var/log/Xorg.0.log (or whichever file startx lists after "Log file:" in its output), and look for lines of the form NVIDIA(0): Valid display device(s) on GPU-<GPU number> at PCI:<PCI ID>.
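If you take the log route, the relevant lines can be pulled out programmatically. This Python sketch matches only the tail of the line quoted above, because the prefix (timestamps, message markers) is an assumption that may differ between Xorg and driver versions:

```python
import re
from pathlib import Path

# Only the stable tail of the log line is matched; prefixes vary (assumption).
_VALID = re.compile(r"Valid display device\(s\) on GPU-(\d+) at (PCI:\d+:\d+:\d+)")


def gpu_bus_ids(log_text):
    """Map GPU number -> PCI BusID from the contents of an Xorg log."""
    return {int(m.group(1)): m.group(2) for m in _VALID.finditer(log_text)}


if __name__ == "__main__":
    log = Path("/var/log/Xorg.0.log").read_text(errors="replace")
    for gpu, bus_id in sorted(gpu_bus_ids(log).items()):
        print(gpu, bus_id)
```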

Edit /etc/X11/xorg.conf

Here is an example of xorg.conf for a three-GPU machine:

Section "ServerLayout"
    Identifier     "Layout0"
    Screen 0 "Screen0"
    Screen 1 "Screen1" RightOf "Screen0"
    Screen 2 "Screen2" RightOf "Screen1"
EndSection

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BusID          "PCI:5:0:0"
    Option         "Coolbits"       "7"
    Option         "AllowEmptyInitialConfiguration"
EndSection

Section "Device"
    Identifier     "Device1"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BusID          "PCI:6:0:0"
    Option         "Coolbits"       "7"
    Option         "AllowEmptyInitialConfiguration"
EndSection

Section "Device"
    Identifier     "Device2"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BusID          "PCI:9:0:0"
    Option         "Coolbits"       "7"
    Option         "AllowEmptyInitialConfiguration"
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
EndSection

Section "Screen"
    Identifier     "Screen1"
    Device         "Device1"
EndSection

Section "Screen"
    Identifier     "Screen2"
    Device         "Device2"
EndSection

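The BusID values must match the bus IDs identified in the previous step. The option AllowEmptyInitialConfiguration lets X start even when no monitor is connected. Coolbits is a bitmask: the bit with value 4 enables manual fan control, and other bits unlock clock adjustments (overclocking). On Fermi and newer GPUs, clock offsets need the bit with value 8, which "7" does not include.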
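When the number of GPUs changes, writing these sections by hand gets repetitive. The Python sketch below renders the same layout from a list of BusIDs; the function name and the Layout0 identifier are illustrative choices, and the default Coolbits value of 7 simply mirrors the example above. Review the output before saving it as /etc/X11/xorg.conf.

```python
def xorg_conf(bus_ids, coolbits=7):
    """Render an xorg.conf with one Device/Screen pair per GPU BusID."""
    if not bus_ids:
        raise ValueError("need at least one BusID")
    layout = ['Section "ServerLayout"', '    Identifier     "Layout0"']
    devices, screens = [], []
    for i, bus_id in enumerate(bus_ids):
        # Place each screen to the right of the previous one, as in the example.
        pos = f' RightOf "Screen{i - 1}"' if i else ""
        layout.append(f'    Screen {i} "Screen{i}"{pos}')
        devices += [
            'Section "Device"',
            f'    Identifier     "Device{i}"',
            '    Driver         "nvidia"',
            '    VendorName     "NVIDIA Corporation"',
            f'    BusID          "{bus_id}"',
            f'    Option         "Coolbits"       "{coolbits}"',
            '    Option         "AllowEmptyInitialConfiguration"',
            "EndSection",
            "",
        ]
        screens += [
            'Section "Screen"',
            f'    Identifier     "Screen{i}"',
            f'    Device         "Device{i}"',
            "EndSection",
            "",
        ]
    layout += ["EndSection", ""]
    return "\n".join(layout + devices + screens)


if __name__ == "__main__":
    print(xorg_conf(["PCI:5:0:0", "PCI:6:0:0", "PCI:9:0:0"]))
```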


Edit /root/.xinitrc

nvidia-settings -q fans
nvidia-settings -a '[gpu:0]/GPUFanControlState=1' -a '[fan:0]/GPUTargetFanSpeed=75'
nvidia-settings -a '[gpu:1]/GPUFanControlState=1' -a '[fan:1]/GPUTargetFanSpeed=75'
nvidia-settings -a '[gpu:2]/GPUFanControlState=1' -a '[fan:2]/GPUTargetFanSpeed=75'

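I use .xinitrc to run nvidia-settings for convenience, although there are probably other ways. The first line prints every GPU fan in the system along with its index. Fan indices are numbered across the whole system, so on a card that exposes more than one fan, [fan:N] will not line up with [gpu:N]. Here, I set the fans to 75%.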
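The .xinitrc lines follow a fixed pattern: take control of each GPU's fans, then set a target speed on each fan. A Python sketch that builds the same nvidia-settings invocation from a GPU-to-fan mapping is shown below. The mapping is a hypothetical input you would fill in from the nvidia-settings -q fans output, and the function name is illustrative; all attributes go into a single call, since nvidia-settings accepts repeated -a options.

```python
import subprocess


def fan_speed_args(fans_by_gpu, percent):
    """Build one nvidia-settings argument list covering every GPU and fan.

    fans_by_gpu maps a GPU index to the fan indices it owns (hypothetical
    input, taken from `nvidia-settings -q fans`).
    """
    if not 0 <= percent <= 100:
        raise ValueError("fan speed must be a percentage between 0 and 100")
    args = ["nvidia-settings"]
    for gpu, fans in sorted(fans_by_gpu.items()):
        # Take manual control of this GPU's fans first...
        args += ["-a", f"[gpu:{gpu}]/GPUFanControlState=1"]
        # ...then set a target speed on every fan that belongs to it.
        for fan in fans:
            args += ["-a", f"[fan:{fan}]/GPUTargetFanSpeed={percent}"]
    return args


if __name__ == "__main__":
    # Assumed mapping for three single-fan cards, as in the example above.
    # Must run where DISPLAY points at the X server, e.g. from .xinitrc.
    subprocess.run(fan_speed_args({0: [0], 1: [1], 2: [2]}, 75), check=True)
```

Passing the arguments as a list to subprocess also sidesteps any shell interpretation of the square brackets.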

Launch X

sudo startx -- :0

You can run this command over SSH. The output should look similar to the following (pushistik is the machine's host name):

Current version of pixman: 0.34.0
    Before reporting problems, check http://wiki.x.org
    to make sure that you have the latest version.
Markers: (--) probed, (**) from config file, (==) default setting,
    (++) from command line, (!!) notice, (II) informational,
    (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(==) Log file: "/var/log/Xorg.0.log", Time: Sat May 27 02:22:08 2017
(==) Using config file: "/etc/X11/xorg.conf"
(==) Using system config directory "/usr/share/X11/xorg.conf.d"

  Attribute 'GPUFanControlState' (pushistik:0[gpu:0]) assigned value 1.

  Attribute 'GPUTargetFanSpeed' (pushistik:0[fan:0]) assigned value 75.


  Attribute 'GPUFanControlState' (pushistik:0[gpu:1]) assigned value 1.

  Attribute 'GPUTargetFanSpeed' (pushistik:0[fan:1]) assigned value 75.


  Attribute 'GPUFanControlState' (pushistik:0[gpu:2]) assigned value 1.

  Attribute 'GPUTargetFanSpeed' (pushistik:0[fan:2]) assigned value 75.
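Over SSH it is easy to miss a failed assignment in this output. One way to check, sketched here in Python under the assumption that the lines keep the format shown above, is to collect every "assigned value" line and compare it with what was requested (function names are illustrative):

```python
import re

# Matches lines such as:
#   Attribute 'GPUTargetFanSpeed' (pushistik:0[fan:0]) assigned value 75.
_ASSIGNED = re.compile(
    r"Attribute '(\w+)' \(\S+\[(\w+:\d+)\]\) assigned value (-?\d+)\."
)


def assignments(output):
    """Map (attribute, target) -> value for each reported assignment."""
    return {
        (m.group(1), m.group(2)): int(m.group(3))
        for m in _ASSIGNED.finditer(output)
    }


def missing(output, expected):
    """Return the expected (attribute, target) keys not reported with their value."""
    got = assignments(output)
    return [key for key, value in expected.items() if got.get(key) != value]
```

For example, save the output with sudo startx -- :0 2>&1 | tee startx.log and pass its contents to missing(); an empty list means every requested value was reported.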

Monitor temperatures and clock speeds

nvidia-smi and nvtop can be used to observe temperatures and power draw. A cooler card can sustain higher boost clocks, which in turn raises its power draw. To cap power draw and keep the cards cool, use sudo nvidia-smi -pl 150; to give them more headroom to boost, raise the limit, e.g. sudo nvidia-smi -pl 300. The allowed range depends on the card and is shown by nvidia-smi -q -d POWER, and adding -i <index> targets a single GPU instead of all of them. My 1080 Ti runs at 1480 MHz when given 150 W and at over 1800 MHz when given 300 W, but this depends on the workload.

You can monitor clock speeds with nvidia-smi -q or, more specifically, with:

watch 'nvidia-smi -q | grep -E "Utilization| Graphics|Power Draw"'
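For logging or scripting, nvidia-smi can also print selected fields as CSV. The sketch below uses its --query-gpu and --format=csv,noheader,nounits options; the particular field list is an assumption you can adjust (nvidia-smi --help-query-gpu lists the valid names). Values are left as strings because some fields report [N/A] on certain cards, for example fan.speed on passively cooled GPUs.

```python
import subprocess

# Assumed field selection; see `nvidia-smi --help-query-gpu` for others.
FIELDS = ("index", "temperature.gpu", "power.draw", "clocks.sm", "fan.speed")


def parse_smi_csv(text):
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    rows = []
    for line in text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        rows.append(dict(zip(FIELDS, values)))
    return rows


def query_gpus():
    """Run nvidia-smi once and return its parsed per-GPU readings."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={','.join(FIELDS)}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_smi_csv(out)


if __name__ == "__main__":
    for gpu in query_gpus():
        print(gpu)
```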

Returning to automatic fan management

Reboot. I haven't found another way to make the fans automatic.

Correct answer by Aleksandr Dubinsky on December 16, 2020

I've written a pip-installable Python script to do something similar to @AleksandrDubinsky's suggestion.

When you run fans.py, it sets up a temporary X server for each GPU with a fake display attached. Then, it loops over the GPUs every few seconds and sets the fan speed according to their temperature. When the script dies, it returns control of the fans to the drivers and cleans up the X servers.
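The core of such a loop is a fan curve that maps temperature to fan speed. The Python sketch below is not taken from fans.py; the curve points, the polling interval, and the read_temps/set_speed/restore callbacks are hypothetical placeholders (they could wrap nvidia-smi queries and nvidia-settings calls like the ones in the accepted answer). It interpolates linearly between curve points and calls restore when the loop ends, mirroring the hand-back behavior described above.

```python
import bisect
import time

# Hypothetical curve: (temperature in degrees C, fan speed in percent), sorted.
CURVE = [(30, 30), (50, 40), (70, 70), (80, 100)]


def fan_speed_for(temp_c, curve=CURVE):
    """Linearly interpolate a fan speed (percent) for `temp_c` from `curve`."""
    temps = [t for t, _ in curve]
    if temp_c <= temps[0]:
        return curve[0][1]  # below the curve: hold the minimum speed
    if temp_c >= temps[-1]:
        return curve[-1][1]  # above the curve: hold the maximum speed
    i = bisect.bisect_right(temps, temp_c)  # temps[i-1] <= temp_c < temps[i]
    (t0, s0), (t1, s1) = curve[i - 1], curve[i]
    return round(s0 + (s1 - s0) * (temp_c - t0) / (t1 - t0))


def control_loop(read_temps, set_speed, restore, interval_s=5.0, iterations=None):
    """Poll temperatures and apply the curve until stopped, then call restore().

    read_temps() -> {gpu_index: temp_c}; set_speed(gpu_index, percent);
    restore() hands fan control back to the driver. All three are
    hypothetical callbacks supplied by the caller.
    """
    done = 0
    try:
        while iterations is None or done < iterations:
            for gpu, temp in read_temps().items():
                set_speed(gpu, fan_speed_for(temp))
            done += 1
            time.sleep(interval_s)
    finally:
        restore()  # runs on normal exit, exceptions and Ctrl-C
```

Note that the finally block does not run if the process is killed by an unhandled SIGTERM or by SIGKILL; a real tool would need a signal handler for the former.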

Answered by Andy Jones on December 16, 2020
