
df differs from du a lot; the nfsd service seems to matter

Server Fault Asked by Guangyu wu on February 4, 2021

I know the df/du discrepancy has been discussed a lot, but I'd like to post a special case and ask for some hints.
Here are the details (the storage is a hardware RAID 5 array of 6 SAS disks):

  1. system info of nfs server/client:
    [root@ndio06 ~]# cat /etc/*release|grep CentOS
    CentOS Linux release 7.6.1810 (Core)
    NAME="CentOS Linux"
    PRETTY_NAME="CentOS Linux 7 (Core)"
    CENTOS_MANTISBT_PROJECT="CentOS-7"
    CentOS Linux release 7.6.1810 (Core)
    [root@ndio06 ~]# uname -a
    Linux ndio06 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
    [root@ndio06 ~]# rpm -qa|grep nfs-util
    nfs-utils-1.3.0-0.61.el7.x86_64

The NFS server is normally under heavy load, serving 48 busy nodes that run many processes and I/O operations.

  2. df and du disagree, and the gap is huge:
    [root@ndio06 ~]# df -hl /CAE;du -sh /CAE
    Filesystem Size Used Avail Use% Mounted on
    /dev/sdb1 5.0T 3.6T 1.4T 73% /CAE
    736G /CAE
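
To track the discrepancy as a single number instead of comparing two human-readable outputs by eye, the check above can be scripted. This is only a minimal sketch, assuming POSIX-style `df` and `du`; the function names are made up for illustration and do not belong to any existing tool.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from the original post).

# Used KiB according to the filesystem's own accounting.
# -P forces the one-line-per-filesystem POSIX format, so column 3 is "Used".
df_used_kib() { df -Pk "$1" | awk 'NR==2 {print $3}'; }

# Used KiB found by walking the directory tree, without crossing mount points.
du_used_kib() { du -skx "$1" 2>/dev/null | awk '{print $1}'; }

# Space that df counts as used but du cannot attribute to any visible file.
usage_gap_kib() {
    echo $(( $(df_used_kib "$1") - $(du_used_kib "$1") ))
}
```

For example, `usage_gap_kib /CAE` on the server would print the unaccounted space in KiB; with the figures shown above (3.6T versus 736G) that is roughly 2.9T.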

  3. No "deleted but in use" files on the NFS server or the 40+ clients:
    [root@pbs ~]# for node in $(pestat|grep ndpam02|awk '{print $1}'|grep -v io); do echo --$node--; ssh $node lsof +D /CAE|grep -i "deleted"; done
    --nd065--
    --nd066--
    --nd067--
    ……….
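
One weakness of a loop like the one above is that an empty result looks the same whether a node has no deleted-but-open files or the ssh connection failed. The following is a hedged sketch of a variant that always prints a result line per node; `count_deleted` and `check_nodes` are hypothetical names, and the 255 exit status is what OpenSSH returns when the connection itself fails.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from the original post).

# Count lines that lsof marks as deleted; reads lsof output on stdin.
# grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
count_deleted() { grep -ci 'deleted' || true; }

# Check every node passed as an argument and report an explicit count.
check_nodes() {
    for node in "$@"; do
        out=$(ssh -o BatchMode=yes "$node" lsof +D /CAE </dev/null 2>/dev/null)
        status=$?
        if [ "$status" -eq 255 ]; then
            # ssh could not connect, so the count would be meaningless.
            echo "$node: ssh failed"
        else
            echo "$node: $(printf '%s\n' "$out" | count_deleted) deleted-but-open files"
        fi
    done
}
```

It could be called with the same node list as before, for example `check_nodes nd065 nd066 nd067`.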

  4. Mount options on the clients:
    ndio06-ib:/CAE on /CAE type nfs4 (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.1.72,local_lock=none,addr=192.168.1.161)

  5. What has been observed:
  • Running xfs_repair after unmounting fixes the issue, but it comes back after some weeks. Unfortunately I did not capture the output during the repair, so I cannot tell whether there are any physical issues with the RAID or the disks.
  • Restarting the nfsd service also fixes it: df reports the right percentage after a few tries (each try shows a lower Use%) or after waiting a few minutes. Again, this is just a temporary fix, and the issue comes back a few days or weeks later.
  • The NFS service stays responsive unless df reports 100% usage. The number of nfsd threads is set to 8.
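
Since the reported usage drifts up over days and falls back over several df runs after an nfsd restart, it may help to record it over time rather than rely on occasional manual checks. The snippet below is a minimal sketch, assuming POSIX `df` output; `sample_usage` is a hypothetical name and the log path in the usage note is only a placeholder.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative, not from the original post).

# Print one sample for a mount point: timestamp, used KiB, Use%.
# df -P guarantees a single data line, so columns 3 and 5 are Used and Capacity.
sample_usage() {
    df -Pk "$1" | awk -v ts="$(date '+%Y-%m-%dT%H:%M:%S')" \
        'NR==2 {print ts, $3, $5}'
}
```

A loop such as `while :; do sample_usage /CAE >> /var/tmp/cae-df.log; sleep 60; done` would then give a timeline that can be lined up with nfsd restarts and client job activity.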

Can anybody shed some light on this issue? I can provide more info if needed.

Thanks.
