Pool gone with upgrade to Ubuntu 20.04

Server Fault Asked by FlorentR on November 7, 2021

I upgraded my server (SuperMicro X11-SSM-F, LSI SAS 9211-8i) from Ubuntu 18.04 to 20.04. The server had two zpools: one composed of a single WD Red 10 TB (downloadpool), and the other composed of 8 WD Red 10 TB and 2 Seagate IronWolf 8 TB drives arranged as five 2-disk mirrors (masterpool). The pools were created using /dev/disk/by-id references so they would be stable across restarts. The pools are scrubbed regularly; the last scrub was a couple of weeks old and didn't show any errors.

When I rebooted after updating to Ubuntu 20.04, the second pool (masterpool) was gone. After running zpool import, it was re-imported, but using sdX references for most of the disks (the WD Reds, but not the Seagates). The pool with the lone WD Red was fine and was still referencing its disk by-id.
The output of zpool status for masterpool looked something like this (this is from memory):

    NAME                                  STATE     READ WRITE CKSUM
    masterpool                            ONLINE       0     0     0
      mirror-0                            ONLINE       0     0     0
        sdb                               ONLINE       0     0     0
        sdk                               ONLINE       0     0     0
      mirror-1                            ONLINE       0     0     0
        sdi                               ONLINE       0     0     0
        sdf                               ONLINE       0     0     0
      mirror-2                            ONLINE       0     0     0
        sdd                               ONLINE       0     0     0
        sde                               ONLINE       0     0     0
      mirror-3                            ONLINE       0     0     0
        sdh                               ONLINE       0     0     0
        sdc                               ONLINE       0     0     0
      mirror-4                            ONLINE       0     0     0
        ata-ST8000VN0022-2EL112_ZA17FZXF  ONLINE       0     0     0
        ata-ST8000VN0022-2EL112_ZA17H5D3  ONLINE       0     0     0

This is not ideal, because these identifiers are not stable, so after looking around online a bit, I exported the pool again and ran zpool import -d /dev/disk/by-id masterpool.
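The re-import sequence, roughly (pool name from this setup):

```shell
# Export the pool so it can be re-imported with stable names
sudo zpool export masterpool

# Re-import, telling ZFS to search /dev/disk/by-id for the member disks
sudo zpool import -d /dev/disk/by-id masterpool

# Verify the vdev names now use by-id references
zpool status masterpool
```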

But now, zpool status is reporting checksum errors:

    NAME                                  STATE     READ WRITE CKSUM
    masterpool                            ONLINE       0     0     0
      mirror-0                            ONLINE       0     0     0
        wwn-0x5000cca26af27d8b            ONLINE       0     0     2
        wwn-0x5000cca273ee8907            ONLINE       0     0     0
      mirror-1                            ONLINE       0     0     0
        wwn-0x5000cca26aeb9280            ONLINE       0     0     8
        wwn-0x5000cca273eeaed7            ONLINE       0     0     0
      mirror-2                            ONLINE       0     0     0
        wwn-0x5000cca273c21a05            ONLINE       0     0     0
        wwn-0x5000cca267eaa17a            ONLINE       0     0     0
      mirror-3                            ONLINE       0     0     0
        wwn-0x5000cca26af7e655            ONLINE       0     0     0
        wwn-0x5000cca273c099dd            ONLINE       0     0     0
      mirror-4                            ONLINE       0     0     0
        ata-ST8000VN0022-2EL112_ZA17FZXF  ONLINE       0     0     0
        ata-ST8000VN0022-2EL112_ZA17H5D3  ONLINE       0     0     0

So I'm running a scrub, and ZFS has found a few more checksum errors:

  pool: masterpool
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: scrub in progress since Fri May 22 21:47:34 2020
        27.1T scanned at 600M/s, 27.0T issued at 597M/s, 31.1T total
        112K repaired, 86.73% done, 0 days 02:00:45 to go
config:

        NAME                                  STATE     READ WRITE CKSUM
        masterpool                            DEGRADED     0     0     0
          mirror-0                            DEGRADED     0     0     0
            wwn-0x5000cca26af27d8b            DEGRADED     0     0    15  too many errors  (repairing)
            wwn-0x5000cca273ee8907            ONLINE       0     0     0
          mirror-1                            DEGRADED     0     0     0
            wwn-0x5000cca26aeb9280            DEGRADED     0     0    18  too many errors  (repairing)
            wwn-0x5000cca273eeaed7            ONLINE       0     0     0
          mirror-2                            ONLINE       0     0     0
            wwn-0x5000cca273c21a05            ONLINE       0     0     0
            wwn-0x5000cca267eaa17a            ONLINE       0     0     0
          mirror-3                            ONLINE       0     0     0
            wwn-0x5000cca26af7e655            ONLINE       0     0     0
            wwn-0x5000cca273c099dd            ONLINE       0     0     0
          mirror-4                            ONLINE       0     0     0
            ata-ST8000VN0022-2EL112_ZA17FZXF  ONLINE       0     0     0
            ata-ST8000VN0022-2EL112_ZA17H5D3  ONLINE       0     0     0

Weirdly, smartctl does not show anything amiss in the SMART monitoring data (the output is similar for both affected disks; showing just one):

$ sudo smartctl /dev/disk/by-id/wwn-0x5000cca26aeb9280 -a
...
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0004   129   129   054    Old_age   Offline      -       112
  3 Spin_Up_Time            0x0007   153   153   024    Pre-fail  Always       -       431 (Average 430)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       31
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000a   100   100   067    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0004   128   128   020    Old_age   Offline      -       18
  9 Power_On_Hours          0x0012   098   098   000    Old_age   Always       -       15474
 10 Spin_Retry_Count        0x0012   100   100   060    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       31
 22 Helium_Level            0x0023   100   100   025    Pre-fail  Always       -       100
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       664
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       664
194 Temperature_Celsius     0x0002   158   158   000    Old_age   Always       -       41 (Min/Max 16/41)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%        19         -
# 2  Short offline       Completed without error       00%         0         -

...

Also, I notice that many of the aliases in /dev/disk/by-id are gone (all the ata-* links for the WD Reds, except the lone one in downloadpool):

# ls /dev/disk/by-id/ -l
total 0
lrwxrwxrwx 1 root root  9 May 22 23:19 ata-Samsung_SSD_850_EVO_500GB_S2RANX0H608885H -> ../../sda
lrwxrwxrwx 1 root root 10 May 22 23:19 ata-Samsung_SSD_850_EVO_500GB_S2RANX0H608885H-part1 -> ../../sda1
lrwxrwxrwx 1 root root  9 May 23 01:28 ata-ST8000VN0022-2EL112_ZA17FZXF -> ../../sdc
lrwxrwxrwx 1 root root 10 May 23 01:28 ata-ST8000VN0022-2EL112_ZA17FZXF-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 May 23 01:28 ata-ST8000VN0022-2EL112_ZA17FZXF-part9 -> ../../sdc9
lrwxrwxrwx 1 root root  9 May 23 01:16 ata-ST8000VN0022-2EL112_ZA17H5D3 -> ../../sdb
lrwxrwxrwx 1 root root 10 May 23 01:16 ata-ST8000VN0022-2EL112_ZA17H5D3-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 May 23 01:16 ata-ST8000VN0022-2EL112_ZA17H5D3-part9 -> ../../sdb9
lrwxrwxrwx 1 root root  9 May 22 23:21 ata-WDC_WD100EFAX-68LHPN0_2YG1R7PD -> ../../sdd
lrwxrwxrwx 1 root root 10 May 22 23:21 ata-WDC_WD100EFAX-68LHPN0_2YG1R7PD-part1 -> ../../sdd1
lrwxrwxrwx 1 root root 10 May 22 23:21 ata-WDC_WD100EFAX-68LHPN0_2YG1R7PD-part9 -> ../../sdd9
lrwxrwxrwx 1 root root  9 May 22 23:19 scsi-0ATA_Samsung_SSD_850_S2RANX0H608885H -> ../../sda
lrwxrwxrwx 1 root root 10 May 22 23:19 scsi-0ATA_Samsung_SSD_850_S2RANX0H608885H-part1 -> ../../sda1
lrwxrwxrwx 1 root root  9 May 23 01:28 scsi-0ATA_ST8000VN0022-2EL_ZA17FZXF -> ../../sdc
lrwxrwxrwx 1 root root 10 May 23 01:28 scsi-0ATA_ST8000VN0022-2EL_ZA17FZXF-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 May 23 01:28 scsi-0ATA_ST8000VN0022-2EL_ZA17FZXF-part9 -> ../../sdc9
lrwxrwxrwx 1 root root  9 May 23 01:16 scsi-0ATA_ST8000VN0022-2EL_ZA17H5D3 -> ../../sdb
lrwxrwxrwx 1 root root 10 May 23 01:16 scsi-0ATA_ST8000VN0022-2EL_ZA17H5D3-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 May 23 01:16 scsi-0ATA_ST8000VN0022-2EL_ZA17H5D3-part9 -> ../../sdb9
lrwxrwxrwx 1 root root  9 May 22 23:21 scsi-0ATA_WDC_WD100EFAX-68_2YG1R7PD -> ../../sdd
lrwxrwxrwx 1 root root 10 May 22 23:21 scsi-0ATA_WDC_WD100EFAX-68_2YG1R7PD-part1 -> ../../sdd1
lrwxrwxrwx 1 root root 10 May 22 23:21 scsi-0ATA_WDC_WD100EFAX-68_2YG1R7PD-part9 -> ../../sdd9
lrwxrwxrwx 1 root root  9 May 22 23:19 scsi-1ATA_Samsung_SSD_850_EVO_500GB_S2RANX0H608885H -> ../../sda
lrwxrwxrwx 1 root root 10 May 22 23:19 scsi-1ATA_Samsung_SSD_850_EVO_500GB_S2RANX0H608885H-part1 -> ../../sda1
lrwxrwxrwx 1 root root  9 May 23 01:28 scsi-1ATA_ST8000VN0022-2EL112_ZA17FZXF -> ../../sdc
lrwxrwxrwx 1 root root 10 May 23 01:28 scsi-1ATA_ST8000VN0022-2EL112_ZA17FZXF-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 May 23 01:28 scsi-1ATA_ST8000VN0022-2EL112_ZA17FZXF-part9 -> ../../sdc9
lrwxrwxrwx 1 root root  9 May 23 01:16 scsi-1ATA_ST8000VN0022-2EL112_ZA17H5D3 -> ../../sdb
lrwxrwxrwx 1 root root 10 May 23 01:16 scsi-1ATA_ST8000VN0022-2EL112_ZA17H5D3-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 May 23 01:16 scsi-1ATA_ST8000VN0022-2EL112_ZA17H5D3-part9 -> ../../sdb9
lrwxrwxrwx 1 root root  9 May 22 23:21 scsi-1ATA_WDC_WD100EFAX-68LHPN0_2YG1R7PD -> ../../sdd
lrwxrwxrwx 1 root root 10 May 22 23:21 scsi-1ATA_WDC_WD100EFAX-68LHPN0_2YG1R7PD-part1 -> ../../sdd1
lrwxrwxrwx 1 root root 10 May 22 23:21 scsi-1ATA_WDC_WD100EFAX-68LHPN0_2YG1R7PD-part9 -> ../../sdd9
lrwxrwxrwx 1 root root  9 May 23 01:28 scsi-35000c500a2e631c6 -> ../../sdc
lrwxrwxrwx 1 root root 10 May 23 01:28 scsi-35000c500a2e631c6-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 May 23 01:28 scsi-35000c500a2e631c6-part9 -> ../../sdc9
lrwxrwxrwx 1 root root  9 May 23 01:16 scsi-35000c500a2edebe0 -> ../../sdb
lrwxrwxrwx 1 root root 10 May 23 01:16 scsi-35000c500a2edebe0-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 May 23 01:16 scsi-35000c500a2edebe0-part9 -> ../../sdb9
lrwxrwxrwx 1 root root  9 May 23 00:38 scsi-35000cca267eaa17a -> ../../sdg
lrwxrwxrwx 1 root root 10 May 23 00:38 scsi-35000cca267eaa17a-part1 -> ../../sdg1
lrwxrwxrwx 1 root root 10 May 23 00:38 scsi-35000cca267eaa17a-part9 -> ../../sdg9
lrwxrwxrwx 1 root root  9 May 23 01:20 scsi-35000cca26aeb9280 -> ../../sdl
lrwxrwxrwx 1 root root 10 May 23 01:20 scsi-35000cca26aeb9280-part1 -> ../../sdl1
lrwxrwxrwx 1 root root 10 May 23 01:20 scsi-35000cca26aeb9280-part9 -> ../../sdl9
lrwxrwxrwx 1 root root  9 May 23 01:20 scsi-35000cca26af27d8b -> ../../sdk
lrwxrwxrwx 1 root root 10 May 23 01:20 scsi-35000cca26af27d8b-part1 -> ../../sdk1
lrwxrwxrwx 1 root root 10 May 23 01:20 scsi-35000cca26af27d8b-part9 -> ../../sdk9
lrwxrwxrwx 1 root root  9 May 23 02:35 scsi-35000cca26af7e655 -> ../../sdi
lrwxrwxrwx 1 root root 10 May 23 02:35 scsi-35000cca26af7e655-part1 -> ../../sdi1
lrwxrwxrwx 1 root root 10 May 23 02:35 scsi-35000cca26af7e655-part9 -> ../../sdi9
lrwxrwxrwx 1 root root  9 May 23 00:35 scsi-35000cca273c099dd -> ../../sdf
lrwxrwxrwx 1 root root 10 May 23 00:35 scsi-35000cca273c099dd-part1 -> ../../sdf1
lrwxrwxrwx 1 root root 10 May 23 00:35 scsi-35000cca273c099dd-part9 -> ../../sdf9
lrwxrwxrwx 1 root root  9 May 22 23:21 scsi-35000cca273c0c7e3 -> ../../sdd
lrwxrwxrwx 1 root root 10 May 22 23:21 scsi-35000cca273c0c7e3-part1 -> ../../sdd1
lrwxrwxrwx 1 root root 10 May 22 23:21 scsi-35000cca273c0c7e3-part9 -> ../../sdd9
lrwxrwxrwx 1 root root  9 May 23 03:01 scsi-35000cca273c21a05 -> ../../sdj
lrwxrwxrwx 1 root root 10 May 23 03:01 scsi-35000cca273c21a05-part1 -> ../../sdj1
lrwxrwxrwx 1 root root 10 May 23 03:01 scsi-35000cca273c21a05-part9 -> ../../sdj9
lrwxrwxrwx 1 root root  9 May 23 00:35 scsi-35000cca273ee8907 -> ../../sde
lrwxrwxrwx 1 root root 10 May 23 00:35 scsi-35000cca273ee8907-part1 -> ../../sde1
lrwxrwxrwx 1 root root 10 May 23 00:35 scsi-35000cca273ee8907-part9 -> ../../sde9
lrwxrwxrwx 1 root root  9 May 23 00:04 scsi-35000cca273eeaed7 -> ../../sdh
lrwxrwxrwx 1 root root 10 May 23 00:04 scsi-35000cca273eeaed7-part1 -> ../../sdh1
lrwxrwxrwx 1 root root 10 May 23 00:04 scsi-35000cca273eeaed7-part9 -> ../../sdh9
lrwxrwxrwx 1 root root  9 May 22 23:19 scsi-35002538d40f8ba4c -> ../../sda
lrwxrwxrwx 1 root root 10 May 22 23:19 scsi-35002538d40f8ba4c-part1 -> ../../sda1
lrwxrwxrwx 1 root root  9 May 22 23:19 scsi-SATA_Samsung_SSD_850_S2RANX0H608885H -> ../../sda
lrwxrwxrwx 1 root root 10 May 22 23:19 scsi-SATA_Samsung_SSD_850_S2RANX0H608885H-part1 -> ../../sda1
lrwxrwxrwx 1 root root  9 May 23 01:28 scsi-SATA_ST8000VN0022-2EL_ZA17FZXF -> ../../sdc
lrwxrwxrwx 1 root root 10 May 23 01:28 scsi-SATA_ST8000VN0022-2EL_ZA17FZXF-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 May 23 01:28 scsi-SATA_ST8000VN0022-2EL_ZA17FZXF-part9 -> ../../sdc9
lrwxrwxrwx 1 root root  9 May 23 01:16 scsi-SATA_ST8000VN0022-2EL_ZA17H5D3 -> ../../sdb
lrwxrwxrwx 1 root root 10 May 23 01:16 scsi-SATA_ST8000VN0022-2EL_ZA17H5D3-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 May 23 01:16 scsi-SATA_ST8000VN0022-2EL_ZA17H5D3-part9 -> ../../sdb9
lrwxrwxrwx 1 root root  9 May 23 01:20 scsi-SATA_WDC_WD100EFAX-68_2TK2VELD -> ../../sdl
lrwxrwxrwx 1 root root 10 May 23 01:20 scsi-SATA_WDC_WD100EFAX-68_2TK2VELD-part1 -> ../../sdl1
lrwxrwxrwx 1 root root 10 May 23 01:20 scsi-SATA_WDC_WD100EFAX-68_2TK2VELD-part9 -> ../../sdl9
lrwxrwxrwx 1 root root  9 May 23 01:20 scsi-SATA_WDC_WD100EFAX-68_2TKL26ZD -> ../../sdk
lrwxrwxrwx 1 root root 10 May 23 01:20 scsi-SATA_WDC_WD100EFAX-68_2TKL26ZD-part1 -> ../../sdk1
lrwxrwxrwx 1 root root 10 May 23 01:20 scsi-SATA_WDC_WD100EFAX-68_2TKL26ZD-part9 -> ../../sdk9
lrwxrwxrwx 1 root root  9 May 23 02:35 scsi-SATA_WDC_WD100EFAX-68_2TKYZ3ND -> ../../sdi
lrwxrwxrwx 1 root root 10 May 23 02:35 scsi-SATA_WDC_WD100EFAX-68_2TKYZ3ND-part1 -> ../../sdi1
lrwxrwxrwx 1 root root 10 May 23 02:35 scsi-SATA_WDC_WD100EFAX-68_2TKYZ3ND-part9 -> ../../sdi9
lrwxrwxrwx 1 root root  9 May 23 00:35 scsi-SATA_WDC_WD100EFAX-68_2YG19ZMD -> ../../sdf
lrwxrwxrwx 1 root root 10 May 23 00:35 scsi-SATA_WDC_WD100EFAX-68_2YG19ZMD-part1 -> ../../sdf1
lrwxrwxrwx 1 root root 10 May 23 00:35 scsi-SATA_WDC_WD100EFAX-68_2YG19ZMD-part9 -> ../../sdf9
lrwxrwxrwx 1 root root  9 May 22 23:21 scsi-SATA_WDC_WD100EFAX-68_2YG1R7PD -> ../../sdd
lrwxrwxrwx 1 root root 10 May 22 23:21 scsi-SATA_WDC_WD100EFAX-68_2YG1R7PD-part1 -> ../../sdd1
lrwxrwxrwx 1 root root 10 May 22 23:21 scsi-SATA_WDC_WD100EFAX-68_2YG1R7PD-part9 -> ../../sdd9
lrwxrwxrwx 1 root root  9 May 23 03:01 scsi-SATA_WDC_WD100EFAX-68_2YG4MA0D -> ../../sdj
lrwxrwxrwx 1 root root 10 May 23 03:01 scsi-SATA_WDC_WD100EFAX-68_2YG4MA0D-part1 -> ../../sdj1
lrwxrwxrwx 1 root root 10 May 23 03:01 scsi-SATA_WDC_WD100EFAX-68_2YG4MA0D-part9 -> ../../sdj9
lrwxrwxrwx 1 root root  9 May 23 00:35 scsi-SATA_WDC_WD100EFAX-68_2YK9BHKD -> ../../sde
lrwxrwxrwx 1 root root 10 May 23 00:35 scsi-SATA_WDC_WD100EFAX-68_2YK9BHKD-part1 -> ../../sde1
lrwxrwxrwx 1 root root 10 May 23 00:35 scsi-SATA_WDC_WD100EFAX-68_2YK9BHKD-part9 -> ../../sde9
lrwxrwxrwx 1 root root  9 May 23 00:04 scsi-SATA_WDC_WD100EFAX-68_2YK9PKUD -> ../../sdh
lrwxrwxrwx 1 root root 10 May 23 00:04 scsi-SATA_WDC_WD100EFAX-68_2YK9PKUD-part1 -> ../../sdh1
lrwxrwxrwx 1 root root 10 May 23 00:04 scsi-SATA_WDC_WD100EFAX-68_2YK9PKUD-part9 -> ../../sdh9
lrwxrwxrwx 1 root root  9 May 23 00:38 scsi-SATA_WDC_WD100EFAX-68_JEK0T76Z -> ../../sdg
lrwxrwxrwx 1 root root 10 May 23 00:38 scsi-SATA_WDC_WD100EFAX-68_JEK0T76Z-part1 -> ../../sdg1
lrwxrwxrwx 1 root root 10 May 23 00:38 scsi-SATA_WDC_WD100EFAX-68_JEK0T76Z-part9 -> ../../sdg9
lrwxrwxrwx 1 root root  9 May 23 01:28 wwn-0x5000c500a2e631c6 -> ../../sdc
lrwxrwxrwx 1 root root 10 May 23 01:28 wwn-0x5000c500a2e631c6-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 May 23 01:28 wwn-0x5000c500a2e631c6-part9 -> ../../sdc9
lrwxrwxrwx 1 root root  9 May 23 01:16 wwn-0x5000c500a2edebe0 -> ../../sdb
lrwxrwxrwx 1 root root 10 May 23 01:16 wwn-0x5000c500a2edebe0-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 May 23 01:16 wwn-0x5000c500a2edebe0-part9 -> ../../sdb9
lrwxrwxrwx 1 root root  9 May 23 00:38 wwn-0x5000cca267eaa17a -> ../../sdg
lrwxrwxrwx 1 root root 10 May 23 00:38 wwn-0x5000cca267eaa17a-part1 -> ../../sdg1
lrwxrwxrwx 1 root root 10 May 23 00:38 wwn-0x5000cca267eaa17a-part9 -> ../../sdg9
lrwxrwxrwx 1 root root  9 May 23 01:20 wwn-0x5000cca26aeb9280 -> ../../sdl
lrwxrwxrwx 1 root root 10 May 23 01:20 wwn-0x5000cca26aeb9280-part1 -> ../../sdl1
lrwxrwxrwx 1 root root 10 May 23 01:20 wwn-0x5000cca26aeb9280-part9 -> ../../sdl9
lrwxrwxrwx 1 root root  9 May 23 01:20 wwn-0x5000cca26af27d8b -> ../../sdk
lrwxrwxrwx 1 root root 10 May 23 01:20 wwn-0x5000cca26af27d8b-part1 -> ../../sdk1
lrwxrwxrwx 1 root root 10 May 23 01:20 wwn-0x5000cca26af27d8b-part9 -> ../../sdk9
lrwxrwxrwx 1 root root  9 May 23 02:35 wwn-0x5000cca26af7e655 -> ../../sdi
lrwxrwxrwx 1 root root 10 May 23 02:35 wwn-0x5000cca26af7e655-part1 -> ../../sdi1
lrwxrwxrwx 1 root root 10 May 23 02:35 wwn-0x5000cca26af7e655-part9 -> ../../sdi9
lrwxrwxrwx 1 root root  9 May 23 00:35 wwn-0x5000cca273c099dd -> ../../sdf
lrwxrwxrwx 1 root root 10 May 23 00:35 wwn-0x5000cca273c099dd-part1 -> ../../sdf1
lrwxrwxrwx 1 root root 10 May 23 00:35 wwn-0x5000cca273c099dd-part9 -> ../../sdf9
lrwxrwxrwx 1 root root  9 May 22 23:21 wwn-0x5000cca273c0c7e3 -> ../../sdd
lrwxrwxrwx 1 root root 10 May 22 23:21 wwn-0x5000cca273c0c7e3-part1 -> ../../sdd1
lrwxrwxrwx 1 root root 10 May 22 23:21 wwn-0x5000cca273c0c7e3-part9 -> ../../sdd9
lrwxrwxrwx 1 root root  9 May 23 03:01 wwn-0x5000cca273c21a05 -> ../../sdj
lrwxrwxrwx 1 root root 10 May 23 03:01 wwn-0x5000cca273c21a05-part1 -> ../../sdj1
lrwxrwxrwx 1 root root 10 May 23 03:01 wwn-0x5000cca273c21a05-part9 -> ../../sdj9
lrwxrwxrwx 1 root root  9 May 23 00:35 wwn-0x5000cca273ee8907 -> ../../sde
lrwxrwxrwx 1 root root 10 May 23 00:35 wwn-0x5000cca273ee8907-part1 -> ../../sde1
lrwxrwxrwx 1 root root 10 May 23 00:35 wwn-0x5000cca273ee8907-part9 -> ../../sde9
lrwxrwxrwx 1 root root  9 May 23 00:04 wwn-0x5000cca273eeaed7 -> ../../sdh
lrwxrwxrwx 1 root root 10 May 23 00:04 wwn-0x5000cca273eeaed7-part1 -> ../../sdh1
lrwxrwxrwx 1 root root 10 May 23 00:04 wwn-0x5000cca273eeaed7-part9 -> ../../sdh9
lrwxrwxrwx 1 root root  9 May 22 23:19 wwn-0x5002538d40f8ba4c -> ../../sda
lrwxrwxrwx 1 root root 10 May 22 23:19 wwn-0x5002538d40f8ba4c-part1 -> ../../sda1

So this raises many questions:

1) Why did my pool disappear? Is it because the symlinks in /dev/disk/by-id/ disappeared, so ZFS couldn't locate most of the disks?

2) Are the checksum errors worrisome? The disks seem perfectly healthy. I only looked at a couple of directories and files while the pool was imported with the sdX references; could that have caused checksums to be written incorrectly if ZFS imported the disks in the wrong order?

3) How do I get the missing /dev/disk/by-id/ata-* symlinks back? Has something changed in Ubuntu 20.04 that would have caused them to disappear?
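(For reference, the by-id symlinks are generated by udev from each disk's reported properties, so one way to investigate is to inspect what udev sees and re-trigger the rules; /dev/sdk is just an example device here, not a prescription:)

```shell
# Show the udev properties one of the affected disks reports; the by-id
# symlinks are derived from fields like ID_BUS, ID_SERIAL and ID_WWN
udevadm info --query=property --name=/dev/sdk | grep -E 'ID_BUS|ID_SERIAL|ID_WWN'

# Re-run the udev rules for block devices and wait for the
# symlinks to be re-created, without rebooting
sudo udevadm trigger --subsystem-match=block
sudo udevadm settle
```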

4) I thought it was a good idea to refer to my disks through /dev/disk/by-id/, because those names would be stable. Is that not the best way to go about it?

5) I don't like the wwn-* names because they are non-descriptive to me. I'd much rather have names that reflect the serial number of the disk, so I can easily identify a drive if I need to replace it. I've gone ahead and set up aliases in /dev/disk/by-vdev/ (aliased to the wwn-* names), following the advice in http://kbdone.com/zfs-basics/#Consistent_device_IDs_via_vdev_idconf_file:

$ cat /etc/zfs/vdev_id.conf
alias ST8000VN0022-2EL_ZA17H5D3 /dev/disk/by-id/wwn-0x5000c500a2edebe0
alias ST8000VN0022-2EL_ZA17FZXF /dev/disk/by-id/wwn-0x5000c500a2e631c6
alias WD100EFAX-68_2YG1R7PD /dev/disk/by-id/wwn-0x5000cca273c0c7e3
alias WD100EFAX-68_2YK9BHKD /dev/disk/by-id/wwn-0x5000cca273ee8907
...
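(After editing /etc/zfs/vdev_id.conf, the /dev/disk/by-vdev links typically only appear once udev has been re-triggered; a sketch, assuming the ZFS udev rules are installed:)

```shell
# Rebuild the /dev/disk/by-vdev symlinks from /etc/zfs/vdev_id.conf
sudo udevadm trigger --subsystem-match=block
sudo udevadm settle

# Then the pool can be re-imported using the friendly alias names
sudo zpool export masterpool
sudo zpool import -d /dev/disk/by-vdev masterpool
```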

Thoughts?

Thanks!

Edit: output from zpool status after scrub completion:

root@cloud:~# zpool status
  pool: downloadpool
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(5) for details.
  scan: scrub repaired 0B in 0 days 11:33:18 with 0 errors on Sun May 10 11:57:19 2020
config:

        NAME                                  STATE     READ WRITE CKSUM
        downloadpool                          ONLINE       0     0     0
          ata-WDC_WD100EFAX-68LHPN0_2YG1R7PD  ONLINE       0     0     0

errors: No known data errors

  pool: masterpool
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: scrub repaired 112K in 0 days 15:06:09 with 0 errors on Sat May 23 12:53:43 2020
config:

        NAME                                  STATE     READ WRITE CKSUM
        masterpool                            DEGRADED     0     0     0
          mirror-0                            DEGRADED     0     0     0
            wwn-0x5000cca26af27d8b            DEGRADED     0     0    15  too many errors
            wwn-0x5000cca273ee8907            ONLINE       0     0     0
          mirror-1                            DEGRADED     0     0     0
            wwn-0x5000cca26aeb9280            DEGRADED     0     0    18  too many errors
            wwn-0x5000cca273eeaed7            ONLINE       0     0     0
          mirror-2                            ONLINE       0     0     0
            wwn-0x5000cca273c21a05            ONLINE       0     0     0
            wwn-0x5000cca267eaa17a            ONLINE       0     0     0
          mirror-3                            ONLINE       0     0     0
            wwn-0x5000cca26af7e655            ONLINE       0     0     0
            wwn-0x5000cca273c099dd            ONLINE       0     0     0
          mirror-4                            ONLINE       0     0     0
            ata-ST8000VN0022-2EL112_ZA17FZXF  ONLINE       0     0     0
            ata-ST8000VN0022-2EL112_ZA17H5D3  ONLINE       0     0     0

errors: No known data errors
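(Since the scrub repaired everything and reports no known data errors, the `action:` line above suggests the per-device error counters can now be reset; a sketch:)

```shell
# Clear the accumulated error counters so the pool returns to ONLINE
sudo zpool clear masterpool

# Confirm the DEGRADED state and checksum counts are gone
zpool status masterpool
```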

One Answer

I had the exact same problem, and your post helped point me in the right direction. Here are my thoughts.

I have 6 drives: 2 drives in ZFS pool 'A' attached to the motherboard's SATA controller, and 4 drives in ZFS pool 'B' attached to my LSI SAS 9211 controller. The pools were set up to look for devices in /dev/disk/by-id.

After upgrading from Ubuntu 18.04 to Ubuntu 20.04, the device IDs of all disks attached to the SAS controller changed, from ata-* to scsi-SATA*. After rebooting the server, ZFS pool B was missing, because ZFS couldn't find the device IDs anymore during import. The device IDs of the drives connected to the motherboard's SATA controller stayed the same, so the pool using those drives could still be imported and wasn't missing after the release upgrade.

This is how I fixed the missing 'B' pool:

First, I listed all pools that were available for import:

sudo zpool import

This listed my missing pool 'B' and all the correct drives in that pool, but named as plain /dev devices. So I imported the pool using the device IDs listed in /dev/disk/by-id. I got a warning that the pool appeared to be potentially active, so I had to force the import with -f, like this:

sudo zpool import -f -d /dev/disk/by-id B

And everything was fine again: pool B was available. I did not export the pool, and I did not first import it without telling ZFS which device directory to use. The device IDs used are different now: wwn-*.

I ran a scrub on the pool, which found no errors.
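In short, the whole fix was (pool name 'B' from my setup):

```shell
# List pools that are available for import (shows the missing pool,
# with its members named as plain /dev devices)
sudo zpool import

# Force-import it, searching /dev/disk/by-id for the member disks
sudo zpool import -f -d /dev/disk/by-id B

# Verify the result and check data integrity
zpool status B
sudo zpool scrub B
```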

To answer your questions:

  1. I think the release upgrade from Ubuntu 18.04 to 20.04 caused the links in /dev/disk/by-id to change.

  2. I didn't import the pool with the /dev references first, and I imported using the -f option. Those are the differences between what you did and what I did. But I can't imagine this would be a problem, unless the wrong drives were used.

  3. I didn't get the old disk-by-id links back. But by importing the pool with the directive to use the disk IDs, it now uses new disk IDs, and that's good enough for me; I don't need the old ones back.

  4. I still think it's a good idea to refer to disks through /dev/disk/by-id/. These names are stable across reboots and when disks are physically moved around in the server (I tested this). I am a bit disappointed that a release upgrade would break the disk ID naming, but I am glad that in my case it could be solved by importing the pool again.

  5. Same reason for me. Thanks for the tip about aliases; perhaps I will use them.

Answered by Crestop on November 7, 2021
