
Confirming parameters for XFS filesystem and LVM volume striping over 2 ADAPT (RAID6-like) volumes

Server Fault, asked by Nicolas De Jay on November 4, 2021

We are setting up an ADAPT0 (RAID-60-like) configuration for a file server.

We have six disk pools. Each consists of 14 disks and is set up using ADAPT. According to Dell’s official white paper, ADAPT is similar to RAID 6 but distributes spare capacity. On page 13, it is indicated that the chunk size is 512 KiB and that the stripe width is 4 MiB (over 8 disks) for each disk pool.

My understanding is that for each 14-disk pool, 2 disks' worth of capacity is reserved as spare, 20% of the remaining 12 disks (2.4 disks' worth) goes to parity, and 80% (9.6 disks' worth) is usable for storage. Even so, the chunk size is still 512 KiB and the stripe width remains 4 MiB, since any one stripe is written contiguously across only 8 disks.
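
As a back-of-the-envelope check of those numbers (our own arithmetic, assuming the 8-data + 2-parity chunk layout described in the white paper):

$ echo "scale=1; 12 * 2/10" | bc   # parity share of the 12 non-spare disks: 2.4
$ echo "scale=1; 12 * 8/10" | bc   # data share of the 12 non-spare disks: 9.6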

To achieve an ADAPT0 (RAID-60-like) configuration, we then created a logical volume that stripes over two disk pools using LVM. Our intent is to eventually have 3 striped volumes, each striping over two disk pools. We used a stripe size that matches that of the hardware RAID (512 KiB):

$ vgcreate vg-gw /dev/sda /dev/sdb
$ lvcreate -y --type striped -L 10T -i 2 -I 512k -n vol vg-gw
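
To double-check the layout LVM actually created, the stripe count and stripe size of the new LV can be read back (a verification step we would add; field names as listed in lvs(8)):

$ lvs -o +stripes,stripe_size vg-gw/vol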

Next, we set up an XFS file system over the striped logical volume. Following guidelines from XFS.org and a few other sources, we matched the stripe unit su to the LVM and RAID stripe size (512k) and set the stripe width sw to 16, since we have 16 "data disks".

$ mkfs.xfs -f -d su=512k,sw=16 -l su=256k /dev/mapper/vg--gw-vol
$ mkdir -p /vol/vol
$ mount -o rw -t xfs /dev/mapper/vg--gw-vol /vol/vol
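
To confirm that mkfs.xfs recorded the intended geometry, xfs_info can be run against the mount point (another quick check, not part of the original steps); it reports the stripe unit (sunit) and stripe width (swidth) it is using, expressed in filesystem blocks:

$ xfs_info /vol/vol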

We benchmarked sequential I/O performance with a 4 KiB block size on /dev/sda, /dev/sdb, and /dev/mapper/vg--gw-vol using:

fio --name=test --ioengine=posixaio --rw=rw --bs=4k --numjobs=1 --size=256g --iodepth=1 --runtime=300 --time_based --end_fsync=1

We were surprised to obtain similar performances:

       Volumes         Throughput   Latency
---------------------  ----------  ----------
/dev/sda                198MiB/s    9.50 usec
/dev/sdb                188MiB/s   10.11 usec
/dev/mapper/vg--gw-vol  209MiB/s    9.06 usec

Using the I/O monitoring tool bwm-ng, we can see I/O going to both /dev/sda and /dev/sdb when writing to /dev/mapper/vg--gw-vol.
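
A sequential 4 KiB, iodepth=1 workload does not necessarily exercise a 512 KiB stripe. A variant that keeps whole stripes in flight (a sketch only, run against the mounted filesystem; block size, queue depth, and job count are just starting points) would be:

$ fio --name=seqwrite --directory=/vol/vol --ioengine=libaio --direct=1 --rw=write --bs=1M --iodepth=16 --numjobs=4 --size=16g --runtime=300 --time_based --end_fsync=1 --group_reporting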

Did we configure this properly? More specifically:

(1) Was it correct to align the LVM stripe size to that of the hardware RAID (512 KiB)?

(2) Was it correct to align the XFS stripe unit and width as we did (512 KiB stripe unit and 16 data disks), or are we supposed to "abstract" the underlying volumes (4 MiB stripe unit and 2 data disks)?

(3) Adding to the confusion is the self-reported output of the block devices here:

$ grep "" /sys/block/sda/queue/*_size
/sys/block/sda/queue/hw_sector_size:512
/sys/block/sda/queue/logical_block_size:512
/sys/block/sda/queue/max_segment_size:65536
/sys/block/sda/queue/minimum_io_size:4096
/sys/block/sda/queue/optimal_io_size:1048576
/sys/block/sda/queue/physical_block_size:4096
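
For comparison, the limits exported by the striped device-mapper volume itself can be read the same way once the dm node behind /dev/mapper/vg--gw-vol is resolved (a further check we could run):

$ grep "" /sys/block/$(basename "$(readlink -f /dev/mapper/vg--gw-vol)")/queue/*_size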

Thank you!

One Answer

I would avoid inserting a RAID0 layer on top of ADAPT. Rather, I would create a simple linear LVM pool comprising the two arrays or, alternatively, create a single 28-disk array (not utilizing the second controller at all).

With a linear LVM concatenation of the two arrays, XFS will give you added performance by virtue of its own allocation group strategy, because the filesystem concurrently issues multiple IOs to various LBA ranges.
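
If you go the linear route, a minimal sketch (reusing the device names and the 10T size from your question, and omitting -i/-I so the LV is a plain concatenation) would be:

$ vgcreate vg-gw /dev/sda /dev/sdb
$ lvcreate -y -L 10T -n vol vg-gw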

However, a single 28-disk pool should provide slightly better space efficiency, because less total capacity is set aside as spare relative to user data.

Regarding XFS options, you should use su=512k,sw=8 based on the ADAPT layout. That said, with a high-end controller equipped with a large powerloss-protected write cache, this should have only a minor effect.
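
For that geometry, the mkfs invocation would become something like the following (keeping your log stripe unit; adjust the target device to whatever volume you end up creating):

$ mkfs.xfs -f -d su=512k,sw=8 -l su=256k /dev/mapper/vg--gw-vol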

Answered by shodanshok on November 4, 2021
