Making bit identical ext2 filesystems

Question

I'm preparing an image file for a linux system. I need to be able to run my script that creates the image and have the output be bit-for-bit identical each time.

I follow the usual procedure: create a large binary file, partition it, set up a loop device for the partition, and make the filesystem. I then mount the filesystem, copy the syslinux and initrd files over, unmount the partition, delete the loop device, and I have my image file. I can dd it to a disk and the Linux system boots correctly, so I'm making the filesystem correctly.

I run my script that performs the above steps, but the output differs each time. Some of the differences are timestamps in the ext2 data structures. I wrote a program that reads the ext2 structures and clears the timestamps, and tune2fs can clear a few more fields, but even some of the bitmap data differs, and the file data doesn't seem to end up in the same place each time.

So how would I go about creating identical filesystems?

Here are the commands I use to create a filesystem, put a file on it and unmount it. If you save the output, run the commands again and compare the two images, you'll see that the file a.txt ends up in different locations.

dd if=/dev/zero bs=1024 count=46112 of=cf.bin
parted cf.bin <<EOF
unit
s
mklabel
msdos
mkpart
p
ext2
63s
45119s
set
1
boot
on
q
EOF

losetup -o $((63 * 512)) /dev/loop0 cf.bin

mke2fs -b 1024 -t ext2 /dev/loop0 22528

#clear some parameters
tune2fs -i 0 /dev/loop0 # interval between check
tune2fs -L LABEL /dev/loop0
tune2fs -U 00000000-0000-0000-0000-000000000000 /dev/loop0 #uuid
tune2fs -c 0 /dev/loop0 #mount count

mount /dev/loop0 mnt
# make a dummy file
echo HELLO > mnt/a.txt
umount mnt

losetup -d /dev/loop0
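To see where two runs differ, rather than only that they differ, cmp -l lists every differing byte. The sketch below turns those offsets into filesystem block numbers. It assumes the layout used in the question's script (partition at sector 63, 1 KiB ext2 blocks), and the function name diff_blocks is hypothetical:

```shell
#!/bin/sh
# diff_blocks IMG1 IMG2 [PART_START_SECTOR] [BLOCK_SIZE]
# Hypothetical helper: list the filesystem blocks (numbered from the
# start of the partition) that contain at least one differing byte.
# Defaults match the script above: partition at sector 63, 1 KiB blocks.
diff_blocks() {
    img1=$1 img2=$2
    part_off=$(( ${3:-63} * 512 ))
    bs=${4:-1024}
    # cmp -l prints one "OFFSET BYTE1 BYTE2" line per differing byte,
    # with a 1-based decimal OFFSET; EOF warnings go to stderr.
    cmp -l "$img1" "$img2" 2>/dev/null |
        awk -v off="$part_off" -v bs="$bs" '
            { pos = $1 - 1 }
            pos < off { print "before-partition"; next }
            { print int((pos - off) / bs) }' |
        uniq
}
```

Running diff_blocks on the images from two runs shows which blocks moved. debugfs's icheck command, run against the partition (for example the loop device), can then map a block number back to the inode that owns it.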

Update
If I put the above commands in a script and then paste them a second time so they run twice within the same script (saving the output in between), a.txt is put in the same disk location both times. This holds even if I change the date with the date command before the second run. But if I run the script, save the output, and then run the script again from the command line, a.txt ends up in different locations when I compare the outputs. Very curious behavior. What data is being used to choose the file locations? Clearly it's not the time. The only difference I can think of between running the commands twice within one script and running the script twice is something like the process ID of the calling process. Any ideas?

Update #2
I gave up on trying to use ext2. So I can't answer my original question about ext2, but I'll describe what I did to get a completely reproducible build of a basic linux system.

1. Instead of ext2, use a FAT variant or ISO9660. If you need a partition smaller than 32MB, use FAT16 for the Linux system partition; otherwise use FAT32. Both FAT16 and FAT32 consistently put files in the same locations, but they do store timestamps in their directory entries.
2. Add the Linux system files needed to boot.
3. Write a program to walk the FAT16/32 directory structures and set all timestamps to 0.
4. Clear the disk signature in the MBR, either in the same program that clears the timestamps or with dd.
5. Since it's a FAT filesystem, I'm using syslinux as the boot loader. cpio produces identical initrds from run to run, so there are no issues there.

This is all that is needed for a basic bit-for-bit identical Linux system.
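The disk-signature step needs nothing more than dd. Below is a minimal sketch. It assumes the image file starts with the MBR, as cf.bin does, and the function name clear_disk_signature is hypothetical:

```shell
#!/bin/sh
# clear_disk_signature IMAGE
# Hypothetical helper: overwrite the 4-byte MBR disk signature
# (bytes 440-443) with zeros. conv=notrunc keeps the rest of the image,
# including the partition table that follows at byte 446, untouched.
clear_disk_signature() {
    dd if=/dev/zero of="$1" bs=1 seek=440 count=4 conv=notrunc 2>/dev/null
}
```

After running it, `od -An -tx1 -j440 -N4 cf.bin` should print four zero bytes.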

Issues with FAT file systems

For simply booting a Linux system, FAT shouldn't cause any problems. But for larger data partitions, a couple of FAT32 issues may crop up.

It is possible to bump into the maximum number of entries in a directory (65,536 directory entries, and each long file name uses several of them). This isn't likely to be a problem (but of course, in my case it was).
FAT32 stores an 8.3 filename for each file. Long file names are shortened to a stem with a tilde and a number appended. But if more than 9 files map to the same short stem, FAT32 uses an undocumented procedure to generate a sort of hash to append to the file name instead. I dug into the Linux kernel's FAT code, and it uses the time as a hash seed (the function vfat_create_shortname() in namei_vfat.c), so this field is not reproducible. I don't know how Microsoft's implementation does it. You may get away with just clearing this field, since I don't think the 8.3 names are used by anything other than DOS. Or you could generate your own reproducible numbers: it doesn't matter what they are, only that they're unique.
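The "generate your own unique numbers" option amounts to a deterministic function from the sorted list of long names to short names. The sketch below is a hypothetical helper with deliberately simplified rules. It is not the Linux vfat algorithm or the Windows one. It only computes the names; writing them into the directory entries would still be the job of the program that clears the timestamps:

```shell
#!/bin/sh
# short_names: read long file names (one per line) on stdin and print
# "LONGNAME<TAB>SHORT.EXT" lines with reproducible ~N tails.
# Hypothetical helper with simplified rules (assumption): only A-Z and
# 0-9 are kept, the extension is cut to 3 characters, every name gets a
# tail, and LC_ALL=C fixes the order so the numbering never changes.
short_names() {
    LC_ALL=C sort | LC_ALL=C awk '
    {
        name = $0
        stem = name; ext = ""
        # Split at the last dot, unless the name starts with it.
        if (match(name, /\.[^.]*$/) && RSTART > 1) {
            stem = substr(name, 1, RSTART - 1)
            ext = substr(name, RSTART + 1)
        }
        stem = toupper(stem); ext = toupper(ext)
        gsub(/[^A-Z0-9]/, "", stem)
        gsub(/[^A-Z0-9]/, "", ext)
        base = substr(stem, 1, 6)
        ext = substr(ext, 1, 3)
        key = base "." ext
        n = count[key]
        # Try the next ~N, shortening the stem as N gets more digits,
        # and skip any candidate that is already taken.
        do {
            n++
            tail = "~" n
            short = substr(base, 1, 8 - length(tail)) tail
            if (ext != "") short = short "." ext
        } while (short in used)
        count[key] = n
        used[short] = 1
        printf "%s\t%s\n", name, short
    }'
}
```

For example, `printf 'longfilename2.txt\nlongfilename1.txt\n' | short_names` assigns LONGFI~1.TXT to longfilename1.txt and LONGFI~2.TXT to longfilename2.txt, regardless of input order. A tenth collision becomes LONGFI~10 shortened to LONGF~10.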

Using ISO9660 for an additional partition

1. Use genisoimage to create the ISO. It generates identical output from run to run, except for timestamps. The -l option allows file names of up to 31 characters; if you need longer file names, use the Rock Ridge extensions. The command is

genisoimage -o gfx.iso -R -l -f assets/files/

2. Write a program that walks the ISO9660 filesystem and clears all timestamps, including the TF fields of the Rock Ridge entries.
3. Use fdisk or parted to make a partition in your disk image. 96h is the MBR partition type ID for ISO9660.
4. If necessary, patch up the partition table. Parted doesn't support making a partition of type ISO9660. Unfortunately, I'm stuck with older versions of both parted and fdisk, and parted is easier to use, so I used parted to create my second partition as FAT32 and then used fdisk to change its type to 96.
5. Use dd to embed the ISO in the disk image, using the same numbers you used when making the partition. I used

dd bs=512 seek=$part2_start_lba conv=notrunc if=gfx.iso of=cf.bin

where cf.bin is my disk image file.
6. Mount the ISO partition after Linux has booted. If the ISO is the second partition, it will be /dev/sda2. You may have to use mknod to create the proper device file in /dev first.
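Patching the partition type and running dd can also be done directly on the image file, without fdisk. Below is a minimal sketch. It assumes a classic MBR partition table at the start of the image, and the function name embed_iso is hypothetical. Rather than retyping the numbers, it reads the start LBA and size from the partition table, which keeps the dd offset consistent with the partition:

```shell
#!/bin/sh
# embed_iso IMAGE ISO [PARTNUM]
# Hypothetical helper: copy ISO into primary partition PARTNUM (default 2)
# of IMAGE, set that partition's MBR type to 0x96 and print its start LBA.
embed_iso() {
    img=$1 iso=$2 n=${3:-2}
    entry=$(( 446 + 16 * (n - 1) ))       # offset of the partition entry
    # Bytes 8-11: start LBA; bytes 12-15: size in sectors (little-endian).
    set -- $(od -An -tu1 -j $((entry + 8)) -N 8 "$img")
    [ $# -eq 8 ] || { echo "cannot read partition table" >&2; return 1; }
    lba=$(( $1 + $2 * 256 + $3 * 65536 + $4 * 16777216 ))
    sectors=$(( $5 + $6 * 256 + $7 * 65536 + $8 * 16777216 ))
    iso_sectors=$(( ($(wc -c < "$iso") + 511) / 512 ))
    if [ "$lba" -eq 0 ] || [ "$iso_sectors" -gt "$sectors" ]; then
        echo "partition $n is missing or too small" >&2
        return 1
    fi
    # Byte 4 of the entry is the partition type; 0x96 is octal 226.
    printf '\226' | dd of="$img" bs=1 seek=$((entry + 4)) conv=notrunc 2>/dev/null
    # Copy the ISO to the partition's first sector without truncating.
    dd if="$iso" of="$img" bs=512 seek="$lba" conv=notrunc 2>/dev/null
    echo "$lba"
}
```

With this helper, `embed_iso cf.bin gfx.iso` replaces both the fdisk type change and the hand-written dd command. The size check refuses to write past the end of the partition.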

josch · Answer

The answer to your original question is a tool called genext2fs. If you supply the -f switch, it creates bit-for-bit identical output given the same input. This is demonstrated both by its own test suite, which compares the created images against precomputed md5sums of the correct output, and by this test (executed inside the source directory):

$ ./genext2fs -f -B 1024 -b 40 -d m4 rootfs.img
$ md5sum rootfs.img
322053a8962acc599eaabb2dfde28783  rootfs.img
$ rm rootfs.img
$ ./genext2fs -f -B 1024 -b 40 -d m4 rootfs.img
$ md5sum rootfs.img
322053a8962acc599eaabb2dfde28783  rootfs.img
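The build-twice-and-compare test above works for any image build, not just genext2fs. Below is a minimal sketch; the function name check_reproducible is hypothetical. It rebuilds the output from scratch twice and compares SHA-256 hashes:

```shell
#!/bin/sh
# check_reproducible OUTPUT CMD [ARGS...]
# Hypothetical helper: run CMD twice, deleting OUTPUT before each run,
# and compare the SHA-256 of the file it produces.
# Returns 0 if both runs are identical, 1 if not, 2 if CMD fails.
check_reproducible() {
    out=$1; shift
    rm -f "$out"
    "$@" || return 2
    first=$(sha256sum < "$out")
    rm -f "$out"
    "$@" || return 2
    second=$(sha256sum < "$out")
    echo "run 1: $first"
    echo "run 2: $second"
    [ "$first" = "$second" ]
}
```

Usage, mirroring the commands above: `check_reproducible rootfs.img ./genext2fs -f -B 1024 -b 40 -d m4 rootfs.img`.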

You can mount the resulting image and check that its content is indeed the same as the packed directory:

$ sudo mount rootfs.img /mnt
$ ls -lha /mnt
total 27K
drwxr-xr-x  3 root  root  1.0K Jan  1  1970 .
drwxr-xr-x 25 root  root  4.0K Mar 22 14:56 ..
-rw-r--r--  1 josch josch 1.5K Mar 22 23:05 ac_func_scanf_can_malloc.m4
-rw-r--r--  1 josch josch 2.4K Mar 22 14:24 ac_func_snprintf.m4
drwx------  2 root  root   16K Jan  1  1970 lost+found
$ rmdir /mnt/lost+found
$ diff -rq m4 /mnt
$ echo $?
0

Have fun!

Str1ker · Answer

You could also try e2image:

The e2image program will save critical ext2, ext3, or ext4 filesystem metadata located on device to a file.

Although e2image saves only metadata by default, it can save data too, using the -a option (see the "Including data" section of the e2image man page):

The -a option can be specified to include all data. This will give an image that is suitable to use to clone the entire FS or for backup purposes. Note that this option only works with the raw or QCOW2 formats.

The QCOW2 format is preferable to raw for backups. A QCOW2 image is a normal file whose size is close to the used space of the partition being backed up. A raw image is a sparse file, with all that implies: tools that don't handle sparse files specially will process not only the used space but the free space too.
So, usage examples:

Back up the sda1 partition, including files and metadata, to the file boot-part.qcow2:

sudo mount -o remount,ro /dev/sda1
sudo e2image -a -Q /dev/sda1 boot-part.qcow2

Note that remounting read-only is required to guarantee that nothing writes to the partition while the backup is in progress. Afterwards, you can remount it read-write with sudo mount -o remount,rw /dev/sda1 if you need to.

Restore partition /dev/sda1 from QCOW2 image file boot-part.qcow2:

sudo umount /dev/sda1
sudo e2image -r boot-part.qcow2 /dev/sda1

Note that umount is a must, since you can't rewrite the superblocks of a mounted partition. If /dev/sda1 is not mounted at that point, you can skip this step. :)
It's also worth noting that when restoring, /dev/sda1 must be equal to or larger than the partition inside the image file; otherwise you will get the error: e2image: Invalid argument while trying to convert qcow2 image (boot-part.qcow2) into raw image (/dev/sda1).

matsib.dev · Answer

Note: This will not be a full answer; just a partial one, or, at least, a hint.

You wrote: "I need to be able to run my script that creates the image and have the output be bit-for-bit identical each time."

The first problem you have to solve to achieve that is the disk signature in your msdos partition table (offset 440 in the MBR, 4 bytes long). If your MBRs differ, you are failing at your goal in the very first sector. Each time you execute mklabel inside parted, a new disk signature is generated. You can overcome that by overwriting those four bytes with the same random signature every time, like this:

printf RANDOM_SIGNATURE | xxd -p -r | dd bs=1 count=4 seek=440 of=YOUR_DOT_BIN conv=notrunc 2> /dev/null

RANDOM_SIGNATURE could be something like '73396992'.

I've made a little mod to your script, with this fix:

dd if=/dev/zero bs=1024 count=46112 of="$1"
parted "$1" <<EOF
unit
s
mklabel
msdos
mkpart
p
ext2
63s
45119s
set
1
boot
on
q
EOF
printf "$2" | xxd -p -r | dd bs=1 count=4 seek=440 of="$1" conv=notrunc 2> /dev/null

losetup -o $((63 * 512)) /dev/loop0 "$1"

mke2fs -b 1024 -t ext2 /dev/loop0 22528

#clear some parameters
tune2fs -i 0 /dev/loop0 # interval between check
tune2fs -L LABEL /dev/loop0
tune2fs -U 00000000-0000-0000-0000-000000000000 /dev/loop0 #uuid
tune2fs -c 0 /dev/loop0 #mount count

#mount /dev/loop0 mnt
## make a dummy file
#echo HELLO > mnt/a.txt
#umount mnt

losetup -d /dev/loop0

Now you can call the script like this:

./script_name BIN_FILE_NAME RANDOM_SIGNATURE

If you do this:

./test.sh cf00.bin '73396992'
./test.sh cf01.bin '73396992'
./test.sh cf02.bin '73396992'
./test.sh cf03.bin '73396992'

and then this:

dd if=cf00.bin count=63 2>/dev/null | sha1sum
dd if=cf01.bin count=63 2>/dev/null | sha1sum
dd if=cf02.bin count=63 2>/dev/null | sha1sum
dd if=cf03.bin count=63 2>/dev/null | sha1sum

you'll see that all of those files are identical up to just before the filesystem in the first partition (try the same with your original script, and the sums will differ from one another).

You've probably noticed that in my version of the script, I commented out the lines that write the a.txt file. I did this because there is no point in trying to fix that when you can't make the filesystems identical even with no files on them. And that is the case: the filesystems differ even with no files, so we need to fix that first. If you run dumpe2fs against the filesystem partition in each image, dump that to a file, and then diff any pair of dumps, you'll see something like this:

25c25
< Filesystem created: Sat Jun 15 07:37:32 2019
---
> Filesystem created: Sat Jun 15 07:37:40 2019
27c27
< Last write time: Sat Jun 15 07:37:33 2019
---
> Last write time: Sat Jun 15 07:37:40 2019
30c30
< Last checked: Sat Jun 15 07:37:32 2019
---
> Last checked: Sat Jun 15 07:37:40 2019
37c37
< Directory Hash Seed: 603130ae-82de-4530-9772-f68ae3d6df5f
---
> Directory Hash Seed: 1d9c5af8-a48e-4221-9e70-8fa2ccc6936f

So, at the very least, at a very high level (after this you need to go deeper, down to the lowest level, i.e. the actual byte-by-byte comparison), the filesystems differ in the details shown above. Get around that first. Even if you change the date on the machine, you won't succeed in making the timestamps equal, because there are gaps in the program's execution time that you can't control. In that case, you'd need to freeze the clock, at least from the perspective of the program that creates the filesystem. You could dig into that, but I don't think it's the way to go, because you said others need to run your script on their machines: you don't want to mess with their clocks. So, IMHO, the way to go is probably to patch the right bytes in the filesystem, as I did with the disk signature. Search around for that. Also, don't forget the superblock backups; track them down. If they contain different data on each filesystem, they will propagate differences in the byte ranges where they reside. Lastly, bear in mind that when you copy a file, you don't have direct control over where its bytes are placed inside the filesystem.
If you can't clone, you need to find a way to control that too.
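The signature fix above depends on xxd, which ships with vim and is not always installed. Below is a minimal sketch of the same write using only printf and dd. The function name set_disk_signature is hypothetical. It writes the bytes in the order given, exactly like `printf 73396992 | xxd -p -r`. Tools such as fdisk read this field as a little-endian number, so they will display 73396992 as 0x92693973.

```shell
#!/bin/sh
# set_disk_signature IMAGE HEX8
# Hypothetical helper: write 8 hex digits as 4 bytes at MBR offset 440,
# in the order given, without needing xxd.
set_disk_signature() {
    img=$1 sig=$2
    case $sig in
        ''|*[!0-9a-fA-F]*) echo "not hex: $sig" >&2; return 1 ;;
    esac
    [ ${#sig} -eq 8 ] || { echo "need exactly 8 hex digits" >&2; return 1; }
    fmt=''
    while [ -n "$sig" ]; do
        pair=${sig%"${sig#??}"}     # first two hex digits
        sig=${sig#??}               # drop them
        # Append the byte as a \ooo octal escape for printf.
        fmt="$fmt\\$(printf '%03o' "$((0x$pair))")"
    done
    printf "$fmt" | dd of="$img" bs=1 seek=440 count=4 conv=notrunc 2>/dev/null
}
```

In the modified script, the printf/xxd line would become `set_disk_signature "$1" "$2"`.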

somebody · Answer

IMHO this all seems overly complicated, when tar alone seems like the obvious solution. tar can create just about any file system, including cdfs (--options cd9660:*). It also lets you control timestamps, ownership and ACLs, with options such as -m / --modification-time, --gid id / --gname name, --acls / --no-acls, --same-owner / --no-same-owner, ...

Or you could create your filesystem, run chown -Rh someone:somegroup . within your file tree, chmod it to your liking, and use either tar or rsync to place the file tree into your prepared filesystem. Then everything would be consistent: same date, same owner/group and perms.
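Below is a minimal sketch of the normalize-then-copy idea. It assumes GNU coreutils and GNU tar 1.28 or later; these are GNU tar flags, not the bsdtar/cd9660 options mentioned above. The function names normalize_tree and repro_tar are hypothetical. The chown step is left as a comment because it needs root:

```shell
#!/bin/sh
# normalize_tree DIR
# Give every entry under DIR the same mtime (the Unix epoch) and
# predictable permissions, so copies made from it carry identical metadata.
# Ownership would need root, e.g.: chown -Rh someone:somegroup "$1"
normalize_tree() {
    chmod -R u=rwX,go=rX "$1"                 # exact modes, not additive
    find "$1" -exec touch -h -d '@0' {} +     # mtime = 1970-01-01 UTC
}

# repro_tar DIR OUT.tar
# Archive DIR with member order, mtimes and owners pinned, so the archive
# does not depend on directory order or on when the files were touched.
repro_tar() {
    tar --format=gnu --sort=name --mtime='@0' \
        --owner=0 --group=0 --numeric-owner \
        -C "$1" -cf "$2" .
}
```

The normalized tree can then be copied into the prepared filesystem with tar or rsync, as described above. This makes the copied metadata consistent, but by itself it does not control where the filesystem places the data blocks.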

Well that's the way I'd approach something like this. :)

HTH
