linux-kernel - Filesystem corruption when adding a new device (delayed-resync, write-mostly)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <9952f532-2554-44bf-b906-4880b2e88e3a@o2.pl>
Date: Sat, 20 Jul 2024 16:47:32 +0200
From: Mateusz Jończyk <mat.jonczyk@...pl>
To: Yu Kuai <yukuai3@...wei.com>, linux-raid@...r.kernel.org,
 linux-kernel@...r.kernel.org
Cc: Song Liu <song@...nel.org>, Paul Luse <paul.e.luse@...ux.intel.com>
Subject: Filesystem corruption when adding a new device (delayed-resync,
 write-mostly)

Hello,

In my laptop, I used to have two RAID1 arrays on top of NVMe and SATA SSD
drives: /dev/md0 for /boot (not partitioned), /dev/md1 for remaining data (LUKS
+ LVM + ext4). For performance, I have marked the RAID component device for
/dev/md1 on the SATA SSD drive write-mostly, which "means that the 'md' driver
will avoid reading from these devices if at all possible" (man mdadm).

Recently, the NVMe drive started having problems (PCI AER errors and the
controller disappearing), so I removed it from the arrays and wiped it.
However, I have reseated the drive in the M.2 socket and this apparently fixed
it (verified with tests).

    $ cat /proc/mdstat
    Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
    md1 : active raid1 sdb5[1](W)
          471727104 blocks super 1.2 [2/1] [_U]
          bitmap: 4/4 pages [16KB], 65536KB chunk

    md2 : active (auto-read-only) raid1 sdb6[3](W) sda1[2]
          3142656 blocks super 1.2 [2/2] [UU]
          bitmap: 0/1 pages [0KB], 65536KB chunk

    md0 : active raid1 sdb4[3]
          2094080 blocks super 1.2 [2/1] [_U]
         
    unused devices: <none>

(md2 was used just for testing, ignore it).

Today, I have tried to add the drive back to the arrays by using a script that
executed in quick succession:

    mdadm /dev/md0 --add --readwrite /dev/nvme0n1p2
    mdadm /dev/md1 --add --readwrite /dev/nvme0n1p3

This was on Linux 6.10.0, patched with my previous patch:

    https://lore.kernel.org/linux-raid/20240711202316.10775-1-mat.jonczyk@o2.pl/

(which fixed a regression in the kernel and allows it to start /dev/md1 with a
single drive in write-mostly mode).
In the background, I was running "rdiff-backup --compare" that was comparing
data between my array contents and a backup attached via USB.

This, however resulted in mayhem - I was unable to start any program with an
input-output error, etc. I used SysRQ + C to save a kernel log:

    [ 4940.891490] md: could not open device unknown-block(259,1).
    [ 4940.891498] md: md_import_device returned -16
    [ 4940.920440] md: could not open device unknown-block(259,1).
    [ 4940.920449] md: md_import_device returned -16
    [ 4940.934462] md: recovery of RAID array md0
    [ 4941.003041] EXT4-fs warning (device dm-4): ext4_dirblock_csum_verify:405: inode #16676515: comm rdiff-backup: No space for directory leaf checksum. Please run e2fsck -D.
    [ 4941.003053] EXT4-fs error (device dm-4): htree_dirblock_to_tree:1082: inode #16676515: comm rdiff-backup: Directory block failed checksum
    [ 4941.003061] Aborting journal on device dm-4-8.
    [ 4941.066131] md: delaying recovery of md1 until md0 has finished (they share one or more physical units)
    [ 4941.092374] EXT4-fs (dm-4): Remounting filesystem read-only
    [ 4942.301499] EXT4-fs warning (device dm-2): ext4_dirblock_csum_verify:405: inode #156120: comm gnome-terminal-: No space for directory leaf checksum. Please run e2fsck -D.
    [ 4942.301508] EXT4-fs error (device dm-2): __ext4_find_entry:1693: inode #156120: comm gnome-terminal-: checksumming directory block 0
    [ 4942.301515] Aborting journal on device dm-2-8.
    [ 4942.332393] EXT4-fs (dm-2): Remounting filesystem read-only
    [ 4942.333839] EXT4-fs warning (device dm-2): ext4_dirblock_csum_verify:405: inode #156120: comm gnome-terminal-: No space for directory leaf checksum. Please run e2fsck -D.
    [ 4942.765422] EXT4-fs warning (device dm-2): ext4_dirblock_csum_verify:405: inode #156120: comm gnome-terminal-: No space for directory leaf checksum. Please run e2fsck -D.
    [ 4942.766884] EXT4-fs warning (device dm-2): ext4_dirblock_csum_verify:405: inode #156120: comm gnome-terminal-: No space for directory leaf checksum. Please run e2fsck -D.
    [ 4947.364466] EXT4-fs warning (device dm-2): ext4_dirblock_csum_verify:405: inode #156120: comm gnome-terminal-: No space for directory leaf checksum. Please run e2fsck -D.
    [ 4947.366435] EXT4-fs warning (device dm-2): ext4_dirblock_csum_verify:405: inode #156120: comm gnome-terminal-: No space for directory leaf checksum. Please run e2fsck -D.
    [ 4947.849698] EXT4-fs warning (device dm-2): ext4_dirblock_csum_verify:405: inode #156120: comm gnome-terminal-: No space for directory leaf checksum. Please run e2fsck -D.
    [ 4947.851860] EXT4-fs warning (device dm-2): ext4_dirblock_csum_verify:405: inode #156120: comm gnome-terminal-: No space for directory leaf checksum. Please run e2fsck -D.
    [ 4949.226094] EXT4-fs warning (device dm-2): ext4_dirblock_csum_verify:405: inode #156120: comm gnome-terminal-: No space for directory leaf checksum. Please run e2fsck -D.
    [ 4949.227982] EXT4-fs warning (device dm-2): ext4_dirblock_csum_verify:405: inode #156120: comm gnome-terminal-: No space for directory leaf checksum. Please run e2fsck -D.
    [ 4949.696095] EXT4-fs warning (device dm-2): ext4_dirblock_csum_verify:405: inode #156120: comm gnome-terminal-: No space for directory leaf checksum. Please run e2fsck -D.
    [ 4949.697468] EXT4-fs warning (device dm-2): ext4_dirblock_csum_verify:405: inode #156120: comm gnome-terminal-: No space for directory leaf checksum. Please run e2fsck -D.
    [ 4950.105158] EXT4-fs warning (device dm-2): ext4_dirblock_csum_verify:405: inode #156120: comm pool-gnome-shel: No space for directory leaf checksum. Please run e2fsck -D.
    [ 4950.105645] EXT4-fs warning (device dm-2): ext4_dirblock_csum_verify:405: inode #156120: comm pool-gnome-shel: No space for directory leaf checksum. Please run e2fsck -D.
    [ 4951.559184] md: md0: recovery done.
    [ 4951.562154] md: recovery of RAID array md1
    [ 4952.636764] EXT4-fs warning: 6 callbacks suppressed

The interesting fact is that both dm-4 and dm-2 are on /dev/md1, which was
modified second, and should not have been touched (or read from) before the
resync of /dev/md0 was complete.

I have restarted the laptop on the same kernel, and this apparently
made things worse (perhaps fsck at boot corrupted my root filesystem).
All in all, other filesystems were relatively unaffected, but the root
filesystem (for / ) I had to recover from backups.

Greetings,

Mateusz