linux-kernel - Re: [REGRESSION] Cannot start degraded RAID1 array with device with write-mostly flag

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <3f934c1b-756f-4560-9798-98a74c32d857@o2.pl>
Date: Sun, 7 Jul 2024 21:50:16 +0200
From: Mateusz Jończyk <mat.jonczyk@...pl>
To: linux-raid@...r.kernel.org, linux-kernel@...r.kernel.org
Cc: regressions@...ts.linux.dev, Song Liu <song@...nel.org>,
 Yu Kuai <yukuai3@...wei.com>, Paul Luse <paul.e.luse@...ux.intel.com>,
 Xiao Ni <xni@...hat.com>
Subject: Re: [REGRESSION] Cannot start degraded RAID1 array with device with
 write-mostly flag

W dniu 6.07.2024 o 16:30, Mateusz Jończyk pisze:
> Hello,
>
> Linux 6.9+ cannot start a degraded RAID1 array when the only remaining
> device has the write-mostly flag set. Linux 6.8.0 works fine, as does
> 6.1.96.
[snip]
> After some investigation, I have determined that the bug is most likely in
> choose_slow_rdev() in drivers/md/raid1.c, which doesn't set max_sectors
> before returning early. A test patch (below) seems to fix this issue (Linux
> boots and appears to be working correctly with it, but I didn't do any more
> advanced experiments yet).
>
> This points to
> commit dfa8ecd167c1 ("md/raid1: factor out choose_slow_rdev() from read_balance()")
> as the most likely culprit. However, I was running into other bugs in mdadm when
> trying to test this commit directly.
>
> Distribution: Ubuntu 20.04, hardware: a HP 17-by0001nw laptop.

I have been testing this patch carefully:

1. I have been reliably getting deadlocks when adding / removing devices
on an array that contains a component with the write-mostly flag set
- while the array was loaded with fsstress. When the array was idle,
no such deadlocks happened. This occurred also on Linux 6.8.0
though, but not on 6.1.97-rc1, so this is likely an independent regression.

2. When adding a device to the array (/dev/sda1), I once got the following warnings in dmesg on patched 6.10-rc6:

        [ 8253.337816] md: could not open device unknown-block(8,1).
        [ 8253.337832] md: md_import_device returned -16
        [ 8253.338152] md: could not open device unknown-block(8,1).
        [ 8253.338169] md: md_import_device returned -16
        [ 8253.674751] md: recovery of RAID array md2

(/dev/sda1 has device major/minor numbers = 8,1). This may be caused by some interaction with udev, though.
I have also seen this on Linux 6.8.

Additionally, on an unpatched 6.1.97-rc1 (which was handy for testing), I got a deadlock
when removing a bitmap from such an array while it was loaded with fsstress.

I'll file independent reports, but wanted to give a head's up.

Greetings,

Mateusz