Message-ID: <3f934c1b-756f-4560-9798-98a74c32d857@o2.pl>
Date: Sun, 7 Jul 2024 21:50:16 +0200
From: Mateusz Jończyk <mat.jonczyk@...pl>
To: linux-raid@...r.kernel.org, linux-kernel@...r.kernel.org
Cc: regressions@...ts.linux.dev, Song Liu <song@...nel.org>,
Yu Kuai <yukuai3@...wei.com>, Paul Luse <paul.e.luse@...ux.intel.com>,
Xiao Ni <xni@...hat.com>
Subject: Re: [REGRESSION] Cannot start degraded RAID1 array with device with
write-mostly flag
On 6.07.2024 at 16:30, Mateusz Jończyk wrote:
> Hello,
>
> Linux 6.9+ cannot start a degraded RAID1 array when the only remaining
> device has the write-mostly flag set. Linux 6.8.0 works fine, as does
> 6.1.96.
[snip]
> After some investigation, I have determined that the bug is most likely in
> choose_slow_rdev() in drivers/md/raid1.c, which doesn't set max_sectors
> before returning early. A test patch (below) seems to fix this issue (Linux
> boots and appears to be working correctly with it, but I didn't do any more
> advanced experiments yet).
>
> This points to
> commit dfa8ecd167c1 ("md/raid1: factor out choose_slow_rdev() from read_balance()")
> as the most likely culprit. However, I was running into other bugs in mdadm when
> trying to test this commit directly.
>
> Distribution: Ubuntu 20.04, hardware: a HP 17-by0001nw laptop.
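To illustrate the class of bug suspected in the quoted report (an output parameter that is not written on an early-return path), here is a toy Python model. It is only a sketch and not the kernel code: the function names, the `out` dict standing in for a C out-parameter, and the per-disk "readable length" list are all made up for illustration.

```python
# Toy model only: NOT the code from drivers/md/raid1.c.
# `readable` holds, per disk, how many sectors can be read without
# hitting a bad block (hypothetical input). `out` mimics a C
# out-parameter such as `int *max_sectors`.

def choose_disk_buggy(readable, wanted, out):
    best, best_len = -1, 0
    for idx, length in enumerate(readable):
        if length >= wanted:
            # Early return: out["max_sectors"] is never written, so the
            # caller keeps whatever stale value it had before the call.
            return idx
        if length > best_len:
            best, best_len = idx, length
    if best != -1:
        out["max_sectors"] = best_len
    return best


def choose_disk_fixed(readable, wanted, out):
    best, best_len = -1, 0
    for idx, length in enumerate(readable):
        if length >= wanted:
            # Set the out-parameter before returning early.
            out["max_sectors"] = wanted
            return idx
        if length > best_len:
            best, best_len = idx, length
    if best != -1:
        out["max_sectors"] = best_len
    return best


if __name__ == "__main__":
    for fn in (choose_disk_buggy, choose_disk_fixed):
        out = {"max_sectors": 0}  # caller's initial value
        disk = fn([128], 128, out)
        print(fn.__name__, "disk:", disk, "max_sectors:", out["max_sectors"])
```

In the buggy variant the caller sees a valid disk index together with a sector count of 0, which is the kind of inconsistency that could make a read fail even though a usable device was found.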
I have been testing this patch carefully:
1. I have been reliably getting deadlocks when adding / removing devices
on an array that contains a component with the write-mostly flag set
while the array was loaded with fsstress. When the array was idle,
no such deadlocks happened. This also occurred on Linux 6.8.0,
but not on 6.1.97-rc1, so it is likely an independent regression.
2. When adding a device to the array (/dev/sda1), I once got the following warnings in dmesg on patched 6.10-rc6:
[ 8253.337816] md: could not open device unknown-block(8,1).
[ 8253.337832] md: md_import_device returned -16
[ 8253.338152] md: could not open device unknown-block(8,1).
[ 8253.338169] md: md_import_device returned -16
[ 8253.674751] md: recovery of RAID array md2
(/dev/sda1 has device major/minor numbers = 8,1). This may be caused by some interaction with udev, though.
I have also seen this on Linux 6.8.
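For readers decoding the log lines above, here is a small Python sketch (Linux assumed) that splits a device number into major/minor and maps a negative kernel return value to its errno name. The helper names `split_dev` and `errno_name` are made up for this example.

```python
import errno
import os


def split_dev(dev):
    """Return (major, minor) for a combined device number."""
    return os.major(dev), os.minor(dev)


def errno_name(ret):
    """Map a negative kernel-style return value (e.g. -16) to its errno name."""
    return errno.errorcode.get(-ret, "unknown")


if __name__ == "__main__":
    # unknown-block(8,1) in the log refers to major 8, minor 1.
    dev = os.makedev(8, 1)
    print("major/minor:", split_dev(dev))

    # "md_import_device returned -16"
    print("errno:", errno_name(-16), "-", os.strerror(16))
```

On Linux, errno 16 is EBUSY, so the message indicates that the device was busy at the time md tried to open it. To look up the numbers of an existing node, `os.stat(path).st_rdev` can be passed to `split_dev`.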
Additionally, on an unpatched 6.1.97-rc1 (which was handy for testing), I got a deadlock
when removing a bitmap from such an array while it was loaded with fsstress.
I'll file separate reports, but wanted to give a heads-up.
Greetings,
Mateusz