linux-kernel - Re: [PATCH 2/2] md/raid5: fix IO hang when array is broken with IO inflight

Message-Id: <58e61f51-6a28-4d24-a385-4546b7c61a93@fnnas.com>
Date: Wed, 19 Nov 2025 17:00:34 +0800
From: "Yu Kuai" <yukuai@...as.com>
To: "Li Nan" <linan666@...weicloud.com>, <linux-raid@...r.kernel.org>, 
	"Yu Kuai" <yukuai@...as.com>
Cc: <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 2/2] md/raid5: fix IO hang when array is broken with IO inflight

Hi,

On 2025/11/19 16:29, Li Nan wrote:
>
>
> On 2025/11/17 16:55, Yu Kuai wrote:
>> The following test can cause an IO hang:
>>
>> mdadm -CvR /dev/md0 -l10 -n4 /dev/sd[abcd] --assume-clean --chunk=64K --bitmap=none
>> sleep 5
>> echo 1 > /sys/block/sda/device/delete
>> echo 1 > /sys/block/sdb/device/delete
>> echo 1 > /sys/block/sdc/device/delete
>> echo 1 > /sys/block/sdd/device/delete
>>
>> dd if=/dev/md0 of=/dev/null bs=8k count=1 iflag=direct
>>
>> Root cause:
>>
>> 1) All disks are removed; however, all rdevs in the array are still in
>> sync, so IO is issued normally.
>>
>> 2) IO to sda fails and setting badblocks fails as well, so sda is marked
>> faulty and MD_SB_CHANGE_PENDING is set.
>>
>> 3) Error recovery tries to recover this IO from the other disks, so IO is
>> issued to sdb, sdc, and sdd.
>>
>> 4) IO to sdb fails and setting badblocks fails again; the array is now
>> broken and becomes read-only.
>>
>> 5) IO to sdc and sdd fails; however, the stripe can't be handled anymore
>> because MD_SB_CHANGE_PENDING is set:
>>
>> handle_stripe
>>   handle_stripe
>>   if (test_bit MD_SB_CHANGE_PENDING)
>>    set_bit STRIPE_HANDLE
>>    goto finish
>>    // skip handling failed stripe
>>
>> release_stripe
>>   if (test_bit STRIPE_HANDLE)
>>    list_add_tail conf->handle_list
>>
>> 6) Later, raid5d can't handle the failed stripe either:
>>
>> raid5d
>>   md_check_recovery
>>    md_update_sb
>>     if (!md_is_rdwr())
>>      // can't clear pending bit
>>      return
>>   if (test_bit MD_SB_CHANGE_PENDING)
>>    break;
>>    // can't handle failed stripe
>>
>> Since MD_SB_CHANGE_PENDING can never be cleared for a read-only array,
>> fix this problem by skipping this check for read-only arrays.
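
The root cause in steps 5) and 6) can be illustrated with a small user-space model. This is only a sketch, not kernel code: struct model_mddev, enum model_ro and every model_* function are hypothetical stand-ins for mddev, mddev->ro, md_is_rdwr(), md_update_sb() and the raid5d loop. It assumes md_is_rdwr() is true only for a read-write array, and that the metadata update returns early without clearing the pending bit otherwise, as the call chain above describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; these are NOT the kernel's types. */
enum model_ro { MODEL_RDWR, MODEL_RDONLY, MODEL_AUTO_READ };

struct model_mddev {
    enum model_ro ro;
    bool sb_change_pending; /* stands in for MD_SB_CHANGE_PENDING */
};

/* Assumed to mirror md_is_rdwr(): true only for a read-write array. */
static bool model_is_rdwr(const struct model_mddev *m)
{
    return m->ro == MODEL_RDWR;
}

/* Gate before the patch: defer whenever the pending bit is set. */
static bool model_old_gate_defers(const struct model_mddev *m)
{
    return m->sb_change_pending;
}

/* Gate after the patch: defer only if the bit can still be cleared. */
static bool model_new_gate_defers(const struct model_mddev *m)
{
    return model_is_rdwr(m) && m->sb_change_pending;
}

/* Models the metadata update: it bails out unless the array is read-write. */
static void model_update_sb(struct model_mddev *m)
{
    if (!model_is_rdwr(m))
        return; /* pending bit stays set */
    m->sb_change_pending = false;
}

/*
 * Run up to max_rounds daemon passes. Returns the 1-based round in which
 * the failed stripe is finally handled, or -1 if it never is (the hang).
 */
static int model_rounds_until_handled(struct model_mddev *m,
                                      bool use_new_gate, int max_rounds)
{
    assert(m != NULL && max_rounds > 0);
    for (int round = 1; round <= max_rounds; round++) {
        model_update_sb(m);
        bool defers = use_new_gate ? model_new_gate_defers(m)
                                   : model_old_gate_defers(m);
        if (!defers)
            return round;
    }
    return -1;
}
```

In this model, a non-read-write array with the pending bit set is never handled by the old gate no matter how many rounds run, while the new gate handles the stripe in the first round. A read-write array behaves the same with either gate.
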
>>
>> Fixes: d87f064f5874 ("md: never update metadata when array is read-only.")
>> Signed-off-by: Yu Kuai <yukuai@...as.com>
>> ---
>>   drivers/md/raid5.c | 6 ++++--
>>   1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>> index cdbc7eba5c54..e57ce3295292 100644
>> --- a/drivers/md/raid5.c
>> +++ b/drivers/md/raid5.c
>> @@ -4956,7 +4956,8 @@ static void handle_stripe(struct stripe_head *sh)
>>           goto finish;
>>         if (s.handle_bad_blocks ||
>> -        test_bit(MD_SB_CHANGE_PENDING, &conf->mddev->sb_flags)) {
>> +        (md_is_rdwr(conf->mddev) &&
>> +         test_bit(MD_SB_CHANGE_PENDING, &conf->mddev->sb_flags))) {
>
>
> I am not sure where mddev->ro is set to MD_RDONLY. Is it via a user's
> ioctl?

It's set by a user-space daemon: once the array is broken, the daemon sets
the array to read-auto through the sysfs attribute array_state.
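
For illustration only, here is a minimal sketch of how a user-space program could write a state string such as "read-auto" to an array_state attribute. The helper name write_array_state() is hypothetical, and this is not the code of any particular daemon; the path is passed in by the caller.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write `state` to the attribute file at `path`.
 * Returns 0 on success, -1 if the file cannot be opened or written.
 */
static int write_array_state(const char *path, const char *state)
{
    assert(path != NULL && state != NULL);

    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    size_t len = strlen(state);
    int rc = (fwrite(state, 1, len, f) == len) ? 0 : -1;

    /* A sysfs write may only report its error when the stream is flushed. */
    if (fclose(f) != 0)
        rc = -1;
    return rc;
}
```

For the array in the reproducer, the call would be something like write_array_state("/sys/block/md0/md/array_state", "read-auto"), assuming the array is md0 and the caller has permission to write the attribute.
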

>
>
>>           set_bit(STRIPE_HANDLE, &sh->state);
>>           goto finish;
>>       }
>> @@ -6768,7 +6769,8 @@ static void raid5d(struct md_thread *thread)
>>           int batch_size, released;
>>           unsigned int offset;
>> -        if (test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags))
>> +        if (md_is_rdwr(mddev) &&
>> +            test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags))
>>               break;
>>             released = release_stripe_list(conf, conf->temp_inactive_list);
>

-- 
Thanks,
Kuai