Message-Id: <58e61f51-6a28-4d24-a385-4546b7c61a93@fnnas.com>
Date: Wed, 19 Nov 2025 17:00:34 +0800
From: "Yu Kuai" <yukuai@...as.com>
To: "Li Nan" <linan666@...weicloud.com>, <linux-raid@...r.kernel.org>,
"Yu Kuai" <yukuai@...as.com>
Cc: <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 2/2] md/raid5: fix IO hang when array is broken with IO inflight
Hi,
On 2025/11/19 16:29, Li Nan wrote:
>
>
> On 2025/11/17 16:55, Yu Kuai wrote:
>> The following test can cause an IO hang:
>>
>> mdadm -CvR /dev/md0 -l10 -n4 /dev/sd[abcd] --assume-clean --chunk=64K --bitmap=none
>> sleep 5
>> echo 1 > /sys/block/sda/device/delete
>> echo 1 > /sys/block/sdb/device/delete
>> echo 1 > /sys/block/sdc/device/delete
>> echo 1 > /sys/block/sdd/device/delete
>>
>> dd if=/dev/md0 of=/dev/null bs=8k count=1 iflag=direct
>>
>> Root cause:
>>
>> 1) All disks are removed; however, all rdevs in the array are still in
>> sync, so IO is issued normally.
>>
>> 2) IO to sda fails and setting badblocks fails, so sda is marked faulty
>> and MD_SB_CHANGE_PENDING is set.
>>
>> 3) Error recovery tries to recover this IO from the other disks, so IO is
>> issued to sdb, sdc, and sdd.
>>
>> 4) IO to sdb fails and setting badblocks fails again; now the array is
>> broken and will become read-only.
>>
>> 5) IO to sdc and sdd fails; however, the stripe can't be handled anymore
>> because MD_SB_CHANGE_PENDING is set:
>>
>> handle_stripe
>> handle_stripe
>>  if (test_bit MD_SB_CHANGE_PENDING)
>>   set_bit STRIPE_HANDLE
>>   goto finish
>>   // skip handling failed stripe
>>
>> release_stripe
>>  if (test_bit STRIPE_HANDLE)
>>   list_add_tail conf->handle_list
>>
>> 6) Later, raid5d can't handle the failed stripe either:
>>
>> raid5d
>>  md_check_recovery
>>   md_update_sb
>>    if (!md_is_rdwr())
>>     // can't clear pending bit
>>     return
>>  if (test_bit MD_SB_CHANGE_PENDING)
>>   break;
>>   // can't handle failed stripe
>>
>> Since MD_SB_CHANGE_PENDING can never be cleared for a read-only array,
>> fix this problem by skipping this check for read-only arrays.
>>
>> Fixes: d87f064f5874 ("md: never update metadata when array is read-only.")
>> Signed-off-by: Yu Kuai <yukuai@...as.com>
>> ---
>> drivers/md/raid5.c | 6 ++++--
>> 1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>> index cdbc7eba5c54..e57ce3295292 100644
>> --- a/drivers/md/raid5.c
>> +++ b/drivers/md/raid5.c
>> @@ -4956,7 +4956,8 @@ static void handle_stripe(struct stripe_head *sh)
>> goto finish;
>> if (s.handle_bad_blocks ||
>> - test_bit(MD_SB_CHANGE_PENDING, &conf->mddev->sb_flags)) {
>> + (md_is_rdwr(conf->mddev) &&
>> + test_bit(MD_SB_CHANGE_PENDING, &conf->mddev->sb_flags))) {
>
>
> I am not sure where mddev->ro is set to MD_RDONLY — is it via a user's
> ioctl?
It comes from a user-space daemon: once the array is broken, the daemon
sets the array to read-auto through the sysfs array_state attribute.
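
To make the mapping concrete, here is a minimal user-space sketch of how
the array_state strings relate to the read-write check. It assumes the
state names "readonly" and "read-auto" exposed by the md sysfs interface;
the helper names state_equals() and array_state_is_ro() are hypothetical
and are not kernel code.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: compare a value read from
 * /sys/block/mdX/md/array_state against a state name, ignoring the
 * trailing newline that sysfs appends.
 */
static bool state_equals(const char *state, const char *name)
{
	size_t len = strcspn(state, "\n");

	return len == strlen(name) && strncmp(state, name, len) == 0;
}

/*
 * Hypothetical helper: true when the array is in one of the two
 * non read-write modes, i.e. the cases where md_is_rdwr() is expected
 * to return false in the kernel.
 */
bool array_state_is_ro(const char *state)
{
	return state_equals(state, "readonly") ||
	       state_equals(state, "read-auto");
}
```

A monitoring tool could read the attribute into a buffer and pass it to
array_state_is_ro() to tell whether the daemon has already switched the
array out of read-write mode.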
>
>
>> set_bit(STRIPE_HANDLE, &sh->state);
>> goto finish;
>> }
>> @@ -6768,7 +6769,8 @@ static void raid5d(struct md_thread *thread)
>> int batch_size, released;
>> unsigned int offset;
>> - if (test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags))
>> + if (md_is_rdwr(mddev) &&
>> + test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags))
>> break;
>> released = release_stripe_list(conf,
>> conf->temp_inactive_list);
>
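
As a rough illustration of what both hunks change, the sketch below
models the condition before and after the patch in plain user-space C.
struct model_mddev and the two defer_stripes_*() functions are
hypothetical stand-ins; the real code tests mddev->sb_flags and calls
md_is_rdwr().

```c
#include <stdbool.h>

/* Simplified stand-in for the two facts the check looks at. */
struct model_mddev {
	bool rdwr;		/* models md_is_rdwr(mddev) */
	bool change_pending;	/* models MD_SB_CHANGE_PENDING in sb_flags */
};

/* Before the patch: stripes are deferred whenever the bit is set. */
bool defer_stripes_old(const struct model_mddev *m)
{
	return m->change_pending;
}

/*
 * After the patch: stripes are deferred only while the array is
 * read-write, because only then can the superblock update clear the bit.
 */
bool defer_stripes_new(const struct model_mddev *m)
{
	return m->rdwr && m->change_pending;
}
```

For a broken array (rdwr false, change_pending true) the old condition
defers forever, while the new one lets the failed stripes be handled. A
read-write array with the bit set is deferred by both.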
--
Thanks,
Kuai