Message-ID: <db4f5f1b-5eba-2cdb-fad0-7aa725cea508@huaweicloud.com>
Date: Fri, 15 Mar 2024 09:17:56 +0800
From: Yu Kuai <yukuai1@...weicloud.com>
To: Dan Moulding <dan@...m.net>, yukuai1@...weicloud.com
Cc: gregkh@...uxfoundation.org, junxiao.bi@...cle.com,
linux-kernel@...r.kernel.org, linux-raid@...r.kernel.org,
regressions@...ts.linux.dev, song@...nel.org, stable@...r.kernel.org,
"yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [REGRESSION] 6.7.1: md: raid5 hang and unresponsive system;
successfully bisected
Hi,
On 2024/03/15 0:12, Dan Moulding wrote:
>> How about the following patch?
>>
>> Thanks,
>> Kuai
>>
>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>> index 3ad5f3c7f91e..0b2e6060f2c9 100644
>> --- a/drivers/md/raid5.c
>> +++ b/drivers/md/raid5.c
>> @@ -6720,7 +6720,6 @@ static void raid5d(struct md_thread *thread)
>>
>> md_check_recovery(mddev);
>>
>> - blk_start_plug(&plug);
>> handled = 0;
>> spin_lock_irq(&conf->device_lock);
>> while (1) {
>> @@ -6728,6 +6727,14 @@ static void raid5d(struct md_thread *thread)
>> int batch_size, released;
>> unsigned int offset;
>>
>> + /*
>> + * md_check_recovery() can't clear sb_flags, usually because of
>> + * 'reconfig_mutex' can't be grabbed, wait for mddev_unlock() to
>> + * wake up raid5d().
>> + */
>> + if (test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags))
>> + goto skip;
>> +
>> released = release_stripe_list(conf,
>> conf->temp_inactive_list);
>> if (released)
>> clear_bit(R5_DID_ALLOC, &conf->cache_state);
>> @@ -6766,8 +6773,8 @@ static void raid5d(struct md_thread *thread)
>> spin_lock_irq(&conf->device_lock);
>> }
>> }
>> +skip:
>> pr_debug("%d stripes handled\n", handled);
>> -
>> spin_unlock_irq(&conf->device_lock);
>> if (test_and_clear_bit(R5_ALLOC_MORE, &conf->cache_state) &&
>> mutex_trylock(&conf->cache_size_mutex)) {
>> @@ -6779,6 +6786,7 @@ static void raid5d(struct md_thread *thread)
>> mutex_unlock(&conf->cache_size_mutex);
>> }
>>
>> + blk_start_plug(&plug);
>> flush_deferred_bios(conf);
>>
>> r5l_flush_stripe_to_raid(conf->log);
>
> I can confirm that this patch also works. I'm unable to reproduce the
> hang after applying this instead of the first patch provided by
> Junxiao. So it looks like both ways are successful in avoiding the hang.
>
Thanks a lot for testing! Could you also give the following patch a try?
It drops the change to blk_plug, because Dan and Song are worried
about performance degradation, so we need to verify the performance
before considering that patch.
Anyway, I think the following patch can fix this problem as well.
Thanks,
Kuai
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 3ad5f3c7f91e..ae8665be9940 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6728,6 +6728,9 @@ static void raid5d(struct md_thread *thread)
int batch_size, released;
unsigned int offset;
+ if (test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags))
+ goto skip;
+
released = release_stripe_list(conf,
conf->temp_inactive_list);
if (released)
clear_bit(R5_DID_ALLOC, &conf->cache_state);
@@ -6766,6 +6769,7 @@ static void raid5d(struct md_thread *thread)
spin_lock_irq(&conf->device_lock);
}
}
+skip:
pr_debug("%d stripes handled\n", handled);
spin_unlock_irq(&conf->device_lock);
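
To make the intended control flow easier to follow, here is a rough
userspace sketch of the idea, not kernel code. Everything in it
(struct model_mddev, model_raid5d(), model_mddev_unlock()) is a
hypothetical stand-in made up for illustration, and it assumes a much
simplified picture in which unlocking clears the pending flag directly
and then reruns the daemon:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the md state; NOT the kernel structures. */
struct model_mddev {
	bool sb_change_pending;	/* models MD_SB_CHANGE_PENDING in sb_flags */
	int queued_stripes;	/* stripes waiting to be handled */
};

/* Models one raid5d() run; returns the number of stripes handled. */
int model_raid5d(struct model_mddev *mddev)
{
	int handled = 0;

	assert(mddev);
	while (1) {
		/*
		 * Patched behaviour: while the superblock update is still
		 * pending, stop handling stripes and return, instead of
		 * looping without making progress.
		 */
		if (mddev->sb_change_pending)
			goto skip;
		if (mddev->queued_stripes == 0)
			break;
		mddev->queued_stripes--;
		handled++;
	}
skip:
	return handled;
}

/*
 * Simplified assumption: the unlock path gets the superblock written
 * (flag cleared) and wakes the daemon, which then runs again.
 */
int model_mddev_unlock(struct model_mddev *mddev)
{
	assert(mddev);
	mddev->sb_change_pending = false;
	return model_raid5d(mddev);
}
```

In this model, a run with the flag set handles nothing and leaves the
queue untouched; the stripes are only processed by the run that follows
the unlock.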
> -- Dan
> .
>