Message-ID: <64d27757-9387-09dc-48e8-a9eedd67f075@huaweicloud.com>
Date: Sun, 18 Feb 2024 09:14:59 +0800
From: Yu Kuai <yukuai1@...weicloud.com>
To: Xiao Ni <xni@...hat.com>, Yu Kuai <yukuai1@...weicloud.com>
Cc: mpatocka@...hat.com, heinzm@...hat.com, blazej.kucman@...ux.intel.com,
agk@...hat.com, snitzer@...nel.org, dm-devel@...ts.linux.dev,
song@...nel.org, jbrassow@....redhat.com, neilb@...e.de, shli@...com,
akpm@...l.org, linux-kernel@...r.kernel.org, linux-raid@...r.kernel.org,
yi.zhang@...wei.com, yangerkun@...wei.com, "yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH v5 01/14] md: don't ignore suspended array in
md_check_recovery()
Hi,
On 2024/02/16 14:58, Xiao Ni wrote:
> On Thu, Feb 1, 2024 at 5:30 PM Yu Kuai <yukuai1@...weicloud.com> wrote:
>>
>> From: Yu Kuai <yukuai3@...wei.com>
>>
>> mddev_suspend() never stops sync_thread, hence it doesn't make sense to
>> ignore a suspended array in md_check_recovery(); doing so might leave
>> sync_thread unable to be unregistered.
>>
>> After commit f52f5c71f3d4 ("md: fix stopping sync thread"), following
>> hang can be triggered by test shell/integrity-caching.sh:
>
> Hi Kuai
>
> After applying this patch, it's still stuck at mddev_suspend. Maybe
> the deadlock can be fixed by other patches from the patch set. But
> this patch can't fix this issue. If so, the comment is not right.
This patch alone can't completely fix the problem that mddev_suspend()
can get stuck; patches 1-4 are all needed.
Thanks,
Kuai
>
>>
>> 1) suspend the array:
>> raid_postsuspend
>>  mddev_suspend
>>
>> 2) stop the array:
>> raid_dtr
>>  md_stop
>>   __md_stop_writes
>>    stop_sync_thread
>>     set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>>     md_wakeup_thread_directly(mddev->sync_thread);
>>     wait_event(..., !test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
>>
>> 3) sync thread done:
>> md_do_sync
>>  set_bit(MD_RECOVERY_DONE, &mddev->recovery);
>>  md_wakeup_thread(mddev->thread);
>>
>> 4) daemon thread can't unregister sync thread:
>> md_check_recovery
>>  if (mddev->suspended)
>>   return; -> return directly
>>  md_reap_sync_thread
>>   clear_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
>>   -> MD_RECOVERY_RUNNING can't be cleared, hence step 2 hangs;
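
To make the sequence above concrete, here is a minimal user-space sketch
of steps 1-4. It is not kernel code: struct model_mddev, the MODEL_*
flags and the model_* helpers are hypothetical names made up for
illustration, and the real md_check_recovery() does far more than this.
The only point is that an early return on "suspended" leaves the RUNNING
bit set, which is the condition the waiter in step 2 sleeps on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the MD_RECOVERY_* bits; not the kernel values. */
enum {
	MODEL_RECOVERY_RUNNING = 1 << 0,
	MODEL_RECOVERY_INTR    = 1 << 1,
	MODEL_RECOVERY_DONE    = 1 << 2,
};

/* Hypothetical, heavily reduced model of struct mddev. */
struct model_mddev {
	bool suspended;
	unsigned int recovery;
};

/* Step 1: suspending only marks the array; it does not stop the sync thread. */
static void model_suspend(struct model_mddev *m)
{
	assert(m != NULL);
	m->suspended = true;
}

/* Step 2 (first half): ask the sync thread to stop. */
static void model_request_stop(struct model_mddev *m)
{
	m->recovery |= MODEL_RECOVERY_INTR;
}

/* Step 3: the sync thread sees the interrupt request and reports completion. */
static void model_sync_done(struct model_mddev *m)
{
	if (m->recovery & MODEL_RECOVERY_INTR)
		m->recovery |= MODEL_RECOVERY_DONE;
}

/*
 * Step 4: the daemon thread reaps a finished sync thread.
 * skip_when_suspended == true models the early return that the patch removes.
 */
static void model_check_recovery(struct model_mddev *m, bool skip_when_suspended)
{
	if (skip_when_suspended && m->suspended)
		return;		/* RUNNING is never cleared on this path */

	if (m->recovery & MODEL_RECOVERY_DONE)
		m->recovery &= ~(unsigned int)(MODEL_RECOVERY_RUNNING |
					       MODEL_RECOVERY_INTR |
					       MODEL_RECOVERY_DONE);
}

/* Returns true if the waiter from step 2 would see RUNNING cleared. */
static bool model_stop_would_finish(bool skip_when_suspended)
{
	struct model_mddev m = {
		.suspended = false,
		.recovery = MODEL_RECOVERY_RUNNING,
	};

	model_suspend(&m);
	model_request_stop(&m);
	model_sync_done(&m);
	model_check_recovery(&m, skip_when_suspended);

	return !(m.recovery & MODEL_RECOVERY_RUNNING);
}
```

Calling model_stop_would_finish(true) models the behaviour before this
patch and returns false (the waiter would never be released), while
model_stop_would_finish(false) models the behaviour after the patch and
returns true. As discussed in this thread, the real hang has further
causes that this sketch does not model.
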
>
> I added some debug logs while stopping dmraid with the lvremove command.
> The steps you mentioned run sequentially, not asynchronously. The process
> is: dev_remove -> dm_destroy -> __dm_destroy ->
> dm_table_postsuspend_targets (raid_postsuspend) -> dm_table_destroy
> (raid_dtr). It looks like mddev_suspend is waiting for active_io to
> drop to zero.
>
> Best Regards
> Xiao
>
>> This problem is not just related to dm-raid; fix it by no longer
>> ignoring a suspended array in md_check_recovery(). Follow-up patches
>> will improve dm-raid to freeze the sync thread during suspend.
>>
>> Reported-by: Mikulas Patocka <mpatocka@...hat.com>
>> Closes: https://lore.kernel.org/all/8fb335e-6d2c-dbb5-d7-ded8db5145a@redhat.com/
>> Fixes: 68866e425be2 ("MD: no sync IO while suspended")
>> Fixes: f52f5c71f3d4 ("md: fix stopping sync thread")
>> Signed-off-by: Yu Kuai <yukuai3@...wei.com>
>> ---
>> drivers/md/md.c | 3 ---
>> 1 file changed, 3 deletions(-)
>>
>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>> index 2266358d8074..07b80278eaa5 100644
>> --- a/drivers/md/md.c
>> +++ b/drivers/md/md.c
>> @@ -9469,9 +9469,6 @@ static void md_start_sync(struct work_struct *ws)
>>   */
>>  void md_check_recovery(struct mddev *mddev)
>>  {
>> -	if (READ_ONCE(mddev->suspended))
>> -		return;
>> -
>>  	if (mddev->bitmap)
>>  		md_bitmap_daemon_work(mddev);
>>
>> --
>> 2.39.2
>>