[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <60d75867-8fb7-4c67-96f7-3e5ba65bdbd9@redhat.com>
Date: Thu, 29 Feb 2024 10:10:54 +0800
From: Xiao Ni <xni@...hat.com>
To: Yu Kuai <yukuai1@...weicloud.com>, mpatocka@...hat.com,
heinzm@...hat.com, blazej.kucman@...ux.intel.com, agk@...hat.com,
snitzer@...nel.org, dm-devel@...ts.linux.dev, song@...nel.org,
yukuai3@...wei.com, jbrassow@....redhat.com, neilb@...e.de, shli@...com,
akpm@...l.org
Cc: linux-kernel@...r.kernel.org, linux-raid@...r.kernel.org,
yi.zhang@...wei.com, yangerkun@...wei.com
Subject: Re: [PATCH v5 05/14] md: don't suspend the array for interrupted
reshape
在 2024/2/1 下午5:25, Yu Kuai 写道:
> From: Yu Kuai <yukuai3@...wei.com>
>
> md_start_sync() will suspend the array if there are spares that can be
> added or removed from conf, however, if reshape is still in progress,
Hi Kuai
Why md_start_sync can run when reshape is still in progress?
md_check_recovery should return without queue sync_work, right?
> this won't happen at all or data will be corrupted(remove_and_add_spares
> won't be called from md_choose_sync_action for reshape), hence there is
> no need to suspend the array if reshape is not done yet.
>
> Meanwhile, there is a potential deadlock for raid456:
>
> 1) reshape is interrupted;
>
> 2) set one of the disk WantReplacement, and add a new disk to the array,
> however, recovery won't start until the reshape is finished;
>
> 3) then issue an IO across reshpae position, this IO will wait for
> reshape to make progress;
>
> 4) continue to reshape, then md_start_sync() found there is a spare disk
> that can be added to conf, mddev_suspend() is called;
I c. The answer for my above question is reshape is interrupted and then
it continues the reshape, right?
Best Regards
Xiao
>
> Step 4 and step 3 is waiting for each other, deadlock triggered. Noted
> this problem is found by code review, and it's not reporduced yet.
>
> Fix this porblem by don't suspend the array for interrupted reshape,
> this is safe because conf won't be changed until reshape is done.
>
> Fixes: bc08041b32ab ("md: suspend array in md_start_sync() if array need reconfiguration")
> Signed-off-by: Yu Kuai <yukuai3@...wei.com>
> ---
> drivers/md/md.c | 13 +++++++++----
> 1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 6c5d0a372927..85fde05c37dd 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -9374,12 +9374,17 @@ static void md_start_sync(struct work_struct *ws)
> bool suspend = false;
> char *name;
>
> - if (md_spares_need_change(mddev))
> + /*
> + * If reshape is still in progress, spares won't be added or removed
> + * from conf until reshape is done.
> + */
> + if (mddev->reshape_position == MaxSector &&
> + md_spares_need_change(mddev)) {
> suspend = true;
> + mddev_suspend(mddev, false);
> + }
>
> - suspend ? mddev_suspend_and_lock_nointr(mddev) :
> - mddev_lock_nointr(mddev);
> -
> + mddev_lock_nointr(mddev);
> if (!md_is_rdwr(mddev)) {
> /*
> * On a read-only array we can:
Powered by blists - more mailing lists