lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Thu, 29 Feb 2024 10:10:54 +0800
From: Xiao Ni <xni@...hat.com>
To: Yu Kuai <yukuai1@...weicloud.com>, mpatocka@...hat.com,
 heinzm@...hat.com, blazej.kucman@...ux.intel.com, agk@...hat.com,
 snitzer@...nel.org, dm-devel@...ts.linux.dev, song@...nel.org,
 yukuai3@...wei.com, jbrassow@....redhat.com, neilb@...e.de, shli@...com,
 akpm@...l.org
Cc: linux-kernel@...r.kernel.org, linux-raid@...r.kernel.org,
 yi.zhang@...wei.com, yangerkun@...wei.com
Subject: Re: [PATCH v5 05/14] md: don't suspend the array for interrupted
 reshape


在 2024/2/1 下午5:25, Yu Kuai 写道:
> From: Yu Kuai <yukuai3@...wei.com>
>
> md_start_sync() will suspend the array if there are spares that can be
> added or removed from conf, however, if reshape is still in progress,


Hi Kuai

Why md_start_sync can run when reshape is still in progress? 
md_check_recovery should return without queue sync_work, right?

> this won't happen at all or data will be corrupted(remove_and_add_spares
> won't be called from md_choose_sync_action for reshape), hence there is
> no need to suspend the array if reshape is not done yet.
>
> Meanwhile, there is a potential deadlock for raid456:
>
> 1) reshape is interrupted;
>
> 2) set one of the disk WantReplacement, and add a new disk to the array,
>     however, recovery won't start until the reshape is finished;
>
> 3) then issue an IO across reshpae position, this IO will wait for
>     reshape to make progress;
>
> 4) continue to reshape, then md_start_sync() found there is a spare disk
>     that can be added to conf, mddev_suspend() is called;


I c. The answer for my above question is reshape is interrupted and then 
it continues the reshape, right?


Best Regards

Xiao

>
> Step 4 and step 3 is waiting for each other, deadlock triggered. Noted
> this problem is found by code review, and it's not reporduced yet.
>
> Fix this porblem by don't suspend the array for interrupted reshape,
> this is safe because conf won't be changed until reshape is done.
>
> Fixes: bc08041b32ab ("md: suspend array in md_start_sync() if array need reconfiguration")
> Signed-off-by: Yu Kuai <yukuai3@...wei.com>
> ---
>   drivers/md/md.c | 13 +++++++++----
>   1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 6c5d0a372927..85fde05c37dd 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -9374,12 +9374,17 @@ static void md_start_sync(struct work_struct *ws)
>   	bool suspend = false;
>   	char *name;
>   
> -	if (md_spares_need_change(mddev))
> +	/*
> +	 * If reshape is still in progress, spares won't be added or removed
> +	 * from conf until reshape is done.
> +	 */
> +	if (mddev->reshape_position == MaxSector &&
> +	    md_spares_need_change(mddev)) {
>   		suspend = true;
> +		mddev_suspend(mddev, false);
> +	}
>   
> -	suspend ? mddev_suspend_and_lock_nointr(mddev) :
> -		  mddev_lock_nointr(mddev);
> -
> +	mddev_lock_nointr(mddev);
>   	if (!md_is_rdwr(mddev)) {
>   		/*
>   		 * On a read-only array we can:


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ