linux-kernel - Re: [PATCH] md: ensure consistent action state in md_do

Message-ID: <8d355b81-9a32-4bb6-9951-0905c05434a4@molgen.mpg.de>
Date: Sat, 30 Aug 2025 11:51:13 +0200
From: Paul Menzel <pmenzel@...gen.mpg.de>
To: Li Nan <linan666@...weicloud.com>
Cc: song@...nel.org, yukuai3@...wei.com, linux-raid@...r.kernel.org,
 linux-kernel@...r.kernel.org, yangerkun@...wei.com, yi.zhang@...wei.com
Subject: Re: [PATCH] md: ensure consistent action state in md_do_sync

Dear Nan,


Thank you for your patch.

On 30.08.25 at 11:05, linan666@...weicloud.com wrote:
> From: Li Nan <linan122@...wei.com>
> 
> The 'mddev->recovery' flags can change during md_do_sync(), leading to
> inconsistencies. For example, starting with MD_RECOVERY_RECOVER and
> ending with MD_RECOVERY_SYNC can cause incorrect offset updates.

Can you give a concrete example?

> To avoid this, use the 'action' determined at the beginning of the
> function instead of repeatedly checking 'mddev->recovery'.

Do you have a reproducer?

Could you also add a Fixes: tag?

> Signed-off-by: Li Nan <linan122@...wei.com>
> ---
>   drivers/md/md.c | 21 +++++++++------------
>   1 file changed, 9 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 6828a569e819..67cda9b64c87 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -9516,7 +9516,7 @@ void md_do_sync(struct md_thread *thread)
>   
>   		skipped = 0;
>   
> -		if (!test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
> +		if (action != ACTION_RESHAPE &&
>   		    ((mddev->curr_resync > mddev->curr_resync_completed &&
>   		      (mddev->curr_resync - mddev->curr_resync_completed)
>   		      > (max_sectors >> 4)) ||
> @@ -9529,8 +9529,7 @@ void md_do_sync(struct md_thread *thread)
>   			wait_event(mddev->recovery_wait,
>   				   atomic_read(&mddev->recovery_active) == 0);
>   			mddev->curr_resync_completed = j;
> -			if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery) &&
> -			    j > mddev->resync_offset)
> +			if (action == ACTION_RESYNC && j > mddev->resync_offset)
>   				mddev->resync_offset = j;
>   			update_time = jiffies;
>   			set_bit(MD_SB_CHANGE_CLEAN, &mddev->sb_flags);
> @@ -9646,7 +9645,7 @@ void md_do_sync(struct md_thread *thread)
>   	blk_finish_plug(&plug);
>   	wait_event(mddev->recovery_wait, !atomic_read(&mddev->recovery_active));
>   
> -	if (!test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
> +	if (action != ACTION_RESHAPE &&
>   	    !test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
>   	    mddev->curr_resync >= MD_RESYNC_ACTIVE) {
>   		mddev->curr_resync_completed = mddev->curr_resync;
> @@ -9654,9 +9653,8 @@ void md_do_sync(struct md_thread *thread)
>   	}
>   	mddev->pers->sync_request(mddev, max_sectors, max_sectors, &skipped);
>   
> -	if (!test_bit(MD_RECOVERY_CHECK, &mddev->recovery) &&
> -	    mddev->curr_resync > MD_RESYNC_ACTIVE) {
> -		if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
> +	if (action != ACTION_CHECK && mddev->curr_resync > MD_RESYNC_ACTIVE) {
> +		if (action == ACTION_RESYNC) {
>   			if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) {
>   				if (mddev->curr_resync >= mddev->resync_offset) {
>   					pr_debug("md: checkpointing %s of %s.\n",
> @@ -9674,8 +9672,7 @@ void md_do_sync(struct md_thread *thread)
>   		} else {
>   			if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery))
>   				mddev->curr_resync = MaxSector;
> -			if (!test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
> -			    test_bit(MD_RECOVERY_RECOVER, &mddev->recovery)) {
> +			if (action == ACTION_RECOVER) {

What about `MD_RECOVERY_RESHAPE`?
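
To spell the question out: the old condition excluded reshape explicitly, so the new one is only equivalent if `ACTION_RECOVER` can never be chosen while `MD_RECOVERY_RESHAPE` is set. Below is a minimal userspace sketch, not kernel code. The bit values, the enum, and the helpers `pick_action()`, `old_cond()`, `new_cond()` and `conds_agree()` are made up for illustration, and the reshape-before-recover ordering is an assumption about how `action` is derived that would need checking against the real code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the MD_RECOVERY_* bits; values are arbitrary. */
enum {
	DEMO_RESHAPE = 1u << 0,
	DEMO_RECOVER = 1u << 1,
	DEMO_SYNC    = 1u << 2,
};

/* Hypothetical stand-in for the action enum used by the patch. */
enum demo_action {
	DEMO_ACT_IDLE,
	DEMO_ACT_RESHAPE,
	DEMO_ACT_RECOVER,
	DEMO_ACT_RESYNC,
};

/* ASSUMPTION: reshape is tested before recover when the action is derived. */
static enum demo_action pick_action(unsigned int flags)
{
	if (flags & DEMO_RESHAPE)
		return DEMO_ACT_RESHAPE;
	if (flags & DEMO_RECOVER)
		return DEMO_ACT_RECOVER;
	if (flags & DEMO_SYNC)
		return DEMO_ACT_RESYNC;
	return DEMO_ACT_IDLE;
}

/* Old form of the condition: both bits are read from the flags. */
static bool old_cond(unsigned int flags)
{
	return !(flags & DEMO_RESHAPE) && (flags & DEMO_RECOVER);
}

/* New form of the condition: only the derived action is consulted. */
static bool new_cond(unsigned int flags)
{
	return pick_action(flags) == DEMO_ACT_RECOVER;
}

/* Returns true if both forms agree for every combination of the three bits. */
bool conds_agree(void)
{
	for (unsigned int flags = 0; flags < 8; flags++)
		if (old_cond(flags) != new_cond(flags))
			return false;
	return true;
}
```

Under that ordering assumption, `assert(conds_agree());` passes, so dropping the explicit reshape test would be fine for a single snapshot of the flags. It says nothing about the flags changing after `action` was computed, and it does not hold if recover is tested before reshape.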

>   				rcu_read_lock();
>   				rdev_for_each_rcu(rdev, mddev)
>   					if (mddev->delta_disks >= 0 &&
> @@ -9692,7 +9689,7 @@ void md_do_sync(struct md_thread *thread)
>   	set_mask_bits(&mddev->sb_flags, 0,
>   		      BIT(MD_SB_CHANGE_PENDING) | BIT(MD_SB_CHANGE_DEVS));
>   
> -	if (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
> +	if (action == ACTION_RESHAPE &&
>   			!test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
>   			mddev->delta_disks > 0 &&
>   			mddev->pers->finish_reshape &&
> @@ -9709,10 +9706,10 @@ void md_do_sync(struct md_thread *thread)
>   	spin_lock(&mddev->lock);
>   	if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery)) {
>   		/* We completed so min/max setting can be forgotten if used. */
> -		if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery))
> +		if (action == ACTION_REPAIR)
>   			mddev->resync_min = 0;
>   		mddev->resync_max = MaxSector;
> -	} else if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery))
> +	} else if (action == ACTION_REPAIR)
>   		mddev->resync_min = mddev->curr_resync_completed;
>   	set_bit(MD_RECOVERY_DONE, &mddev->recovery);
>   	mddev->curr_resync = MD_RESYNC_NONE;

I have not fully grokked yet what the consequences of a mismatch between
`action`, set at the beginning, and changed flags in `mddev->recovery` are.
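
To make the question concrete for myself, here is a small userspace sketch (not kernel code; every name and bit value in it is invented) of the situation the commit message describes: the flags start out as "recover" and read as "sync" by the time the offset update is reached. Whether that transition can really happen inside md_do_sync() is exactly what I am asking about. The sketch only shows that the two forms of the check diverge if it does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for two MD_RECOVERY_* bits; values are arbitrary. */
enum { SK_RECOVER = 1u << 0, SK_SYNC = 1u << 1 };

enum sk_action { SK_ACT_IDLE, SK_ACT_RECOVER, SK_ACT_RESYNC };

/* Toy device: live flags plus the offset that the checkpoint may advance. */
struct sk_dev {
	unsigned int recovery;
	unsigned long long resync_offset;
};

/* ASSUMPTION: simplified derivation of the action from the flags. */
static enum sk_action sk_pick_action(unsigned int flags)
{
	if (flags & SK_RECOVER)
		return SK_ACT_RECOVER;
	if (flags & SK_SYNC)
		return SK_ACT_RESYNC;
	return SK_ACT_IDLE;
}

/* Pre-patch style: re-read the live flags at checkpoint time. */
static void sk_checkpoint_live(struct sk_dev *d, unsigned long long j)
{
	if ((d->recovery & SK_SYNC) && j > d->resync_offset)
		d->resync_offset = j;
}

/* Post-patch style: trust the action determined at function entry. */
static void sk_checkpoint_snapshot(struct sk_dev *d, enum sk_action action,
				   unsigned long long j)
{
	if (action == SK_ACT_RESYNC && j > d->resync_offset)
		d->resync_offset = j;
}

/* Simulates the flag flip and returns resync_offset after one checkpoint. */
unsigned long long sk_run_flip(bool use_snapshot)
{
	struct sk_dev d = { .recovery = SK_RECOVER, .resync_offset = 0 };
	enum sk_action action = sk_pick_action(d.recovery); /* taken at entry */

	d.recovery = SK_SYNC; /* hypothetical change while the loop is running */

	if (use_snapshot)
		sk_checkpoint_snapshot(&d, action, 1024);
	else
		sk_checkpoint_live(&d, 1024);
	return d.resync_offset;
}
```

In this toy model, sk_run_flip(false) advances the offset to 1024 even though the run started as a recovery, while sk_run_flip(true) leaves it at 0. A description of how the flags can change in the real code, and which of the two outcomes is the correct one, would help the commit message a lot.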


Kind regards,

Paul