linux-kernel - Re: [PATCH v5 1/2] md/raid10: fix incorrect done of recovery

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <2e36d874-4dd3-080c-3499-44f2f09b9169@molgen.mpg.de>
Date:   Thu, 1 Jun 2023 09:06:04 +0200
From:   Paul Menzel <pmenzel@...gen.mpg.de>
To:     linan666@...weicloud.com
Cc:     song@...nel.org, neilb@...e.de, linux-raid@...r.kernel.org,
        linux-kernel@...r.kernel.org, linan122@...wei.com,
        yukuai3@...wei.com, yi.zhang@...wei.com, houtao1@...wei.com,
        yangerkun@...wei.com
Subject: Re: [PATCH v5 1/2] md/raid10: fix incorrect done of recovery

Dear Li,


Thank you for your patch.

Am 01.06.23 um 08:24 schrieb linan666@...weicloud.com:
> From: Li Nan <linan122@...wei.com>

Unfortunately, I do not understand your commit message summary “fix 
incorrect done of recovery”. Maybe:

Do not add sparse disk when recovery aborts

> In raid10_sync_request(), if data cannot be read from any disk for
> recovery, it will go to 'giveup' and let 'chunks_skipped' + 1. After
> multiple 'giveup', when 'chunks_skipped >= geo.raid_disks', it will
> return 'max_sector', indicating that the recovery has been completed.
> However, the recovery is just aborted and the data remains inconsistent.
> 
> Fix it by setting mirror->recovery_disabled, which will prevent the spare
> disk from being added to this mirror. The same issue also exists during
> resync, it will be fixed afterwards.
> 
> Signed-off-by: Li Nan <linan122@...wei.com>
> ---
>   drivers/md/raid10.c | 18 +++++++++++++++++-
>   1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index d93d8cb2b620..3ba1516ea160 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -3303,6 +3303,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>   	int chunks_skipped = 0;
>   	sector_t chunk_mask = conf->geo.chunk_mask;
>   	int page_idx = 0;
> +	int error_disk = -1;
>   
>   	/*
>   	 * Allow skipping a full rebuild for incremental assembly
> @@ -3386,7 +3387,20 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>   		return reshape_request(mddev, sector_nr, skipped);
>   
>   	if (chunks_skipped >= conf->geo.raid_disks) {
> -		/* if there has been nothing to do on any drive,
> +		pr_err("md/raid10:%s: %s fail\n", mdname(mddev),
> +			test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ?  "resync" : "recovery");
> +		if (error_disk >= 0 &&
> +		    !test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
> +			/*
> +			 * recovery fail, set mirrors.recovory_disabled,

recov*e*ry

> +			 * device shouldn't be added to there.
> +			 */
> +			conf->mirrors[error_disk].recovery_disabled =
> +						mddev->recovery_disabled;
> +			return 0;
> +		}
> +		/*
> +		 * if there has been nothing to do on any drive,
>   		 * then there is nothing to do at all..

Just one dot/period at the end?

>   		 */
>   		*skipped = 1;
> @@ -3638,6 +3652,8 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>   						       mdname(mddev));
>   					mirror->recovery_disabled
>   						= mddev->recovery_disabled;
> +				} else {
> +					error_disk = i;
>   				}
>   				put_buf(r10_bio);
>   				if (rb2)


Kind regards,

Paul