[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <801a3a11-9a2c-dca2-cec4-4a9c71d3afb6@huaweicloud.com>
Date: Mon, 22 May 2023 21:54:33 +0800
From: Yu Kuai <yukuai1@...weicloud.com>
To: linan666@...weicloud.com, song@...nel.org, shli@...com,
allenpeng@...ology.com, alexwu@...ology.com,
bingjingc@...ology.com, neilb@...e.de
Cc: linux-raid@...r.kernel.org, linux-kernel@...r.kernel.org,
linan122@...wei.com, yi.zhang@...wei.com, houtao1@...wei.com,
yangerkun@...wei.com, "yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH 2/3] md/raid10: fix incorrect done of recovery
Hi,
在 2023/05/22 19:54, linan666@...weicloud.com 写道:
> From: Li Nan <linan122@...wei.com>
>
> Recovery will go to giveup and let chunks_skipped++ in
> raid10_sync_request() if there are some bad_blocks, and it will return
> max_sector when chunks_skipped >= geo.raid_disks. Now, recovery fail and
> data is inconsistent but user think recovery is done, it is wrong.
>
> Fix it by set mirror's recovery_disabled and spare device shouln't be
> added to here.
>
> Signed-off-by: Li Nan <linan122@...wei.com>
> ---
> drivers/md/raid10.c | 16 +++++++++++++++-
> 1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index e21502c03b45..70cc87c7ee57 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -3303,6 +3303,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
> int chunks_skipped = 0;
> sector_t chunk_mask = conf->geo.chunk_mask;
> int page_idx = 0;
> + int error_disk = -1;
>
> /*
> * Allow skipping a full rebuild for incremental assembly
> @@ -3386,7 +3387,18 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
> return reshape_request(mddev, sector_nr, skipped);
>
> if (chunks_skipped >= conf->geo.raid_disks) {
> - /* if there has been nothing to do on any drive,
> + pr_err("md/raid10:%s: %s fail\n", mdname(mddev),
> + test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ? "resync" : "recovery");
Line exceed 80 columns, and following.
> + if (error_disk >= 0 && !test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
Resync has the same problem, right?
Thanks,
Kuai
> + /*
> + * recovery fail, set mirrors.recovory_disabled,
> + * device shouldn't be added to there.
> + */
> + conf->mirrors[error_disk].recovery_disabled = mddev->recovery_disabled;
> + return 0;
> + }
> + /*
> + * if there has been nothing to do on any drive,
> * then there is nothing to do at all..
> */
> *skipped = 1;
> @@ -3640,6 +3652,8 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
> mdname(mddev));
> mirror->recovery_disabled
> = mddev->recovery_disabled;
> + } else {
> + error_disk = i;
> }
> put_buf(r10_bio);
> if (rb2)
>
Powered by blists - more mailing lists