lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <917d7c23-eefc-efc5-1b12-949a684900bc@huaweicloud.com>
Date:   Fri, 2 Jun 2023 14:57:46 +0800
From:   Li Nan <linan666@...weicloud.com>
To:     Paul Menzel <pmenzel@...gen.mpg.de>, linan666@...weicloud.com
Cc:     song@...nel.org, neilb@...e.de, linux-raid@...r.kernel.org,
        linux-kernel@...r.kernel.org, yukuai3@...wei.com,
        yi.zhang@...wei.com, houtao1@...wei.com, yangerkun@...wei.com
Subject: Re: [PATCH v5 1/2] md/raid10: fix incorrect done of recovery



在 2023/6/1 15:06, Paul Menzel 写道:
> Dear Li,
> 
> 
> Thank you for your patch.
> 
> Am 01.06.23 um 08:24 schrieb linan666@...weicloud.com:
>> From: Li Nan <linan122@...wei.com>
> 
> Unfortunately, I do not understand your commit message summary “fix 
> incorrect done of recovery”. Maybe:
> 
> Do not add sparse disk when recovery aborts
> 

"recovery fail" is better?

>> In raid10_sync_request(), if data cannot be read from any disk for
>> recovery, it will go to 'giveup' and let 'chunks_skipped' + 1. After
>> multiple 'giveup', when 'chunks_skipped >= geo.raid_disks', it will
>> return 'max_sector', indicating that the recovery has been completed.
>> However, the recovery is just aborted and the data remains inconsistent.
>>
>> Fix it by setting mirror->recovery_disabled, which will prevent the spare
>> disk from being added to this mirror. The same issue also exists during
>> resync, it will be fixed afterwards.
>>
>> Signed-off-by: Li Nan <linan122@...wei.com>
>> ---
>>   drivers/md/raid10.c | 18 +++++++++++++++++-
>>   1 file changed, 17 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
>> index d93d8cb2b620..3ba1516ea160 100644
>> --- a/drivers/md/raid10.c
>> +++ b/drivers/md/raid10.c
>> @@ -3303,6 +3303,7 @@ static sector_t raid10_sync_request(struct mddev 
>> *mddev, sector_t sector_nr,
>>       int chunks_skipped = 0;
>>       sector_t chunk_mask = conf->geo.chunk_mask;
>>       int page_idx = 0;
>> +    int error_disk = -1;
>>       /*
>>        * Allow skipping a full rebuild for incremental assembly
>> @@ -3386,7 +3387,20 @@ static sector_t raid10_sync_request(struct 
>> mddev *mddev, sector_t sector_nr,
>>           return reshape_request(mddev, sector_nr, skipped);
>>       if (chunks_skipped >= conf->geo.raid_disks) {
>> -        /* if there has been nothing to do on any drive,
>> +        pr_err("md/raid10:%s: %s fail\n", mdname(mddev),
>> +            test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ?  "resync" 
>> : "recovery");
>> +        if (error_disk >= 0 &&
>> +            !test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
>> +            /*
>> +             * recovery fail, set mirrors.recovory_disabled,
> 
> recov*e*ry
> 
>> +             * device shouldn't be added to there.
>> +             */
>> +            conf->mirrors[error_disk].recovery_disabled =
>> +                        mddev->recovery_disabled;
>> +            return 0;
>> +        }
>> +        /*
>> +         * if there has been nothing to do on any drive,
>>            * then there is nothing to do at all..
> 
> Just one dot/period at the end?
>
Thanks for your suggestion. I will change it in next version.

-- 
Thanks,
Nan

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ