Date:   Thu, 25 May 2023 22:00:23 +0800
From:   Li Nan <linan666@...weicloud.com>
To:     Yu Kuai <yukuai1@...weicloud.com>, linan666@...weicloud.com,
        song@...nel.org, shli@...com, allenpeng@...ology.com,
        alexwu@...ology.com, bingjingc@...ology.com, neilb@...e.de
Cc:     linux-raid@...r.kernel.org, linux-kernel@...r.kernel.org,
        yi.zhang@...wei.com, houtao1@...wei.com, yangerkun@...wei.com,
        "yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH 2/3] md/raid10: fix incorrect done of recovery



On 2023/5/22 21:54, Yu Kuai wrote:
> Hi,
> 
> On 2023/05/22 19:54, linan666@...weicloud.com wrote:
>> From: Li Nan <linan122@...wei.com>
>>
>> Recovery takes the giveup path and increments chunks_skipped in
>> raid10_sync_request() if there are bad blocks, and it returns
>> max_sector once chunks_skipped >= geo.raid_disks. At that point
>> recovery has failed and the data is inconsistent, but the user
>> thinks recovery is done, which is wrong.
>>
>> Fix it by setting the mirror's recovery_disabled flag; the spare
>> device should not be added here.
>>
>> Signed-off-by: Li Nan <linan122@...wei.com>
>> ---
>>   drivers/md/raid10.c | 16 +++++++++++++++-
>>   1 file changed, 15 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
>> index e21502c03b45..70cc87c7ee57 100644
>> --- a/drivers/md/raid10.c
>> +++ b/drivers/md/raid10.c
>> @@ -3303,6 +3303,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>>       int chunks_skipped = 0;
>>       sector_t chunk_mask = conf->geo.chunk_mask;
>>       int page_idx = 0;
>> +    int error_disk = -1;
>>       /*
>>        * Allow skipping a full rebuild for incremental assembly
>> @@ -3386,7 +3387,18 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>>           return reshape_request(mddev, sector_nr, skipped);
>>       if (chunks_skipped >= conf->geo.raid_disks) {
>> -        /* if there has been nothing to do on any drive,
>> +        pr_err("md/raid10:%s: %s fail\n", mdname(mddev),
>> +            test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ?  "resync" : "recovery");
> 
> This line exceeds 80 columns, and so do the following ones.
>> +        if (error_disk >= 0 && !test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
> 
> Resync has the same problem, right?
> 

Yes. But I have no idea how to fix it. Neither calling md_error on the
disk nor setting recovery_disabled is a good solution. So, just print
an error message for now. Do you have any ideas?
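
For reference, the giveup path in question looks roughly like this
(condensed from raid10_sync_request() in drivers/md/raid10.c; quoted
from memory, so details may not match the tree exactly):

 skipped:
	max_sector = mddev->dev_sectors;
	...
	if (chunks_skipped >= conf->geo.raid_disks) {
		/* nothing could be done on any drive, yet the
		 * remaining sectors are reported as synced
		 */
		*skipped = 1;
		return (max_sector - sector_nr) + sectors_skipped;
	}
	...
 giveup:
	/* every drive had a bad block in this chunk,
	 * so skip it and try the next one
	 */
	sectors_skipped += (max_sector - sector_nr);
	chunks_skipped++;
	sector_nr = max_sector;
	goto skipped;

Once chunks_skipped reaches geo.raid_disks, the whole remaining range
is accounted as synced even though nothing was actually repaired.
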
-- 
Thanks,
Nan
