linux-kernel - Re: [PATCH v2 06/11] md: remove MD_RECOVERY_ERROR handling and simplify resync


Message-ID: <f2822f91-d48f-a99e-02dc-36c0b4c4b633@huaweicloud.com>
Date: Mon, 10 Nov 2025 20:17:57 +0800
From: Li Nan <linan666@...weicloud.com>
To: yukuai@...as.com, linan666@...weicloud.com, song@...nel.org,
 neil@...wn.name, namhyung@...il.com
Cc: linux-raid@...r.kernel.org, linux-kernel@...r.kernel.org, xni@...hat.com,
 k@...l.me, yangerkun@...wei.com, yi.zhang@...wei.com
Subject: Re: [PATCH v2 06/11] md: remove MD_RECOVERY_ERROR handling and
 simplify resync_offset update



On 2025/11/8 18:22, Yu Kuai wrote:
> Hi,
> 
> On 2025/11/6 19:59, linan666@...weicloud.com wrote:
>> From: Li Nan <linan122@...wei.com>
>>
>> When sync IO fails and setting a badblock also fails, the unsynced disk
>> might be kicked by setting 'recovery_disabled' without the Faulty flag.
>> MD_RECOVERY_ERROR was set in md_sync_error() to prevent updating
>> 'resync_offset', avoiding reads from the failed sync sectors.
>>
>> The previous patch ensures the disk is marked Faulty when setting a
>> badblock fails. Remove MD_RECOVERY_ERROR handling as it's no longer
>> needed - failed sync sectors are unreadable either via badblock or
>> Faulty disk.
>>
>> Simplify resync_offset update logic.
>>
>> Signed-off-by: Li Nan <linan122@...wei.com>
>> ---
>>    drivers/md/md.h |  2 --
>>    drivers/md/md.c | 23 +++++------------------
>>    2 files changed, 5 insertions(+), 20 deletions(-)
>>
>> diff --git a/drivers/md/md.h b/drivers/md/md.h
>> index 18621dba09a9..c5b5377e9049 100644
>> --- a/drivers/md/md.h
>> +++ b/drivers/md/md.h
>> @@ -644,8 +644,6 @@ enum recovery_flags {
>>    	MD_RECOVERY_FROZEN,
>>    	/* waiting for pers->start() to finish */
>>    	MD_RECOVERY_WAIT,
>> -	/* interrupted because io-error */
>> -	MD_RECOVERY_ERROR,
>>    
>>    	/* flags determines sync action, see details in enum sync_action */
>>    
>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>> index 2bdbb5b0e9e1..71988d8f5154 100644
>> --- a/drivers/md/md.c
>> +++ b/drivers/md/md.c
>> @@ -8949,7 +8949,6 @@ void md_sync_error(struct mddev *mddev)
>>    {
>>    	// stop recovery, signal do_sync ....
>>    	set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>> -	set_bit(MD_RECOVERY_ERROR, &mddev->recovery);
>>    	md_wakeup_thread(mddev->thread);
>>    }
>>    EXPORT_SYMBOL(md_sync_error);
>> @@ -9603,8 +9602,8 @@ void md_do_sync(struct md_thread *thread)
>>    	wait_event(mddev->recovery_wait, !atomic_read(&mddev->recovery_active));
>>    
>>    	if (!test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
>> -	    !test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
> 
> Why is the above check removed?
> 
> Thanks,
> Kuai
> 

Before patch 05, a failed sync IO might end and decrement recovery_active
before its error handling has completed. The error handling sets
recovery_disabled and MD_RECOVERY_INTR, then removes the failed disk later.
If 'curr_resync_completed' is updated before the disk is removed, reads may
be served from the sync-failed regions.

After patch 05, error handling for a failed IO is guaranteed to finish.
Once the wait for 'recovery_active' to become 0 on the previous line
returns, all sync IO has completed, regardless of whether MD_RECOVERY_INTR
is set. Thus, this check can be removed.

So I added the following comment:

>>    	    mddev->curr_resync >= MD_RESYNC_ACTIVE) {
>> +		/* All sync IO completes after recovery_active becomes 0 */
>>    		mddev->curr_resync_completed = mddev->curr_resync;
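
To illustrate the reasoning, here is a minimal userspace sketch of the
post-patch condition. It is not the kernel code: struct model_mddev,
MODEL_RESYNC_ACTIVE and model_update_completed() are hypothetical stand-ins,
plain bool fields replace the recovery flag bits, and an assert() stands in
for the wait_event() on recovery_active.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model value; stands in for MD_RESYNC_ACTIVE. */
#define MODEL_RESYNC_ACTIVE 3u

/* Hypothetical, simplified stand-in for the relevant mddev state. */
struct model_mddev {
	bool reshape;                   /* models MD_RECOVERY_RESHAPE */
	bool intr;                      /* models MD_RECOVERY_INTR */
	int recovery_active;            /* in-flight sync IO count */
	uint64_t curr_resync;           /* current sync position */
	uint64_t curr_resync_completed; /* last position known complete */
};

/*
 * Models the update after the wait for recovery_active == 0.
 * The intr flag is deliberately not consulted: once no sync IO is in
 * flight, the position can be recorded whether or not sync was interrupted.
 * Returns true if curr_resync_completed was updated.
 */
static bool model_update_completed(struct model_mddev *m)
{
	/* Stand-in for wait_event(..., !recovery_active) having returned. */
	assert(m->recovery_active == 0);

	if (!m->reshape && m->curr_resync >= MODEL_RESYNC_ACTIVE) {
		/* All sync IO completes after recovery_active becomes 0 */
		m->curr_resync_completed = m->curr_resync;
		return true;
	}
	return false;
}
```

In this model, an interrupted sync (intr set) with no IO in flight still
records its position, while a reshape leaves curr_resync_completed untouched.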

Since the logic behind this change is complex, should I separate it into a
new commit?

-- 
Thanks,
Nan