linux-kernel - Re: [PATCH] md: ensure consistent action state in md_do_sync

lists.openwall.net
Open Source and information security mailing list archives

Message-ID: <91ff3d55-e9cf-364e-8039-1c739a132b4a@huaweicloud.com>
Date: Mon, 1 Sep 2025 15:18:41 +0800
From: Li Nan <linan666@...weicloud.com>
To: Li Nan <linan666@...weicloud.com>, Paul Menzel <pmenzel@...gen.mpg.de>
Cc: song@...nel.org, yukuai3@...wei.com, linux-raid@...r.kernel.org,
 linux-kernel@...r.kernel.org, yangerkun@...wei.com, yi.zhang@...wei.com
Subject: Re: [PATCH] md: ensure consistent action state in md_do_sync



On 2025/9/1 10:16, Li Nan wrote:
> 
> 
> On 2025/8/30 17:51, Paul Menzel wrote:
>> Dear Nan,
>>
>>
>> Thank you for your patch.
>>
>> On 30.08.25 at 11:05, linan666@...weicloud.com wrote:
>>> From: Li Nan <linan122@...wei.com>
>>>
>>> The 'mddev->recovery' flags can change during md_do_sync(), leading to
>>> inconsistencies. For example, starting with MD_RECOVERY_RECOVER and
>>> ending with MD_RECOVERY_SYNC can cause incorrect offset updates.
>>
>> Can you give a concrete example?
>>
> 
> T1                    T2
> md_do_sync
>   action = ACTION_RECOVER
>                      (write sysfs)
>                      action_store
>                       set MD_RECOVERY_SYNC
>   [ do recovery ]
>   update resync_offset
> 
> The corresponding code is:
> ```
>          if (!test_bit(MD_RECOVERY_CHECK, &mddev->recovery) &&
>              mddev->curr_resync > MD_RESYNC_ACTIVE) {
>                  /* SYNC is set, but what we actually do is recovery */
>                  if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
>                          if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) {
>                                  if (mddev->curr_resync >= mddev->resync_offset) {
>                                          pr_debug("md: checkpointing %s of %s.\n",
>                                                   desc, mdname(mddev));
>                                          if (test_bit(MD_RECOVERY_ERROR,
>                                                  &mddev->recovery))
>                                                  mddev->resync_offset =
>                                                          mddev->curr_resync_completed;
>                                          else
>                                                  mddev->resync_offset =
>                                                          mddev->curr_resync;
>                                  }
> ```
> 
>>> To avoid this, use the 'action' determined at the beginning of the
>>> function instead of repeatedly checking 'mddev->recovery'.
>>
>> Do you have a reproducer?
>>
> 
> I don't have a reproducer because reproducing it requires modifying the
> kernel. The approximate steps are:
> 
> - Modify the kernel to add a delay before the above check.
> - Trigger recovery by removing and adding disks.
> - After recovery completes, while md_do_sync() is paused at the delay
> point, write to the sysfs interface to set the sync flag.
> 

Please ignore my previous reply; it was wrong. While MD_RECOVERY_RUNNING
is set, the recovery flags should not change, so this patch is only a
cleanup. I will further improve the sync-finish code in v2.

-- 
Thanks,
Nan