Message-ID: <d0de7500-eac0-4d02-9b48-887cdefab4c1@fwd.mgml.me>
Date: Wed, 24 Sep 2025 00:54:44 +0900
From: Kenta Akagi <k@....mgml.me>
To: Yu Kuai <hailan@...uai.org.cn>, yukuai1@...weicloud.com, song@...nel.org,
        mtkaczyk@...nel.org, shli@...com, jgq516@...il.com
Cc: linux-raid@...r.kernel.org, linux-kernel@...r.kernel.org,
        yukuai3@...wei.com, Kenta Akagi <k@....mgml.me>
Subject: Re: [PATCH v4 4/9] md/raid1,raid10: Don't set MD_BROKEN on failfast
 bio failure

Hi,

On 2025/09/20 18:51, Yu Kuai wrote:
> Hi,
> 
> On 2025/9/20 14:30, Kenta Akagi wrote:
>> Hi,
>>
>> I have changed my email address because our primary MX server
>> suddenly started rejecting non-DKIM mail.
>>
>> On 2025/09/19 10:36, Yu Kuai wrote:
>>> Hi,
>>>
>>> On 2025/9/18 23:22, Kenta Akagi wrote:
>>>>>> @@ -470,7 +470,7 @@ static void raid1_end_write_request(struct bio *bio)
>>>>>>                 (bio->bi_opf & MD_FAILFAST) &&
>>>>>>                 /* We never try FailFast to WriteMostly devices */
>>>>>>                 !test_bit(WriteMostly, &rdev->flags)) {
>>>>>> -            md_error(r1_bio->mddev, rdev);
>>>>>> +            md_bio_failure_error(r1_bio->mddev, rdev, bio);
>>>>>>             }
>>>>> Can the following Faulty check be replaced with the return value?
>>>> In the case where raid1_end_write_request is called for a non-failfast IO,
>>>> and the rdev has already been marked Faulty by another bio, it must not retry either.
>>>> I think it would be simpler not to use a return value here.
>>> You can just add the Faulty check inside md_bio_failure_error() as well, along with
>>> both the failfast and writemostly checks.
>> Sorry, I'm not sure I understand this part.
>> In raid1_end_write_request, this code path is also used for a regular bio,
>> not only for FailFast.
>>
>> You mean to change md_bio_failure_error as follows:
>> * If the rdev is Faulty, immediately return true.
>> * If the given bio is Failfast and the rdev is not the lastdev, call md_error.
>> * If the given bio is not Failfast, do nothing and return false.
> 
> Yes, doesn't that apply to all the callers?

It's difficult because the flow differs from function to function.
For example, in raid1_end_write_request, if the rdev and the bio are both Failfast and the
rdev is not WriteMostly, it calls md_error, and then takes a further action if the rdev is
Faulty, regardless of whether the bio is Failfast or not. This flow is specific to
raid1_end_write_request.

The other functions that need to be switched to md_bio_failure_error are handle_read_error
and fix_sync_read_error, but those functions have no path that checks whether the rdev is
Faulty regardless of whether the bio is Failfast.

It may be possible with some refactoring,
but I think the current style of raid1_end_write_request, that is
if (Failfast) md_bio_failure_error();
if (Faulty) something;
is better, because we can see at a glance what is happening.
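
To make the two-step style concrete, here is a minimal userspace sketch of it,
not kernel code. Every name in it (model_rdev, model_bio,
model_bio_failure_error, model_end_write_error) is hypothetical, and the helper
is only assumed to mark the rdev Faulty unless it is the last working device
and to return whether the rdev is Faulty afterwards:

```c
#include <stdbool.h>

/* Hypothetical userspace stand-ins for the kernel's rdev flags and bio. */
struct model_rdev {
	bool failfast;     /* models test_bit(FailFast, &rdev->flags) */
	bool write_mostly; /* models test_bit(WriteMostly, &rdev->flags) */
	bool faulty;       /* models test_bit(Faulty, &rdev->flags) */
	bool last_device;  /* assumption: true if this is the last working device */
};

struct model_bio {
	bool failfast;     /* models (bio->bi_opf & MD_FAILFAST) */
};

enum write_outcome {
	WRITE_ERROR_RECORDED, /* models set_bit(R1BIO_WriteError, &r1_bio->state) */
	BRANCH_FINISHED       /* models r1_bio->bios[mirror] = NULL; to_put = bio */
};

/*
 * Assumed behaviour only: fail the device unless it is the last one,
 * then report whether the device is Faulty.
 */
static bool model_bio_failure_error(struct model_rdev *rdev,
				    const struct model_bio *bio)
{
	(void)bio; /* unused in this simplified model */
	if (!rdev->last_device)
		rdev->faulty = true;
	return rdev->faulty;
}

static enum write_outcome model_end_write_error(struct model_rdev *rdev,
						const struct model_bio *bio)
{
	/* Step 1: only a failfast bio on a failfast, non-WriteMostly rdev escalates. */
	if (rdev->failfast && bio->failfast && !rdev->write_mostly)
		model_bio_failure_error(rdev, bio);

	/* Step 2: the Faulty check applies to every bio, failfast or not. */
	if (!rdev->faulty)
		return WRITE_ERROR_RECORDED;
	return BRANCH_FINISHED;
}
```

Step 2 is reached by regular bios too, so an rdev that another bio already
marked Faulty still ends the branch without recording a write error.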

BTW, fix_sync_read_error can use the return value of md_bio_failure_error as
suggested, so I'll revise it as follows:

@@ -2167,8 +2174,7 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
        if (test_bit(FailFast, &rdev->flags)) {
                /* Don't try recovering from here - just fail it
                 * ... unless it is the last working device of course */
-               md_bio_failure_error(mddev, rdev, bio);
-               if (test_bit(Faulty, &rdev->flags))
+               if (md_bio_failure_error(mddev, rdev, bio))
                        /* Don't try to read from here, but make sure
                         * put_buf does it's thing
                         */
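
As a rough userspace illustration of why this hunk should not change behaviour
(again not kernel code; sketch_rdev, sketch_bio_failure_error and both
sketch_skip_sync_read_* functions are hypothetical names), assume the helper
returns true exactly when the rdev is Faulty after the call. Under that
assumption the old form, which calls the helper and then tests Faulty, and the
new form, which tests the return value, give the same answer:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the rdev state relevant to this hunk. */
struct sketch_rdev {
	bool failfast;    /* models test_bit(FailFast, &rdev->flags) */
	bool faulty;      /* models test_bit(Faulty, &rdev->flags) */
	bool last_device; /* assumption: true if this is the last working device */
};

/* Assumed contract: returns true iff the rdev is Faulty after the call. */
static bool sketch_bio_failure_error(struct sketch_rdev *rdev)
{
	if (!rdev->last_device)
		rdev->faulty = true;
	return rdev->faulty;
}

/* Old form: call the helper, then test the Faulty flag separately. */
static bool sketch_skip_sync_read_old(struct sketch_rdev *rdev)
{
	if (rdev->failfast) {
		sketch_bio_failure_error(rdev);
		if (rdev->faulty)
			return true; /* don't try to read from this device */
	}
	return false; /* continue with the normal recovery path */
}

/* New form: use the helper's return value directly. */
static bool sketch_skip_sync_read_new(struct sketch_rdev *rdev)
{
	if (rdev->failfast) {
		if (sketch_bio_failure_error(rdev))
			return true; /* don't try to read from this device */
	}
	return false; /* continue with the normal recovery path */
}
```

The equivalence only holds if the return value always matches the Faulty bit at
the time of return, which is the assumption stated above.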

> 
>>
>> And then apply this?
>> This is complicated. Wouldn't it be better to keep the Faulty check as it is?
>>
>> @@ -466,18 +466,12 @@ static void raid1_end_write_request(struct bio *bio)
>>                          set_bit(MD_RECOVERY_NEEDED, &
>>                                  conf->mddev->recovery);
>>
>> -               if (test_bit(FailFast, &rdev->flags) &&
>> -                   (bio->bi_opf & MD_FAILFAST) &&
>> -                   /* We never try FailFast to WriteMostly devices */
>> -                   !test_bit(WriteMostly, &rdev->flags)) {
>> -                       md_error(r1_bio->mddev, rdev);
>> -               }
>> -
>>                  /*
>>                   * When the device is faulty, it is not necessary to
>>                   * handle write error.
>>                   */
>> -               if (!test_bit(Faulty, &rdev->flags))
>> +               if (!test_bit(Faulty, &rdev->flags) ||
>> +                   !md_bio_failure_error(r1_bio->mddev, rdev, bio))
>>                          set_bit(R1BIO_WriteError, &r1_bio->state);
>>                  else {
>>                          /* Finished with this branch */
> 
> Faulty is set with the lock held, so checking Faulty with the lock held as well
> prevents the rdev from becoming Faulty concurrently, and this check can be added
> to all callers, I think.
> 
>>
>> Or do you mean a fix like this?
>>
>> @@ -466,23 +466,24 @@ static void raid1_end_write_request(struct bio *bio)
>>                          set_bit(MD_RECOVERY_NEEDED, &
>>                                  conf->mddev->recovery);
>>
>> -               if (test_bit(FailFast, &rdev->flags) &&
>> -                   (bio->bi_opf & MD_FAILFAST) &&
>> -                   /* We never try FailFast to WriteMostly devices */
>> -                   !test_bit(WriteMostly, &rdev->flags)) {
>> -                       md_error(r1_bio->mddev, rdev);
>> -               }
>> -
>>                  /*
>>                   * When the device is faulty, it is not necessary to
>>                   * handle write error.
>>                   */
>> -               if (!test_bit(Faulty, &rdev->flags))
>> -                       set_bit(R1BIO_WriteError, &r1_bio->state);
>> -               else {
>> +               if (test_bit(Faulty, &rdev->flags) ||
>> +                   (
>> +                   test_bit(FailFast, &rdev->flags) &&
>> +                   (bio->bi_opf & MD_FAILFAST) &&
>> +                   /* We never try FailFast to WriteMostly devices */
>> +                   !test_bit(WriteMostly, &rdev->flags) &&
>> +                   md_bio_failure_error(r1_bio->mddev, rdev, bio)
>> +                   )
>> +               ) {
>>                          /* Finished with this branch */
>>                          r1_bio->bios[mirror] = NULL;
>>                          to_put = bio;
>> +               } else {
>> +                       set_bit(R1BIO_WriteError, &r1_bio->state);
>>                  }
>>          } else {
>>                  /*
> 
> No, this just makes the code even more unreadable.

Understood.

Thanks,
Akagi

> 
> Thanks,
> Kuai
> 
>> Thanks,
>> Akagi
>>
>>> Thanks,
>>> Kuai
>>>
>>>
>>>