linux-kernel - Re: [PATCH v4 3/9] md: introduce md_bio_failure

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <010601995d53e601-62141e26-a228-405c-a145-728744b06616-000000@ap-northeast-1.amazonses.com>
Date: Thu, 18 Sep 2025 14:56:43 +0000
From: Kenta Akagi <k@...l.me>
To: yukuai1@...weicloud.com, song@...nel.org, mtkaczyk@...nel.org, 
	shli@...com, jgq516@...il.com
Cc: linux-raid@...r.kernel.org, linux-kernel@...r.kernel.org, 
	yukuai3@...wei.com, k@...l.me
Subject: Re: [PATCH v4 3/9] md: introduce md_bio_failure_error()



On 2025/09/18 10:09, Yu Kuai wrote:
> Hi,
> 
> 在 2025/09/15 11:42, Kenta Akagi 写道:
>> Add a new helper function md_bio_failure_error().
>> It is serialized with md_error() under the same lock and works
>> almost the same, but with two differences:
>>
>> * Takes the failed bio as an argument
>> * If MD_FAILFAST is set in bi_opf and the target rdev is LastDev,
>>    it does not mark the rdev faulty
>>
>> Failfast bios must not break the array, but in the current implementation
>> this can happen. This is because MD_BROKEN was introduced in RAID1/RAID10
>> and is set when md_error() is called on an rdev required for mddev
>> operation. At the time failfast was introduced, this was not the case.
>>
>> Before this commit, md_error() has already been serialized, and
>> RAID1/RAID10 mark rdevs that must not be set Faulty by Failfast
>> with the LastDev flag.
>>
>> The actual change in bio error handling will follow in a later commit.
>>
>> Signed-off-by: Kenta Akagi <k@...l.me>
>> ---
>>   drivers/md/md.c | 42 ++++++++++++++++++++++++++++++++++++++++++
>>   drivers/md/md.h |  4 +++-
>>   2 files changed, 45 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>> index 5607578a6db9..65fdd9bae8f4 100644
>> --- a/drivers/md/md.c
>> +++ b/drivers/md/md.c
>> @@ -8297,6 +8297,48 @@ void md_error(struct mddev *mddev, struct md_rdev *rdev)
>>   }
>>   EXPORT_SYMBOL(md_error);
>>   +/** md_bio_failure_error() - md error handler for MD_FAILFAST bios
>> + * @mddev: affected md device.
>> + * @rdev: member device to fail.
>> + * @bio: bio whose triggered device failure.
>> + *
>> + * This is almost the same as md_error(). That is, it is serialized at
>> + * the same level as md_error, marks the rdev as Faulty, and changes
>> + * the mddev status.
>> + * However, if all of the following conditions are met, it does nothing.
>> + * This is because MD_FAILFAST bios must not stopping the array.
>> + *  * RAID1 or RAID10
>> + *  * LastDev - if rdev becomes Faulty, mddev will stop
>> + *  * The failed bio has MD_FAILFAST set
>> + *
>> + * Returns: true if _md_error() was called, false if not.
>> + */
>> +bool md_bio_failure_error(struct mddev *mddev, struct md_rdev *rdev, struct bio *bio)
>> +{
>> +    bool do_md_error = true;
>> +
>> +    spin_lock(&mddev->error_handle_lock);
>> +    if (mddev->pers) {
> 
> With the respect this is only called from IO path, mddev->pers must be
> checked already while submitting the bio. You can add a warn here like:
> 
> if (WARN_ON_ONCE(!mddev->pers))
>     /* return true because we don't want caller to retry */
>     return true;

Thank you, Understood. I will fix it.

>> +        if (mddev->pers->head.id == ID_RAID1 ||
>> +            mddev->pers->head.id == ID_RAID10) {
>> +            if (test_bit(LastDev, &rdev->flags) &&
>> +                test_bit(FailFast, &rdev->flags) &&
>> +                bio != NULL && (bio->bi_opf & MD_FAILFAST))
>> +                do_md_error = false;
>> +        }
> 
> As I suggested in patch 1, this can be:
>     if (!mddev->pers->lastdev ||
>         !mddev->pers->lastdev(mddev, rdev, bio)) {
>         __md_error(mddev, rdev);
>         return true;
>     }

Understood this too.

Thanks,
Akagi

> 
> Thanks,
> Kuai
> 
>> +    }
>> +
>> +    if (do_md_error)
>> +        _md_error(mddev, rdev);
>> +    else
>> +        pr_warn_ratelimited("md: %s: %s didn't do anything for %pg\n",
>> +            mdname(mddev), __func__, rdev->bdev);
>> +
>> +    spin_unlock(&mddev->error_handle_lock);
>> +    return do_md_error;
>> +}
>> +EXPORT_SYMBOL(md_bio_failure_error);
>> +
>>   /* seq_file implementation /proc/mdstat */
>>     static void status_unused(struct seq_file *seq)
>> diff --git a/drivers/md/md.h b/drivers/md/md.h
>> index 5177cb609e4b..11389ea58431 100644
>> --- a/drivers/md/md.h
>> +++ b/drivers/md/md.h
>> @@ -283,7 +283,8 @@ enum flag_bits {
>>                    */
>>       LastDev,        /* This is the last working rdev.
>>                    * so don't use FailFast any more for
>> -                 * metadata.
>> +                 * metadata and don't Fail rdev
>> +                 * when FailFast bio failure.
>>                    */
>>       CollisionCheck,        /*
>>                    * check if there is collision between raid1
>> @@ -906,6 +907,7 @@ extern void md_write_end(struct mddev *mddev);
>>   extern void md_done_sync(struct mddev *mddev, int blocks, int ok);
>>   void _md_error(struct mddev *mddev, struct md_rdev *rdev);
>>   extern void md_error(struct mddev *mddev, struct md_rdev *rdev);
>> +extern bool md_bio_failure_error(struct mddev *mddev, struct md_rdev *rdev, struct bio *bio);
>>   extern void md_finish_reshape(struct mddev *mddev);
>>   void md_submit_discard_bio(struct mddev *mddev, struct md_rdev *rdev,
>>               struct bio *bio, sector_t start, sector_t size);
>>
> 
>