linux-kernel - Re: [PATCH v4 3/9] md: introduce md_bio_failure

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <bfe630fa-7668-4a11-6033-20dca0c112f9@huaweicloud.com>
Date: Thu, 18 Sep 2025 09:09:45 +0800
From: Yu Kuai <yukuai1@...weicloud.com>
To: Kenta Akagi <k@...l.me>, Song Liu <song@...nel.org>,
 Mariusz Tkaczyk <mtkaczyk@...nel.org>, Shaohua Li <shli@...com>,
 Guoqing Jiang <jgq516@...il.com>
Cc: linux-raid@...r.kernel.org, linux-kernel@...r.kernel.org,
 "yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH v4 3/9] md: introduce md_bio_failure_error()

Hi,

在 2025/09/15 11:42, Kenta Akagi 写道:
> Add a new helper function md_bio_failure_error().
> It is serialized with md_error() under the same lock and works
> almost the same, but with two differences:
> 
> * Takes the failed bio as an argument
> * If MD_FAILFAST is set in bi_opf and the target rdev is LastDev,
>    it does not mark the rdev faulty
> 
> Failfast bios must not break the array, but in the current implementation
> this can happen. This is because MD_BROKEN was introduced in RAID1/RAID10
> and is set when md_error() is called on an rdev required for mddev
> operation. At the time failfast was introduced, this was not the case.
> 
> Before this commit, md_error() has already been serialized, and
> RAID1/RAID10 mark rdevs that must not be set Faulty by Failfast
> with the LastDev flag.
> 
> The actual change in bio error handling will follow in a later commit.
> 
> Signed-off-by: Kenta Akagi <k@...l.me>
> ---
>   drivers/md/md.c | 42 ++++++++++++++++++++++++++++++++++++++++++
>   drivers/md/md.h |  4 +++-
>   2 files changed, 45 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 5607578a6db9..65fdd9bae8f4 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -8297,6 +8297,48 @@ void md_error(struct mddev *mddev, struct md_rdev *rdev)
>   }
>   EXPORT_SYMBOL(md_error);
>   
> +/** md_bio_failure_error() - md error handler for MD_FAILFAST bios
> + * @mddev: affected md device.
> + * @rdev: member device to fail.
> + * @bio: bio whose triggered device failure.
> + *
> + * This is almost the same as md_error(). That is, it is serialized at
> + * the same level as md_error, marks the rdev as Faulty, and changes
> + * the mddev status.
> + * However, if all of the following conditions are met, it does nothing.
> + * This is because MD_FAILFAST bios must not stopping the array.
> + *  * RAID1 or RAID10
> + *  * LastDev - if rdev becomes Faulty, mddev will stop
> + *  * The failed bio has MD_FAILFAST set
> + *
> + * Returns: true if _md_error() was called, false if not.
> + */
> +bool md_bio_failure_error(struct mddev *mddev, struct md_rdev *rdev, struct bio *bio)
> +{
> +	bool do_md_error = true;
> +
> +	spin_lock(&mddev->error_handle_lock);
> +	if (mddev->pers) {

With the respect this is only called from IO path, mddev->pers must be
checked already while submitting the bio. You can add a warn here like:

if (WARN_ON_ONCE(!mddev->pers))
	/* return true because we don't want caller to retry */
	return true;
> +		if (mddev->pers->head.id == ID_RAID1 ||
> +		    mddev->pers->head.id == ID_RAID10) {
> +			if (test_bit(LastDev, &rdev->flags) &&
> +			    test_bit(FailFast, &rdev->flags) &&
> +			    bio != NULL && (bio->bi_opf & MD_FAILFAST))
> +				do_md_error = false;
> +		}

As I suggested in patch 1, this can be:
	if (!mddev->pers->lastdev ||
	    !mddev->pers->lastdev(mddev, rdev, bio)) {
		__md_error(mddev, rdev);
		return true;
	}

Thanks,
Kuai

> +	}
> +
> +	if (do_md_error)
> +		_md_error(mddev, rdev);
> +	else
> +		pr_warn_ratelimited("md: %s: %s didn't do anything for %pg\n",
> +			mdname(mddev), __func__, rdev->bdev);
> +
> +	spin_unlock(&mddev->error_handle_lock);
> +	return do_md_error;
> +}
> +EXPORT_SYMBOL(md_bio_failure_error);
> +
>   /* seq_file implementation /proc/mdstat */
>   
>   static void status_unused(struct seq_file *seq)
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index 5177cb609e4b..11389ea58431 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -283,7 +283,8 @@ enum flag_bits {
>   				 */
>   	LastDev,		/* This is the last working rdev.
>   				 * so don't use FailFast any more for
> -				 * metadata.
> +				 * metadata and don't Fail rdev
> +				 * when FailFast bio failure.
>   				 */
>   	CollisionCheck,		/*
>   				 * check if there is collision between raid1
> @@ -906,6 +907,7 @@ extern void md_write_end(struct mddev *mddev);
>   extern void md_done_sync(struct mddev *mddev, int blocks, int ok);
>   void _md_error(struct mddev *mddev, struct md_rdev *rdev);
>   extern void md_error(struct mddev *mddev, struct md_rdev *rdev);
> +extern bool md_bio_failure_error(struct mddev *mddev, struct md_rdev *rdev, struct bio *bio);
>   extern void md_finish_reshape(struct mddev *mddev);
>   void md_submit_discard_bio(struct mddev *mddev, struct md_rdev *rdev,
>   			struct bio *bio, sector_t start, sector_t size);
>