Message-ID: <CALTww282Kyg9ERXeSiY5Thd-O40vEXjAXsD8A2PxEsd-h-Cg3Q@mail.gmail.com>
Date: Wed, 7 Jan 2026 11:35:47 +0800
From: Xiao Ni <xni@...hat.com>
To: Kenta Akagi <k@...l.me>
Cc: linan666@...weicloud.com, linux-raid@...r.kernel.org, 
	linux-kernel@...r.kernel.org, song@...nel.org, yukuai@...as.com, shli@...com, 
	mtkaczyk@...nel.org
Subject: Re: [PATCH v6 1/2] md: Don't set MD_BROKEN for RAID1 and RAID10 when
 using FailFast

On Tue, Jan 6, 2026 at 8:30 PM Kenta Akagi <k@...l.me> wrote:
>
> Hi,
> Thank you for reviewing.
>
> On 2026/01/06 11:57, Li Nan wrote:
> >
> >
> > On 2026/1/5 22:40, Kenta Akagi wrote:
> >> After commit 9631abdbf406 ("md: Set MD_BROKEN for RAID1 and RAID10"),
> >> if the error handler is called on the last rdev in RAID1 or RAID10,
> >> the MD_BROKEN flag will be set on that mddev.
> >> When MD_BROKEN is set, write bios to the md will result in an I/O error.
> >>
> >> This causes a problem when using FailFast.
> >> The current implementation of FailFast expects the array to continue
> >> functioning without issues even after calling md_error for the last
> >> rdev.  Furthermore, due to the nature of its functionality, FailFast may
> >> call md_error on all rdevs of the md. Even if retrying I/O on an rdev
> >> would succeed, it first calls md_error before retrying.
> >>
> >> To fix this issue, this commit ensures that for RAID1 and RAID10, if the
> >> last In_sync rdev has the FailFast flag set and the mddev's fail_last_dev
> >> is off, the MD_BROKEN flag will not be set on that mddev.
> >>
> >> This change impacts userspace. After this commit, if the rdev has the
> >> FailFast flag, the mddev is never marked broken even if the failing bio
> >> is not FailFast. However, it's unlikely that any setup using FailFast
> >> expects the array to halt when md_error is called on the last rdev.
> >>
> >
> > In the current RAID design, when an IO error occurs, RAID ensures faulty
> > data is not read via the following actions:
> > 1. Mark the badblocks (no FailFast flag); if this fails,
> > 2. Mark the disk as Faulty.
> >
> > If neither action is taken, and BROKEN is not set to prevent continued RAID
> > use, errors on the last remaining disk will be ignored. Subsequent reads
> > may return incorrect data. This seems like a more serious issue in my opinion.
>
> I agree that data inconsistency can certainly occur in this scenario.
>
> However, a RAID1 with only one remaining rdev can be considered the same as a plain
> disk. From that perspective, I do not believe it is the mandatory responsibility
> of md raid to block subsequent writes or prevent data inconsistency in this situation.
>
> Commit 9631abdbf406 ("md: Set MD_BROKEN for RAID1 and RAID10"), which introduced
> MD_BROKEN for RAID1/10, also does not seem to have been made with that responsibility in mind.
>
> >
> > In scenarios with a large number of transient IO errors, is FailFast not a
> > suitable configuration? As you mentioned: "retrying I/O on an rdev would
>
> That seems right; using FailFast on top of an unstable underlying layer is not a good idea.
> However, since md raid is the issuer of the FailFast bios,
> I believe it is incorrect to shut down the array due to the failure of a FailFast bio.

Hi all

I understand Li Nan's point now. The badblocks can't be recorded in
this situation and the last working device is not set to Faulty. To be
frank, I think data consistency is more important. Users don't think
of the array as a single disk; they expect RAID1 to guarantee
consistency. But the write request should return an error when
raid1_error is called for the last working device, right? So there is
no consistency problem?
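
(Just to make sure we are talking about the same flow, here is my rough
sketch of the check described in the commit message above. This is only
simplified illustration code, not the actual patch; the names FailFast,
In_sync, MD_BROKEN and mddev->fail_last_dev are the existing md ones.)

	/* RAID1/RAID10 error handler, reached when md_error() is called
	 * for the last In_sync rdev (simplified sketch).
	 */
	if (test_bit(FailFast, &rdev->flags) && !mddev->fail_last_dev) {
		/* Described behaviour: keep the array running, do not
		 * mark the mddev broken.
		 */
		return;
	}
	/* Otherwise mark the array broken; subsequent write bios to the
	 * md will result in an I/O error.
	 */
	set_bit(MD_BROKEN, &mddev->flags);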

Hi Kenta, I have a question too. What will you do in your environment
after the network connection works again? Add those disks back one by
one to do the recovery?

Best Regards
Xiao

>
> Thanks,
> Akagi
>
> > succeed".
> >
> > --
> > Thanks,
> > Nan
> >
> >
>

