Message-ID: <CALTww2-6F8+TMqLTzQYEhSiWL+cFLn4JQqd1GXai5ekTxo8=3w@mail.gmail.com>
Date: Tue, 6 Jan 2026 17:25:41 +0800
From: Xiao Ni <xni@...hat.com>
To: Li Nan <linan666@...weicloud.com>
Cc: Kenta Akagi <k@...l.me>, Song Liu <song@...nel.org>, Yu Kuai <yukuai@...as.com>,
Shaohua Li <shli@...com>, Mariusz Tkaczyk <mtkaczyk@...nel.org>, linux-raid@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v6 1/2] md: Don't set MD_BROKEN for RAID1 and RAID10 when
using FailFast
On Tue, Jan 6, 2026 at 5:11 PM Li Nan <linan666@...weicloud.com> wrote:
>
>
>
> 在 2026/1/6 15:59, Xiao Ni 写道:
> > On Tue, Jan 6, 2026 at 10:57 AM Li Nan <linan666@...weicloud.com> wrote:
> >>
> >>
> >>
> >> 在 2026/1/5 22:40, Kenta Akagi 写道:
> >>> After commit 9631abdbf406 ("md: Set MD_BROKEN for RAID1 and RAID10"),
> >>> if the error handler is called on the last rdev in RAID1 or RAID10,
> >>> the MD_BROKEN flag will be set on that mddev.
> >>> When MD_BROKEN is set, write bios to the md will result in an I/O error.
> >>>
> >>> This causes a problem when using FailFast.
> >>> The current implementation of FailFast expects the array to continue
> >>> functioning without issues even after calling md_error for the last
> >>> rdev. Furthermore, due to the nature of its functionality, FailFast may
> >>> call md_error on all rdevs of the md. Even if retrying I/O on an rdev
> >>> would succeed, it first calls md_error before retrying.
> >>>
> >>> To fix this issue, this commit ensures that for RAID1 and RAID10, if the
> >>> last In_sync rdev has the FailFast flag set and the mddev's fail_last_dev
> >>> is off, the MD_BROKEN flag will not be set on that mddev.
> >>>
> >>> This change impacts userspace. After this commit, if the rdev has the
> >>> FailFast flag, the mddev is never marked broken, even if the failing
> >>> bio is not FailFast. However, it is unlikely that any setup using
> >>> FailFast expects the array to halt when md_error is called on the last rdev.
> >>>
> >>
> >> In the current RAID design, when an IO error occurs, RAID ensures faulty
> >> data is not read via the following actions:
> >> 1. Mark the badblocks (no FailFast flag); if this fails,
> >> 2. Mark the disk as Faulty.
> >>
> >> If neither action is taken, and MD_BROKEN is not set to prevent continued
> >> use of the RAID array, errors on the last remaining disk will be ignored.
> >> Subsequent reads may return incorrect data. This seems like a more serious
> >> issue in my opinion.
> >>
> >> In scenarios with a large number of transient IO errors, is FailFast not a
> >> suitable configuration? As you mentioned: "retrying I/O on an rdev would
> >> succeed".
> >
> > Hi Nan
> >
> > According to my understanding, the policy here is to try to keep raid
> > work if io error happens on the last device. It doesn't set faulty on
> > the last in_sync device. It only sets MD_BROKEN to forbid write
> > requests. But it still can read data from the last device.
> >
> > static void raid1_error(struct mddev *mddev, struct md_rdev *rdev)
> > {
> >         ...
> >         if (test_bit(In_sync, &rdev->flags) &&
> >             (conf->raid_disks - mddev->degraded) == 1) {
> >                 set_bit(MD_BROKEN, &mddev->flags);
> >
> >                 if (!mddev->fail_last_dev) {
> >                         ...
> >                         return; // return directly here
> >                 }
> >
> >
> >
> > static void md_submit_bio(struct bio *bio)
> > {
> >         ...
> >         if (unlikely(test_bit(MD_BROKEN, &mddev->flags)) && (rw == WRITE)) {
> >                 bio_io_error(bio);
> >                 return;
> >         }
> >
> > Read requests can submit to the last working device. Right?
> >
> > Best Regards
> > Xiao
> >
>
> Yeah, after MD_BROKEN is set, reads are forbidden but writes remain allowed.
Hmm, isn't it the other way around? Write requests are forbidden and read
requests are allowed now. If MD_BROKEN is set, write requests return
immediately after bio_io_error().
Regards
Xiao
> IMO we preserve the RAID array in this state to enable users to retrieve
> stored data, not to continue using it. However, continued writes to the
> array will cause subsequent errors to fail to be logged, either due to
> failfast or the badblocks being full. Read errors have no impact as they do
> not damage the original data.
>
> --
> Thanks,
> Nan
>