Message-ID: <e32e34aa-bc07-25bd-f361-424ec01c14d1@huaweicloud.com>
Date: Tue, 6 Jan 2026 17:11:23 +0800
From: Li Nan <linan666@...weicloud.com>
To: Xiao Ni <xni@...hat.com>, Li Nan <linan666@...weicloud.com>
Cc: Kenta Akagi <k@...l.me>, Song Liu <song@...nel.org>,
 Yu Kuai <yukuai@...as.com>, Shaohua Li <shli@...com>,
 Mariusz Tkaczyk <mtkaczyk@...nel.org>, linux-raid@...r.kernel.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH v6 1/2] md: Don't set MD_BROKEN for RAID1 and RAID10 when
 using FailFast



On 2026/1/6 15:59, Xiao Ni wrote:
> On Tue, Jan 6, 2026 at 10:57 AM Li Nan <linan666@...weicloud.com> wrote:
>>
>>
>>
>> On 2026/1/5 22:40, Kenta Akagi wrote:
>>> After commit 9631abdbf406 ("md: Set MD_BROKEN for RAID1 and RAID10"),
>>> if the error handler is called on the last rdev in RAID1 or RAID10,
>>> the MD_BROKEN flag will be set on that mddev.
>>> When MD_BROKEN is set, write bios to the md will result in an I/O error.
>>>
>>> This causes a problem when using FailFast.
>>> The current implementation of FailFast expects the array to continue
>>> functioning without issues even after calling md_error for the last
>>> rdev.  Furthermore, due to the nature of its functionality, FailFast may
>>> call md_error on all rdevs of the md. Even if retrying I/O on an rdev
>>> would succeed, it first calls md_error before retrying.
>>>
>>> To fix this issue, this commit ensures that for RAID1 and RAID10, if the
>>> last In_sync rdev has the FailFast flag set and the mddev's fail_last_dev
>>> is off, the MD_BROKEN flag will not be set on that mddev.
>>>
>>> This change impacts userspace. After this commit, if the rdev has the
>>> FailFast flag, the mddev is never marked broken, even if the failing bio
>>> is not FailFast. However, it's unlikely that any setup using FailFast
>>> expects the array to halt when md_error is called on the last rdev.
>>>
>>
>> In the current RAID design, when an IO error occurs, RAID ensures faulty
>> data is not read via the following actions:
>> 1. Mark the badblocks (no FailFast flag); if this fails,
>> 2. Mark the disk as Faulty.
>>
>> If neither action is taken, and BROKEN is not set to prevent continued use
>> of the RAID, errors on the last remaining disk will be ignored. Subsequent
>> reads may return incorrect data. That seems like a more serious issue to me.
>>
>> In scenarios with a large number of transient IO errors, is FailFast not a
>> suitable configuration? As you mentioned: "retrying I/O on an rdev would
>> succeed".
> 
> Hi Nan
> 
> According to my understanding, the policy here is to try to keep raid
> work if io error happens on the last device. It doesn't set faulty on
> the last in_sync device. It only sets MD_BROKEN to forbid write
> requests. But it still can read data from the last device.
> 
> static void raid1_error(struct mddev *mddev, struct md_rdev *rdev)
> {
> 
>      if (test_bit(In_sync, &rdev->flags) &&
>          (conf->raid_disks - mddev->degraded) == 1) {
>          set_bit(MD_BROKEN, &mddev->flags);
> 
>          if (!mddev->fail_last_dev) {
>              return;  // return directly here
>          }
> 
> 
> 
> static void md_submit_bio(struct bio *bio)
> {
>      if (unlikely(test_bit(MD_BROKEN, &mddev->flags)) && (rw == WRITE)) {
>          bio_io_error(bio);
>          return;
>      }
> 
> Read requests can submit to the last working device. Right?
> 
> Best Regards
> Xiao
> 

Yeah, after MD_BROKEN is set, writes are forbidden but reads remain allowed.
IMO we preserve the RAID array in this state so users can retrieve the
stored data, not so they can keep using it. However, continued writes to
the array would leave subsequent errors unrecorded, either because of
failfast or because the badblocks table is full. Read errors have no
impact, as they do not damage the original data.

-- 
Thanks,
Nan

