linux-kernel - Re: [PATCH v6 1/2] md: Don't set MD_BROKEN for RAID1 and RAID10 when using FailFast


Message-ID: <3a3f341e-fe6f-1c5c-a435-beafc9307ea1@huaweicloud.com>
Date: Tue, 6 Jan 2026 19:14:35 +0800
From: Li Nan <linan666@...weicloud.com>
To: Xiao Ni <xni@...hat.com>, Li Nan <linan666@...weicloud.com>
Cc: Kenta Akagi <k@...l.me>, Song Liu <song@...nel.org>,
 Yu Kuai <yukuai@...as.com>, Shaohua Li <shli@...com>,
 Mariusz Tkaczyk <mtkaczyk@...nel.org>, linux-raid@...r.kernel.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH v6 1/2] md: Don't set MD_BROKEN for RAID1 and RAID10 when
 using FailFast



On 2026/1/6 17:25, Xiao Ni wrote:
> On Tue, Jan 6, 2026 at 5:11 PM Li Nan <linan666@...weicloud.com> wrote:
>>
>>
>>
>> On 2026/1/6 15:59, Xiao Ni wrote:
>>> On Tue, Jan 6, 2026 at 10:57 AM Li Nan <linan666@...weicloud.com> wrote:
>>>>
>>>>
>>>>
>>>> On 2026/1/5 22:40, Kenta Akagi wrote:
>>>>> After commit 9631abdbf406 ("md: Set MD_BROKEN for RAID1 and RAID10"),
>>>>> if the error handler is called on the last rdev in RAID1 or RAID10,
>>>>> the MD_BROKEN flag will be set on that mddev.
>>>>> When MD_BROKEN is set, write bios to the md will result in an I/O error.
>>>>>
>>>>> This causes a problem when using FailFast.
>>>>> The current implementation of FailFast expects the array to continue
>>>>> functioning without issues even after calling md_error for the last
>>>>> rdev.  Furthermore, due to the nature of its functionality, FailFast may
>>>>> call md_error on all rdevs of the md. Even if retrying I/O on an rdev
>>>>> would succeed, it first calls md_error before retrying.
>>>>>
>>>>> To fix this issue, this commit ensures that for RAID1 and RAID10, if the
>>>>> last In_sync rdev has the FailFast flag set and the mddev's fail_last_dev
>>>>> is off, the MD_BROKEN flag will not be set on that mddev.
>>>>>
>>>>> This change impacts userspace. After this commit, if the rdev has the
>>>>> FailFast flag, the mddev is never marked broken, even if the failing bio
>>>>> is not FailFast. However, it's unlikely that any setup using FailFast
>>>>> expects the array to halt when md_error is called on the last rdev.
>>>>>
>>>>
>>>> In the current RAID design, when an IO error occurs, RAID ensures faulty
>>>> data is not read via the following actions:
>>>> 1. Mark the badblocks (no FailFast flag); if this fails,
>>>> 2. Mark the disk as Faulty.
>>>>
>>>> If neither action is taken, and BROKEN is not set to prevent continued RAID
>>>> use, errors on the last remaining disk will be ignored. Subsequent reads
>>>> may return incorrect data. This seems like a more serious issue in my opinion.
>>>>
>>>> In scenarios with a large number of transient IO errors, is FailFast not a
>>>> suitable configuration? As you mentioned: "retrying I/O on an rdev would
>>>> succeed".
>>>
>>> Hi Nan
>>>
>>> As I understand it, the policy here is to try to keep the RAID
>>> working if an I/O error happens on the last device. It doesn't set Faulty
>>> on the last In_sync device. It only sets MD_BROKEN to forbid write
>>> requests, but data can still be read from the last device.
>>>
>>> static void raid1_error(struct mddev *mddev, struct md_rdev *rdev)
>>> {
>>>
>>>       if (test_bit(In_sync, &rdev->flags) &&
>>>           (conf->raid_disks - mddev->degraded) == 1) {
>>>           set_bit(MD_BROKEN, &mddev->flags);
>>>
>>>           if (!mddev->fail_last_dev) {
>>>               return;  // return directly here
>>>           }
>>>
>>>
>>>
>>> static void md_submit_bio(struct bio *bio)
>>> {
>>>       if (unlikely(test_bit(MD_BROKEN, &mddev->flags)) && (rw == WRITE)) {
>>>           bio_io_error(bio);
>>>           return;
>>>       }
>>>
>>> Read requests can still be submitted to the last working device, right?
>>>
>>> Best Regards
>>> Xiao
>>>
>>
>> Yeah, after MD_BROKEN is set, read are forbidden but writes remain allowed.
> 
> Hmm, isn't it the other way around? Write requests are forbidden and read
> requests are allowed now. If MD_BROKEN is set, write requests return
> directly after bio_io_error.
> 
> Regards
> Xiao
> 

Apologies for the typo... The rest of the content was written with this
exact meaning in mind.

>> IMO we preserve the RAID array in this state so that users can retrieve
>> stored data, not so that they can continue using it. Continued writes to
>> the array would mean that subsequent errors are no longer recorded, either
>> because of failfast or because the badblocks list is full. Read errors have
>> no impact, as they do not damage the original data.
>>
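
For readers following the thread, the behaviour under discussion can be
sketched as a small userspace model. This is not kernel code: the names
model_mddev, model_rdev, model_error and model_submit_allowed, as well as the
skip_broken_for_failfast switch, are hypothetical, and the logic is only an
assumption drawn from the raid1_error and md_submit_bio excerpts quoted above
and from the patch description.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the mddev/rdev fields mentioned in the thread. */
struct model_mddev {
    int  raid_disks;     /* configured number of member devices */
    int  degraded;       /* number of members already missing or failed */
    bool fail_last_dev;  /* policy: may the last device be failed? */
    bool broken;         /* stands in for the MD_BROKEN flag */
};

struct model_rdev {
    bool in_sync;        /* stands in for the In_sync flag */
    bool faulty;         /* stands in for the Faulty flag */
    bool failfast;       /* stands in for the FailFast flag */
};

enum model_dir { MODEL_READ, MODEL_WRITE };

/* True when rdev is the only in-sync member left. */
static bool model_is_last_dev(const struct model_mddev *m,
                              const struct model_rdev *r)
{
    return r->in_sync && (m->raid_disks - m->degraded) == 1;
}

/*
 * Model of the error handler. With skip_broken_for_failfast == false this
 * follows the quoted excerpt: the broken flag is set for the last device
 * and, if fail_last_dev is off, the device is left in place. With it set
 * to true, it follows the patch description: a FailFast last device with
 * fail_last_dev off does not mark the array broken.
 * Returns true if the device was marked faulty.
 */
static bool model_error(struct model_mddev *m, struct model_rdev *r,
                        bool skip_broken_for_failfast)
{
    if (model_is_last_dev(m, r)) {
        if (!m->fail_last_dev) {
            if (!(skip_broken_for_failfast && r->failfast))
                m->broken = true;
            return false;            /* last device is kept */
        }
        m->broken = true;
    }
    if (r->in_sync) {
        r->in_sync = false;
        m->degraded++;
    }
    r->faulty = true;
    return true;
}

/* Model of the submit-time check: a broken array rejects writes only. */
static bool model_submit_allowed(const struct model_mddev *m,
                                 enum model_dir dir)
{
    return !(m->broken && dir == MODEL_WRITE);
}
```

In this model, an error on the last in-sync device leaves reads permitted in
both variants. The variants differ only in whether later writes are rejected,
which is the point being debated above.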

-- 
Thanks,
Nan