Message-ID: <0106019b97324465-29511120-f4d0-4ac7-88e0-d0b998071b79-000000@ap-northeast-1.amazonses.com>
Date: Wed, 7 Jan 2026 06:43:32 +0000
From: Kenta Akagi <k@...l.me>
To: Xiao Ni <xni@...hat.com>
Cc: k@...l.me, linan666@...weicloud.com, linux-raid@...r.kernel.org,
linux-kernel@...r.kernel.org, song@...nel.org, yukuai@...as.com,
shli@...com, mtkaczyk@...nel.org
Subject: Re: [PATCH v6 1/2] md: Don't set MD_BROKEN for RAID1 and RAID10
when using FailFast
Hi,
On 2026/01/07 12:35, Xiao Ni wrote:
> On Tue, Jan 6, 2026 at 8:30 PM Kenta Akagi <k@...l.me> wrote:
>>
>> Hi,
>> Thank you for reviewing.
>>
>> On 2026/01/06 11:57, Li Nan wrote:
>>>
>>>
>>> On 2026/1/5 22:40, Kenta Akagi wrote:
>>>> After commit 9631abdbf406 ("md: Set MD_BROKEN for RAID1 and RAID10"),
>>>> if the error handler is called on the last rdev in RAID1 or RAID10,
>>>> the MD_BROKEN flag will be set on that mddev.
>>>> When MD_BROKEN is set, write bios to the md will result in an I/O error.
>>>>
>>>> This causes a problem when using FailFast.
>>>> The current implementation of FailFast expects the array to continue
>>>> functioning without issues even after calling md_error for the last
>>>> rdev. Furthermore, due to the nature of its functionality, FailFast may
>>>> call md_error on all rdevs of the md. Even if retrying I/O on an rdev
>>>> would succeed, it first calls md_error before retrying.
>>>>
>>>> To fix this issue, this commit ensures that for RAID1 and RAID10, if the
>>>> last In_sync rdev has the FailFast flag set and the mddev's fail_last_dev
>>>> is off, the MD_BROKEN flag will not be set on that mddev.
>>>>
>>>> This change impacts userspace. After this commit, if the rdev has the
>>>> FailFast flag, the mddev never becomes broken even if the failing bio is
>>>> not FailFast. However, it is unlikely that any setup using FailFast
>>>> expects the array to halt when md_error is called on the last rdev.
>>>>
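For clarity, here is a minimal sketch of the condition described above. It is
not the actual patch and the helper name is made up; it only uses the existing
In_sync/FailFast rdev flags and mddev->fail_last_dev:

/*
 * Minimal sketch, not the actual patch: decide whether the RAID1/RAID10
 * error handler should still set MD_BROKEN when the failing rdev is the
 * last In_sync device of the array.
 */
static bool last_dev_should_break(struct mddev *mddev, struct md_rdev *rdev)
{
	/* Keep the pre-patch behaviour unless FailFast asks us not to. */
	if (!test_bit(FailFast, &rdev->flags) || mddev->fail_last_dev)
		return true;	/* set MD_BROKEN as before */

	/* FailFast rdev and fail_last_dev off: keep the array writable. */
	return false;
}
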
>>>
>>> In the current RAID design, when an IO error occurs, RAID ensures faulty
>>> data is not read via the following actions:
>>> 1. Mark the badblocks (no FailFast flag); if this fails,
>>> 2. Mark the disk as Faulty.
>>>
>>> If neither action is taken, and BROKEN is not set to prevent continued RAID
>>> use, errors on the last remaining disk will be ignored. Subsequent reads
>>> may return incorrect data. This seems like a more serious issue in my opinion.
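To make that cascade concrete, this is roughly the pattern involved; the
wrapper name is hypothetical, while rdev_set_badblocks() and md_error() are
the existing helpers:

/*
 * Hypothetical wrapper illustrating the cascade: first try to record
 * only the bad region on the member device; if that fails, escalate
 * and mark the whole device Faulty via md_error(), which is where the
 * last-device / MD_BROKEN question comes up.
 */
static void record_write_failure(struct mddev *mddev, struct md_rdev *rdev,
				 sector_t sector, int sectors)
{
	if (rdev_set_badblocks(rdev, sector, sectors, 0))
		return;		/* bad blocks recorded, rdev stays usable */

	md_error(mddev, rdev);	/* could not record: fail the device */
}
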
>>
>> I agree that data inconsistency can certainly occur in this scenario.
>>
>> However, a RAID1 with only one remaining rdev can be considered the same as
>> a plain disk. From that perspective, I do not believe it is the mandatory
>> responsibility of md raid to block subsequent writes or to prevent data
>> inconsistency in this situation.
>>
>> Commit 9631abdbf406 ("md: Set MD_BROKEN for RAID1 and RAID10"), which introduced
>> BROKEN for RAID1/10, does not seem to have been made for that purpose either.
>>
>>>
>>> In scenarios with a large number of transient IO errors, is FailFast not a
>>> suitable configuration? As you mentioned: "retrying I/O on an rdev would
>>
>> That seems right; using FailFast on top of an unstable lower layer is not a good idea.
>> However, since md raid is the issuer of FailFast bios,
>> I believe it is incorrect to shut down the array due to the failure of a FailFast bio.
>
> Hi all
>
> I understand @Li Nan's point now. The badblocks can't be recorded in
> this situation and the last working device is not marked Faulty. To be
> frank, I think data consistency is more important. Users don't
> think of it as a single disk; they expect raid1 to guarantee
> consistency.

Hmm, I see...

> But the write request should return an error when calling
> raid1_error for the last working device, right? So there is no
> consistency problem?
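Just to illustrate the behaviour you are referring to (illustrative only,
not the actual md write path):

/*
 * Illustrative sketch only, not the actual md code: once MD_BROKEN is
 * set on the mddev, a new write bio is completed with an error instead
 * of being submitted to the members.
 */
static bool fail_write_if_broken(struct mddev *mddev, struct bio *bio)
{
	if (test_bit(MD_BROKEN, &mddev->flags) && op_is_write(bio_op(bio))) {
		bio_io_error(bio);	/* caller sees -EIO */
		return true;		/* bio already handled (failed) */
	}
	return false;			/* proceed with normal submission */
}

So yes, once MD_BROKEN is set, subsequent writes fail; the point of this
patch is that with a FailFast last rdev we do not set it in the first place.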
>
> hi, Kenta. I have a question too. What will you do in your environment
> after the network connection works again? Add those disks one by one
> to do recovery?
Yes. We will have to add a new disk or remove and add the rdev marked as faulty.
Currently, the array has to be recreated because it is marked as broken.
Thanks,
Akagi
>
> Best Regards
> Xiao
>
>>
>> Thanks,
>> Akagi
>>
>>> succeed".
>>>
>>> --
>>> Thanks,
>>> Nan
>>>
>>>
>>
>
>