Message-ID: <CALTww2_MHcXCOjeOPha0+LHNiu8O_9P4jVYP=K5-ea951omfMw@mail.gmail.com>
Date: Thu, 6 Nov 2025 21:15:20 +0800
From: Xiao Ni <xni@...hat.com>
To: yukuai@...as.com
Cc: Li Nan <linan666@...weicloud.com>, corbet@....net, song@...nel.org, hare@...e.de,
linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-raid@...r.kernel.org, yangerkun@...wei.com, yi.zhang@...wei.com
Subject: Re: [PATCH v9 4/5] md: add check_new_feature module parameter
On Thu, Nov 6, 2025 at 8:49 PM Yu Kuai <yukuai@...as.com> wrote:
>
> Hi,
>
> On 2025/11/6 20:35, Xiao Ni wrote:
> > On Thu, Nov 6, 2025 at 11:45 AM Yu Kuai <yukuai@...as.com> wrote:
> >> Hi,
> >>
> >> On 2025/11/4 15:17, Xiao Ni wrote:
> >>> On Tue, Nov 4, 2025 at 10:52 AM Li Nan <linan666@...weicloud.com> wrote:
> >>>>
> >>>> On 2025/11/4 9:47, Xiao Ni wrote:
> >>>>> On Mon, Nov 3, 2025 at 9:06 PM <linan666@...weicloud.com> wrote:
> >>>>>> From: Li Nan <linan122@...wei.com>
> >>>>>>
> >>>>>> Raid checks if pad3 is zero when loading superblock from disk. Arrays
> >>>>>> created with new features may fail to assemble on old kernels as pad3
> >>>>>> is used.
> >>>>>>
> >>>>>> Add module parameter check_new_feature to bypass this check.
> >>>>>>
> >>>>>> Signed-off-by: Li Nan <linan122@...wei.com>
> >>>>>> ---
> >>>>>> drivers/md/md.c | 12 +++++++++---
> >>>>>> 1 file changed, 9 insertions(+), 3 deletions(-)
> >>>>>>
> >>>>>> diff --git a/drivers/md/md.c b/drivers/md/md.c
> >>>>>> index dffc6a482181..5921fb245bfa 100644
> >>>>>> --- a/drivers/md/md.c
> >>>>>> +++ b/drivers/md/md.c
> >>>>>> @@ -339,6 +339,7 @@ static int start_readonly;
> >>>>>> */
> >>>>>> static bool create_on_open = true;
> >>>>>> static bool legacy_async_del_gendisk = true;
> >>>>>> +static bool check_new_feature = true;
> >>>>>>
> >>>>>> /*
> >>>>>> * We have a system wide 'event count' that is incremented
> >>>>>> @@ -1850,9 +1851,13 @@ static int super_1_load(struct md_rdev *rdev, struct md_rdev *refdev, int minor_
> >>>>>>  	}
> >>>>>>  	if (sb->pad0 ||
> >>>>>>  	    sb->pad3[0] ||
> >>>>>> -	    memcmp(sb->pad3, sb->pad3+1, sizeof(sb->pad3) - sizeof(sb->pad3[1])))
> >>>>>> -		/* Some padding is non-zero, might be a new feature */
> >>>>>> -		return -EINVAL;
> >>>>>> +	    memcmp(sb->pad3, sb->pad3+1, sizeof(sb->pad3) - sizeof(sb->pad3[1]))) {
> >>>>>> +		pr_warn("Some padding is non-zero on %pg, might be a new feature\n",
> >>>>>> +			rdev->bdev);
> >>>>>> +		if (check_new_feature)
> >>>>>> +			return -EINVAL;
> >>>>>> +		pr_warn("check_new_feature is disabled, data corruption possible\n");
> >>>>>> +	}
> >>>>>>
> >>>>>>  	rdev->preferred_minor = 0xffff;
> >>>>>>  	rdev->data_offset = le64_to_cpu(sb->data_offset);
> >>>>>> @@ -10704,6 +10709,7 @@ module_param(start_dirty_degraded, int, S_IRUGO|S_IWUSR);
> >>>>>> module_param_call(new_array, add_named_array, NULL, NULL, S_IWUSR);
> >>>>>> module_param(create_on_open, bool, S_IRUSR|S_IWUSR);
> >>>>>> module_param(legacy_async_del_gendisk, bool, 0600);
> >>>>>> +module_param(check_new_feature, bool, 0600);
> >>>>>>
> >>>>>> MODULE_LICENSE("GPL");
> >>>>>> MODULE_DESCRIPTION("MD RAID framework");
> >>>>>> --
> >>>>>> 2.39.2
> >>>>>>
> >>>>> Hi
> >>>>>
> >>>>> Thanks for finding this problem in time. The default value of this
> >>>>> module parameter is true. I don't think people will check for new
> >>>>> module parameters after updating to a new kernel. They will find that
> >>>>> the array can't be assembled and report bugs. Since you already use
> >>>>> pad3, would it be better to remove the pad3 check here directly?
> >>>>>
> >>>>> By the way, have you run the regression tests?
> >>>>>
> >>>>> Regards
> >>>>> Xiao
> >>>>>
> >>>>>
> >>>> Hi Xiao.
> >>>>
> >>>> Thanks for your review.
> >>>>
> >>>> Deleting this check directly is risky. For example, with configurable LBS:
> >>>> if the user sets LBS to 4K, the LBS of a RAID array assembled on an old
> >>>> kernel becomes 512. Forcing use of this array then risks data loss, which
> >>>> is the original issue this feature wants to solve.
> >>> You're right, we can't delete the check.
> >>> On an old kernel, an array with a specified logical block size can't
> >>> be assembled. This patch still can't fix that problem, because the
> >>> failure is in the old kernel and this patch is for a new kernel, right?
> >>> Existing arrays don't have such problems. They can be assembled after
> >>> updating to a new kernel.
> >>> So, do we need this patch?
> >> We have a use case where a user creates the array with an old kernel, and
> >> then, if something bad happens in the system (possibly unrelated to the
> >> array), the user may update to a mainline release and later switch back to
> >> our release. We want a solution that lets the user keep using the array in
> >> this case.
> > Hi all
> >
> > Let me check if I understand right:
> > 1. a machine with an old kernel has problems
> > 2. update to a new kernel which has the new feature
> > 3. create an array with the new kernel
> > 4. switch back to the old kernel, so assembly fails because sb->pad3
> > is used and is not zero.
> >
> > The old kernel is right to do so. This should be expected, right?
>
> Not quite what I mean. For example:
> 1. an old kernel creates an array md0;
> 2. something bad happens (not related to md0), for example, a file system on another device crashes, or another array can't be assembled;
> 3. the user might update to a new kernel and try to copy data; however, md0 will be assembled and sb->pad3 will be set;
> 4. the user switches back to the old kernel, md0 fails to assemble, and it can't be used in the old kernel anymore.

In patch 05, the commit message says this:

Future mdadm should support setting LBS via metadata field during RAID
creation and the new sysfs. Though the kernel allows runtime LBS changes,
users should avoid modifying it after creating partitions or filesystems
to prevent compatibility issues.

So the logical block size can only be specified when creating an array.
In the case you mentioned above, in step 3, the array will be assembled
in the new kernel and sb->pad3 will not be set, right?

Regards
Xiao
>
> >
> >>>> Future features may also have similar risks, so instead of deleting this
> >>>> check directly, I chose to add a module parameter to give users a choice.
> >>>> What do you think?
> >>> Maybe we can add a feature bit to avoid the kernel parameter. This
> >>> feature bit can be set when specifying logical block size.
> >> The situation still stands: for an unknown feature bit, we had better
> >> forbid assembling the array by default to prevent data loss.
> > If I understand correctly, the old kernel already refuses to assemble it.
>
> The problem is that if an array was created by an old kernel and the user
> still wants to use it with the old kernel, then the user can't assemble this
> array in a new kernel. However, this is a real use case for us :(
>
> > Regards
> > Xiao
> >
> >> Thanks,
> >> Kuai
> >>
> >>> Regards
> >>> Xiao
> >>>> --
> >>>> Thanks,
> >>>> Nan
> >>>>
>
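
For readers following the thread, here is a minimal user-space sketch of the
check under discussion. It is not the kernel code: struct demo_sb,
demo_padding_is_nonzero() and demo_super_load() are hypothetical names, and
the pad3 size is picked only for illustration. It shows the memcmp() idiom
from the quoted super_1_load() hunk (comparing the array with itself shifted
by one element, which together with pad3[0] == 0 implies every byte is zero)
and how a check_new_feature flag would gate the -EINVAL return.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the padding fields of the superblock. */
struct demo_sb {
	uint32_t pad0;
	uint8_t pad3[32];	/* size is an assumption, for illustration only */
};

/* Mirrors the proposed module parameter; the default keeps the strict check. */
static bool check_new_feature = true;

/*
 * Returns true if any padding is non-zero.
 * pad3[0] == 0 and pad3[i] == pad3[i + 1] for all i means all bytes are zero.
 */
static bool demo_padding_is_nonzero(const struct demo_sb *sb)
{
	return sb->pad0 ||
	       sb->pad3[0] ||
	       memcmp(sb->pad3, sb->pad3 + 1,
		      sizeof(sb->pad3) - sizeof(sb->pad3[1])) != 0;
}

/* Rejects a superblock with non-zero padding unless the check is disabled. */
static int demo_super_load(const struct demo_sb *sb)
{
	if (demo_padding_is_nonzero(sb) && check_new_feature)
		return -EINVAL;
	return 0;
}
```

With the default (true), any non-zero padding byte rejects the superblock.
Clearing the flag lets the load proceed, which is the point where the patch
prints its "data corruption possible" warning.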