[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALOAHbCJRce9-VYNgUO5szU4kMSktXyvkY9+ZFX_kyVXeoQ1ig@mail.gmail.com>
Date: Thu, 6 Feb 2025 10:35:11 +0800
From: Yafang Shao <laoar.shao@...il.com>
To: Petr Mladek <pmladek@...e.com>
Cc: jpoimboe@...nel.org, jikos@...nel.org, mbenes@...e.cz,
joe.lawrence@...hat.com, live-patching@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 2/2] livepatch: Implement livepatch hybrid mode
On Thu, Feb 6, 2025 at 12:03 AM Petr Mladek <pmladek@...e.com> wrote:
>
> On Wed 2025-02-05 10:54:47, Yafang Shao wrote:
> > On Tue, Feb 4, 2025 at 9:21 PM Petr Mladek <pmladek@...e.com> wrote:
> > >
> > > On Mon 2025-01-27 23:34:50, Yafang Shao wrote:
> > > > On Mon, Jan 27, 2025 at 10:31 PM Petr Mladek <pmladek@...e.com> wrote:
> > > > >
> > > > > On Mon 2025-01-27 14:35:26, Yafang Shao wrote:
> > > > > > The atomic replace livepatch mechanism was introduced to handle scenarios
> > > > > > where we want to unload a specific livepatch without unloading others.
> > > > > > However, its current implementation has significant shortcomings, making
> > > > > > it less than ideal in practice. Below are the key downsides:
> > > > >
> > > > > [...]
> > > > >
> > > > > > In the hybrid mode:
> > > > > >
> > > > > > - Specific livepatches can be marked as "non-replaceable" to ensure they
> > > > > > remain active and unaffected during replacements.
> > > > > >
> > > > > > - Other livepatches can be marked as "replaceable," allowing targeted
> > > > > > replacements of only those patches.
> > > > > >
> > > > > > This selective approach would reduce unnecessary transitions, lower the
> > > > > > risk of temporary patch loss, and mitigate performance issues during
> > > > > > livepatch replacement.
> > > > > >
> > > > > > --- a/kernel/livepatch/core.c
> > > > > > +++ b/kernel/livepatch/core.c
> > > > > > @@ -658,6 +658,8 @@ static int klp_add_nops(struct klp_patch *patch)
> > > > > > klp_for_each_object(old_patch, old_obj) {
> > > > > > int err;
> > > > > >
> > > > > > + if (!old_patch->replaceable)
> > > > > > + continue;
> > > > >
> > > > > This is one example where things might get very complicated.
> > > >
> > > > Why does this example even exist in the first place?
> > > > If hybrid mode is enabled, this scenario could have been avoided entirely.
> > >
> >
> > You have many questions, but it seems you haven’t taken the time to read even
> > a single line of this patchset. I’ll try to address them to save you some time.
>
> What?
Apologies if my words offended you.
>
> > > How exactly this scenario could be avoided, please?
> > >
> > > In the real life, livepatches are used to fix many bugs.
> > > And every bug is fixed by livepatching several functions.
> > >
> > > Fix1 livepatches: funcA
> > > Fix2 livepatches: funcA, funcB
> > > Fix3 livepatches: funcB
> > >
> > > How would you handle this with the hybrid model?
> >
> > In your scenario, each Fix will replace the applied livepatches, so
> > they must be set to replaceable.
> > To clarify the hybrid model, I'll illustrate it with the following Fixes:
> >
> > Fix1 livepatches: funcA
> > Fix2 livepatches: funcC
> > Fix3 livepatches: funcA, funcB
>
> How does look the livepatched FuncA here?
> Does it contain changes only for Fix3?
> Or is it cummulative livepatches_funA includes both Fix1 + Fix3?
It is cumulative livepatches_funA includes both Fix1 + Fix3.
>
> > Fix4 livepatches: funcD
> > Fix5 livepatches: funcB
> >
> > Each FixN represents a single /sys/kernel/livepatch/FixN.
> > By default, all Fixes are set to non-replaceable.
> >
> > Step-by-step process:
> > 1. Loading Fix1
> > Loaded Fixes: Fix1
> > 2. Loading Fix2
> > Loaded Fixes: Fix1 Fix2
> > 3. Loading Fix3
> > Before loading, set Fix1 to replaceable and replace it with Fix3,
> > which is a cumulative patch of Fix1 and Fix3.
>
> Who will modify the "replaceable" flag of "Fix1"? Userspace or kernel?
Userspace scripts. Modify it before loading a new one.
>
> How the livepatch modules would get installed/removed to/from the
> filesystem?
It doesn't make any difference.
>
> We (SUSE) distribute the livepatches using RPM packages. Every new
> version of the livepatch module is packaged in a RPM package with
> the same name and higher version. A post install script loads
> the module into the kernel and removes disabled livepatch modules.
Similar to us.
>
> The package update causes that the new version of the livepatch module
> is enabled and the old version is removed. And also the old version
> of the module and is removed from the filesystem together with the old
> version of the RPM package.
>
> This matches the behavior of the atomic replace. There is always only
> one version of the livepatch RPM package installed and only one
> livepatch module loaded/enabled. And when it works correcly then
> the version of the installed package matches the version of the loaded
> livepatch module.
>
> This might be hard to achieve with the hybrid model. Every livepatch
> module would need to be packaged in a separate (different name)
> RPM package.
Before switching to atomic replace, we packaged all the livepatch
modules into a single RPM. The RPM installation handled them quite
well.
> And some userspace service would need to clean up both
> kernel modules and livepatch RPM packages (remove the unused ones).
>
> This might add a lot of extra complexity.
It's not that complex—just like what we did before atomic replacement
was enabled.
>
> > Loaded Fixes: Fix2 Fix3
> > 4. Loading Fix4
> > Loaded Fixes: Fix2 Fix3 Fix4
> > 5. Loading Fix5
> > Similar to Step 3, set Fix3 to replaceable and replace it with Fix5.
> > Loaded Fixes: Fix2 Fix4 Fix5
>
> Let's imagine another situation:
>
> Fix1 livepatches: funcA, funcB
> Fix2 livepatches: funcB, funcC
> Fix3 livepatches: funcA, funcC
>
> Variant A:
>
> 1. Loading Fix1
> Loaded Fixes: Fix1
> In use:: funcA_fix1, funcB_fix1
>
> 2. Loading Fix2
> Loaded Fixes: Fix1 Fix2
> In use: funcA_fix1, funcB_fix2, funcC_fix2
>
> 3. Loading Fix3
> Loaded Fixes: Fix2 Fix3
> In use: funcA_fix3, funcB_fix2, funcC_fix3
>
> This will be correct only when:
>
> + funcA_fix3 contains both changes from Fix1 and Fix3
> + funcC_fix3 contains both changes from Fix2 and Fix3
>
>
> Variant B:
>
> 1. Loading Fix1
> Loaded Fixes: Fix1
> In use:: funcA_fix1, funcB_fix1
>
> 2. Loading Fix2 (fails from some reason or is skipped)
> Loaded Fixes: Fix1
> In use:: funcA_fix1, funcB_fix1
>
> 3. Loading Fix3
> Loaded Fixes: Fix1 Fix2
> In use: funcA_fix3, funcB_fix1, funcC_fix3
>
> This will be correct only when:
>
> + funcA_fix3 contains both changes from Fix1 and Fix3
> and stays funcB_fix1 is compatible with funcA_fix3
> + funcC_fix3 contains changes only from Fix3,
> it must be compatible with the original funcB because
>
> I want to say that this would work only when all livepatches
> are loaded in the right order. Otherwise, it might break
> the system.
>
> How do you want to ensure this?
As we discussed earlier, if there’s an overlap between two
livepatches, we merge them into a cumulative patch and replace the old
one.
>
> Is it really that easy?
>
> > 3. Loading Fix3
> > Before loading, set Fix1 to replaceable and replace it with Fix3,
>
> > This hybrid model ensures that irrelevant Fixes remain unaffected
> > during replacements.
>
> It makes some some now. But IMHO, it is not as easy as it looks.
> The complexity is in details.
>
> > >
> > > Which fix will be handled by livepatches installed in parallel?
> > > Which fix will be handled by atomic replace?
> > > How would you keep it consistent?
> > >
> > > How would it work when there are hundreds of fixes and thousands
> > > of livepatched functions?
> >
> > The key point is that if a new Fix modifies a function already changed
> > by previous Fixes, all the affected old Fixes should be set to
> > replaceable, merged into the new Fix, and then the old Fixes should be
> > replaced with the new one.
>
> As I tried to explain above. This might be hard to use in practice.
>
> We would either need to enforce loading all livepatches in the right
> order. It might be hard to make it user friendly.
>
> Or it would need a lot of extra code which would ensure that only
> compatible livepatches can be loaded.
That’s related to the configuration. If you decide to use it, you
should familiarize yourself with the best practices. Once you
understand those, it becomes much simpler and less complex.
BTW, based on my experience, the likelihood of overlapping between two
livepatches is very low in real production environments. I haven’t
encountered that case so far in our production servers.
--
Regards
Yafang
Powered by blists - more mailing lists