linux-kernel - Re: [RFC PATCH 2/2] livepatch: Implement livepatch hybrid mode

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALOAHbBWKL5MJz3DB+y02oqOrxy5xa3WZwTg0JPpqeQsMSVXmA@mail.gmail.com>
Date: Wed, 5 Feb 2025 10:54:47 +0800
From: Yafang Shao <laoar.shao@...il.com>
To: Petr Mladek <pmladek@...e.com>
Cc: jpoimboe@...nel.org, jikos@...nel.org, mbenes@...e.cz, 
	joe.lawrence@...hat.com, live-patching@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 2/2] livepatch: Implement livepatch hybrid mode

On Tue, Feb 4, 2025 at 9:21 PM Petr Mladek <pmladek@...e.com> wrote:
>
> On Mon 2025-01-27 23:34:50, Yafang Shao wrote:
> > On Mon, Jan 27, 2025 at 10:31 PM Petr Mladek <pmladek@...e.com> wrote:
> > >
> > > On Mon 2025-01-27 14:35:26, Yafang Shao wrote:
> > > > The atomic replace livepatch mechanism was introduced to handle scenarios
> > > > where we want to unload a specific livepatch without unloading others.
> > > > However, its current implementation has significant shortcomings, making
> > > > it less than ideal in practice. Below are the key downsides:
> > >
> > > [...]
> > >
> > > > In the hybrid mode:
> > > >
> > > > - Specific livepatches can be marked as "non-replaceable" to ensure they
> > > >   remain active and unaffected during replacements.
> > > >
> > > > - Other livepatches can be marked as "replaceable," allowing targeted
> > > >   replacements of only those patches.
> > > >
> > > > This selective approach would reduce unnecessary transitions, lower the
> > > > risk of temporary patch loss, and mitigate performance issues during
> > > > livepatch replacement.
> > > >
> > > > --- a/kernel/livepatch/core.c
> > > > +++ b/kernel/livepatch/core.c
> > > > @@ -658,6 +658,8 @@ static int klp_add_nops(struct klp_patch *patch)
> > > >               klp_for_each_object(old_patch, old_obj) {
> > > >                       int err;
> > > >
> > > > +                     if (!old_patch->replaceable)
> > > > +                             continue;
> > >
> > > This is one example where things might get very complicated.
> >
> > Why does this example even exist in the first place?
> > If hybrid mode is enabled, this scenario could have been avoided entirely.
>

You have many questions, but it seems you haven’t taken the time to read even
a single line of this patchset. I’ll try to address them to save you some time.

> How exactly this scenario could be avoided, please?
>
> In the real life, livepatches are used to fix many bugs.
> And every bug is fixed by livepatching several functions.
>
> Fix1 livepatches: funcA
> Fix2 livepatches: funcA, funcB
> Fix3 livepatches: funcB
>
> How would you handle this with the hybrid model?

In your scenario, each Fix will replace the applied livepatches, so
they must be set to replaceable.
To clarify the hybrid model, I'll illustrate it with the following Fixes:

Fix1 livepatches: funcA
Fix2  livepatches: funcC
Fix3 livepatches: funcA, funcB
Fix4  livepatches: funcD
Fix5 livepatches: funcB

Each FixN represents a single /sys/kernel/livepatch/FixN.
By default, all Fixes are set to non-replaceable.

Step-by-step process:
1. Loading Fix1
   Loaded Fixes: Fix1
2. Loading Fix2
   Loaded Fixes: Fix1 Fix2
3. Loading Fix3
    Before loading, set Fix1 to replaceable and replace it with Fix3,
which is a cumulative patch of Fix1 and Fix3.
    Loaded Fixes:  Fix2 Fix3
4. Loading Fix4
    Loaded Fixes:  Fix2 Fix3 Fix4
5. Loading Fix5
    Similar to Step 3, set Fix3 to replaceable and replace it with Fix5.
    Loaded Fixes:  Fix2 Fix4 Fix5

This hybrid model ensures that irrelevant Fixes remain unaffected
during replacements.

>
> Which fix will be handled by livepatches installed in parallel?
> Which fix will be handled by atomic replace?
> How would you keep it consistent?
>
> How would it work when there are hundreds of fixes and thousands
> of livepatched functions?

The key point is that if a new Fix modifies a function already changed
by previous Fixes, all the affected old Fixes should be set to
replaceable, merged into the new Fix, and then the old Fixes should be
replaced with the new one.

>
> Where exactly is the advantage of the hybrid model?

That can be seen as a form of "deferred replacement"—in other words,
only replacing the old Fixes when absolutely necessary. This approach
can significantly reduce unnecessary work.

>
> > >
> > > The same function might be livepatched by more livepatches, see
> > > ops->func_stack. For example, let's have funcA and three livepatches:
> > > a
> > >   + lp1:
> > >         .replace = false,
> > >         .non-replace = true,
> > >         .func = {
> > >                         .old_name = "funcA",
> > >                         .new_func = lp1_funcA,
> > >                 }, { }
> > >
> > >   + lp2:
> > >         .replace = false,
> > >         .non-replace = false,
> > >         .func = {
> > >                         .old_name = "funcA",
> > >                         .new_func = lp2_funcA,
> > >                 },{
> > >                         .old_name = "funcB",
> > >                         .new_func = lp2_funcB,
> > >                 }, { }
> > >
> > >
> > >   + lp3:
> > >         .replace = true,
> > >         .non-replace = false,
> > >         .func = {
> > >                         .old_name = "funcB",
> > >                         .new_func = lp3_funcB,
> > >                 }, { }
> > >
> > >
> > > Now, apply lp1:
> > >
> > >       + funcA() gets redirected to lp1_funcA()
> > >
> > > Then, apply lp2
> > >
> > >       + funcA() gets redirected to lp2_funcA()
> > >
> > > Finally, apply lp3:
> > >
> > >       + The proposed code would add "nop()" for
> > >         funcA() because it exists in lp2 and does not exist in lp3.
> > >
> > >       + funcA() will get redirected to the original code
> > >         because of the nop() during transition
> > >
> > >       + nop() will get removed in klp_complete_transition() and
> > >         funcA() will get suddenly redirected to lp1_funcA()
> > >         because it will still be on ops->func_stack even
> > >         after the "nop" and lp2_funcA() gets removed.
> > >
> > >            => The entire system will start using another funcA
> > >               implementation at some random time
> > >
> > >            => this would violate the consistency model
> > >
> > >
> > > The proper solution might be tricky:
> > >
> > > 1. We would need to detect this situation and do _not_ add
> > >    the "nop" for lp3 and funcA().
> > >
> > > 2. We would need a more complicate code for handling the task states.
> > >
> > >    klp_update_patch_state() sets task->patch_state using
> > >    the global "klp_target_state". But in the above example,
> > >    when enabling lp3:
> > >
> > >     + funcA would need to get transitioned _backward_:
> > >          KLP_TRANSITION_PATCHED -> KLP_TRANSITION_UNPATCHED
> > >       , so that it goes on ops->func_stack:
> > >          lp2_funcA -> lp1->funcA
> > >
> > >    while:
> > >
> > >     + funcA would need to get transitioned forward:
> > >          KLP_TRANSITION_UNPATCHED -> KLP_TRANSITION_PATCHED
> > >       , so that it goes on ops->func_stack:
> > >          lp2_funcB -> lp3->funcB
> > >
> > >
> > > => the hybrid mode would complicate the life for both livepatch
> > >    creators/maintainers and kernel code developers/maintainers.
> > >
> >
> > I don’t believe they should be held responsible for poor
> > configurations. These settings could have been avoided from the start.
> > There are countless tunable knobs in the system—should we try to
> > accommodate every possible combination? No, that’s not how systems are
> > designed to operate. Instead, we should identify and follow best
> > practices to ensure optimal functionality.
>
> I am sorry but I do not understand the above paragraph at all.
>
> Livepatches modify functions.
> How is it related to system configuration or tunable knobs?

/sys/kernel/livepatch/FixN/replaceable is a tunable knob.

+static struct kobj_attribute replaceable_kobj_attr = __ATTR_RW(replaceable);

> What best practices are you talking, please?

As mentioned earlier.

--
Regards
Yafang