linux-kernel - Re: [PATCH v1 18/19] Documentation/livepatch: Update documentation for state, callbacks, and shadow variables

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z8rmCritDCtNmw64@pathway.suse.cz>
Date: Fri, 7 Mar 2025 13:26:50 +0100
From: Petr Mladek <pmladek@...e.com>
To: Joe Lawrence <joe.lawrence@...hat.com>
Cc: Nicolai Stange <nstange@...e.de>, live-patching@...r.kernel.org,
	linux-kernel@...r.kernel.org, Josh Poimboeuf <jpoimboe@...nel.org>,
	Miroslav Benes <mbenes@...e.cz>
Subject: Re: [PATCH v1 18/19] Documentation/livepatch: Update documentation
 for state, callbacks, and shadow variables

On Thu 2025-03-06 17:54:41, Joe Lawrence wrote:
> On 1/15/25 03:24, Petr Mladek wrote:
> > This commit updates the livepatch documentation to reflect recent changes
> > in the behavior of states, callbacks, and shadow variables.
> > 
> > Key changes include:
> > 
> > - Per-state callbacks replace per-object callbacks, invoked only when a
> >   livepatch introduces or removes a state.
> > - Shadow variable lifetime is now tied to the corresponding livepatch
> >   state lifetime.
> > - The "version" field in `struct klp_state` has been replaced with the
> >   "block_disable" flag for improved compatibility handling.
> > - The "data" field has been removed from `struct klp_state`; shadow
> >   variables are now the recommended way to store state-related data.
> > 
> > This update ensures the documentation accurately describes the current
> > livepatch functionality.
> > 
> > Signed-off-by: Petr Mladek <pmladek@...e.com>
> 
> Hi Petr, I'm finally getting around to looking through these
> patches.

Great, thanks a lot for jumping on it.

> I've made it as far as the selftest cleanups, then skipped ahead to the
> Documentation here.  Before diving into code review, I just wanted to
> clarify some things for my own understanding.  Please correct anything
> below that is incorrect.  IMHO it is easy to step off the intended path
> here :D
> 
> The original livepatch system states operated with a numeric .version
> field.  New livepatches could only be loaded if providing a new set of
> states, or an equal-or-better version of states already loaded by
> existing livepatches.
>
> With that in mind, a livepatch state could be thought of as an
> indication of "a context needing special handling in a (versioned) way".

I am not sure about the word "context". But it might be because I am
not a native speaker.

In my view, a state represents a side effect, made by loading
the livepatch module (enabling livepatch), which would need a special
handling when the livepatch module gets disabled or replaced.

There are currently two types of these side effects:

  + Changes made by callbacks which have to get reverted when
    the livepatch gets disabled.

  + Shadow variables which have to be freed when the livepatch
    gets disabled.

The states API was added to handle compatibility between more
livepatches. I primary had the atomic replace mode in mind.

Changes made by callbacks, and shadow variables added, by
the current livepatch usually should get preserved during
the atomic replace.

>  Livepatches claiming to deal with a particular state, needed to do so
> with its latest or current versioning.

The original approach was affected by the behavior of per-object callbacks
during the atomic replace. Note that:

	Only per-object callbacks from the _new_ livepatch were
	called during the atomic replace.

As a result, previously, new livepatch, using the atomic replace, had to be
able to revert changes made by the older livepatches when it did
not want to keep them.

This shows nicely that the original semantic was broken! There was
no obvious way how to revert obsolete states using the atomic
replace.

The new livepatch needed to provide callbacks which were able to
revert the obsoleted state. A workaround was to define it as
the same state with a higher version. The same state and
higher version made the livepatch compatible. The higher
version signalized that it was a new variant (a revert) so that
newer livepatches did not do the revert again...


> A livepatch without a particular
> state was not bound to any restriction on that state, so nothing
> prevented subsequent atomic replace patches from blowing away existing
> states (those patches cleaned up their states on their disabling),

I am confused here. Is the talking about the original or new
semantic?

In the original semantic, only callbacks from the new livepatch
were called during the atomic replace. Livepatches were
compatible only when the new livepatch supported all
existing states.

In the new semantic, the callbacks from the obsolete livepatch
are called for obsolete states. The new livepatch does not need
to know about the state. By other words, the replaced livepatch
can clean up its own mess.

> subsequent non-atomic replace patches from adding to the collective
> livepatch state.

Honestly, I do not think much about the non-atomic livepatches.
It would require input from people who use this approach.
It looks like a wild word to me ;-)

I allowed to use the same states for more non-atomic livepatches
because it might make sense to share shadow variables. Also more
livepatches might depend on the same change made by callbacks
and it need not matter which one is installed first.


> This patchset does away with .version and adds .block_disable.  This is
> very different from versioning as prevents the associated state from
> ever going away.  In an atomic-replace series of livepatches, this means
> once a state is introduced, all following patches must contain that
> state.  In non-atomic-replace series, there is no restriction on
> subsequent patches, but the original patch introducing the state cannot
> ever be disabled/unloaded.  (I'm not going to consider future hybrid
> mixed atomic/not use cases as my brain is already full.)

Yes, this describes the old behavior very well. And the impossibility
to remove existing states using the atomic replace was one of the problems.

The API solves this elegantly because it calls callbacks from
the replaced livepatch for the obsolete states. The livepatch
needed to implement these callbacks anyway to support the disable
livepatch operation.

And there is still the option to do not implement the reverse
operation when it is not easy or safe. The author could set
the new .block_disable flag. It blocks disabling the state.
Which blocks disabling the livepatch or replacing it with
a livepatch which does not support, aka is not compatible with,
the state.


> Finally, the patchset adds .is_shadow and .callbacks.  A short sequence
> of livepatches may look like:
> 
>   klp_patch A               |  klp_patch B
>     .states[x]              |    .states[y]
>       .id            = 42   |      .id            = 42
>       .callbacks            |      .callbacks
>       .block_disable        |      .block_disable
>       .is_shadow            |      .is_shadow
> 
> is there any harm or confusion if the two patches' state 42 contained
> disparate .callbacks, .block_disable, or .is_shadow contents?

Yes, two incompatible states with the same .id would break things.
The callbacks won't be called and the old shadow variables
won't get freed during an atomic replace.

It is responsibility of the author of the livepatches to use
different .id for different states.

I am not sure if we could prevent mistakes. Hmm, we might add
a check that every .id is there only once in the patch.states[] array.
Also we could add a human readable .name of the state and ensure
that it is the same. Or something like this.


> I /think/ this is allowed by the patchset (though I haven't gotten too
> deep in my understanding), but I feel that I'm starting to stretch my
> intuitive understanding of these livepatching states.  Applying them to
> a series of atomic-replace livepatches is fairly straightforward.  But
> then for a non-atomic-replace series, it starts getting weird as
> multiple callback sets will exist in multiple patches.

Yeah, as I said. The non-atomic-replace world is a kind of jungle.
It would require some real life users which might define some
sane rules.

> In a perfect world, we could describe livepatching states absent
> callbacks and shadow variables.  The documentation is a bit circular as
> one needs to understand them before fully grasping the purpose of the
> states.  But to use them, you will first need to understand how to set
> them up in the states. :)  Maybe there is no better way, but first I
> need to correct my understanding.

Yeah, the documentation would deserve some bigger refactoring. We created
separate file for each feature but it is not easy to get the full picture.

I hope that my answer helped a bit to understand the change.
Feel free to ask.

Best Regards,
Petr