[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z5eOIQ4tDJr8N4UR@pathway.suse.cz>
Date: Mon, 27 Jan 2025 14:46:09 +0100
From: Petr Mladek <pmladek@...e.com>
To: Yafang Shao <laoar.shao@...il.com>
Cc: jpoimboe@...nel.org, jikos@...nel.org, mbenes@...e.cz,
joe.lawrence@...hat.com, live-patching@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 0/2] livepatch: Add support for hybrid mode
On Mon 2025-01-27 14:35:24, Yafang Shao wrote:
> The atomic replace livepatch mechanism was introduced to handle scenarios
> where we want to unload a specific livepatch without unloading others.
> However, its current implementation has significant shortcomings, making
> it less than ideal in practice. Below are the key downsides:
>
> - It is expensive
>
> During testing with frequent replacements of an old livepatch, random RCU
> warnings were observed:
>
> [19578271.779605] rcu_tasks_wait_gp: rcu_tasks grace period 642409 is 10024 jiffies old.
> [19578390.073790] rcu_tasks_wait_gp: rcu_tasks grace period 642417 is 10185 jiffies old.
> [19578423.034065] rcu_tasks_wait_gp: rcu_tasks grace period 642421 is 10150 jiffies old.
> [19578564.144591] rcu_tasks_wait_gp: rcu_tasks grace period 642449 is 10174 jiffies old.
> [19578601.064614] rcu_tasks_wait_gp: rcu_tasks grace period 642453 is 10168 jiffies old.
> [19578663.920123] rcu_tasks_wait_gp: rcu_tasks grace period 642469 is 10167 jiffies old.
> [19578872.990496] rcu_tasks_wait_gp: rcu_tasks grace period 642529 is 10215 jiffies old.
> [19578903.190292] rcu_tasks_wait_gp: rcu_tasks grace period 642529 is 40415 jiffies old.
> [19579017.965500] rcu_tasks_wait_gp: rcu_tasks grace period 642577 is 10174 jiffies old.
> [19579033.981425] rcu_tasks_wait_gp: rcu_tasks grace period 642581 is 10143 jiffies old.
> [19579153.092599] rcu_tasks_wait_gp: rcu_tasks grace period 642625 is 10188 jiffies old.
>
> This indicates that atomic replacement can cause performance issues,
> particularly with RCU synchronization under frequent use.
Please, provide more details about the test:
+ List of patched functions.
+ What exactly is meant by frequent replacements (busy loop?, once a minute?)
+ Size of the systems (number of CPUs, number of running processes)
+ Were there any extra changes in the livepatch code code,
e.g. debugging messages?
> - Potential Risks During Replacement
>
> One known issue involves replacing livepatched versions of critical
> functions such as do_exit(). During the replacement process, a panic
> might occur, as highlighted in [0].
I would rather prefer to make the atomic replace safe. I mean to
block the transition until all exiting processes are not gone.
Josh made a good point. We should look how this is handled by
RCU, tracing, or another subsystems which might have similar
problems.
> Other potential risks may also arise
> due to inconsistencies or race conditions during transitions.
What inconsistencies and race conditions you have in mind, please?
> - Temporary Loss of Patching
>
> During the replacement process, the old patch is set to a NOP (no-operation)
> before the new patch is fully applied. This creates a window where the
> function temporarily reverts to its original, unpatched state. If the old
> patch fixed a critical issue (e.g., one that prevented a system panic), the
> system could become vulnerable to that issue during the transition.
This is not true!
Please, look where klp_patch_object() and klp_unpatch_objects() is
called. Also look at how ops->func_stack is handled in
klp_ftrace_handler().
Also you might want to read Documentation/livepatch/livepatch.rst
> The current atomic replacement approach replaces all old livepatches,
> even when such a sweeping change is unnecessary. This can be improved
> by introducing a hybrid mode, which allows the coexistence of both
> atomic replace and non atomic replace livepatches.
>
> In the hybrid mode:
>
> - Specific livepatches can be marked as "non-replaceable" to ensure they
> remain active and unaffected during replacements.
>
> - Other livepatches can be marked as "replaceable", allowing targeted
> replacements of only those patches.
Honestly, I consider this as a workaround for a problem which should
be fixed a proper way.
The main advantage of the atomic replace is simplify the maintenance
and debugging. It reduces the amount of possible combinations. The
hybrid mode brings back the jungle.
Best Regards,
Petr
Powered by blists - more mailing lists