lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z5eOIQ4tDJr8N4UR@pathway.suse.cz>
Date: Mon, 27 Jan 2025 14:46:09 +0100
From: Petr Mladek <pmladek@...e.com>
To: Yafang Shao <laoar.shao@...il.com>
Cc: jpoimboe@...nel.org, jikos@...nel.org, mbenes@...e.cz,
	joe.lawrence@...hat.com, live-patching@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 0/2] livepatch: Add support for hybrid mode

On Mon 2025-01-27 14:35:24, Yafang Shao wrote:
> The atomic replace livepatch mechanism was introduced to handle scenarios
> where we want to unload a specific livepatch without unloading others.
> However, its current implementation has significant shortcomings, making
> it less than ideal in practice. Below are the key downsides:
> 
> - It is expensive
> 
>   During testing with frequent replacements of an old livepatch, random RCU
>   warnings were observed:
> 
>   [19578271.779605] rcu_tasks_wait_gp: rcu_tasks grace period 642409 is 10024 jiffies old.
>   [19578390.073790] rcu_tasks_wait_gp: rcu_tasks grace period 642417 is 10185 jiffies old.
>   [19578423.034065] rcu_tasks_wait_gp: rcu_tasks grace period 642421 is 10150 jiffies old.
>   [19578564.144591] rcu_tasks_wait_gp: rcu_tasks grace period 642449 is 10174 jiffies old.
>   [19578601.064614] rcu_tasks_wait_gp: rcu_tasks grace period 642453 is 10168 jiffies old.
>   [19578663.920123] rcu_tasks_wait_gp: rcu_tasks grace period 642469 is 10167 jiffies old.
>   [19578872.990496] rcu_tasks_wait_gp: rcu_tasks grace period 642529 is 10215 jiffies old.
>   [19578903.190292] rcu_tasks_wait_gp: rcu_tasks grace period 642529 is 40415 jiffies old.
>   [19579017.965500] rcu_tasks_wait_gp: rcu_tasks grace period 642577 is 10174 jiffies old.
>   [19579033.981425] rcu_tasks_wait_gp: rcu_tasks grace period 642581 is 10143 jiffies old.
>   [19579153.092599] rcu_tasks_wait_gp: rcu_tasks grace period 642625 is 10188 jiffies old.
>   
>   This indicates that atomic replacement can cause performance issues,
>   particularly with RCU synchronization under frequent use.

Please, provide more details about the test:

  + List of patched functions.

  + What exactly is meant by frequent replacements (busy loop?, once a minute?)

  + Size of the systems (number of CPUs, number of running processes)

  + Were there any extra changes in the livepatch code code,
    e.g. debugging messages?


> - Potential Risks During Replacement 
> 
>   One known issue involves replacing livepatched versions of critical
>   functions such as do_exit(). During the replacement process, a panic
>   might occur, as highlighted in [0].

I would rather prefer to make the atomic replace safe. I mean to
block the transition until all exiting processes are not gone.

Josh made a good point. We should look how this is handled by
RCU, tracing, or another subsystems which might have similar
problems.


> Other potential risks may also arise
>   due to inconsistencies or race conditions during transitions.

What inconsistencies and race conditions you have in mind, please?


> - Temporary Loss of Patching 
> 
>   During the replacement process, the old patch is set to a NOP (no-operation)
>   before the new patch is fully applied. This creates a window where the
>   function temporarily reverts to its original, unpatched state. If the old
>   patch fixed a critical issue (e.g., one that prevented a system panic), the
>   system could become vulnerable to that issue during the transition.

This is not true!

Please, look where klp_patch_object() and klp_unpatch_objects() is
called. Also look at how ops->func_stack is handled in
klp_ftrace_handler().

Also you might want to read Documentation/livepatch/livepatch.rst


> The current atomic replacement approach replaces all old livepatches,
> even when such a sweeping change is unnecessary. This can be improved
> by introducing a hybrid mode, which allows the coexistence of both
> atomic replace and non atomic replace livepatches.
> 
> In the hybrid mode:
> 
> - Specific livepatches can be marked as "non-replaceable" to ensure they
>   remain active and unaffected during replacements.
> 
> - Other livepatches can be marked as "replaceable", allowing targeted
>   replacements of only those patches.

Honestly, I consider this as a workaround for a problem which should
be fixed a proper way.

The main advantage of the atomic replace is simplify the maintenance
and debugging. It reduces the amount of possible combinations. The
hybrid mode brings back the jungle.

Best Regards,
Petr

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ