lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250207023116.wx4i3n7ks3q2hfpu@jpoimboe>
Date: Thu, 6 Feb 2025 18:31:16 -0800
From: Josh Poimboeuf <jpoimboe@...nel.org>
To: Yafang Shao <laoar.shao@...il.com>
Cc: jikos@...nel.org, mbenes@...e.cz, pmladek@...e.com,
	joe.lawrence@...hat.com, live-patching@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 2/2] livepatch: Implement livepatch hybrid mode

On Mon, Jan 27, 2025 at 02:35:26PM +0800, Yafang Shao wrote:
> The atomic replace livepatch mechanism was introduced to handle scenarios
> where we want to unload a specific livepatch without unloading others.
> However, its current implementation has significant shortcomings, making
> it less than ideal in practice. Below are the key downsides:
> 
> - It is expensive
> 
>   During testing with frequent replacements of an old livepatch, random RCU
>   warnings were observed:
> 
>   [19578271.779605] rcu_tasks_wait_gp: rcu_tasks grace period 642409 is 10024 jiffies old.
>   [19578390.073790] rcu_tasks_wait_gp: rcu_tasks grace period 642417 is 10185 jiffies old.
>   [19578423.034065] rcu_tasks_wait_gp: rcu_tasks grace period 642421 is 10150 jiffies old.
>   [19578564.144591] rcu_tasks_wait_gp: rcu_tasks grace period 642449 is 10174 jiffies old.
>   [19578601.064614] rcu_tasks_wait_gp: rcu_tasks grace period 642453 is 10168 jiffies old.
>   [19578663.920123] rcu_tasks_wait_gp: rcu_tasks grace period 642469 is 10167 jiffies old.
>   [19578872.990496] rcu_tasks_wait_gp: rcu_tasks grace period 642529 is 10215 jiffies old.
>   [19578903.190292] rcu_tasks_wait_gp: rcu_tasks grace period 642529 is 40415 jiffies old.
>   [19579017.965500] rcu_tasks_wait_gp: rcu_tasks grace period 642577 is 10174 jiffies old.
>   [19579033.981425] rcu_tasks_wait_gp: rcu_tasks grace period 642581 is 10143 jiffies old.
>   [19579153.092599] rcu_tasks_wait_gp: rcu_tasks grace period 642625 is 10188 jiffies old.
>   
>   This indicates that atomic replacement can cause performance issues,
>   particularly with RCU synchronization under frequent use.

Why does this happen?

> - Potential Risks During Replacement 
> 
>   One known issue involves replacing livepatched versions of critical
>   functions such as do_exit(). During the replacement process, a panic
>   might occur, as highlighted in [0]. Other potential risks may also arise
>   due to inconsistencies or race conditions during transitions.

That needs to be fixed.

> - Temporary Loss of Patching 
> 
>   During the replacement process, the old patch is set to a NOP (no-operation)
>   before the new patch is fully applied. This creates a window where the
>   function temporarily reverts to its original, unpatched state. If the old
>   patch fixed a critical issue (e.g., one that prevented a system panic), the
>   system could become vulnerable to that issue during the transition.

Are you saying that atomic replace is not atomic?  If so, this sounds
like another bug.

-- 
Josh

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ