Open Source and information security mailing list archives
 
Message-ID: <CALOAHbB8j6RrpJAyRkzPx2U6YhjWEipRspoQQ_7cvQ+M0zgdXg@mail.gmail.com>
Date: Fri, 7 Feb 2025 11:16:45 +0800
From: Yafang Shao <laoar.shao@...il.com>
To: Josh Poimboeuf <jpoimboe@...nel.org>
Cc: jikos@...nel.org, mbenes@...e.cz, pmladek@...e.com, 
	joe.lawrence@...hat.com, live-patching@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 2/2] livepatch: Implement livepatch hybrid mode

On Fri, Feb 7, 2025 at 10:31 AM Josh Poimboeuf <jpoimboe@...nel.org> wrote:
>
> On Mon, Jan 27, 2025 at 02:35:26PM +0800, Yafang Shao wrote:
> > The atomic replace livepatch mechanism was introduced to handle scenarios
> > where we want to unload a specific livepatch without unloading others.
> > However, its current implementation has significant shortcomings, making
> > it less than ideal in practice. Below are the key downsides:
> >
> > - It is expensive
> >
> >   During testing with frequent replacements of an old livepatch, random RCU
> >   warnings were observed:
> >
> >   [19578271.779605] rcu_tasks_wait_gp: rcu_tasks grace period 642409 is 10024 jiffies old.
> >   [19578390.073790] rcu_tasks_wait_gp: rcu_tasks grace period 642417 is 10185 jiffies old.
> >   [19578423.034065] rcu_tasks_wait_gp: rcu_tasks grace period 642421 is 10150 jiffies old.
> >   [19578564.144591] rcu_tasks_wait_gp: rcu_tasks grace period 642449 is 10174 jiffies old.
> >   [19578601.064614] rcu_tasks_wait_gp: rcu_tasks grace period 642453 is 10168 jiffies old.
> >   [19578663.920123] rcu_tasks_wait_gp: rcu_tasks grace period 642469 is 10167 jiffies old.
> >   [19578872.990496] rcu_tasks_wait_gp: rcu_tasks grace period 642529 is 10215 jiffies old.
> >   [19578903.190292] rcu_tasks_wait_gp: rcu_tasks grace period 642529 is 40415 jiffies old.
> >   [19579017.965500] rcu_tasks_wait_gp: rcu_tasks grace period 642577 is 10174 jiffies old.
> >   [19579033.981425] rcu_tasks_wait_gp: rcu_tasks grace period 642581 is 10143 jiffies old.
> >   [19579153.092599] rcu_tasks_wait_gp: rcu_tasks grace period 642625 is 10188 jiffies old.
> >
> >   This indicates that atomic replacement can cause performance issues,
> >   particularly with RCU synchronization under frequent use.
>
> Why does this happen?

It occurs during the KLP transition, which appears to be taking too
long:

[20329703.332453] livepatch: enabling patch 'livepatch_61_release6'
[20329703.340417] livepatch: 'livepatch_61_release6': starting patching transition
[20329715.314215] rcu_tasks_wait_gp: rcu_tasks grace period 1109765 is 10166 jiffies old.
[20329737.126207] rcu_tasks_wait_gp: rcu_tasks grace period 1109769 is 10219 jiffies old.
[20329752.018236] rcu_tasks_wait_gp: rcu_tasks grace period 1109773 is 10199 jiffies old.
[20329754.848036] livepatch: 'livepatch_61_release6': patching complete

>
> > - Potential Risks During Replacement
> >
> >   One known issue involves replacing livepatched versions of critical
> >   functions such as do_exit(). During the replacement process, a panic
> >   might occur, as highlighted in [0]. Other potential risks may also arise
> >   due to inconsistencies or race conditions during transitions.
>
> That needs to be fixed.
>
> > - Temporary Loss of Patching
> >
> >   During the replacement process, the old patch is set to a NOP (no-operation)
> >   before the new patch is fully applied. This creates a window where the
> >   function temporarily reverts to its original, unpatched state. If the old
> >   patch fixed a critical issue (e.g., one that prevented a system panic), the
> >   system could become vulnerable to that issue during the transition.
>
> Are you saying that atomic replace is not atomic?  If so, this sounds
> like another bug.

From my understanding, there’s a window where the original function is
not patched.

klp_enable_patch
+ klp_init_patch
   + if (patch->replace)
          klp_add_nops(patch);  <<<< set all old patches to nop

+ __klp_enable_patch
   + klp_patch_object
      + klp_patch_func
         + ops = klp_find_ops(func->old_func);
            + if (ops)
                   // add the new patch to the func_stack list
                   list_add_rcu(&func->stack_node, &ops->func_stack);


klp_ftrace_handler
+ func = list_first_or_null_rcu(&ops->func_stack, struct klp_func, stack_node);
+ if (func->nop)
       goto unlock;
+ ftrace_regs_set_instruction_pointer(fregs, (unsigned long)func->new_func);

Before the new atomic-replace patch is added to the func_stack list,
the old patch has already been set to nop. If klp_ftrace_handler() is
triggered at that point, it effectively does nothing; in other words,
the original function is executed.
I might be wrong.
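To make the claimed ordering concrete, here is a minimal userspace
sketch (not kernel code) of the handler logic quoted above: the
handler takes the newest entry on ops->func_stack and, if that entry
is a nop, falls through to the original function. The names mirror
the kernel's, but the "list" is a plain linked stack and the functions
are just string labels for illustration; whether the window actually
occurs in the kernel is the open question here.

```c
#include <stddef.h>
#include <stdbool.h>

struct klp_func {
	const char *new_func;        /* replacement to run, as a label */
	bool nop;                    /* nop entry: fall through to original */
	struct klp_func *stack_node; /* next-older entry on the stack */
};

static struct klp_func *func_stack; /* head = most recently added patch */

/* list_add_rcu() stand-in: push a patch entry onto func_stack. */
static void klp_patch_func(struct klp_func *func)
{
	func->stack_node = func_stack;
	func_stack = func;
}

/* klp_ftrace_handler() stand-in: decide what actually executes. */
static const char *klp_ftrace_handler(void)
{
	struct klp_func *func = func_stack; /* list_first_or_null_rcu() */

	if (!func || func->nop)
		return "original"; /* goto unlock: the original function runs */
	return func->new_func;
}
```

In this model, during any interval in which the head of func_stack is
a nop entry but the replacement has not yet been pushed, the handler
executes the original function, which is exactly the window described
above.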

--
Regards
Yafang
