[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c8c5778822344161acec7101fa9f19726ca99d31.camel@linux.ibm.com>
Date: Mon, 08 Sep 2025 11:45:52 +1000
From: Andrew Donnellan <ajd@...ux.ibm.com>
To: Christophe Leroy <christophe.leroy@...roup.eu>,
"Ritesh Harjani (IBM)"
<riteshh@...ux.ibm.com>,
linuxppc-dev@...ts.ozlabs.org, linux-kernel@...r.kernel.org,
maddy@...ux.ibm.com, mpe@...erman.id.au, peterz@...radead.org,
jpoimboe@...nel.org, jbaron@...mai.com
Cc: npiggin@...il.com, rostedt@...dmis.org, ardb@...nel.org,
Erhard Furtner
<erhard_f@...lbox.org>
Subject: Re: [PATCH RFC] powerpc: Panic on jump label code patching failure
On Sat, 2025-09-06 at 08:42 +0200, Christophe Leroy wrote:
> > arch_jump_label_transform() is mainly getting called from
> > __jump_level_update() and it's used for enabling or updating static keys /
> > branch.
> >
> > But static keys can also be used by drivers / module subsystem whose
> > initialization happens late. Although I understand that if the above
> > fails, it might fail much before, from the arch setup code itself, but
> > panic() still feels like a big hammer.
>
> But not being able to patch the kernel as required means that you get a
> kernel behaving differently from what is expected.
>
> Imagine a kernel running on a board that is controlling a saw. There is
> a patch_instruction() to activate the safety feature which detects when
> your hands are too close to the blade. Do you want the kernel to
> continue running seamlessly when that patch_instruction() fails ? I'm
> sure you don't !
This is my thinking exactly - a failed patch leaves the kernel in an abnormal
state, and we don't have the infrastructure to safely roll back any patches that
have already succeeded or other associated state changes, so this should be
treated as an unrecoverable error. The resulting kernel is a different kernel
from the one you expect to have.
The fact that drivers/modules can trigger this just means that drivers/modules
can permanently ruin your kernel too, which makes this more important not less,
I think?
>
> >
> > Would pr_err() print with WARN_ON_ONCE(1) would suffice in case of an
> > err?
>
> No, that's not enough, you can't rely on a kernel that will no behave as
> expected.
>
> >
> > Also you said you ran into a problem at just one call site where above
> > was silently failing. With the above change are you able to hit the
> > panic() now? Because from what I see in patch_instruction(), it mainly
> > will boil down to calling __patch_mem() which always returns 0.
>
> As far as I can see, __patch_mem() returns -EPERM when
> __put_kernel_nofault() fails:
>
> static int __patch_mem(void *exec_addr, unsigned long val, void
> *patch_addr, bool is_dword)
> {
> if (!IS_ENABLED(CONFIG_PPC64) || likely(!is_dword)) {
> /* For big endian correctness: plain address would use the
> wrong half */
> u32 val32 = val;
>
> __put_kernel_nofault(patch_addr, &val32, u32, failed);
> } else {
> __put_kernel_nofault(patch_addr, &val, u64, failed);
> }
>
> asm ("dcbst 0, %0; sync; icbi 0,%1; sync; isync" :: "r" (patch_addr),
> "r" (exec_addr));
>
> return 0;
>
> failed:
> mb(); /* sync */
> return -EPERM;
> }
Yes, I can confirm that -EPERM from __patch_mem() is what I was seeing, and I
experimented with the assembly to confirm that it was triggered by the stw in
__put_kernel_nofault(). __put_kernel_nofault() uses the address of the failed:
label to create a handler in the exception table for when the store instruction
faults.
>
>
> > Although there are other places where there can be an error returned,
> > so I was wondering if that is what you were hitting or something else?
>
> Andrew was hitting the -EPERM because the memory area was read-only.
>
> Christophe
--
Andrew Donnellan OzLabs, ADL Canberra
ajd@...ux.ibm.com IBM Australia Limited
Powered by blists - more mailing lists