linux-kernel - Re: [PATCH RFC] powerpc: Panic on jump label code patching failure

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <c8c5778822344161acec7101fa9f19726ca99d31.camel@linux.ibm.com>
Date: Mon, 08 Sep 2025 11:45:52 +1000
From: Andrew Donnellan <ajd@...ux.ibm.com>
To: Christophe Leroy <christophe.leroy@...roup.eu>,
        "Ritesh Harjani (IBM)"
	 <riteshh@...ux.ibm.com>,
        linuxppc-dev@...ts.ozlabs.org, linux-kernel@...r.kernel.org,
        maddy@...ux.ibm.com, mpe@...erman.id.au, peterz@...radead.org,
        jpoimboe@...nel.org, jbaron@...mai.com
Cc: npiggin@...il.com, rostedt@...dmis.org, ardb@...nel.org,
        Erhard Furtner
	 <erhard_f@...lbox.org>
Subject: Re: [PATCH RFC] powerpc: Panic on jump label code patching failure

On Sat, 2025-09-06 at 08:42 +0200, Christophe Leroy wrote:
> > arch_jump_label_transform() is mainly getting called from
> > __jump_level_update() and it's used for enabling or updating static keys /
> > branch.
> > 
> > But static keys can also be used by drivers / module subsystem whose
> > initialization happens late. Although I understand that if the above
> > fails, it might fail much before, from the arch setup code itself, but
> > panic() still feels like a big hammer.
> 
> But not being able to patch the kernel as required means that you get a 
> kernel behaving differently from what is expected.
> 
> Imagine a kernel running on a board that is controlling a saw. There is 
> a patch_instruction() to activate the safety feature which detects when 
> your hands are too close to the blade. Do you want the kernel to 
> continue running seamlessly when that patch_instruction() fails ? I'm 
> sure you don't !

This is my thinking exactly - a failed patch leaves the kernel in an abnormal
state, and we don't have the infrastructure to safely roll back any patches that
have already succeeded or other associated state changes, so this should be
treated as an unrecoverable error. The resulting kernel is a different kernel
from the one you expect to have.

The fact that drivers/modules can trigger this just means that drivers/modules
can permanently ruin your kernel too, which makes this more important not less,
I think?

> 
> > 
> > Would pr_err() print with WARN_ON_ONCE(1) would suffice in case of an
> > err?
> 
> No, that's not enough, you can't rely on a kernel that will no behave as 
> expected.
> 
> > 
> > Also you said you ran into a problem at just one call site where above
> > was silently failing. With the above change are you able to hit the
> > panic() now? Because from what I see in patch_instruction(), it mainly
> > will boil down to calling __patch_mem() which always returns 0.
> 
> As far as I can see, __patch_mem() returns -EPERM when 
> __put_kernel_nofault() fails:
> 
> static int __patch_mem(void *exec_addr, unsigned long val, void 
> *patch_addr, bool is_dword)
> {
> 	if (!IS_ENABLED(CONFIG_PPC64) || likely(!is_dword)) {
> 		/* For big endian correctness: plain address would use the
> wrong half */
> 		u32 val32 = val;
> 
> 		__put_kernel_nofault(patch_addr, &val32, u32, failed);
> 	} else {
> 		__put_kernel_nofault(patch_addr, &val, u64, failed);
> 	}
> 
> 	asm ("dcbst 0, %0; sync; icbi 0,%1; sync; isync" :: "r" (patch_addr),
> 							    "r" (exec_addr));
> 
> 	return 0;
> 
> failed:
> 	mb();  /* sync */
> 	return -EPERM;
> }

Yes, I can confirm that -EPERM from __patch_mem() is what I was seeing, and I
experimented with the assembly to confirm that it was triggered by the stw in
__put_kernel_nofault(). __put_kernel_nofault() uses the address of the failed:
label to create a handler in the exception table for when the store instruction
faults.

> 
> 
> > Although there are other places where there can be an error returned,
> > so I was wondering if that is what you were hitting or something else?
> 
> Andrew was hitting the -EPERM because the memory area was read-only.
> 
> Christophe

-- 
Andrew Donnellan    OzLabs, ADL Canberra
ajd@...ux.ibm.com   IBM Australia Limited