lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c8c5778822344161acec7101fa9f19726ca99d31.camel@linux.ibm.com>
Date: Mon, 08 Sep 2025 11:45:52 +1000
From: Andrew Donnellan <ajd@...ux.ibm.com>
To: Christophe Leroy <christophe.leroy@...roup.eu>,
        "Ritesh Harjani (IBM)"
	 <riteshh@...ux.ibm.com>,
        linuxppc-dev@...ts.ozlabs.org, linux-kernel@...r.kernel.org,
        maddy@...ux.ibm.com, mpe@...erman.id.au, peterz@...radead.org,
        jpoimboe@...nel.org, jbaron@...mai.com
Cc: npiggin@...il.com, rostedt@...dmis.org, ardb@...nel.org,
        Erhard Furtner
	 <erhard_f@...lbox.org>
Subject: Re: [PATCH RFC] powerpc: Panic on jump label code patching failure

On Sat, 2025-09-06 at 08:42 +0200, Christophe Leroy wrote:
> > arch_jump_label_transform() is mainly getting called from
> > __jump_level_update() and it's used for enabling or updating static keys /
> > branch.
> > 
> > But static keys can also be used by drivers / module subsystem whose
> > initialization happens late. Although I understand that if the above
> > fails, it might fail much before, from the arch setup code itself, but
> > panic() still feels like a big hammer.
> 
> But not being able to patch the kernel as required means that you get a 
> kernel behaving differently from what is expected.
> 
> Imagine a kernel running on a board that is controlling a saw. There is 
> a patch_instruction() to activate the safety feature which detects when 
> your hands are too close to the blade. Do you want the kernel to 
> continue running seamlessly when that patch_instruction() fails ? I'm 
> sure you don't !

This is my thinking exactly - a failed patch leaves the kernel in an abnormal
state, and we don't have the infrastructure to safely roll back any patches that
have already succeeded or other associated state changes, so this should be
treated as an unrecoverable error. The resulting kernel is a different kernel
from the one you expect to have.

The fact that drivers/modules can trigger this just means that drivers/modules
can permanently ruin your kernel too, which makes this more important not less,
I think?

> 
> > 
> > Would pr_err() print with WARN_ON_ONCE(1) would suffice in case of an
> > err?
> 
> No, that's not enough, you can't rely on a kernel that will no behave as 
> expected.
> 
> > 
> > Also you said you ran into a problem at just one call site where above
> > was silently failing. With the above change are you able to hit the
> > panic() now? Because from what I see in patch_instruction(), it mainly
> > will boil down to calling __patch_mem() which always returns 0.
> 
> As far as I can see, __patch_mem() returns -EPERM when 
> __put_kernel_nofault() fails:
> 
> static int __patch_mem(void *exec_addr, unsigned long val, void 
> *patch_addr, bool is_dword)
> {
> 	if (!IS_ENABLED(CONFIG_PPC64) || likely(!is_dword)) {
> 		/* For big endian correctness: plain address would use the
> wrong half */
> 		u32 val32 = val;
> 
> 		__put_kernel_nofault(patch_addr, &val32, u32, failed);
> 	} else {
> 		__put_kernel_nofault(patch_addr, &val, u64, failed);
> 	}
> 
> 	asm ("dcbst 0, %0; sync; icbi 0,%1; sync; isync" :: "r" (patch_addr),
> 							    "r" (exec_addr));
> 
> 	return 0;
> 
> failed:
> 	mb();  /* sync */
> 	return -EPERM;
> }

Yes, I can confirm that -EPERM from __patch_mem() is what I was seeing, and I
experimented with the assembly to confirm that it was triggered by the stw in
__put_kernel_nofault(). __put_kernel_nofault() uses the address of the failed:
label to create a handler in the exception table for when the store instruction
faults.

> 
> 
> > Although there are other places where there can be an error returned,
> > so I was wondering if that is what you were hitting or something else?
> 
> Andrew was hitting the -EPERM because the memory area was read-only.
> 
> Christophe

-- 
Andrew Donnellan    OzLabs, ADL Canberra
ajd@...ux.ibm.com   IBM Australia Limited

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ