[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <1b4cc6d5-5f5b-4b39-8fdf-ac02c94cd5e2@csgroup.eu>
Date: Sat, 6 Sep 2025 08:42:07 +0200
From: Christophe Leroy <christophe.leroy@...roup.eu>
To: "Ritesh Harjani (IBM)" <riteshh@...ux.ibm.com>,
Andrew Donnellan <ajd@...ux.ibm.com>, linuxppc-dev@...ts.ozlabs.org,
linux-kernel@...r.kernel.org, maddy@...ux.ibm.com, mpe@...erman.id.au,
peterz@...radead.org, jpoimboe@...nel.org, jbaron@...mai.com
Cc: npiggin@...il.com, rostedt@...dmis.org, ardb@...nel.org,
Erhard Furtner <erhard_f@...lbox.org>
Subject: Re: [PATCH RFC] powerpc: Panic on jump label code patching failure
Le 06/09/2025 à 05:52, Ritesh Harjani a écrit :
> [Vous ne recevez pas souvent de courriers de riteshh@...ux.ibm.com. Découvrez pourquoi ceci est important à https://aka.ms/LearnAboutSenderIdentification ]
>
> Andrew Donnellan <ajd@...ux.ibm.com> writes:
>
>> If patch_branch() or patch_instruction() fails while updating a jump
>> label, we presently fail silently, leading to unpredictable behaviour
>> later on.
>>
>> Change arch_jump_label_transform() to panic on a code patching failure,
>> matching the existing behaviour of arch_static_call_transform().
>>
>> Reported-by: Erhard Furtner <erhard_f@...lbox.org>
>> Signed-off-by: Andrew Donnellan <ajd@...ux.ibm.com>
>>
>> ---
>>
>> Ran into this while debugging an issue that Erhard reported to me about my
>> PAGE_TABLE_CHECK series on a G4, where updating a static key failed
>> silently, but only for one call site, leading to an incorrect reference
>> count later on. This looks to be due to the issue fixed in [0]. A loud
>> failure would have saved us all considerable debugging time.
>>
>> Should I change the return type of arch_jump_label_transform() and handle
>> this in an arch-independent way? Are there other users of code patching
>> in powerpc that ought to be hardened?
>>
>> Or is this excessive?
>>
>> [0] https://patchwork.ozlabs.org/project/linuxppc-dev/patch/4b5e6eb281d7b1ea77619bee17095f905a125168.1757003584.git.christophe.leroy@csgroup.eu/
>> ---
>> arch/powerpc/kernel/jump_label.c | 9 +++++++--
>> 1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/powerpc/kernel/jump_label.c b/arch/powerpc/kernel/jump_label.c
>> index 2659e1ac8604..80d41ed7ac50 100644
>> --- a/arch/powerpc/kernel/jump_label.c
>> +++ b/arch/powerpc/kernel/jump_label.c
>> @@ -12,9 +12,14 @@ void arch_jump_label_transform(struct jump_entry *entry,
>> enum jump_label_type type)
>> {
>> u32 *addr = (u32 *)jump_entry_code(entry);
>> + int err;
>>
>> if (type == JUMP_LABEL_JMP)
>> - patch_branch(addr, jump_entry_target(entry), 0);
>> + err = patch_branch(addr, jump_entry_target(entry), 0);
>> else
>> - patch_instruction(addr, ppc_inst(PPC_RAW_NOP()));
>> + err = patch_instruction(addr, ppc_inst(PPC_RAW_NOP()));
>> +
>> + if (err)
>> + panic("%s: patching failed, err %d, type %d, addr %pS, target %pS\n",
>> + __func__, err, type, addr, (void *)jump_entry_target(entry));
>> }
>
> arch_jump_label_transform() is mainly getting called from
> __jump_level_update() and it's used for enabling or updating static keys / branch.
>
> But static keys can also be used by drivers / module subsystem whose
> initialization happens late. Although I understand that if the above
> fails, it might fail much before, from the arch setup code itself, but
> panic() still feels like a big hammer.
But not being able to patch the kernel as required means that you get a
kernel behaving differently from what is expected.
Imagine a kernel running on a board that is controlling a saw. There is
a patch_instruction() to activate the safety feature which detects when
your hands are too close to the blade. Do you want the kernel to
continue running seamlessly when that patch_instruction() fails ? I'm
sure you don't !
>
> Would pr_err() print with WARN_ON_ONCE(1) would suffice in case of an
> err?
No, that's not enough, you can't rely on a kernel that will no behave as
expected.
>
> Also you said you ran into a problem at just one call site where above
> was silently failing. With the above change are you able to hit the
> panic() now? Because from what I see in patch_instruction(), it mainly
> will boil down to calling __patch_mem() which always returns 0.
As far as I can see, __patch_mem() returns -EPERM when
__put_kernel_nofault() fails:
static int __patch_mem(void *exec_addr, unsigned long val, void
*patch_addr, bool is_dword)
{
if (!IS_ENABLED(CONFIG_PPC64) || likely(!is_dword)) {
/* For big endian correctness: plain address would use the wrong half */
u32 val32 = val;
__put_kernel_nofault(patch_addr, &val32, u32, failed);
} else {
__put_kernel_nofault(patch_addr, &val, u64, failed);
}
asm ("dcbst 0, %0; sync; icbi 0,%1; sync; isync" :: "r" (patch_addr),
"r" (exec_addr));
return 0;
failed:
mb(); /* sync */
return -EPERM;
}
> Although there are other places where there can be an error returned,
> so I was wondering if that is what you were hitting or something else?
Andrew was hitting the -EPERM because the memory area was read-only.
Christophe
Powered by blists - more mailing lists