Message-ID: <aRR68v7oABi_72zo@J2N7QTR9R3>
Date: Wed, 12 Nov 2025 12:17:54 +0000
From: Mark Rutland <mark.rutland@....com>
To: Khaja Hussain Shaik Khaji <khaja.khaji@....qualcomm.com>
Cc: linux-arm-kernel@...ts.infradead.org, kprobes@...r.kernel.org,
linux-kernel@...r.kernel.org, will@...nel.org,
catalin.marinas@....com, masami.hiramatsu@...aro.org
Subject: Re: [PATCH] arm64: insn: Route BTI to simulate_nop to avoid XOL/SS
at function entry
On Tue, Nov 11, 2025 at 10:26:44AM +0000, Mark Rutland wrote:
> On Thu, Nov 06, 2025 at 04:19:55PM +0530, Khaja Hussain Shaik Khaji wrote:
> > On arm64 with branch protection, functions typically begin with a BTI
> > (Branch Target Identification) landing pad. Today the decoder treats BTI
> > as requiring out-of-line single-step (XOL), allocating a slot and placing
> > an SS-BRK. Under SMP this leaves a small window before DAIF is masked
> > where an asynchronous exception or nested probe can interleave and clear
> > current_kprobe, resulting in an SS-BRK panic.
>
> If you can take an exception here, and current_kprobe gets cleared, then
> XOL stepping is broken in general, but just for BTI.
Sorry, I typo'd the above. That should say:
If you can take an exception here, and current_kprobe gets cleared,
then XOL stepping is broken in general, *not* just for BTI.
I took a look at the exception entry code, and AFAICT DAIF is not
relevant. Upon exception entry, HW will mask all DAIF exceptions, and we
don't unmask any of those while handling an EL1 BRK.
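To make the "all DAIF masked" point concrete, here's a minimal
user-space sketch (not kernel code; the PSR_*_BIT values match the
architected PSTATE/SPSR_EL1 bit positions, the helper itself is purely
illustrative):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * PSTATE/SPSR_EL1 exception-mask bits on arm64:
 *   D (debug)  = bit 9
 *   A (SError) = bit 8
 *   I (IRQ)    = bit 7
 *   F (FIQ)    = bit 6
 * On exception entry to EL1, hardware sets all four, so none of these
 * asynchronous exception classes can interleave with BRK handling.
 */
#define PSR_F_BIT	(1u << 6)
#define PSR_I_BIT	(1u << 7)
#define PSR_A_BIT	(1u << 8)
#define PSR_D_BIT	(1u << 9)
#define PSR_DAIF_MASK	(PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT)

/* True iff every DAIF exception class is masked in this saved PSTATE. */
bool daif_all_masked(uint32_t pstate)
{
	return (pstate & PSR_DAIF_MASK) == PSR_DAIF_MASK;
}
```

Note BRK itself is a synchronous exception and is never masked by DAIF,
which is why the recursion case below is the one that matters.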
Given that, IIUC the only way this can happen is if we can place a
kprobe on something used during kprobe handling (since BRK exceptions
aren't masked by DAIF). I am certain this is possible, and that kprobes
isn't generally safe; the existing __kprobes annotations are inadequate
and I don't think we can make kprobes generally sound without a
significant rework (e.g. to make it noinstr-safe).
Can you share any details on how you triggered this? e.g. what functions
you had kprobes on, whether you used any specific tooling?
Mark.
> > Handle BTI like NOP in the decoder and simulate it (advance PC by one
> > instruction). This avoids XOL/SS-BRK at these sites and removes the
> > single-step window, while preserving correctness for kprobes since BTI's
> > branch-target enforcement has no program-visible effect in this EL1
> > exception context.
>
> One of the reasons for doing this out-of-line is that we should be able
> to mark the XOL slot as a guarded page, and get the correct BTI
> behaviour. It looks like we don't currently do that, which is a bug.
>
> Just skipping the BTI isn't right; that throws away the BTI target
> check.
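As a toy model of what gets thrown away (illustrative only, not kernel
code; the names and the simplified nonzero-BTYPE-means-check-pending
flag are assumptions):

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of PSTATE.BTYPE: nonzero means the previous instruction was
 * an indirect branch, so a guarded page demands a BTI landing pad. */
typedef struct {
	uint64_t pc;
	unsigned btype;		/* 0 = no check pending */
	bool faulted;
} toy_cpu;

/* Hardware-style step on a guarded page: check first, then advance. */
void execute_on_guarded_page(toy_cpu *cpu, bool insn_is_bti)
{
	if (cpu->btype != 0 && !insn_is_bti) {
		cpu->faulted = true;	/* Branch Target exception */
		return;
	}
	cpu->btype = 0;
	cpu->pc += 4;
}

/* simulate_nop()-style step: just advance PC; the check never happens,
 * so a bad indirect branch target goes undetected. */
void simulate_as_nop(toy_cpu *cpu)
{
	cpu->pc += 4;
}
```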
>
> > In practice BTI is most commonly observed at function entry, so the main
> > effect of this change is to eliminate entry-site single-stepping. Other
> > instructions and non-entry sites are unaffected.
> >
> > Signed-off-by: Khaja Hussain Shaik Khaji <khaja.khaji@....qualcomm.com>
> > ---
> > arch/arm64/include/asm/insn.h | 5 -----
> > arch/arm64/kernel/probes/decode-insn.c | 9 ++++++---
> > arch/arm64/kernel/probes/simulate-insn.c | 1 +
> > 3 files changed, 7 insertions(+), 8 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
> > index 18c7811774d3..7e80cc1f0c3d 100644
> > --- a/arch/arm64/include/asm/insn.h
> > +++ b/arch/arm64/include/asm/insn.h
> > @@ -452,11 +452,6 @@ static __always_inline bool aarch64_insn_is_steppable_hint(u32 insn)
> > case AARCH64_INSN_HINT_PACIASP:
> > case AARCH64_INSN_HINT_PACIBZ:
> > case AARCH64_INSN_HINT_PACIBSP:
> > - case AARCH64_INSN_HINT_BTI:
> > - case AARCH64_INSN_HINT_BTIC:
> > - case AARCH64_INSN_HINT_BTIJ:
> > - case AARCH64_INSN_HINT_BTIJC:
> > - case AARCH64_INSN_HINT_NOP:
> > return true;
> > default:
> > return false;
> > diff --git a/arch/arm64/kernel/probes/decode-insn.c b/arch/arm64/kernel/probes/decode-insn.c
> > index 6438bf62e753..7ce2cf5e21d3 100644
> > --- a/arch/arm64/kernel/probes/decode-insn.c
> > +++ b/arch/arm64/kernel/probes/decode-insn.c
> > @@ -79,10 +79,13 @@ enum probe_insn __kprobes
> > arm_probe_decode_insn(u32 insn, struct arch_probe_insn *api)
> > {
> > /*
> > - * While 'nop' instruction can execute in the out-of-line slot,
> > - * simulating them in breakpoint handling offers better performance.
> > + * NOP and BTI (Branch Target Identification) have no program-visible side
> > + * effects for kprobes purposes. Simulate them to avoid XOL/SS-BRK and the
> > + * small single-step window. BTI's branch-target enforcement semantics are
> > + * irrelevant in this EL1 kprobe context, so advancing PC by one insn is
> > + * sufficient here.
> > */
> > - if (aarch64_insn_is_nop(insn)) {
> > + if (aarch64_insn_is_nop(insn) || aarch64_insn_is_bti(insn)) {
> > api->handler = simulate_nop;
> > return INSN_GOOD_NO_SLOT;
> > }
>
> I'm not necessarily opposed to emulating the BTI, but:
>
> (a) The BTI should not be emulated as a NOP. I am not keen on simulating
> the BTI exception in software, and would strongly prefer that's
> handled by HW (e.g. in the XOL slot).
>
> (b) As above, it sounds like this is bodging around a more general
> problem. We must solve that more general problem.
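For reference, the NOP/BTI split is visible from the HINT encoding
alone; a minimal sketch (the opcode values are the architected
encodings; the helper names are assumptions, not the kernel's
aarch64_insn_* API):

```c
#include <stdbool.h>
#include <stdint.h>

/* HINT encoding: 0xd503201f with a 7-bit immediate (CRm:op2) in
 * bits [11:5]. NOP is hint #0; the BTI variants are hints
 * #32/#34/#36/#38 (BTI, BTI c, BTI j, BTI jc). */
#define HINT_BASE	0xd503201fu
#define HINT_MASK	0xfffff01fu	/* everything but the immediate */

bool insn_is_hint(uint32_t insn)
{
	return (insn & HINT_MASK) == HINT_BASE;
}

unsigned hint_imm(uint32_t insn)
{
	return (insn >> 5) & 0x7f;
}

bool insn_is_bti(uint32_t insn)
{
	if (!insn_is_hint(insn))
		return false;
	switch (hint_imm(insn)) {
	case 32: case 34: case 36: case 38:
		return true;
	default:
		return false;
	}
}
```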
>
> > diff --git a/arch/arm64/kernel/probes/simulate-insn.c b/arch/arm64/kernel/probes/simulate-insn.c
> > index 4c6d2d712fbd..b83312cb70ba 100644
> > --- a/arch/arm64/kernel/probes/simulate-insn.c
> > +++ b/arch/arm64/kernel/probes/simulate-insn.c
> > @@ -200,5 +200,6 @@ simulate_ldrsw_literal(u32 opcode, long addr, struct pt_regs *regs)
> > void __kprobes
> > simulate_nop(u32 opcode, long addr, struct pt_regs *regs)
> > {
> > + /* Also used as BTI simulator: both just advance PC by one insn. */
> > arm64_skip_faulting_instruction(regs, AARCH64_INSN_SIZE);
> > }
>
> This comment should go.
>
> Mark.
>