linux-kernel - Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions

Message-ID: <YhizfwoddLwWWl2J@hirez.programming.kicks-ass.net>
Date:   Fri, 25 Feb 2022 11:46:23 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     Masami Hiramatsu <mhiramat@...nel.org>
Cc:     x86@...nel.org, joao@...rdrivepizza.com, hjl.tools@...il.com,
        jpoimboe@...hat.com, andrew.cooper3@...rix.com,
        linux-kernel@...r.kernel.org, ndesaulniers@...gle.com,
        keescook@...omium.org, samitolvanen@...gle.com,
        mark.rutland@....com, alyssa.milburn@...el.com, mbenes@...e.cz,
        rostedt@...dmis.org, alexei.starovoitov@...il.com
Subject: Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions

On Fri, Feb 25, 2022 at 10:32:15AM +0900, Masami Hiramatsu wrote:
> Hi Peter,
> 
> On Thu, 24 Feb 2022 15:51:53 +0100
> Peter Zijlstra <peterz@...radead.org> wrote:
> 
> > With IBT on, sym+0 is no longer the __fentry__ site.
> > 
> > NOTE: the architecture has a special case and *does* allow placing an
> > INT3 breakpoint over ENDBR in which case #BP has precedence over #CP
> > and as such we don't need to disallow probing these instructions.
> 
> Does this mean we can still put a probe on sym+0?

I'm not sure... Possibly not. I'm not sure if there's an ABI that
bypasses kprobe_lookup_name(). Arguably you could give it a direct
address instead of a name and still hit the ENDBR, I think. But the ABI
surface of this thing is too big for me to tell easily.

> If so, NAK this patch, since KPROBES_ON_FTRACE is not meant
> to accelerate the function entry probe; it just allows the user to
> put a probe on 'call _mcount' (which can be modified by ftrace).
> 
> func:
>   endbr  <- sym+0  : INT3 is used. (kp->addr = func+0)
>   nop5   <- sym+4? : ftrace is used. (kp->addr = func+4?)
>   ...
> 
> And anyway, in some cases (e.g. perf probe) the symbol will be a base
> symbol like '_text' and @offset will be the function addr - _text addr,
> so that we can put a probe on a local-scope function.
> 
> If you think we should not probe on the endbr, we should treat the
> pair of endbr and nop5 (or call _mcount) instructions as a single
> virtual instruction. This means kp->addr should point to sym+0, but
> ftrace is used to probe it.
> 
> func:
>   endbr  <- sym+0  : ftrace is used. (kp->addr = func+0)
>   nop5   <- sym+4? : This cannot be probed.
>   ...

Well, it's all a bit crap :/

This patch came from the kernel/trace/trace_kprobe.c selftest failing at
boot. That test tries to set a kprobe on kprobe_trace_selftest_target,
which the whole kprobe machinery translates into
kprobe_trace_selftest_target+0, and the probe then doesn't actually hit
the fentry site.

IOW, that selftest seems to hard-code/assume +0 matches __fentry__,
which just isn't true in general (arm64 and powerpc are the architectures
that come to mind), and now might not be true on x86 either.

Calling the selftest broken works for me and I'll drop the patch.

Note that with these patches:

 - Not every function starts with ENDBR; the compiler is free to omit
   this instruction if it can determine the function address is never
   taken (and as such there's never an indirect call to it).

 - If there is an ENDBR, not every function entry will actually execute
   it. This first instruction is used exclusively as an indirect entry
   point. All direct calls should be to the next instruction.

 - If there was an ENDBR, it might be turned into a 4-byte UD1
   instruction to ensure any indirect call *will* fail.

Given all that, kprobe users are in a bit of a bind. Determining the
__fentry__ point basically means they *have* to first read the function
assembly to figure out where it is.

This patch takes the approach that sym+0 means __fentry__, irrespective
of where it might actually live. I *think* that's more or less
consistent with what other architectures do; specifically see
arch/powerpc/kernel/kprobes.c:kprobe_lookup_name(). I'm not quite sure
what ARM64 does when it has BTI on (which is then very similar to what
we have here).

What do you think makes most sense here?