linux-kernel - Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions

Message-Id: <20220225224249.cbabe82e530758cdb28e65e9@kernel.org>
Date:   Fri, 25 Feb 2022 22:42:49 +0900
From:   Masami Hiramatsu <mhiramat@...nel.org>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     x86@...nel.org, joao@...rdrivepizza.com, hjl.tools@...il.com,
        jpoimboe@...hat.com, andrew.cooper3@...rix.com,
        linux-kernel@...r.kernel.org, ndesaulniers@...gle.com,
        keescook@...omium.org, samitolvanen@...gle.com,
        mark.rutland@....com, alyssa.milburn@...el.com, mbenes@...e.cz,
        rostedt@...dmis.org, alexei.starovoitov@...il.com
Subject: Re: [PATCH v2 15/39] x86/ibt,kprobes: Fix more +0 assumptions

On Fri, 25 Feb 2022 11:46:23 +0100
Peter Zijlstra <peterz@...radead.org> wrote:

> On Fri, Feb 25, 2022 at 10:32:15AM +0900, Masami Hiramatsu wrote:
> > Hi Peter,
> > 
> > On Thu, 24 Feb 2022 15:51:53 +0100
> > Peter Zijlstra <peterz@...radead.org> wrote:
> > 
> > > With IBT on, sym+0 is no longer the __fentry__ site.
> > > 
> > > NOTE: the architecture has a special case and *does* allow placing an
> > > INT3 breakpoint over ENDBR in which case #BP has precedence over #CP
> > > and as such we don't need to disallow probing these instructions.
> > 
> > Does this mean we can still put a probe on sym+0?
> 
> I'm not sure... Possibly not. I'm not sure if there's an ABI that
> by-passes kprobes_lookup_name(). Arguably you could give it a direct
> address instead of a name and still hit the ENDBR I think. But the ABI
> surface of this thing is too big for me to easily tell.
> 
> > If so, NAK this patch, since KPROBES_ON_FTRACE is not meant to
> > accelerate the function entry probe; it just allows the user to
> > put a probe on 'call _mcount' (which can be modified by ftrace).
> > 
> > func:
> >   endbr  <- sym+0  : INT3 is used. (kp->addr = func+0)
> >   nop5   <- sym+4? : ftrace is used. (kp->addr = func+4?)
> >   ...
> > 
> > And anyway, in some cases (e.g. perf probe) the symbol will be a base
> > symbol like '_text' and @offset will be the function address minus the
> > _text address, so that we can put a probe on a local-scope function.
> > 
> > If you think we should not probe on the endbr, we should treat the
> > pair of endbr and nop5 (or call _mcount) instructions as a virtual
> > single instruction. This means kp->addr should point to sym+0, but
> > use ftrace to probe.
> > 
> > func:
> >   endbr  <- sym+0  : ftrace is used. (kp->addr = func+0)
> >   nop5   <- sym+4? : This cannot be probed.
> >   ...
> 
> Well, it's all a bit crap :/
> 
> This patch came from kernel/trace/trace_kprobe.c selftest failing at
> boot. That tries to set a kprobe on kprobe_trace_selftest_target which
> the whole kprobe machinery translates into
> kprobe_trace_selftest_target+0 and then not actually hitting the fentry.

OK.

> 
> IOW, that selftest seems to hard-code/assume +0 matches __fentry__,
> which just isn't true in general (arm64, powerpc are architectures that
> come to mind) and now also might not be true on x86.

Yeah, right. But if we can handle this as above, maybe we can continue
to put the probe on the entry of the function.

> 
> Calling the selftest broken works for me and I'll drop the patch.
> 
> 
> Note that with these patches:
> 
>  - Not every function starts with ENDBR; the compiler is free to omit
>    this instruction if it can determine the function address is never
>    taken (and as such there's never an indirect call to it).
> 
>  - If there is an ENDBR, not every function entry will actually execute
>    it. This first instruction is used exclusively as an indirect entry
>    point. All direct calls should be to the next instruction.
> 
>  - If there was an ENDBR, it might be turned into a 4 byte UD1
>    instruction to ensure any indirect call *will* fail.

Ah, I see. So that is a booby trap for the cracker. 

> 
> Given all that, kprobe users are in a bit of a bind. Determining the
> __fentry__ point basically means they *have* to first read the function
> assembly to figure out where it is.

OK, this sounds like kp->addr should point to "call __fentry__" if there
is an ENDBR.
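
To make the "skip the ENDBR" idea concrete, here is a minimal user-space
sketch, not kernel code. It only assumes the ENDBR64 encoding
(f3 0f 1e fa); the helper names starts_with_endbr64() and
entry_probe_offset() are hypothetical and exist only for illustration.
It does not handle the case where ENDBR has been rewritten into a UD1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ENDBR64 is encoded as the four bytes f3 0f 1e fa. */
static const uint8_t endbr64[4] = { 0xf3, 0x0f, 0x1e, 0xfa };

/* Return 1 if the 'len' bytes at 'insn' begin with ENDBR64, else 0. */
static int starts_with_endbr64(const uint8_t *insn, size_t len)
{
	assert(insn != NULL);
	return len >= sizeof(endbr64) &&
	       memcmp(insn, endbr64, sizeof(endbr64)) == 0;
}

/*
 * Hypothetical helper: offset from sym+0 at which an entry probe would
 * be placed. This is 4 when the function starts with ENDBR64 (so the
 * probe lands on the following instruction), and 0 otherwise.
 */
static size_t entry_probe_offset(const uint8_t *func, size_t len)
{
	return starts_with_endbr64(func, len) ? sizeof(endbr64) : 0;
}
```

With a function body of "endbr64; call __fentry__" this yields offset 4,
and with a body that starts directly at "call __fentry__" it yields 0.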

> 
> This patch takes the approach that sym+0 means __fentry__, irrespective
> of where it might actually live. I *think* that's more or less
> consistent with what other architectures do; specifically see
> arch/powerpc/kernel/kprobes.c:kprobe_lookup_name(). I'm not quite sure
> what ARM64 does when it has BTI on (which is then very similar to what
> we have here).

Yeah, I know powerpc does such a thing, but I think that is not what
users expect. I actually would like to fix that, because on powerpc
and in other non-x86 cases (without BTI/IBT), the instructions at sym+0
are actually executed.

> 
> What do you think makes most sense here?

Is there any way to distinguish the "preparing instructions" (part of
calling mcount) from this kind of trap instruction online[1]? If possible,
I would like to skip such traps, but put the probe on the preparing
instructions.
It seems that currently we are using the ftrace address as the end marker
of the trap instruction, but we actually need another marker to separate
the end of ENDBR from the preparing instructions.

[1]
On x86, we have

func:
endbr
call __fentry__ <-- ftrace location

But on other architectures,

func:
[BTI instruction]
push return address <--- preparing instruction(s)
call __fentry__     <-- ftrace location
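
The two layouts above can be modelled with two markers instead of one.
The following is only a sketch under stated assumptions: struct prologue,
classify() and first_probeable_offset() are hypothetical names, not
existing kernel interfaces, and the byte offsets used in the usage note
are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical prologue description; offsets are in bytes from sym+0. */
struct prologue {
	size_t trap_end;   /* first byte after ENDBR/BTI (0 if there is none) */
	size_t ftrace_loc; /* offset of "call __fentry__" */
};

enum region {
	REGION_TRAP,           /* inside the ENDBR/BTI instruction */
	REGION_PREP,           /* preparing instructions before the call */
	REGION_FTRACE_OR_LATER /* the ftrace location or the function body */
};

/* Classify an offset using both markers rather than ftrace_loc alone. */
static enum region classify(const struct prologue *p, size_t off)
{
	assert(p != NULL && p->trap_end <= p->ftrace_loc);
	if (off < p->trap_end)
		return REGION_TRAP;
	if (off < p->ftrace_loc)
		return REGION_PREP;
	return REGION_FTRACE_OR_LATER;
}

/* Skip the trap instruction but keep the preparing instructions probeable. */
static size_t first_probeable_offset(const struct prologue *p)
{
	return p->trap_end;
}
```

For the x86 layout, trap_end and ftrace_loc are both 4, so there is no
REGION_PREP at all. For a layout with a 4-byte landing pad followed by a
4-byte preparing instruction (assumed sizes), trap_end is 4 and
ftrace_loc is 8, and offset 4 is classified as REGION_PREP. With only
ftrace_loc available, offsets 0 and 4 in that second layout could not be
told apart.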



Thank you,

-- 
Masami Hiramatsu <mhiramat@...nel.org>