linux-kernel - Re: kprobes broken since 0d00449c7a28 ("x86: Replace ist_enter() with nmi

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20210129013452.njuh3fomws62m4rc@ast-mbp.dhcp.thefacebook.com>
Date:   Thu, 28 Jan 2021 17:34:52 -0800
From:   Alexei Starovoitov <alexei.starovoitov@...il.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Nikolay Borisov <nborisov@...e.com>,
        Masami Hiramatsu <masami.hiramatsu@...il.com>,
        Masami Hiramatsu <mhiramat@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Alexei Starovoitov <ast@...nel.org>, bpf@...r.kernel.org,
        Josh Poimboeuf <jpoimboe@...hat.com>,
        Steven Rostedt <rostedt@...dmis.org>
Subject: Re: kprobes broken since 0d00449c7a28 ("x86: Replace ist_enter()
 with nmi_enter()")

On Thu, Jan 28, 2021 at 07:24:14PM +0100, Peter Zijlstra wrote:
> On Thu, Jan 28, 2021 at 06:45:56PM +0200, Nikolay Borisov wrote:
> > it would be placed on the __fentry__ (and not endbr64) hence it works.
> > So perhaps a workaround outside of bpf could essentially detect this
> > scenario and adjust the probe to be on the __fentry__ and not preceding
> > instruction if it's detected to be endbr64 ?
> 
> Arguably the fentry handler should also set the nmi context, it can,
> after all, interrupt pretty much any other context by construction.

But that doesn't make it a real nmi.
nmi can actually interrupt anything. Whereas kprobe via int3 cannot
do nokprobe and noinstr sections. The exposure is a lot different.
ftrace is even more contained. It's only at the start of the functions.
It's even smaller subset of places than kprobes.
So ftrace < kprobe < nmi.
Grouping them all into nmi doesn't make sense to me.
That bpf breaking change came unnoticed mostly because people use
kprobes in the beginning of the functions only, but there are cases
where kprobes are in the middle too. People just didn't upgrade kernels
fast enough to notice.
imo appropriate solution would be to have some distinction between
ftrace < kprobe_via_int3 < nmi, so that bpf side can react accordingly.
That code in trace_call_bpf:
  if (in_nmi()) /* not supported yet */
was necessary in the past. Now we have all sorts of protections built in.
So it's probably ok to just drop it, but I think adding
called_via_ftrace vs called_via_kprobe_int3 vs called_via_nmi
is more appropriate solution long term.