linux-kernel - Re: kprobes broken since 0d00449c7a28 ("x86: Replace ist_enter() with nmi

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <20210128123842.c9e33949e62f504b84bfadf5@gmail.com>
Date:   Thu, 28 Jan 2021 12:38:42 +0900
From:   Masami Hiramatsu <masami.hiramatsu@...il.com>
To:     Masami Hiramatsu <mhiramat@...nel.org>
Cc:     Nikolay Borisov <nborisov@...e.com>,
        Peter Zijlstra <peterz@...radead.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Alexei Starovoitov <ast@...nel.org>, bpf@...r.kernel.org
Subject: Re: kprobes broken since 0d00449c7a28
 ("x86: Replace ist_enter() with nmi_enter()")

Hi,

On Thu, 28 Jan 2021 10:34:15 +0900
Masami Hiramatsu <mhiramat@...nel.org> wrote:

> On Wed, 27 Jan 2021 19:57:56 +0200
> Nikolay Borisov <nborisov@...e.com> wrote:
> 
> > 
> > 
> > On 27.01.21 г. 17:24 ч., Masami Hiramatsu wrote:
> > > On Thu, 28 Jan 2021 00:13:53 +0900
> > > Masami Hiramatsu <mhiramat@...nel.org> wrote:
> > > 
> > >> Hi Nikolay,
> > >>
> > >> On Wed, 27 Jan 2021 15:43:29 +0200
> > >> Nikolay Borisov <nborisov@...e.com> wrote:
> > >>
> > >>> Hello,
> > >>>
> > >>> I'm currently seeing latest Linus' master being somewhat broken w.r.t
> > >>> krpobes. In particular I have the following test-case:
> > >>>
> > >>> #!/bin/bash
> > >>>
> > >>> mkfs.btrfs -f /dev/vdc &> /dev/null
> > >>> mount /dev/vdc /media/scratch/
> > >>>
> > >>> bpftrace -e 'kprobe:btrfs_sync_file {printf("kprobe: %s\n", kstack());}'
> > >>> &>bpf-output &
> > >>> bpf_trace_pid=$!
> > >>>
> > >>> # force btrfs_sync_file to be called
> > >>> sleep 2
> > >>> xfs_io -f -c "pwrite 0 4m" -c "fsync" /media/scratch/file5
> > >>>
> > >>> kill $bpf_trace_pid
> > >>> sleep 1
> > >>>
> > >>> grep -q kprobe bpf-output
> > >>> retval=$?
> > >>> rm -f bpf-output
> > >>> umount /media/scratch
> > >>>
> > >>> exit $retval
> > >>>
> > >>> It traces btrfs_sync_file which is called when fsync is executed on a
> > >>> btrfs file, however I don't see the stacktrace being printed i.e the
> > >>> kprobe doesn't fire at all. The following alternative program:
> > >>>
> > >>> bpftrace -e 'tracepoint:btrfs:btrfs_sync_file {printf("tracepoint:
> > >>> %s\n", kstack());} kprobe:btrfs_sync_file {printf("kprobe: %s\n",
> > >>> kstack());}'
> > >>>
> > >>> only prints the stack from the tracepoint and not from the kprobe, given
> > >>> that the tracepoint is called from the btrfs_sync_file function.
> > >>
> > >> Thank you for reporting!
> > >>
> > >> If you don't mind, could you confirm it with ftrace (tracefs)?
> > >> bpftrace etc. involves too many things. It is better to test with
> > >> simpler way to test it.
> > >> I'm not familer with the bpftrace, but I think you can check it with
> > >>
> > >> # cd /sys/kernel/debug/tracing
> > >> # echo p:myevent btrfs_sync_file >> kprobe_events
> > >> # echo stacktrace > events/kprobes/myevent/trigger
> > >>  (or echo 1 > options/stacktrace , if trigger file doesn't exist)
> > > 
> > > Of course, also you have to enable the event.
> > >  # echo 1 > events/kprobes/myevent/enable
> > > 
> > > And check the results
> > > 
> > >  # cat trace
> > > 
> > > 
> > >> Could you also share your kernel config, so that we can reproduce it?
> > > 
> > 
> > I've attached the config and indeed with the scenario you proposed it
> > seems to works. I see:
> > 
> >        xfs_io-20280   [000] d.Z.  9900.748633: myevent:
> > (btrfs_sync_file+0x0/0x580)
> >           xfs_io-20280   [000] d.Z.  9900.748647: <stack trace>
> >  => kprobe_trace_func
> >  => kprobe_dispatcher
> >  => kprobe_int3_handler
> >  => exc_int3
> >  => asm_exc_int3
> >  => btrfs_sync_file
> >  => do_fsync
> >  => __x64_sys_fsync
> >  => do_syscall_64
> >  => entry_SYSCALL_64_after_hwframe
> 
> Hmm, then there might be a problem in bpftrace or ebpf (need more info).
> At least kprobes itself isn't broken.
> I guess they check "in_nmi()" and skip such event?

Yeah, there is. Nikolay, could you try this tentative patch?

Of course this just drops the NMI check from the handler, so alternative
checker is required. But I'm not sure what the original code concerns.
As far as I can see, there seems no re-entrant block flag, nor locks
among ebpf programs in runtime.

Alexei, could you tell me what is the concerning situation for bpf?

Thank you,

>From c5cd0e5f60ef6494c9e1579ec1b82b7344c41f9a Mon Sep 17 00:00:00 2001
From: Masami Hiramatsu <mhiramat@...nel.org>
Date: Thu, 28 Jan 2021 12:31:02 +0900
Subject: [PATCH] tracing: bpf: Remove in_nmi() check from kprobe handler

Since commit 0d00449c7a28 ("x86: Replace ist_enter() with nmi_enter()") has
changed the kprobe handler to run in the NMI context, in_nmi() always returns
true. This means the bpf events on kprobes always skipped.

Signed-off-by: Masami Hiramatsu <mhiramat@...nel.org>
---
 kernel/trace/bpf_trace.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 6c0018abe68a..764400260eb6 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -96,9 +96,6 @@ unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx)
 {
 	unsigned int ret;
 
-	if (in_nmi()) /* not supported yet */
-		return 1;
-
 	cant_sleep();
 
 	if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1)) {
-- 
2.25.1

-- 
Masami Hiramatsu