linux-kernel - Re: [RFC 0/1] BPF tracing for arm64 using fprobe

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20221117121617.4e1529d3@gandalf.local.home>
Date:   Thu, 17 Nov 2022 12:16:17 -0500
From:   Steven Rostedt <rostedt@...dmis.org>
To:     Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc:     Florent Revest <revest@...omium.org>, bpf <bpf@...r.kernel.org>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Andrii Nakryiko <andrii@...nel.org>,
        KP Singh <kpsingh@...nel.org>,
        Brendan Jackman <jackmanb@...gle.com>, markowsky@...gle.com,
        Mark Rutland <mark.rutland@....com>,
        Masami Hiramatsu <mhiramat@...nel.org>,
        Xu Kuohai <xukuohai@...wei.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Christoph Hellwig <hch@...radead.org>,
        Peter Zijlstra <peterz@...radead.org>
Subject: Re: [RFC 0/1] BPF tracing for arm64 using fprobe

On Wed, 16 Nov 2022 18:41:26 -0800
Alexei Starovoitov <alexei.starovoitov@...il.com> wrote:

> Even with all optimization the performance overhead is not acceptable.
> It feels to me that folks are still thinking about bpf trampoline
> as a tracing facility.
> It's a lot more than that. It needs to run 24/7 with zero overhead.

It obviously doesn't have zero overhead.

And correctness and maintainability trumps micro-optimizations.

> It needs to replace the kernel functions and be invoked

What do you mean by "replace the kernel functions"? You mean an existing
kernel function can be replaced by a bpf program? Like live patching?

This seems rather dangerous, and how does one know that their system has
integrity? Is there a feature to sign bpf programs before they can be added?

Also, it may be time to bring in the lawyers. If a bpf program can replace
an existing kernel function, then it has definitely passed the "user space"
exception to the GPL, where user space must use the system call interface.
By injecting executable code into the kernel, especially something that
replaces kernel functionality, it becomes arguably derived from the kernel
itself. And the BPF program must be GPL.

Allowing proprietary BPF programs to replace kernel functionality looks
like a clear violation and circumvention of the GPL. But I could be
mistaken. As I said, it's time to bring in the lawyers on this one.

> millions times a second until the system is rebooted.
> In this environment every nanosecond counts.
> 
> Even if the fprobe side was completely free the patch 1 has so much
> overhead in copy of bpf_cookie, regs, etc that it's a non-starter
> for these use cases.
> 
> There are several other fundamental issues in this approach
> because of fprobe/ftrace.
> It has ftrace_test_recursion_trylock and disables preemption.
> Both are deal breakers.

Please explain why? The recursion protection lock is a simply bit operation
on the task struct which is used to protect against recursion at the same
context. Which if you do not have, will likely happen, and the only hint of
it is that the system triple faults and reboots. If you are only hooking to
one function, then it is easy to figure this out. But with the multi work
being done, that is no longer the case.

Hooking to functions is *extremely* intrusive. And protection against
errors is a must have, and not an option.

> 
> bpf trampoline has to allow recursion in some cases.
> See __bpf_prog_enter*() flavors.

The recursion lock allows recursions, but not at the same context. That is,
interrupt against normal context is fine. But really, you should not have
it within the same context. How do you verify that you do not run out of
stack?

> 
> bpf trampoline also has to use migrate_disable instead of preemption
> and rcu_read_lock() in some cases and rcu_read_lock_trace() in others.
> 
> bpf trampoline must never allocate memory or grab locks.

Neither should fprobes / ftrace.

-- Steve

> 
> All of these mandatory features exclude fprobe, ftrace, rethook
> from possible options.
> 
> Let's figure out how to address concerns with direct calls: