[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAEf4BzbOw3=NAMA4GJq9s6KvDHfh9a3_vTHM940P+jYWfoty2Q@mail.gmail.com>
Date: Tue, 21 Apr 2020 14:49:17 -0700
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Andrii Nakryiko <andriin@...com>, bpf <bpf@...r.kernel.org>,
Networking <netdev@...r.kernel.org>,
Alexei Starovoitov <ast@...com>,
Daniel Borkmann <daniel@...earbox.net>,
Wenbo Zhang <ethercflow@...il.com>,
Kernel Team <kernel-team@...com>
Subject: Re: [PATCH bpf-next 0/3] Improve raw tracepoint BTF types preservation
On Mon, Mar 2, 2020 at 8:59 PM Andrii Nakryiko
<andrii.nakryiko@...il.com> wrote:
>
> On Mon, Mar 2, 2020 at 8:10 PM Andrii Nakryiko
> <andrii.nakryiko@...il.com> wrote:
> >
> > On Mon, Mar 2, 2020 at 4:59 PM Alexei Starovoitov
> > <alexei.starovoitov@...il.com> wrote:
> > >
> > > On Sun, Mar 01, 2020 at 12:10:42AM -0800, Andrii Nakryiko wrote:
> > > > Fix issue with not preserving btf_trace_##call structs when compiled under
> > > > Clang. Additionally, capture raw tracepoint arguments in raw_tp_##call
> > > > structs, directly usable from BPF programs. Convert runqslower to use those
> > > > for proof of concept and to simplify code further.
> > >
> > > Not only folks compile kernel with clang they use the latest BPF/BTF features
> > > with it. This is very nice to see!
> > > I've applied 1st patch to make clang compiled kernel emit proper BTF.
> > >
> > > As far as patch 2 I'm not sure about 'raw_tp_' prefix. tp_btf type of progs can
> > > use the same structs. So I think there could be a better name. Also bpftool can
> > > generate them as well while emitting vmlinux.h. I think that will avoid adding
> > > few kilobytes to vmlinux BTF that kernel isn't going to use atm.
> >
> > Fair enough, I'll follow up with bpftool changes to generate such
> > structs. I'm thinking to use tp_args_xxx name pattern, unless someone
> > has a better idea :)
>
> Bad news. BTF_KIND_FUNC_PROTOs don't capture argument names and having
> something like:
>
> struct tp_args_sched_switch {
> bool arg1;
> struct task_struct *arg2;
> struct task_struct *arg3;
> };
>
> doesn't seem like a good solution...
I'd like to surface this one more time. I'd like to give just a bit
more context first, though.
Currently, when using various types of BPF programs (kprobe,
fentry/fexit, lsm, etc), one can use few ways to extract input
arguments of the "intercepted" function. One of the more user-friendly
ways to do it is through BPF_PROG macro. So BPF code would look like
this:
SEC("lsm/file_mprotect")
int BPF_PROG(test_int_hook, struct vm_area_struct *vma,
unsigned long reqprot, unsigned long prot, int ret)
{
if (ret != 0) { ... }
/* use cma, reqprot, etc directly as variables */
}
This is kind of nice, but involves quite a bit of macro magic and
requires user to find and copy/paste those function prototypes and
track whether they ever change.
The alternative for this would be to have a memory-layout-compatible
struct, that can be used directly as a context. For the above example:
struct btf_lsm_file_mprotect_ctx {
struct vm_area_struct *vma;
unsigned long reqprot;
int ret __attribute__((aligned(8)));
};
With that, one can do a nice pure C code:
int test_int_hook(struct btf_lsm_file_mprotect_ctx *ctx)
{
if (ctx->ret != 0) { ... }
/* here instead of using vma, reqprot and ret directly,
* one would access them as ctx fields: ctx->vma, ctx->reqprot
*/
}
The benefit of the latter is that, hopefully, no one will have to
write such struct definitions by hand, they would be generated as part
of vmlinux.h and would come directly from kernel BTF (in one way or
another, see below). One added benefit is that such struct is
CO-RE-relocatable, so if fields are ever, say, reordered or removed,
by using CO-RE mechanisms one can write a compiled-once BPF program to
accommodate such changes and even incompatibilities.
Now to why I'm bringing this up again.
The original plan was to use bpftool to convert func_protos that are
used directly by kernel (e.g., btf_trace_xxx ones for raw tracepoints)
to dump them as memory layout-compatible structs at the end of
vmlinux.h for use by BPF programs. Unfortunately, I don't think that
will work because func_proto arguments don't preserve their names in
BTF. Which, as I pointed out in previous email, ruins usability of
generated structs.
So I'd like to solicit feedback on how we can proceed from here. I see
few possible ways to go about this:
1. Do nothing. Always an option. It sucks for users (they need to
copy-paste function definitions, use BPF_PROG macro, etc), but is
awesome for kernel (no changes, no extra stuff).
2. Bite a bullet and add compatible struct definitions into kernel
itself. It will slightly increase kernel BTF, but will require no
changes on user-space and tooling side. vmlinux.h just magically gets
proper structs that are directly usable from BPF programs. One
objection to that is that those structs are not directly used by
kernel and thus are just a dead weight.
3. Bite even bigger bullet and convert current uses of func_proto in
kernel to struct. That way we don't have unnecessary types laying
around in the kernel, verifier actually will use these structs for
verification. There might be a concern about backwards compatibility.
Libbpf can easily accommodate such changes by searching for either
struct or func_proto, whichever is available, but one can argue that
we have to leave existing btf_trace_xxx func protos intact if that's
considered to be part of UAPI.
4. Milder variant of #3 would be to convert typedef of func_protos
into a use of proper FUNCs. They still use func_proto, but that
func_proto actually preserves argument names, and would allow bpftool
to dump proper structs. Same considerations of what to do with
backwards compatibility of btf_trace_xxx typedefs+func_proto, but
won't require much verifier changes, because it's still going to be
func_proto that need to be used for verification. This is what
Yonghong's bpfdump changes actually do: they use __init empty funcs to
provide types for verifier.
I think it's important to discuss this, because more and more BPF
program types are relying on using similar approach (e.g., LSM,
bpfdump), and it would be nice to provide a good **and uniform**
solution to let users just use a proper context structure directly,
instead of using BPF_PROG macro (best case, worst case it's a direct
u64 array conversions) and copy-pasting definition.
Thanks for any feedback upfront!
Powered by blists - more mailing lists