Message-ID: <4bf72b3c-5fee-269f-1d71-7f808f436db9@fb.com>
Date: Tue, 14 Apr 2020 16:59:12 -0700
From: Yonghong Song <yhs@...com>
To: Andrii Nakryiko <andrii.nakryiko@...il.com>
CC: Andrii Nakryiko <andriin@...com>, bpf <bpf@...r.kernel.org>,
Martin KaFai Lau <kafai@...com>,
Networking <netdev@...r.kernel.org>,
Alexei Starovoitov <ast@...com>,
Daniel Borkmann <daniel@...earbox.net>,
Kernel Team <kernel-team@...com>
Subject: Re: [RFC PATCH bpf-next 05/16] bpf: create file or anonymous dumpers
On 4/13/20 10:56 PM, Andrii Nakryiko wrote:
> On Wed, Apr 8, 2020 at 4:26 PM Yonghong Song <yhs@...com> wrote:
>>
>> Given a loaded dumper bpf program, which already
>> knows which target it should bind to, there
>> are two ways to create a dumper:
>> - a file based dumper under hierarchy of
>> /sys/kernel/bpfdump/ which users can
>> "cat" to print out the output.
>> - an anonymous dumper which user application
>> can "read" the dumping output.
>>
>> For file based dumper, BPF_OBJ_PIN syscall interface
>> is used. For anonymous dumper, BPF_PROG_ATTACH
>> syscall interface is used.
>
> We discussed this offline with Yonghong a bit, but I thought I'd put
> my thoughts about this in writing for completeness. To me, it seems
> like the most consistent way to do both anonymous and named dumpers is
> through the following steps:
My main motivation for using bpf_link is to enumerate anonymous
bpf dumpers with the idr-based link_query mechanism from one of
Andrii's previous RFC patches, so that I do not need to reinvent the
wheel. But it looks like there are some difficulties:
>
> 1. BPF_PROG_LOAD to load/verify program, that created program FD.
> 2. LINK_CREATE using that program FD and direntry FD. This creates
> dumper bpf_link (bpf_dumper_link), returns anonymous link FD. If link
The bpf dump program already has the target information, which it
needs for verification, so it does not need a directory FD.
LINK_CREATE is probably not a good fit here.

A bpf dump program is somewhat similar to an fentry/fexit program:
after a successful load, the program already knows where to attach
the trampoline.

Looking at the kernel code, for an fentry/fexit program the trampoline
is installed at the raw_tracepoint_open syscall, and from then on the
bpf program is actually called.

So, ideally, if we want to use the kernel bpf_link, we want to
return a cat-able bpf_link, because ultimately we want to query the
file descriptors which actually 'read' bpf program outputs.
The current bpf_link is not cat-able.

I tried to hack around this by manipulating fops and other things.
It may work, but it looks ugly. Or should we create a
bpf_catable_link and build infrastructure around it? I am not sure
that is worthwhile for this one-off thing (bpfdump).

Or, to query anonymous bpf dumpers, I can just write a bpf dump program
that goes through all fds to find them.
BTW, my current approach (in my private branch) is:

  anonymous dumper:
    bpf_raw_tracepoint_open(NULL, prog) -> cat-able fd
  file dumper:
    bpf_obj_pin(prog, path) -> a cat-able file

If you consider the program itself to be a link, this is like what is
described below in 3 and 4.
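
From the application's side, a cat-able fd needs nothing special:
consuming the dump is an ordinary read() loop until EOF. Below is a
minimal sketch of that loop. It assumes the caller already holds such
an fd from one of the two calls above. drain_fd is a made-up helper
name, and nothing in it is dumper-specific, so any readable fd works.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/*
 * Read everything from a readable fd into buf (at most cap - 1 bytes),
 * NUL-terminate it, and return the number of bytes stored, or -1 on
 * error. The fd is assumed to be a "cat-able" dumper fd, but the loop
 * itself is plain POSIX and works on any readable fd.
 */
static ssize_t drain_fd(int fd, char *buf, size_t cap)
{
	size_t used = 0;

	assert(buf && cap > 0);
	while (used < cap - 1) {
		ssize_t n = read(fd, buf + used, cap - 1 - used);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			return -1;
		}
		if (n == 0)
			break;		/* EOF: the dump is complete */
		used += (size_t)n;
	}
	buf[used] = '\0';
	return (ssize_t)used;
}
```

This is the same loop that "cat" effectively runs against a pinned
dumper file, so the anonymous and the file-based case look identical
to the reader.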
> FD is closed, dumper program is detached and dumper is destroyed
> (unless pinned in bpffs, just like with any other bpf_link).
> 3. At this point bpf_dumper_link can be treated like a factory of
> seq_files. We can add a new BPF_DUMPER_OPEN_FILE (all names are for
> illustration purposes) command, that accepts dumper link FD and
> returns a new seq_file FD, which can be read() normally (or, e.g.,
> cat'ed from shell).
In this case, link_query may not be accurate if a bpf_dumper_link
is created but there is no corresponding bpf_dumper_open_file. What we
really need is to iterate through all dumper seq_file FDs.
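
The mismatch can be shown with a toy model. This is a sketch only:
the toy_* names and the fixed-size table are invented for
illustration and do not correspond to kernel structures or to any
existing bpf command. It just counts links separately from the
seq_file sessions opened against them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LINKS 8

/* Toy stand-in for a dumper link; not a kernel structure. */
struct toy_dumper_link {
	bool in_use;
	unsigned int open_files;	/* seq_file sessions opened from this link */
};

static struct toy_dumper_link links[MAX_LINKS];

/* Model of LINK_CREATE: returns a slot index, or -1 if the table is full. */
static int toy_link_create(void)
{
	for (int i = 0; i < MAX_LINKS; i++) {
		if (!links[i].in_use) {
			links[i].in_use = true;
			links[i].open_files = 0;
			return i;
		}
	}
	return -1;
}

/* Model of the hypothetical BPF_DUMPER_OPEN_FILE: one more readable session. */
static void toy_link_open_file(int id)
{
	assert(id >= 0 && id < MAX_LINKS && links[id].in_use);
	links[id].open_files++;
}

/* What a link_query-style walk would report: every link, read or not. */
static unsigned int toy_count_links(void)
{
	unsigned int n = 0;

	for (int i = 0; i < MAX_LINKS; i++)
		n += links[i].in_use;
	return n;
}

/* What we actually want to enumerate: open dumper sessions. */
static unsigned int toy_count_open_files(void)
{
	unsigned int n = 0;

	for (int i = 0; i < MAX_LINKS; i++)
		if (links[i].in_use)
			n += links[i].open_files;
	return n;
}
```

With two links created and only one of them opened, a walk over links
reports two entries while only one dumper can actually be read, which
is the inaccuracy described above.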
> 4. Additionally, this anonymous bpf_link can be pinned/mounted in
> bpfdumpfs. We can do it as BPF_OBJ_PIN or as a separate command. Once
> pinned at, e.g., /sys/fs/bpfdump/task/my_dumper, just opening that
> file is equivalent to BPF_DUMPER_OPEN_FILE and will create a new
> seq_file that can be read() independently from other seq_files opened
> against the same dumper. Pinning bpfdumpfs entry also bumps refcnt of
> bpf_link itself, so even if process that created link dies, bpf dumper
> stays attached until its bpfdumpfs entry is deleted.
>
> Apart from BPF_DUMPER_OPEN_FILE and open()'ing bpfdumpfs file duality,
> it seems pretty consistent and follows safe-by-default auto-cleanup of
> anonymous link, unless pinned in bpfdumpfs (or one can still pin
> bpf_link in bpffs, but it can't be open()'ed the same way, it just
> preserves BPF program from being cleaned up).
>
> Out of all schemes I could come up with, this one seems most unified
> and nicely fits into bpf_link infra. Thoughts?
>
>>
>> To facilitate target seq_ops->show() to get the
>> bpf program easily, dumper creation increased
>> the target-provided seq_file private data size
>> so bpf program pointer is also stored in seq_file
>> private data.
>>
>> Further, a seq_num which represents how many
>> times bpf_dump_get_prog() has been called is also
>> available to the target seq_ops->show().
>> Such information can be used to e.g., print
>> banner before printing out actual data.
>>
>> Note the seq_num does not represent the num
>> of unique kernel objects the bpf program has
>> seen. But it should be a good approximation.
>>
>> A target feature BPF_DUMP_SEQ_NET_PRIVATE
>> is implemented specifically useful for
>> net based dumpers. It sets net namespace
>> as the current process net namespace.
>> This avoids changing existing net seq_ops
>> in order to retrieve net namespace from
>> the seq_file pointer.
>>
>> For open dumper files, anonymous or not, the
>> fdinfo will show the target and prog_id associated
>> with that file descriptor. For dumper file itself,
>> a kernel interface will be provided to retrieve the
>> prog_id in one of the later patches.
>>
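
For the fdinfo part, a userspace tool could pick the prog_id out of
the text of /proc/<pid>/fdinfo/<fd>. A minimal sketch follows. The
"prog_id:" key and the one-field-per-line layout are assumptions
(the patch description only says that target and prog_id are shown),
and parse_prog_id is a made-up helper name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Find a "prog_id:" line in fdinfo-style text and store its value.
 * Returns 0 on success, -1 if no such line exists.
 * The exact field name and layout are assumptions, not taken from
 * the patch.
 */
static int parse_prog_id(const char *fdinfo, unsigned int *prog_id)
{
	static const char key[] = "prog_id:";
	const char *line = fdinfo;

	assert(fdinfo && prog_id);
	while (line && *line) {
		if (strncmp(line, key, sizeof(key) - 1) == 0 &&
		    sscanf(line + sizeof(key) - 1, "%u", prog_id) == 1)
			return 0;
		line = strchr(line, '\n');
		if (line)
			line++;		/* step past the newline */
	}
	return -1;
}
```

The caller would read the fdinfo file into a buffer first and pass
that buffer in. Keeping the parser separate from the file I/O makes
it easy to adjust if the final field names differ.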
>> Signed-off-by: Yonghong Song <yhs@...com>
>> ---
>> include/linux/bpf.h | 5 +
>> include/uapi/linux/bpf.h | 6 +-
>> kernel/bpf/dump.c | 338 ++++++++++++++++++++++++++++++++-
>> kernel/bpf/syscall.c | 11 +-
>> tools/include/uapi/linux/bpf.h | 6 +-
>> 5 files changed, 362 insertions(+), 4 deletions(-)
>>
>
> [...]
>