Message-ID: <334a91d2-1567-bf3d-4ae6-305646738132@fb.com>
Date: Fri, 10 Apr 2020 17:23:30 -0700
From: Yonghong Song <yhs@...com>
To: Andrii Nakryiko <andrii.nakryiko@...il.com>
CC: Andrii Nakryiko <andriin@...com>, bpf <bpf@...r.kernel.org>,
Martin KaFai Lau <kafai@...com>,
Networking <netdev@...r.kernel.org>,
Alexei Starovoitov <ast@...com>,
Daniel Borkmann <daniel@...earbox.net>,
Kernel Team <kernel-team@...com>
Subject: Re: [RFC PATCH bpf-next 05/16] bpf: create file or anonymous dumpers
On 4/10/20 4:25 PM, Andrii Nakryiko wrote:
> On Wed, Apr 8, 2020 at 4:26 PM Yonghong Song <yhs@...com> wrote:
>>
>> Given a loaded dumper bpf program, which already
>> knows which target it should bind to, there are
>> two ways to create a dumper:
>> - a file-based dumper under the /sys/kernel/bpfdump/
>> hierarchy, which users can "cat" to print out
>> the output.
>> - an anonymous dumper whose output a user
>> application can "read".
>>
>> For a file-based dumper, the BPF_OBJ_PIN syscall
>> interface is used. For an anonymous dumper, the
>> BPF_PROG_ATTACH syscall interface is used.
>>
>> To make it easy for the target's seq_ops->show()
>> to get the bpf program, dumper creation increases
>> the target-provided seq_file private data size
>> so that the bpf program pointer is also stored in
>> the seq_file private data.
>>
>> Further, a seq_num, which represents how many
>> times bpf_dump_get_prog() has been called, is also
>> available to the target seq_ops->show().
>> Such information can be used to, e.g., print
>> a banner before printing out the actual data.
>
> So I looked up seq_operations struct and did a very cursory read of
> fs/seq_file.c and seq_file documentation, so I might be completely off
> here.
>
> start() is called before iteration begins, stop() is called after
> iteration ends. Would it be a better and more user-friendly interface
> to have two extra calls to the BPF program, say with a NULL input element,
> but with an extra enum/flag that specifies that this is the START or END of
> iteration, in addition to seq_num?
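A minimal userspace sketch of the private-data layout described in the patch text: the target's own private data is followed by a trailer holding the program pointer and seq_num, and each lookup returns the pre-increment count, so the first call of a session sees seq_num == 0. All names here (fake_prog, dump_priv_trailer, alloc_dump_priv(), get_prog()) are hypothetical stand-ins; the patch's actual structures and the real bpf_dump_get_prog() are not shown in this excerpt.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct bpf_prog; only an id is kept. */
struct fake_prog {
	int id;
};

/*
 * Hypothetical trailer stored right after the target's own private data.
 * Field names are illustrative, not the patch's definitions.
 */
struct dump_priv_trailer {
	struct fake_prog *prog;
	uint64_t seq_num;	/* number of get_prog() calls so far */
};

/* Round the target size up so the trailer is correctly aligned. */
static size_t trailer_offset(size_t target_priv_size)
{
	size_t align = _Alignof(struct dump_priv_trailer);

	return (target_priv_size + align - 1) / align * align;
}

/* One zeroed allocation: target private data followed by the trailer. */
static void *alloc_dump_priv(size_t target_priv_size, struct fake_prog *prog)
{
	size_t off = trailer_offset(target_priv_size);
	char *priv = calloc(1, off + sizeof(struct dump_priv_trailer));

	if (!priv)
		return NULL;
	((struct dump_priv_trailer *)(priv + off))->prog = prog;
	return priv;
}

/*
 * Hypothetical analogue of what bpf_dump_get_prog() is described as
 * offering a target's show(): the program pointer plus the call count
 * before this call (so the first call reports 0).
 */
static struct fake_prog *get_prog(void *priv, size_t target_priv_size,
				  uint64_t *seq_num)
{
	struct dump_priv_trailer *t = (struct dump_priv_trailer *)
		((char *)priv + trailer_offset(target_priv_size));

	*seq_num = t->seq_num++;
	return t->prog;
}
```

Rounding the offset up keeps the trailer aligned no matter what size the target asks for, which is why the sketch does not simply append at target_priv_size.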
The current design always passes a valid object (task, file, netlink_sock,
fib6_info). That is, accessing fields of those data structures won't
cause runtime exceptions.
Therefore, with the existing seq_ops implementations for ipv6_route,
netlink, etc., we don't have END information. We can get START
information though.
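Since seq_num starts at 0 for each session, a dumper can already emit a START-style header without a separate flag. A hedged userspace sketch, where a plain char buffer stands in for the seq_file and fake_task / show_task() are hypothetical names:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical object handed to the dumper, e.g. a task. */
struct fake_task {
	int pid;
	const char *comm;
};

/*
 * Per-object callback: print a header only on the first call of a
 * session (seq_num == 0), then one line per object. out/len stand in
 * for the seq_file buffer (out must be NUL-terminated within len).
 * Returns bytes appended, or -1 if the output did not fit.
 */
static int show_task(char *out, size_t len, uint64_t seq_num,
		     const struct fake_task *task)
{
	size_t start = strlen(out), used = start;
	int n;

	if (seq_num == 0) {
		n = snprintf(out + used, len - used, "%8s %s\n", "PID", "COMM");
		if (n < 0 || (size_t)n >= len - used)
			goto overflow;
		used += (size_t)n;
	}
	n = snprintf(out + used, len - used, "%8d %s\n", task->pid, task->comm);
	if (n < 0 || (size_t)n >= len - used)
		goto overflow;
	used += (size_t)n;
	return (int)(used - start);

overflow:
	out[start] = '\0';	/* drop any partial output for this object */
	return -1;
}
```

The END side has no such equivalent, which is the gap discussed in the rest of this thread.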
>
> Also, right now it's impossible to write stateful dumpers that do any
> kind of stats calculation, because it's impossible to determine when
> iteration restarted (it starts from the very beginning, not from the
> last element). It's impossible to just remember the last processed
> seq_num, because the BPF program might be called for a new "session" in
> parallel with the old one.
Theoretically, the session end can be detected by checking the return
value of the last bpf_seq_printf() or bpf_seq_write(). If it indicates
an overflow, that means the session has ended.
Alternatively, the bpfdump infrastructure could do this work and
provide a session id.
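A sketch of the overflow-based detection suggested above, modeled in userspace. fake_seq_write() stands in for bpf_seq_write(); returning -EOVERFLOW on a full buffer is an assumption made for illustration, as are the 32-byte buffer and the names fake_seq / dump_one().

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical bounded output buffer standing in for a seq_file. */
struct fake_seq {
	char buf[32];
	size_t count;
	bool overflowed;
};

/*
 * Stand-in for bpf_seq_write(): append len bytes, or fail with
 * -EOVERFLOW (assumed error code) without writing anything.
 */
static int fake_seq_write(struct fake_seq *s, const void *data, size_t len)
{
	if (len > sizeof(s->buf) - s->count) {
		s->overflowed = true;
		return -EOVERFLOW;
	}
	memcpy(s->buf + s->count, data, len);
	s->count += len;
	return 0;
}

/*
 * Per-object callback: an overflowing write is treated as the end of
 * the session, as proposed above (e.g. to flush per-session state).
 * Returns true while the session continues.
 */
static bool dump_one(struct fake_seq *s, const char *line, int *sessions_ended)
{
	if (fake_seq_write(s, line, strlen(line)) == -EOVERFLOW) {
		(*sessions_ended)++;
		return false;
	}
	return true;
}
```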
>
> So it seems like few things would be useful:
>
> 1. end flag for post-aggregation and/or footer printing (seq_num == 0
> already provides a similar means for a start flag).
The end flag is a problem. We could, say, hijack next() or stop() to
detect the end, but passing a NULL pointer as the object to the bpf
program may be problematic without verifier enforcement, as it may
cause a lot of exceptions. Although all these exceptions will be
silenced by the bpf infra, I am still not sure whether this is
acceptable.
> 2. Some sort of "session id", so that bpfdumper can maintain
> per-session intermediate state. Plus with this it would be possible to
> detect restarts (if there is some state for the same session and
> seq_num == 0, this is restart).
I guess we can do this.
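A rough model of point 2: per-session state keyed by a session id, with "existing state plus seq_num == 0" treated as a restart. A fixed array stands in for a BPF hash map, and session_state, find_session(), create_session() and on_object() are hypothetical names; no session id exists in this patch yet.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SESSIONS 8

/* Hypothetical per-session accumulator. */
struct session_state {
	bool used;
	uint64_t session_id;
	uint64_t objects;	/* objects counted in the current pass */
	uint64_t restarts;	/* how many times the session restarted */
};

static struct session_state states[MAX_SESSIONS];

/* Linear lookup standing in for a BPF hash map keyed by session id. */
static struct session_state *find_session(uint64_t session_id)
{
	for (size_t i = 0; i < MAX_SESSIONS; i++)
		if (states[i].used && states[i].session_id == session_id)
			return &states[i];
	return NULL;
}

static struct session_state *create_session(uint64_t session_id)
{
	for (size_t i = 0; i < MAX_SESSIONS; i++) {
		if (!states[i].used) {
			states[i] = (struct session_state){
				.used = true, .session_id = session_id,
			};
			return &states[i];
		}
	}
	return NULL;
}

/*
 * Per-object callback. State that already exists for this session plus
 * seq_num == 0 means iteration restarted from the beginning, so the
 * partial aggregate from the aborted pass is discarded.
 */
static int on_object(uint64_t session_id, uint64_t seq_num)
{
	struct session_state *st = find_session(session_id);

	if (st && seq_num == 0) {
		st->objects = 0;
		st->restarts++;
	} else if (!st) {
		st = create_session(session_id);
		if (!st)
			return -1;	/* table full */
	}
	st->objects++;
	return 0;
}
```

Because state is keyed by session id, two sessions running in parallel do not clobber each other, which is the problem with just remembering the last seq_num.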
>
> It seems like it might be a bit more flexible to, instead of providing
> seq_file * pointer directly, actually provide a bpfdumper_context
> struct, which would have seq_file * as one of fields, other being
> session_id and start/stop flags.
As you mentioned, if we pass more seq_file-related fields to the bpf
program, grouping them into a structure makes sense.
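One possible shape for the context struct discussed here, with a program-side callback that only dereferences the object in the per-object phase. Everything in it (the bpfdumper_context fields, enum dump_phase, fake_file, totals, dump_file()) is hypothetical; in a real BPF program the verifier would have to enforce the NULL check, which is the open concern about the end flag raised in this thread.

```c
#include <stddef.h>
#include <stdint.h>

struct seq_file;	/* opaque here, as it would be to the program */

/* Hypothetical phase values; nothing like this exists in the patch. */
enum dump_phase {
	DUMP_PHASE_START = 1,
	DUMP_PHASE_OBJ = 2,
	DUMP_PHASE_END = 3,
};

/* Hypothetical context grouping the seq_file with session metadata. */
struct bpfdumper_context {
	struct seq_file *seq;
	uint64_t session_id;
	uint64_t seq_num;
	enum dump_phase phase;
};

struct fake_file {
	long size;
};

/* Running totals a stateful dumper might keep. */
struct totals {
	uint64_t files;
	long bytes;
	int footer_printed;
};

/*
 * Program-side callback: obj is only dereferenced in DUMP_PHASE_OBJ,
 * so the NULL passed for START/END is never touched.
 */
static void dump_file(const struct bpfdumper_context *ctx,
		      const struct fake_file *obj, struct totals *t)
{
	switch (ctx->phase) {
	case DUMP_PHASE_START:
		t->files = 0;
		t->bytes = 0;
		t->footer_printed = 0;
		break;
	case DUMP_PHASE_OBJ:
		t->files++;
		t->bytes += obj->size;
		break;
	case DUMP_PHASE_END:
		t->footer_printed = 1;	/* where a footer would be emitted */
		break;
	}
}
```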
>
> A bit unstructured thoughts, but what do you think?
>
>>
>> Note that seq_num does not represent the number
>> of unique kernel objects the bpf program has
>> seen, but it should be a good approximation.
>>
>> A target feature, BPF_DUMP_SEQ_NET_PRIVATE,
>> is implemented specifically for net-based
>> dumpers. It sets the net namespace to the
>> current process's net namespace. This avoids
>> changing existing net seq_ops in order to
>> retrieve the net namespace from the seq_file
>> pointer.
>>
>> For open dumper files, anonymous or not, the
>> fdinfo will show the target and prog_id associated
>> with that file descriptor. For the dumper file
>> itself, a kernel interface to retrieve the prog_id
>> will be provided in one of the later patches.
>>
>> Signed-off-by: Yonghong Song <yhs@...com>
>> ---
>> include/linux/bpf.h | 5 +
>> include/uapi/linux/bpf.h | 6 +-
>> kernel/bpf/dump.c | 338 ++++++++++++++++++++++++++++++++-
>> kernel/bpf/syscall.c | 11 +-
>> tools/include/uapi/linux/bpf.h | 6 +-
>> 5 files changed, 362 insertions(+), 4 deletions(-)
>>
>
> [...]
>
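To illustrate why seq_num is only an approximation of the number of unique objects, here is a simplified model of seq_file reads (an assumption, not the actual fs/seq_file.c logic): an object whose record does not fit in the remaining buffer is shown again at the start of the next read(), so show() and seq_num run once more for it. count_show_calls() is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified model: each read() gets a buffer of buf_size bytes. show()
 * is invoked for the next object; if its record does not fit, the output
 * is discarded and the same object is shown again in the next read().
 * Every show() call bumps seq_num. Returns the number of show() calls,
 * or 0 if some record can never fit (buffer growth is not modeled).
 */
static uint64_t count_show_calls(const size_t *rec_len, size_t nobjs,
				 size_t buf_size)
{
	uint64_t seq_num = 0;
	size_t i = 0;

	for (size_t k = 0; k < nobjs; k++)
		if (rec_len[k] > buf_size)
			return 0;

	while (i < nobjs) {
		size_t used = 0;	/* a new read() starts empty */

		while (i < nobjs) {
			seq_num++;	/* show() called for object i */
			if (used + rec_len[i] > buf_size)
				break;	/* overflow: retry i next read() */
			used += rec_len[i];
			i++;
		}
	}
	return seq_num;
}
```

With three 40-byte records and a 100-byte buffer, the model reports 4 show() calls for 3 objects; with a 120-byte buffer it reports exactly 3.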