netdev - Re: [RFC PATCH bpf-next 05/16] bpf: create file or anonymous dumpers

Message-ID: <334a91d2-1567-bf3d-4ae6-305646738132@fb.com>
Date:   Fri, 10 Apr 2020 17:23:30 -0700
From:   Yonghong Song <yhs@...com>
To:     Andrii Nakryiko <andrii.nakryiko@...il.com>
CC:     Andrii Nakryiko <andriin@...com>, bpf <bpf@...r.kernel.org>,
        Martin KaFai Lau <kafai@...com>,
        Networking <netdev@...r.kernel.org>,
        Alexei Starovoitov <ast@...com>,
        Daniel Borkmann <daniel@...earbox.net>,
        Kernel Team <kernel-team@...com>
Subject: Re: [RFC PATCH bpf-next 05/16] bpf: create file or anonymous dumpers



On 4/10/20 4:25 PM, Andrii Nakryiko wrote:
> On Wed, Apr 8, 2020 at 4:26 PM Yonghong Song <yhs@...com> wrote:
>>
>> Given a loaded dumper bpf program, which already
>> knows which target it should bind to, there are
>> two ways to create a dumper:
>>    - a file based dumper under the hierarchy of
>>      /sys/kernel/bpfdump/ which users can
>>      "cat" to print out the output.
>>    - an anonymous dumper from which a user
>>      application can "read" the dump output.
>>
>> For file based dumper, BPF_OBJ_PIN syscall interface
>> is used. For anonymous dumper, BPF_PROG_ATTACH
>> syscall interface is used.
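For the anonymous case, "read" means an ordinary read() loop until EOF on the descriptor, which is also what cat does for a pinned file. A minimal user-space sketch, assuming the dumper fd has already been obtained (the BPF_PROG_ATTACH call itself is not shown); drain_fd() is a hypothetical helper name:

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: read everything from fd until EOF into a
 * heap-allocated, NUL-terminated buffer. Works for any readable fd,
 * e.g. an anonymous dumper fd or a file opened under /sys/kernel/bpfdump/.
 * Returns NULL on error; *len receives the number of bytes read. */
static char *drain_fd(int fd, size_t *len)
{
	size_t cap = 4096, n = 0;
	char *buf = malloc(cap + 1);

	if (!buf)
		return NULL;
	for (;;) {
		ssize_t r;

		if (n == cap) {		/* grow before the buffer is full */
			char *tmp = realloc(buf, cap * 2 + 1);

			if (!tmp) {
				free(buf);
				return NULL;
			}
			buf = tmp;
			cap *= 2;
		}
		r = read(fd, buf + n, cap - n);
		if (r < 0) {
			if (errno == EINTR)	/* interrupted: just retry */
				continue;
			free(buf);
			return NULL;
		}
		if (r == 0)		/* EOF: the dumper has no more output */
			break;
		n += (size_t)r;
	}
	buf[n] = '\0';
	*len = n;
	return buf;
}
```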
>>
>> To make it easy for the target's seq_ops->show()
>> to get the bpf program, dumper creation increases
>> the target-provided seq_file private data size
>> so that the bpf program pointer is also stored in
>> the seq_file private data.
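One way to picture the enlarged private area, assuming the extra fields are appended after the target's own private data at an aligned offset (the RFC's actual layout may differ). struct dumper_extra, extra_offset(), dumper_priv_size() and get_dumper_extra() are hypothetical names, and this is a user-space model rather than kernel code:

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical fields the dumper infrastructure appends to the
 * target-provided seq_file private data. */
struct dumper_extra {
	void *prog;		/* stand-in for struct bpf_prog * */
	uint64_t seq_num;	/* number of bpf_dump_get_prog() calls so far */
};

/* Round the target's private size up so the appended fields are aligned. */
static size_t extra_offset(size_t target_priv_size)
{
	size_t a = alignof(struct dumper_extra);

	return (target_priv_size + a - 1) / a * a;
}

/* Total size to allocate: target data, padding, then the appended fields. */
static size_t dumper_priv_size(size_t target_priv_size)
{
	return extra_offset(target_priv_size) + sizeof(struct dumper_extra);
}

/* Locate the appended fields given the start of the private area. */
static struct dumper_extra *get_dumper_extra(void *priv, size_t target_priv_size)
{
	return (struct dumper_extra *)((char *)priv + extra_offset(target_priv_size));
}
```

One reason to append rather than prepend: the target's private data stays at the start of the area, so net seq_ops that expect their struct seq_net_private at the beginning of seq->private keep working unchanged.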
>>
>> Further, a seq_num, which represents how many times
>> bpf_dump_get_prog() has been called, is also
>> available to the target seq_ops->show().
>> Such information can be used to, e.g., print a
>> banner before printing out the actual data.
> 
> So I looked up seq_operations struct and did a very cursory read of
> fs/seq_file.c and seq_file documentation, so I might be completely off
> here.
> 
> start() is called before iteration begins, stop() is called after
> iteration ends. Would it be a better and more user-friendly interface
> to have two extra calls to the BPF program, say with a NULL input element,
> but with an extra enum/flag that specifies that this is the START or END of
> iteration, in addition to seq_num?

The current design always passes a valid object (task, file, netlink_sock,
fib6_info). That is, accessing fields of those data structures won't
cause runtime exceptions.

As a result, with the existing seq_ops implementations for ipv6_route,
netlink, etc., there is no END information available to the program.
START information is available, though (seq_num == 0).
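For example, recognizing START from seq_num == 0 is enough to print a banner. A user-space model, with a FILE * standing in for the seq_file and a hypothetical struct task_info / show_task() standing in for the target's object and the bpf program:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-object data a task target passes. */
struct task_info {
	int pid;
	const char *comm;
};

/* Model of what a dumper program could do with seq_num: print a banner
 * only on the very first call, then one line per object. */
static void show_task(FILE *out, uint64_t seq_num, const struct task_info *t)
{
	if (seq_num == 0)
		fprintf(out, "%-8s %s\n", "PID", "COMM");
	fprintf(out, "%-8d %s\n", t->pid, t->comm);
}
```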

> 
> Also, right now it's impossible to write stateful dumpers that do any
> kind of stats calculation, because it's impossible to determine when
> iteration restarted (it starts from the very beginning, not from the
> last element). It's impossible to just remember the last processed
> seq_num, because the BPF program might be called for a new "session" in
> parallel with the old one.

Theoretically, the end of a session can be detected by checking the return
value of the last bpf_seq_printf() or bpf_seq_write() call. If it indicates
an overflow, the session has ended.

Alternatively, the bpfdump infrastructure could help by providing a
session id.
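A user-space model of that overflow check, assuming the helper reports overflow as a negative error code (-EOVERFLOW here is an assumption, not confirmed by this patch); struct seq_buf_model and seq_buf_model_printf() are hypothetical names:

```c
#include <errno.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical bounded output buffer standing in for seq_file's buffer. */
struct seq_buf_model {
	char *buf;
	size_t size;
	size_t count;
};

/* Append formatted text. If it does not fit, drop the partial write and
 * return -EOVERFLOW, which the program could treat as "session ending". */
static int seq_buf_model_printf(struct seq_buf_model *m, const char *fmt, ...)
{
	va_list ap;
	int len;

	if (m->count >= m->size)
		return -EOVERFLOW;
	va_start(ap, fmt);
	len = vsnprintf(m->buf + m->count, m->size - m->count, fmt, ap);
	va_end(ap);
	if (len < 0)
		return -EINVAL;
	if ((size_t)len >= m->size - m->count) {
		m->buf[m->count] = '\0';	/* discard the truncated output */
		return -EOVERFLOW;
	}
	m->count += (size_t)len;
	return 0;
}
```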

> 
> So it seems like few things would be useful:
> 
> 1. end flag for post-aggregation and/or footer printing (seq_num == 0
> already provides similar means for a start flag).

The end flag is a problem. We could, say, hijack next() or stop() so we
can detect the end, but passing a NULL pointer as the object
to the bpf program may be problematic without verifier enforcement,
as it may cause a lot of exceptions. Although all these exceptions
will be silenced by the bpf infrastructure, I am still not sure whether
this is acceptable.
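For comparison, the START/END scheme under discussion could be modeled as below: the driver calls the program once with a NULL element and DUMP_START, once per element, and once more with NULL and DUMP_END. enum dump_phase, dump_prog_fn, run_dump() and sum_prog() are hypothetical names. The program may dereference elem only in the DUMP_ELEM case, which in a real bpf program is exactly what the verifier would have to enforce:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical phase flag passed alongside seq_num. */
enum dump_phase { DUMP_START, DUMP_ELEM, DUMP_END };

/* Hypothetical program signature: elem is NULL for START and END. */
typedef void (*dump_prog_fn)(enum dump_phase phase, uint64_t seq_num,
			     const int *elem, void *state);

/* Drive one iteration over an array: START, one call per element, END. */
static void run_dump(const int *elems, size_t n, dump_prog_fn prog, void *state)
{
	uint64_t seq = 0;

	prog(DUMP_START, seq, NULL, state);
	for (size_t i = 0; i < n; i++)
		prog(DUMP_ELEM, ++seq, &elems[i], state);
	prog(DUMP_END, ++seq, NULL, state);
}

/* Example program: aggregates per element and finalizes at END,
 * the point where a footer with the total could be printed. */
struct sum_state {
	long total;
	int finished;
};

static void sum_prog(enum dump_phase phase, uint64_t seq_num,
		     const int *elem, void *state)
{
	struct sum_state *s = state;

	(void)seq_num;
	switch (phase) {
	case DUMP_START:
		s->total = 0;
		s->finished = 0;
		break;
	case DUMP_ELEM:
		s->total += *elem;	/* elem is non-NULL only here */
		break;
	case DUMP_END:
		s->finished = 1;
		break;
	}
}
```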

> 2. Some sort of "session id", so that bpfdumper can maintain
> per-session intermediate state. Plus with this it would be possible to
> detect restarts (if there is some state for the same session and
> seq_num == 0, this is restart).

I guess we can do this.
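A sketch of per-session state with restart detection, assuming the infrastructure supplies a session_id next to seq_num. The id, the fixed-size table and the names session_get()/on_object() are hypothetical; a real bpf program would more likely keep this state in a BPF hash map keyed by the session id:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SESSIONS 16

/* Hypothetical per-session aggregation state. */
struct session_state {
	bool in_use;
	uint64_t session_id;
	uint64_t objects_seen;
	unsigned int restarts;
};

static struct session_state sessions[MAX_SESSIONS];

/* Find the slot for session_id, allocating one if needed. */
static struct session_state *session_get(uint64_t session_id)
{
	struct session_state *free_slot = NULL;

	for (size_t i = 0; i < MAX_SESSIONS; i++) {
		if (sessions[i].in_use && sessions[i].session_id == session_id)
			return &sessions[i];
		if (!sessions[i].in_use && !free_slot)
			free_slot = &sessions[i];
	}
	if (free_slot) {
		free_slot->in_use = true;
		free_slot->session_id = session_id;
		free_slot->objects_seen = 0;
		free_slot->restarts = 0;
	}
	return free_slot;	/* NULL if the table is full */
}

/* Called once per object: existing state for this session combined
 * with seq_num == 0 means the iteration restarted from the beginning. */
static void on_object(uint64_t session_id, uint64_t seq_num)
{
	struct session_state *s = session_get(session_id);

	if (!s)
		return;
	if (seq_num == 0 && s->objects_seen > 0) {
		s->restarts++;
		s->objects_seen = 0;	/* discard the partial aggregation */
	}
	s->objects_seen++;
}
```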

> 
> It seems like it might be a bit more flexible to, instead of providing
> the seq_file * pointer directly, actually provide a bpfdumper_context
> struct, which would have seq_file * as one of its fields, the others
> being session_id and start/stop flags.

As you mentioned, if we pass more seq_file-related fields to the bpf
program, then yes, grouping them into a structure makes sense.
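For illustration only, such a context might be shaped like the struct below. struct bpfdumper_ctx, its field names and the DUMPER_F_* flags are hypothetical (the RFC passes the seq_file pointer directly), and struct seq_file is declared opaque so the sketch compiles on its own:

```c
#include <stdbool.h>
#include <stdint.h>

struct seq_file;	/* opaque here; the kernel's type in practice */

/* Hypothetical context grouping everything besides the object itself. */
struct bpfdumper_ctx {
	struct seq_file *seq;	/* where output goes */
	uint64_t session_id;	/* distinguishes concurrent/restarted sessions */
	uint64_t seq_num;	/* calls so far in this session */
	uint32_t flags;		/* DUMPER_F_START / DUMPER_F_END */
};

#define DUMPER_F_START (1u << 0)
#define DUMPER_F_END   (1u << 1)

static bool dumper_is_start(const struct bpfdumper_ctx *ctx)
{
	return ctx->flags & DUMPER_F_START;
}

static bool dumper_is_end(const struct bpfdumper_ctx *ctx)
{
	return ctx->flags & DUMPER_F_END;
}
```

The appeal of this shape is that a later field only extends the struct, whereas each extra scalar argument would change the program's signature.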

> 
> A bit unstructured thoughts, but what do you think?
> 
>>
>> Note that seq_num does not represent the number
>> of unique kernel objects the bpf program has
>> seen, but it should be a good approximation.
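Why it is only an approximation: when a record does not fit in the seq_file buffer, seq_read() in fs/seq_file.c discards it, doubles the buffer and calls show() again for the same object (or re-shows it on the next read()), so the program can run more than once per object. A simplified user-space model, assuming one program invocation (and one seq_num bump) per show() call; struct seq_model, model_show() and model_dump() are hypothetical names, and each record is checked against the buffer on its own rather than cumulatively:

```c
#include <stddef.h>
#include <stdint.h>

struct seq_model {
	size_t buf_size;	/* current output buffer size */
	uint64_t seq_num;	/* invocations, like bpf_dump_get_prog() calls */
};

/* One show() call: bumps seq_num; returns 1 if the record overflowed. */
static int model_show(struct seq_model *m, size_t record_len)
{
	m->seq_num++;
	return record_len > m->buf_size;
}

/* Show each record, doubling the buffer and retrying on overflow. */
static void model_dump(struct seq_model *m, const size_t *lens, size_t n)
{
	if (m->buf_size == 0)	/* doubling from 0 would never terminate */
		m->buf_size = 1;
	for (size_t i = 0; i < n; i++) {
		while (model_show(m, lens[i]))
			m->buf_size *= 2;
	}
}
```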
>>
>> A target feature, BPF_DUMP_SEQ_NET_PRIVATE,
>> is implemented specifically for net-based
>> dumpers. It sets the net namespace to the
>> current process's net namespace.
>> This avoids changing the existing net seq_ops
>> in order to retrieve the net namespace from
>> the seq_file pointer.
>>
>> For open dumper files, anonymous or not, the
>> fdinfo will show the target and prog_id associated
>> with that file descriptor. For the dumper file itself,
>> a kernel interface will be provided to retrieve the
>> prog_id in one of the later patches.
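From user space, that fdinfo is read from /proc/self/fdinfo/<fd>, which lists one "key:<TAB>value" line per field. A small parser, assuming the dumper fields appear under the keys "target" and "prog_id" (the exact key names are an assumption, not confirmed by this message); fdinfo_get() is a hypothetical helper:

```c
#include <string.h>

/* Find "key:" at the start of a line in fdinfo text and copy the value
 * (leading blanks skipped, trailing newline dropped) into out.
 * Returns 0 on success, -1 if the key is absent or out is unusable. */
static int fdinfo_get(const char *text, const char *key, char *out, size_t outlen)
{
	size_t klen = strlen(key);
	const char *line = text;

	if (outlen == 0)
		return -1;
	while (line && *line) {
		if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
			const char *v = line + klen + 1;
			size_t n;

			while (*v == ' ' || *v == '\t')
				v++;
			n = strcspn(v, "\n");
			if (n >= outlen)	/* truncate to fit */
				n = outlen - 1;
			memcpy(out, v, n);
			out[n] = '\0';
			return 0;
		}
		line = strchr(line, '\n');	/* advance to the next line */
		if (line)
			line++;
	}
	return -1;
}
```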
>>
>> Signed-off-by: Yonghong Song <yhs@...com>
>> ---
>>   include/linux/bpf.h            |   5 +
>>   include/uapi/linux/bpf.h       |   6 +-
>>   kernel/bpf/dump.c              | 338 ++++++++++++++++++++++++++++++++-
>>   kernel/bpf/syscall.c           |  11 +-
>>   tools/include/uapi/linux/bpf.h |   6 +-
>>   5 files changed, 362 insertions(+), 4 deletions(-)
>>
> 
> [...]
>