netdev - BPF logging infrastructure. Was: [PATCH bpf-next 4/6] tools: add new members to bpf_attr.raw

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAEf4BzZ_-vXP_3hSEjuceW10VX_H+EeuXMiV=_meBPZn7izK8A@mail.gmail.com>
Date:   Mon, 13 Jul 2020 22:21:46 -0700
From:   Andrii Nakryiko <andrii.nakryiko@...il.com>
To:     bpf <bpf@...r.kernel.org>, Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>
Cc:     Toke Høiland-Jørgensen <toke@...hat.com>,
        Martin KaFai Lau <kafai@...com>,
        Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
        Andrii Nakryiko <andriin@...com>,
        John Fastabend <john.fastabend@...il.com>,
        KP Singh <kpsingh@...omium.org>,
        Networking <netdev@...r.kernel.org>,
        Kernel Team <kernel-team@...com>
Subject: BPF logging infrastructure. Was: [PATCH bpf-next 4/6] tools: add new
 members to bpf_attr.raw_tracepoint in bpf.h

On Mon, Jul 13, 2020 at 1:13 PM Toke Høiland-Jørgensen <toke@...hat.com> wrote:
>
> From: Toke Høiland-Jørgensen <toke@...hat.com>
>
> Sync addition of new members from main kernel tree.
>
> Signed-off-by: Toke Høiland-Jørgensen <toke@...hat.com>
> ---
>  tools/include/uapi/linux/bpf.h |    9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index da9bf35a26f8..662a15e4a1a1 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -573,8 +573,13 @@ union bpf_attr {
>         } query;
>
>         struct { /* anonymous struct used by BPF_RAW_TRACEPOINT_OPEN command */
> -               __u64 name;
> -               __u32 prog_fd;
> +               __u64           name;
> +               __u32           prog_fd;
> +               __u32           log_level;      /* verbosity level of log */
> +               __u32           log_size;       /* size of user buffer */
> +               __aligned_u64   log_buf;        /* user supplied buffer */
> +               __u32           tgt_prog_fd;
> +               __u32           tgt_btf_id;
>         } raw_tracepoint;
>
>         struct { /* anonymous struct for BPF_BTF_LOAD */
>

I think BPF syscall would benefit from common/generalized log support
across all commands, given how powerful/complex it already is.
Sometimes it's literally impossible to understand why one gets -EINVAL
without adding printk()s in the kernel.

But it feels wrong to add log_level/log_size/log_buf fields to every
section of bpf_attr and require the user to specify it differently for
each command. So before we go and start adding per-command fields,
let's discuss how we can do this more generically. I wonder if we can
come up with a good way to do it in one common way and then gradually
support that way throughout all BPF commands.

Unfortunately it's too late to just add a few common fields to
bpf_attr in front of all other per-command fields, but here's two more
ideas just to start discussion. I hope someone can come up with
something nicer.

1. Currently bpf syscall accepts 3 input arguments (cmd, uattr, size).
We can extend it with two more optional arguments: one for pointer to
log-defining attr (for log_buf pointer, log_size, log_level, maybe
something more later) and another for the size of that log attr.
Beyond that we'd need some way to specify that the user actually meant
to provide those 2 optional args. The most straightforward way would
be to use the highest bit of cmd argument. So instead of calling bpf()
with BPF_MAP_CREATE command, you'd call it with BPF_MAP_CREATE |
BPF_LOGGED, where BPF_LOGGED = 1<<31.

2. A more "stateful" approach, would be to have an extra BPF command
to set log buffer (and level) per thread. And if such a log is set, it
would be overwritten with content on each bpf() syscall invocation
(i.e., log position would be reset to 0 on each BPF syscall).

Of course, the existing BPF_LOAD_PROG command would keep working with
existing log, but could fall back to the "common one", if
BPF_LOAD_PROG-specific one is not set.

It also doesn't seem to be all that critical to signal when the log
buffer is overflown. It's pretty easy to detect from user-space:
- either last byte would be non-zero, if we don't care about
guaranteeing zero-termination for truncated log;
- of second-to-last byte would be non-zero, if BPF syscall will always
zero-terminate the log.

Of course, if user code cares about such detection of log truncation,
it will need to set last/second-to-last bytes to zero before each
syscall.

Both approaches have their pros/cons, we can dig into those later, but
I'd like to start this discussion and see what other people think. I
also wonder if there are other syscalls with similarly advanced input
arguments (like bpf_attr) and common logging, we could learn from
those.

Thoughts?