Date:   Tue, 25 Jun 2019 21:44:52 -0700
From:   Andrii Nakryiko <andrii.nakryiko@...il.com>
To:     Song Liu <liu.song.a23@...il.com>
Cc:     Andrii Nakryiko <andriin@...com>, Alexei Starovoitov <ast@...com>,
        Daniel Borkmann <daniel@...earbox.net>,
        bpf <bpf@...r.kernel.org>, Networking <netdev@...r.kernel.org>,
        Kernel Team <kernel-team@...com>
Subject: Re: [PATCH bpf-next 1/2] libbpf: add perf buffer reading API

On Tue, Jun 25, 2019 at 7:19 PM Song Liu <liu.song.a23@...il.com> wrote:
>
> On Tue, Jun 25, 2019 at 4:28 PM Andrii Nakryiko <andriin@...com> wrote:
> >
> > The BPF_MAP_TYPE_PERF_EVENT_ARRAY map is often used to send data from a BPF
> > program to user space for additional processing. libbpf already has a very
> > low-level API for reading a single CPU's perf buffer,
> > bpf_perf_event_read_simple(), but it's hard to use and requires a lot of
> > setup code. This patch adds a perf_buffer abstraction on top of it, hiding
> > the per-CPU setup and polling logic behind a simple and convenient API,
> > similar to what BCC provides.
> >
> > perf_buffer__new() sets up per-CPU ring buffers and updates the
> > corresponding BPF map entries. It accepts two user-provided callbacks: one
> > for handling raw samples and one for getting notified of samples lost due
> > to buffer overflow.
> >
> > perf_buffer__poll() is used to fetch ring buffer data across all CPUs,
> > utilizing an epoll instance.
> >
> > perf_buffer__free() does the corresponding cleanup and unsets the FDs in the BPF map.
> >
> > None of these APIs are thread-safe. Users should ensure proper
> > locking/coordination if they are used in a multi-threaded setup.
> >
> > Signed-off-by: Andrii Nakryiko <andriin@...com>
>
> Overall looks good. Some nits below.

Thanks for the review!
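
For reference, intended usage would look roughly like the sketch below
(the callback prototypes follow the call sites quoted further down; the
millisecond timeout argument to perf_buffer__poll() and the
libbpf_get_error() check of the ERR_PTR return are assumptions, and
error handling is abbreviated):

    /* called for each raw sample the BPF program submits */
    static void on_sample(void *ctx, void *data, __u32 size)
    {
            /* consume 'size' bytes at 'data' */
    }

    /* optional: called when the kernel reports dropped samples */
    static void on_lost(void *ctx, __u64 lost_cnt)
    {
            /* e.g., bump a drop counter */
    }

    /* 'map' is the BPF_MAP_TYPE_PERF_EVENT_ARRAY of a loaded object;
     * request 8 pages of ring buffer per CPU (must be a power of two) */
    struct perf_buffer *pb = perf_buffer__new(map, 8, on_sample,
                                              on_lost, NULL);
    if (libbpf_get_error(pb))
            /* handle error; pb is ERR_PTR(-...), not a valid pointer */
            return;

    /* each call polls all per-CPU buffers via the internal epoll FD
     * and dispatches the callbacks above */
    while (perf_buffer__poll(pb, 100) >= 0)
            ;

    perf_buffer__free(pb);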

>
> > ---
> >  tools/lib/bpf/libbpf.c   | 282 +++++++++++++++++++++++++++++++++++++++
> >  tools/lib/bpf/libbpf.h   |  12 ++
> >  tools/lib/bpf/libbpf.map |   5 +-
> >  3 files changed, 298 insertions(+), 1 deletion(-)
>
> [...]
>
> > +struct perf_buffer *perf_buffer__new(struct bpf_map *map, size_t page_cnt,
> > +                                    perf_buffer_sample_fn sample_cb,
> > +                                    perf_buffer_lost_fn lost_cb, void *ctx)
> > +{
> > +       char msg[STRERR_BUFSIZE];
> > +       struct perf_buffer *pb;
> > +       int err, cpu;
> > +
> > +       if (bpf_map__def(map)->type != BPF_MAP_TYPE_PERF_EVENT_ARRAY) {
> > +               pr_warning("map '%s' should be BPF_MAP_TYPE_PERF_EVENT_ARRAY\n",
> > +                          bpf_map__name(map));
> > +               return ERR_PTR(-EINVAL);
> > +       }
> > +       if (bpf_map__fd(map) < 0) {
> > +               pr_warning("map '%s' doesn't have associated FD\n",
> > +                          bpf_map__name(map));
> > +               return ERR_PTR(-EINVAL);
> > +       }
> > +       if (page_cnt & (page_cnt - 1)) {
> > +               pr_warning("page count should be power of two, but is %zu\n",
> > +                          page_cnt);
> > +               return ERR_PTR(-EINVAL);
> > +       }
> > +
> > +       pb = calloc(1, sizeof(*pb));
> > +       if (!pb)
> > +               return ERR_PTR(-ENOMEM);
> > +
> > +       pb->sample_cb = sample_cb;
> > +       pb->lost_cb = lost_cb;
>
> I think we need to check sample_cb != NULL && lost_cb != NULL.

I was thinking about making them all either optional or required, but
eventually decided on making sample_cb required and lost_cb optional:
in practice sample_cb would rarely be omitted, while not every
application cares about handling lost samples (there is little you can
do about them, except bump some counter).

As for checking for NULL, I feel like that's overkill. If someone
provides NULL for sample_cb, they will get a SIGSEGV with a stack trace
immediately showing that it's sample_cb being NULL. Unlike Java, C
libraries tend not to double-check every pointer for NULL. Checking
things like whether the map has an FD or is of the correct type is
valuable, because if you don't check them early, you'll eventually get
-EINVAL from the kernel and start a guessing game of what's wrong.
Checking that a callback is non-NULL feels unnecessary, as the problem
will be immediately obvious (and it's quite unlikely to happen in
practice).
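
(For the record, if we did want to fail fast, it would be just one more
check next to the existing map validation at the top of
perf_buffer__new(); a sketch:

    if (!sample_cb) {
            pr_warning("sample callback is required\n");
            return ERR_PTR(-EINVAL);
    }

But per the above, I don't think it's worth it.)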

>
> > +       pb->ctx = ctx;
> > +       pb->page_size = getpagesize();
> > +       pb->mmap_size = pb->page_size * page_cnt;
> > +       pb->mapfd = bpf_map__fd(map);
> > +
> > +       pb->epfd = epoll_create1(EPOLL_CLOEXEC);
> [...]
> > +perf_buffer__process_record(struct perf_event_header *e, void *ctx)
> > +{
> > +       struct perf_buffer *pb = ctx;
> > +       void *data = e;
> > +
> > +       switch (e->type) {
> > +       case PERF_RECORD_SAMPLE: {
> > +               struct perf_sample_raw *s = data;
> > +
> > +               pb->sample_cb(pb->ctx, s->data, s->size);
> > +               break;
> > +       }
> > +       case PERF_RECORD_LOST: {
> > +               struct perf_sample_lost *s = data;
> > +
> > +               if (pb->lost_cb)
> > +                       pb->lost_cb(pb->ctx, s->lost);
>
> OK, we test lost_cb here, so not necessary at init time.
>
> [...]
> >                 bpf_program__attach_perf_event;
> >                 bpf_program__attach_raw_tracepoint;
> >                 bpf_program__attach_tracepoint;
> >                 bpf_program__attach_uprobe;
> > +               btf__parse_elf;
>
> Why move btf__parse_elf ?

I realized that I hadn't put it in the correct alphabetical order, and
decided to fix it here, as it's just a single-line change.

>
> Thanks,
> Song
