Message-ID: <alpine.LRH.2.21.2008211149530.9620@localhost>
Date: Fri, 21 Aug 2020 13:51:31 +0100 (IST)
From: Alan Maguire <alan.maguire@...cle.com>
To: Andrii Nakryiko <andriin@...com>
cc: bpf@...r.kernel.org, netdev@...r.kernel.org, ast@...com,
daniel@...earbox.net, andrii.nakryiko@...il.com, kernel-team@...com
Subject: Re: [PATCH bpf-next] libbpf: add perf_buffer APIs for better
integration with outside epoll loop
On Thu, 20 Aug 2020, Andrii Nakryiko wrote:
> Add a set of APIs to the perf_buffer manager to allow applications to
> integrate perf buffer polling into existing epoll-based infrastructure. One
> example is applications that already use libevent and want to plug in
> perf_buffer polling, instead of relying on perf_buffer__poll() and wasting
> an extra thread to do it. But perf_buffer is still extremely useful for
> setting up and consuming perf buffer rings even in such use cases.
>
> So to accommodate such new use cases, add three new APIs:
> - perf_buffer__buffer_cnt() returns number of per-CPU buffers maintained by
> given instance of perf_buffer manager;
> - perf_buffer__buffer_fd() returns FD of perf_event corresponding to
> a specified per-CPU buffer; this FD is then polled independently;
> - perf_buffer__consume_buffer() consumes data from single per-CPU buffer,
> identified by its slot index.
>
> These APIs allow for great flexibility, but do not sacrifice general usability
> of perf_buffer.
>
This is great! If I understand correctly, you're supporting retrieving the
individual per-cpu buffer fds so that they can be inserted into another
epoll fd. I've been exploring another possibility - hierarchical epoll,
where the top-level perf_buffer epoll_fd field is polled rather than the
individual per-cpu buffer fds. In that context, would an interface to
return the perf_buffer epoll_fd make sense too? i.e.
int perf_buffer__fd(const struct perf_buffer *pb);
?
When events occur on the perf_buffer__fd, we can simply call
perf_buffer__poll(pb, ...) to handle them, it seems.
That approach _appears_ to work, though I do see occasional event loss.
Is that method legitimate too, or am I missing something?
> Also exercise and check new APIs in perf_buffer selftest.
>
> Signed-off-by: Andrii Nakryiko <andriin@...com>
A few questions about the test below, but
Reviewed-by: Alan Maguire <alan.maguire@...cle.com>
> ---
> tools/lib/bpf/libbpf.c | 51 ++++++++++++++-
> tools/lib/bpf/libbpf.h | 3 +
> tools/lib/bpf/libbpf.map | 7 +++
> .../selftests/bpf/prog_tests/perf_buffer.c | 62 +++++++++++++++----
> 4 files changed, 111 insertions(+), 12 deletions(-)
>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 0bc1fd813408..a6359d49aa9d 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -9390,6 +9390,55 @@ int perf_buffer__poll(struct perf_buffer *pb, int timeout_ms)
> return cnt < 0 ? -errno : cnt;
> }
>
> +/* Return number of PERF_EVENT_ARRAY map slots set up by this perf_buffer
> + * manager.
> + */
> +size_t perf_buffer__buffer_cnt(const struct perf_buffer *pb)
> +{
> + return pb->cpu_cnt;
> +}
> +
> +/*
> + * Return perf_event FD of a ring buffer in *buf_idx* slot of
> + * PERF_EVENT_ARRAY BPF map. This FD can be polled for new data using
> + * select()/poll()/epoll() Linux syscalls.
> + */
> +int perf_buffer__buffer_fd(const struct perf_buffer *pb, size_t buf_idx)
> +{
> + struct perf_cpu_buf *cpu_buf;
> +
> + if (buf_idx >= pb->cpu_cnt)
> + return -EINVAL;
> +
> + cpu_buf = pb->cpu_bufs[buf_idx];
> + if (!cpu_buf)
> + return -ENOENT;
> +
> + return cpu_buf->fd;
> +}
> +
> +/*
> + * Consume data from perf ring buffer corresponding to slot *buf_idx* in
> + * PERF_EVENT_ARRAY BPF map without waiting/polling. If there is no data to
> + * consume, do nothing and return success.
> + * Returns:
> + * - 0 on success;
> + * - <0 on failure.
> + */
> +int perf_buffer__consume_buffer(struct perf_buffer *pb, size_t buf_idx)
> +{
> + struct perf_cpu_buf *cpu_buf;
> +
> + if (buf_idx >= pb->cpu_cnt)
> + return -EINVAL;
> +
> + cpu_buf = pb->cpu_bufs[buf_idx];
> + if (!cpu_buf)
> + return -ENOENT;
> +
> + return perf_buffer__process_records(pb, cpu_buf);
> +}
> +
> int perf_buffer__consume(struct perf_buffer *pb)
> {
> int i, err;
> @@ -9402,7 +9451,7 @@ int perf_buffer__consume(struct perf_buffer *pb)
>
> err = perf_buffer__process_records(pb, cpu_buf);
> if (err) {
> - pr_warn("error while processing records: %d\n", err);
> + pr_warn("perf_buffer: failed to process records in buffer #%d: %d\n", i, err);
> return err;
> }
> }
> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> index 5ecb4069a9f0..15e02dcda2c7 100644
> --- a/tools/lib/bpf/libbpf.h
> +++ b/tools/lib/bpf/libbpf.h
> @@ -590,6 +590,9 @@ perf_buffer__new_raw(int map_fd, size_t page_cnt,
> LIBBPF_API void perf_buffer__free(struct perf_buffer *pb);
> LIBBPF_API int perf_buffer__poll(struct perf_buffer *pb, int timeout_ms);
> LIBBPF_API int perf_buffer__consume(struct perf_buffer *pb);
> +LIBBPF_API int perf_buffer__consume_buffer(struct perf_buffer *pb, size_t buf_idx);
> +LIBBPF_API size_t perf_buffer__buffer_cnt(const struct perf_buffer *pb);
> +LIBBPF_API int perf_buffer__buffer_fd(const struct perf_buffer *pb, size_t buf_idx);
>
> typedef enum bpf_perf_event_ret
> (*bpf_perf_event_print_t)(struct perf_event_header *hdr,
> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> index e35bd6cdbdbf..77466958310a 100644
> --- a/tools/lib/bpf/libbpf.map
> +++ b/tools/lib/bpf/libbpf.map
> @@ -299,3 +299,10 @@ LIBBPF_0.1.0 {
> btf__set_fd;
> btf__set_pointer_size;
> } LIBBPF_0.0.9;
> +
> +LIBBPF_0.2.0 {
> + global:
> + perf_buffer__buffer_cnt;
> + perf_buffer__buffer_fd;
> + perf_buffer__consume_buffer;
> +} LIBBPF_0.1.0;
> diff --git a/tools/testing/selftests/bpf/prog_tests/perf_buffer.c b/tools/testing/selftests/bpf/prog_tests/perf_buffer.c
> index c33ec180b3f2..add224ce17af 100644
> --- a/tools/testing/selftests/bpf/prog_tests/perf_buffer.c
> +++ b/tools/testing/selftests/bpf/prog_tests/perf_buffer.c
> @@ -7,6 +7,8 @@
> #include "test_perf_buffer.skel.h"
> #include "bpf/libbpf_internal.h"
>
> +static int duration;
> +
> /* AddressSanitizer sometimes crashes due to data dereference below, due to
> * this being mmap()'ed memory. Disable instrumentation with
> * no_sanitize_address attribute
> @@ -24,13 +26,31 @@ static void on_sample(void *ctx, int cpu, void *data, __u32 size)
> CPU_SET(cpu, cpu_seen);
> }
>
> +int trigger_on_cpu(int cpu)
> +{
> + cpu_set_t cpu_set;
> + int err;
> +
> + CPU_ZERO(&cpu_set);
> + CPU_SET(cpu, &cpu_set);
> +
> + err = pthread_setaffinity_np(pthread_self(), sizeof(cpu_set), &cpu_set);
> + if (err && CHECK(err, "set_affinity", "cpu #%d, err %d\n", cpu, err))
> + return err;
> +
> + usleep(1);
> +
> + return 0;
> +}
> +
> void test_perf_buffer(void)
> {
> - int err, on_len, nr_on_cpus = 0, nr_cpus, i, duration = 0;
> + int err, on_len, nr_on_cpus = 0, nr_cpus, i;
> struct perf_buffer_opts pb_opts = {};
> struct test_perf_buffer *skel;
> - cpu_set_t cpu_set, cpu_seen;
> + cpu_set_t cpu_seen;
> struct perf_buffer *pb;
> + int last_fd = -1, fd;
> bool *online;
>
> nr_cpus = libbpf_num_possible_cpus();
> @@ -71,16 +91,8 @@ void test_perf_buffer(void)
> continue;
> }
>
> - CPU_ZERO(&cpu_set);
> - CPU_SET(i, &cpu_set);
> -
> - err = pthread_setaffinity_np(pthread_self(), sizeof(cpu_set),
> - &cpu_set);
> - if (err && CHECK(err, "set_affinity", "cpu #%d, err %d\n",
> - i, err))
> + if (trigger_on_cpu(i))
> goto out_close;
> -
> - usleep(1);
> }
>
> /* read perf buffer */
> @@ -92,6 +104,34 @@ void test_perf_buffer(void)
> "expect %d, seen %d\n", nr_on_cpus, CPU_COUNT(&cpu_seen)))
> goto out_free_pb;
>
> + if (CHECK(perf_buffer__buffer_cnt(pb) != nr_cpus, "buf_cnt",
> + "got %zu, expected %d\n", perf_buffer__buffer_cnt(pb), nr_cpus))
> + goto out_close;
> +
> + for (i = 0; i < nr_cpus; i++) {
> + if (i >= on_len || !online[i])
> + continue;
> +
> + fd = perf_buffer__buffer_fd(pb, i);
> + CHECK(last_fd == fd, "fd_check", "last fd %d == fd %d\n", last_fd, fd);
> + last_fd = fd;
> +
I'm not sure why you're testing this way - shouldn't the test just verify
that we don't get an unexpected error code, rather than comparing
successive fd values?
> + err = perf_buffer__consume_buffer(pb, i);
> + if (CHECK(err, "drain_buf", "cpu %d, err %d\n", i, err))
> + goto out_close;
> +
I'm a bit lost about what processes what here. The first
perf_buffer__poll() should already handle the records associated with the
first round of per-cpu triggering, I think. Is the
perf_buffer__consume_buffer() above exercising the "no data, return
success" case? If so, should we do something to explicitly verify that it
was indeed a no-op, such as CHECK()ing CPU_ISSET(i, &cpu_seen) to ensure
the on_sample() handler wasn't called? The "drain_buf" description makes
me think I'm misreading this and we're actually draining additional
events, so I wanted to check what's going on here to make sure.
Thanks!
Alan