Message-ID: <alpine.LRH.2.21.2008211149530.9620@localhost>
Date: Fri, 21 Aug 2020 13:51:31 +0100 (IST)
From: Alan Maguire <alan.maguire@...cle.com>
To: Andrii Nakryiko <andriin@...com>
cc: bpf@...r.kernel.org, netdev@...r.kernel.org, ast@...com,
daniel@...earbox.net, andrii.nakryiko@...il.com, kernel-team@...com
Subject: Re: [PATCH bpf-next] libbpf: add perf_buffer APIs for better
integration with outside epoll loop
On Thu, 20 Aug 2020, Andrii Nakryiko wrote:
> Add a set of APIs to the perf_buffer manager to allow applications to
> integrate perf buffer polling into existing epoll-based infrastructure. One
> example is applications that already use libevent and want to plug in
> perf_buffer polling, instead of relying on perf_buffer__poll() and wasting
> an extra thread to do it. But perf_buffer is still extremely useful for
> setting up and consuming perf buffer rings even in such use cases.
>
> So to accommodate such new use cases, add three new APIs:
> - perf_buffer__buffer_cnt() returns number of per-CPU buffers maintained by
> given instance of perf_buffer manager;
> - perf_buffer__buffer_fd() returns FD of perf_event corresponding to
> a specified per-CPU buffer; this FD is then polled independently;
> - perf_buffer__consume_buffer() consumes data from single per-CPU buffer,
> identified by its slot index.
>
> These APIs allow for great flexibility, but do not sacrifice general usability
> of perf_buffer.
>
This is great! If I understand correctly, you're supporting retrieving the
individual per-cpu buffer fds so that they can be inserted into another
epoll fd. I've been exploring another possibility - hierarchical epoll,
where the top-level perf_buffer epoll_fd field is polled rather than the
individual per-cpu buffer fds. In that context, would an interface to
return the perf_buffer epoll_fd make sense too? i.e.
int perf_buffer__fd(const struct perf_buffer *pb);
?
When events occur on the perf_buffer__fd, we can simply call
perf_buffer__poll(pb, ...) to handle them, it seems.
That approach _appears_ to work, though I do see occasional event loss.
Is that method legitimate too, or am I missing something?
> Also exercise and check new APIs in perf_buffer selftest.
>
> Signed-off-by: Andrii Nakryiko <andriin@...com>
A few questions about the test below, but
Reviewed-by: Alan Maguire <alan.maguire@...cle.com>
> ---
> tools/lib/bpf/libbpf.c | 51 ++++++++++++++-
> tools/lib/bpf/libbpf.h | 3 +
> tools/lib/bpf/libbpf.map | 7 +++
> .../selftests/bpf/prog_tests/perf_buffer.c | 62 +++++++++++++++----
> 4 files changed, 111 insertions(+), 12 deletions(-)
>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 0bc1fd813408..a6359d49aa9d 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -9390,6 +9390,55 @@ int perf_buffer__poll(struct perf_buffer *pb, int timeout_ms)
> return cnt < 0 ? -errno : cnt;
> }
>
> +/* Return number of PERF_EVENT_ARRAY map slots set up by this perf_buffer
> + * manager.
> + */
> +size_t perf_buffer__buffer_cnt(const struct perf_buffer *pb)
> +{
> + return pb->cpu_cnt;
> +}
> +
> +/*
> + * Return perf_event FD of a ring buffer in *buf_idx* slot of
> + * PERF_EVENT_ARRAY BPF map. This FD can be polled for new data using
> + * select()/poll()/epoll() Linux syscalls.
> + */
> +int perf_buffer__buffer_fd(const struct perf_buffer *pb, size_t buf_idx)
> +{
> + struct perf_cpu_buf *cpu_buf;
> +
> + if (buf_idx >= pb->cpu_cnt)
> + return -EINVAL;
> +
> + cpu_buf = pb->cpu_bufs[buf_idx];
> + if (!cpu_buf)
> + return -ENOENT;
> +
> + return cpu_buf->fd;
> +}
> +
> +/*
> + * Consume data from perf ring buffer corresponding to slot *buf_idx* in
> + * PERF_EVENT_ARRAY BPF map without waiting/polling. If there is no data to
> + * consume, do nothing and return success.
> + * Returns:
> + * - 0 on success;
> + * - <0 on failure.
> + */
> +int perf_buffer__consume_buffer(struct perf_buffer *pb, size_t buf_idx)
> +{
> + struct perf_cpu_buf *cpu_buf;
> +
> + if (buf_idx >= pb->cpu_cnt)
> + return -EINVAL;
> +
> + cpu_buf = pb->cpu_bufs[buf_idx];
> + if (!cpu_buf)
> + return -ENOENT;
> +
> + return perf_buffer__process_records(pb, cpu_buf);
> +}
> +
> int perf_buffer__consume(struct perf_buffer *pb)
> {
> int i, err;
> @@ -9402,7 +9451,7 @@ int perf_buffer__consume(struct perf_buffer *pb)
>
> err = perf_buffer__process_records(pb, cpu_buf);
> if (err) {
> - pr_warn("error while processing records: %d\n", err);
> + pr_warn("perf_buffer: failed to process records in buffer #%d: %d\n", i, err);
> return err;
> }
> }
> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> index 5ecb4069a9f0..15e02dcda2c7 100644
> --- a/tools/lib/bpf/libbpf.h
> +++ b/tools/lib/bpf/libbpf.h
> @@ -590,6 +590,9 @@ perf_buffer__new_raw(int map_fd, size_t page_cnt,
> LIBBPF_API void perf_buffer__free(struct perf_buffer *pb);
> LIBBPF_API int perf_buffer__poll(struct perf_buffer *pb, int timeout_ms);
> LIBBPF_API int perf_buffer__consume(struct perf_buffer *pb);
> +LIBBPF_API int perf_buffer__consume_buffer(struct perf_buffer *pb, size_t buf_idx);
> +LIBBPF_API size_t perf_buffer__buffer_cnt(const struct perf_buffer *pb);
> +LIBBPF_API int perf_buffer__buffer_fd(const struct perf_buffer *pb, size_t buf_idx);
>
> typedef enum bpf_perf_event_ret
> (*bpf_perf_event_print_t)(struct perf_event_header *hdr,
> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> index e35bd6cdbdbf..77466958310a 100644
> --- a/tools/lib/bpf/libbpf.map
> +++ b/tools/lib/bpf/libbpf.map
> @@ -299,3 +299,10 @@ LIBBPF_0.1.0 {
> btf__set_fd;
> btf__set_pointer_size;
> } LIBBPF_0.0.9;
> +
> +LIBBPF_0.2.0 {
> + global:
> + perf_buffer__buffer_cnt;
> + perf_buffer__buffer_fd;
> + perf_buffer__consume_buffer;
> +} LIBBPF_0.1.0;
> diff --git a/tools/testing/selftests/bpf/prog_tests/perf_buffer.c b/tools/testing/selftests/bpf/prog_tests/perf_buffer.c
> index c33ec180b3f2..add224ce17af 100644
> --- a/tools/testing/selftests/bpf/prog_tests/perf_buffer.c
> +++ b/tools/testing/selftests/bpf/prog_tests/perf_buffer.c
> @@ -7,6 +7,8 @@
> #include "test_perf_buffer.skel.h"
> #include "bpf/libbpf_internal.h"
>
> +static int duration;
> +
> /* AddressSanitizer sometimes crashes due to data dereference below, due to
> * this being mmap()'ed memory. Disable instrumentation with
> * no_sanitize_address attribute
> @@ -24,13 +26,31 @@ static void on_sample(void *ctx, int cpu, void *data, __u32 size)
> CPU_SET(cpu, cpu_seen);
> }
>
> +int trigger_on_cpu(int cpu)
> +{
> + cpu_set_t cpu_set;
> + int err;
> +
> + CPU_ZERO(&cpu_set);
> + CPU_SET(cpu, &cpu_set);
> +
> + err = pthread_setaffinity_np(pthread_self(), sizeof(cpu_set), &cpu_set);
> + if (err && CHECK(err, "set_affinity", "cpu #%d, err %d\n", cpu, err))
> + return err;
> +
> + usleep(1);
> +
> + return 0;
> +}
> +
> void test_perf_buffer(void)
> {
> - int err, on_len, nr_on_cpus = 0, nr_cpus, i, duration = 0;
> + int err, on_len, nr_on_cpus = 0, nr_cpus, i;
> struct perf_buffer_opts pb_opts = {};
> struct test_perf_buffer *skel;
> - cpu_set_t cpu_set, cpu_seen;
> + cpu_set_t cpu_seen;
> struct perf_buffer *pb;
> + int last_fd = -1, fd;
> bool *online;
>
> nr_cpus = libbpf_num_possible_cpus();
> @@ -71,16 +91,8 @@ void test_perf_buffer(void)
> continue;
> }
>
> - CPU_ZERO(&cpu_set);
> - CPU_SET(i, &cpu_set);
> -
> - err = pthread_setaffinity_np(pthread_self(), sizeof(cpu_set),
> - &cpu_set);
> - if (err && CHECK(err, "set_affinity", "cpu #%d, err %d\n",
> - i, err))
> + if (trigger_on_cpu(i))
> goto out_close;
> -
> - usleep(1);
> }
>
> /* read perf buffer */
> @@ -92,6 +104,34 @@ void test_perf_buffer(void)
> "expect %d, seen %d\n", nr_on_cpus, CPU_COUNT(&cpu_seen)))
> goto out_free_pb;
>
> + if (CHECK(perf_buffer__buffer_cnt(pb) != nr_cpus, "buf_cnt",
> + "got %zu, expected %d\n", perf_buffer__buffer_cnt(pb), nr_cpus))
> + goto out_close;
> +
> + for (i = 0; i < nr_cpus; i++) {
> + if (i >= on_len || !online[i])
> + continue;
> +
> + fd = perf_buffer__buffer_fd(pb, i);
> + CHECK(last_fd == fd, "fd_check", "last fd %d == fd %d\n", last_fd, fd);
> + last_fd = fd;
> +
I'm not sure why you're testing this way - shouldn't the test just verify
that we don't get an unexpected error code, rather than comparing
successive fd values?
> + err = perf_buffer__consume_buffer(pb, i);
> + if (CHECK(err, "drain_buf", "cpu %d, err %d\n", i, err))
> + goto out_close;
> +
I'm a bit lost about what processes what here. The first
perf_buffer__poll() should already handle the records associated with the
first round of per-cpu triggering, I think. Is the
perf_buffer__consume_buffer() above exercising the "no data, return
success" case? If so, should we do something to explicitly verify that it
was indeed a no-op, such as CHECK()ing CPU_ISSET(i, &cpu_seen) to ensure
the on_sample() handler wasn't called? The "drain_buf" description makes
me think I'm misreading this and we're actually draining additional
events, so I wanted to check what's going on here to make sure.
Thanks!
Alan