Message-Id: <20201022082138.2322434-13-jolsa@kernel.org>
Date: Thu, 22 Oct 2020 10:21:34 +0200
From: Jiri Olsa <jolsa@...nel.org>
To: Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andriin@...com>
Cc: netdev@...r.kernel.org, bpf@...r.kernel.org,
Martin KaFai Lau <kafai@...com>,
Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
John Fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...omium.org>, Daniel Xu <dxu@...uu.xyz>,
Steven Rostedt <rostedt@...dmis.org>,
Jesper Brouer <jbrouer@...hat.com>,
Toke Høiland-Jørgensen <toke@...hat.com>,
Viktor Malik <vmalik@...hat.com>
Subject: [RFC bpf-next 12/16] bpf: Move synchronize_rcu_mult for batch processing (NOT TO BE MERGED)

I noticed that some of the profiled workloads did not spend more cycles,
yet took more time to finish than with the current code. I tracked it
down to the synchronize_rcu_mult call in bpf_trampoline_update: when I
called it just once for the whole batch, attaching got faster.
The current processing when attaching the programs is:

  for each program:
    bpf(BPF_RAW_TRACEPOINT_OPEN)
      bpf_tracing_prog_attach
        bpf_trampoline_link_prog
          bpf_trampoline_update
            synchronize_rcu_mult
            register_ftrace_direct
With the change, synchronize_rcu_mult is called just once (see the
sketch below):

  bpf(BPF_TRAMPOLINE_BATCH_ATTACH)
    for each program:
      bpf_tracing_prog_attach
        bpf_trampoline_link_prog
          bpf_trampoline_update

    synchronize_rcu_mult
    register_ftrace_direct_ips
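
To make the shape of that change concrete, here is a minimal sketch
(my illustration only, not the patched kernel code:
trampoline_batch_attach_sketch and attach_one are made-up names; only
synchronize_rcu_mult, call_rcu_tasks and call_rcu_tasks_trace are real):

  /* Sketch: wait for both grace periods once per batch instead of
   * once per program; attach_one() is a hypothetical stand-in for
   * the per-program bpf_tracing_prog_attach path above.
   */
  static int trampoline_batch_attach_sketch(const int *prog_fds, int count)
  {
          int i, err = 0;

          synchronize_rcu_mult(call_rcu_tasks, call_rcu_tasks_trace);

          for (i = 0; i < count; i++) {
                  err = attach_one(prog_fds[i]);
                  if (err)
                          break;
          }
          return err;
  }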

I'm not sure this doesn't break anything, because I don't follow the
rcu code that much ;-) However, the stats are nicer now:

Before:

 Performance counter stats for './test_progs -t attach_test' (5 runs):

     37,410,887      cycles:k                  ( +-  0.98% )
     70,062,158      cycles:u                  ( +-  0.39% )

         26.80 +- 4.10 seconds time elapsed    ( +- 15.31% )

After:

 Performance counter stats for './test_progs -t attach_test' (5 runs):

     36,812,432      cycles:k                  ( +-  2.52% )
     69,907,191      cycles:u                  ( +-  0.38% )

         15.04 +- 2.94 seconds time elapsed    ( +- 19.54% )
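
(For reference, output in this shape comes from perf stat; something
like 'perf stat -r 5 -e cycles:k,cycles:u -- ./test_progs -t attach_test'
should reproduce it, though the exact event list is my assumption.)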
Signed-off-by: Jiri Olsa <jolsa@...nel.org>
---
kernel/bpf/syscall.c | 3 +++
kernel/bpf/trampoline.c | 3 ++-
2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 19fb608546c0..b315803c34d3 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -31,6 +31,7 @@
 #include <linux/poll.h>
 #include <linux/bpf-netns.h>
 #include <linux/rcupdate_trace.h>
+#include <linux/rcupdate_wait.h>
 
 #define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || \
 			  (map)->map_type == BPF_MAP_TYPE_CGROUP_ARRAY || \
@@ -2920,6 +2921,8 @@ static int bpf_trampoline_batch(const union bpf_attr *attr, int cmd)
 	if (!batch)
 		goto out_clean;
 
+	synchronize_rcu_mult(call_rcu_tasks, call_rcu_tasks_trace);
+
 	for (i = 0; i < count; i++) {
 		if (cmd == BPF_TRAMPOLINE_BATCH_ATTACH) {
 			prog = bpf_prog_get(in[i]);
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index cdad87461e5d..0d5e4c5860a9 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -271,7 +271,8 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr,
 	 * programs finish executing.
 	 * Wait for these two grace periods together.
 	 */
-	synchronize_rcu_mult(call_rcu_tasks, call_rcu_tasks_trace);
+	if (!batch)
+		synchronize_rcu_mult(call_rcu_tasks, call_rcu_tasks_trace);
 
 	err = arch_prepare_bpf_trampoline(new_image, new_image + PAGE_SIZE / 2,
 					  &tr->func.model, flags, tprogs,
--
2.26.2