lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 22 Oct 2020 10:21:34 +0200
From:   Jiri Olsa <jolsa@...nel.org>
To:     Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Andrii Nakryiko <andriin@...com>
Cc:     netdev@...r.kernel.org, bpf@...r.kernel.org,
        Martin KaFai Lau <kafai@...com>,
        Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
        John Fastabend <john.fastabend@...il.com>,
        KP Singh <kpsingh@...omium.org>, Daniel Xu <dxu@...uu.xyz>,
        Steven Rostedt <rostedt@...dmis.org>,
        Jesper Brouer <jbrouer@...hat.com>,
        Toke Høiland-Jørgensen <toke@...hat.com>,
        Viktor Malik <vmalik@...hat.com>
Subject: [RFC bpf-next 12/16] bpf: Move synchronize_rcu_mult for batch processing (NOT TO BE MERGED)

I noticed some of the profiled workloads did not spend more cycles,
but took more time to finish than current code. I tracked it to rcu
synchronize_rcu_mult call in bpf_trampoline_update and when I called
it just once for batch mode it got faster.

The current processing when attaching the program is:

  for each program:
    bpf(BPF_RAW_TRACEPOINT_OPEN
      bpf_tracing_prog_attach
        bpf_trampoline_link_prog
          bpf_trampoline_update
            synchronize_rcu_mult
            register_ftrace_direct

With the change the synchronize_rcu_mult is called just once:

  bpf(BPF_TRAMPOLINE_BATCH_ATTACH
    for each program:
      bpf_tracing_prog_attach
        bpf_trampoline_link_prog
          bpf_trampoline_update

    synchronize_rcu_mult
    register_ftrace_direct_ips

I'm not sure this does not break stuff, because I don't follow rcu
code that much ;-) However stats are nicer now:

Before:

 Performance counter stats for './test_progs -t attach_test' (5 runs):

        37,410,887      cycles:k             ( +-  0.98% )
        70,062,158      cycles:u             ( +-  0.39% )

             26.80 +- 4.10 seconds time elapsed  ( +- 15.31% )

After:

 Performance counter stats for './test_progs -t attach_test' (5 runs):

        36,812,432      cycles:k             ( +-  2.52% )
        69,907,191      cycles:u             ( +-  0.38% )

             15.04 +- 2.94 seconds time elapsed  ( +- 19.54% )

Signed-off-by: Jiri Olsa <jolsa@...nel.org>
---
 kernel/bpf/syscall.c    | 3 +++
 kernel/bpf/trampoline.c | 3 ++-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 19fb608546c0..b315803c34d3 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -31,6 +31,7 @@
 #include <linux/poll.h>
 #include <linux/bpf-netns.h>
 #include <linux/rcupdate_trace.h>
+#include <linux/rcupdate_wait.h>
 
 #define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || \
 			  (map)->map_type == BPF_MAP_TYPE_CGROUP_ARRAY || \
@@ -2920,6 +2921,8 @@ static int bpf_trampoline_batch(const union bpf_attr *attr, int cmd)
 	if (!batch)
 		goto out_clean;
 
+	synchronize_rcu_mult(call_rcu_tasks, call_rcu_tasks_trace);
+
 	for (i = 0; i < count; i++) {
 		if (cmd == BPF_TRAMPOLINE_BATCH_ATTACH) {
 			prog = bpf_prog_get(in[i]);
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index cdad87461e5d..0d5e4c5860a9 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -271,7 +271,8 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr,
 	 * programs finish executing.
 	 * Wait for these two grace periods together.
 	 */
-	synchronize_rcu_mult(call_rcu_tasks, call_rcu_tasks_trace);
+	if (!batch)
+		synchronize_rcu_mult(call_rcu_tasks, call_rcu_tasks_trace);
 
 	err = arch_prepare_bpf_trampoline(new_image, new_image + PAGE_SIZE / 2,
 					  &tr->func.model, flags, tprogs,
-- 
2.26.2

Powered by blists - more mailing lists