Message-ID: <CAEf4BzaVW_HXTCJDx=iHs9AJOSaUQq3Bwg+hFc3FCdqxb5Ah6Q@mail.gmail.com>
Date: Fri, 13 Sep 2024 14:36:52 -0700
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Andrii Nakryiko <andrii@...nel.org>, peterz@...radead.org, mingo@...nel.org
Cc: linux-trace-kernel@...r.kernel.org, oleg@...hat.com, rostedt@...dmis.org, 
	mhiramat@...nel.org, bpf@...r.kernel.org, linux-kernel@...r.kernel.org, 
	jolsa@...nel.org, paulmck@...nel.org
Subject: Re: [PATCH] uprobes: switch to RCU Tasks Trace flavor for better performance

On Tue, Sep 10, 2024 at 10:43 AM Andrii Nakryiko <andrii@...nel.org> wrote:
>
> This patch switches uprobes SRCU usage to RCU Tasks Trace flavor, which
> is optimized for more lightweight and quick readers (at the expense of
> slower writers, which for uprobes is a fine tradeoff) and has better
> performance and scalability with the number of CPUs.
>
> As with the baseline vs SRCU comparison, we've benchmarked the SRCU-based
> implementation against the RCU Tasks Trace implementation.
>
> SRCU
> ====
> uprobe-nop      ( 1 cpus):    3.276 ± 0.005M/s  (  3.276M/s/cpu)
> uprobe-nop      ( 2 cpus):    4.125 ± 0.002M/s  (  2.063M/s/cpu)
> uprobe-nop      ( 4 cpus):    7.713 ± 0.002M/s  (  1.928M/s/cpu)
> uprobe-nop      ( 8 cpus):    8.097 ± 0.006M/s  (  1.012M/s/cpu)
> uprobe-nop      (16 cpus):    6.501 ± 0.056M/s  (  0.406M/s/cpu)
> uprobe-nop      (32 cpus):    4.398 ± 0.084M/s  (  0.137M/s/cpu)
> uprobe-nop      (64 cpus):    6.452 ± 0.000M/s  (  0.101M/s/cpu)
>
> uretprobe-nop   ( 1 cpus):    2.055 ± 0.001M/s  (  2.055M/s/cpu)
> uretprobe-nop   ( 2 cpus):    2.677 ± 0.000M/s  (  1.339M/s/cpu)
> uretprobe-nop   ( 4 cpus):    4.561 ± 0.003M/s  (  1.140M/s/cpu)
> uretprobe-nop   ( 8 cpus):    5.291 ± 0.002M/s  (  0.661M/s/cpu)
> uretprobe-nop   (16 cpus):    5.065 ± 0.019M/s  (  0.317M/s/cpu)
> uretprobe-nop   (32 cpus):    3.622 ± 0.003M/s  (  0.113M/s/cpu)
> uretprobe-nop   (64 cpus):    3.723 ± 0.002M/s  (  0.058M/s/cpu)
>
> RCU Tasks Trace
> ===============
> uprobe-nop      ( 1 cpus):    3.396 ± 0.002M/s  (  3.396M/s/cpu)
> uprobe-nop      ( 2 cpus):    4.271 ± 0.006M/s  (  2.135M/s/cpu)
> uprobe-nop      ( 4 cpus):    8.499 ± 0.015M/s  (  2.125M/s/cpu)
> uprobe-nop      ( 8 cpus):   10.355 ± 0.028M/s  (  1.294M/s/cpu)
> uprobe-nop      (16 cpus):    7.615 ± 0.099M/s  (  0.476M/s/cpu)
> uprobe-nop      (32 cpus):    4.430 ± 0.007M/s  (  0.138M/s/cpu)
> uprobe-nop      (64 cpus):    6.887 ± 0.020M/s  (  0.108M/s/cpu)
>
> uretprobe-nop   ( 1 cpus):    2.174 ± 0.001M/s  (  2.174M/s/cpu)
> uretprobe-nop   ( 2 cpus):    2.853 ± 0.001M/s  (  1.426M/s/cpu)
> uretprobe-nop   ( 4 cpus):    4.913 ± 0.002M/s  (  1.228M/s/cpu)
> uretprobe-nop   ( 8 cpus):    5.883 ± 0.002M/s  (  0.735M/s/cpu)
> uretprobe-nop   (16 cpus):    5.147 ± 0.001M/s  (  0.322M/s/cpu)
> uretprobe-nop   (32 cpus):    3.738 ± 0.008M/s  (  0.117M/s/cpu)
> uretprobe-nop   (64 cpus):    4.397 ± 0.002M/s  (  0.069M/s/cpu)
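
For reference, the per-CPU column in these tables is just the aggregate throughput divided by the CPU count. A minimal sketch in Python, using the uprobe-nop rows of the RCU Tasks Trace table; the helper names and the "scaling efficiency" ratio are illustrative assumptions, not part of the benchmark tool:

```python
# Aggregate uprobe-nop throughput in M/s, copied from the
# RCU Tasks Trace table above (key = number of CPUs).
uprobe_tasks_trace = {
    1: 3.396, 2: 4.271, 4: 8.499, 8: 10.355,
    16: 7.615, 32: 4.430, 64: 6.887,
}


def per_cpu(total_mps, cpus):
    """Per-CPU throughput: aggregate M/s divided by the CPU count."""
    return total_mps / cpus


def scaling_efficiency(results):
    """Hypothetical metric: aggregate throughput relative to perfect
    linear scaling of the 1-CPU result (1.0 means ideal scaling)."""
    base = results[1]
    return {cpus: total / (cpus * base) for cpus, total in results.items()}


if __name__ == "__main__":
    eff = scaling_efficiency(uprobe_tasks_trace)
    for cpus, total in uprobe_tasks_trace.items():
        print(f"{cpus:2d} cpus: {per_cpu(total, cpus):.3f} M/s/cpu, "
              f"efficiency {eff[cpus]:.2f}")
```

The efficiency ratio is only one way to read the numbers; it makes the drop-off past 8 CPUs easier to see than the raw totals.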
>
> Peak throughput for uprobes increases from 8 mln/s to 10.3 mln/s
> (+28%!), and for uretprobes from 5.3 mln/s to 5.8 mln/s (+11%), as we
> have more work to do on the uretprobes side.
>
> Even single-thread (no contention) performance is slightly better: 3.276
> mln/s to 3.396 mln/s (+3.5%) for uprobes, and 2.055 mln/s to 2.174 mln/s
> (+5.8%) for uretprobes.
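
The relative gains quoted above can be recomputed from the table rows. A small sketch in Python, assuming the 8-CPU rows are the peaks for both probe types (which is what the tables show); the dictionary keys and the `pct_gain` helper are made-up names for illustration:

```python
def pct_gain(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return (after / before - 1.0) * 100.0


# Throughput in M/s, copied from the SRCU and RCU Tasks Trace tables.
srcu = {
    "uprobe_peak": 8.097,      # uprobe-nop, 8 cpus
    "uretprobe_peak": 5.291,   # uretprobe-nop, 8 cpus
    "uprobe_1cpu": 3.276,
    "uretprobe_1cpu": 2.055,
}
tasks_trace = {
    "uprobe_peak": 10.355,
    "uretprobe_peak": 5.883,
    "uprobe_1cpu": 3.396,
    "uretprobe_1cpu": 2.174,
}

gains = {name: pct_gain(srcu[name], tasks_trace[name]) for name in srcu}

if __name__ == "__main__":
    for name, gain in gains.items():
        print(f"{name}: {gain:+.1f}%")
```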
>
> We also select TASKS_TRACE_RCU for UPROBES in Kconfig due to the new
> dependency.
>
> Reviewed-by: Oleg Nesterov <oleg@...hat.com>
> Signed-off-by: Andrii Nakryiko <andrii@...nel.org>
> ---
>  arch/Kconfig            |  1 +
>  kernel/events/uprobes.c | 38 ++++++++++++++++----------------------
>  2 files changed, 17 insertions(+), 22 deletions(-)
>

Ping, just in case this slipped through the cracks (and is not just
waiting its turn to be applied). It would be nice for this patch to go
in together with the rest of the uprobe patches from the original patch
set. Thanks!

> diff --git a/arch/Kconfig b/arch/Kconfig
> index 975dd22a2dbd..a0df3f3dc484 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -126,6 +126,7 @@ config KPROBES_ON_FTRACE
>  config UPROBES
>         def_bool n
>         depends on ARCH_SUPPORTS_UPROBES
> +       select TASKS_TRACE_RCU
>         help
>           Uprobes is the user-space counterpart to kprobes: they
>           enable instrumentation applications (such as 'perf probe')

[...]
