Message-ID: <CAEf4Bzbg2ROstG5+1XUoZre403n-B3CHuW9E0UECNY364giDcw@mail.gmail.com>
Date: Fri, 11 Jul 2025 10:17:50 -0700
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Jiri Olsa <jolsa@...nel.org>
Cc: Oleg Nesterov <oleg@...hat.com>, Peter Zijlstra <peterz@...radead.org>, 
	Andrii Nakryiko <andrii@...nel.org>, Alejandro Colomar <alx@...nel.org>, Eyal Birger <eyal.birger@...il.com>, 
	kees@...nel.org, bpf@...r.kernel.org, linux-kernel@...r.kernel.org, 
	linux-trace-kernel@...r.kernel.org, x86@...nel.org, 
	Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>, 
	John Fastabend <john.fastabend@...il.com>, Hao Luo <haoluo@...gle.com>, 
	Steven Rostedt <rostedt@...dmis.org>, Masami Hiramatsu <mhiramat@...nel.org>, 
	Alan Maguire <alan.maguire@...cle.com>, David Laight <David.Laight@...lab.com>, 
	Thomas Weißschuh <thomas@...ch.de>, 
	Ingo Molnar <mingo@...nel.org>
Subject: Re: [PATCHv5 perf/core 00/22] uprobes: Add support to optimize usdt
 probes on x86_64

On Fri, Jul 11, 2025 at 1:29 AM Jiri Olsa <jolsa@...nel.org> wrote:
>
> hi,
> this patchset adds support to optimize usdt probes on top of 5-byte
> nop instruction.
>
> The generic approach (optimize all uprobes) is hard because it would
> require emulating possibly multiple original instructions, with all
> the related issues. The usdt case, which stores a 5-byte nop, seems
> much easier, so we are starting with that.
>
> The basic idea is to replace breakpoint exception with syscall which
> is faster on x86_64. For more details please see changelog of patch 8.
>
> The run_bench_uprobes.sh benchmark triggers uprobe (on top of different
> original instructions) in a loop and counts how many of those happened
> per second (the unit below is million loops).
>
> There's a big speedup if you compare the current usdt implementation
> (uprobe-nop) with the proposed usdt (uprobe-nop5):
>
> current:
>         usermode-count :  152.501 ± 0.012M/s
>         syscall-count  :   14.463 ± 0.062M/s
> -->     uprobe-nop     :    3.160 ± 0.005M/s
>         uprobe-push    :    3.003 ± 0.003M/s
>         uprobe-ret     :    1.100 ± 0.003M/s
>         uprobe-nop5    :    3.132 ± 0.012M/s
>         uretprobe-nop  :    2.103 ± 0.002M/s
>         uretprobe-push :    2.027 ± 0.004M/s
>         uretprobe-ret  :    0.914 ± 0.002M/s
>         uretprobe-nop5 :    2.115 ± 0.002M/s
>
> after the change:
>         usermode-count :  152.343 ± 0.400M/s
>         syscall-count  :   14.851 ± 0.033M/s
>         uprobe-nop     :    3.204 ± 0.005M/s
>         uprobe-push    :    3.040 ± 0.005M/s
>         uprobe-ret     :    1.098 ± 0.003M/s
> -->     uprobe-nop5    :    7.286 ± 0.017M/s
>         uretprobe-nop  :    2.144 ± 0.001M/s
>         uretprobe-push :    2.069 ± 0.002M/s
>         uretprobe-ret  :    0.922 ± 0.000M/s
>         uretprobe-nop5 :    3.487 ± 0.001M/s
>
> I see a bit more speedup on Intel (above) compared to AMD. The big nop5
> speedup is partly due to emulating nop5 and partly due to the optimization.
>
> The key speed up we do this for is the USDT switch from nop to nop5:
>         uprobe-nop     :    3.160 ± 0.005M/s
>         uprobe-nop5    :    7.286 ± 0.017M/s
>
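
For readers who want the ratios spelled out, here is a minimal sketch that
derives them from the numbers quoted above. The figures are copied verbatim
from the benchmark output; the helper names (speedup, ns_per_hit) are
hypothetical and are not part of the patchset or of run_bench_uprobes.sh.
The per-hit time is only a rough estimate, since the measured loop also
includes the benchmark's own overhead.

```python
# Throughput in million probe hits per second, copied from the quoted output.
BEFORE = {"uprobe-nop": 3.160, "uprobe-nop5": 3.132,
          "uretprobe-nop": 2.103, "uretprobe-nop5": 2.115}
AFTER = {"uprobe-nop": 3.204, "uprobe-nop5": 7.286,
         "uretprobe-nop": 2.144, "uretprobe-nop5": 3.487}


def speedup(before_mps: float, after_mps: float) -> float:
    """Ratio of new throughput to old throughput."""
    return after_mps / before_mps


def ns_per_hit(mps: float) -> float:
    """Approximate nanoseconds per loop iteration: 1e9 ns / (mps * 1e6)."""
    return 1e3 / mps


if __name__ == "__main__":
    # The headline comparison: current USDT (nop) vs. proposed USDT (nop5).
    usdt = speedup(BEFORE["uprobe-nop"], AFTER["uprobe-nop5"])
    print(f"USDT nop -> nop5 speedup: {usdt:.2f}x")
    print(f"per hit: {ns_per_hit(BEFORE['uprobe-nop']):.0f} ns -> "
          f"{ns_per_hit(AFTER['uprobe-nop5']):.0f} ns")

    # Same probe type, before vs. after the series.
    for name in BEFORE:
        print(f"{name:15s} {speedup(BEFORE[name], AFTER[name]):.2f}x")
```

On these numbers the nop to nop5 switch comes out at roughly 2.3x, while the
non-nop5 rows stay within a few percent of their previous values.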

We've been waiting a long time for this to land; I hope it gets
applied soon.

Once this lands, we can finally start implementing USDT support that
takes advantage of it transparently, with no performance regression
on older kernels.

For the series:

Acked-by: Andrii Nakryiko <andrii@...nel.org>

[...]
