lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20241105133405.2703607-1-jolsa@kernel.org>
Date: Tue,  5 Nov 2024 14:33:54 +0100
From: Jiri Olsa <jolsa@...nel.org>
To: Oleg Nesterov <oleg@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Andrii Nakryiko <andrii@...nel.org>
Cc: bpf@...r.kernel.org,
	Song Liu <songliubraving@...com>,
	Yonghong Song <yhs@...com>,
	John Fastabend <john.fastabend@...il.com>,
	Hao Luo <haoluo@...gle.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Masami Hiramatsu <mhiramat@...nel.org>,
	Alan Maguire <alan.maguire@...cle.com>,
	linux-kernel@...r.kernel.org,
	linux-trace-kernel@...r.kernel.org
Subject: [RFC 00/11] uprobes: Add support to optimize usdt probes on x86_64

hi,
this patchset adds support to optimize usdt probes on top of 5-byte
nop instruction.

The generic approach (optimize all uprobes) is hard due to emulating
possible multiple original instructions and its related issues. The
usdt case, which stores 5-byte nop seems much easier, so starting
with that.

The basic idea is to replace breakpoint exception with syscall which
is faster on x86_64. For more details please see changelog of patch 7.

The first benchmark shows about 68% speed up (see below). The benchmark
triggers usdt probe in a loop and counts how many of those happened
per second.

It's still rfc state with some loose ends, but I'd be interested in
any feedback about the direction of this.

It's based on tip/perf/core with bpf-next/master merged on top of
that together with uprobe session patchset.

thanks,
jirka


current:
        # ./bench -w2 -d5 -a  trig-usdt
        Setting up benchmark 'trig-usdt'...
        Benchmark 'trig-usdt' started.
        Iter   0 ( 46.982us): hits    4.893M/s (  4.893M/prod), drops    0.000M/s, total operations    4.893M/s
        Iter   1 ( -5.967us): hits    4.892M/s (  4.892M/prod), drops    0.000M/s, total operations    4.892M/s
        Iter   2 ( -2.771us): hits    4.899M/s (  4.899M/prod), drops    0.000M/s, total operations    4.899M/s
        Iter   3 (  1.286us): hits    4.889M/s (  4.889M/prod), drops    0.000M/s, total operations    4.889M/s
        Iter   4 ( -2.871us): hits    4.881M/s (  4.881M/prod), drops    0.000M/s, total operations    4.881M/s
        Iter   5 (  1.005us): hits    4.886M/s (  4.886M/prod), drops    0.000M/s, total operations    4.886M/s
        Iter   6 ( 11.626us): hits    4.906M/s (  4.906M/prod), drops    0.000M/s, total operations    4.906M/s
        Iter   7 ( -6.638us): hits    4.896M/s (  4.896M/prod), drops    0.000M/s, total operations    4.896M/s
        Summary: hits    4.893 +- 0.009M/s (  4.893M/prod), drops    0.000 +- 0.000M/s, total operations    4.893 +- 0.009M/s

optimized:
        # ./bench -w2 -d5 -a  trig-usdt
        Setting up benchmark 'trig-usdt'...
        Benchmark 'trig-usdt' started.
        Iter   0 ( 46.073us): hits    8.258M/s (  8.258M/prod), drops    0.000M/s, total operations    8.258M/s
        Iter   1 ( -5.752us): hits    8.264M/s (  8.264M/prod), drops    0.000M/s, total operations    8.264M/s
        Iter   2 ( -1.333us): hits    8.263M/s (  8.263M/prod), drops    0.000M/s, total operations    8.263M/s
        Iter   3 ( -2.996us): hits    8.265M/s (  8.265M/prod), drops    0.000M/s, total operations    8.265M/s
        Iter   4 ( -0.620us): hits    8.264M/s (  8.264M/prod), drops    0.000M/s, total operations    8.264M/s
        Iter   5 ( -2.624us): hits    8.236M/s (  8.236M/prod), drops    0.000M/s, total operations    8.236M/s
        Iter   6 ( -0.840us): hits    8.232M/s (  8.232M/prod), drops    0.000M/s, total operations    8.232M/s
        Iter   7 ( -1.783us): hits    8.235M/s (  8.235M/prod), drops    0.000M/s, total operations    8.235M/s
        Summary: hits    8.249 +- 0.016M/s (  8.249M/prod), drops    0.000 +- 0.000M/s, total operations    8.249 +- 0.016M/s

---
Jiri Olsa (11):
      uprobes: Rename arch_uretprobe_trampoline function
      uprobes: Make copy_from_page global
      uprobes: Add len argument to uprobe_write_opcode
      uprobes: Add data argument to uprobe_write_opcode function
      uprobes: Add mapping for optimized uprobe trampolines
      uprobes: Add uprobe syscall to speed up uprobe
      uprobes/x86: Add support to optimize uprobes
      selftests/bpf: Use 5-byte nop for x86 usdt probes
      selftests/bpf: Add usdt trigger bench
      selftests/bpf: Add uprobe/usdt optimized test
      selftests/bpf: Add hit/attach/detach race optimized uprobe test

 arch/x86/entry/syscalls/syscall_64.tbl                    |   1 +
 arch/x86/include/asm/uprobes.h                            |   7 +++
 arch/x86/kernel/uprobes.c                                 | 180 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 include/linux/syscalls.h                                  |   2 +
 include/linux/uprobes.h                                   |  25 +++++++++-
 kernel/events/uprobes.c                                   | 222 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------
 kernel/fork.c                                             |   2 +
 kernel/sys_ni.c                                           |   1 +
 tools/testing/selftests/bpf/bench.c                       |   2 +
 tools/testing/selftests/bpf/benchs/bench_trigger.c        |  45 +++++++++++++++++
 tools/testing/selftests/bpf/prog_tests/uprobe_optimized.c | 252 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 tools/testing/selftests/bpf/progs/trigger_bench.c         |  10 +++-
 tools/testing/selftests/bpf/progs/uprobe_optimized.c      |  29 +++++++++++
 tools/testing/selftests/bpf/sdt.h                         |   9 +++-
 14 files changed, 768 insertions(+), 19 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/uprobe_optimized.c
 create mode 100644 tools/testing/selftests/bpf/progs/uprobe_optimized.c

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ