Message-ID: <ZzsRfhGSYXVK0mst@J2N7QTR9R3>
Date: Mon, 18 Nov 2024 10:06:01 +0000
From: Mark Rutland <mark.rutland@....com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Jiri Olsa <jolsa@...nel.org>, Oleg Nesterov <oleg@...hat.com>,
	Andrii Nakryiko <andrii@...nel.org>, bpf@...r.kernel.org,
	Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
	John Fastabend <john.fastabend@...il.com>,
	Hao Luo <haoluo@...gle.com>, Steven Rostedt <rostedt@...dmis.org>,
	Masami Hiramatsu <mhiramat@...nel.org>,
	Alan Maguire <alan.maguire@...cle.com>,
	linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
	Will Deacon <will@...nel.org>
Subject: Re: [RFC 00/11] uprobes: Add support to optimize usdt probes on
 x86_64

On Sun, Nov 17, 2024 at 12:49:46PM +0100, Peter Zijlstra wrote:
> On Tue, Nov 05, 2024 at 02:33:54PM +0100, Jiri Olsa wrote:
> > hi,
> > this patchset adds support to optimize usdt probes on top of 5-byte
> > nop instruction.
> > 
> > The generic approach (optimize all uprobes) is hard due to emulating
> > possible multiple original instructions and its related issues. The
> > usdt case, which stores 5-byte nop seems much easier, so starting
> > with that.
> > 
> > The basic idea is to replace breakpoint exception with syscall which
> > is faster on x86_64. For more details please see changelog of patch 7.
> 
> So this is really about the fact that syscalls are faster than traps on
> x86_64? Is there something similar on ARM64, or are they roughly the
> same speed there?

From the hardware side I would expect those to be the same speed.

From the software side, there might be a difference, but in theory we
should be able to make the non-syscall case faster because we don't have
syscall tracing there.

> That is, I don't think this scheme will work for the various RISC
> architectures, given their very limited immediate range turns a typical
> call into a multi-instruction trainwreck real quick.
> 
> Now, that isn't a problem if their exceptions and syscalls are of equal
> speed.

Yep, on arm64 we definitely can't patch in branches reliably; using BRK
(as we do today) is the only reliable option, and it *shouldn't* be
slower than a syscall.
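
To make the immediate-range limit concrete, here is a minimal sketch of
the reachability check for a single A64 B/BL instruction, which encodes
a signed 26-bit word offset (roughly +/-128 MiB around the branch). The
function and macro names are hypothetical and not taken from the kernel
or from the patchset; this only illustrates why an arbitrary user
address usually cannot be reached with one patched-in branch.

```c
#include <stdbool.h>
#include <stdint.h>

/* A64 B/BL: signed 26-bit immediate, scaled by 4 => +/-128 MiB. */
#define A64_BRANCH_RANGE (INT64_C(128) * 1024 * 1024)

/*
 * Hypothetical helper: can a single B/BL placed at 'pc' reach 'target'?
 * Both addresses must be 4-byte aligned, and the byte offset must fit
 * in [-128 MiB, +128 MiB).
 */
static bool a64_branch_reachable(uint64_t pc, uint64_t target)
{
	int64_t offset = (int64_t)(target - pc);

	if (offset & 3)
		return false;	/* branch targets are word aligned */

	return offset >= -A64_BRANCH_RANGE && offset < A64_BRANCH_RANGE;
}
```

Anything outside that window needs a multi-instruction sequence, which
cannot be written atomically over a single probed instruction.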

Looking around, we have a different latent issue with uprobes on arm64
in that only certain instructions can be modified while being
concurrently executed (in addition to the atomicity of updating the
bytes in memory), and for everything else we need to stop-the-world. We
handle that for kprobes but it looks like we don't have any
infrastructure to handle that for uprobes.
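
As a rough sketch of that constraint: the architecture only permits a
small set of instructions (B, BL, BRK, HVC, ISB, NOP, SMC, SVC) to be
modified while another CPU may be executing them, and both the old and
the new instruction have to be in that set. The helper names below are
hypothetical and the masks are my reading of the A64 encodings; this is
not the kernel's own check, just an illustration of the rule.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct a64_pattern {
	uint32_t mask;
	uint32_t value;
};

/* Instructions that may be modified while concurrently executed. */
static const struct a64_pattern a64_live_patchable[] = {
	{ 0xFC000000, 0x14000000 },	/* B   */
	{ 0xFC000000, 0x94000000 },	/* BL  */
	{ 0xFFE0001F, 0xD4200000 },	/* BRK */
	{ 0xFFE0001F, 0xD4000001 },	/* SVC */
	{ 0xFFE0001F, 0xD4000002 },	/* HVC */
	{ 0xFFE0001F, 0xD4000003 },	/* SMC */
	{ 0xFFFFF0FF, 0xD50330DF },	/* ISB */
	{ 0xFFFFFFFF, 0xD503201F },	/* NOP */
};

/* Hypothetical helper: is this one instruction in the safe set? */
static bool a64_insn_live_patchable(uint32_t insn)
{
	size_t n = sizeof(a64_live_patchable) / sizeof(a64_live_patchable[0]);

	for (size_t i = 0; i < n; i++) {
		if ((insn & a64_live_patchable[i].mask) ==
		    a64_live_patchable[i].value)
			return true;
	}
	return false;
}

/* Both the old and the new instruction must be in the set; otherwise
 * the other CPUs have to be stopped while the instruction is rewritten. */
static bool a64_can_patch_live(uint32_t old_insn, uint32_t new_insn)
{
	return a64_insn_live_patchable(old_insn) &&
	       a64_insn_live_patchable(new_insn);
}
```

So replacing a NOP with a BRK is fine without stopping anything, but
replacing, say, an ADD with a BRK is not.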

Mark.
