[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZzsRfhGSYXVK0mst@J2N7QTR9R3>
Date: Mon, 18 Nov 2024 10:06:01 +0000
From: Mark Rutland <mark.rutland@....com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Jiri Olsa <jolsa@...nel.org>, Oleg Nesterov <oleg@...hat.com>,
Andrii Nakryiko <andrii@...nel.org>, bpf@...r.kernel.org,
Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
John Fastabend <john.fastabend@...il.com>,
Hao Luo <haoluo@...gle.com>, Steven Rostedt <rostedt@...dmis.org>,
Masami Hiramatsu <mhiramat@...nel.org>,
Alan Maguire <alan.maguire@...cle.com>,
linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
Will Deacon <will@...nel.org>
Subject: Re: [RFC 00/11] uprobes: Add support to optimize usdt probes on
x86_64
On Sun, Nov 17, 2024 at 12:49:46PM +0100, Peter Zijlstra wrote:
> On Tue, Nov 05, 2024 at 02:33:54PM +0100, Jiri Olsa wrote:
> > hi,
> > this patchset adds support to optimize usdt probes on top of 5-byte
> > nop instruction.
> >
> > The generic approach (optimize all uprobes) is hard due to emulating
> > possible multiple original instructions and its related issues. The
> > usdt case, which stores 5-byte nop seems much easier, so starting
> > with that.
> >
> > The basic idea is to replace breakpoint exception with syscall which
> > is faster on x86_64. For more details please see changelog of patch 7.
>
> So this is really about the fact that syscalls are faster than traps on
> x86_64? Is there something similar on ARM64, or are they roughly the
> same speed there?
>From the hardware side I would expect those to be the same speed.
>From the software side, there might be a difference, but in theory we
should be able to make the non-syscall case faster because we don't have
syscall tracing there.
> That is, I don't think this scheme will work for the various RISC
> architectures, given their very limited immediate range turns a typical
> call into a multi-instruction trainwreck real quick.
>
> Now, that isn't a problem if their exceptions and syscalls are of equal
> speed.
Yep, on arm64 we definitely can't patch in branches reliably; using BRK
(as we do today) is the only reliable option, and it *shouldn't* be
slower than a syscall.
Looking around, we have a different latent issue with uprobes on arm64
in that only certain instructions can be modified while being
concurrently executed (in addition to the atomictiy of updating the
bytes in memory), and for everything else we need to stop-the-world. We
handle that for kprobes but it looks like we don't have any
infrastructure to handle that for uprobes.
Mark.
Powered by blists - more mailing lists