Message-ID: <CAEf4BzbXYrZLF+WGBvkSmKDCvVLuos-Ywx1xKqksdaYKySB-OQ@mail.gmail.com>
Date: Mon, 18 Nov 2024 22:13:04 -0800
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Mark Rutland <mark.rutland@....com>
Cc: Peter Zijlstra <peterz@...radead.org>, Jiri Olsa <jolsa@...nel.org>,
Oleg Nesterov <oleg@...hat.com>, Andrii Nakryiko <andrii@...nel.org>, bpf@...r.kernel.org,
Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
John Fastabend <john.fastabend@...il.com>, Hao Luo <haoluo@...gle.com>,
Steven Rostedt <rostedt@...dmis.org>, Masami Hiramatsu <mhiramat@...nel.org>,
Alan Maguire <alan.maguire@...cle.com>, linux-kernel@...r.kernel.org,
linux-trace-kernel@...r.kernel.org, Will Deacon <will@...nel.org>
Subject: Re: [RFC 00/11] uprobes: Add support to optimize usdt probes on x86_64

On Mon, Nov 18, 2024 at 2:06 AM Mark Rutland <mark.rutland@....com> wrote:
>
> On Sun, Nov 17, 2024 at 12:49:46PM +0100, Peter Zijlstra wrote:
> > On Tue, Nov 05, 2024 at 02:33:54PM +0100, Jiri Olsa wrote:
> > > hi,
> > > this patchset adds support to optimize usdt probes on top of 5-byte
> > > nop instruction.
> > >
> > > The generic approach (optimize all uprobes) is hard due to emulating
> > > possible multiple original instructions and its related issues. The
> > > usdt case, which stores 5-byte nop seems much easier, so starting
> > > with that.
> > >
> > > The basic idea is to replace breakpoint exception with syscall which
> > > is faster on x86_64. For more details please see changelog of patch 7.
> >
> > So this is really about the fact that syscalls are faster than traps on
> > x86_64? Is there something similar on ARM64, or are they roughly the
> > same speed there?
>
> From the hardware side I would expect those to be the same speed.
>
> From the software side, there might be a difference, but in theory we
> should be able to make the non-syscall case faster because we don't have
> syscall tracing there.
>
> > That is, I don't think this scheme will work for the various RISC
> > architectures, given their very limited immediate range turns a typical
> > call into a multi-instruction trainwreck real quick.
> >
> > Now, that isn't a problem if their exceptions and syscalls are of equal
> > speed.
>
> Yep, on arm64 we definitely can't patch in branches reliably; using BRK
> (as we do today) is the only reliable option, and it *shouldn't* be
> slower than a syscall.
>
> Looking around, we have a different latent issue with uprobes on arm64
> in that only certain instructions can be modified while being
> concurrently executed (in addition to the atomicity of updating the

What does this mean for the application in practical terms? Will it
crash? Or will there be some corruption? Just curious how this can
manifest.

> bytes in memory), and for everything else we need to stop-the-world. We
> handle that for kprobes but it looks like we don't have any
> infrastructure to handle that for uprobes.
>
> Mark.