Message-ID: <CAADnVQ+3VA-SW2FKVv7iSPps00gZRkOb9L7NiKFZ5Jc5NwDedQ@mail.gmail.com>
Date: Thu, 21 Nov 2024 08:47:56 -0800
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Andrii Nakryiko <andrii.nakryiko@...il.com>, Jiri Olsa <olsajiri@...il.com>,
Oleg Nesterov <oleg@...hat.com>, Andrii Nakryiko <andrii@...nel.org>, bpf <bpf@...r.kernel.org>,
Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
John Fastabend <john.fastabend@...il.com>, Hao Luo <haoluo@...gle.com>,
Steven Rostedt <rostedt@...dmis.org>, Masami Hiramatsu <mhiramat@...nel.org>,
Alan Maguire <alan.maguire@...cle.com>, LKML <linux-kernel@...r.kernel.org>,
linux-trace-kernel <linux-trace-kernel@...r.kernel.org>, "H. Peter Anvin" <hpa@...or.com>
Subject: Re: [RFC perf/core 05/11] uprobes: Add mapping for optimized uprobe trampolines

On Thu, Nov 21, 2024 at 8:34 AM Peter Zijlstra <peterz@...radead.org> wrote:
>
> On Thu, Nov 21, 2024 at 08:02:12AM -0800, Alexei Starovoitov wrote:
> > On Thu, Nov 21, 2024 at 4:17 AM Peter Zijlstra <peterz@...radead.org> wrote:
> > >
> > > On Wed, Nov 20, 2024 at 04:07:38PM -0800, Andrii Nakryiko wrote:
> > >
> > > > USDTs are meant to be "transparent" to the surrounding code and they
> > > > don't mark any clobbered registers. Technically it could be added, but
> > > > I'm not a fan of this.
> > >
> > > Sure. Anyway, another thing to consider is FRED, will all of this still
> > > matter once that lands? If FRED gets us INT3 performance close to what
> > > SYSCALL has, then all this work will go unused.
> >
> > afaik not a single cpu in the datacenter supports FRED while
> > uprobe overhead is real.
> > imo it's worth improving performance today for existing cpus.
>
> I understand, but OTOH adding a syscall now, that we'll have to maintain
> for years and years, even though we know it'll not be used much is a
> bit annoying.

No. It _will_ be used for years.

>
> > I suspect arm64 might benefit too. Even if arm hw does the same
> > amount of work for trap vs syscall the sw overhead of handling
> > trap is different.
>
> Well, the RISC CPUs have a much harder time using this, their immediate
> range is typically puny and they end up needing multiple instructions
> and some register in order to set up a call.

We don't care about 32-bit archs and other exotics.
They're not a reason to leave performance on the table
on dominant archs.

> Elsewhere in the thread Mark Rutland already noted that arm64 really
> doesn't need or want this.

Doesn't look like you've read what you quoted above.
On arm64 the _HW_ cost may be the same.
The _SW_ difference in handling trap vs syscall is real.
I bet once uprobe syscall is benchmarked on arm64 there will
be a delta.