linux-kernel - Re: [PATCH v1 0/7] perf bench: Add qspinlock benchmark

Message-ID: <CAP-5=fVec0Wp-d489aWE6Tk=W4dz-r6O+JUiqSPLcEZ7TK6FJA@mail.gmail.com>
Date: Tue, 16 Sep 2025 10:00:13 -0700
From: Ian Rogers <irogers@...gle.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Mark Rutland <mark.rutland@....com>, Yuzhuo Jing <yuzhuo@...gle.com>, 
	Ingo Molnar <mingo@...hat.com>, Arnaldo Carvalho de Melo <acme@...nel.org>, Namhyung Kim <namhyung@...nel.org>, 
	Alexander Shishkin <alexander.shishkin@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>, 
	Adrian Hunter <adrian.hunter@...el.com>, Liang Kan <kan.liang@...ux.intel.com>, 
	Yuzhuo Jing <yzj@...ch.edu>, Andrea Parri <parri.andrea@...il.com>, 
	Palmer Dabbelt <palmer@...osinc.com>, Charlie Jenkins <charlie@...osinc.com>, 
	Sebastian Andrzej Siewior <bigeasy@...utronix.de>, Kumar Kartikeya Dwivedi <memxor@...il.com>, 
	Alexei Starovoitov <ast@...nel.org>, Barret Rhoden <brho@...gle.com>, 
	Alexandre Ghiti <alexghiti@...osinc.com>, Guo Ren <guoren@...nel.org>, linux-kernel@...r.kernel.org, 
	linux-perf-users@...r.kernel.org
Subject: Re: [PATCH v1 0/7] perf bench: Add qspinlock benchmark

On Tue, Sep 16, 2025 at 7:18 AM Peter Zijlstra <peterz@...radead.org> wrote:
>
> On Mon, Aug 04, 2025 at 03:28:12PM +0100, Mark Rutland wrote:
> > On Mon, Jul 28, 2025 at 07:26:33PM -0700, Yuzhuo Jing wrote:
> > > As an effort to improve the perf bench subcommand, this patch series
> > > adds benchmark for the kernel's queued spinlock implementation.
> > >
> > > This series imports necessary kernel definitions such as atomics,
> > > introduces userspace per-cpu adapter, and imports the qspinlock
> > > implementation from the kernel tree to tools tree, with minimum
> > > adaptions.
> >
> > Who is this intended to be useful for, and when would they use this?
> >
> > This doesn't serve as a benchmark of the host kernel, since it tests
> > whatever stale copy of the qspinlock code was built into the perf
> > binary.
> >
> > I can understand that being able to test the code in userspace may be
> > helpful when making some changes, but why does this need to be built
> > into the perf tool?
>
> Right, I think most of us already have a userspace version of it. I have
> a thingy that has TAS, TICKET and QSPINLOCK wrapped in a perf self
> monitor that I can run on various x86_64 to see how it behaves.
>
> IIRC it also has a pile of 'raw' atomic ops to see the contention
> behaviour. This shows that eg. XADD is *waay* nicer than a CMPXCHG loop
> when heavily contended.
>
> Anyway, that lives as a random tar file on a random machine in my house,
> I'm not sure it makes much sense to stick that in perf as such. Rather
> specific.

The intent was that the benchmark wouldn't carry stale copies: the
imported files would be kept in sync with the kernel, the same way we
already do for other kernel files copied into perf.

The inspiration for adding a benchmark this way comes from the
existing perf bench memcpy benchmark. The reason to care is that, as
with memcpy, there are subtle effects from things like RISC-V's
non-temporal atomics (ARM near-far atomics) and the size of CPU cores.
In general queued spinlock is preferred in the kernel, but a benchmark
of queued spinlock against ticket spinlock may reveal that ticket
spinlock would be a better default for certain configurations.
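
To make the comparison concrete, here is a minimal userspace sketch of a
ticket spinlock under contention. It is an illustration only, not the
kernel's implementation and not code from this patch series: the names
ticket_lock, run_contended and MAX_THREADS are hypothetical, it uses C11
atomics and pthreads, and it yields while spinning (an assumption made so
that oversubscribed runs still make progress).

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_THREADS 64	/* hypothetical cap for this sketch */

/* Hypothetical userspace ticket lock, not the kernel's code. */
struct ticket_lock {
	atomic_uint next;	/* next ticket to hand out */
	atomic_uint owner;	/* ticket currently being served */
};

static void ticket_lock_init(struct ticket_lock *l)
{
	atomic_init(&l->next, 0);
	atomic_init(&l->owner, 0);
}

static void ticket_lock_acquire(struct ticket_lock *l)
{
	/* A single unconditional fetch-and-add takes a ticket. */
	unsigned int me = atomic_fetch_add_explicit(&l->next, 1,
						    memory_order_relaxed);

	/* Spin until our ticket is served; yield so that runs with
	 * more threads than CPUs do not stall for whole timeslices. */
	while (atomic_load_explicit(&l->owner, memory_order_acquire) != me)
		sched_yield();
}

static void ticket_lock_release(struct ticket_lock *l)
{
	/* Only the lock holder writes owner, so a plain load + store works. */
	unsigned int cur = atomic_load_explicit(&l->owner,
						memory_order_relaxed);

	atomic_store_explicit(&l->owner, cur + 1, memory_order_release);
}

struct worker_arg {
	struct ticket_lock *lock;
	unsigned long *counter;	/* protected by lock */
	unsigned long iters;
};

static void *worker(void *p)
{
	struct worker_arg *a = p;

	for (unsigned long i = 0; i < a->iters; i++) {
		ticket_lock_acquire(a->lock);
		(*a->counter)++;
		ticket_lock_release(a->lock);
	}
	return NULL;
}

/* Run nthreads workers, each taking the lock iters times.
 * Returns the final counter, which equals nthreads * iters
 * if the lock provides mutual exclusion. */
unsigned long run_contended(int nthreads, unsigned long iters)
{
	struct ticket_lock lock;
	unsigned long counter = 0;
	struct worker_arg arg = { &lock, &counter, iters };
	pthread_t tids[MAX_THREADS];

	assert(nthreads > 0 && nthreads <= MAX_THREADS);
	ticket_lock_init(&lock);
	for (int i = 0; i < nthreads; i++) {
		int rc = pthread_create(&tids[i], NULL, worker, &arg);
		assert(rc == 0);
	}
	for (int i = 0; i < nthreads; i++)
		pthread_join(tids[i], NULL);
	return counter;
}
```

Timing run_contended() across thread counts, and then swapping in a
different lock behind the same acquire/release pair, would give a crude
version of the comparison described above.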

Does it make sense to have this in perf? It makes it easier to tune
the implementations, keep code in sync with the kernel, etc. Does it
make sense for perf to have a memcpy benchmark? Maybe not, now that
rep movsb is more reliable. Anyway, in general the bar for getting
things into perf bench hasn't been hugely high, and I don't see
disagreement that a benchmark like this is useful on some occasions.
As someone who cares about this kind of performance tuning, I care
about having the benchmark.

Thanks,
Ian