[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <bd5753622f2f42248495a42593b497f3@AcuMS.aculab.com>
Date: Tue, 11 Apr 2023 21:34:35 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Dave Hansen' <dave.hansen@...el.com>,
Mark Rutland <mark.rutland@....com>
CC: Uros Bizjak <ubizjak@...il.com>,
"linux-alpha@...r.kernel.org" <linux-alpha@...r.kernel.org>,
"loongarch@...ts.linux.dev" <loongarch@...ts.linux.dev>,
"linux-mips@...r.kernel.org" <linux-mips@...r.kernel.org>,
"linuxppc-dev@...ts.ozlabs.org" <linuxppc-dev@...ts.ozlabs.org>,
"x86@...nel.org" <x86@...nel.org>,
"linux-arch@...r.kernel.org" <linux-arch@...r.kernel.org>,
"linux-perf-users@...r.kernel.org" <linux-perf-users@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"Richard Henderson" <richard.henderson@...aro.org>,
Ivan Kokshaysky <ink@...assic.park.msu.ru>,
Matt Turner <mattst88@...il.com>,
Huacai Chen <chenhuacai@...nel.org>,
WANG Xuerui <kernel@...0n.name>,
Thomas Bogendoerfer <tsbogend@...ha.franken.de>,
Michael Ellerman <mpe@...erman.id.au>,
"Nicholas Piggin" <npiggin@...il.com>,
Christophe Leroy <christophe.leroy@...roup.eu>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>,
"H. Peter Anvin" <hpa@...or.com>, Arnd Bergmann <arnd@...db.de>,
"Peter Zijlstra" <peterz@...radead.org>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...nel.org>,
Namhyung Kim <namhyung@...nel.org>,
Ian Rogers <irogers@...gle.com>, Will Deacon <will@...nel.org>,
Boqun Feng <boqun.feng@...il.com>,
Jiaxun Yang <jiaxun.yang@...goat.com>,
Jun Yi <yijun@...ngson.cn>
Subject: RE: [PATCH v2 0/5] locking: Introduce local{,64}_try_cmpxchg
From: Dave Hansen
> Sent: 11 April 2023 14:44
>
> On 4/11/23 04:35, Mark Rutland wrote:
> > I agree it'd be nice to have performance figures, but I think those would only
> > need to demonstrate a lack of a regression rather than a performance
> > improvement, and I think it's fairly clear from eyeballing the generated
> > instructions that a regression isn't likely.
>
> Thanks for the additional context.
>
> I totally agree that there's zero burden here to show a performance
> increase. If anyone can think of a quick way to do _some_ kind of
> benchmark on the code being changed and just show that it's free of
> brown paper bags, it would be appreciated. Nothing crazy, just think of
> one workload (synthetic or not) that will stress the paths being changed
> and run it with and without these changes. Make sure there are not
> surprises.
>
> I also agree that it's unlikely to be brown paper bag material.
The only thing I can think of is that, on x86, the locked
variant may actually be faster!
Both require exclusive access to the cache line (the unlocked
variant always does the write! [1]).
So if the cache line is contended between cpu the unlocked
variant might ping-pong the cache line twice!
Of course, if the line is shared like that then performance
is horrid.
[1] I checked on an uncached PCIe address on which I can monitor
the TLP. The write always happens so you can use cmpxchg18b
with a 'known bad value' to do a 16 byte read as a single TLP
(without using an SSE register).
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Powered by blists - more mailing lists