Message-ID: <CAFULd4ZsSKwJ4Dz3cCAgaVsa4ypbb0e2savO-3_Ltbs=1wzgKQ@mail.gmail.com>
Date: Mon, 3 Mar 2025 13:23:39 +0100
From: Uros Bizjak <ubizjak@...il.com>
To: Dave Hansen <dave.hansen@...el.com>
Cc: x86@...nel.org, linux-kernel@...r.kernel.org,
Peter Zijlstra <peterz@...radead.org>, Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...nel.org>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>, "H. Peter Anvin" <hpa@...or.com>
Subject: Re: [PATCH -tip] x86/locking/atomic: Use asm_inline for atomic
locking insns
On Sun, Mar 2, 2025 at 9:56 PM Uros Bizjak <ubizjak@...il.com> wrote:
>
> On Fri, Feb 28, 2025 at 5:48 PM Dave Hansen <dave.hansen@...el.com> wrote:
> >
> > On 2/28/25 04:35, Uros Bizjak wrote:
> > > The code size of the resulting x86_64 defconfig object file increases
> > > by 33,264 bytes, a code size increase of about 0.12%:
> > >
> > > text data bss dec hex filename
> > > 27450107 4633332 814148 32897587 1f5fa33 vmlinux-old.o
> > > 27483371 4633784 814148 32931303 1f67de7 vmlinux-new.o
> >
> > So, first of all, thank you for including some objective measurement of
> > the impact of your patches. It's much appreciated.
> >
> > But I think the patches need to come with a solid theory of why they're
> > good. The minimum bar for that, I think, is *some* kind of actual
> > real-world performance test. I'm not picky. Just *something* that spends
> > a lot of time in the kernel and ideally where a profile points at some
> > of the code you're poking here.
> >
> > I'm seriously not picky: will-it-scale, lmbench, dbench, kernel
> > compiles. *ANYTHING*. *ANY* hardware. Run it on your laptop.
> >
> > But performance patches need to come with performance *numbers*.
>
> Please find lmbench results from unpatched (fedora.0) and patched
> (fedora.1) fedora-41 6.13.5 kernels.
>
> lmbench is from [1]
>
> [1] https://fedora.pkgs.org/41/rpm-sphere-x86_64/lmbench-3.0-0.a9.3.x86_64.rpm.html
>
> Some tests show quite different results, and I'd appreciate some help
> interpreting them. Maybe they show that the 33 kbyte code size increase
> is worth the improvement, or maybe they will motivate someone with more
> experience in kernel benchmarking to measure the patch in a more
> scientific way.
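For context on where the extra 33 kbytes come from: asm_inline wraps the
compiler's "asm inline" extension (GCC 9 and later), which tells the
inliner to treat the asm statement as having the minimum possible size,
instead of estimating its size from the source text. Functions containing
lock-prefixed atomics therefore look smaller to the inliner and get
inlined into their callers more often, trading code size for the removed
call/return overhead. A minimal sketch of the pattern the patch applies
(illustrative only, not the exact kernel code; the real definitions live
in include/linux/compiler_types.h and arch/x86/include/asm/atomic.h):

/* Fall back to plain asm when the compiler lacks "asm inline". */
#ifdef CONFIG_CC_HAS_ASM_INLINE
#define asm_inline asm __inline
#else
#define asm_inline asm
#endif

 /*
  * Before: the inliner charges a size estimate derived from the asm
  * text to every function this statement is inlined into.
  * After: the statement counts as minimum size, so functions that
  * contain it stay attractive inlining candidates.
  */
 static __always_inline void arch_atomic_inc(atomic_t *v)
 {
-	asm volatile(LOCK_PREFIX "incl %0"
+	asm_inline volatile(LOCK_PREFIX "incl %0"
 		     : "+m" (v->counter));
 }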
The most affected results, from fedora.0 (unpatched) to fedora.1 (patched):

Process fork+exit:                  270.0952 ->  298.6842 microseconds
Process fork+execve:               2620.3333 -> 1662.7500 microseconds
Process fork+/bin/sh -c:           6781.0000 -> 2127.6667 microseconds
File /usr/tmp/XXX write bandwidth:   1780350 ->   1950077 KB/sec
Pagefaults on /usr/tmp/XXX:           0.3875 ->    0.1958 microseconds
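Roughly: fork+exit regresses by ~11%, while fork+execve improves by ~37%,
fork+/bin/sh by ~69%, write bandwidth by ~10%, and pagefault latency is
nearly halved.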
and the socket results:

Socket bandwidth using localhost
 0.000001     2.52 ->    3.13 MB/sec
 0.000064   163.02 ->  187.08 MB/sec
 0.000128   321.70 ->  324.12 MB/sec
 0.000256   630.06 ->  618.51 MB/sec
 0.000512  1207.07 -> 1137.13 MB/sec
 0.001024  2004.06 -> 1962.95 MB/sec
 0.001437  2475.43 -> 2458.27 MB/sec
10.000000  5817.34 -> 6168.08 MB/sec
Avg xfer: 3.2KB, 41.8KB in 1.2230 -> 1.0060 millisecs, 34.15 -> 41.52 MB/sec
AF_UNIX sock stream bandwidth: 9850.01 -> 9921.68 MB/sec
Pipe bandwidth: 4631.28 -> 4649.96 MB/sec
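Most of the socket numbers appear to be within run-to-run noise, but the
average transfer improves from 34.15 to 41.52 MB/sec (~22%), while AF_UNIX
and pipe bandwidth move by well under 1%.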
Uros.