Message-ID: <pep7tppcmd77ejaa47bhajc3uoy2q2n3cladgc4btdri4mth65@dqjulq2hx4l2>
Date: Thu, 9 Jan 2025 21:52:31 +0100
From: Mateusz Guzik <mjguzik@...il.com>
To: Kees Cook <kees@...nel.org>
Cc: kernel test robot <oliver.sang@...el.com>, oe-lkp@...ts.linux.dev,
lkp@...el.com, linux-kernel@...r.kernel.org,
Thomas Weißschuh <linux@...ssschuh.net>, Nilay Shroff <nilay@...ux.ibm.com>,
Yury Norov <yury.norov@...il.com>, Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
linux-hardening@...r.kernel.org
Subject: Re: [linus:master] [fortify] 239d87327d: vm-scalability.throughput 17.3% improvement

On Thu, Jan 09, 2025 at 12:38:04PM -0800, Kees Cook wrote:
> On Thu, Jan 09, 2025 at 08:51:44AM -0800, Kees Cook wrote:
> > On Thu, Jan 09, 2025 at 02:57:58PM +0800, kernel test robot wrote:
> > > kernel test robot noticed a 17.3% improvement of vm-scalability.throughput on:
> > >
> > > commit: 239d87327dcd361b0098038995f8908f3296864f ("fortify: Hide run-time copy size from value range tracking")
> > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >
> > Well that is unexpected. There should be no binary output difference
> > with that patch. I will investigate...
>
> It looks like hiding the size value from GCC has the side-effect of
> breaking memcpy inlining in many places. I would expect this to make
> things _slower_, though. O_o
>

This depends on what was emitted inline and which CPU is executing it.
Notably, if gcc elected to emit rep movs{q,b}, the CPU at hand does not
have FSRM (Fast Short REP MOV), and the size is small enough, then such
code can indeed be slower than suffering a call to memcpy (which does
not issue rep movs).

For example, I have seen gcc go to great pains to align a buffer for
rep movsq even when that was guaranteed to be unnecessary.
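
To make the two code shapes concrete, here is a minimal user-space
sketch (not kernel code). copy_rep_movsb() and copy_call() are
hypothetical helpers invented for illustration: the first spells out by
hand the rep movsb sequence a compiler might emit in place, the second
simply goes through memcpy. The sketch takes no measurements; which form
wins for a small len depends on the CPU, as described above.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: the inline "rep movsb" form, written out by hand.
 * Assumes the direction flag is clear, as the x86-64 ABI requires.
 */
void copy_rep_movsb(void *dst, const void *src, size_t len)
{
#if defined(__x86_64__)
	__asm__ volatile("rep movsb"
			 : "+D"(dst), "+S"(src), "+c"(len)
			 :
			 : "memory");
#else
	memcpy(dst, src, len);	/* fallback so the sketch builds elsewhere */
#endif
}

/*
 * Hypothetical helper: the alternative that goes through memcpy.
 * With a run-time len the compiler will often emit a real call here,
 * but it is free to expand the copy inline instead.
 */
void copy_call(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
}
```

Timing both helpers in a loop over small sizes on the machine in
question is one way to see whether the inline rep movs form is actually
the slower one there.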

Can you disassemble an example affected spot?

gcc has a bunch of magic switches to tell it what to emit inline; the
thing to do is to convince it to roll with a bunch of plain mov
instructions (not rep movs) for sizes small enough(tm). What
constitutes small enough depends on the microarchitecture.
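
As a starting point for experimenting with such switches, the sketch
below is a stand-alone file whose generated assembly can be compared
under different options. The function name copy_small(), the 64-byte
bound, and the strategy values in the comment are assumptions chosen for
illustration, not tuned recommendations. -mstringop-strategy= and
-mmemcpy-strategy= are x86 options documented by GCC; what they actually
emit for a given size range should be checked in the disassembly.

```c
#include <stddef.h>
#include <string.h>

/*
 * Build and compare the generated assembly, for example:
 *   gcc -O2 -S copy.c
 *   gcc -O2 -mstringop-strategy=unrolled_loop -S copy.c
 *   gcc -O2 -mmemcpy-strategy=unrolled_loop:64:noalign,libcall:-1:noalign -S copy.c
 * The strategy values are illustrative assumptions, not tuned numbers.
 */

#define SMALL_MAX 64	/* hypothetical "small enough" bound */

/*
 * Copy at most SMALL_MAX bytes and return the number of bytes copied.
 * The clamp keeps the possible range of len visible to the compiler.
 */
size_t copy_small(void *dst, const void *src, size_t len)
{
	if (len > SMALL_MAX)
		len = SMALL_MAX;
	memcpy(dst, src, len);
	return len;
}
```

Because the right bound depends on the microarchitecture, any value
picked this way would need to be validated on the CPUs that matter.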