Message-Id: <2a1024f7-af7f-474f-8b1c-aa5e0d4bd17a@app.fastmail.com>
Date: Wed, 24 Jul 2024 10:50:54 +0200
From: "Arnd Bergmann" <arnd@...nel.org>
To: "Juergen Gross" <jgross@...e.com>,
"Lorenzo Stoakes" <lorenzo.stoakes@...cle.com>,
"Andrew Morton" <akpm@...ux-foundation.org>,
"David Laight" <david.laight@...lab.com>
Cc: "Matthew Wilcox" <willy@...radead.org>,
"Linus Torvalds" <torvalds@...ux-foundation.org>,
"Jason A . Donenfeld" <Jason@...c4.com>,
"Christoph Hellwig" <hch@...radead.org>,
"Andy Shevchenko" <andriy.shevchenko@...ux.intel.com>,
pedro.falcato@...il.com, "Mateusz Guzik" <mjguzik@...il.com>,
linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: Build performance regressions originating from min()/max() macros
On Wed, Jul 24, 2024, at 10:14, Jürgen Groß wrote:
> On 23.07.24 23:59, Lorenzo Stoakes wrote:
>>
>> And resulted in the generation of 47 MB (!) of pre-processor output.
>>
>> It seems a lot of code now relies on the relaxed conditions of the newly
>> changed min/max() macros, so the question is - what can we do to address
>> these regressions?
>
> I can send a patch to simplify the problematic construct, but OTOH this
> will avoid only one particularly bad example.
It's probably a good idea to change the xen/setup.c file anyway,
as I haven't found any other file with a regression this bad,
and it only needs a single temporary variable for a 1000x speedup.
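To illustrate why one temporary helps so much, here is a rough sketch. It
is not the xen/setup.c code and not the kernel macro: SKETCH_MIN() is a
made-up stand-in that, like a type-checking min(), mentions each argument
several times, and the function and parameter names are hypothetical.

```c
/*
 * Hypothetical stand-in for a type-checking min(): each argument
 * appears three times in the expansion (sizeof, comparison, result),
 * so every level of nesting multiplies the preprocessor output.
 */
#define SKETCH_MIN(x, y) \
	((void)sizeof((x) < (y)), ((x) < (y) ? (x) : (y)))

/* Three levels of nesting: the innermost arguments get expanded 27 times. */
static unsigned long nested_limit(unsigned long ratio, unsigned long max_pfn,
				  unsigned long limit_pfn, unsigned long extra,
				  unsigned long max_pages)
{
	return SKETCH_MIN(SKETCH_MIN(ratio * SKETCH_MIN(max_pfn, limit_pfn),
				     extra),
			  max_pages - max_pfn);
}

/* Same result, but the innermost call is hoisted into a temporary. */
static unsigned long flat_limit(unsigned long ratio, unsigned long max_pfn,
				unsigned long limit_pfn, unsigned long extra,
				unsigned long max_pages)
{
	unsigned long scaled = ratio * SKETCH_MIN(max_pfn, limit_pfn);

	return SKETCH_MIN(SKETCH_MIN(scaled, extra), max_pages - max_pfn);
}
```

In the flat version the outer calls only ever repeat the short name
"scaled" instead of the whole inner expansion.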
For the overall kernel, I see at best a 2.3% speedup (20 seconds of
CPU time) by replacing the current min()/max() macros with a version
that drops both the constant expression output feature and the
assertion, measured on an x86 defconfig build, which has xen
disabled. On a defconfig+xen kernel, that difference increases
to 4.4% or 37 seconds.
Removing only the constexpr side requires a handful of fixups
for x86 allmodconfig to replace min()/max() with something else in
drivers/edac/sb_edac.c
drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
drivers/gpu/drm/drm_color_mgmt.c
drivers/input/touchscreen/cyttsp4_core.c
drivers/md/dm-integrity.c
drivers/net/can/usb/etas_es58x/es58x_devlink.c
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
fs/btrfs/tree-checker.c
lib/vsprintf.c
net/ipv4/proc.c
net/ipv6/proc.c
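As a sketch of what such a fixup could look like where a constant
expression is required (an array size, for example), a plain ternary
macro is enough. The macro, enum and array names below are made up for
illustration and are not taken from any of the files listed above.

```c
#include <stddef.h>

/*
 * Hypothetical helper for constant contexts only: no type checking and
 * no protection against double evaluation, so it is only suitable for
 * side-effect-free constant arguments.
 */
#define SKETCH_CONST_MAX(a, b) ((a) > (b) ? (a) : (b))

enum { HDR_LEN = 16, PAYLOAD_LEN = 64 };

/* A file-scope array size must be an integer constant expression. */
static char scratch[SKETCH_CONST_MAX(HDR_LEN, PAYLOAD_LEN)];

static size_t scratch_size(void)
{
	return sizeof(scratch);
}
```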
Removing only the constexpr side gives about half the speed
difference. The other half comes from removing the assertion,
but that is not a good idea unless we can replace it with an
equivalent assertion that works on the unique_x/unique_y
variables instead of expanding the arguments.
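A minimal sketch of that idea, assuming GNU C extensions (statement
expressions, __auto_type, __typeof__): each argument is expanded exactly
once, into a local variable, and the assertion then looks only at the
types of those variables. The same-signedness rule used here is just a
placeholder and is not the check the kernel macros perform; the
sketch_ux/sketch_uy names are fixed for brevity, whereas real macros
would need generated unique names.

```c
/* True if both operands have the same signedness (placeholder rule). */
#define SKETCH_SAME_SIGN(a, b) \
	(((__typeof__(a))(-1) < 0) == ((__typeof__(b))(-1) < 0))

/*
 * x and y each appear once in the expansion. The assertion refers to
 * the short local names, so long or nested arguments are not repeated.
 * Caveat: passing an argument that is itself named sketch_ux or
 * sketch_uy would break this, hence the need for unique names.
 */
#define SKETCH_MIN(x, y) ({					\
	__auto_type sketch_ux = (x);				\
	__auto_type sketch_uy = (y);				\
	_Static_assert(SKETCH_SAME_SIGN(sketch_ux, sketch_uy),	\
		       "SKETCH_MIN: signedness mismatch");	\
	sketch_ux < sketch_uy ? sketch_ux : sketch_uy; })

static int smaller_int(int a, int b)
{
	return SKETCH_MIN(a, b);
}

static unsigned long smaller_ulong(unsigned long a, unsigned long b)
{
	return SKETCH_MIN(a, b);
}
```

Mixing an int with an unsigned long in SKETCH_MIN() would fail the
static assertion at compile time under this placeholder rule.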
Arnd