Message-ID: <5882b96e-1287-4390-8174-3316d39038ef@lucifer.local>
Date: Sat, 27 Jul 2024 19:44:05 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Linus Torvalds <torvalds@...uxfoundation.org>
Cc: David Laight <David.Laight@...lab.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"Matthew Wilcox (Oracle)" <willy@...radead.org>,
Christoph Hellwig <hch@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
Dan Carpenter <dan.carpenter@...aro.org>,
Arnd Bergmann <arnd@...nel.org>, "Jason@...c4.com" <Jason@...c4.com>,
"pedro.falcato@...il.com" <pedro.falcato@...il.com>,
Mateusz Guzik <mjguzik@...il.com>,
"linux-mm@...ck.org" <linux-mm@...ck.org>
Subject: Re: [PATCH 0/7] minmax: reduce compilation time
On Sat, Jul 27, 2024 at 09:26:43AM GMT, Linus Torvalds wrote:
> On Sat, 27 Jul 2024 at 01:08, Lorenzo Stoakes
> <lorenzo.stoakes@...cle.com> wrote:
> >
> > 62603617./drivers/staging/media/atomisp/pci/isp/kernels/ynr/ynr_1.0/ia_css_ynr.host.o.pre
>
> Heh.
>
> Longest line is drivers/.../ia_css_ynr.host.c:71 (27785kB)
>
> yeah, that's a single line that expands to 27MB in size.
>
> And yes, that line is one single min(...) expression with arguments
> that are then in turn macros with other nested min/max arguments.
>
> See also drivers/staging/media/atomisp/pci/sh_css_frac.h.
>
> On my fairly beefy (admittedly more cores than single-thread) machine,
> just generating the preprocessor file takes just under 20s.
>
> Building the object file is actually faster at "only" 8.5s for that
> one file, because it uses the built-in preprocessor and never writes
> it out, and most of the actual preprocessing result is trivial stuff
> that gets thrown away immediately.
>
> Linus
I attach a patch which addresses some of the worst culprits here including
that staging monstrosity. Changing the sDIGIT_FITTING() and
uDIGIT_FITTING() macros affects a ton of other related drivers so has an
outsized impact.
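To illustrate why turning these macros into inline functions helps, here is a minimal user-space sketch. TOY_MIN() is a hypothetical stand-in, not the kernel's min(), which is considerably more elaborate. It only mimics the property that each argument appears more than once in the replacement text, so every level of nesting more than doubles the expansion. toy_min_fn() and the other toy_* helpers are likewise made up for the example.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a min()-style macro: each argument appears
 * twice in the replacement list. The kernel's real min() is more complex. */
#define TOY_MIN(a, b) ((a) < (b) ? (a) : (b))

/* Two-level stringification so the argument is fully expanded first. */
#define TOY_STR(x) #x
#define TOY_XSTR(x) TOY_STR(x)

/* Function form: a nested call stays as short as the call text itself. */
static inline int toy_min_fn(int a, int b)
{
	return a < b ? a : b;
}

/* Length of the preprocessed text for the macro at nesting depth 1..3. */
static inline size_t toy_macro_len(int depth)
{
	switch (depth) {
	case 1:
		return strlen(TOY_XSTR(TOY_MIN(x, y)));
	case 2:
		return strlen(TOY_XSTR(TOY_MIN(TOY_MIN(x, y), y)));
	default:
		return strlen(TOY_XSTR(TOY_MIN(TOY_MIN(TOY_MIN(x, y), y), y)));
	}
}

/* Same nesting depths, but using the function instead of the macro. */
static inline size_t toy_fn_len(int depth)
{
	switch (depth) {
	case 1:
		return strlen(TOY_XSTR(toy_min_fn(x, y)));
	case 2:
		return strlen(TOY_XSTR(toy_min_fn(toy_min_fn(x, y), y)));
	default:
		return strlen(TOY_XSTR(toy_min_fn(toy_min_fn(toy_min_fn(x, y), y), y)));
	}
}

/* Macro text more than doubles per level; function text grows linearly. */
static inline void toy_check_growth(void)
{
	assert(toy_macro_len(2) > 2 * toy_macro_len(1));
	assert(toy_macro_len(3) > 2 * toy_macro_len(2));
	assert(toy_fn_len(3) < toy_macro_len(3));
}
```

Calling toy_check_growth() from a test program confirms the growth pattern. With a macro whose arguments appear many more times, the same effect is what produces multi-megabyte lines.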
Another big one I tackled is the NET_SKB_PAD define, which causes slightly
hidden nesting; we can just replace the max() there with a dumb #if and get
rid of the nesting entirely.
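The reason the #if form helps is that a max()-based define is pasted, as macro text, into every expression that uses it, whereas an #if is resolved once and leaves a bare constant behind. A minimal stand-alone sketch follows. DEMO_L1_CACHE_BYTES, DEMO_SKB_PAD and demo_skb_pad() are hypothetical names, and 64 is an assumed value chosen purely for illustration. The approach only works when the value is something the preprocessor can evaluate.

```c
#include <assert.h>

/* Assumed value for illustration only; the real L1_CACHE_BYTES is per-arch. */
#define DEMO_L1_CACHE_BYTES 64

/* Resolved once by the preprocessor, so users see a plain constant. */
#if DEMO_L1_CACHE_BYTES < 32
#define DEMO_SKB_PAD 32
#else
#define DEMO_SKB_PAD DEMO_L1_CACHE_BYTES
#endif

/* Compile-time checks that the result behaves like max(32, cache bytes). */
static_assert(DEMO_SKB_PAD >= 32, "pad must be at least 32");
static_assert(DEMO_SKB_PAD >= DEMO_L1_CACHE_BYTES, "pad must cover a cache line");

/* Accessor so the constant can be checked at run time as well. */
static inline int demo_skb_pad(void)
{
	return DEMO_SKB_PAD;
}
```

Any macro that later mentions DEMO_SKB_PAD inside a min() or max() now passes in a single token rather than another nested expansion.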
I also moved MVPP2_SKB_HEADROOM to a clamp_t().
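For reference, a single clamp gives the same value as the nested min(max(...), ...) form as long as the lower bound does not exceed the upper bound (here that would mean NET_SKB_PAD <= 224). A minimal sketch using plain functions is below. demo_clamp() and demo_nested() are hypothetical names rather than kernel helpers, and the sample values are assumed, not taken from any config.

```c
#include <assert.h>

/* Hypothetical helper with clamp semantics: limit val to [lo, hi]. */
static inline int demo_clamp(int val, int lo, int hi)
{
	if (val < lo)
		return lo;
	if (val > hi)
		return hi;
	return val;
}

/* The nested form being replaced: min(max(val, lo), hi). */
static inline int demo_nested(int val, int lo, int hi)
{
	int m = val > lo ? val : lo;

	return m < hi ? m : hi;
}

/* Both forms agree whenever lo <= hi; sample values are assumptions. */
static inline void demo_check_clamp(void)
{
	assert(demo_clamp(256, 64, 224) == demo_nested(256, 64, 224));
	assert(demo_clamp(100, 64, 224) == demo_nested(100, 64, 224));
	assert(demo_clamp(16, 64, 224) == demo_nested(16, 64, 224));
}
```

The two forms differ only when lo > hi, which is why the equivalence is stated under that assumption.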
I noticed a bunch of xfs stuff that's slow too, but tracked that down to
<linux/bio.h> which I see you're covering in another thread with Willy.
There are other bits and pieces, but this seems to cover the most egregious
cases.
This patch reduces preprocessor-generated output for allmodconfig from
102,966,525,841 bytes (!) to 102,764,954,617 bytes on my system, saving
~200MB (roughly 0.2%) of generated output.
----8<----
>From 02f844f0a623645134732aeb96f635558050d104 Mon Sep 17 00:00:00 2001
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Date: Sat, 27 Jul 2024 19:10:01 +0100
Subject: [PATCH] minmax: fixup call sites generating egregious macro
expansions
Adjust code that results in a combinatorial explosion of min()/max() macro
expansion, resulting in significant build performance degradation.
Simplify by using constructs that do not result in the preprocessor doing
this.
This change should have no functional impact.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
---
drivers/net/ethernet/marvell/mvpp2/mvpp2.h | 2 +-
.../staging/media/atomisp/pci/sh_css_frac.h | 26 ++++++++++++++-----
include/linux/skbuff.h | 6 ++++-
3 files changed, 25 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2.h b/drivers/net/ethernet/marvell/mvpp2/mvpp2.h
index e809f91c08fb..8b431f90efc3 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2.h
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2.h
@@ -23,7 +23,7 @@
/* The PacketOffset field is measured in units of 32 bytes and is 3 bits wide,
* so the maximum offset is 7 * 32 = 224
*/
-#define MVPP2_SKB_HEADROOM min(max(XDP_PACKET_HEADROOM, NET_SKB_PAD), 224)
+#define MVPP2_SKB_HEADROOM clamp_t(int, XDP_PACKET_HEADROOM, NET_SKB_PAD, 224)
#define MVPP2_XDP_PASS 0
#define MVPP2_XDP_DROPPED BIT(0)
diff --git a/drivers/staging/media/atomisp/pci/sh_css_frac.h b/drivers/staging/media/atomisp/pci/sh_css_frac.h
index b90b5b330dfa..ec6cc818f3c6 100644
--- a/drivers/staging/media/atomisp/pci/sh_css_frac.h
+++ b/drivers/staging/media/atomisp/pci/sh_css_frac.h
@@ -32,12 +32,24 @@
#define uISP_VAL_MAX ((unsigned int)((1 << uISP_REG_BIT) - 1))
/* a:fraction bits for 16bit precision, b:fraction bits for ISP precision */
-#define sDIGIT_FITTING(v, a, b) \
- min_t(int, max_t(int, (((v) >> sSHIFT) >> max(sFRACTION_BITS_FITTING(a) - (b), 0)), \
- sISP_VAL_MIN), sISP_VAL_MAX)
-#define uDIGIT_FITTING(v, a, b) \
- min((unsigned int)max((unsigned)(((v) >> uSHIFT) \
- >> max((int)(uFRACTION_BITS_FITTING(a) - (b)), 0)), \
- uISP_VAL_MIN), uISP_VAL_MAX)
+static inline int sDIGIT_FITTING(short v, int a, int b)
+{
+ int fit_shift = sFRACTION_BITS_FITTING(a) - b;
+
+ v >>= sSHIFT;
+ v >>= fit_shift > 0 ? fit_shift : 0;
+
+ return clamp_t(int, v, sISP_VAL_MIN, sISP_VAL_MAX);
+}
+
+static inline unsigned uDIGIT_FITTING(unsigned v, int a, int b)
+{
+ int fit_shift = uFRACTION_BITS_FITTING(a) - b;
+
+ v >>= uSHIFT;
+ v >>= fit_shift > 0 ? fit_shift : 0;
+
+ return clamp_t(unsigned, v, uISP_VAL_MIN, uISP_VAL_MAX);
+}
#endif /* __SH_CSS_FRAC_H */
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 29c3ea5b6e93..d53b296df504 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3164,7 +3164,11 @@ static inline int pskb_network_may_pull(struct sk_buff *skb, unsigned int len)
* NET_IP_ALIGN(2) + ethernet_header(14) + IP_header(20/40) + ports(8)
*/
#ifndef NET_SKB_PAD
-#define NET_SKB_PAD max(32, L1_CACHE_BYTES)
+#if L1_CACHE_BYTES < 32
+#define NET_SKB_PAD 32
+#else
+#define NET_SKB_PAD L1_CACHE_BYTES
+#endif
#endif
int ___pskb_trim(struct sk_buff *skb, unsigned int len);
--
2.45.2