Message-ID: <ddd07dbe318d451db6897b277e37410f@AcuMS.aculab.com>
Date: Wed, 10 Jan 2024 09:03:30 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Stephen Rothwell' <sfr@...b.auug.org.au>, Linus Torvalds
<torvalds@...ux-foundation.org>
CC: Jiri Slaby <jirislaby@...il.com>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, Andy Shevchenko
<andriy.shevchenko@...ux.intel.com>, Andrew Morton
<akpm@...ux-foundation.org>, "Matthew Wilcox (Oracle)" <willy@...radead.org>,
Christoph Hellwig <hch@...radead.org>, "Jason A. Donenfeld" <Jason@...c4.com>
Subject: RE: [PATCH next v4 0/5] minmax: Relax type checks in min() and max().

From: Stephen Rothwell
> Sent: 10 January 2024 06:18
>
> Hi Linus,
>
> On Mon, 8 Jan 2024 13:11:12 -0800 Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> >
> > Whee.
>
> Yeah.
>
> > On my machine, that patch makes an "allmodconfig" build go from
> >
> > 10:41 elapsed
> >
> > to
> >
> > 8:46 elapsed
> >
> > so that min/max type checking is almost 20% of the build time.
> >
> > Yeah, I think we need to get rid of it.
> >
> > Can somebody else confirm similar time differences? Or is it just me?
>
> I was hopeful, but:
>
> no patch:
>
> $ /usr/bin/time make ARCH=x86_64 CROSS_COMPILE=x86_64-linux-gnu- -j140 -O -s
> 102460.07user 3710.56system 13:29.05elapsed 13122%CPU (0avgtext+0avgdata 4023168maxresident)k
> 304inputs+7917056outputs (1998673major+120730959minor)pagefaults 0swaps
>
> with patch:
>
> $ /usr/bin/time make ARCH=x86_64 CROSS_COMPILE=x86_64-linux-gnu- -j140 -O -s
> 99775.75user 3684.45system 13:12.89elapsed 13048%CPU (0avgtext+0avgdata 2217536maxresident)k
> 64inputs+7890304outputs (2104371major+119837267minor)pagefaults 0swaps

That looks like about 2700 in 102000 (user), or about 2.6%.
I did some rebuilds changing only minmax.h and got just over 1%
from changing __types_ok() to be 1.
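
For reference, the percentages can be worked through from the quoted
figures. The sketch below is only arithmetic on the numbers in the
/usr/bin/time output and on Linus's two elapsed times; the helper names
are made up for illustration.

```c
#include <assert.h>

/* Percentage of `before` saved by going to `after`. */
static double percent_saved(double before, double after)
{
	assert(before > 0.0);
	return 100.0 * (before - after) / before;
}

/* Convert a minutes:seconds reading into seconds. */
static double to_seconds(int minutes, double seconds)
{
	return 60.0 * minutes + seconds;
}

/* User CPU seconds from the two quoted runs: about 2.6% saved. */
static double user_saving(void)
{
	return percent_saved(102460.07, 99775.75);
}

/* Elapsed time from the same runs (13:29.05 vs 13:12.89): about 2.0%. */
static double elapsed_saving(void)
{
	return percent_saved(to_seconds(13, 29.05), to_seconds(13, 12.89));
}

/* Linus's allmodconfig build (10:41 vs 8:46 elapsed): about 17.9%. */
static double linus_elapsed_saving(void)
{
	return percent_saved(to_seconds(10, 41.0), to_seconds(8, 46.0));
}
```

So the -j140 build sees roughly 2.6% less user time and 2% less elapsed
time, against the "almost 20%" elapsed difference reported earlier in
the thread.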

I did try a few other things and got some marginal improvements.
But I'm not trying to compile code with 4 nested calls.

One of the things that does explode it somewhat is the
'return constant for constant' path needed to avoid VLAs.
That generates two copies of the expansion.
A separate define for that would help a bit.
It doesn't matter much until you get nested min/max; those will hurt.
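
A rough sketch of why the constant-for-constant path hurts when calls
are nested. The SKETCH_* macros are hypothetical simplifications, not
the definitions in minmax.h; only the constant test is modelled on the
kernel's __is_constexpr(). It assumes GCC or Clang (statement
expressions, __builtin_choose_expr). Both arms of the choice are
expanded by the preprocessor, so each argument is pasted five times and
nesting multiplies that.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified macros for illustration; not the kernel's minmax.h. */

/* Integer-constant-expression test, modelled on the kernel's __is_constexpr(). */
#define SKETCH_IS_CONSTEXPR(x) \
	(sizeof(int) == sizeof(*(8 ? ((void *)((long)(x) * 0l)) : (int *)8)))

/* Plain comparison: stays a constant expression, mentions each argument twice. */
#define SKETCH_CMP_MIN(x, y) ((x) < (y) ? (x) : (y))

/* Evaluate-once form (statement expression); never a constant expression. */
#define SKETCH_MIN_ONCE(x, y) ({			\
	__typeof__(x) _sx = (x);			\
	__typeof__(y) _sy = (y);			\
	_sx < _sy ? _sx : _sy; })

/* Both arms are expanded as text even though only one is compiled. */
#define SKETCH_MIN(x, y)					\
	__builtin_choose_expr(SKETCH_IS_CONSTEXPR((x) - (y)),	\
			      SKETCH_CMP_MIN(x, y),		\
			      SKETCH_MIN_ONCE(x, y))

/* Stringify after expansion, to measure the size of the expanded text. */
#define SKETCH_STR(...) #__VA_ARGS__
#define SKETCH_XSTR(...) SKETCH_STR(__VA_ARGS__)

/* Runtime arguments take the evaluate-once arm. */
static int sketch_min_runtime(int a, int b)
{
	return SKETCH_MIN(a, b);
}

/* Constant arguments take the plain arm, so this is a fixed-size array. */
static size_t sketch_fixed_len(void)
{
	int buf[SKETCH_MIN(4, 8)];

	assert(sizeof(buf) == 4 * sizeof(int));
	return sizeof(buf) / sizeof(buf[0]);
}

/* Characters of expanded text for a single call and for one level of nesting. */
static size_t sketch_single_len(void)
{
	return strlen(SKETCH_XSTR(SKETCH_MIN(a, b)));
}

static size_t sketch_nested_len(void)
{
	return strlen(SKETCH_XSTR(SKETCH_MIN(a, SKETCH_MIN(b, c))));
}
```

With these definitions one level of nesting already gives several times
the text of a single call, and every further level multiplies it again.
A separate define for the constant case would take SKETCH_CMP_MIN() out
of the general path.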

The other slight annoyance is an extra __builtin_choose_expr()
needed for pointer types, because (void *)1 isn't constant.

min3() was mentioned, but that seems to be a nested expansion.
It would need to be more like clamp() to get any benefit
(and maybe drop the const-for-const option).
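
To illustrate the min3() point, here is a sketch comparing a nested
three-way minimum with a flat one that takes three temporaries in a
single statement expression, which is the general shape of clamp(). All
SKETCH_* names are hypothetical and the type checking is left out
entirely; this is not the kernel's min3() or clamp().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical macros for illustration; not the kernel's min3() or clamp(). */

/* Evaluate-once two-way minimum (GCC/Clang statement expression). */
#define SKETCH_MIN_ONCE(x, y) ({			\
	__typeof__(x) _mx = (x);			\
	__typeof__(y) _my = (y);			\
	_mx < _my ? _mx : _my; })

/* Nested form: the inner expansion is pasted into the outer one twice. */
#define SKETCH_MIN3_NESTED(x, y, z) \
	SKETCH_MIN_ONCE(SKETCH_MIN_ONCE(x, y), z)

/* Flat form: three temporaries, one statement expression. */
#define SKETCH_MIN3_FLAT(x, y, z) ({			\
	__typeof__(x) _fa = (x);			\
	__typeof__(y) _fb = (y);			\
	__typeof__(z) _fc = (z);			\
	_fa < _fb ? (_fa < _fc ? _fa : _fc)		\
		  : (_fb < _fc ? _fb : _fc); })

/* Stringify after expansion, to compare the size of the expanded text. */
#define SKETCH_STR(...) #__VA_ARGS__
#define SKETCH_XSTR(...) SKETCH_STR(__VA_ARGS__)

/* Counts evaluations so single evaluation of each argument can be checked. */
static int sketch_calls;

static int sketch_counted(int v)
{
	sketch_calls++;
	return v;
}

static int sketch_nested_min3(int a, int b, int c)
{
	return SKETCH_MIN3_NESTED(sketch_counted(a), sketch_counted(b),
				  sketch_counted(c));
}

static int sketch_flat_min3(int a, int b, int c)
{
	return SKETCH_MIN3_FLAT(sketch_counted(a), sketch_counted(b),
				sketch_counted(c));
}

static size_t sketch_nested_len(void)
{
	return strlen(SKETCH_XSTR(SKETCH_MIN3_NESTED(a, b, c)));
}

static size_t sketch_flat_len(void)
{
	return strlen(SKETCH_XSTR(SKETCH_MIN3_FLAT(a, b, c)));
}
```

Both forms evaluate each argument once. The nested one mentions x and y
four times each in the expanded text, the flat one only twice, and the
gap grows with the size of the arguments.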
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)