[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <26a7af6f33fa440f986adb4d690f47dc@AcuMS.aculab.com>
Date: Thu, 1 Feb 2024 23:04:48 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Nick Kossifidis' <mick@....forth.gr>, Jisheng Zhang <jszhang@...nel.org>,
Paul Walmsley <paul.walmsley@...ive.com>, Palmer Dabbelt
<palmer@...belt.com>, Albert Ou <aou@...s.berkeley.edu>
CC: "linux-riscv@...ts.infradead.org" <linux-riscv@...ts.infradead.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, Matteo Croce
<mcroce@...rosoft.com>
Subject: RE: [PATCH 3/3] riscv: optimized memset
...
> > + /* Compose an ulong with 'c' repeated 4/8 times */
> > +#ifdef CONFIG_ARCH_HAS_FAST_MULTIPLIER
> > + cu *= 0x0101010101010101UL;
That it likely to generate a compile error on 32bit.
Maybe:
cu *= (unsigned long)0x0101010101010101ULL;
> > +#else
> > + cu |= cu << 8;
> > + cu |= cu << 16;
> > + /* Suppress warning on 32 bit machines */
> > + cu |= (cu << 16) << 16;
> > +#endif
>
> I guess you could check against __SIZEOF_LONG__ here.
Or even sizeof (cu), possible as:
cu |= cu << (sizeof (cu) == 8 ? 32 : 0);
which I'm pretty sure modern compiler will throw away for 32bit.
I do wonder whether CONFIG_ARCH_HAS_FAST_MULTIPLIER is worth
testing - you'd really want to know there is a risc-v cpu
with a multiply that is slower than the shift and or version.
I actually doubt it.
Multiply is used so often (all array indexing) that you
really do need something better than a '1 bit per clock' loop.
It is worth remembering that you can implement an n*n multiply
with n*n 'full adders' (3 input bits, 2 output bits) with a
latency of 2*n adders.
So the latency is only twice that of the corresponding add.
For a modern chip that is not much logic at all.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Powered by blists - more mailing lists