Message-ID: <e6bb59c32134477aa4890047ae5ad51b@AcuMS.aculab.com>
Date: Thu, 9 Feb 2023 09:35:29 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Bibo Mao' <maobibo@...ngson.cn>,
Huacai Chen <chenhuacai@...nel.org>,
WANG Xuerui <kernel@...0n.name>
CC: Jiaxun Yang <jiaxun.yang@...goat.com>,
"loongarch@...ts.linux.dev" <loongarch@...ts.linux.dev>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH v2] LoongArch: add checksum optimization for 64-bit system
From: Bibo Mao
> Sent: 09 February 2023 03:59
>
> loongArch platform is 64-bit system, which supports 8 bytes memory
> accessing, generic checksum function uses 4 byte memory access.
> This patch adds 8-bytes memory access optimization for checksum
> function on loongArch. And the code comes from arm64 system.
How fast do these functions actually run (in bytes/clock)?
It is quite possible that just adding 32bit values to a
64bit register is faster.
Any non-trivial cpu will run that at 4 bytes/clock
(for suitably unrolled and pipelined code).
On a more complex cpu adding to two registers will
give 8 bytes/clock (needs two memory loads/clock).
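As a rough sketch of what I mean by adding 32bit values to 64bit
registers (the names sum32x2 and fold64 are made up for this example,
the length is assumed to be a multiple of 8 bytes, and the result is
in native byte order):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fold a 64-bit accumulator down to a 16-bit
 * ones' complement sum using end-around carries. */
static uint16_t fold64(uint64_t sum)
{
	sum = (sum & 0xffffffffu) + (sum >> 32);
	sum = (sum & 0xffffffffu) + (sum >> 32);
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Hypothetical helper: add 32-bit words into two independent 64-bit
 * accumulators. The upper halves absorb the carries, so no carry
 * handling is needed inside the loop (safe for any realistic packet
 * length). Assumes len is a multiple of 8. */
static uint64_t sum32x2(const unsigned char *p, size_t len)
{
	uint64_t sum_a = 0, sum_b = 0;

	for (size_t i = 0; i + 8 <= len; i += 8) {
		uint32_t a, b;

		memcpy(&a, p + i, 4);		/* 32-bit read */
		memcpy(&b, p + i + 4, 4);	/* 32-bit read */
		sum_a += a;
		sum_b += b;
	}
	return sum_a + sum_b;
}
```

The two accumulators have no dependency on each other, which is what
lets a cpu that can do two loads/clock retire two of these adds per clock.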
The fastest 64bit sum you'll get on anything mips-like
(no carry flag) is probably from something like:
	val = *mem++;	// 64bit read
	sum += val;
	carry = sum < val;
	carry_sum += carry;
which is 2 bytes/instruction again.
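Written out as compilable C, the loop above might look something like
this (sum64_carry is a made-up name, the length is assumed to be a
multiple of 8 bytes, and the final fold of the carry count is my own
addition to make the example complete):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: 64-bit ones' complement sum without a carry
 * flag. Carries out of 'sum' are counted separately and added back
 * at the end. Assumes len is a multiple of 8. */
static uint64_t sum64_carry(const unsigned char *p, size_t len)
{
	uint64_t sum = 0, carry_sum = 0;

	for (size_t i = 0; i + 8 <= len; i += 8) {
		uint64_t val;

		memcpy(&val, p + i, 8);		/* 64-bit read */
		sum += val;
		carry_sum += sum < val;		/* 1 if the add wrapped */
	}

	/* End-around carry: add the saved carries back in, including
	 * the one this last add may itself generate. */
	sum += carry_sum;
	sum += sum < carry_sum;
	return sum;
}
```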
To get to 8 bytes/clock you need to execute all 4 instructions
every clock - so 1 read and 3 arithmetic.
(cf. 2 reads and 2 arithmetic for 32bit adds.)
Arm has a carry flag so the code is:
	val = *mem++;
	temp,carry = sum + val;
	sum = sum + val + carry;
There are still two dependent arithmetic instructions for
each 8-byte word.
The dependencies on the flags register also make it harder
to get any benefit from interleaving adds to two registers.
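For illustration only, the carry-flag form can be approximated in C
with the GCC/Clang __builtin_add_overflow builtin (an assumption about
the compiler, and sum64_flag is a made-up name). Whether the compiler
actually emits an add/add-with-carry pair is up to the compiler:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: 64-bit ones' complement sum where each word
 * costs an add that produces a carry, followed by a dependent add
 * that consumes it. Assumes GCC or Clang and len a multiple of 8. */
static uint64_t sum64_flag(const unsigned char *p, size_t len)
{
	uint64_t sum = 0;

	for (size_t i = 0; i + 8 <= len; i += 8) {
		uint64_t val;
		unsigned int carry;

		memcpy(&val, p + i, 8);
		carry = __builtin_add_overflow(sum, val, &sum);
		/* Cannot wrap: after a carry out, sum <= 2^64 - 2. */
		sum += carry;
	}
	return sum;
}
```

Every iteration depends on the sum from the previous one, which is
the dependency chain described above.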
x86-64 uses 64bit 'add with carry' chains.
No one ever noticed that each 'add with carry' takes two clocks
on Intel cpus up to (about) Haswell.
It is possible to get 12 bytes/clock with some strange
loops that use (IIRC) adox and adcx.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)