Message-ID: <7636a7bebfd44e378c5b16d6fd355232@AcuMS.aculab.com>
Date: Wed, 8 Feb 2023 14:19:27 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'WANG Xuerui' <kernel@...0n.name>,
'Bibo Mao' <maobibo@...ngson.cn>,
Huacai Chen <chenhuacai@...nel.org>
CC: Jiaxun Yang <jiaxun.yang@...goat.com>,
"loongarch@...ts.linux.dev" <loongarch@...ts.linux.dev>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH] LoongArch: add checksum optimization for 64-bit system
From: WANG Xuerui
> Sent: 08 February 2023 13:48
...
> Yeah LoongArch can do rotates, and your suggestion can indeed reduce one
> insn from every invocation of csum_fold.
>
> From this:
>
> 000000000000096c <csum_fold>:
> sum += (sum >> 16) | (sum << 16);
> 96c: 004cc08c rotri.w $t0, $a0, 0x10
> 970: 00101184 add.w $a0, $t0, $a0
> return ~(__force __sum16)(sum >> 16);
> 974: 0044c084 srli.w $a0, $a0, 0x10
> 978: 00141004 nor $a0, $zero, $a0
> }
> 97c: 006f8084 bstrpick.w $a0, $a0, 0xf, 0x0
> 980: 4c000020 jirl $zero, $ra, 0
>
> To:
>
> 0000000000000984 <csum_fold2>:
> return (~sum - rol32(sum, 16)) >> 16;
> 984: 0014100c nor $t0, $zero, $a0
> return (x << amt) | (x >> (32 - amt));
> 988: 004cc084 rotri.w $a0, $a0, 0x10
> return (~sum - rol32(sum, 16)) >> 16;
> 98c: 00111184 sub.w $a0, $t0, $a0
> }
> 990: 00df4084 bstrpick.d $a0, $a0, 0x1f, 0x10
> 994: 4c000020 jirl $zero, $ra, 0
It is actually slightly better than that.
In the csum_fold2 version the first two instructions (the nor and
the rotri.w) are independent of each other, so they can execute in
parallel on some CPUs.
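
For reference, here is a minimal user-space sketch of the two folds
discussed above, handy for checking that they agree. It is not the
kernel source: uint32_t and uint16_t stand in for __wsum and __sum16,
and csum_fold_mismatches() is a hypothetical helper added only for
this comparison. The two forms agree because ~(a + b) == ~a - b in
32-bit unsigned arithmetic.

```c
#include <stdint.h>

/* Rotate left by amt bits; only valid for 0 < amt < 32. */
static inline uint32_t rol32(uint32_t x, unsigned int amt)
{
	return (x << amt) | (x >> (32 - amt));
}

/* First form: add the halves via a rotate, then shift, then invert. */
uint16_t csum_fold(uint32_t sum)
{
	sum += rol32(sum, 16);
	return (uint16_t)~(sum >> 16);
}

/*
 * Second form: the invert and the rotate both read only the input,
 * so neither has to wait for the other.
 */
uint16_t csum_fold2(uint32_t sum)
{
	return (uint16_t)((~sum - rol32(sum, 16)) >> 16);
}

/* Hypothetical helper: count inputs in [start, start + count) that differ. */
uint32_t csum_fold_mismatches(uint32_t start, uint32_t count)
{
	uint32_t bad = 0;
	uint32_t i;

	for (i = 0; i < count; i++) {
		uint32_t sum = start + i;	/* wraps modulo 2^32 */

		if (csum_fold(sum) != csum_fold2(sum))
			bad++;
	}
	return bad;
}
```

Calling csum_fold_mismatches() over a range near a carry boundary,
such as starting at 0xffff0000, is a quick way to exercise the cases
where the 16-bit add wraps.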
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)