Message-ID: <aea24370-2526-7f43-ca5f-55d1d8b8c4bb@loongson.cn>
Date: Thu, 9 Feb 2023 09:16:09 +0800
From: maobibo <maobibo@...ngson.cn>
To: David Laight <David.Laight@...LAB.COM>,
'WANG Xuerui' <kernel@...0n.name>,
Huacai Chen <chenhuacai@...nel.org>
Cc: Jiaxun Yang <jiaxun.yang@...goat.com>,
"loongarch@...ts.linux.dev" <loongarch@...ts.linux.dev>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] LoongArch: add checksum optimization for 64-bit system
On 2023/2/8 22:19, David Laight wrote:
> From: WANG Xuerui
>> Sent: 08 February 2023 13:48
> ...
>> Yeah LoongArch can do rotates, and your suggestion can indeed reduce one
>> insn from every invocation of csum_fold.
>>
>> From this:
>>
>> 000000000000096c <csum_fold>:
>> sum += (sum >> 16) | (sum << 16);
>> 96c: 004cc08c rotri.w $t0, $a0, 0x10
>> 970: 00101184 add.w $a0, $t0, $a0
>> return ~(__force __sum16)(sum >> 16);
>> 974: 0044c084 srli.w $a0, $a0, 0x10
>> 978: 00141004 nor $a0, $zero, $a0
>> }
>> 97c: 006f8084 bstrpick.w $a0, $a0, 0xf, 0x0
>> 980: 4c000020 jirl $zero, $ra, 0
>>
>> To:
>>
>> 0000000000000984 <csum_fold2>:
>> return (~sum - rol32(sum, 16)) >> 16;
>> 984: 0014100c nor $t0, $zero, $a0
>> return (x << amt) | (x >> (32 - amt));
>> 988: 004cc084 rotri.w $a0, $a0, 0x10
>> return (~sum - rol32(sum, 16)) >> 16;
>> 98c: 00111184 sub.w $a0, $t0, $a0
>> }
>> 990: 00df4084 bstrpick.d $a0, $a0, 0x1f, 0x10
>> 994: 4c000020 jirl $zero, $ra, 0
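The following minimal user-space sketch compares the two C forms quoted above and checks that the rewrite gives exactly the same result. It makes a few assumptions. uint32_t and uint16_t stand in for the kernel's __wsum and __sum16. The names csum_fold_v1, csum_fold_v2, rol32_sketch and csum_fold_check are hypothetical: rol32_sketch mirrors the kernel's rol32(), and csum_fold_check is an illustrative test helper. The two forms agree because ~sum - r equals ~(sum + r) modulo 2^32, so the high 16 bits of the second form are the complement of the high 16 bits of sum + rol32(sum, 16).

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value left by 'shift' bits (same idea as the kernel's rol32()). */
static inline uint32_t rol32_sketch(uint32_t word, unsigned int shift)
{
	return (word << (shift & 31)) | (word >> ((-shift) & 31));
}

/* Original form: add the half-swapped value, keep the high 16 bits, invert. */
static inline uint16_t csum_fold_v1(uint32_t sum)
{
	sum += (sum >> 16) | (sum << 16);
	return (uint16_t)~(sum >> 16);
}

/*
 * Suggested form: ~sum and rol32(sum, 16) depend only on 'sum', not on each
 * other, so a CPU may compute them in parallel before the subtraction.
 * Since ~sum - r == ~(sum + r) (mod 2^32), the result matches csum_fold_v1().
 */
static inline uint16_t csum_fold_v2(uint32_t sum)
{
	return (uint16_t)((~sum - rol32_sketch(sum, 16)) >> 16);
}

/* Assert that both forms agree on edge cases and on pseudo-random inputs. */
void csum_fold_check(void)
{
	static const uint32_t edge[] = {
		0x00000000u, 0xffffffffu, 0x0000ffffu, 0xffff0000u,
		0x0001ffffu, 0xffff0001u, 0x80008000u,
	};
	for (unsigned int i = 0; i < sizeof(edge) / sizeof(edge[0]); i++)
		assert(csum_fold_v1(edge[i]) == csum_fold_v2(edge[i]));

	uint32_t x = 12345u;
	for (unsigned long i = 0; i < (1ul << 20); i++) {
		x = x * 1664525u + 1013904223u; /* 32-bit LCG step */
		assert(csum_fold_v1(x) == csum_fold_v2(x));
	}
}
```

For example, both forms fold 0x00010002 to 0xfffc. The halves 0x0001 and 0x0002 sum to 0x0003, which inverts to 0xfffc. Calling csum_fold_check() from a test main() exercises both forms on about a million inputs.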
>
> It is actually slightly better than that.
> In the csum_fold2 version the first two instructions
> are independent, so they can execute in parallel on some CPUs.
>
> David
>
Thanks for the good suggestion.
I will send the second version soon.
regards
bibo,mao