[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <7dcc4db6-d5c8-c521-1d74-46871a332b55@csgroup.eu>
Date: Thu, 17 Feb 2022 10:13:50 +0000
From: Christophe Leroy <christophe.leroy@...roup.eu>
To: David Laight <David.Laight@...LAB.COM>,
Benjamin Herrenschmidt <benh@...nel.crashing.org>,
Paul Mackerras <paulus@...ba.org>,
Michael Ellerman <mpe@...erman.id.au>,
"David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linuxppc-dev@...ts.ozlabs.org" <linuxppc-dev@...ts.ozlabs.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: [PATCH 2/2] powerpc/32: Implement csum_sub
Le 13/02/2022 à 04:01, David Laight a écrit :
> From: Christophe Leroy
>> Sent: 11 February 2022 10:25
>>
>> When building kernel with CONFIG_CC_OPTIMISE_FOR_SIZE, several
>> copies of csum_sub() are generated, with the following code:
>>
>> 00000170 <csum_sub>:
>> 170: 7c 84 20 f8 not r4,r4
>> 174: 7c 63 20 14 addc r3,r3,r4
>> 178: 7c 63 01 94 addze r3,r3
>> 17c: 4e 80 00 20 blr
>>
>> Let's define a PPC32 version with subc/addme, and for it's inlining.
>>
>> It will return 0 instead of 0xffffffff when subtracting 0x80000000 to itself,
>> this is not an issue as 0 and ~0 are equivalent, refer to RFC 1624.
>
> They are not always equivalent.
> In particular in the UDP checksum field one of them is (0?) 'checksum not calculated'.
>
> I think all the Linux functions have to return a non-zero value (for non-zero input).
>
> If the csum is going to be converted to 16 bit, inverted, and put into a packet
> the code usually has to have a check that changes 0 to 0xffff.
> However if the csum functions guarantee never to return zero they can feed
> an extra 1 into the first csum_partial() then just invert and add 1 at the end.
> Because (~csum_partion(buffer, 1) + 1) is the same as ~csum_partial(buffer, 0)
> except when the buffer's csum is 0xffffffff.
>
> I did do some experiments and the 64bit value can be reduced directly to
> 16bits using '% 0xffff'.
> This is different because it returns 0 not 0xffff.
> However gcc 'randomly' picks between the fast 'multiply by reciprocal'
> and slow divide instruction paths.
> The former is (probably) faster than reducing using shifts and adc.
> The latter definitely slower.
>
Ok, I submitted a patch to force inlining of all checksum helpers in
net/checksum.h instead.
Christophe
Powered by blists - more mailing lists