Message-ID: <e996ef13-c25c-5e9c-edd2-444eded88802@csgroup.eu>
Date:   Wed, 12 May 2021 14:56:56 +0200
From:   Christophe Leroy <christophe.leroy@...roup.eu>
To:     Segher Boessenkool <segher@...nel.crashing.org>
Cc:     Benjamin Herrenschmidt <benh@...nel.crashing.org>,
        Paul Mackerras <paulus@...ba.org>,
        Michael Ellerman <mpe@...erman.id.au>,
        linuxppc-dev@...ts.ozlabs.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] powerpc: Force inlining of csum_add()

Hi,

On 11/05/2021 at 12:51, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, May 11, 2021 at 06:08:06AM +0000, Christophe Leroy wrote:
>> Commit 328e7e487a46 ("powerpc: force inlining of csum_partial() to
>> avoid multiple csum_partial() with GCC10") inlined csum_partial().
>>
>> Now that csum_partial() is inlined, GCC outlines csum_add() when
>> called by csum_partial().
> 
>> c064fb28 <csum_add>:
>> c064fb28:	7c 63 20 14 	addc    r3,r3,r4
>> c064fb2c:	7c 63 01 94 	addze   r3,r3
>> c064fb30:	4e 80 00 20 	blr
> 
> Could you build this with -fdump-tree-einline-all and send me the
> results?  Or open a GCC PR yourself :-)

Ok, I'll forward it to you in a minute.

> 
> Something seems to have decided this asm is more expensive than it is.
> That isn't always avoidable -- the compiler cannot look inside asms --
> but it seems it could be improved here.
> 
> Do you have (or can make) a self-contained testcase?

I have not tried, and I fear it might be difficult, because on a kernel build with dozens of calls 
to csum_add(), only ip6_tunnel.o exhibits such an issue.

> 
>> The sum with 0 is useless, should have been skipped.
> 
> That isn't something the compiler can do anything about (not sure if you
> were suggesting that); it has to be done in the user code (and it tries
> to already, see below).

I was not suggesting that, only pointing out that when csum_add() is properly inlined the sum with 0 
is skipped (because we put the necessary checks in csum_add(), of course).

> 
>> And there is even one completely unused instance of csum_add().
> 
> That is strange, that should never happen.

It seems that several .o files include unused copies of csum_add(). After the final link, one 
remains in vmlinux (in addition to the used one).

> 
>> ./arch/powerpc/include/asm/checksum.h: In function '__ip6_tnl_rcv':
>> ./arch/powerpc/include/asm/checksum.h:94:22: warning: inlining failed in call to 'csum_add': call is unlikely and code size would grow [-Winline]
>>     94 | static inline __wsum csum_add(__wsum csum, __wsum addend)
>>        |                      ^~~~~~~~
>> ./arch/powerpc/include/asm/checksum.h:172:31: note: called from here
>>    172 |                         sum = csum_add(sum, (__force __wsum)*(const u32 *)buff);
>>        |                               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> At least we say what happened.  Progress!  :-)

Lol. I've been seeing this warning for a long time, so I guess it's nothing new.

> 
>> In the non-inlined version, the first sum with 0 was performed.
>> Here it is skipped.
> 
> That is because of how __builtin_constant_p works, most likely.  As we
> discussed elsewhere it is evaluated before all forms of loop unrolling.

But we are not talking about loop unrolling here, are we?

It seems that the reason here is that __builtin_constant_p() is evaluated long after GCC has decided 
not to inline that call to csum_add().
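
For illustration, here is a minimal portable sketch of the pattern being discussed. It is not the 
code from arch/powerpc/include/asm/checksum.h: the names csum_add_sketch, wsum_sketch_t and 
sketch_always_inline are made up for this example, and the inline asm (addc/addze) is replaced by 
plain 64-bit arithmetic so it builds anywhere with GCC or Clang. The idea is that the constant-zero 
shortcut can only fire when the function body is inlined into a call site where the argument is a 
visible constant, which is why forcing the inlining matters.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's __wsum type. */
typedef uint32_t wsum_sketch_t;

/*
 * GCC/Clang attribute (assumed toolchain). Forcing the inlining lets
 * __builtin_constant_p() see constant arguments at each call site.
 */
#define sketch_always_inline static inline __attribute__((always_inline))

sketch_always_inline wsum_sketch_t
csum_add_sketch(wsum_sketch_t csum, wsum_sketch_t addend)
{
	uint64_t res;

	/* Adding a compile-time constant 0 is a no-op: skip the sum. */
	if (__builtin_constant_p(csum) && csum == 0)
		return addend;
	if (__builtin_constant_p(addend) && addend == 0)
		return csum;

	/*
	 * 32-bit ones' complement addition: add, then fold the carry
	 * back into the low word (portable equivalent of addc + addze).
	 */
	res = (uint64_t)csum + addend;
	return (wsum_sketch_t)res + (wsum_sketch_t)(res >> 32);
}

/* Small self-check of the arithmetic, including the carry fold. */
void csum_add_sketch_selfcheck(void)
{
	assert(csum_add_sketch(0, 5) == 5);
	assert(csum_add_sketch(7, 0) == 7);
	assert(csum_add_sketch(1, 2) == 3);
	assert(csum_add_sketch(0xffffffffu, 1) == 1);
	assert(csum_add_sketch(0xffffffffu, 0xffffffffu) == 0xffffffffu);
}
```

With a plain "static inline" instead of the forced attribute, the compiler is free to emit an 
out-of-line copy, and in that copy neither argument is a known constant, so both shortcuts are 
compiled out and the addition is always performed.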

Christophe
