lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Sat, 6 Jan 2024 11:32:57 -0800
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Eric Dumazet <edumazet@...gle.com>
Cc: David Laight <David.Laight@...lab.com>, Noah Goldstein <goldstein.w.n@...il.com>, 
	kernel test robot <lkp@...el.com>, "x86@...nel.org" <x86@...nel.org>, 
	"oe-kbuild-all@...ts.linux.dev" <oe-kbuild-all@...ts.linux.dev>, 
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "tglx@...utronix.de" <tglx@...utronix.de>, 
	"mingo@...hat.com" <mingo@...hat.com>, "bp@...en8.de" <bp@...en8.de>, 
	"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>, "hpa@...or.com" <hpa@...or.com>
Subject: Re: x86/csum: Remove unnecessary odd handling

On Sat, 6 Jan 2024 at 02:26, Eric Dumazet <edumazet@...gle.com> wrote:
>
> On a related note, at least with clang, I found that csum_ipv6_magic()
> is needlessly using temporary on-stack storage,
> showing a stall on Cascade Lake unless I am patching add32_with_carry() :

This is a known compiler bug in clang:

    https://github.com/llvm/llvm-project/issues/20571
    https://github.com/llvm/llvm-project/issues/30873
    https://github.com/llvm/llvm-project/issues/34837

and while we could always just use "r" for constraints, it will

 (a) possibly run out of registers in some situations

 (b) pessimize gcc that does this right

Can you please push the clang people to not do the stupid thing they
do now, which is to say "oh, I can use registers _or_ memory, so I'll
always use memory".

Btw, it's actually much worse than that, and clang is just doing
incredibly broken things. Switching to "r" just hides the garbage.

Because sometimes the source really *is* memory, ie we have

    net/ipv4/udp.c:
                 csum = csum_add(csum, frags->csum);

and then it's criminally stupid to load it into a register when it can
be just used directly.

But clang says "criminally stupid? *I* will show you stupid!" -
because what *clang* does with the above is this thing of beauty:

        movl    136(%rax), %edi
        movl    %edi, 28(%rsp)
        addl    28(%rsp), %ecx
        adcl    $0, %ecx

which has turned from "criminally stupid" to "it's art, I tell you -
you're not supposed to understand it".

Anyway, this is *literally* a clang bug. Complain to clang people for
being *extra* stupid - we told the compiler that it can use a register
or memory, and clang decided "I'll use memory", so then when we gave
it a memory location, it said "no, not *that* memory - I'll just
reload it on stack".

In contrast, gcc gets this right - and for that udp.c case it just generates

        addl 136(%rax),%ecx     # frags_67->D.58941.D.58869.D.58836.csum, a
        adcl $0,%ecx    # a

like it should.

And for csum_ipv6_magic, gcc generates this:

        addl %edx,%eax  # tmp112, a
        adcl $0,%eax    # a

IOW, the kernel is *right*, and this is purely clang being completely bogus.

I really don't want to penalize a good compiler because a bad one
can't get its act together.

               Linus

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ