[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=wgobnShg4c2yyMbk2p=U-wmnOmX_0=b3ZY_479Jjey2xw@mail.gmail.com>
Date: Sat, 6 Jan 2024 11:32:57 -0800
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Eric Dumazet <edumazet@...gle.com>
Cc: David Laight <David.Laight@...lab.com>, Noah Goldstein <goldstein.w.n@...il.com>,
kernel test robot <lkp@...el.com>, "x86@...nel.org" <x86@...nel.org>,
"oe-kbuild-all@...ts.linux.dev" <oe-kbuild-all@...ts.linux.dev>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "tglx@...utronix.de" <tglx@...utronix.de>,
"mingo@...hat.com" <mingo@...hat.com>, "bp@...en8.de" <bp@...en8.de>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>, "hpa@...or.com" <hpa@...or.com>
Subject: Re: x86/csum: Remove unnecessary odd handling
On Sat, 6 Jan 2024 at 02:26, Eric Dumazet <edumazet@...gle.com> wrote:
>
> On a related note, at least with clang, I found that csum_ipv6_magic()
> is needlessly using temporary on-stack storage,
> showing a stall on Cascade Lake unless I am patching add32_with_carry() :
This is a known compiler bug in clang:
https://github.com/llvm/llvm-project/issues/20571
https://github.com/llvm/llvm-project/issues/30873
https://github.com/llvm/llvm-project/issues/34837
and while we could always just use "r" for constraints, it will
(a) possibly run out of registers in some situations
(b) pessimize gcc that does this right
Can you please push the clang people to not do the stupid thing they
do now, which is to say "oh, I can use registers _or_ memory, so I'll
always use memory".
Btw, it's actually much worse than that, and clang is just doing
incredibly broken things. Switching to "r" just hides the garbage.
Because sometimes the source really *is* memory, ie we have
net/ipv4/udp.c:
csum = csum_add(csum, frags->csum);
and then it's criminally stupid to load it into a register when it can
be just used directly.
But clang says "criminally stupid? *I* will show you stupid!" -
because what *clang* does with the above is this thing of beauty:
movl 136(%rax), %edi
movl %edi, 28(%rsp)
addl 28(%rsp), %ecx
adcl $0, %ecx
which has turned from "criminally stupid" to "it's art, I tell you -
you're not supposed to understand it".
Anyway, this is *literally* a clang bug. Complain to clang people for
being *extra* stupid - we told the compiler that it can use a register
or memory, and clang decided "I'll use memory", so then when we gave
it a memory location, it said "no, not *that* memory - I'll just
reload it on stack".
In contrast, gcc gets this right - and for that udp.c case it just generates
addl 136(%rax),%ecx # frags_67->D.58941.D.58869.D.58836.csum, a
adcl $0,%ecx # a
like it should.
And for csum_ipv6_magic, gcc generates this:
addl %edx,%eax # tmp112, a
adcl $0,%eax # a
IOW, the kernel is *right*, and this is purely clang being completely bogus.
I really don't want to penalize a good compiler because a bad one
can't get its act together.
Linus
Powered by blists - more mailing lists