Message-ID: <CAK8P3a0_ovcX9tOo1UQ3_1UmM=+A2X=yErw27i2pHOj4XD40-A@mail.gmail.com>
Date: Tue, 30 Jul 2019 20:24:14 +0200
From: Arnd Bergmann <arnd@...db.de>
To: Segher Boessenkool <segher@...nel.crashing.org>
Cc: Nathan Chancellor <natechancellor@...il.com>,
Nick Desaulniers <ndesaulniers@...gle.com>,
Michael Ellerman <mpe@...erman.id.au>,
christophe leroy <christophe.leroy@....fr>,
kbuild test robot <lkp@...el.com>,
Benjamin Herrenschmidt <benh@...nel.crashing.org>,
Paul Mackerras <paulus@...ba.org>,
linuxppc-dev <linuxppc-dev@...ts.ozlabs.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
clang-built-linux <clang-built-linux@...glegroups.com>
Subject: Re: [PATCH] powerpc: workaround clang codegen bug in dcbz
On Tue, Jul 30, 2019 at 6:16 PM Segher Boessenkool
<segher@...nel.crashing.org> wrote:
>
> On Tue, Jul 30, 2019 at 04:30:29PM +0200, Arnd Bergmann wrote:
> > On Tue, Jul 30, 2019 at 3:49 PM Segher Boessenkool
> > <segher@...nel.crashing.org> wrote:
> > >
> > > On Tue, Jul 30, 2019 at 09:34:28AM +0200, Arnd Bergmann wrote:
> > > > Upon a second look, I think the issue is that the "Z" is an input argument
> > > > when it should be an output. clang decides that it can make a copy of the
> > > > input and pass that into the inline asm. This is not the most efficient
> > > > way, but it seems entirely correct according to the constraints.
> > >
> > > Most dcb* (and all icb*) do not change the memory pointed to. The
> > > memory is an input here, logically as well, and that is obvious.
> >
> > Ah, right. I had only thought of dcbz here, but you are right that using
> > an output makes little sense for the others.
> >
> > readl() is another example where powerpc currently uses "Z" for an
> > input, which illustrates this even better.
>
> in_le32 and friends? Yeah, huh. If LLVM copies that to the stack as
> well, its (not byte reversing) read will be atomic just fine, so things
> will still work correctly.
Byteorder is fine; the problem I was thinking of is the compiler moving the
load/store instructions around the barriers that synchronize with DMA, or
turning them into different-size accesses. Changing two consecutive 16-bit
MMIO reads into an unaligned 32-bit read will rarely have the intended effect ;-)
Arnd
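
The access-size concern above can be illustrated with a minimal C sketch. It
is not code from the patch or from the kernel: mmio_read16() and
read_reg_pair() are hypothetical names, and the "device" is assumed to be an
ordinary buffer so the example can run in user space. Each 16-bit read goes
through a volatile pointer, so the compiler has to emit it as its own access
rather than fusing the pair into one 32-bit load. Real accessors such as
readw() additionally order the access against DMA with barriers, which this
sketch leaves out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not the kernel's readw()): a single 16-bit load
 * through a volatile pointer. The compiler may not merge it with a
 * neighbouring access, split it, or drop it. */
static inline uint16_t mmio_read16(const volatile void *addr)
{
	/* 16-bit registers are assumed to be 2-byte aligned. */
	assert(((uintptr_t)addr & 1) == 0);
	return *(const volatile uint16_t *)addr;
}

/* Hypothetical helper: read two adjacent 16-bit registers starting at a
 * byte offset that is 2-byte aligned but not necessarily 4-byte aligned.
 * The two reads stay separate 16-bit accesses; only the results are
 * combined, with the first register in the low half. */
static uint32_t read_reg_pair(const volatile void *base, unsigned int offset)
{
	const volatile uint8_t *p = (const volatile uint8_t *)base + offset;
	uint16_t lo = mmio_read16(p);
	uint16_t hi = mmio_read16(p + 2);

	return ((uint32_t)hi << 16) | lo;
}
```

Calling read_reg_pair(regs, 2) on a uint16_t array reads elements 1 and 2 as
two 16-bit accesses. If the loads were instead done through a plain memory
operand that the compiler is free to copy or combine, a single unaligned
32-bit access at byte offset 2 would be a legal transformation, which is the
case described above.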