Message-ID: <CAHk-=whbj+pVGhJTcQCLhY8KZJNomWOKM=s-GZSpK_G=G4fXEA@mail.gmail.com>
Date: Fri, 15 Sep 2023 19:01:14 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Matthew Wilcox <willy@...radead.org>
Cc: linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-arch@...r.kernel.org, Nicholas Piggin <npiggin@...il.com>
Subject: Re: [PATCH 08/17] alpha: Implement xor_unlock_is_negative_byte
On Fri, 15 Sept 2023 at 17:38, Matthew Wilcox <willy@...radead.org> wrote:
>
> On Fri, Sep 15, 2023 at 05:27:17PM -0700, Linus Torvalds wrote:
> > On Fri, 15 Sept 2023 at 11:37, Matthew Wilcox (Oracle)
> > <willy@...radead.org> wrote:
> > >
> > > + "1: ldl_l %0,%4\n"
> > > + " xor %0,%3,%0\n"
> > > + " xor %0,%3,%2\n"
> > > + " stl_c %0,%1\n"
> >
> > What an odd thing to do.
> >
> > Why don't you just save the old value? That double xor looks all kinds
> > of strange, and is a data dependency for no good reason that I can
> > see.
> >
> > Why isn't this "ldl_l + mov %0,%2 + xor + stl_c" instead?
> >
> > Not that I think alpha matters, but since I was looking through the
> > series, this just made me go "Whaa?"
>
> Well, this is my first time writing Alpha assembler ;-) I stole this
> from ATOMIC_OP_RETURN:
>
> "1: ldl_l %0,%1\n" \
> " " #asm_op " %0,%3,%2\n" \
> " " #asm_op " %0,%3,%0\n" \

Note how that does "orig" assignment first (ie the "%2" destination is
the first instruction), unlike your version.

So in that ATOMIC_OP_RETURN, it does indeed do the same ALU op twice,
but there's no data dependency between the two, so they can execute in
parallel.

> but yes, mov would do the trick here. Is it really faster than xor?

No, I think "mov src,dst" is just a pseudo-op for "or src,src,dst",
there's no actual "mov" instruction, iirc.

So it's an ALU op too.

What makes your version expensive is the data dependency, not the ALU op.

So the *odd* thing is not that you have two xor's per se, but how you
create the original value by xor'ing the value once, and then xoring
the new value with the same mask, giving you the original value back -
but with that odd data dependency so that it won't schedule in the
same cycle.
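
(Illustrative sketch, not part of the patch: the two orderings modelled in
plain C. The struct and function names are made up for this example, and
the functions cover only the ALU steps between ldl_l and stl_c, not the
load-locked/store-conditional pair itself.)

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: the two values the asm sequence leaves in %0 and %2. */
struct xor_result {
    uint32_t new_val;   /* value that would be stored back (%0) */
    uint32_t old_val;   /* value kept for the return (%2) */
};

/* Ordering from the patch: the second xor consumes the first one's result. */
static struct xor_result chained_xor(uint32_t loaded, uint32_t mask)
{
    struct xor_result r;
    r.new_val = loaded ^ mask;      /* xor %0,%3,%0 */
    r.old_val = r.new_val ^ mask;   /* xor %0,%3,%2: waits on the line above */
    return r;
}

/* Suggested ordering: both steps read only the loaded value. */
static struct xor_result copy_then_xor(uint32_t loaded, uint32_t mask)
{
    struct xor_result r;
    r.old_val = loaded;             /* mov %0,%2 */
    r.new_val = loaded ^ mask;      /* xor %0,%3,%0 */
    return r;
}

/* Both orderings compute the same pair; only the dependency chain differs. */
static void check_orderings(void)
{
    struct xor_result a = chained_xor(0x81u, 0x01u);
    struct xor_result b = copy_then_xor(0x81u, 0x01u);
    assert(a.new_val == b.new_val);
    assert(a.old_val == b.old_val);
}
```

In the first function the second statement cannot start until the first
has produced its result; in the second function neither statement reads
the other's output.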

Does any of this matter? Nope. It's alpha. There's probably a handful
of machines, and it's maybe one extra cycle. It's really the oddity
that threw me.

In ATOMIC_OP_RETURN, the reason it does that op twice is simply that
it wants to return the new value. But you literally made it return the
*old* value by doing an xor twice in succession, which reverses the
bits twice.

Was that really what you intended?

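
(Illustrative sketch, not the kernel implementation: an xor sequence that
returns the *old* value, written as a portable C11 compare-and-swap retry
loop. The function name is hypothetical, and the release ordering is an
assumption based on this being an unlock operation. The retry on a failed
exchange stands in for branching back to ldl_l when stl_c fails.)

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: atomically xor 'mask' into *p, return the prior value. */
static uint32_t fetch_xor_sketch(_Atomic uint32_t *p, uint32_t mask)
{
    /* Save the old value first, as in the "mov" variant. */
    uint32_t old = atomic_load_explicit(p, memory_order_relaxed);

    /*
     * Try to store old ^ mask. If another writer got in between, the
     * exchange fails, 'old' is refreshed with the current value, and
     * the loop retries with the new xor result.
     */
    while (!atomic_compare_exchange_weak_explicit(p, &old, old ^ mask,
                                                  memory_order_release,
                                                  memory_order_relaxed))
        ;

    return old;
}

/* Single-threaded sanity check of the returned and stored values. */
static void check_fetch_xor(void)
{
    _Atomic uint32_t v = 0x81u;
    uint32_t old = fetch_xor_sketch(&v, 0x01u);
    assert(old == 0x81u);
    assert(atomic_load(&v) == 0x80u);
}
```

The old value is simply kept from the load, so no second xor is needed to
reconstruct it.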
Linus