Message-ID: <20250626213927.GQ17294@gate.crashing.org>
Date: Thu, 26 Jun 2025 16:39:27 -0500
From: Segher Boessenkool <segher@...nel.crashing.org>
To: David Laight <david.laight.linux@...il.com>
Cc: Christophe Leroy <christophe.leroy@...roup.eu>,
        Michael Ellerman <mpe@...erman.id.au>,
        Nicholas Piggin <npiggin@...il.com>, Naveen N Rao <naveen@...nel.org>,
        Madhavan Srinivasan <maddy@...ux.ibm.com>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Christian Brauner <brauner@...nel.org>, Jan Kara <jack@...e.cz>,
        Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Darren Hart <dvhart@...radead.org>,
        Davidlohr Bueso <dave@...olabs.net>,
        Andre Almeida <andrealmeid@...lia.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        linux-kernel@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org,
        linux-fsdevel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH 0/5] powerpc: Implement masked user access

Hi!

On Tue, Jun 24, 2025 at 10:08:16PM +0100, David Laight wrote:
> On Tue, 24 Jun 2025 13:25:05 -0500
> Segher Boessenkool <segher@...nel.crashing.org> wrote:
> > On Tue, Jun 24, 2025 at 05:50:01PM +0100, David Laight wrote:
> > > On Tue, 24 Jun 2025 08:17:14 -0500
> > > Segher Boessenkool <segher@...nel.crashing.org> wrote:
> > >   
> > > > On Tue, Jun 24, 2025 at 07:27:47AM +0200, Christophe Leroy wrote:  
> > > > > > Ah ok, I overlooked that, I didn't know the cmove instruction, seems
> > > > > > similar to the isel instruction on powerpc e500.    
> > > > 
> > > > cmove does a move (register or memory) when some condition is true.  
> > > 
> > > The destination of x86 'cmov' is always a register (only the source can be
> > > memory - and is probably always read).  
> > 
> > Both source operands can be mem, right?  But probably not both at the
> > same time.
> 
> It only has one 'real' source, but the implementation could easily
> read the destination register and then decide which value to write
> back - rather than doing a conditional write to the register file.

Yeah, in x86 many insns (most?) can read any reg that they write.  Not
a great design, but heh.

> A conditional write would be a right PITA for the alu result
> forwarding logic

Depends.  An implementation can always do the register forwarding etc.,
just annul the actual store where appropriate (and not put it in the
various store queues either, heh -- annul all the effects of the store).

> > x86 is not a RISC architecture, or more generally, a load/store
> > architecture.
> 
> It sort of is these days.

Not at all.  Most *implementations* are, the uarchs, but the
architecture (which determines the required visible semantics) is not.
That impedance difference is quite painful, yes, for code generation
more than for the processor implementation even -- as usual the
compilers have to save the day!

> The memory transfers are separate u-ops, so a 'reg += mem' instruction
> is split into two by the decoder.

Yup.  Very expensive.  Both for the implementation, and for the
performance of eventual code running on it.

> Although some u-ops get merged together and executed in one clock,
> obvious example is some 'compare+branch' pairs.

On many other architectures such things run in 0 cycles anyway :-)

> > A computational instruction is one that doesn't touch memory or does a
> > branch, or some system function, some supervisor or hypervisor
> > instruction maybe.
> > 
> > x86 does not have many computational insns, most insns can touch
> > memory :-)
> 
> Except that the memory 'bit' is executed separately from any alu 'stuff'.

On many uarchs, yes.  But not in the arch.  No uarch can decide to just
not implement these difficult and expensive insns :-)

> > > There is a planned new instruction that would do a conditional write
> > > to memory - but not on any cpu yet.  
> > 
> > Interesting!  Instructions like the atomic store insns we got for p9,
> > maybe?  They can do minimum/maximum and various kinds of more generic
> > reductions and similar.
> 
> I think they are only conditional stores.
> But they do save a conditional branch.

Yeah, but those are not ever executed *anyway*, there is branch
prediction and we require that to be pretty good to get reasonable
performance anyway.

A branch around the store insns is just fine if it can be predicted
correctly.  If it cannot be predicted correctly, you can do the store
always, just have the address that is stored to depend on the condition
(such that the data is stored to some dummy memory if it "should not be
done").  Source code gets such a transform done manually in the
performance critical paths not infrequently, already.
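
To make that transformation concrete, here is a minimal sketch in C.  It
is not taken from the patch series or from GCC; the function names and
the local dummy variable are made up for illustration.  The first
function branches around the store, the second always stores and lets
the condition pick the address.

```c
/* Branchy form: the store is skipped when cond is false. */
void store_if_branchy(int *dst, int value, int cond)
{
    if (cond)
        *dst = value;
}

/*
 * Always-store form: the condition selects the address, and the store
 * itself is unconditional.  When cond is false the value lands in a
 * throwaway local (hypothetical "dummy memory") instead of *dst.
 */
void store_if_select(int *dst, int value, int cond)
{
    int dummy;                          /* scratch sink, never read */
    int *target = cond ? dst : &dummy;  /* address depends on cond */

    *target = value;
}
```

Both forms have the same visible effect on *dst.  A compiler is free to
turn either form into the other, so look at the generated code if the
difference matters.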

GCC does not currently do such a transformation AFAIK, but it is a
pretty basic thing to do.  Conditional stores are not often written in
source code programs, or there would probably be an implementation for
this already :-)

> A late disable of a memory write is far less problematic than a disabled
> register file write. No one minds (too much) about slight delays between
> writes and reads of the same location (reduced by a store to load forwarder)
> but you don't want to lose clocks between adjacent simple alu instructions.

Writes to memory take tens of cycles *anyway*, but all of that is hidden
by the various memory load and store queues (which let you do forwarding
in just a few cycles).

> For my sins I re-implemented a soft cpu last year...

Ouch :-)  But it was fun to do I hope?

> Which doesn't have a 'cmov' :-(

The x86 flag register bits are so limited and complicated in the first
place, cmov is the easier part there ;-) 

> > But ancient things do not.  Both 970 (Apple G5) and Cell BE do not yet
> > have it (they are ISA 2.01 and 2.02 respectively).  And the older p5's
> > do not have it yet either, but the newer ones do.
> > 
> > And all classic PowerPC is ISA 1.xx of course.  Medieval CPUs :-)
> 
> That makes more sense than the list in patch 5/5.

Not sure what list that is.  I'll find it :-)

> > > > But sure, seen from very far off both isel and cmove can be used to
> > > > implement the ternary operator ("?:"), are similar in that way :-)  
> > > 
> > > Which is exactly what you want to avoid speculation.  
> > 
> > There are cheaper / simpler / more effective / better ways to get that,
> > but sure, everything is better than a conditional branch, always :-)
> 
> Everything except a TLB miss :-)

Heh.  TLBs are just a tiny part of translation on Power.  We mostly care
about the ERATs.  Look it up, if you want to be introduced to another
level of pain :-)

> And for access_ok() avoiding the conditional is a good enough reason
> to use a 'conditional move' instruction.
> Avoiding speculation is actually free.

*Assuming* that avoiding speculation is actually free, you mean?
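
For readers following along, the idea of replacing the access_ok()
conditional with a select might be sketched like this in C.  This is
only an illustration, not the code from the patch series: USER_LIMIT and
clamp_user_addr() are hypothetical names, and the limit value is an
assumption picked for the example.  The "?:" can be lowered to cmov or
isel, but nothing in C guarantees that the compiler will do so.

```c
#include <stdint.h>

/* Hypothetical boundary: first address that is not a user address. */
#define USER_LIMIT ((uint64_t)1 << 47)

/*
 * Return addr unchanged if it is below the limit, otherwise return the
 * limit itself (assumed here to be an address that will fault when
 * accessed).  There is no early return to mispredict: the caller always
 * proceeds with whatever address comes back.
 */
uint64_t clamp_user_addr(uint64_t addr)
{
    return addr < USER_LIMIT ? addr : USER_LIMIT;
}
```

Whether this is actually cheaper than a well-predicted branch depends on
the uarch and on what the compiler generates.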


Segher
