lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7afe86ab2d934aa694a0461b1abbd2ce@AcuMS.aculab.com>
Date:   Wed, 25 Aug 2021 08:42:35 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     'Segher Boessenkool' <segher@...nel.crashing.org>,
        Christophe Leroy <christophe.leroy@...roup.eu>
CC:     "linuxppc-dev@...ts.ozlabs.org" <linuxppc-dev@...ts.ozlabs.org>,
        "Paul Mackerras" <paulus@...ba.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH] powerpc/32: Don't use lmw/stmw for saving/restoring non
 volatile regs

From: Segher Boessenkool
> Sent: 24 August 2021 16:28

> 
> On Tue, Aug 24, 2021 at 08:16:00AM -0500, Segher Boessenkool wrote:
> > On Tue, Aug 24, 2021 at 07:54:22AM +0200, Christophe Leroy wrote:
> > > >On mpccore both lmw and stmw are only N+1 btw.  But the serialization
> > > >might cost another cycle here?
> > >
> > > That coherent on MPC8xx, that's only 2 cycles.
> > > But on the mpc832x which has a e300c2 core, it looks like I have 10 cycles
> > > difference. Is anything wrong ?
> >
> > I don't know that core very well, I'll have a look.
> 
> So, I don't see any difference between e300c2 and e300c1 (which is 603
> basically, for this) that is significant here.  The e300c2 has two
> integer units instead of just one, but it still has only one load/store
> unit, and I don't see anything else that could matter either.  Huh.

Is the cpu as brain-damaged as the (old) strongarm (SA1100 etc)
where ldm/stm always took 1 clock to check each register bit
regardless of the number of registers to copy?
(IIRC it also took the same length of time when conditionally not
executed.)

If x86 had ever had ldm/stm then it would end up being a microcoded
instruction and take forever to decode.
Intel never managed to optimise 'loop' (dec %cx and jump nz).

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ