[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7afe86ab2d934aa694a0461b1abbd2ce@AcuMS.aculab.com>
Date: Wed, 25 Aug 2021 08:42:35 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Segher Boessenkool' <segher@...nel.crashing.org>,
Christophe Leroy <christophe.leroy@...roup.eu>
CC: "linuxppc-dev@...ts.ozlabs.org" <linuxppc-dev@...ts.ozlabs.org>,
"Paul Mackerras" <paulus@...ba.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH] powerpc/32: Don't use lmw/stmw for saving/restoring non
volatile regs
From: Segher Boessenkool
> Sent: 24 August 2021 16:28
>
> On Tue, Aug 24, 2021 at 08:16:00AM -0500, Segher Boessenkool wrote:
> > On Tue, Aug 24, 2021 at 07:54:22AM +0200, Christophe Leroy wrote:
> > > >On mpccore both lmw and stmw are only N+1 btw. But the serialization
> > > >might cost another cycle here?
> > >
> > > That coherent on MPC8xx, that's only 2 cycles.
> > > But on the mpc832x which has a e300c2 core, it looks like I have 10 cycles
> > > difference. Is anything wrong ?
> >
> > I don't know that core very well, I'll have a look.
>
> So, I don't see any difference between e300c2 and e300c1 (which is 603
> basically, for this) that is significant here. The e300c2 has two
> integer units instead of just one, but it still has only one load/store
> unit, and I don't see anything else that could matter either. Huh.
Is the cpu as brain-damaged as the (old) strongarm (SA1100 etc)
where ldm/stm always took 1 clock to check each register bit
regardless of the number of registers to copy?
(IIRC it also took the same length of time when conditionally not
executed.)
If x86 had ever had ldm/stm then it would end up being a microcoded
instruction and take forever to decode.
Intel never managed to optimise 'loop' (dec %cx and jump nz).
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Powered by blists - more mailing lists