[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <357cbd67260040e4bcf17d519aaafdcb@AcuMS.aculab.com>
Date: Fri, 30 Dec 2022 11:39:35 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Holger Lubitz' <holger.lubitz@...nline.de>,
'Linus Torvalds' <torvalds@...ux-foundation.org>,
Guenter Roeck <linux@...ck-us.net>
CC: Rasmus Villemoes <rasmus.villemoes@...vas.dk>,
Geert Uytterhoeven <geert@...ux-m68k.org>,
"Jason A. Donenfeld" <Jason@...c4.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-kbuild@...r.kernel.org" <linux-kbuild@...r.kernel.org>,
"linux-arch@...r.kernel.org" <linux-arch@...r.kernel.org>,
"linux-toolchains@...r.kernel.org" <linux-toolchains@...r.kernel.org>,
Masahiro Yamada <masahiroy@...nel.org>,
Kees Cook <keescook@...omium.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
"linux-m68k@...ts.linux-m68k.org" <linux-m68k@...ts.linux-m68k.org>
Subject: RE: [PATCH v2] kbuild: treat char as always unsigned
From: Holger Lubitz
> Sent: 24 December 2022 09:34
>
> On Thu, 2022-12-22 at 10:41 +0000, David Laight wrote:
> > I wonder how much slower it is - m68k is likely to be microcoded
> > and I don't think instruction timings are actually available.
>
> Not sure if these are in any way official, but
> http://oldwww.nvg.ntnu.no/amiga/MC680x0_Sections/mc68030timing.HTML
I thought about that some more and remember seeing memory timings
on a logic analyser - and getting timings that (more or less)
implied sequential execution limited by the obvious memory (cache)
accesses.
The microcoding is more apparent in the large mid-instruction
interrupt stack frames - eg for page faults.
>
> (There's also
> http://oldwww.nvg.ntnu.no/amiga/MC680x0_Sections/mc68000timing.HTML
> but that is probably only interesting to demo coders by now)
>
> > The fastest version probably uses subx (with carry) to generate
> > 0/-1 and leaves +delta for the other result - but getting the
> > compares and branches in the right order is hard.
>
> Guess it must have been over 20 years since I wrote any 68k asm, but
> now I actually ended up installing Debian on qemu to experiment.
>
> There are two interesting differences between 68k and x86 that can be
> useful here: Unlike x86, MOV on 68k sets the flags. And also, subx
> differs from sbb in that it resets the zero flag on a non-zero result,
> but does not set it on a zero result. So if it is set, it must have
> been set before.
>
> Here are the two functions I came up with (tested only stand-alone, not
> in a kernel build. Also no benchmarks because this 68040 is only
> emulated)
> #1 (optimized for minimum instruction count in loop,
> 68k + Coldfire ISA_B)
>
> int strcmp1(const char *cs, const char *ct)
> {
> int res;
>
> asm ("\n"
> "1: move.b (%0)+,%2\n" /* get *cs */
> " jeq 2f\n" /* end of first string? */
> " cmp.b (%1)+,%2\n" /* compare *ct */
> " jeq 1b\n" /* if equal, continue */
> " jra 3f\n" /* else skip to tail */
> "2: cmp.b (%1)+,%2\n" /* compare one last byte */
> "3: subx.l %2, %2\n" /* -1 if borrow, 0 if not */
> " jls 4f\n" /* if set, z is from sub.b */
The subx will set Z unless C was set.
So that doesn't seem right.
> " moveq.l #1, %2\n" /* 1 if !borrow */
> "4:"
> : "+a" (cs), "+a" (ct), "=d" (res));
> return res;
> }
I think this should work:
(But the jc might need to be jnc.)
" moveq.l #0,%2\n" /* zero high bits of result */
"1: move.b (%1)+,%2\n" /* get *ct */
" jeq 2f\n" /* end of second string? */
" cmp.b (%0)+,%2\n" /* compare *cs */
" jeq 1b\n" /* if equal, continue */
" jc 4f /* return +ve */
" moveq.l #-1, %2\n" /* return -ve */
" jra 4f\n"
"2: move.b (%0),%2\n" /* check for matching strings */
"4:"
> #2 (optimized for minimum code size,
> Coldfire ISA_A compatible)
>
> int strcmp2(const char *cs, const char *ct)
> {
> int res = 0, tmp = 0;
>
> asm ("\n"
> "1: move.b (%0)+,%2\n" /* get *cs */
> " move.b (%1)+,%3\n" /* get *ct */
> " subx.l %3,%2\n" /* compare a byte */
> " jeq 2f\n" /* both inputs were zero */
That doesn't seem right.
Z will be set if either *ct is zero or the bytes match.
> " tst.l %2\n" /* check result */
This only sets Z when it was already set by the subx.
> " jeq 1b\n" /* if zero, continue */
> "2:"
> : "+a" (cs), "+a" (ct), "+d" (res), "+d" (tmp));
> return res;
> }
>
> However, this one needs res and tmp to be set to zero, because we read
> only bytes (no automatic zero-extend on 68k), but then do a long
> operation on them. Coldfire ISA_A dropped cmpb, it only reappeared in
> ISA_B.
>
> So the real instruction count is likely to be two more, unless gcc
> happens to have one or two zeros it can reuse.
>
> > I believe some of the other m68k asm functions are also missing
> > the "memory" 'clobber' and so could get mis-optimised.
>
> In which case would that happen? This function doesn't clobber memory
> and its result does get used. If gcc mistakenly thinks the parameters
> haven't changed and uses a previously cached result, wouldn't that
> apply to a C function too?
You need a memory 'clobber' on anything that READS memory as well
as writes it.
> > While I can write (or rather have written) m68k asm I don't have
> > a compiler.
>
> Well, I now have an emulated Quadra 800 running Debian 68k.(Getting the
> emulated networking to work reliably was a bit problematic, though. But
> now it runs Kernel 6.0) qemu could emulate Coldfire too, but I am not
> sure where I would find a distribution for that.
>
> I did not attach a patch because it seems already to be decided that
> the function is gone. But should anyone still want to include one (or
> both) of these functions, just give credit to me and I'm fine.
Thinking further the fastest strcmp() probably uses big-endian word compares
with a check for a zero byte.
Especially on 64 bit systems that support misaligned loads.
But I'd need to think hard about the actual details.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Powered by blists - more mailing lists