lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 30 Dec 2022 11:39:35 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     'Holger Lubitz' <holger.lubitz@...nline.de>,
        'Linus Torvalds' <torvalds@...ux-foundation.org>,
        Guenter Roeck <linux@...ck-us.net>
CC:     Rasmus Villemoes <rasmus.villemoes@...vas.dk>,
        Geert Uytterhoeven <geert@...ux-m68k.org>,
        "Jason A. Donenfeld" <Jason@...c4.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-kbuild@...r.kernel.org" <linux-kbuild@...r.kernel.org>,
        "linux-arch@...r.kernel.org" <linux-arch@...r.kernel.org>,
        "linux-toolchains@...r.kernel.org" <linux-toolchains@...r.kernel.org>,
        Masahiro Yamada <masahiroy@...nel.org>,
        Kees Cook <keescook@...omium.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        "linux-m68k@...ts.linux-m68k.org" <linux-m68k@...ts.linux-m68k.org>
Subject: RE: [PATCH v2] kbuild: treat char as always unsigned

From: Holger Lubitz
> Sent: 24 December 2022 09:34
> 
> On Thu, 2022-12-22 at 10:41 +0000, David Laight wrote:
> > I wonder how much slower it is - m68k is likely to be microcoded
> > and I don't think instruction timings are actually available.
> 
> Not sure if these are in any way official, but
> http://oldwww.nvg.ntnu.no/amiga/MC680x0_Sections/mc68030timing.HTML

I thought about that some more and remember seeing memory timings
on a logic analyser - and getting timings that (more or less)
implied sequential execution limited by the obvious memory (cache)
accesses.

The microcoding is more apparent in the large mid-instruction
interrupt stack frames - eg for page faults.


> 
> (There's also
> http://oldwww.nvg.ntnu.no/amiga/MC680x0_Sections/mc68000timing.HTML
> but that is probably only interesting to demo coders by now)
> 
> > The fastest version probably uses subx (with carry) to generate
> > 0/-1 and leaves +delta for the other result - but getting the
> > compares and branches in the right order is hard.
> 
> Guess it must have been over 20 years since I wrote any 68k asm, but
> now I actually ended up installing Debian on qemu to experiment.
> 
> There are two interesting differences between 68k and x86 that can be
> useful here: Unlike x86, MOV on 68k sets the flags. And also, subx
> differs from sbb in that it resets the zero flag on a non-zero result,
> but does not set it on a zero result. So if it is set, it must have
> been set before.
> 
> Here are the two functions I came up with (tested only stand-alone, not
> in a kernel build. Also no benchmarks because this 68040 is only
> emulated)
> #1 (optimized for minimum instruction count in loop,
>     68k + Coldfire ISA_B)
> 
> int strcmp1(const char *cs, const char *ct)
> {
>         int res;
> 
>         asm ("\n"
>                 "1: move.b  (%0)+,%2\n"  /* get *cs */
>                 "   jeq     2f\n"        /* end of first string? */
>                 "   cmp.b   (%1)+,%2\n"  /* compare *ct */
>                 "   jeq     1b\n"        /* if equal, continue */
>                 "   jra     3f\n"        /* else skip to tail */
>                 "2: cmp.b   (%1)+,%2\n"  /* compare one last byte */
>                 "3: subx.l  %2, %2\n"    /* -1 if borrow, 0 if not */
>                 "   jls     4f\n"        /* if set, z is from sub.b */

The subx will set Z unless C was set.
So that doesn't seem right.

>                 "   moveq.l #1, %2\n"    /* 1 if !borrow  */
>                 "4:"
>                 : "+a" (cs), "+a" (ct), "=d" (res));
>         return res;
> }

I think this should work:
(But the jc might need to be jnc.)

                 "   moveq.l #0,%2\n"     /* zero high bits of result */
                 "1: move.b  (%1)+,%2\n"  /* get *ct */
                 "   jeq     2f\n"        /* end of second string? */
                 "   cmp.b   (%0)+,%2\n"  /* compare *cs */
                 "   jeq     1b\n"        /* if equal, continue */
                 "   jc      4f           /* return +ve */
                 "   moveq.l #-1, %2\n"   /* return -ve */
                 "   jra     4f\n"
                 "2: move.b  (%0),%2\n"   /* check for matching strings */
                 "4:"


> #2 (optimized for minimum code size,
>     Coldfire ISA_A compatible)
> 
> int strcmp2(const char *cs, const char *ct)
> {
>         int res = 0, tmp = 0;
> 
>         asm ("\n"
>                 "1: move.b (%0)+,%2\n" /* get *cs */
>                 "   move.b (%1)+,%3\n" /* get *ct */
>                 "   subx.l %3,%2\n"    /* compare a byte */
>                 "   jeq    2f\n"       /* both inputs were zero */

That doesn't seem right.
Z will be set if either *ct is zero or the bytes match.

>                 "   tst.l  %2\n"       /* check result */

This only sets Z when it was already set by the subx.

>                 "   jeq    1b\n"       /* if zero, continue */
>                 "2:"
>                 : "+a" (cs), "+a" (ct), "+d" (res), "+d" (tmp));
>         return res;
> }
> 
> However, this one needs res and tmp to be set to zero, because we read
> only bytes (no automatic zero-extend on 68k), but then do a long
> operation on them. Coldfire ISA_A dropped cmpb, it only reappeared in
> ISA_B.
> 
> So the real instruction count is likely to be two more, unless gcc
> happens to have one or two zeros it can reuse.
> 
> > I believe some of the other m68k asm functions are also missing
> > the "memory" 'clobber' and so could get mis-optimised.
> 
> In which case would that happen? This function doesn't clobber memory
> and its result does get used. If gcc mistakenly thinks the parameters
> haven't changed and uses a previously cached result, wouldn't that
> apply to a C function too?

You need a memory 'clobber' on anything that READS memory as well
as writes it.

> > While I can write (or rather have written) m68k asm I don't have
> > a compiler.
> 
> Well, I now have an emulated Quadra 800 running Debian 68k.(Getting the
> emulated networking to work reliably was a bit problematic, though. But
> now it runs Kernel 6.0) qemu could emulate Coldfire too, but I am not
> sure where I would find a distribution for that.
> 
> I did not attach a patch because it seems already to be decided that
> the function is gone. But should anyone still want to include one (or
> both) of these functions, just give credit to me and I'm fine.

Thinking further the fastest strcmp() probably uses big-endian word compares
with a check for a zero byte.
Especially on 64 bit systems that support misaligned loads.
But I'd need to think hard about the actual details.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ