lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 21 Nov 2018 18:30:29 +0000
From:   Matt Sealey <Matt.Sealey@....com>
To:     Christoph Hellwig <hch@...radead.org>,
        Dave Rodgman <dave.rodgman@....com>
CC:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        nd <nd@....com>,
        "herbert@...dor.apana.org.au" <herbert@...dor.apana.org.au>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "nitingupta910@...il.com" <nitingupta910@...il.com>,
        "rpurdie@...nedhand.com" <rpurdie@...nedhand.com>,
        "markus@...rhumer.com" <markus@...rhumer.com>,
        "minchan@...nel.org" <minchan@...nel.org>,
        "sergey.senozhatsky.work@...il.com" 
        <sergey.senozhatsky.work@...il.com>
Subject: RE: [PATCH 2/6] lib/lzo: enable 64-bit CTZ on Arm

Hi Christoph,

> >  #define LZO_USE_CTZ32	1
> >  #elif defined(__i386__) || defined(__powerpc__)
> >  #define LZO_USE_CTZ32	1
> > -#elif defined(__arm__) && (__LINUX_ARM_ARCH__ >= 5)
> > +#elif defined(__arm__)
> > +#if (__LINUX_ARM_ARCH__ >= 5)
> >  #define LZO_USE_CTZ32	1
> >  #endif
> > +#if (__LINUX_ARM_ARCH__ >= 6) && (CONFIG_THUMB2_KERNEL)
> > +#define LZO_USE_CTZ64	1
> 
> All of this really needs to be driven by Kconfig symbols that
> the architecture selects instead of magic arch ifdefs here.

TL;DR summary: Low hanging fruit.

TL explanation of the thought process:

Rather than using arch ifdefs, it seems we can do __builtin_ctzll()
and __builtin_clzll() on all architectures with no measurable impact
to codegen - libgcc implements them as __builtin_ctz/clz everywhere
I've looked, so gcc at least is going to use that code for the
builtins. On 32-bit Arm it introduces a tiny bit of extra register
pressure in the instruction sequence but it's not measurable.

That is to say that the difference between architectures or
architecture versions isn't the availability of a count-leading-zeros
optimization or the builtin or an efficient algorithm, but the code
preceding it to collect a 64-bit quantity to count.

That is a whole other cleanup.

Longer:

Just to further the point, though, if we rationalize the use of
builtins to use the 64-bit versions, then we may as well use the
kernel version -- __ffs64 is __builtin_clzll in asm-generic with
the right magic arch Kconfig, and if not arches provide exactly
the same code as LZO's fallback when LZO_USE_CxZnn isn't defined.

Even the distinction between little and big endian is moot, since

    __builtin_clzll(cpu_to_be32(n)) 

amounts to precisely the same disassembly on both arm and arm64
as __builtin_ctzll(n), so we can just go the whole hog, replace
the 32-bit loads and xor with 64-bit ones, and do:

    __ffs64(cpu_to_be32(n))

which greatly reduces code complexity in the main LZO code by
removing the distinction.

It turns out the compiler knows what it's doing, and where it
doesn't, the arch maintainers do.

Therein lies the rub; the source gets smaller and we lose the
arch/endian distinction but the efficiency of the preceding 64-bit
load and exclusive-or needs investigation. We still have to
figure out per architecture what the most efficient element size
is or if there's a hideous xor((u64) n) implementation.

I simply don't have confidence in anything except that the magic
arch ifdefs work to improve performance without changing too much,
and the mess it introduces here belies the performance improvement
it provides. Versus the true scope of the code cleanup to do it
right, it's a very concise and reliable and within-style improvement.

Would we agree to let it in for now on the basis that we thought
about it but I side with MFXJO's urgency?

Ta!
Matt Sealey <matt.sealey@....com>
Arm Partner Enablement Group

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ