lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20250610191208.GD1649@sol>
Date: Tue, 10 Jun 2025 12:12:08 -0700
From: Eric Biggers <ebiggers@...nel.org>
To: "Jason A. Donenfeld" <Jason@...c4.com>
Cc: linux-kernel@...r.kernel.org, linux-crypto@...r.kernel.org,
	linux-arm-kernel@...ts.infradead.org, loongarch@...ts.linux.dev,
	linux-mips@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org,
	linux-riscv@...ts.infradead.org, linux-s390@...r.kernel.org,
	sparclinux@...r.kernel.org, x86@...nel.org,
	linux-arch@...r.kernel.org, Ard Biesheuvel <ardb@...nel.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH v2 00/12] lib/crc: improve how arch-optimized code is
 integrated

On Tue, Jun 10, 2025 at 11:39:22AM -0600, Jason A. Donenfeld wrote:
> On Sun, Jun 08, 2025 at 04:48:17PM -0700, Eric Biggers wrote:
> > On Sat, Jun 07, 2025 at 05:47:02PM -0600, Jason A. Donenfeld wrote:
> > > On Sat, Jun 07, 2025 at 01:04:42PM -0700, Eric Biggers wrote:
> > > > Having arch-specific code outside arch/ was somewhat controversial when
> > > > Zinc proposed it back in 2018.  But I don't think the concerns are
> > > > warranted.  It's better from a technical perspective, as it enables the
> > > > improvements mentioned above.  This model is already successfully used
> > > > in other places in the kernel such as lib/raid6/.  The community of each
> > > > architecture still remains free to work on the code, even if it's not in
> > > > arch/.  At the time there was also a desire to put the library code in
> > > > the same files as the old-school crypto API, but that was a mistake; now
> > > > that the library is separate, that's no longer a constraint either.
> > > 
> > > I can't express how happy I am to see this revived. It's clearly the
> > > right way forward and makes it a lot simpler for us to dispatch to
> > > various arch implementations and also is organizationally simpler.
> > > 
> > > Jason
> > 
> > Thanks!  Can I turn that into an Acked-by?
> 
> Took me a little while longer to fully review it. Sure,
> 
>     Acked-by: Jason A. Donenfeld <Jason@...c4.com>
> 
> Side note: I wonder about eventually turning some of the static branches
> into static calls.

Yes, Linus was wondering the same thing earlier.  It does run into a couple
issues.  First, only x86 and powerpc implement static_call properly; everywhere
else it's just an indirect call.  Second, there's often some code shared above
the level at which we'd like to do the dispatch.  For example, consider crc32_le
on x86.  If we expand the CRC_PCLMUL macro and inline crc32_le_arch and
crc32_le_base as the compiler does, crc32_le ends up as:

    u32 crc32_le(u32 crc, const u8 *p, size_t len)
    {
            if (len >= 16 && static_branch_likely(&have_pclmulqdq) &&
                crypto_simd_usable()) {
                    const void *consts_ptr;

                    consts_ptr = crc32_lsb_0xedb88320_consts.fold_across_128_bits_consts;
                    kernel_fpu_begin();
                    crc = static_call(crc32_lsb_pclmul)(crc, p, len, consts_ptr);
                    kernel_fpu_end();
                    return crc;
            }
            while (len--)
                    crc = (crc >> 8) ^ crc32table_le[(crc & 255) ^ *p++];
            return crc;
    }

The existing static_call selects between 3 different assembly functions, all of
which require a kernel-mode FPU section and only support len >= 16.

We could instead unconditionally do a static_call upon entry to the function,
with 4 possible targets.  But then we'd have to duplicate the kernel FPU
begin/end sequence in 3 different functions.  Also, it would add an extra
function call for the case where 'len < 16', which is a common case and is
exactly the case where per-call overhead matters the most.

However, if we could actually inline the static call into the *callers* of
crc32_le(), that would make it more worthwhile.  I'm not sure that's possible,
though, especially considering that this code is tristate.

Anyway, this is tangential to this patchset.  Though the new way the code is
organized does make it more feasible to have e.g. a centralized static_call in
the future if we choose to go in that direction.

- Eric

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ