Message-ID: <CAMj1kXHHm+L=qE5opDXhjoWZt+1eKXFeGVS=OdvyF0VNFZivCA@mail.gmail.com>
Date:   Fri, 25 Nov 2022 09:59:17 +0100
From:   Ard Biesheuvel <ardb@...nel.org>
To:     Herbert Xu <herbert@...dor.apana.org.au>
Cc:     "Jason A. Donenfeld" <Jason@...c4.com>,
        Robert Elliott <elliott@....com>, davem@...emloft.net,
        tim.c.chen@...ux.intel.com, ap420073@...il.com,
        David.Laight@...lab.com, ebiggers@...nel.org,
        linux-crypto@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 10/24] crypto: x86/poly - limit FPU preemption

On Fri, 25 Nov 2022 at 09:41, Herbert Xu <herbert@...dor.apana.org.au> wrote:
>
> On Wed, Nov 16, 2022 at 12:13:51PM +0100, Jason A. Donenfeld wrote:
> > On Tue, Nov 15, 2022 at 10:13:28PM -0600, Robert Elliott wrote:
> > > +/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
> > > +static const unsigned int bytes_per_fpu = 337 * 1024;
> > > +
> >
> > Use an enum for constants like this:
> >
> >     enum { BYTES_PER_FPU = ... };
> >
> > You can even make it function-local, so it's near the code that uses it,
> > which will better justify its existence.
> >
> > Also, where did you get this number? Seems kind of weird.
>
> These numbers are highly dependent on hardware and I think having
> them hard-coded is wrong.
>
> Perhaps we should try a different approach.  How about just limiting
> the size to 4K, and then depending on need_resched we break out of
> the loop? Something like:
>
>         if (!len)
>                 return 0;
>
>         kernel_fpu_begin();
>         for (;;) {
>                 unsigned int chunk = min(len, 4096);
>
>                 sha1_base_do_update(desc, data, chunk, sha1_xform);
>
>                 len -= chunk;
>                 data += chunk;
>
>                 if (!len)
>                         break;
>
>                 if (need_resched()) {
>                         kernel_fpu_end();
>                         cond_resched();
>                         kernel_fpu_begin();
>                 }
>         }
>         kernel_fpu_end();
>

On arm64, this is implemented in an assembler macro, 'cond_yield', so we
don't need to preserve/restore the SIMD state at all if the
yield is not going to result in a call to schedule(). For example, the
SHA3 code keeps 400 bytes of state in registers, which we don't want
to save and reload unless needed. (5f6cb2e617681 'crypto:
arm64/sha512-ce - simplify NEON yield')

So the asm routines will call cond_yield, and return early if a yield
is required, with the number of blocks or bytes left to process as the
return value. The C wrapper just calls the asm routine in a loop until
the return value becomes 0.

That way, we don't need magic values at all, and the yield will occur
as soon as the asm inner loop observes the yield condition, so the
latency should be much lower as well.

Note that it is only used in shash implementations, given that they
are the only ones that may receive unbounded inputs.
