Message-ID: <FuRP6eq1TdvRdmbeIUM2jGr9qmB2CptLqZsWq_hVq3Bqur-pL9HX3xBLPC3iC1F6F4FT1Moi9GhPssy8MGEKO7LUKH6sA34TbV0S8K3Gn8Q=@n8pjl.ca>
Date: Tue, 06 Dec 2022 23:06:27 +0000
From: Peter Lafreniere <peter@...jl.ca>
To: "Elliott, Robert (Servers)" <elliott@....com>
Cc: Herbert Xu <herbert@...dor.apana.org.au>,
"Jason A. Donenfeld" <Jason@...c4.com>,
"davem@...emloft.net" <davem@...emloft.net>,
"tim.c.chen@...ux.intel.com" <tim.c.chen@...ux.intel.com>,
"ap420073@...il.com" <ap420073@...il.com>,
"ardb@...nel.org" <ardb@...nel.org>,
"David.Laight@...lab.com" <David.Laight@...lab.com>,
"ebiggers@...nel.org" <ebiggers@...nel.org>,
"linux-crypto@...r.kernel.org" <linux-crypto@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH v4 10/24] crypto: x86/poly - limit FPU preemption
> > Subject: Re: [PATCH v4 10/24] crypto: x86/poly - limit FPU preemption
> > Perhaps we should try a different approach. How about just limiting
> > the size to 4K, and then depending on need_resched we break out of
> > the loop? Something like:
> >
> > 	if (!len)
> > 		return 0;
> >
> > 	kernel_fpu_begin();
> > 	for (;;) {
> > 		unsigned int chunk = min(len, 4096);
> >
> > 		sha1_base_do_update(desc, data, chunk, sha1_xform);
> >
> > 		len -= chunk;
> > 		data += chunk;
> >
> > 		if (!len)
> > 			break;
> >
> > 		if (need_resched()) {
> > 			kernel_fpu_end();
> > 			cond_resched();
> > 			kernel_fpu_begin();
> > 		}
> > 	}
> > 	kernel_fpu_end();
>
>
> I implemented that conditional approach in the sha algorithms.
>
> The results of a boot (using sha512 for module signatures, with
> crypto extra tests enabled, comparing to sha512 with a 20 KiB
> fixed limit) are:
>
> sha1 cond: 14479 calls; 784256 cycles doing begin/end; longest FPU context 35828 cycles
> sha256 cond: 26763 calls; 1273570 cycles doing begin/end; longest FPU context 118612 cycles
> sha512 cond: 26957 calls; 1680046 cycles doing begin/end; longest FPU context 169140982 cycles
> sha512 20KiB: 161011 calls; 16232280 cycles doing begin/end; longest FPU context 4049644 cycles
>
> NOTE: I didn't have a patch in place to isolate the counts for each variation
> (ssse3 vs. avx vs. avx2) and
> - for sha512: sha512 vs. sha384
> - for sha256: sha256 vs. sha224
> so the numbers include sha256 and sha512 running twice as many tests
> as sha1.
>
> This approach looks very good:
> - 16% of the number of begin/end calls
> - 10% of the CPU cycles spent making the calls
> - the FPU context is held for a long time (77 ms) but only while
> it's not needed.
>
> That's much more efficient than releasing it every 30 us just in case.
How recently did you make this change? I implemented this conditional
approach for ecb_cbc_helpers.h, but saw no change at all in performance
on serpent-avx2 or twofish-avx.
kernel_fpu_{begin,end} (after the first call to begin) don't do anything
more than enable/disable preemption and make a few writes to the mxcsr.
It's likely that the above approach has slightly less overhead, and it
will preempt on non-CONFIG_PREEMPT kernels, but nothing suggests a
performance uplift.
This brings us back to this question: should crypto routines be
preempted under PREEMPT_VOLUNTARY or not?
> I'll keep testing this to make sure RCU stalls stay away, and apply
> the approach to the other algorithms.
I missed the earlier discussions. Have you seen issues with RCU
stalls/latency spikes because of crypto routines? If so, what preemption
model were you running?
> In x86, need_resched() has to deal with a PER_CPU variable, so I'm
> not sure it's worth the hassle to figure out how to do that from
> assembly code.
Leave it in C. It'll be more maintainable that way.
Cheers,
Peter Lafreniere <peter@...jl.ca>