Message-ID: <MW5PR84MB1842A7B13F829CC3FF09A8A5ABE69@MW5PR84MB1842.NAMPRD84.PROD.OUTLOOK.COM>
Date: Fri, 16 Dec 2022 22:12:05 +0000
From: "Elliott, Robert (Servers)" <elliott@....com>
To: Herbert Xu <herbert@...dor.apana.org.au>
CC: "Elliott, Robert (Servers)" <elliott@....com>,
Peter Lafreniere <peter@...jl.ca>,
"Jason A. Donenfeld" <Jason@...c4.com>,
"davem@...emloft.net" <davem@...emloft.net>,
"tim.c.chen@...ux.intel.com" <tim.c.chen@...ux.intel.com>,
"ap420073@...il.com" <ap420073@...il.com>,
"ardb@...nel.org" <ardb@...nel.org>,
"David.Laight@...lab.com" <David.Laight@...lab.com>,
"ebiggers@...nel.org" <ebiggers@...nel.org>,
"linux-crypto@...r.kernel.org" <linux-crypto@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH v4 10/24] crypto: x86/poly - limit FPU preemption
> I'll keep experimenting with all the preempt modes, heavier
> workloads, and shorter RCU timeouts to confirm this solution
> is robust. It might even be appropriate for the generic
> drivers, if they suffer from the problems that sm4 shows here.
I have a set of patches that is looking promising: it no longer
generates RCU stall warnings or soft lockups with either the x86
or the generic drivers (sm4 is particularly taxing).
Test case:
* added 28 clones of the tcrypt module so modprobe can run it
many times in parallel (1 thread per CPU core)
* added 1 MiB big buffer functional tests (compare to
generic results)
* added 1 MiB big buffer speed tests
* 3 windows running
* 28 threads running
* modprobe with each defined test mode in order 1, 2, 3, etc.
* RCU stall timeouts set to shortest supported values
* run in preempt=none, preempt=voluntary, preempt=full modes
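The parallel modprobe setup above might look something like this
shell sketch. The clone module names (tcrypt1..tcrypt28) and the
mode handling are assumptions based on the description; the leading
'echo' keeps it a dry run:

```shell
# Dry-run sketch of the parallel tcrypt test described above.
# tcrypt1..tcrypt28 are hypothetical names for the 28 module clones;
# drop the leading 'echo' to actually load them.
NTHREADS=28   # one thread per CPU core
MODE=1        # step through each defined test mode: 1, 2, 3, ...
for i in $(seq 1 "$NTHREADS"); do
    echo modprobe "tcrypt$i" mode="$MODE" &
done
wait
```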
Patches include:
* Ard's kmap_local() patch
* Suppress RCU stall warnings during speed tests: change the
rcu_sysrq_start()/end() functions to be general purpose and
call them from the tcrypt test functions that measure the time
of a crypto operation
* add crypto_yield() unconditionally in skcipher_walk_done() so
it is run even if the data is aligned
* add crypto_yield() in aead_encrypt/decrypt so they always
call it, like skcipher does
* add crypto_yield() at the end of each hash update(), digest(),
and finup() function so they always call it, like skcipher does
* add kernel_fpu_yield() calls every 4 KiB inside x86
kernel_fpu_begin()/end() blocks, so the x86 functions always
yield to the scheduler even when they bypass the helper
functions above (which now call crypto_yield() more
consistently)
I'll keep trying to break it over the weekend. If it holds
up I'll post the patches next week.