Date: Tue, 22 Sep 2020 17:39:24 -0700
From: Doug Anderson <dianders@...omium.org>
To: Ard Biesheuvel <ardb@...nel.org>
Cc: Catalin Marinas <catalin.marinas@....com>, Will Deacon <will@...nel.org>,
    Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
    Jackie Liu <liuyun01@...inos.cn>,
    Linux ARM <linux-arm-kernel@...ts.infradead.org>,
    Ard Biesheuvel <ard.biesheuvel@...aro.org>
Subject: Re: [PATCH] arm64: crypto: Add an option to assume NEON XOR is the fastest

On Mon, Sep 21, 2020 at 11:25 PM Ard Biesheuvel <ardb@...nel.org> wrote:
>
> On Tue, 22 Sep 2020 at 02:27, Douglas Anderson <dianders@...omium.org> wrote:
> >
> > On every boot time we see messages like this:
> >
> > [ 0.025360] calling calibrate_xor_blocks+0x0/0x134 @ 1
> > [ 0.025363] xor: measuring software checksum speed
> > [ 0.035351] 8regs : 3952.000 MB/sec
> > [ 0.045384] 32regs : 4860.000 MB/sec
> > [ 0.055418] arm64_neon: 5900.000 MB/sec
> > [ 0.055423] xor: using function: arm64_neon (5900.000 MB/sec)
> > [ 0.055433] initcall calibrate_xor_blocks+0x0/0x134 returned 0 after 29296 usecs
> >
> > As you can see, we spend 30 ms on every boot re-confirming that, yet
> > again, the arm64_neon implementation is the fastest way to do XOR.
> > ...and the above is on a system with HZ=1000. Due to the way the
> > testing happens, if we have HZ defined to something slower it'll take
> > much longer. HZ=100 means we spend 300 ms on every boot re-confirming
> > a fact that will be the same for every bootup.
> >
> > Trying to super-optimize the xor operation makes a lot of sense if
> > you're using software RAID, but the above is probably not worth it for
> > most Linux users because:
> > 1. Quite a few arm64 kernels are built for embedded systems where
> >    software raid isn't common. That means we're spending lots of time
> >    on every boot trying to optimize something we don't use.
> > 2. Presumably, if we have neon, it's faster than alternatives. If
> >    it's not, it's not expected to be tons slower.
> > 3. Quite a lot of arm64 systems are big.LITTLE. This means that the
> >    existing test is somewhat misguided because it's assuming that test
> >    results on the boot CPU apply to the other CPUs in the system.
> >    This is not necessarily the case.
> >
> > Let's add a new config option that allows us to just use the neon
> > functions (if present) without benchmarking.
> >
> > NOTE: One small side effect is that on an arm64 system _without_ neon
> > we'll end up testing the xor_block_8regs_p and xor_block_32regs_p
> > versions of the function. That's presumably OK since we already test
> > all those when KERNEL_MODE_NEON is disabled.
> >
> > ALSO NOTE: presumably the way to do better than this is to add some
> > sort of per-CPU-core lookup table and jump to a per-CPU-core-specific
> > XOR function each time xor is called. Without seeing evidence that
> > this would really help someone, though, that doesn't seem worth it.
> >
> > Signed-off-by: Douglas Anderson <dianders@...omium.org>
>
> On the two arm64 machines that I happen to have running right now, I get
>
> SynQuacer (Cortex-A53)
>
> 8regs : 1917.000 MB/sec
> 32regs : 2270.000 MB/sec
> arm64_neon: 2053.000 MB/sec
>
> ThunderX2
>
> 8regs : 10170.000 MB/sec
> 32regs : 12051.000 MB/sec
> arm64_neon: 10948.000 MB/sec
>
> so your assertion is not entirely valid.

OK, good to know.

> If the system does not need XOR, it is free not to load the module, so
> there is no reason it has to affect the boot time.

The fact that it was run super early somehow made me just assume that
this couldn't be a module, but of course you're right that it can be a
module. That works for me and saves me my precious boot time. ;-)

That being said, this'll still bite anyone who wants to build this in
for whatever reason.

I'll respond to your other email with more...
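
For readers following the thread, the trade-off discussed above can be illustrated with a small userspace sketch in C. It is not the kernel's calibrate_xor_blocks() code: pick_xor(), struct xor_result, and the three tables are hypothetical names made up for this example, and the speeds are simply the MB/sec figures quoted in the messages above. Passing a name as assume_name stands in for the proposed "assume NEON is fastest" option; passing NULL stands in for benchmarking and taking the fastest result.

```c
#include <stddef.h>
#include <string.h>

/* One candidate XOR routine and its measured throughput in MB/sec. */
struct xor_result {
	const char *name;
	unsigned int mb_per_sec;
};

/* Figures copied from the boot log and the replies quoted above. */
static const struct xor_result doug_boot_log[] = {
	{ "8regs", 3952 }, { "32regs", 4860 }, { "arm64_neon", 5900 },
};
static const struct xor_result synquacer[] = {
	{ "8regs", 1917 }, { "32regs", 2270 }, { "arm64_neon", 2053 },
};
static const struct xor_result thunderx2[] = {
	{ "8regs", 10170 }, { "32regs", 12051 }, { "arm64_neon", 10948 },
};

/*
 * Hypothetical helper, not a kernel API.
 * If assume_name is non-NULL and found in the table, return that entry
 * without comparing speeds (the "skip the benchmark" behaviour).
 * Otherwise return the entry with the highest throughput.
 * Returns NULL only when the table is empty.
 */
static const struct xor_result *
pick_xor(const struct xor_result *r, size_t n, const char *assume_name)
{
	const struct xor_result *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (assume_name && strcmp(r[i].name, assume_name) == 0)
			return &r[i];
		if (!best || r[i].mb_per_sec > best->mb_per_sec)
			best = &r[i];
	}
	return best;
}
```

With the numbers from the original boot log, pick_xor(doug_boot_log, 3, NULL) selects arm64_neon, so assuming NEON costs nothing there. With the SynQuacer and ThunderX2 numbers, the measured winner is 32regs, while pick_xor(synquacer, 3, "arm64_neon") still returns the NEON entry, which is the case Ard's reply points out.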