Message-ID: <CA+55aFy8fNOxw3bnwkX1S46jKnW6i26mueaiuOsScyN3kFJp+A@mail.gmail.com>
Date: Wed, 21 Dec 2016 09:25:01 -0800
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: George Spelvin <linux@...encehorizons.net>
Cc: "Jason A. Donenfeld" <Jason@...c4.com>,
Andi Kleen <ak@...ux.intel.com>,
David Miller <davem@...emloft.net>,
David Laight <David.Laight@...lab.com>,
"Daniel J . Bernstein" <djb@...yp.to>,
Eric Biggers <ebiggers3@...il.com>,
Eric Dumazet <eric.dumazet@...il.com>,
Hannes Frederic Sowa <hannes@...essinduktion.org>,
Jean-Philippe Aumasson <jeanphilippe.aumasson@...il.com>,
"kernel-hardening@...ts.openwall.com"
<kernel-hardening@...ts.openwall.com>,
Linux Crypto Mailing List <linux-crypto@...r.kernel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Andy Lutomirski <luto@...capital.net>,
Network Development <netdev@...r.kernel.org>,
Tom Herbert <tom@...bertland.com>,
"Theodore Ts'o" <tytso@....edu>,
Vegard Nossum <vegard.nossum@...il.com>
Subject: Re: HalfSipHash Acceptable Usage
On Wed, Dec 21, 2016 at 7:55 AM, George Spelvin
<linux@...encehorizons.net> wrote:
>
> How much does kernel_fpu_begin()/kernel_fpu_end() cost?
It's now better than it used to be, but it's absolutely disastrous
still. We're talking easily many hundreds of cycles. Under some loads,
thousands.
And I warn you already: it will _benchmark_ a hell of a lot better
than it will work in reality. In benchmarks, you'll hit all the
optimizations ("oh, I've already saved away all the FP registers, no
need to do it again").
In contrast, in reality, especially with things like "do it once or
twice per incoming packet", you'll easily hit the absolute worst
cases, where not only does it take a few hundred cycles to save the FP
state, you'll then return to user space in between packets, which
triggers the slow-path return code and reloads the FP state, which is
another few hundred cycles plus.
Similarly, in benchmarks you'll hit the "modern CPUs power on the AVX
unit and keep it powered up for a while afterwards", while in real
life you would quite easily hit the "oh, AVX is powered down because
we were idle, now it powers up at half speed which is another latency
hit _and_ the AVX unit won't run full out anyway".
Don't do it. There are basically no real situations where the AVX
state optimizations help for the kernel. We just don't have the loop
counts to make up for the problems it causes.
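
The "loop counts" point can be put as rough arithmetic. The sketch below is a back-of-the-envelope model, not a measurement: the function name break_even_bytes and every number passed to it are hypothetical, picked only to illustrate how a fixed FP save/restore cost has to be amortized over the bytes processed.

```c
/*
 * Illustrative cost model only. All inputs are assumptions:
 *
 * fpu_overhead_cycles: cycles spent saving the FP state plus reloading
 *                      it on the slow-path return to user space.
 * scalar_cpb:          cycles per byte of the plain integer version.
 * simd_cpb:            cycles per byte of the SIMD version.
 *
 * Returns the input size in bytes at which the SIMD version starts to
 * win, or -1.0 if it never does.
 */
double break_even_bytes(double fpu_overhead_cycles,
                        double scalar_cpb, double simd_cpb)
{
    double saved_per_byte = scalar_cpb - simd_cpb;

    /* SIMD is not faster per byte, so the overhead is never repaid. */
    if (saved_per_byte <= 0.0)
        return -1.0;

    return fpu_overhead_cycles / saved_per_byte;
}
```

With an assumed 600 cycles of save-plus-reload overhead and an assumed 2.0 versus 0.5 cycles per byte, break_even_bytes(600.0, 2.0, 0.5) gives 400 bytes. Anything that handles less than that per kernel_fpu_begin()/kernel_fpu_end() pair loses under these made-up numbers, and the model does not even count the AVX power-up latency.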
The one exception is likely if you're doing things like
high-throughput disk IO encryption, and then you'd be much better off
using SHA256 instead (which often has hw acceleration on modern CPUs -
both x86 and ARM).
(I'm sure that you could see it on some high-throughput network
benchmark too when the benchmark entirely saturates the CPU. And then
in real life it would suck horribly for all the reasons above).
Linus