

Message-ID: <20180615193438.GE2458@hirez.programming.kicks-ass.net>
Date:   Fri, 15 Jun 2018 21:34:38 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     "Jason A. Donenfeld" <Jason@...c4.com>,
        LKML <linux-kernel@...r.kernel.org>, X86 ML <x86@...nel.org>,
        Andy Lutomirski <luto@...capital.net>
Subject: Re: Lazy FPU restoration / moving kernel_fpu_end() to context switch

On Fri, Jun 15, 2018 at 06:25:39PM +0200, Thomas Gleixner wrote:
> On Fri, 15 Jun 2018, Jason A. Donenfeld wrote:
> > In a loop this looks like:
> > 
> > for (thing) {
> >   kernel_fpu_begin();
> >   encrypt(thing);
> >   kernel_fpu_end();
> > }
> > 
> > This is obviously very bad, because begin() and end() are slow, so
> > WireGuard does the obvious:
> > 
> > kernel_fpu_begin();
> > for (thing)
> >   encrypt(thing);
> > kernel_fpu_end();
> > 
> > This is fine and well, and the crypto API I'm working on will enable
> 
> It might be fine crypto performance wise, but it's a total nightmare
> latency wise because kernel_fpu_begin() disables preemption. We've seen
> latencies in the larger millisecond range due to processing large data sets
> with kernel FPU.
> 
> If you want to go there then we really need a better approach which allows
> kernel FPU usage in preemptible context and in case of preemption a way to
> stash the preempted FPU context and restore it when the task gets scheduled
> in again. Just using the existing FPU stuff and moving the loops inside the
> begin/end section and keeping preemption disabled for arbitrary time spans
> is not going to fly.

Didn't we recently do a bunch of crypto patches to help with this?

I think they had the pattern:

	kernel_fpu_begin();
	for (units-of-work) {
		do_unit_of_work();
		if (need_resched()) {
			kernel_fpu_end();
			cond_resched();
			kernel_fpu_begin();
		}
	}
	kernel_fpu_end();

I'd have to go dig out the actual series, but I think they had a bunch
of helpers to deal with that.
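
To make the control flow of that pattern easy to trace, here is a minimal userspace sketch. It is not the code from the series mentioned above. The bodies of kernel_fpu_begin(), kernel_fpu_end(), need_resched() and cond_resched() are hypothetical stand-ins that only count calls, and process_units(), resched_every and the counters are names invented for this illustration. The stub need_resched() simply pretends a reschedule is due after a fixed number of units.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace stand-ins for the kernel primitives. These are NOT the
 * kernel implementations; they only record what the loop does.
 */
static int fpu_depth;            /* 1 while inside a begin/end section */
static int fpu_sections;         /* number of begin/end sections entered */
static int resched_points;       /* number of times the loop yielded */
static int units_done;           /* total units of work processed */
static int units_since_resched;  /* units processed since the last yield */
static int resched_every;        /* hypothetical knob: yield is "due" after N units */

static void kernel_fpu_begin(void) { fpu_depth++; fpu_sections++; }
static void kernel_fpu_end(void)   { fpu_depth--; }

static bool need_resched(void)
{
	return resched_every > 0 && units_since_resched >= resched_every;
}

static void cond_resched(void)
{
	/* The point of the pattern: never yield with the FPU section open. */
	assert(fpu_depth == 0);
	resched_points++;
	units_since_resched = 0;
}

static void do_unit_of_work(void)
{
	units_done++;
	units_since_resched++;
}

/* The loop shape from the mail, with a concrete unit count. */
static void process_units(size_t nr_units)
{
	kernel_fpu_begin();
	for (size_t i = 0; i < nr_units; i++) {
		do_unit_of_work();
		if (need_resched()) {
			/* Close the FPU section, yield, then reopen it. */
			kernel_fpu_end();
			cond_resched();
			kernel_fpu_begin();
		}
	}
	kernel_fpu_end();
}
```

In this sketch the cost of begin()/end() is paid once up front plus once per yield, rather than once per unit of work, while the time spent with preemption disabled stays bounded by the interval between need_resched() checks.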