linux-kernel - Re: [RFC] syscall calling convention, stts/clts, and xstate latency


Message-ID: <20110725064252.GD694@elte.hu>
Date:	Mon, 25 Jul 2011 08:42:52 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Andrew Lutomirski <luto@....edu>
Cc:	linux-kernel@...r.kernel.org, x86 <x86@...nel.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Arjan van de Ven <arjan@...radead.org>,
	Avi Kivity <avi@...hat.com>
Subject: Re: [RFC] syscall calling convention, stts/clts, and xstate latency


* Andrew Lutomirski <luto@....edu> wrote:

> On Sun, Jul 24, 2011 at 6:34 PM, Andrew Lutomirski <luto@....edu> wrote:
> >
> > I had in mind something a little less ambitious: making 
> > kernel_fpu_begin very fast, especially when used more than once. 
> > Currently it's slow enough to have spawned arch/x86/crypto/fpu.c, 
> > which is a hideous piece of infrastructure that exists solely to 
> > reduce the number of kernel_fpu_begin/end pairs when using 
> > AES-NI. Clobbering registers in syscall would reduce the cost 
> > even more, but it might require having a way to detect whether 
> > the most recent kernel entry was via syscall or some other means.
> 
> I think it will be very hard to inadvertently cause a regression, 
> because the current code looks pretty bad.

[ heh, one of the rare cases where bad code works in our favor ;-) ]

> 1. Once a task uses xstate for five timeslices, the kernel decides 
> that it will continue using it.  The only thing that clears that 
> condition is __unlazy_fpu called with TS_USEDFPU set.  The only way 
> I can see for that to happen is if kernel_fpu_begin is called twice 
> in a row between context switches, and that has little to do with the 
> task's xstate usage.
> 
> 2. __switch_to, when switching to a task with fpu_counter > 5, will 
> do stts(); clts().
> 
> The combination means that when switching between two xstate-using 
> tasks (or even tasks that were once xstate-using), we pay the full 
> price of a state save/restore *and* stts/clts.
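
The cost described in points 1 and 2 can be sketched as a toy user-space
model in C. This is not kernel code: toy_task, toy_cost, toy_switch_to and
toy_ping_pong are hypothetical names, owns_fpu merely stands in for
TS_USEDFPU, and the maintenance of fpu_counter is assumed to happen
elsewhere. The model only counts which operations a switch would be
charged for under the rules quoted above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy user-space model of the behavior quoted above; not kernel code. */
#define TOY_PRELOAD_THRESHOLD 5   /* from "fpu_counter > 5" in the mail */

struct toy_task {
    unsigned int fpu_counter;  /* assumed to be maintained elsewhere */
    bool owns_fpu;             /* stands in for TS_USEDFPU */
};

struct toy_cost {
    unsigned int saves, restores, stts, clts;
};

/* Charge the operations one context switch would perform in this model. */
static void toy_switch_to(struct toy_task *prev, struct toy_task *next,
                          struct toy_cost *cost)
{
    assert(prev && next && cost);

    if (prev->owns_fpu) {
        /* Outgoing task has live xstate: save it and set CR0.TS. */
        cost->saves++;
        cost->stts++;
        prev->owns_fpu = false;
    }
    if (next->fpu_counter > TOY_PRELOAD_THRESHOLD) {
        /* Incoming task is treated as an xstate user: clear TS again
         * and restore its state eagerly instead of waiting for a trap. */
        cost->clts++;
        cost->restores++;
        next->owns_fpu = true;
    }
}

/* Alternate between two tasks; task A starts out owning the FPU. */
static struct toy_cost toy_ping_pong(unsigned int counter_a,
                                     unsigned int counter_b,
                                     unsigned int switches)
{
    struct toy_task a = { .fpu_counter = counter_a, .owns_fpu = true };
    struct toy_task b = { .fpu_counter = counter_b, .owns_fpu = false };
    struct toy_cost cost = { 0 };
    struct toy_task *prev = &a, *next = &b;

    for (unsigned int i = 0; i < switches; i++) {
        toy_switch_to(prev, next, &cost);
        struct toy_task *tmp = prev;
        prev = next;
        next = tmp;
    }
    return cost;
}
```

In this model, toy_ping_pong(6, 6, 10) charges one save, one restore, one
stts and one clts for each of the ten switches, which is the "full price"
case from the quoted text. With both counters at 0 the only charge is the
single save and stts for task A's initial state.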

I'm all for simplifying this for modern x86 CPUs.

The lazy FPU switching logic was kind of neat on UP but started 
showing its limitations with SMP already - and that was 10 years ago.

So if the numbers prove you right then go for it. It's an added bonus 
that this could enable the kernel to be built using vector 
instructions - you may or may not want to shoot for the glory of 
achieving that feat first ;-)

Thanks,

	Ingo