Message-ID: <20110725064252.GD694@elte.hu>
Date:	Mon, 25 Jul 2011 08:42:52 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Andrew Lutomirski <luto@....edu>
Cc:	linux-kernel@...r.kernel.org, x86 <x86@...nel.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Arjan van de Ven <arjan@...radead.org>,
	Avi Kivity <avi@...hat.com>
Subject: Re: [RFC] syscall calling convention, stts/clts, and xstate latency


* Andrew Lutomirski <luto@....edu> wrote:

> On Sun, Jul 24, 2011 at 6:34 PM, Andrew Lutomirski <luto@....edu> wrote:
> >
> > I had in mind something a little less ambitious: making 
> > kernel_fpu_begin very fast, especially when used more than once. 
> > Currently it's slow enough to have spawned arch/x86/crypto/fpu.c, 
> > which is a hideous piece of infrastructure that exists solely to 
> > reduce the number of kernel_fpu_begin/end pairs when using 
> > AES-NI. Clobbering registers in syscall would reduce the cost 
> > even more, but it might require having a way to detect whether 
> > the most recent kernel entry was via syscall or some other means.
> 
> I think it will be very hard to inadvertently cause a regression, 
> because the current code looks pretty bad.

[ heh, one of the rare cases where bad code works in our favor ;-) ]

> 1. Once a task uses xstate for five timeslices, the kernel decides 
> that it will continue using it.  The only thing that clears that 
> condition is __unlazy_fpu called with TS_USEDFPU set.  The only way 
> I can see for that to happen is if kernel_fpu_begin is called twice 
> in a row between context switches, and that has little to do with the 
> task's xstate usage.
> 
> 2. __switch_to, when switching to a task with fpu_counter > 5, will 
> do stts(); clts().
> 
> The combination means that when switching between two xstate-using 
> tasks (or even tasks that were once xstate-using), we pay the full 
> price of a state save/restore *and* stts/clts.
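
To make the two quoted points concrete, here is a toy model of the 
switch path as described above. It is a sketch under stated 
assumptions, not the kernel's code: Task, owns_fpu (standing in for 
TS_USEDFPU) and switch_to are hypothetical names, and the exact 
ordering of the operations inside the real __switch_to is assumed.

```python
from dataclasses import dataclass

# Threshold taken from the description above: preload when fpu_counter > 5.
PRELOAD_THRESHOLD = 5


@dataclass
class Task:
    name: str
    fpu_counter: int = 0    # timeslices in which the task used xstate
    owns_fpu: bool = False  # hypothetical stand-in for TS_USEDFPU


def switch_to(prev, nxt):
    """Return the FPU-related operations one modeled context switch performs."""
    ops = []
    if prev.owns_fpu:
        # Outgoing task has live xstate: save it and set CR0.TS.
        ops += ["save", "stts"]
        prev.owns_fpu = False
    if nxt.fpu_counter > PRELOAD_THRESHOLD:
        # Incoming task is treated as an xstate user: clear CR0.TS and
        # restore eagerly instead of waiting for a device-not-available trap.
        ops += ["clts", "restore"]
        nxt.owns_fpu = True
    return ops


a = Task("a", fpu_counter=6, owns_fpu=True)
b = Task("b", fpu_counter=6)
print(switch_to(a, b))  # ['save', 'stts', 'clts', 'restore']
```

In this model, a switch between two tasks that are both past the 
threshold performs the save, the restore, and a back-to-back 
stts/clts pair, which is the combined cost the quoted text objects to.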

I'm all for simplifying this for modern x86 CPUs.

The lazy FPU switching logic was kind of neat on UP, but it was 
already showing its limitations on SMP - and that was 10 years ago.

So if the numbers prove you right then go for it. It's an added bonus 
that this could enable the kernel to be built using vector 
instructions - you may or may not want to shoot for the glory of 
achieving that feat first ;-)

Thanks,

	Ingo