Message-ID: <20110725064252.GD694@elte.hu>
Date: Mon, 25 Jul 2011 08:42:52 +0200
From: Ingo Molnar <mingo@...e.hu>
To: Andrew Lutomirski <luto@....edu>
Cc: linux-kernel@...r.kernel.org, x86 <x86@...nel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Arjan van de Ven <arjan@...radead.org>,
Avi Kivity <avi@...hat.com>
Subject: Re: [RFC] syscall calling convention, stts/clts, and xstate latency
* Andrew Lutomirski <luto@....edu> wrote:
> On Sun, Jul 24, 2011 at 6:34 PM, Andrew Lutomirski <luto@....edu> wrote:
> >
> > I had in mind something a little less ambitious: making
> > kernel_fpu_begin very fast, especially when used more than once.
> > Currently it's slow enough to have spawned arch/x86/crypto/fpu.c,
> > which is a hideous piece of infrastructure that exists solely to
> > reduce the number of kernel_fpu_begin/end pairs when using
> > AES-NI. Clobbering registers in syscall would reduce the cost
> > even more, but it might require having a way to detect whether
> > the most recent kernel entry was via syscall or some other means.
>
> I think it will be very hard to inadvertently cause a regression,
> because the current code looks pretty bad.
[ heh, one of the rare cases where bad code works in our favor ;-) ]
> 1. Once a task uses xstate for five timeslices, the kernel decides
> that it will continue using it. The only thing that clears that
> condition is __unlazy_fpu called with TS_USEDFPU set. The only way
> I can see for that to happen is if kernel_fpu_begin is called twice
> in a row between context switches, and that has little to do with the
> task's xstate usage.
>
> 2. __switch_to, when switching to a task with fpu_counter > 5, will
> do stts(); clts().
>
> The combination means that when switching between two xstate-using
> tasks (or even tasks that were once xstate-using), we pay the full
> price of a state save/restore *and* stts/clts.
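A rough user-space sketch of the cost pattern described in points 1 and 2 above, for anyone who wants to count the operations. This is not kernel code: task_model, cost_model, model_switch and model_ping_pong are hypothetical names made up for illustration, and the only behavior modeled is what the quoted text states (a save plus stts for the outgoing task when its state is live, then clts plus a restore for an incoming task whose fpu_counter is above 5).

```c
#include <stdbool.h>

/* Hypothetical model types; these are NOT kernel structures. */
struct task_model {
	unsigned int fpu_counter; /* timeslices in which the task used xstate */
	bool state_live;          /* stands in for TS_USEDFPU being set */
};

struct cost_model {
	unsigned int saves;
	unsigned int restores;
	unsigned int stts_calls;
	unsigned int clts_calls;
};

/*
 * Model one context switch from prev to next, counting the
 * operations the quoted description says are paid.
 */
static void model_switch(struct task_model *prev, struct task_model *next,
			 struct cost_model *cost)
{
	if (prev->state_live) {
		/* Outgoing task: save its state and set CR0.TS. */
		cost->saves++;
		cost->stts_calls++;
		prev->state_live = false;
	}

	if (next->fpu_counter > 5) {
		/* Incoming "xstate-using" task: clear CR0.TS and restore. */
		cost->clts_calls++;
		cost->restores++;
		next->state_live = true;
	}
}

/*
 * Ping-pong between two tasks that are both past the threshold
 * and return the accumulated operation counts.
 */
static struct cost_model model_ping_pong(unsigned int switches)
{
	struct task_model a = { .fpu_counter = 6, .state_live = true };
	struct task_model b = { .fpu_counter = 6, .state_live = false };
	struct cost_model cost = { 0 };
	unsigned int i;

	for (i = 0; i < switches; i++) {
		if (i % 2 == 0)
			model_switch(&a, &b, &cost);
		else
			model_switch(&b, &a, &cost);
	}
	return cost;
}
```

In this model every switch between the two tasks pays one save, one restore, one stts and one clts, so model_ping_pong(4) counts four of each. It says nothing about the actual cycle cost of each operation; that still needs real measurements.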
I'm all for simplifying this for modern x86 CPUs.
The lazy FPU switching logic was kind of neat on UP but started
showing its limitations with SMP already - and that was 10 years ago.
So if the numbers prove you right then go for it. It's an added bonus
that this could enable the kernel to be built using vector
instructions - you may or may not want to shoot for the glory of
achieving that feat first ;-)
Thanks,
Ingo