linux-kernel - Re: [PATCH 3.1?] x86: Remove useless stts/clts pair in __switch

Date:	Mon, 25 Jul 2011 09:04:58 -0400
From:	Andrew Lutomirski <luto@....edu>
To:	Ingo Molnar <mingo@...e.hu>
Cc:	x86 <x86@...nel.org>, linux-kernel@...r.kernel.org,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Arjan van de Ven <arjan@...radead.org>,
	Avi Kivity <avi@...hat.com>
Subject: Re: [PATCH 3.1?] x86: Remove useless stts/clts pair in __switch_to

On Mon, Jul 25, 2011 at 7:12 AM, Ingo Molnar <mingo@...e.hu> wrote:
>
> * Andy Lutomirski <luto@....EDU> wrote:
>
>> An stts/clts pair takes over 70 ns by itself on Sandy Bridge, and
>> when other things are going on it's apparently even worse.  This
>> saves 10% on context switches between threads that both use extended
>> state.
>>
>> Signed-off-by: Andy Lutomirski <luto@....edu>
>> Cc: Linus Torvalds <torvalds@...ux-foundation.org>
>> Cc: Arjan van de Ven <arjan@...radead.org>
>> Cc: Avi Kivity <avi@...hat.com>
>> ---
>>
>> This is not as well tested as it should be (especially on 32-bit, where
>> I haven't actually tried compiling it), but I think this might be 3.1
>> material so I want to get it out for review before it's even more
>> unjustifiably late :)
>>
>> Argument for inclusion in 3.1 (after a bit more testing):
>>  - It's dead simple.
>>  - It's a 10% speedup on context switching under the right conditions [1]
>>  - It's unlikely to slow any workload down, since it doesn't add any work
>>    anywhere.
>>
>> Argument against:
>>  - It's late.
>
> I think it's late.
>
> Would be much better to stick it into the x86/xsave tree I pointed to
> and treat and debug it as a coherent unit. FPU bugs need a lot of
> time to surface so we definitely do not want to fast-track it. In
> fact if we want it in v3.2 we should start assembling the tree right
> now.

Fair enough.  I make no guarantee that I'll have anything ready in
less than a few weeks.  I'm defending my thesis in a week, and kernel
hacking is entirely a distraction. :)  (The only thing my thesis has
to do with operating systems is that I mention recvmmsg.)

>
> Also, if you are tempted by the prospect of possibly enabling vector
> instructions for the x86 kernel, we could try that too, and get
> multiple speedups for the price of having to debug the tree only once
> ;-)

I'll play with it.  I have some other cleanup / speedup ideas, too,
and I'll see where they go.  Given that the kernel doesn't really use
floating-point math, I'm not sure that gcc will do much unless we turn
on -ftree-vectorize, and that's a little scary.
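
To make the vectorization point concrete, here is a minimal sketch of the
kind of integer-only loop gcc's auto-vectorizer can handle.  It is
userspace C, not kernel code and not part of the patch; the function name
xor_words is hypothetical and chosen only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical example, not taken from the kernel or from this patch.
 *
 * XOR src into dst one 64-bit word at a time.  The loop uses no
 * floating-point math, so it resembles work a kernel might plausibly do.
 * The restrict qualifiers promise the compiler that the two buffers do
 * not overlap, which removes the aliasing concern that could otherwise
 * force a runtime overlap check or block vectorization.
 */
void xor_words(uint64_t *restrict dst, const uint64_t *restrict src, size_t n)
{
	for (size_t i = 0; i < n; i++)
		dst[i] ^= src[i];
}
```

Building this with something like "gcc -O2 -ftree-vectorize -c" and, on
reasonably recent gcc versions, adding -fopt-info-vec should report whether
the loop was vectorized.  Whether that is a win inside the kernel is a
separate question, since any vector code there has to save and restore the
extended state around it.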

--Andy