linux-kernel - Re: [PATCH RFC 0/6] Implement per-processor data areas for i386.


Message-Id: <1156700663.3034.118.camel@laptopd505.fenrus.org>
Date:	Sun, 27 Aug 2006 19:44:23 +0200
From:	Arjan van de Ven <arjan@...radead.org>
To:	Jeremy Fitzhardinge <jeremy@...p.org>
Cc:	linux-kernel@...r.kernel.org,
	Chuck Ebbert <76306.1226@...puserve.com>,
	Zachary Amsden <zach@...are.com>,
	Jan Beulich <jbeulich@...ell.com>, Andi Kleen <ak@...e.de>,
	Andrew Morton <akpm@...l.org>
Subject: Re: [PATCH RFC 0/6] Implement per-processor data areas for i386.

On Sun, 2006-08-27 at 09:46 -0700, Jeremy Fitzhardinge wrote:
> Arjan van de Ven wrote:
> > this will be interesting; x86-64 has a nice instruction to help with
> > this; 32 bit does not... so far conventional wisdom has been that
> > without the instruction it's not going to be worth it.
> >   
> 
> Hm, swapgs may be quick, but it isn't very easy to use since it doesn't 
> stack, and so requires careful handling for recursive kernel entries, 
> which involves extra tests and conditional jumps.  I tried doing 
> something similar with my earlier patches, but it got all too messy.  
> Stacking %gs like the other registers turns out pretty cleanly.
> 
> > When you're benchmarking this please use multiple CPU generations from
> > different vendors; I suspect this is one of those things that vary
> > greatly between models
> >   
> 
> Hm, it seems to me that unless the existing %ds/%es register 
> save/restores are a significant part of the existing cost of going 
> through entry.S, 

iirc the %fs one is at least. But it has been a while since I've looked
at this part of the kernel via performance traces.

> adding %gs to the set shouldn't make too much 
> difference.  And I'm not sure about the relative cost of using a %gs 
> override vs. the normal current_task_info() masking, but I'm assuming 
> they're at worst equal, with the %gs override having a code-size advantage.

Your worst-case scenario would be if the segment override turned it into
a "complex" instruction, so not parallel decodable. That'd mean it would
basically cost you 6 or 7 instruction slots that can't be filled...
while an "and" and the like at least run nicely in parallel with other
stuff. I don't know which processors, if any, actually do this, but it's
rare/new enough that I'd not be surprised if there are some.
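
For reference, the masking approach being compared against the %gs
override can be sketched roughly as below. This is a user-space
illustration only, not the kernel's code: the 8 KiB THREAD_SIZE and the
helper name stack_base() are assumptions made for the example. The idea
is that thread_info lives at the bottom of a THREAD_SIZE-aligned kernel
stack, so clearing the low bits of the stack pointer finds it with a
single "and".

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 8 KiB kernel stacks; the real value is a kernel config matter. */
#define THREAD_SIZE 8192u

/*
 * Hypothetical helper modelling the masking approach: given any address
 * inside a thread_size-aligned stack, return the base of that stack,
 * which is where thread_info would sit.
 */
static uintptr_t stack_base(uintptr_t sp, uintptr_t thread_size)
{
    /* The mask trick only works when the size is a power of two. */
    assert(thread_size != 0 && (thread_size & (thread_size - 1)) == 0);
    return sp & ~(thread_size - 1);
}
```

The %gs variant would instead be a single load with a segment-override
prefix, which is where the decode-cost question above comes in.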



-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
