lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:	Sat, 1 Dec 2007 18:44:10 -0500
From:	Jeremy Fitzhardinge <jeremy@...p.org>
To:	Ingo Molnar <mingo@...e.hu>
Cc:	Andi Kleen <andi@...stfloor.org>, "H. Peter Anvin" <hpa@...or.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andi Kleen <ak@...e.de>, Chuck Ebbert <cebbert@...hat.com>,
	Roland McGrath <roland@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
	zach@...are.com
Subject: Re: [PATCH x86/mm 6/6] x86-64 ia32 ptrace get/putreg32 current task


On Nov 29, 2007, at 2:44 PM, Ingo Molnar wrote:

>
> * Andi Kleen <andi@...stfloor.org> wrote:
>
>> For i386 iirc Jeremy/Zach did the benchmarking and they settled on  
>> %fs
>> because it was faster for something (originally it was %gs too)
>
> yep. IIRC, some CPUs only optimize %fs because that's what Windows  
> uses
> and leaves Linux with %gs out in the cold.

I did measure some anomalies with the AMD K6+ (or something like  
that), in which %gs was faster than %fs.  It was pretty much  
inexplicable, but also unique - all other processors I tested (which  
was a range from Pentium MMX to current) had identical performance.

> There's also a performance
> penalty for overlapping segment use, if the segment cache is single
> entry only with an additional optimization for NULL [which just hides
> the segment cache].

Some processors do perform slightly better with null selector loads  
than GDT/LDT ones, but it wasn't really noticeable for modern  
processors.  The Intel architecture guy I asked about this said that  
it might be worth doing, but it would likely be swamped by a GDT  
cache miss.  I looked at rearranging the kernel's GDT to pack all the  
kernel entry/exit entries into as few cachelines as possible, but it  
was surprisingly fiddley.

> But if it's good for unification we could switch that to %gs again on
> 32-bit. I was one of the people who advocated the use of the 'other'
> segment register, so that the hardware has less overlap, but clean and
> unified code trumps this concern. It shouldnt be an issue on  
> reasonably
> modern CPUs anyway.

Well, overall it should be fairly easy to make the two arches use  
their own segment registers with a simple #define.  But things like  
ptrace and vm86 were tricky, though I guess the latter isn't an issue  
for 64-bit.

I originally chose %gs for the kernel, partly in the hope that  
compiler support for TLS would be helpful in the kernel, though that  
doesn't seem like a good idea in retrospect.  %gs for the sake of  
consistency would be reasonable, and wouldn't have a measurable  
downside.

	J
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ