linux-kernel - Re: [PATCH x86/mm 6/6] x86-64 ia32 ptrace get/putreg32 current task

Message-Id: <0E451C72-3F30-4921-8C1B-60754899B19E@goop.org>
Date:	Sat, 1 Dec 2007 18:44:10 -0500
From:	Jeremy Fitzhardinge <jeremy@...p.org>
To:	Ingo Molnar <mingo@...e.hu>
Cc:	Andi Kleen <andi@...stfloor.org>, "H. Peter Anvin" <hpa@...or.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andi Kleen <ak@...e.de>, Chuck Ebbert <cebbert@...hat.com>,
	Roland McGrath <roland@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
	zach@...are.com
Subject: Re: [PATCH x86/mm 6/6] x86-64 ia32 ptrace get/putreg32 current task

On Nov 29, 2007, at 2:44 PM, Ingo Molnar wrote:

>
> * Andi Kleen <andi@...stfloor.org> wrote:
>
>> For i386 iirc Jeremy/Zach did the benchmarking and they settled on %fs
>> because it was faster for something (originally it was %gs too)
>
> yep. IIRC, some CPUs only optimize %fs because that's what Windows uses
> and leaves Linux with %gs out in the cold.

I did measure some anomalies with the AMD K6+ (or something like  
that), in which %gs was faster than %fs.  It was pretty much  
inexplicable, but also unique - all other processors I tested (which  
was a range from Pentium MMX to current) had identical performance.

> There's also a performance
> penalty for overlapping segment use, if the segment cache is single
> entry only with an additional optimization for NULL [which just hides
> the segment cache].

Some processors do perform slightly better with null selector loads  
than GDT/LDT ones, but it wasn't really noticeable for modern  
processors.  The Intel architecture guy I asked about this said that  
it might be worth doing, but it would likely be swamped by a GDT  
cache miss.  I looked at rearranging the kernel's GDT to pack all the  
kernel entry/exit entries into as few cachelines as possible, but it  
was surprisingly fiddly.
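
To make the cacheline-packing idea concrete, here is a minimal sketch that counts how many cache lines a set of GDT entries spans. It assumes 8-byte descriptors, 64-byte cache lines, and a cacheline-aligned GDT base. The function name and the entry indices are hypothetical and are not the kernel's actual layout.

```c
#include <assert.h>
#include <stddef.h>

#define GDT_ENTRY_SIZE 8u   /* legacy x86 code/data descriptors are 8 bytes */
#define CACHELINE_SIZE 64u  /* assumption: 64-byte lines, aligned GDT base */

/* Cache line number that holds the descriptor at the given GDT index. */
static unsigned gdt_line(unsigned index)
{
    return index * GDT_ENTRY_SIZE / CACHELINE_SIZE;
}

/*
 * Count the distinct cache lines touched when loading the descriptors
 * listed in idx[0..n-1]. Quadratic, which is fine for a handful of
 * entry/exit selectors.
 */
static unsigned cachelines_touched(const unsigned *idx, size_t n)
{
    unsigned count = 0;

    assert(idx != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int seen = 0;

        /* Has an earlier entry already landed on this line? */
        for (size_t j = 0; j < i; j++) {
            if (gdt_line(idx[j]) == gdt_line(idx[i])) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            count++;
    }
    return count;
}
```

Under these assumptions, four entries at indices 1, 9, 17 and 25 touch four lines, while the same four entries packed at indices 8 through 11 touch one.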

> But if it's good for unification we could switch that to %gs again on
> 32-bit. I was one of the people who advocated the use of the 'other'
> segment register, so that the hardware has less overlap, but clean and
> unified code trumps this concern. It shouldn't be an issue on reasonably
> modern CPUs anyway.

Well, overall it should be fairly easy to make the two arches use  
their own segment registers with a simple #define.  But things like  
ptrace and vm86 were tricky, though I guess the latter isn't an issue  
for 64-bit.
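
As a rough illustration of the "simple #define" approach, the sketch below selects a per-architecture segment register name at compile time and uses it to build a segment-relative operand string. The macro and function names are hypothetical, not the kernel's, and the %gs/%fs split simply mirrors the choices discussed in this thread.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical macro, for illustration only: one definition per arch,
 * so shared code never names a segment register directly.
 */
#if defined(__x86_64__)
#define PERCPU_SEG "gs"   /* 64-bit: %gs, as discussed in the thread */
#else
#define PERCPU_SEG "fs"   /* 32-bit (and any other build): %fs */
#endif

/*
 * Format an operand such as "%gs:symbol" into buf.
 * Returns the length reported by snprintf().
 */
static int percpu_operand(char *buf, size_t len, const char *sym)
{
    assert(buf != NULL && sym != NULL);
    /* Adjacent string literals are concatenated by the compiler. */
    return snprintf(buf, len, "%%" PERCPU_SEG ":%s", sym);
}
```

Shared code would then refer only to PERCPU_SEG, and switching 32-bit to %gs would be a one-line change. This sketch does not address the ptrace and vm86 cases, which need handling beyond a rename.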

I originally chose %gs for the kernel, partly in the hope that
compiler support for TLS would be helpful in the kernel, though that
doesn't seem like a good idea in retrospect.  Using %gs for the sake of
consistency would be reasonable, and wouldn't have a measurable
downside.

	J