Message-Id: <0E451C72-3F30-4921-8C1B-60754899B19E@goop.org>
Date: Sat, 1 Dec 2007 18:44:10 -0500
From: Jeremy Fitzhardinge <jeremy@...p.org>
To: Ingo Molnar <mingo@...e.hu>
Cc: Andi Kleen <andi@...stfloor.org>, "H. Peter Anvin" <hpa@...or.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Andi Kleen <ak@...e.de>, Chuck Ebbert <cebbert@...hat.com>,
Roland McGrath <roland@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
zach@...are.com
Subject: Re: [PATCH x86/mm 6/6] x86-64 ia32 ptrace get/putreg32 current task

On Nov 29, 2007, at 2:44 PM, Ingo Molnar wrote:
>
> * Andi Kleen <andi@...stfloor.org> wrote:
>
>> For i386 iirc Jeremy/Zach did the benchmarking and they settled on %fs
>> because it was faster for something (originally it was %gs too)
>
> yep. IIRC, some CPUs only optimize %fs because that's what Windows uses
> and leaves Linux with %gs out in the cold.

I did measure some anomalies with the AMD K6+ (or something like
that), in which %gs was faster than %fs. It was pretty much
inexplicable, but also unique: every other processor I tested (a
range from Pentium MMX to current) performed identically with either
register.

> There's also a performance
> penalty for overlapping segment use, if the segment cache is single
> entry only with an additional optimization for NULL [which just hides
> the segment cache].

Some processors do perform slightly better with null selector loads
than with GDT/LDT ones, but the difference wasn't really noticeable on
modern processors. The Intel architecture guy I asked about this said
that it might be worth doing, but that any gain would likely be swamped
by a GDT cache miss. I looked at rearranging the kernel's GDT to pack
all the kernel entry/exit entries into as few cachelines as possible,
but it was surprisingly fiddly.

> But if it's good for unification we could switch that to %gs again on
> 32-bit. I was one of the people who advocated the use of the 'other'
> segment register, so that the hardware has less overlap, but clean and
> unified code trumps this concern. It shouldnt be an issue on reasonably
> modern CPUs anyway.

Well, overall it should be fairly easy to make the two arches use
their own segment registers with a simple #define. Things like
ptrace and vm86 were tricky, though I guess the latter isn't an issue
for 64-bit.
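
A minimal sketch of what such a #define might look like. The macro and
function names here are made up for illustration and are not the
kernel's actual ones; the only thing taken from this thread is that
32-bit used %fs at the time, while the 64-bit kernel uses %gs.

```c
/* Hypothetical names for illustration; not the kernel's real macros. */
#if defined(__x86_64__)
#define PERCPU_SEG "gs"   /* 64-bit kernel: %gs, switched on entry via swapgs */
#else
#define PERCPU_SEG "fs"   /* 32-bit kernel at the time of this thread: %fs */
#endif

/*
 * Segment-override prefix as it would be written inside an extended-asm
 * template, where a literal '%' has to be spelled "%%".
 */
#define PERCPU_SEG_PREFIX "%%" PERCPU_SEG ":"

/*
 * Example asm template built from the macro. Only the #define above
 * differs between the two arches; the accessor text is shared.
 */
#define PERCPU_READ_ASM "mov " PERCPU_SEG_PREFIX "%1, %0"

/* Small helpers so the selected strings can be inspected from C. */
const char *percpu_seg_name(void) { return PERCPU_SEG; }
const char *percpu_read_asm(void) { return PERCPU_READ_ASM; }
```

This only covers the accessor side. The ptrace and vm86 paths mentioned
above save and restore the user's segment registers explicitly, so a
single string macro like this would not be enough for them.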

I originally chose %gs for the kernel, partly in the hope that
compiler support for TLS would be helpful in the kernel, though that
doesn't seem like a good idea in retrospect. Using %gs for the sake of
consistency would be reasonable, and wouldn't have a measurable
downside.

J