Message-Id: <1156700663.3034.118.camel@laptopd505.fenrus.org>
Date: Sun, 27 Aug 2006 19:44:23 +0200
From: Arjan van de Ven <arjan@...radead.org>
To: Jeremy Fitzhardinge <jeremy@...p.org>
Cc: linux-kernel@...r.kernel.org,
Chuck Ebbert <76306.1226@...puserve.com>,
Zachary Amsden <zach@...are.com>,
Jan Beulich <jbeulich@...ell.com>, Andi Kleen <ak@...e.de>,
Andrew Morton <akpm@...l.org>
Subject: Re: [PATCH RFC 0/6] Implement per-processor data areas for i386.

On Sun, 2006-08-27 at 09:46 -0700, Jeremy Fitzhardinge wrote:
> Arjan van de Ven wrote:
> > this will be interesting; x86-64 has a nice instruction to help with
> > this; 32 bit does not... so far conventional wisdom has been that
> > without the instruction it's not going to be worth it.
> >
>
> Hm, swapgs may be quick, but it isn't very easy to use since it doesn't
> stack, and so requires careful handling for recursive kernel entries,
> which involves extra tests and conditional jumps. I tried doing
> something similar with my earlier patches, but it got all too messy.
> Stacking %gs like the other registers turns out pretty cleanly.
>
> > When you're benchmarking this please use multiple CPU generations from
> > different vendors; I suspect this is one of those things that vary
> > greatly between models
> >
>
> Hm, it seems to me that unless the existing %ds/%es register
> save/restores are a significant part of the existing cost of going
> through entry.S,

IIRC the %fs one is, at least. But it has been a while since I've looked
at this part of the kernel via performance traces.

> adding %gs to the set shouldn't make too much
> difference. And I'm not sure about the relative cost of using a %gs
> override vs. the normal current_task_info() masking, but I'm assuming
> they're at worst equal, with the %gs override having a code-size advantage.

Your worst-case scenario would be if the segment override made it a
"complex" instruction, and so not decodable in parallel. That would
basically cost you 6 or 7 instruction slots that can't be filled,
while an "and" and the like at least run nicely in parallel with other
stuff. I don't know which processors, if any, actually do this, but it's
rare/new enough that I wouldn't be surprised if there are some.
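
For reference, the two lookups being compared can be modelled in user
space roughly as below. This is only a sketch under stated assumptions:
THREAD_SIZE (taken here as an 8 KiB stack), stack_base_from_sp() and
percpu_field_addr() are made-up names for illustration, not the kernel's
definitions, and the %gs base is modelled as a plain integer rather than
a real segment register.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for illustration: an 8 KiB, power-of-two-sized kernel stack. */
#define THREAD_SIZE 8192u

/*
 * Masking approach: the per-thread structure sits at the bottom of the
 * stack, so clearing the low bits of any address inside the stack gives
 * the address of that structure. This costs one "and" on the stack pointer.
 */
static uintptr_t stack_base_from_sp(uintptr_t sp)
{
    /* The mask trick only works when the stack size is a power of two. */
    assert((THREAD_SIZE & (THREAD_SIZE - 1)) == 0);
    return sp & ~((uintptr_t)THREAD_SIZE - 1);
}

/*
 * Segment-base approach, simulated: with a %gs override the hardware adds
 * the segment base to the offset encoded in the instruction, so no
 * separate "and" is needed. Here that addition is done by hand.
 */
static uintptr_t percpu_field_addr(uintptr_t gs_base, uintptr_t offset)
{
    return gs_base + offset;
}
```

Both helpers compute an address with a single arithmetic step; the open
question above is purely how the real instruction encodings decode and
schedule on a given CPU, which a model like this cannot answer.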
--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com