Message-ID: <alpine.LFD.1.10.0808140901390.3324@nehalem.linux-foundation.org>
Date: Thu, 14 Aug 2008 09:10:36 -0700 (PDT)
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
cc: Jeremy Fitzhardinge <jeremy@...p.org>,
"H. Peter Anvin" <hpa@...or.com>, Andi Kleen <andi@...stfloor.org>,
Ingo Molnar <mingo@...e.hu>,
Steven Rostedt <rostedt@...dmis.org>,
Steven Rostedt <srostedt@...hat.com>,
LKML <linux-kernel@...r.kernel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Peter Zijlstra <peterz@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>,
David Miller <davem@...emloft.net>,
Roland McGrath <roland@...hat.com>,
Ulrich Drepper <drepper@...hat.com>,
Rusty Russell <rusty@...tcorp.com.au>,
Gregory Haskins <ghaskins@...ell.com>,
Arnaldo Carvalho de Melo <acme@...hat.com>,
"Luis Claudio R. Goncalves" <lclaudio@...g.org>,
Clark Williams <williams@...hat.com>,
Christoph Lameter <cl@...ux-foundation.org>
Subject: Re: [RFC PATCH] x86 alternatives : fix LOCK_PREFIX race with
preemptible kernel and CPU hotplug
On Thu, 14 Aug 2008, Mathieu Desnoyers wrote:
>
> I can't argue about the benefit of using VM CPU pinning to manage
> resources because I don't use it myself, but I ran some tests out of
> curiosity to find if uncontended locks were that cheap, and it turns out
> they aren't.
Absolutely.
Locked ops show up not just in microbenchmarks looping over the
instruction, they show up in "real" benchmarks too. We added a single
locked instruction (maybe it was two) to the page fault handling code some
time ago, and the reason I noticed it was that it actually made the page
fault cost visibly more expensive in lmbench. That was a _single_
instruction in the hot path (or maybe two).
And the page fault path is some of the most timing critical in the whole
kernel - if you have everything cached, the cost of doing the page faults
to populate new processes for some fork/exec-heavy workload (and compiling
the kernel is just one of those - any traditional unix behaviour will show
this) is critical.
Locked-op cost is one of the things AMD does a _lot_ better than Intel.
Intel tends to have a 30-50 cycle cost per locked op (with later P4s being
*much* worse), while AMD tends to have a cost of around 10-15 cycles.
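[Illustration, not part of the original message: a minimal user-space
sketch of how one might compare an uncontended locked increment with a
plain one. It assumes a C11 compiler with <stdatomic.h> and POSIX
clock_gettime(); the helper names now_ns(), time_plain() and time_locked()
are hypothetical. It times a tight loop, so the result includes loop
overhead, and it does not reproduce the page fault path discussed above.]

```c
#define _POSIX_C_SOURCE 199309L
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

/* Monotonic wall-clock time in nanoseconds (POSIX clock_gettime). */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Plain, non-atomic increment. The volatile qualifier only keeps the
 * compiler from collapsing the loop; it adds no lock prefix.
 */
static uint64_t time_plain(volatile unsigned long *ctr, unsigned long iters)
{
	uint64_t start = now_ns();

	for (unsigned long i = 0; i < iters; i++)
		(*ctr)++;
	return now_ns() - start;
}

/*
 * Atomic increment on a counter no other thread touches, i.e. the
 * uncontended case. On x86, compilers typically emit a lock-prefixed
 * add or xadd for this.
 */
static uint64_t time_locked(atomic_ulong *ctr, unsigned long iters)
{
	uint64_t start = now_ns();

	for (unsigned long i = 0; i < iters; i++)
		atomic_fetch_add_explicit(ctr, 1, memory_order_seq_cst);
	return now_ns() - start;
}
```

[Calling both helpers with the same iteration count from a single thread
and dividing each result by that count gives a rough per-operation time
for each variant. Converting that to cycles requires knowing the core
clock, and the numbers will vary by microarchitecture.]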
It's one of the things Intel promises to have improved in the next-gen
uarch (Nehalem), and while I am not supposed to give out any benchmarks, I
can confirm that Intel is getting much better at it. But it's going to be
visible still, and it's really a _big_ issue on P4.
(Of course, on P4, the page fault exception cost itself is so high that
the cost of atomics may be _relatively_ less noticeable in that particular
path)
Linus