Message-ID: <20090401190113.GA734@elte.hu>
Date: Wed, 1 Apr 2009 21:01:13 +0200
From: Ingo Molnar <mingo@...e.hu>
To: Christoph Lameter <cl@...ux.com>
Cc: Tejun Heo <tj@...nel.org>,
Martin Schwidefsky <schwidefsky@...ibm.com>,
rusty@...tcorp.com.au, tglx@...utronix.de, x86@...nel.org,
linux-kernel@...r.kernel.org, hpa@...or.com,
Paul Mundt <lethal@...ux-sh.org>, rmk@....linux.org.uk,
starvik@...s.com, ralf@...ux-mips.org, davem@...emloft.net,
cooloney@...nel.org, kyle@...artin.ca, matthew@....cx,
grundler@...isc-linux.org, takata@...ux-m32r.org,
benh@...nel.crashing.org, rth@...ddle.net,
ink@...assic.park.msu.ru, heiko.carstens@...ibm.com,
Linus Torvalds <torvalds@...ux-foundation.org>,
Nick Piggin <npiggin@...e.de>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: Re: [PATCH UPDATED] percpu: use dynamic percpu allocator as the
default percpu allocator
* Christoph Lameter <cl@...ux.com> wrote:
> __read_mostly should be packed as tightly as possible to increase
> the chance that one cacheline includes multiple of the critical
> variables for the hot code paths. Too much __read_mostly defeats
> its purpose.
That stance is commonly held, but it is quite wrong and harmful IMHO.
It stifles the proper identification of read-mostly variables _AND_
it hurts the proper identification of critical write-often variables
as well. Not good.
The solution for critical write-often variables is what we always
used: to identify them explicitly and to place them consciously into
separate cachelines. (Or to per-cpu-ify or object-ify them where
possible/sensible.)
Then annotate everything that is read-mostly and accessed-frequently
with the __read_mostly attribute.
The rest (unannotated variables) is to be assumed "access-rarely" or
"we-dont-care", by default. This is actually 95% of the global
variables.
Yes, a growing number of annotations puts increasing pressure on
the places that are frequently accessed but not properly annotated -
but we should be happy about that: it creates the dynamics and
pressure for them to be properly annotated.
On the other hand, depending on the "put enough data bloat between
critical variables anyway, no need to care about alignment" scheme is
a sloppy, fragile concept that does not lead to a reliable and
dependable end result.
It has two problems:
- Thinking that this solves false cacheline sharing reliably is
wrong: there's nothing that guarantees and enforces that slapping
a few variables between two critical variables puts them on
separate cachelines:
- Ongoing changes in code can bit-rot the
thought-to-be-large-enough distance between two critical
variables - and there's no mechanism in place to detect that.
Explicitly cacheline aligning them will preserve the
information long-term.
- There are architectures with larger cacheline sizes than
what you are developing on.
- .config variations can move variables closer or farther
apart from each other, hiding/triggering the false cacheline
sharing problem.
It is not a maintainable concept IMHO and we should not pretend
it is.
- It actually prevents true read-mostly variables from being
annotated properly. (In such a case, a true read-mostly variable
bouncing around in a frequently-written variable's cacheline is
almost as bad in terms of MESI latencies and costs as false
cacheline sharing between two write-mostly variables.)
Architecting the layout of variables in a knowingly random and
.config-sensitive way is simply not good design, and we should not
pretend it is.
We might not be able to solve the problem if not enough people care
about their variables, but we should at least not be proud of a
non-solution ;-)
Ingo